1. Sequence similarity search using BLAST
Query sequence:
APVKSQESIN QKLALVIKSG KYTLGYKSTV KSLRQGKSKL IIIAANTPVL
RKSELEYYAM LSKTKVYYFQ GGNNELGTAV GKLFRVGVVS ILEAGDSDIL
TTLA
The first thing you might want to do is check whether there
are any similar sequences in the databases. One of the database search
programs available on the web is BLAST (Basic Local Alignment Search Tool).
1. Go to the BLAST home page at the National Center for Biotechnology
Information (NCBI):
http://www.ncbi.nlm.nih.gov/blast/
3. Select a database. Two commonly used databases for protein sequence
searching are:
- swissprot - SWISS-PROT is a high-quality protein sequence database with
excellent annotations. It has minimal redundancy but poorer sequence coverage.
- nr - Non-Redundant database. It is a composite protein sequence database
consisting of non-identical sequences derived from several databases: GenPept,
PDB, Swiss-Prot, PIR, PRF. Although it is called 'non-redundant', it may
contain redundant sequences as a result of polymorphisms, sequencing
errors and sequences of protein fragments. It has comprehensive sequence
coverage but poorer database quality.
We will use swissprot in this example.
4. Copy the query sequence and paste it into the window:
APVKSQESIN QKLALVIKSG KYTLGYKSTV KSLRQGKSKL IIIAANTPVL
RKSELEYYAM LSKTKVYYFQ GGNNELGTAV GKLFRVGVVS ILEAGDSDIL
TTLA

Then press the 'BLAST!' button.
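The sequence above is displayed in blocks of ten residues, so it can help to strip the spaces and line breaks before pasting it or saving it to a file. The following is a minimal Python sketch of that clean-up. The helper names `clean_sequence` and `to_fasta`, the header text `query` and the line width of 60 are illustrative assumptions, not part of BLAST.

```python
# Query sequence exactly as displayed in the tutorial (blocks of 10 residues).
RAW_QUERY = """
APVKSQESIN QKLALVIKSG KYTLGYKSTV KSLRQGKSKL IIIAANTPVL
RKSELEYYAM LSKTKVYYFQ GGNNELGTAV GKLFRVGVVS ILEAGDSDIL
TTLA
"""


def clean_sequence(raw):
    """Remove all whitespace and upper-case the residues."""
    return "".join(raw.split()).upper()


def to_fasta(seq, header="query", width=60):
    """Return the sequence as FASTA text: a '>' header line, then fixed-width lines."""
    lines = [seq[i:i + width] for i in range(0, len(seq), width)]
    return ">" + header + "\n" + "\n".join(lines)


query = clean_sequence(RAW_QUERY)
print(len(query))        # number of residues in the query
print(to_fasta(query))
```

The FASTA text printed at the end can be pasted into the search window in place of the raw blocks.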
5. You will receive a notification saying your search has been submitted
and put into a queue. Press the 'Format results' button to check your results.
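The same submit-then-fetch pattern can also be scripted. The sketch below only builds the request parameters and sends nothing over the network. It assumes the NCBI BLAST URL API endpoint `https://blast.ncbi.nlm.nih.gov/Blast.cgi` with its `CMD`, `PROGRAM`, `DATABASE`, `QUERY`, `RID` and `FORMAT_TYPE` parameters. The request ID `EXAMPLE_RID` is a hypothetical placeholder, since a real ID is issued by the server when a search is queued.

```python
from urllib.parse import urlencode

# Assumed endpoint of the NCBI BLAST URL API (not given in this tutorial).
BLAST_URL = "https://blast.ncbi.nlm.nih.gov/Blast.cgi"

QUERY = (
    "APVKSQESINQKLALVIKSGKYTLGYKSTVKSLRQGKSKLIIIAANTPVL"
    "RKSELEYYAMLSKTKVYYFQGGNNELGTAVGKLFRVGVVSILEAGDSDIL"
    "TTLA"
)


def build_submit_params(sequence, program="blastp", database="swissprot"):
    """Parameters that place a protein search in the queue (CMD=Put)."""
    return {"CMD": "Put", "PROGRAM": program, "DATABASE": database, "QUERY": sequence}


def build_fetch_params(rid, format_type="Text"):
    """Parameters that ask for the formatted results of a queued search (CMD=Get)."""
    return {"CMD": "Get", "RID": rid, "FORMAT_TYPE": format_type}


submit_body = urlencode(build_submit_params(QUERY))
fetch_query = urlencode(build_fetch_params("EXAMPLE_RID"))  # hypothetical request ID

print(submit_body)
print(BLAST_URL + "?" + fetch_query)
```

A real client would send the first parameter set, read the request ID from the response, and then poll with the second set until the results are ready.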

6. You should now see a new window of your BLAST search results. Scroll
down the window until you see the following:

7. Check whether the matches are significant.
The BLAST results list sequences that may be similar to your query sequence.
The statistical significance of each match is measured by its E-value
(Expect value): the number of matches with at least that score that would be
expected by chance in a database of that size.
Highly significant matches should have E-values very close to zero.
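To see why a high score gives an E-value close to zero, the standard relation between bit score and E-value, E = m * n * 2^(-S'), can be sketched in Python. Here m is the query length, n is the total database length in residues, and S' is the bit score. The database length used below is a hypothetical round number. BLAST itself uses length-corrected (effective) values of m and n, so this sketch will not reproduce the reported E-values exactly.

```python
def expect_value(bit_score, query_len, db_len):
    """Approximate E-value from a bit score: E = m * n * 2**(-S')."""
    return query_len * db_len * 2.0 ** (-bit_score)


QUERY_LEN = 104                  # residues in the query sequence of this tutorial
HYPOTHETICAL_DB_LEN = 10 ** 8    # assumed database size, for illustration only

for bits in (20, 40, 177):
    e = expect_value(bits, QUERY_LEN, HYPOTHETICAL_DB_LEN)
    print(bits, "bits ->", "E = %.3g" % e)
```

Each additional bit halves the E-value, which is why a match scoring well over 100 bits is significant for any realistic database size.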

By default, BLAST reports all sequences that have an E-value
<= 10.
8. First, we use a conservative E-value threshold. Take a look at the
proteins with E-value < 0.001. What generalization can you make?
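When there are many hits, the threshold can be applied with a short script. The sketch below assumes the results were saved in BLAST's 12-column tabular format, in which column 11 is the E-value and column 12 the bit score. The four hit lines and the names `hitA` to `hitD` are made-up placeholders for illustration, not results from this search.

```python
# HYPOTHETICAL tabular hits. Columns: query, subject, % identity, alignment length,
# mismatches, gap opens, q.start, q.end, s.start, s.end, E-value, bit score.
TABULAR = "\n".join([
    "query\thitA\t98.0\t100\t2\t0\t1\t100\t1\t100\t1e-45\t160",
    "query\thitB\t60.0\t100\t40\t0\t1\t100\t3\t102\t2e-30\t110",
    "query\thitC\t28.0\t90\t60\t2\t5\t94\t10\t99\t0.004\t35",
    "query\thitD\t25.0\t40\t30\t0\t20\t59\t1\t40\t3.5\t27",
])

THRESHOLD = 0.001  # the conservative cut-off used in step 8


def parse_hits(text):
    """Return (subject, evalue, bit_score) tuples from tabular BLAST output."""
    hits = []
    for line in text.splitlines():
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and comment lines
        fields = line.split("\t")
        hits.append((fields[1], float(fields[10]), float(fields[11])))
    return hits


hits = parse_hits(TABULAR)
significant = [h for h in hits if h[1] < THRESHOLD]
borderline = [h for h in hits if h[1] >= THRESHOLD]

print("E-value < 0.001:", [h[0] for h in significant])
print("E-value >= 0.001:", [h[0] for h in borderline])
```

Splitting the hits into two groups this way mirrors the tutorial: first the confident matches, then the weaker ones that need a closer look.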

9. We can obtain more information about a match by clicking on its bit
score. Try clicking on the highest bit score, '177':

Notice that the query sequence exactly matches (identities = 100%) the sequence
of RL30_YEAST. What does this tell you about the query sequence?
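The identities figure is the fraction of alignment columns in which the two residues are the same. A minimal Python sketch of that calculation follows. The function name `percent_identity` is an illustrative assumption, and the short test strings are toy alignments, not BLAST output.

```python
def percent_identity(aligned_query, aligned_subject):
    """Percent of alignment columns with identical residues ('-' marks a gap)."""
    if len(aligned_query) != len(aligned_subject):
        raise ValueError("aligned sequences must have the same length")
    if not aligned_query:
        raise ValueError("alignment is empty")
    matches = sum(
        1
        for q, s in zip(aligned_query, aligned_subject)
        if q == s and q != "-"
    )
    return 100.0 * matches / len(aligned_query)


# Toy alignments: an exact match, one substitution, and one gap.
print(percent_identity("APVKSQESIN", "APVKSQESIN"))
print(percent_identity("APVK", "APVR"))
print(percent_identity("AP-K", "APVK"))
```

An identity of 100% over the full length of the query means every residue of the query is reproduced in the database sequence.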
10. Now look at the sequences that have E-values > 0.001. The next several
matches are L7Ae ribosomal proteins. Ribosomal protein L30 may therefore also
be related to ribosomal protein L7Ae.

11. A summary of what you have found for your query sequence:
- It is the yeast ribosomal protein L30.
- Ribosomal protein L30 is conserved among different organisms.
- Ribosomal protein L30 may be related to ribosomal protein L7Ae.