NCBI BLAST allows you to input a sequence of DNA, RNA, or protein residues (amino acids) and find sequences that are identical or similar.

To get to BLAST from the NCBI home page, click BLAST from the Popular Resources menu bar on the right of the page.

NCBI home page

You can also get to BLAST directly by going to http://blast.ncbi.nlm.nih.gov/

For this simple exercise we will give you a nucleotide sequence to identify. Click Nucleotide BLAST on the left of the page.

Link to Nucleotide BLAST

There are many options on the Standard Nucleotide BLAST page. For example, you can select different databases to search; you can exclude certain data sources; and you can select a specific algorithm by which to search.

For your first BLAST, we will keep this very basic. We will mostly use the default options to enter a sequence string, and we'll use BLAST to identify the organism it came from, and see what else we can learn about it.

Copy and paste the entire string of nucleotide symbols, below, into the box under Enter Query Sequence.

Copy this:

ATGGCACATGCAGCGCAAGTAGGTCTACAAGACGCTA

CTTCCCCTATCATAGAAGAGCTTATCACCTTTCATGATC

ACGCCCTCATAATCATTTTCCTTATCTGCTTCCTAGTCC

TGTATGCCCTTTTCCTAACACTCACAACAAAACTAACTA

ATACTAACATCTCAGACGCTCAGGAAATAGAAACCGTC

TGAACTATCCTGCCCGCCATCATCCTAGTCCTCATCGC

CCTCCCATCCCTACGCATCCTTTACATAACAGACGAGG

TCAACGATCCCTCCCTTACCATCAAATCAATTGGCCAC

CAATGGTACTGAACCTACGAGTACACCGACTACGGCG

GACTAATCTTCAACTCCTACATACTTCCCCCATTATTC

CTAGAACCAGGCGACCTGCGACTCCTTGACGTTGACA

ATCGAGTAGTACTCCCGATTGAAGCCCCCATTCGTATA

ATAATTACATCACAAGACGTCTTGCACTCATGAGCTGT

CCCCACATTAGGCTTAAAAACAGATGCAATTCCCGGAC

GTCTAAACCAAACCACTTTCACCGCTACACGACCGGGG

GTATACTACGGTCAATGCTCTGAAATCTGTGGAGCAAA

CCACAGTTTCATGCCCATCGTCCTAGAATTAATTCCCCT

AAAAATCTTTGAAATAGGGCCCGTATTTACCCTATAG

to here:

standard blast page
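
If you keep the query in a script or text file rather than pasting it by hand, a small helper can join the rows and check for stray characters before you submit. This is only a minimal sketch in Python: the function name clean_sequence is made up for illustration, and it accepts only A, C, G, T, and N (an assumption for this example; BLAST itself also accepts other IUPAC codes).

```python
def clean_sequence(raw: str) -> str:
    """Join a multi-line nucleotide string and validate its characters."""
    # Remove every whitespace character, including blank lines between rows.
    seq = "".join(raw.split()).upper()
    # Assumption for this sketch: allow only A, C, G, T, and N.
    unexpected = set(seq) - set("ACGTN")
    if unexpected:
        raise ValueError(f"unexpected characters: {sorted(unexpected)}")
    return seq


# The first two rows of the query above, pasted with a blank line between them.
pasted = """ATGGCACATGCAGCGCAAGTAGGTCTACAAGACGCTA

CTTCCCCTATCATAGAAGAGCTTATCACCTTTCATGATC"""
query = clean_sequence(pasted)
print(len(query), query[:12])
```

The same function works on the whole query; only two rows are shown here to keep the example short.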

Uncheck the box labeled "Align two or more sequences":

A screenshot showing the "Align two or more sequences" checkbox.

then scroll down and click the BLAST button:

blast button

You may need to be patient.

BLAST is crunching a huge amount of data.

You will see a screen like this for a while during processing:

processing screen
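
The wait you see on screen corresponds to a submit-then-poll pattern, which is also how scripts talk to BLAST through its URL API. The sketch below only builds the requests and reads the replies; it does not contact NCBI. The helper names are made up, the database name "nt" is an assumption, and the reply fragments are invented examples of the layout the service is assumed to use, so check the current BLAST URL API documentation before relying on any of it.

```python
import re
from typing import Optional
from urllib.parse import urlencode

BLAST_URL = "https://blast.ncbi.nlm.nih.gov/Blast.cgi"


def submit_body(query: str) -> str:
    """Form-encoded body for a nucleotide search request (CMD=Put)."""
    return urlencode({"CMD": "Put", "PROGRAM": "blastn",
                      "DATABASE": "nt", "QUERY": query})


def poll_url(rid: str) -> str:
    """URL that asks whether the search with this request ID has finished."""
    return BLAST_URL + "?" + urlencode(
        {"CMD": "Get", "FORMAT_OBJECT": "SearchInfo", "RID": rid})


def parse_rid(reply: str) -> Optional[str]:
    """Pull the request ID out of the reply to a submission, if present."""
    match = re.search(r"RID\s*=\s*(\S+)", reply)
    return match.group(1) if match else None


def parse_status(reply: str) -> Optional[str]:
    """Pull the status word (for example WAITING or READY) out of a poll reply."""
    match = re.search(r"Status\s*=\s*(\w+)", reply)
    return match.group(1) if match else None


# Made-up reply fragments, not captured from a real search.
print(parse_rid("    RID = EXAMPLE123\n    RTOE = 18\n"))
print(parse_status("    Status=WAITING\n"))
print(poll_url("EXAMPLE123"))
```

A real script would send submit_body as a POST request, wait, and then request poll_url at a gentle interval until the status is READY, which is the scripted equivalent of watching the processing screen.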

Once your results are displayed, you will see a header followed by the results of your search. The results can be displayed in several different views, including a list of sequence "Descriptions," a "Graphic Summary," and a more detailed "Alignments" view.

Select the Graphic Summary by clicking on this tab:

A screenshot of a BLAST results page, indicating the "Graphic Summary" tab.

to see a graphic summary of the top 100 results.

graphic summary of results

Each bar in this graph represents a match with another sequence in the database. The color of each bar represents the extent to which the sequence in the database aligns with the sequence you input (the "Query" sequence). See the color key:

color key
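
The color key is a simple binning of alignment scores. As a rough sketch, the function below maps a score to a color using the bins commonly shown on the BLAST results page (below 40, 40 to 50, 50 to 80, 80 to 200, and 200 or more). The function name is made up, and the thresholds and color names are an assumption; compare them against the key on your own screen.

```python
def score_color(score: float) -> str:
    """Map an alignment score to a Graphic Summary color (assumed bins)."""
    if score < 40:
        return "black"
    if score < 50:
        return "blue"
    if score < 80:
        return "green"
    if score < 200:
        return "magenta"
    return "red"  # 200 or more: the strongest alignments


for s in (35, 45, 75, 150, 1200):
    print(s, score_color(s))
```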

Of the top 100 results for this BLAST, how many sequences in the database align very well with yours?

What are these highly aligned sequences? Where did they come from?

One way to find out is to click on one of the bars in the graphic summary. Try that now.

first BLAST result

What species is your query sequence from?

Click on the Descriptions tab to learn more about each of the sequences that aligned with yours.

Click on the description of the sequence to see the alignment.

For this exercise, select one of the sequences labeled "Homo sapiens isolate PNG## haplogroup...mitochondrion, complete genome."

sequence description links

Clicking on a sequence will bring you to the Alignments view.

You can now see all the nucleotide base matches between your sequence (the "query" sequence) and the sequence from the database (the "subject" sequence).

PNG98 haplogroup

This particular alignment isn't very interesting to look at because the two sequences match perfectly. In the next example we'll look at two sequences that do not perfectly align so that you can look at differences.
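
A "perfect match" can be stated as 100 percent identity: every aligned position holds the same base in both rows. The sketch below computes that figure for two aligned rows of equal length. The function name and the short example rows are made up for illustration, and a gap character ("-") is counted as a mismatch, which is an assumption of this sketch rather than a statement of how BLAST reports identity.

```python
def percent_identity(query_row: str, sbjct_row: str) -> float:
    """Percentage of aligned columns where query and subject agree."""
    if len(query_row) != len(sbjct_row) or not query_row:
        raise ValueError("rows must be non-empty and the same length")
    # A column counts as a match only if both bases agree and neither is a gap.
    matches = sum(q == s and q != "-" for q, s in zip(query_row, sbjct_row))
    return 100.0 * matches / len(query_row)


print(percent_identity("ATGGCACATG", "ATGGCACATG"))  # identical rows
print(percent_identity("ATGGCACATG", "ATGGCATATG"))  # one mismatch in ten
```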

Our goal right now is simply to identify the sequence.

What chromosome is the subject ("Sbjct") sequence (the one in the database that matched your query) from?

The first base in your query ("Query") sequence aligns with approximately which base in the Subject ("Sbjct") sequence?

To go to the subject sequence in the Nucleotide database, there are several links from the alignment.

The first two links, one labeled GenBank in the header next to Download and one on the Sequence ID, take you to the record for the full sequence as it was submitted (or created). Remember that our match started around base 7585. The third link, adjacent to the range (also labeled GenBank), takes you to a record displaying just the range of interest (around 7585 to 8268).
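
The coordinate arithmetic behind these ranges is worth seeing once. The sketch below assumes an ungapped alignment on the same strand that starts at query base 1 and subject base 7585, the approximate figures quoted above; your own alignment may start a few bases away. The function names are made up for illustration.

```python
def aligned_length(start: int, end: int) -> int:
    """Number of bases in an inclusive range such as 7585..8268."""
    return end - start + 1


def query_to_subject(query_pos: int, query_start: int = 1,
                     sbjct_start: int = 7585) -> int:
    """Subject coordinate that lines up with a query coordinate.

    Valid only for an ungapped alignment on the same strand.
    """
    return sbjct_start + (query_pos - query_start)


print(aligned_length(7585, 8268))  # bases covered by the match
print(query_to_subject(1))         # first query base
print(query_to_subject(684))       # last query base, if the query is 684 long
```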

Either record might be useful, but let's look at the record for the entire sequence that was submitted, and look at our query sequence in that context.

Follow the link to the GenBank record in the Nucleotide database from your Sequence ID (MN849867.1 in this example):

MN849867

[If this page insists on opening in a new browser tab, use this link instead to go to MN849867.1]

A GenBank Record

1 of 3

You should now be in the NCBI Nucleotide database, looking at a record labeled, "Homo sapiens isolate PNG98 haplogroup B4a1a1a mitochondrion, complete genome." (You may be looking at a different haplogroup record.)

What is a haplogroup?

Many of the records we look at in this course are Reference Sequences or "RefSeq" records, which are curated by NCBI. But this is an "original" sequence record submitted by a GenBank participant.

Approximately how many bases does this record include?

In what section of the record can you find the affiliation of the researcher or organization that submitted the record?

A GenBank Record

2 of 3

An interesting part of a Nucleotide record is the section labeled "FEATURES." Called the "feature table," this is the part that reflects scientists' annotations -- notes on what biological features of interest are known about a sequence.

Scroll down the feature table of this mitochondrial DNA record. Definitions of some of the feature labels can be found in the GenBank Sample Record.

Two features of major interest are:

CDS = a coding sequence, or region of nucleotides that corresponds with amino acids in a protein.

gene = a region identified as a gene. A gene may include multiple sections of coding sequences, so the same nucleotide sequence (shown in a number range) may be labeled as CDS and gene.
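
Because the feature table is plain text with a regular layout, it is easy to tally feature labels with a short script. The sketch below counts labels in a made-up excerpt written in the general GenBank flat-file layout (feature key indented five spaces, qualifiers indented further). The gene names and coordinates are invented for illustration and are not taken from MN849867.1 or any real record.

```python
import re
from collections import Counter

# Made-up excerpt in GenBank flat-file layout; NOT copied from a real record.
SAMPLE_FEATURES = "\n".join([
    "     gene            100..250",
    '                     /gene="geneA"',
    "     CDS             100..250",
    '                     /gene="geneA"',
    "     gene            300..420",
    '                     /gene="geneB"',
    "     tRNA            300..420",
])

# A feature line has exactly five leading spaces, a key, then a location.
FEATURE_LINE = re.compile(r"^ {5}(\S+)\s+(\S+)")


def count_features(feature_table: str) -> Counter:
    """Count how often each feature key (gene, CDS, ...) appears."""
    counts = Counter()
    for line in feature_table.splitlines():
        match = FEATURE_LINE.match(line)
        if match:  # qualifier lines are indented further and do not match
            counts[match.group(1)] += 1
    return counts


print(count_features(SAMPLE_FEATURES))
```

In this invented excerpt the same range (100..250) carries both a gene and a CDS label, which mirrors the point made above about one range being labeled twice.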

A GenBank Record

3 of 3

In the feature table, each labeled feature is hyperlinked to the sequence itself, which is at the bottom of the record. Click on the first instance of a "gene" label in this feature table.

gene link from feature table

The tools that appear at the bottom provide a useful way to learn and navigate your way around the features.

For example, since you clicked on a gene, you can now toggle through all the genes in this record using the tool in the lower left.

toggling between features in a Nucleotide record

How many genes have been labeled in this human mitochondrial DNA record?

Click around this feature table for a few minutes to get more accustomed to looking at this data.

Our BLAST query sequence aligned with bases 7585 through 8268 on this record (or within a few bases, depending on which record you chose from BLAST). What is this sequence?

When you're ready, move on to the next part to explore these mitochondrial sequences in an interesting way using BLAST.

Comparing Sequences with BLAST

1 of 10

You have now used BLAST to identify an unknown sequence of nucleotides.

Now let's compare sequences.

For this example we will again look at human mitochondrial DNA, but this time we will compare sequences from three different human groups:

  • "Modern" human
  • Neanderthal
  • Denisova

We will use the reference sequences for the mitochondria for these three organisms and compare them using BLAST.

Comparing Sequences with BLAST

2 of 10

In the Nucleotide database, search for:

mitochondrion[ti] AND human[orgn]

Then, because we are looking for the best quality sequences we can find, use the Source databases limit on the left of your screen to limit to RefSeq (curated) records.

refseq limit

Please use the Sort by Default order to follow along with the exercise.
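
The same search can be expressed as a request to NCBI's E-utilities, which is handy if you want to repeat it from a script. The sketch below only builds the URL and does not send it. The helper name is made up, and adding srcdb_refseq[prop] to the query is an assumption about how to reproduce the on-screen RefSeq limit, so confirm that the result count matches what you see in the browser.

```python
from urllib.parse import parse_qs, urlencode, urlsplit

ESEARCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def esearch_url(term: str, db: str = "nuccore") -> str:
    """Build an E-utilities search URL for the Nucleotide database."""
    return ESEARCH + "?" + urlencode({"db": db, "term": term})


# The query from this step, plus an assumed RefSeq-only restriction.
term = "mitochondrion[ti] AND human[orgn] AND srcdb_refseq[prop]"
url = esearch_url(term)
print(url)

# Decoding the URL again recovers the original search string.
print(parse_qs(urlsplit(url).query)["term"][0])
```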

Comparing Sequences with BLAST

3 of 10

The results are three RefSeq records for human mitochondrial sequences: one from modern humans, one from Neanderthal, and one from Denisova ("Homo sp. Altai").

You can click on each record to learn more about each, but, when you are ready, return to the summary display of all three sequences, then use the link under Analyze these sequences in the right panel to Run BLAST.

Run BLAST

Comparing Sequences with BLAST

4 of 10

This link feature from Nucleotide copies the record numbers into the BLAST query box.

Because we want to align these sequences, check the box labeled "Align two or more sequences" and move the accession numbers NC_011137.1 and NC_013993.1 to the new box.

This will compare the sequence in the first box (the modern human - NC_012920.1) with the other two (Neanderthal - NC_011137.1 and Denisova "Homo sp. Altai" - NC_013993.1).

BLAST compare sequences

Leave the other options at their default settings and click BLAST.

Comparing Sequences with BLAST

5 of 10

Hopefully you won't be surprised that the results page shows two alignments against the modern human sequence: one for "Homo sapiens neanderthalensis mitochondrion, complete genome" and one for "Homo sp. Altai mitochondrion, complete genome."

The interesting part is the Alignments view, which you can see by clicking on the Alignments tab. There are two tables: the first compares modern human with Neanderthal, and the second compares modern human with Denisova ("Homo sp. Altai").

Comparing Sequences with BLAST

6 of 10

Before going further, please check that you have the Alignment view options at the top set to Pairwise with dots for identities, and that the CDS feature is selected:

Alignment view display options

With these display options, where there is a difference between the two sequences, the different base(s) are shown in the Subject line. Where there is no difference, a dot appears in the Subject line. 

highlighting where alignments don't match
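
The dots display is easy to reproduce, which may help make clear what you are looking at. In this minimal sketch, the function name and the two short rows are made up for illustration; the rows are assumed to be already aligned and of equal length.

```python
def dots_for_identities(query_row: str, sbjct_row: str) -> str:
    """Return the subject row with a dot wherever it matches the query."""
    if len(query_row) != len(sbjct_row):
        raise ValueError("aligned rows must be the same length")
    return "".join("." if q == s else s
                   for q, s in zip(query_row, sbjct_row))


query_row = "ATGGCACATGCA"
sbjct_row = "ATGACACATGCG"
print("Query ", query_row)
print("Sbjct ", dots_for_identities(query_row, sbjct_row))
```

Only the differing bases remain visible in the subject row, so differences stand out even in a long alignment.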

Comparing Sequences with BLAST

7 of 10

The CDS display option means that, directly from this screen, you can see the sequence changes that may result in amino acid changes.

Scroll down and find a section with CDS data:

amino acid change

The CDS data appears as two lines: One above, showing the amino acid translation for the Query (modern human) sequence; and one below, showing the amino acid translation for the Subject (ancient human) sequence.
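
To see how a single base difference can change an amino acid, you can translate two versions of a coding sequence and compare them. The sketch below uses the vertebrate mitochondrial genetic code, which differs from the standard code at four codons (AGA and AGG are stops, ATA is methionine, TGA is tryptophan). The function names and the second example pair of sequences are made up for illustration, and the code assumes the sequence starts in frame.

```python
from itertools import product

BASES = "TCAG"
# Standard code in TCAG order, then the vertebrate mitochondrial exceptions.
STANDARD_AA = ("FFLLSSSSYY**CC*W" "LLLLPPPPHHQQRRRR"
               "IIIMTTTTNNKKSSRR" "VVVVAAAADDEEGGGG")
CODON_TABLE = {"".join(codon): aa
               for codon, aa in zip(product(BASES, repeat=3), STANDARD_AA)}
CODON_TABLE.update({"AGA": "*", "AGG": "*", "ATA": "M", "TGA": "W"})


def translate(cds: str) -> str:
    """Translate an in-frame coding sequence; unknown codons become X."""
    cds = cds.upper()
    usable = len(cds) - len(cds) % 3  # ignore a trailing partial codon
    return "".join(CODON_TABLE.get(cds[i:i + 3], "X")
                   for i in range(0, usable, 3))


def amino_acid_changes(query_cds: str, sbjct_cds: str):
    """List (position, query_aa, subject_aa) wherever translations differ."""
    pairs = zip(translate(query_cds), translate(sbjct_cds))
    return [(i + 1, q, s) for i, (q, s) in enumerate(pairs) if q != s]


# First 15 bases of the query used earlier in this tutorial.
print(translate("ATGGCACATGCAGCG"))
# Made-up pair differing by one base: ATT (isoleucine) versus GTT (valine).
print(amino_acid_changes("ATGATTGCA", "ATGGTTGCA"))
```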

Comparing Sequences with BLAST

8 of 10

Note the section starting around base 3301 in the Query (modern human) sequence as compared to Neanderthal. This section shows CDS data.

Comparing the modern human mitochondrial sequence to the Neanderthal sequence, answer these questions:

A base difference around 3832 in the modern human sequence (an "a" versus a "g") results in a difference in amino acids. In modern humans, it is a T (threonine). In Neanderthal, it was a(n):

There is a base difference around 3308 in the Query (modern human) sequence that has an effect. What seems to be the effect?

Comparing Sequences with BLAST

9 of 10

Scroll down to the second table to view this same section comparing modern human to the Denisova ("Homo sp. Altai") sequence.

Looking at this same region around 3308 in the human (query) sequence, do modern humans differ from Denisova here, like the difference we saw with Neanderthal?

BONUS!: To explore what difference an amino acid change might make biologically, go to the Amino Acid Explorer and use the "Compare" feature on the left side of the screen.

For example, compare isoleucine to valine.

Comparing Sequences with BLAST

10 of 10

You have reached the end of this tutorial on BLAST.

You might have guessed, from all of the options on the screens, that there's a lot more to learn about BLAST. This exercise gave you a taste of a couple of things BLAST can do. To learn more about BLAST, check out the BLAST playlist on the NCBI YouTube channel.

Want more examples? See Teaching NCBI Resources Through Case Studies "Five Examples for NCBI BLAST."

Continue to the optional extra BLAST exercise

Developed resources reported in this site are supported by the National Library of Medicine (NLM), National Institutes of Health (NIH) under cooperative agreement number UG4LM012344 with the University of Utah Spencer S. Eccles Health Sciences Library. The content is solely the responsibility of the authors and does not necessarily represent the official views of NIH.