NCBI Taxonomy

Open NCBI Taxonomy in another browser window to work through this tutorial side by side.

Introduction: NCBI Taxonomy

Linking the correct organism names with genetic and genomic data is foundational to nearly every aspect of biomedical, agricultural and ecological research.

The NCBI Taxonomy Database is a curated classification and nomenclature for all of the organisms in the NCBI public sequence databases.

Taxonomy Database home page

You will not find all known species in the Taxonomy database. Its scope reflects the data that researchers have submitted.

In this exercise, you’ll explore how organisms are grouped using the Taxonomy Browser and find the links to sequences in NCBI databases.

Exploring Species

1 of 5

Under Taxonomy Tools, click Browser to view the top level of the Taxonomy database.

Taxonomy link to Browser

Exploring Species

2 of 5

Click on Eukaryota and take a few minutes to explore this domain.

Link to Eukaryota

Note that the classified species take up about half of the list, and unclassified eukaryotes make up the rest.

Exploring Species

3 of 5

Unless you are very familiar with the classification, you may have difficulty finding a particular species by browsing. Therefore, we’ll walk you through an example to explore the taxonomic trees a bit.

Click on Opisthokonta. The opisthokonts are a broad group of eukaryotes that includes animals, fungi and several related single-celled groups, such as choanoflagellates.

link to Opisthokonta

Scroll all the way down (or use your browser's search function to skip down the page) to Metazoa. Metazoans include jellyfish, sponges and vertebrates like humans. Before clicking on the link, hover your mouse over the terms Metazoa, Eumetazoa and Bilateria. Notice the labels that pop up.

link to Metazoa

What label is given to “Bilateria”?

Exploring Species

4 of 5

Let’s try a search of the Taxonomy Browser to look at some records more closely.

Use the search box at the top and search for: human.

Search for human

Notice that you have three results.

Homo sapiens and two subspecies
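The same kind of search can also be run programmatically through NCBI's E-utilities, which this tutorial does not otherwise cover. The sketch below only builds an ESearch URL for the Taxonomy database; the helper names `esearch_url` and `fetch` are hypothetical, and the programmatic search may not return exactly the same result list as the Taxonomy Browser's search box.

```python
from urllib.parse import urlencode
from urllib.request import urlopen

# Base URL of NCBI's E-utilities service
EUTILS = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/"


def esearch_url(db: str, term: str) -> str:
    """Build an ESearch URL that searches database `db` for `term` (JSON output)."""
    params = {"db": db, "term": term, "retmode": "json"}
    return EUTILS + "esearch.fcgi?" + urlencode(params)


def fetch(url: str) -> str:
    """Download the response body; needs network access, so it is not called below."""
    with urlopen(url, timeout=30) as resp:
        return resp.read().decode("utf-8")


url = esearch_url("taxonomy", "human")
print(url)
# To run the query against NCBI, uncomment the next line:
# print(fetch(url))
```

`urlencode` takes care of escaping, so a multi-word query such as "Homo sapiens" becomes `term=Homo+sapiens`. If you do send requests, NCBI asks users to keep request rates modest.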

Notice also the Lineage shown at the top. You should recognize the lineage from the top through Bilateria from our previous browsing.

human lineage
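The lineage string shown at the top of a record is a semicolon-separated list of taxon names, ordered from the top of the tree downward. As a sketch, assuming you have downloaded a record with an E-utilities EFetch request, you could split that lineage into a list. The XML below is a hypothetical, abbreviated sample (the real record is much longer and its lineage continues well past Bilateria), and `efetch_url` and `parse_lineage` are illustrative names.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode

EUTILS = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/"


def efetch_url(taxid: str) -> str:
    """Build an EFetch URL for one Taxonomy record in XML format."""
    params = {"db": "taxonomy", "id": taxid, "retmode": "xml"}
    return EUTILS + "efetch.fcgi?" + urlencode(params)


def parse_lineage(xml_text: str) -> list[str]:
    """Return the taxon names from the first record's <Lineage>, top level first."""
    root = ET.fromstring(xml_text)
    taxon = root.find("Taxon")  # first direct child record of <TaxaSet>
    lineage = taxon.findtext("Lineage", default="")
    return [name.strip() for name in lineage.split(";") if name.strip()]


# Hypothetical, abbreviated sample for illustration only (9606 = Homo sapiens)
SAMPLE = """<TaxaSet><Taxon>
  <TaxId>9606</TaxId>
  <ScientificName>Homo sapiens</ScientificName>
  <Lineage>cellular organisms; Eukaryota; Opisthokonta; Metazoa; Eumetazoa; Bilateria</Lineage>
</Taxon></TaxaSet>"""

names = parse_lineage(SAMPLE)
print(names)
print("Bilateria" in names)  # the group you hovered over while browsing
print(efetch_url("9606"))
```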

Hover your mouse over Homo sapiens neanderthalensis. What label is this term given?

Exploring Species

5 of 5

You’ve now seen that there are two subspecies under Homo sapiens. Find Homo sapiens neanderthalensis and click on it to open the full Taxonomy record.

link to Homo sapiens neanderthalensis

Note that the Taxonomy records for both subspecies state, under Comments and References, “This taxon is extinct.”

Exploring Links to Other Databases

1 of 5

Let’s take a moment to explore the links from Taxonomy records to other NCBI databases. These links appear in a table on the right side of every Taxonomy record.

Links to Entrez records from Taxonomy

Each database that has links related to the organism or group of organisms is listed. The number reflects the number of records in that other database that relate to the organism(s). You can click on the number to jump to the records in the other database.
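A comparable count can be requested outside the browser by searching another Entrez database with the record's Taxonomy ID and the Organism field. The sketch below assumes the query form `txid<ID>[Organism:exp]`, which also includes records from lower-level taxa such as subspecies (`[Organism:noexp]` excludes them), so its count may not match the number in the links table exactly. `count_url` and `parse_count` are hypothetical helpers; substitute the Taxonomy ID shown on the record you are viewing for 9606 (Homo sapiens).

```python
import json
from urllib.parse import urlencode

EUTILS = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/"


def count_url(db: str, taxid: str, include_subtaxa: bool = True) -> str:
    """Build an ESearch URL that counts records in `db` for one organism."""
    scope = "exp" if include_subtaxa else "noexp"
    term = f"txid{taxid}[Organism:{scope}]"
    # retmax=0: ask only for the total count, not the list of record IDs
    params = {"db": db, "term": term, "retmode": "json", "retmax": 0}
    return EUTILS + "esearch.fcgi?" + urlencode(params)


def parse_count(json_text: str) -> int:
    """Read the total hit count from an ESearch JSON response."""
    return int(json.loads(json_text)["esearchresult"]["count"])


print(count_url("nuccore", "9606"))
```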

In this class you will explore a number of these databases in detail. For now, we’ll take a quick look at the nucleotide sequence databases.

Exploring Links to Other Databases

2 of 5

See the Nucleotide database link at the top of the table on the right.

Nucleotide link

The Nucleotide database consists of nucleotide sequence records. This entry tells us that there are approximately 1,400 DNA or RNA sequences in the Nucleotide database that have been identified as sequences from Homo sapiens neanderthalensis. Clicking on the number would take us to these sequence records in Nucleotide.

Note that NCBI includes an organism in the Taxonomy database only if we have at least one sequence record for it. The volume of sequence data depends on the organism. For well-studied organisms there could be millions of records from hundreds of studies.

Click the link to the Nucleotide database to get a sense of what the results look like.

Exploring Links to Other Databases

3 of 5

Note that the second line under each entry gives the number of nucleotide base pairs (bp) in that record. In Nucleotide, records generally have hundreds to thousands of base pairs.

base pairs in Nucleotide results

When you’re done, use your browser’s Back button to return to the Homo sapiens neanderthalensis record in the Taxonomy Browser.

Exploring Links to Other Databases

4 of 5

Another database that has sequence data is the SRA database.

SRA experiments

SRA archives “reads” or “runs” from next-generation sequencing technologies: high-throughput sequences generated from one specific sample. Each run generally includes a large amount of sequence data. In this case, we have reads from more than 1,000 sequenced samples.

Click the link to SRA Experiments to get a sense of what the results look like.

Do you remember how many base pairs records in NCBI Nucleotide generally have?

How many base pairs do records in SRA Experiments generally have? 

When you’re done, use your browser’s Back button to return to the Homo sapiens neanderthalensis record in Taxonomy.

Exploring Links to Other Databases

5 of 5

Other NCBI databases (that you will explore later in the class) contain varying numbers of organism-related records.

We have one Genome record associated with this subspecies (Neanderthals). This is the current human reference genome.

link to Genome

The BioProject and BioSample databases contain metadata that accompany SRA experiments – they describe the research project and the specific sample.

BioProject and BioSample links

The Protein database contains amino acid sequences that correspond to the coding nucleotide sequences in Nucleotide records.

Protein links

Related records in NCBI databases are linked to each other. You can think of each of these databases as one of many “doorways” into the data in the NCBI databases; this is the idea behind Entrez (French for “enter”), NCBI’s integrated search system. We’ll be highlighting the links between the databases throughout this course.

Other and Unclassified Taxonomy Records

1 of 4

Let’s go back to the top level of the Taxonomy Browser to look at the “other” and “unclassified” categories.

Find the black menu at the top of the page and click the link to Taxonomy.

Taxonomy link in menu

Then click Browser under Taxonomy Tools.

link to taxonomy browser

From the top page of the Taxonomy Browser, click Other.

Other category in Taxonomy Browser

Other and Unclassified Taxonomy Records

2 of 4

Which of the following types of sequences can you find under “Other”?

What is a vector?

What is a plasmid?

Other and Unclassified Taxonomy Records

3 of 4

Return to the top level of the Taxonomy Browser and this time, click Unclassified.

Unclassified category in Taxonomy Browser

Which of the following types of sequences can you find under “Unclassified”?

Other and Unclassified Taxonomy Records

4 of 4

Metagenomic studies sequence samples taken from different environments to characterize and compare the mix of species they contain. For example, a sample could be water collected for an environmental study, or a sample from a human gut used to learn about the microbiome and its relationship to health and disease. Because a sample like this contains many, many species, its sequences can’t be attributed to one specific species in the Taxonomy database. So Taxonomy includes some contrived records to represent these kinds of samples.

Conclusion

This is the end of the tutorial on Taxonomy.

Close both windows to end the Guide.

Powered by Guide on the Side from the University of Arizona Libraries
Developed resources reported in this site are supported by the National Library of Medicine (NLM), National Institutes of Health (NIH) under cooperative agreement number UG4LM012344 with the University of Utah Spencer S. Eccles Health Sciences Library. The content is solely the responsibility of the authors and does not necessarily represent the official views of NIH.