NCBI Nucleotide

Open Nucleotide in another browser window to work through this tutorial side by side.

About NCBI Nucleotide

The Nucleotide database is a database of nucleic acid sequences. These sequences come from laboratories around the world that submit their data to one of a set of repositories, including GenBank, which is maintained by NCBI. Other records are "Reference Sequences," which are representative (model) examples of sequences, curated by NCBI.

Where possible, the sequences are annotated so that you can find the stretches of sequence that may be functional, such as genes and coding regions.

Let's look at two examples so that you can become familiar with the parts of a Nucleotide record.

Example 1: Giant Viruses

You might have read in the news or heard on the radio or TV about the new giant viruses that have been discovered.

Skim this Science article abstract and find the names of some of the viruses.

Now try searching the NCBI Nucleotide database for klosneuvirus.

Results appear in the middle of the page. Search filters and other discovery tools appear on either side of results. 

Each result in the database represents a record for one sequence or string of nucleic acids. The strings may overlap.

Click on the record for accession number KY684123.1 and view the full record.
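If you would rather script this step than click through it, the same search can be expressed as a request to NCBI's E-utilities. The sketch below only builds the URLs and does not contact NCBI. It assumes the Nucleotide database is addressed as `nuccore`; the helper names `esearch_url` and `record_url` are made up for this illustration.

```python
from urllib.parse import quote, urlencode

# Base address of the E-utilities service.
EUTILS_BASE = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils"


def esearch_url(term, db="nuccore", retmax=20):
    """Build an ESearch URL that searches `db` for `term`.

    Assumption: "nuccore" is the E-utilities name for the Nucleotide database.
    """
    params = {"db": db, "term": term, "retmax": retmax}
    return f"{EUTILS_BASE}/esearch.fcgi?{urlencode(params)}"


def record_url(accession):
    """Build the web address of the full record page for one accession."""
    return f"https://www.ncbi.nlm.nih.gov/nuccore/{quote(accession)}"


if __name__ == "__main__":
    print(esearch_url("klosneuvirus"))
    print(record_url("KY684123.1"))
```

Pasting the second URL into a browser should open the same record that you reach by clicking the search result.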

Look at the record for Klosneuvirus 16 and answer the following questions.

For help, view a Sample GenBank record.

What is the sequence length or number of base pairs?

Hint: This information is stored in the Locus field.
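To show where the length sits in the Locus field, here is a minimal sketch that pulls it out of a LOCUS line. The example line is the one from NCBI's Sample GenBank record, not from the Klosneuvirus record, so the question above is still yours to answer. The function name `locus_length` is hypothetical.

```python
import re


def locus_length(locus_line):
    """Return the sequence length stated on a GenBank LOCUS line.

    The length is the number that comes right before the unit "bp"
    (or "aa" on protein records).
    """
    if not locus_line.startswith("LOCUS"):
        raise ValueError("not a LOCUS line")
    match = re.search(r"(\d+)\s+(?:bp|aa)\b", locus_line)
    if match is None:
        raise ValueError("no length found on the LOCUS line")
    return int(match.group(1))


# LOCUS line from NCBI's Sample GenBank record.
SAMPLE_LOCUS = "LOCUS       SCU49845     5028 bp    DNA             PLN       21-JUN-1999"

if __name__ == "__main__":
    print(locus_length(SAMPLE_LOCUS))
```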

Scroll down the feature table and click on the first link for "gene." 

This link jumps to the raw sequence data at the bottom of the record and highlights the part of the sequence identified as a gene.

New navigation features appear at the bottom of your screen, allowing you to jump from feature to feature.

How many genes are annotated (‘defined’) in this record?

Hint: This information is shown in the feature navigation bar at the bottom of the page.
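The same count can be made from the text of a record. The sketch below tallies feature keys in a GenBank flat file, assuming the usual layout in which a feature key starts in column 6 and qualifier lines are indented further. The excerpt is invented for illustration and is not taken from KY684123.1; the function name `count_feature_keys` is hypothetical.

```python
import re
from collections import Counter

# A feature line has exactly five leading spaces, then the key, then a location.
FEATURE_LINE = re.compile(r"^ {5}(\S+)\s+\S")


def count_feature_keys(flatfile_text):
    """Count how often each feature key appears in the FEATURES section."""
    counts = Counter()
    in_features = False
    for line in flatfile_text.splitlines():
        if line.startswith("FEATURES"):
            in_features = True
            continue
        if in_features and line and not line.startswith(" "):
            break  # reached the next top-level section, such as ORIGIN
        if in_features:
            match = FEATURE_LINE.match(line)
            if match:
                counts[match.group(1)] += 1
    return counts


# Invented excerpt, for illustration only.
EXCERPT = """\
FEATURES             Location/Qualifiers
     source          1..1000
                     /organism="Example virus"
     gene            10..300
                     /locus_tag="EX_0001"
     CDS             10..300
                     /locus_tag="EX_0001"
                     /product="hypothetical protein"
     gene            complement(350..900)
                     /locus_tag="EX_0002"
     CDS             complement(350..900)
                     /product="HNH endonuclease"
ORIGIN
"""

if __name__ == "__main__":
    print(count_feature_keys(EXCERPT)["gene"])
```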

Now scroll back up to the feature table and notice how the different features ("gene," "CDS," "ncRNA") are annotated. Note which features have products.

What are the first ten nucleotides in the HNH endonuclease gene?

Hint: Look through the feature table to find a part of the sequence with a product labeled "HNH endonuclease." Follow the link in the feature table to the sequence at the bottom of the record to see the first ten nucleotides.
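The link works because a feature location is a pair of 1-based, inclusive positions in the sequence. As a rough sketch, the function below cuts a feature out of a sequence string and handles only the two simplest location forms, `start..end` and `complement(start..end)`. The sequence is a made-up toy string and the function name `feature_sequence` is hypothetical. For a complement location the function returns the reverse complement, which may differ from the strand highlighted on the record page.

```python
import re

# Translation table for complementing DNA bases.
_COMPLEMENT = str.maketrans("acgtACGT", "tgcaTGCA")


def feature_sequence(sequence, location):
    """Return the bases covered by a simple feature location.

    Supports "start..end" and "complement(start..end)" only; positions are
    1-based and inclusive, as in the feature table.
    """
    forward = re.fullmatch(r"(\d+)\.\.(\d+)", location)
    reverse = re.fullmatch(r"complement\((\d+)\.\.(\d+)\)", location)
    match = forward or reverse
    if match is None:
        raise ValueError(f"unsupported location: {location}")
    start, end = int(match.group(1)), int(match.group(2))
    bases = sequence[start - 1:end]
    if reverse:
        bases = bases.translate(_COMPLEMENT)[::-1]
    return bases


# Toy sequence, invented for illustration.
TOY_SEQUENCE = "atgcgtacgttagcctagga"

if __name__ == "__main__":
    print(feature_sequence(TOY_SEQUENCE, "3..14")[:10])
    print(feature_sequence(TOY_SEQUENCE, "complement(3..14)")[:10])
```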

Are any RNA molecules annotated (‘defined’) on this record (accession number KY684123)?

Hint: Either look through the feature table for features marked "RNA" or check the drop-down menu at the bottom of the screen.

 What is ncRNA?

Example 2: Mitochondrial Transporter SLC25A3

Start a new search at the top of the screen.

Search for SLC25A3.

What does this gene do?

Use your search results from Nucleotide to answer the following three questions about SLC25A3:  

Approximately how many sequences exist in the NCBI Nucleotide database for the mitochondrial transporter SLC25A3?

Approximately how many sequence records in your results are from the INSDC (GenBank)?

Hint: Find your filter options on the left side of your screen.

What is INSDC?

Approximately how many of your search results are RefSeq records?

What is RefSeq?

Now start a new search using the Advanced page.

Search for SLC25A3 in the Title field.

You can use the Add to history link to stay on the Advanced page and compare your results.

How much do the results of this search differ from those of your last one?

Hint: Compare the following searches:

SLC25A3

SLC25A3[title]
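The two searches above can also be compared by count without loading any results. The sketch below builds ESearch URLs that ask only for the number of hits and shows how the count could be read from the XML reply. It does not contact NCBI. The reply shown is a placeholder with a count of zero, not a real result, and the helper names are hypothetical.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode

ESEARCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def count_url(term, db="nuccore"):
    """Build an ESearch URL that returns only the hit count for `term`."""
    params = {"db": db, "term": term, "rettype": "count"}
    return f"{ESEARCH}?{urlencode(params)}"


def parse_count(xml_text):
    """Read the <Count> value from an ESearch XML reply."""
    count = ET.fromstring(xml_text).findtext("Count")
    if count is None:
        raise ValueError("reply has no <Count> element")
    return int(count)


# Placeholder reply for illustration; the zero is not a real search result.
PLACEHOLDER_REPLY = "<eSearchResult><Count>0</Count></eSearchResult>"

if __name__ == "__main__":
    for term in ("SLC25A3", "SLC25A3[title]"):
        print(count_url(term))
    print(parse_count(PLACEHOLDER_REPLY))
```

Fetching each URL and passing the reply to `parse_count` would give the two numbers to compare.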

Return to your search results for your SLC25A3 search (without the [title] field restriction).

Are there any SLC25A3 sequences in plants?

Hint: Check the Species filters in the left column of your results.
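The filters in the left column narrow a search in much the same way as extra clauses joined with AND. The sketch below composes such a query string. The `[Organism]` field tag, the `srcdb_refseq[PROP]` property filter, and the taxon name Viridiplantae for green plants are assumptions here; check them against the field list on the Advanced page before relying on them. The function name `build_term` is hypothetical.

```python
def build_term(keyword, organism=None, refseq_only=False):
    """Compose an Entrez query string from a keyword and optional filters."""
    clauses = [keyword]
    if organism:
        # Assumed field tag for restricting by organism or taxon.
        clauses.append(f"{organism}[Organism]")
    if refseq_only:
        # Assumed property filter for RefSeq records.
        clauses.append("srcdb_refseq[PROP]")
    return " AND ".join(clauses)


if __name__ == "__main__":
    print(build_term("SLC25A3", organism="Viridiplantae"))
    print(build_term("SLC25A3", refseq_only=True))
```

Either string can be pasted into the Nucleotide search box to compare against what the filters show.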

Example 3: Human tyrosine hydroxylase

Start a new search at the top of the screen.

Find human tyrosine hydroxylase gene sequence records.

What does this gene do?

Look at the following accession records, all of which are in your search results for the tyrosine hydroxylase (TH) gene:

L15440.1

X05290.1

M17589.1

How are these records different from each other? Look at their sizes, their sources, any descriptive notes, and their feature tables.

Hint: These three records are in your search results, but for easier comparison, search Nucleotide for:

L15440.1 OR X05290.1 OR M17589.1
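If you want the three records as text files for a side-by-side comparison, a single EFetch request can return them together. The sketch below builds the OR query shown above and a matching EFetch URL without contacting NCBI. It assumes that `rettype=gb` with `retmode=text` returns GenBank flat files; the helper names are hypothetical.

```python
from urllib.parse import urlencode

EFETCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi"

# The three accessions compared in this example.
ACCESSIONS = ["L15440.1", "X05290.1", "M17589.1"]


def or_term(accessions):
    """Join accessions into one search term for the Nucleotide search box."""
    return " OR ".join(accessions)


def efetch_url(accessions, rettype="gb"):
    """Build one EFetch URL that requests all the listed records as text."""
    params = {
        "db": "nuccore",
        "id": ",".join(accessions),  # EFetch accepts a comma-separated ID list
        "rettype": rettype,
        "retmode": "text",
    }
    return f"{EFETCH}?{urlencode(params)}"


if __name__ == "__main__":
    print(or_term(ACCESSIONS))
    print(efetch_url(ACCESSIONS))
```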

 

Each of the records you looked at includes sequence from the TH gene.

L15440.1: The record title says it all. This record holds a human DNA sequence that includes several genes, not just TH, and it contains only the 3' end of the TH gene. At the top of the feature table, you can see that base pairs 1 through 1482 are part of the TH gene.

X05290.1: This record holds an mRNA sequence that encodes a single protein product, identified in the CDS portion of the feature table as “tyrosine hydroxylase (HTH-1)”. Note that it includes only about 1,800 base pairs.

M17589.1: The feature table on this record includes only one gene and one CDS region. Note that the literature reference says that "alternative RNA splicing produces four kinds of mRNA from a single gene" and the annotation in the feature table says that this one is type 4.

Extra tip:

To find records with variation features, use the Advanced Search function, select Feature key, and search for 'variation'.


Conclusion

The Nucleotide database is a database of nucleic acid sequences. These sequences come from laboratories around the world that submit their data to one of a set of repositories, including GenBank, which is maintained by NCBI. Other records are "Reference Sequences," which are representative (model) examples of sequences, curated by NCBI. As you have seen in the tutorial, you can look up nucleotide sequences in a variety of ways.

You have reached the end of the NCBI Nucleotide tutorial. 

Close both windows to end the Guide.

Powered by Guide on the Side from the University of Arizona Libraries
Developed resources reported in this site are supported by the National Library of Medicine (NLM), National Institutes of Health (NIH) under cooperative agreement number UG4LM012344 with the University of Utah Spencer S. Eccles Health Sciences Library. The content is solely the responsibility of the authors and does not necessarily represent the official views of NIH.