Guide on the Side: NCBI Protein: Simple Search and Record Structure

About NCBI Protein

The NCBI Protein database is a collection of protein (amino acid) sequences. Some of these sequences come from laboratories that have sequenced proteins directly (called "primary" data), and some, like the NCBI RefSeq records, are derived from nucleotide sequences (called "derived" data).

Let's look at an example to explore the Protein database.

Finding a Protein Sequence Record

1 of 4

We will start with a simple search: How many sequences exist in the Protein database for the mitochondrial transporter SLC25A3?

Check that Protein is selected in the database selection menu, enter SLC25A3 in the search box, and click Search.
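The same search can also be run outside the browser through NCBI's E-utilities. The following is a minimal sketch, assuming the public ESearch endpoint and its JSON output; the function names esearch_url and count_records are made up for this illustration, and the count you get back will change as the database grows.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Public ESearch endpoint of the NCBI E-utilities.
ESEARCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def esearch_url(term, db="protein", retmax=20):
    """Build an ESearch URL for a search term (hypothetical helper)."""
    params = {"db": db, "term": term, "retmode": "json", "retmax": retmax}
    return ESEARCH + "?" + urlencode(params)


def count_records(term, db="protein"):
    """Ask ESearch how many records match the term (needs network access)."""
    with urlopen(esearch_url(term, db, retmax=0)) as response:
        data = json.load(response)
    # In JSON mode the count is returned as a string.
    return int(data["esearchresult"]["count"])


if __name__ == "__main__":
    print(esearch_url("SLC25A3"))
```

Calling count_records("SLC25A3") should return a number comparable to the count shown at the top of your results page.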

2 of 4

You should have retrieved more than 1200 records.

Explore the filters on the left to learn what kinds of sequences the Protein database contains.

Use the Customize link under the Source databases menu to view all of the sources. You can select those you wish to view, then click Show to have them appear on your results page.

PDB records and UniProt records may be primary sequence data. Most of the records for SLC25A3 are RefSeq records, which are derived from nucleotide sequence data.
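The Source databases filters correspond to property tags that can be typed directly into the search box. The sketch below assumes the Entrez filter names srcdb_refseq, srcdb_pdb and srcdb_swiss-prot; check them against the Entrez help before relying on them. The helper filtered_term is a hypothetical name used only here.

```python
# Assumed Entrez property filters for three source databases.
SOURCE_FILTERS = {
    "RefSeq": "srcdb_refseq[PROP]",
    "PDB": "srcdb_pdb[PROP]",
    "UniProtKB/Swiss-Prot": "srcdb_swiss-prot[PROP]",
}


def filtered_term(term, source):
    """Combine a search term with a source-database filter (hypothetical helper)."""
    return f"{term} AND {SOURCE_FILTERS[source]}"


if __name__ == "__main__":
    # Print one query per source; paste any of them into the Protein search box.
    for source in SOURCE_FILTERS:
        print(filtered_term("SLC25A3", source))
```

Running each query in turn gives a rough picture of how the SLC25A3 records are divided between derived (RefSeq) and potentially primary (PDB, UniProt) sources.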

3 of 4

At the top of your results you can see the NCBI gene sensor in action.

Under the link to the Gene database there are quick links to the most relevant reference (RefSeq) transcript, protein and gene sequences.

Let's explore the three Protein reference sequences.

Click the link to RefSeq proteins (3) from the gene sensor.

4 of 4

You should now see three records that represent reference sequences in the Protein database. Each splice variant is represented by its own protein record, even when two variants encode the same protein. For SLC25A3, there is one record for isoform a (with 362 amino acids) and two records for isoform b (with 361 amino acids).

Click to view the record for isoform a.
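A record can also be downloaded as text once you know its accession. This is a minimal sketch, assuming the public EFetch endpoint with rettype "gp" for the GenPept flat file or "fasta" for the sequence alone. The accession NP_000000.0 is a placeholder, not a real record; substitute the accession shown on your results page. The function names are made up for this illustration.

```python
from urllib.parse import urlencode
from urllib.request import urlopen

# Public EFetch endpoint of the NCBI E-utilities.
EFETCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi"


def efetch_url(accession, rettype="gp"):
    """Build an EFetch URL for one protein record (hypothetical helper)."""
    params = {
        "db": "protein",
        "id": accession,
        "rettype": rettype,  # "gp" = GenPept flat file, "fasta" = FASTA
        "retmode": "text",
    }
    return EFETCH + "?" + urlencode(params)


def fetch_record(accession, rettype="gp"):
    """Download the record as text (needs network access)."""
    with urlopen(efetch_url(accession, rettype)) as response:
        return response.read().decode("utf-8")


if __name__ == "__main__":
    # Placeholder accession: replace it with one from your results page.
    print(efetch_url("NP_000000.0"))
```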

Understanding a Protein Sequence Record

Take a moment to familiarize yourself with this protein reference sequence record. This record is curated by NCBI.

Note:

The "NCBI Reference Sequence" label and the "NP_" accession prefix below the title, which identify this as an NCBI protein reference sequence record.
The DBSOURCE field, which tells you which transcript (mRNA) reference sequence this protein sequence is derived from (NM_005888.3).
The organism and complete taxonomic hierarchy of that organism.
The list of literature references, each noting the specific range of residues discussed, the full citation data, a link to PubMed, and a "REMARK" that notes the findings of the research cited.
The COMMENT field, which summarizes what is known about the function of this protein, and relevant information about transcript variants.
The FEATURES table, which points out (or "annotates") specific segments, sites or regions of the sequence and their known products or functions. Clicking on a link in this table will highlight the amino acids (or "residues") involved in the sequence at the bottom of the record.
The ORIGIN section at the bottom of the record, which contains the actual sequence of amino acids, or "residues."
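Because the record is plain text with a field name at the start of each line, the fields listed above can be picked out with a few lines of code. The sketch below works on a toy excerpt that imitates the flat-file layout; the accessions and residues in it are made up for illustration and do not come from the SLC25A3 record. The helper names field and origin_sequence are hypothetical.

```python
import re

# Toy excerpt in GenPept-like layout. Accessions and residues are made up.
TOY_RECORD = """\
LOCUS       NP_000000                 25 aa            linear   PRI
VERSION     NP_000000.0
DBSOURCE    REFSEQ: accession NM_000000.0
ORIGIN
        1 acdefghikl mnpqrstvwy acdef
//
"""


def field(record, name):
    """Return the text after a field name at the start of a line, or None."""
    for line in record.splitlines():
        if line.startswith(name):
            return line[len(name):].strip()
    return None


def origin_sequence(record):
    """Join the residues between ORIGIN and the closing // into one string."""
    in_origin = False
    chunks = []
    for line in record.splitlines():
        if line.startswith("ORIGIN"):
            in_origin = True
            continue
        if line.startswith("//"):
            break
        if in_origin:
            # Drop the position numbers and spaces, keep only the letters.
            chunks.append(re.sub(r"[^A-Za-z]", "", line))
    return "".join(chunks).upper()


if __name__ == "__main__":
    print(field(TOY_RECORD, "DBSOURCE"))
    sequence = origin_sequence(TOY_RECORD)
    print(sequence, len(sequence))
```

Applied to the real isoform a record, the length of the string returned by origin_sequence should match the amino acid count given on the first line of the record.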

Explore this record to your satisfaction.

You have reached the end of this tutorial on the NCBI Protein database search and record structure.

Continue to Chapter 6. Identical Sequences.

Powered by Guide on the Side from the University of Arizona Libraries
Developed resources reported in this site are supported by the National Library of Medicine (NLM), National Institutes of Health (NIH) under cooperative agreement number UG4LM012344 with the University of Utah Spencer S. Eccles Health Sciences Library. The content is solely the responsibility of the authors and does not necessarily represent the official views of NIH.