The NCBI Gene database includes and links to information about genetic variation, including the results of studies that link diseases and conditions (phenotypes) to specific genetic variations.

This tutorial demonstrates how to answer the following question using the NCBI Gene database:

What variations are present in the gene and are they associated with disease?

Example 1: Finding Diseases Caused by a Gene

What diseases can be caused by variations in the tyrosine hydroxylase gene?

Step 1: In the NCBI Gene Database, search for human tyrosine hydroxylase:

tyrosine hydroxylase AND human[orgn]

Sort by Relevance.
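
If you prefer to script this search, here is a minimal sketch that builds the equivalent request for NCBI's E-utilities ESearch service. The function name `gene_search_url` and the `retmax` default are our own choices for illustration, and the sketch only builds the URL; it does not contact NCBI.

```python
from urllib.parse import urlencode

# Base address of the E-utilities ESearch service.
EUTILS_ESEARCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def gene_search_url(term, retmax=5):
    """Build an ESearch URL for the Gene database, sorted by relevance.

    `gene_search_url` and `retmax=5` are illustrative choices, not part
    of the tutorial.
    """
    params = {
        "db": "gene",          # search the Gene database
        "term": term,          # same query string typed into the search box
        "sort": "relevance",   # mirrors "Sort by Relevance"
        "retmode": "json",     # ask for a JSON response
        "retmax": retmax,      # number of IDs to return
    }
    return f"{EUTILS_ESEARCH}?{urlencode(params)}"


url = gene_search_url("tyrosine hydroxylase AND human[orgn]")
print(url)
```

Opening the printed URL (for example with `urllib.request.urlopen`) should return a list of Gene IDs under `esearchresult` → `idlist` in the JSON response. The ordering may not match the web page exactly.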

When you sort by relevance, usually the record of interest appears at the top. In this case, we're interested in the human tyrosine hydroxylase (TH) gene.

Step 2: Click on the TH (ID: 7054) record link.

Now that we've found the right gene, we want to find the diseases associated with it.

Step 3: Go to the "Phenotypes" section in the Table of Contents.

There are two parts to the Phenotypes section: Associated Conditions and NHGRI GWAS Catalog.

Let's look at each of these.

Look at the Associated Conditions section.

The Associated Conditions may include diseases directly caused by changes in this gene. In this case, the gene is associated with Segawa syndrome, which includes Parkinson-like symptoms due to a deficiency of L-Dopa.

The MedGen article (link for C1854299) provides a literature review of disease characteristics, plus summaries from other sources such as GeneReviews, OMIM, and Genetics Home Reference.

Look at the NHGRI GWAS (Genome-Wide Association Study) Catalog section.

The GWAS Catalog results describe associations or linkages of phenotypes with variants in or near the gene. These associations may or may not be causative. For example, the publication "Genome-wide association scan for variants associated with early-onset prostate cancer" (PMID: 24740154) reports a significant association between early-onset prostate cancer and a variant in the chromosome 11p region near the TH gene.

Example 2: Finding Variants that Affect the Coding Region

Can I get a list of all disease-causing single nucleotide variants that affect the coding regions?

To find gene variants associated with disease, you can again start in the NCBI Gene database. From this Gene record for the TH (tyrosine hydroxylase) gene, let's find the single nucleotide variants associated with disease:

Step 1: Use the table of contents to jump to the Variation section of the record.

The Variation section of a Gene record links to several different databases. To find the variants affecting the coding region of this gene that are associated with disease, we'll explore ClinVar.

Step 2: Follow the link "See variants in ClinVar."

You are now in the ClinVar database, looking at all ClinVar records with TH listed as a gene (the database search is: TH[gene]). Note that these records do not represent all known variants. (We'll get back to that later.)

We're looking for single nucleotide variants that cause disease.

Step 3: Use the filters on the left-hand side. Under Clinical significance, select Pathogenic, and under Variation type, select Single nucleotide.
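
The same search can be sketched as an E-utilities query. The `TH[gene]` term comes from the search shown above. The two filter strings are assumptions on our part about how the Pathogenic and Single nucleotide filters are written as ClinVar search terms, so check them against ClinVar's Advanced search builder before relying on them. The helper names are hypothetical, and the sketch only builds the URL.

```python
from urllib.parse import urlencode

EUTILS_ESEARCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"

# ASSUMPTION: these strings are our best guess at the search-term form of
# the "Pathogenic" and "Single nucleotide" filters. Verify them in
# ClinVar's Advanced search builder.
ASSUMED_FILTERS = [
    '"clinsig pathogenic"[Properties]',
    '"single nucleotide variant"[Type of variation]',
]


def clinvar_term(gene_symbol, filters=()):
    """Combine a gene search with optional filter terms using AND."""
    return " AND ".join([f"{gene_symbol}[gene]", *filters])


def clinvar_search_url(term, retmax=20):
    """Build an ESearch URL for the ClinVar database (hypothetical helper)."""
    params = {"db": "clinvar", "term": term, "retmode": "json", "retmax": retmax}
    return f"{EUTILS_ESEARCH}?{urlencode(params)}"


term = clinvar_term("TH", ASSUMED_FILTERS)
url = clinvar_search_url(term)
print(term)
print(url)
```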

The filters are also useful here for understanding your results. See, in particular, the "Molecular consequence" filters.

A molecular consequence may be that the coding region is affected. We can see two types of coding consequences here:

  • "Missense" variants change the codon so that it codes for a different amino acid.
  • "Nonsense" variants change the codon to a stop codon, which ends translation.
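
To make the two consequence types concrete, here is a small sketch that classifies a codon change using the standard genetic code. The function name `classify_codon_change` and the extra "synonymous" and "stop lost" labels are ours, added for illustration. ClinVar's own consequence terms are more detailed.

```python
from itertools import product

# Standard genetic code, built from the conventional T, C, A, G ordering.
BASES = "TCAG"
AMINO_ACIDS = (
    "FFLLSSSSYY**CC*W"   # codons starting with T
    "LLLLPPPPHHQQRRRR"   # codons starting with C
    "IIIMTTTTNNKKSSRR"   # codons starting with A
    "VVVVAAAADDEEGGGG"   # codons starting with G
)
CODON_TABLE = {
    "".join(codon): amino_acid
    for codon, amino_acid in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def classify_codon_change(ref_codon, alt_codon):
    """Label the effect of replacing ref_codon with alt_codon."""
    ref_aa = CODON_TABLE[ref_codon.upper()]
    alt_aa = CODON_TABLE[alt_codon.upper()]
    if ref_aa == alt_aa:
        return "synonymous"   # same amino acid, protein unchanged
    if alt_aa == "*":
        return "nonsense"     # new stop codon ends translation early
    if ref_aa == "*":
        return "stop lost"    # original stop codon now codes for an amino acid
    return "missense"         # different amino acid


print(classify_codon_change("GTG", "ATG"))  # valine -> methionine
print(classify_codon_change("TGG", "TGA"))  # tryptophan -> stop
print(classify_codon_change("GTG", "GTA"))  # valine -> valine
```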

You have now found what you were looking for: a list of all disease-causing single nucleotide variants of the TH gene that affect the coding regions.

Each item in this table of results from ClinVar represents a known variation. The variation is described using a standardized format, the Human Genome Variation Society (HGVS) nomenclature. There is also brief information here about which conditions the variation is associated with and the strength of evidence for that association. We'll return to this topic in the Clinical section of the course.
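
As a rough illustration of how an HGVS description is put together, the sketch below splits a simple substitution into its parts. The example string `NM_000000.0:c.100G>A` is a made-up placeholder, not a real TH variant, and the pattern handles only the simplest substitution form. It does not cover intronic offsets, deletions, or protein-level descriptions.

```python
import re

# Matches only simple substitutions such as "NM_000000.0:c.100G>A".
SIMPLE_SUBSTITUTION = re.compile(
    r"^(?P<accession>[A-Z]{2}_\d+\.\d+):"   # reference sequence and version
    r"(?P<coordinate_type>[cgmn])\."        # c = coding DNA, g = genomic, ...
    r"(?P<position>\d+)"                    # position in that sequence
    r"(?P<ref>[ACGT])>(?P<alt>[ACGT])$"     # reference base > new base
)


def parse_simple_substitution(description):
    """Return the parts of a simple HGVS substitution, or None if no match."""
    match = SIMPLE_SUBSTITUTION.match(description)
    if match is None:
        return None
    fields = match.groupdict()
    fields["position"] = int(fields["position"])
    return fields


# Placeholder description for illustration only.
parsed = parse_simple_substitution("NM_000000.0:c.100G>A")
print(parsed)
```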

For now, let's return to the Gene database record to find variants that may not be in ClinVar.

Return to the human TH record in the Gene database.

Example 3: Finding Common Variants

Are there any common protein variants in this gene?

We define "common variants" as those that have a minor allele frequency (MAF) of 1% or more in the population.
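
Here is a minimal sketch of that definition in code. The allele counts are hypothetical numbers chosen for the example, and the function names are ours. We take the MAF to be the frequency of the second most common allele at a site.

```python
def minor_allele_frequency(allele_counts):
    """Return the frequency of the second most common allele.

    allele_counts maps each allele (for example "C" or "T") to the number
    of times it was observed in the sampled chromosomes.
    """
    total = sum(allele_counts.values())
    if total == 0 or len(allele_counts) < 2:
        return 0.0  # no data, or only one allele observed
    ordered = sorted(allele_counts.values(), reverse=True)
    return ordered[1] / total


def is_common(maf, threshold=0.01):
    """A variant is "common" when its MAF is at least 1% (the default)."""
    return maf >= threshold


# HYPOTHETICAL counts, for illustration only.
hypothetical_counts = {"C": 950, "T": 50}
maf = minor_allele_frequency(hypothetical_counts)
print(maf, is_common(maf))
```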

You can find the common variants for a gene from the NCBI Gene record.

From this TH Gene record, go to the Variation section using the table of contents.

This time we'll look at the Variation Viewer. Note here that there are two links: one for GRCh37.p13 and one for GRCh38.

These are two different assemblies of the human reference genome. Many researchers and labs continue to use an older genome assembly’s coordinates for a long time after a new genome assembly becomes available.

Follow the link "See Variation Viewer (GRCh38)."

You are now in the Variation Viewer. The Variation Viewer shows you the genomic context of the variations.

We won't be exploring the Variation Viewer in detail right now, but there is a 4-minute tour of the Variation Viewer available, if you would like to learn more.

Scroll down to the "Molecular consequence" filters in the left menu.

You can filter the table by the "missense" type of "Molecular consequence" to get coding region variants.

Scroll further to the "1000 Genomes MAF" section to filter for only common variants.

Remember that a "common" variant has a frequency of greater than or equal to 1% in the general population.

We'll use a stricter cutoff of >= 0.05 (5%) for this example.

One variant remains, rs6356. You may need to scroll back up the page to view the result.

This Variant ID is an "rs" number, or "Reference SNP" identifier, from the Single Nucleotide Polymorphism database (dbSNP). Following the link from rs6356 takes you to dbSNP, where you can learn more about this variant.

You're now in the SNP (single nucleotide polymorphism) database.

Earlier we used the ClinVar filters to find single nucleotide variants that cause disease. dbSNP contains information about human single nucleotide variations as well as other short genetic variations. In this class we'll generally be following links from other databases to dbSNP, but keep in mind that you can also search dbSNP directly with an rs number. [HINT: You might be doing this in a future exercise.]
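
If you want to look up an rs number from a script rather than the search box, one option is the E-utilities ESummary service. The sketch below only builds the request URL. It assumes that dbSNP's E-utilities database name is `snp` and that it takes the numeric part of the rs number as its ID. The helper names are our own.

```python
from urllib.parse import urlencode

EUTILS_ESUMMARY = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esummary.fcgi"


def rs_to_uid(rs_number):
    """Strip the "rs" prefix and return the numeric ID as a string."""
    text = rs_number.strip().lower()
    if text.startswith("rs"):
        text = text[2:]
    if not text.isdigit():
        raise ValueError(f"not an rs number: {rs_number!r}")
    return text


def dbsnp_summary_url(rs_number):
    """Build an ESummary URL for one dbSNP record (hypothetical helper)."""
    params = {"db": "snp", "id": rs_to_uid(rs_number), "retmode": "json"}
    return f"{EUTILS_ESUMMARY}?{urlencode(params)}"


url = dbsnp_summary_url("rs6356")
print(url)
```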

Returning to our example, the dbSNP record for rs6356 shows that this variation, a change from C to T at this position, changes the reference amino acid valine in the protein product to methionine.
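
It can help to work through why a C to T change gives valine to methionine. Methionine has only one codon, ATG, and the only valine codon one base away from it is GTG, so the coding-strand change must be G to A in the first codon position. That is the complement of C to T. The sketch below walks through this reasoning. It assumes that the C and T alleles are reported on the strand opposite the coding strand, which is our inference from the amino acid change rather than something stated in this tutorial.

```python
# Watson-Crick base pairing.
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}

# Only the two codons needed for this example.
CODON_TO_AMINO_ACID = {"GTG": "valine", "ATG": "methionine"}

# Alleles as described for rs6356 in the tutorial text.
reported_ref, reported_alt = "C", "T"

# ASSUMPTION: the reported alleles are on the strand opposite the coding
# strand, so we complement them to get the coding-strand change.
coding_ref = COMPLEMENT[reported_ref]  # "G"
coding_alt = COMPLEMENT[reported_alt]  # "A"

# ASSUMPTION: the change falls in the first position of the codon GTG,
# the only valine codon one substitution away from ATG.
ref_codon = "GTG"
alt_codon = coding_alt + ref_codon[1:]

print(coding_ref, "->", coding_alt)
print(CODON_TO_AMINO_ACID[ref_codon], "->", CODON_TO_AMINO_ACID[alt_codon])
```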

Now let's go back to our example in the Variation Viewer.

Remember how we got here: from the TH record in the Gene database, we followed the link to the Variation Viewer and applied filters to find missense variations with a frequency of >= 0.05 in the 1000 Genomes project.
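
The two filters work together like the small sketch below. The variant records, IDs, and frequencies are hypothetical placeholders, not real Variation Viewer data. The point is only to show how combining a consequence filter with a frequency cutoff narrows the list.

```python
from dataclasses import dataclass


@dataclass
class Variant:
    variant_id: str
    consequence: str   # molecular consequence label
    maf: float         # minor allele frequency, between 0 and 0.5


# HYPOTHETICAL records, for illustration only.
variants = [
    Variant("variant_A", "missense", 0.002),
    Variant("variant_B", "missense", 0.30),
    Variant("variant_C", "intron", 0.12),
    Variant("variant_D", "synonymous", 0.08),
]


def filter_variants(records, consequence, min_maf):
    """Keep records with the given consequence and a MAF of at least min_maf."""
    return [
        record
        for record in records
        if record.consequence == consequence and record.maf >= min_maf
    ]


common_missense = filter_variants(variants, "missense", 0.05)
print([record.variant_id for record in common_missense])
```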

We could also have found this variant in ClinVar. However, many common variants (especially non-coding ones) are not in ClinVar. You can still find and filter for ClinVar variants in the Variation Viewer. Deselect the earlier missense and MAF >= 0.05 choices to see the number of variants in ClinVar.

You have reached the end of the tutorial for the question:

What variations are present in the gene and are they associated with disease?

Continue to Chapter 9: In what tissues and under what conditions is a gene expressed?

Powered by Guide on the Side from the University of Arizona Libraries
Developed resources reported in this site are supported by the National Library of Medicine (NLM), National Institutes of Health (NIH) under cooperative agreement number UG4LM012344 with the University of Utah Spencer S. Eccles Health Sciences Library. The content is solely the responsibility of the authors and does not necessarily represent the official views of NIH.