The NCBI Gene database includes and links to information about genetic variation, including the results of studies that link diseases and conditions (phenotypes) to specific genetic variations.
This tutorial demonstrates how to answer the following question using the NCBI Gene database:
What variations are present in the gene and are they associated with disease?
Example 1: Finding Diseases Caused by a Gene (6 of 7)
Look at the Associated Conditions section.
The Associated Conditions section may include diseases directly caused by changes in this gene. In this case, the gene is associated with Segawa syndrome, which includes Parkinson-like symptoms due to a deficiency of L-Dopa.
The MedGen article (link for C1854299) provides a literature review of disease characteristics, plus summaries from other sources such as GeneReviews, OMIM, and Genetics Home Reference.
Example 1: Finding Diseases Caused by a Gene (7 of 7)
Look at the NHGRI GWAS (Genome-Wide Association Study) Catalog section.
The GWAS Catalog results describe associations or linkages of phenotypes with variants in or near the gene. These variants may or may not be causative. For example, the publication "Genome-wide association scan for variants associated with early-onset prostate cancer" (PMID: 24740154) reports a significant association between early-onset prostate cancer and a variant in the chromosome 11p region near the TH gene.
Example 2: Finding Variants that Affect the Coding Region (4 of 6)
You are now in the ClinVar database, looking at all ClinVar records with TH listed as a gene (the database search is: TH[gene]). Note that these records do not represent all known variants. (We'll get back to that later.)
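The same TH[gene] search can also be run outside the browser through the NCBI E-utilities. The sketch below only builds the request URL and does not contact NCBI. The function name and the retmax default are illustrative choices, not part of this tutorial.

```python
from urllib.parse import urlencode

# Base URL of the NCBI E-utilities ESearch endpoint.
ESEARCH_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def clinvar_gene_search_url(gene_symbol: str, retmax: int = 20) -> str:
    """Build an ESearch URL that runs `<symbol>[gene]` against ClinVar.

    The function name and the default retmax are illustrative only.
    """
    params = {
        "db": "clinvar",                 # search the ClinVar database
        "term": f"{gene_symbol}[gene]",  # same query the web page shows
        "retmode": "json",               # ask for a JSON response
        "retmax": retmax,                # maximum number of IDs to return
    }
    # urlencode percent-encodes the square brackets in the search term.
    return f"{ESEARCH_URL}?{urlencode(params)}"


print(clinvar_gene_search_url("TH"))
```

Opening the printed URL should return the record IDs that match the query; as in the web interface, these are only the variants that have been submitted to ClinVar.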
We're looking for single nucleotide variants that cause disease.
Step 3: Use the filters on the left-hand side. Under Clinical significance, select Pathogenic, and under Variation type, select Single nucleotide.
Example 2: Finding Variants that Affect the Coding Region (5 of 6)
The filters are also useful here for understanding your results. See, in particular, the "Molecular consequence" filters.
One possible molecular consequence is that the coding region is affected. We can see two types of coding consequences here:
- "Missense" variants change a codon so that it codes for a different amino acid.
- "Nonsense" variants change a codon into a stop codon, which ends translation.
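To make the two definitions concrete, here is a minimal sketch that classifies a single codon change using the standard genetic code. The function name and the extra "synonymous" and "stop lost" labels are additions for illustration; ClinVar assigns its own molecular consequence terms.

```python
from itertools import product

# Standard genetic code, listed in T, C, A, G order for each codon position.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): amino_acid
    for codon, amino_acid in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def classify_codon_change(ref_codon: str, alt_codon: str) -> str:
    """Classify a codon change by its effect on the encoded amino acid."""
    ref_aa = CODON_TABLE[ref_codon.upper()]
    alt_aa = CODON_TABLE[alt_codon.upper()]
    if ref_aa == alt_aa:
        return "synonymous"   # same amino acid (or stop) before and after
    if alt_aa == "*":
        return "nonsense"     # new stop codon ends translation early
    if ref_aa == "*":
        return "stop lost"    # original stop codon now codes for an amino acid
    return "missense"         # different amino acid


print(classify_codon_change("GTG", "ATG"))  # valine -> methionine
print(classify_codon_change("TGG", "TGA"))  # tryptophan -> stop
```

The first call is a missense change and the second is a nonsense change.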
Example 2: Finding Variants that Affect the Coding Region (6 of 6)
You have now found what you were looking for: a list of the disease-causing single nucleotide variants of the TH gene in ClinVar that affect the coding region.
Each item in this table of results from ClinVar represents a known variation. The variation is described in a standardized format called the Human Genome Variation Society (HGVS) Nomenclature. There is also brief information here about what conditions the variation is associated with and the strength of evidence on that association. We'll return to this topic in the Clinical section of the course.
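As a rough illustration of what an HGVS expression encodes, the sketch below parses the simplest case only: a coding-sequence substitution written as reference:c.positionRef>Alt. The accession in the example is a made-up placeholder, not a real TH transcript or variant, and the pattern deliberately ignores the many other forms (deletions, insertions, intronic offsets, protein-level descriptions) that the full nomenclature allows.

```python
import re
from typing import NamedTuple, Optional


class CodingSubstitution(NamedTuple):
    reference: str   # reference sequence identifier before the colon
    position: int    # position in the coding sequence (the number after c.)
    ref_base: str    # base in the reference sequence
    alt_base: str    # base in the variant


# Matches only simple substitutions such as "<reference>:c.100G>A".
_SUBSTITUTION = re.compile(
    r"^(?P<reference>[^:]+):c\.(?P<position>\d+)(?P<ref>[ACGT])>(?P<alt>[ACGT])$"
)


def parse_coding_substitution(expression: str) -> Optional[CodingSubstitution]:
    """Return the parts of a simple c. substitution, or None if it does not match."""
    match = _SUBSTITUTION.match(expression.strip())
    if match is None:
        return None
    return CodingSubstitution(
        reference=match.group("reference"),
        position=int(match.group("position")),
        ref_base=match.group("ref"),
        alt_base=match.group("alt"),
    )


# Placeholder expression for illustration only.
print(parse_coding_substitution("NM_000000.0:c.100G>A"))
```

For real work, a dedicated HGVS parsing library is a better choice than a hand-written pattern.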
For now, let's return to the Gene database record to find variants that may not be in ClinVar.
Return to the human TH record in the Gene database.
Example 3: Finding Common Variants (1 of 9)
Are there any common protein variants in this gene?
We define "common variants" as those that have a minor allele frequency (MAF) of 1% or more in the population.
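The definition can be written out as a short calculation. This sketch takes MAF to be the frequency of the second most common allele at a site, which for a two-allele site is simply the rarer allele. The allele counts are invented for illustration and do not describe any real variant.

```python
from typing import Mapping


def minor_allele_frequency(allele_counts: Mapping[str, int]) -> float:
    """Frequency of the second most common allele (0.0 if only one allele)."""
    total = sum(allele_counts.values())
    if total <= 0:
        raise ValueError("allele counts must sum to a positive number")
    ordered = sorted(allele_counts.values(), reverse=True)
    second_most_common = ordered[1] if len(ordered) > 1 else 0
    return second_most_common / total


def is_common(maf: float, threshold: float = 0.01) -> bool:
    """Apply the 1% cutoff used in this tutorial (threshold is adjustable)."""
    return maf >= threshold


# Invented counts: 940 chromosomes carry C and 60 carry T.
example_counts = {"C": 940, "T": 60}
maf = minor_allele_frequency(example_counts)
print(maf, is_common(maf))
```

With these invented counts the minor allele is T, observed on 60 of 1,000 chromosomes, so the variant would count as common.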
You can find the common variants for a gene from its NCBI Gene record.
From this TH Gene record, go to the Variation section using the table of contents.
Example 3: Finding Common Variants (2 of 9)
This time we'll look at the Variation Viewer. Note here that there are two links: one for GRCh37.p13 and one for GRCh38.
These are two different assemblies. Many researchers and labs continue to use an older genome assembly’s coordinates for a long time after a new genome assembly becomes available.
Follow the link "See Variation Viewer (GRCh38)."
Example 3: Finding Common Variants (3 of 9)
You are now in the Variation Viewer. The Variation Viewer shows you the genomic context of the variations.
We won't be exploring the Variation Viewer in detail right now, but there is a 4-minute tour of the Variation Viewer available, if you would like to learn more.
Example 3: Finding Common Variants (4 of 9)
Scroll down to the "Molecular consequence" filters in the left menu.
Select "missense" under "Molecular consequence" to limit the table to variants that change the protein-coding sequence.
Example 3: Finding Common Variants (5 of 9)
Scroll further to the "1000 Genomes MAF" section to filter for only common variants.
Remember that a "common" variant has a minor allele frequency of at least 1% in the general population.
We'll use a stricter cutoff of >= 0.05 (5%) for this example.
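The two filters applied so far (missense consequence, then MAF >= 0.05) amount to a simple selection over the table rows. The sketch below shows that logic on a hand-made table; the row IDs and frequencies are placeholders, not values from the Variation Viewer, and the class and function names are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class VariantRow:
    variant_id: str          # placeholder ID, not a real rs number
    consequence: str         # molecular consequence label
    maf: Optional[float]     # None when no frequency is reported


def filter_rows(
    rows: List[VariantRow],
    consequence: str = "missense",
    min_maf: float = 0.05,
) -> List[VariantRow]:
    """Keep rows with the given consequence and a reported MAF >= min_maf."""
    return [
        row
        for row in rows
        if row.consequence == consequence
        and row.maf is not None
        and row.maf >= min_maf
    ]


# Invented rows for illustration only.
table = [
    VariantRow("example_1", "missense", 0.20),
    VariantRow("example_2", "missense", 0.002),
    VariantRow("example_3", "intron", 0.30),
    VariantRow("example_4", "missense", None),
]

for row in filter_rows(table):
    print(row.variant_id)
```

Only example_1 passes both filters: the other rows are too rare, not missense, or have no reported frequency.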
Example 3: Finding Common Variants (6 of 9)
One variant remains, rs6356. You may need to scroll back up the page to view the result.
This Variant ID is an "rs" number, or Reference SNP number, from the Single Nucleotide Polymorphism database (dbSNP). Following the link from rs6356 takes you to dbSNP, where you can learn more about this variant.
Example 3: Finding Common Variants (7 of 9)
You're now in the SNP (single nucleotide polymorphism) database.
Earlier we used the ClinVar filters to find single nucleotide variants that cause disease. dbSNP contains information about human single nucleotide variations as well as other short genetic variations. In this class we'll generally be following links from other databases to dbSNP, but keep in mind that you can also search dbSNP directly with an rs number. [HINT: You might be doing this in a future exercise.]
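For a direct lookup by rs number, one option is the E-utilities ESummary endpoint. This sketch only builds the URL and does not contact NCBI. It assumes that the dbSNP record ID in E-utilities is the rs number without its "rs" prefix; the helper names are illustrative.

```python
import re
from urllib.parse import urlencode

# Base URL of the NCBI E-utilities ESummary endpoint.
ESUMMARY_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esummary.fcgi"


def rs_to_numeric_id(rs_id: str) -> str:
    """Strip an optional 'rs' prefix and return the digits as a string."""
    match = re.fullmatch(r"(?:rs)?(\d+)", rs_id.strip(), flags=re.IGNORECASE)
    if match is None:
        raise ValueError(f"not a valid rs number: {rs_id!r}")
    return match.group(1)


def dbsnp_summary_url(rs_id: str) -> str:
    """Build an ESummary URL for one dbSNP record (assumed ID = rs digits)."""
    params = {
        "db": "snp",
        "id": rs_to_numeric_id(rs_id),
        "retmode": "json",
    }
    return f"{ESUMMARY_URL}?{urlencode(params)}"


print(dbsnp_summary_url("rs6356"))
```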
Back to our example: the dbSNP record for rs6356 shows that this variation, a change from C to T at this position, replaces the reference amino acid valine with methionine in the protein product.
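It may seem odd that a C to T change turns valine into methionine, since no valine codon contains a C that could become the T of ATG in a single step. The sketch below works through the standard genetic code to show which single-base change can do it. The interpretation in the comments (that the C to T is reported on the strand opposite the coding strand) is an assumption made here, not something stated in this tutorial.

```python
# Codons from the standard genetic code.
VALINE_CODONS = ("GTT", "GTC", "GTA", "GTG")
METHIONINE_CODON = "ATG"
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}


def val_to_met_changes():
    """List (valine codon, ref base, alt base) for one-base routes to ATG."""
    results = []
    for codon in VALINE_CODONS:
        diffs = [
            (ref, alt)
            for ref, alt in zip(codon, METHIONINE_CODON)
            if ref != alt
        ]
        if len(diffs) == 1:  # exactly one base differs
            ref, alt = diffs[0]
            results.append((codon, ref, alt))
    return results


for codon, ref, alt in val_to_met_changes():
    # Assumption: the same change read on the opposite strand.
    print(f"{codon}->{METHIONINE_CODON}: coding strand {ref}>{alt}, "
          f"opposite strand {COMPLEMENT[ref]}>{COMPLEMENT[alt]}")
```

The only single-base route is GTG to ATG, a G to A change on the coding strand, which reads as C to T on the complementary strand.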
Example 3: Finding Common Variants (8 of 9)
Now let's go back to our example in the Variation Viewer.
Remember that we got here by following the link from the TH record in the Gene database to the Variation Viewer, then applying filters to find missense variations with a frequency of >= 0.05 in the 1000 Genomes Project.
We could also have found this variant in ClinVar. However, many common variants (especially non-coding ones) are not in ClinVar. You can still find and filter for ClinVar variants in the Variation Viewer. Deselect the earlier missense and MAF >= 0.05 choices to see the number of variants in ClinVar.