The Genographic Project

This tool was developed as part of National Geographic's Genographic Project. A detailed description and explanation is available in a published report on public participants' mitochondrial DNA data collected during the first 18 months of the project.

Get Scientific Paper (PDF) | Get Supplemental Data (XLS)

This publication describes a new nearest-neighbor-based methodology developed for haplogroup assignment from HVS-I sequence data, suggests it as a haplogroup prediction tool for validating both new and previously reported databases, and demonstrates its superior performance over rule-based approaches, given a sufficiently large reference database. This analytical tool allows any comparable data to be compared against the entire, expanding Genographic dataset for quality control and predictive purposes.

This tool can be used in two modes:

1. Classifying the samples into haplogroups by comparing them against the Genographic mtDNA database with our nearest-neighbor algorithm.

2. Classifying the samples into haplogroups by comparing them against the user's own mtDNA reference database of samples already classified into haplogroups.
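
The two modes differ only in which reference database the samples are compared against. The following Python sketch illustrates nearest-neighbor assignment in general terms. It is not the published algorithm: the distance (a count of mutations found in only one of the two samples), the majority vote among equally close references, and the names `mutation_distance` and `classify` are all assumptions made for illustration, and the reference row labeled `OTHER` is made-up toy data.

```python
from collections import Counter


def mutation_distance(a, b):
    """Count mutations present in exactly one of the two samples.

    Assumption: a plain symmetric-difference count; the published method
    may weight mutations differently.
    """
    return len(set(a) ^ set(b))


def classify(sample, reference):
    """Return the most common haplogroup label among the nearest references.

    `reference` is a list of (haplogroup_label, mutations) pairs.
    """
    if not reference:
        raise ValueError("reference database is empty")
    distances = [(mutation_distance(sample, muts), label) for label, muts in reference]
    best = min(d for d, _ in distances)
    nearest = [label for d, label in distances if d == best]
    return Counter(nearest).most_common(1)[0][0]


reference = [
    ("V", "16039A 16188T 16189C 16223T 16290T 16362C 16519C".split()),
    ("OTHER", ["16519C"]),  # hypothetical toy entry, not real data
]
sample = "16039A 16188T 16189C 16223T 16290T 16319A 16356C 16362C 16519C".split()
print(classify(sample, reference))
```

Passing the Genographic database as `reference` would correspond to the first mode, and passing a user-supplied table to the second.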

In both modes, the user inputs a list of mtDNA HVS-I (16023-16569) samples to classify, each described by its mutations relative to the revised Cambridge Reference Sequence (rCRS) (see the example in the tool input area). Each line should contain one sample. The list may be uploaded as a text file or pasted into the text window.
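
As a rough illustration of this input format, the Python sketch below reads one sample per line and checks each mutation. It assumes that every mutation is written as a position followed by a single base (as in `16519C`); any other notation the tool may accept, such as insertions or deletions, is not covered. The names `parse_samples`, `MUTATION` and `HVS1_RANGE` are hypothetical.

```python
import re

# Assumption: each mutation is a position followed by one base, e.g. 16519C.
MUTATION = re.compile(r"(\d+)([ACGT])")
HVS1_RANGE = range(16023, 16570)  # positions 16023-16569 inclusive


def parse_samples(text):
    """Parse one sample per line into lists of (position, base) tuples."""
    samples = []
    for line_no, line in enumerate(text.splitlines(), start=1):
        tokens = line.split()
        if not tokens:
            continue  # skip blank lines
        sample = []
        for token in tokens:
            match = MUTATION.fullmatch(token)
            if match is None or int(match.group(1)) not in HVS1_RANGE:
                raise ValueError(f"line {line_no}: unexpected mutation {token!r}")
            sample.append((int(match.group(1)), match.group(2)))
        samples.append(sample)
    return samples


samples = parse_samples("16039A 16188T 16189C\n16223T 16519C\n")
print(len(samples), "samples parsed")
```

Rejecting unexpected tokens early, rather than silently skipping them, is a design choice for this sketch; it makes typos in a pasted list easier to spot.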

In the second mode, the user should also supply a second text file containing reference samples described in the same way, except that each line starts with the sample's haplogroup label.
Example: V 16039A 16188T 16189C 16223T 16290T 16362C 16519C
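
A reference file in this layout could be read with a few lines of Python, sketched below. The function name `parse_reference` is hypothetical, and the sketch assumes the label and the mutations are separated by whitespace, as in the example line.

```python
def parse_reference(text):
    """Parse reference lines into (haplogroup_label, mutations) pairs."""
    reference = []
    for line in text.splitlines():
        tokens = line.split()
        if not tokens:
            continue  # skip blank lines
        # The first token is the haplogroup label; the rest are mutations.
        label, mutations = tokens[0], tokens[1:]
        reference.append((label, mutations))
    return reference


rows = parse_reference("V 16039A 16188T 16189C 16223T 16290T 16362C 16519C\n")
for label, mutations in rows:
    print(label, len(mutations), "mutations")
```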

The output shows one line for each sample, with its list of mutations and the suggested haplogroup classification.

1. Upload the samples file:

2. Or copy and paste its contents into the area below
(One sample per line; separate the mutations with spaces)
Example: 16039A 16188T 16189C 16223T 16290T 16319A 16356C 16362C 16519C



3. Optionally, upload your own reference table to check your samples against. If you don't upload a table, we'll use ours.