Genographic Project Web Site
Haplogroup Prediction Tool
This tool was developed as part of National Geographic's Genographic Project. A detailed description and explanation is available in a published report on public participants' mitochondrial DNA data collected during the first 18 months of the project.
Get Scientific Paper (PDF) | Get Supplemental Data (XLS)
This publication describes a new nearest-neighbor-based methodology for haplogroup assignment from HVS-I sequence data, proposes it as a haplogroup prediction tool for validating both new and previously reported databases, and demonstrates its superior performance over rule-based approaches, given a sufficiently large reference database. This analytical tool allows any comparable data to be compared against the entire, expanding Genographic dataset for quality control and predictive purposes.
This tool can be used in two modes:
1. Classifying the samples into haplogroups by comparing
them against the Genographic mtDNA database with our nearest-neighbor methodology.
2. Classifying the samples into haplogroups by comparing
them against THE USER'S OWN mtDNA reference database
of samples already classified into haplogroups.
In both modes, the user inputs a list of mtDNA HVS-I (16023-16569)
samples to classify, described by their mutations relative to the revised Cambridge Reference Sequence (rCRS)
(see example in the tool input area). Each line should contain one
sample. The list may be given as a text file or it may be copy-pasted
into the text window.
In the second mode, the user should also input another text file,
containing samples characterized in the same way, except that each
line starts with the haplogroup label of the sample, e.g.:
V 16039A 16188T 16189C 16223T 16290T 16362C 16519C
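The two line formats described above can be read with a few lines of Python. This is only a minimal sketch for users preparing their own files, not the tool's actual parser: the function names parse_sample and parse_reference are hypothetical, and it assumes that fields are separated by whitespace, as in the example line.

```python
def parse_sample(line):
    """Parse one unlabeled sample line into a set of mutation tokens.

    Assumes whitespace-separated mutations such as '16223T'.
    """
    return frozenset(line.split())


def parse_reference(line):
    """Parse one labeled reference line.

    The first field is taken as the haplogroup label and the
    remaining fields as the sample's mutations relative to rCRS.
    """
    label, *mutations = line.split()
    return label, frozenset(mutations)


# The example reference line from the text above.
label, mutations = parse_reference(
    "V 16039A 16188T 16189C 16223T 16290T 16362C 16519C"
)
```

Storing the mutations as a set makes it easy to compare two samples later, since the order of mutations on a line carries no meaning.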
The output shows a line for each sample, with its list of mutations
and the suggested haplogroup classification.
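To illustrate the general idea of nearest-neighbor classification against a labeled reference database, here is a minimal sketch in Python. It is not the Genographic implementation: the distance measure (the number of mutations present in one sample but not the other), the tie-breaking rule, the tab-separated output layout, and the names distance, classify and classify_lines are all assumptions made for illustration. The second reference entry and both query samples are made-up data.

```python
def distance(a, b):
    """Assumed metric: count of mutations found in exactly one of the two samples."""
    return len(a ^ b)


def classify(sample, reference):
    """Return the label of the reference entry closest to the sample.

    reference is a list of (label, mutation_set) pairs. On ties,
    the entry that appears first in the list wins.
    """
    label, _ = min(reference, key=lambda entry: distance(sample, entry[1]))
    return label


def classify_lines(lines, reference):
    """Produce one output line per sample: its mutations, then the suggested label."""
    results = []
    for line in lines:
        sample = frozenset(line.split())
        results.append(f"{line}\t{classify(sample, reference)}")
    return results


# Illustrative reference database; the 'H' entry is hypothetical.
reference = [
    ("V", frozenset("16039A 16188T 16189C 16223T 16290T 16362C 16519C".split())),
    ("H", frozenset("16519C".split())),
]

# Made-up query samples, one per line.
samples = [
    "16039A 16188T 16189C 16223T 16290T 16362C",
    "16093C 16519C",
]

output = classify_lines(samples, reference)
```

With this toy reference, the first sample differs from the V entry by a single mutation and is labeled V, while the second differs from the H entry by a single mutation and is labeled H. A real reference database would need many entries per haplogroup for such predictions to be meaningful.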