Reference genome and builds
Every variant is described relative to a reference genome, a standardised sequence. The two common versions, GRCh37 (hg19) and GRCh38 (hg38), number positions differently, so the same variant usually has different coordinates in each. Knowing the build is essential to reading a position correctly.
Why a reference
For findings to be comparable, a shared coordinate system is needed: the reference genome. It is an assembled standard sequence, not the genome of a single person. Every position is numbered against it, and every variant is described as a deviation from it, with a reference allele (what the reference sequence carries at that position) and an alternative allele (what was observed instead).
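A minimal sketch of how such a description might be held in code. The class name `Variant`, its fields, and the values shown are assumptions made for illustration; they are not a format defined by this wiki or by any particular tool.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Variant:
    """A variant described as a deviation from a reference genome (illustrative)."""
    build: str  # reference build the position refers to, e.g. "GRCh38"
    chrom: str  # chromosome name, e.g. "chr1"
    pos: int    # 1-based position on that chromosome in that build
    ref: str    # allele carried by the reference sequence
    alt: str    # allele observed instead

    def describe(self) -> str:
        return f"{self.build} {self.chrom}:{self.pos} {self.ref}>{self.alt}"


# Placeholder values, chosen only to show the shape of the record.
example = Variant(build="GRCh38", chrom="chr1", pos=12345, ref="A", alt="G")
print(example.describe())  # GRCh38 chr1:12345 A>G
```

Making the build a required field means a position can never be stored without the coordinate system it belongs to.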
GRCh37 and GRCh38
The reference genome is improved over time. Two versions are common: the older GRCh37, also known as hg19, and the newer, more complete GRCh38, also known as hg38. Because sequence was inserted and corrected between the two releases, positions downstream of each change shift, and the same variant has a different coordinate in each build. Converting coordinates from one build to the other is called liftover.
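The idea behind liftover can be sketched with a toy model: the old build is divided into blocks that also exist in the new build, and each block carries the offset by which its positions moved. Everything here is hypothetical (the chromosome name `chrT`, the block boundaries, the offsets, and the function `toy_liftover`). Real liftover tools work from alignment files between the builds and handle cases this sketch ignores, such as reversed or split regions.

```python
from typing import NamedTuple, Optional


class Block(NamedTuple):
    """One stretch shared by both builds (made-up data, not a real alignment)."""
    chrom: str
    old_start: int  # first position of the block in the old build (1-based, inclusive)
    old_end: int    # last position of the block in the old build (inclusive)
    offset: int     # amount to add to get the position in the new build


# Hypothetical blocks: 500 bases inserted after position 1000 in the new build
# shift everything downstream by 500.
TOY_BLOCKS = [
    Block("chrT", 1, 1000, 0),
    Block("chrT", 1001, 5000, 500),
]


def toy_liftover(chrom: str, pos: int, blocks=TOY_BLOCKS) -> Optional[int]:
    """Return the position in the new build, or None if no block covers it."""
    for block in blocks:
        if block.chrom == chrom and block.old_start <= pos <= block.old_end:
            return pos + block.offset
    return None


print(toy_liftover("chrT", 900))   # 900  (upstream of the insertion, unchanged)
print(toy_liftover("chrT", 1200))  # 1700 (downstream, shifted by 500)
print(toy_liftover("chrT", 9000))  # None (no block covers it, so it cannot be converted)
```

The `None` case matters in practice: a position in one build does not always have a counterpart in the other, so a conversion can fail rather than silently return a number.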
Why rsIDs are more robust
A bare numeric coordinate is ambiguous without its build label, which makes it a common source of error. An rsID, by contrast, points to the variant itself, regardless of where a build places it. That is why this wiki names markers by their rsID. Anyone working with coordinates should always state the build alongside them.
What Genome measures
Genome works in a defined build. A position like chr19:44,908,822 only makes sense together with its build. The rsIDs the wiki names are build-independent.
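One way to enforce the rule that a coordinate always travels with its build is to make the build a required argument wherever a position string is parsed. The sketch below is illustrative only: `Locus`, `parse_position`, and the alias table are hypothetical names, and the build passed in the usage line is an assumption, since the text above does not say which build the example position belongs to.

```python
import re
from typing import NamedTuple

KNOWN_BUILDS = {"GRCh37", "GRCh38"}
# The hg names used in the text, mapped to the GRC names.
ALIASES = {"hg19": "GRCh37", "hg38": "GRCh38"}


class Locus(NamedTuple):
    build: str
    chrom: str
    pos: int


def parse_position(text: str, build: str) -> Locus:
    """Parse 'chr19:44,908,822' style text; refuse to proceed without a known build."""
    build = ALIASES.get(build, build)
    if build not in KNOWN_BUILDS:
        raise ValueError(f"unknown or missing build: {build!r}")
    match = re.fullmatch(r"(chr\w+):([\d,]+)", text.strip())
    if match is None:
        raise ValueError(f"not a chrom:position string: {text!r}")
    # Thousands separators are for readers only; strip them before converting.
    return Locus(build, match.group(1), int(match.group(2).replace(",", "")))


# The build given here is an assumption for illustration.
locus = parse_position("chr19:44,908,822", build="hg38")
print(locus)  # Locus(build='GRCh38', chrom='chr19', pos=44908822)
```

Calling `parse_position("chr19:44,908,822", build="")` raises a `ValueError` instead of returning a bare number, which is the behaviour the section argues for.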
Sources
- Church et al., 2011. Modernizing reference genome assemblies. PLoS Biology 9:e1001091. doi.org/10.1371/journal.pbio.1001091
- Schneider et al., 2017. Evaluation of GRCh38 and de novo haploid genome assemblies demonstrates the enduring quality of the reference assembly. Genome Research 27:849–864. doi.org/10.1101/gr.213611.116