Dec 6, 2018 | Nature Biotechnology editorial staff

The Vertebrate Genome Project provides a new benchmark for those seeking to build reference genomes.

The first set of reference genomes recently released by the Vertebrate Genome Project (VGP) represents a watershed for genome sequencing. The VGP intends to generate reference genomes of species from all 260 vertebrate orders; ultimately, it plans to identify the nucleotide sequences of all 66,000 vertebrate species. These reference genomes are notable not only for their breadth but also for their completeness, accuracy and haplotype-phased, chromosome-by-chromosome assemblies. The project employs state-of-the-art sequencing, mapping and computational technologies and draws on expertise worldwide to create a resource that promises unprecedented insights into vertebrate diversity and evolution. It is also establishing a benchmark of quality standards and best practices for the genome-sequencing field.

Reference genomes are the cornerstone of modern genomics. These high-quality genomes are differentiated from draft genomes by their completeness (few gaps), low number of errors and high percentage of sequence assembled into chromosomes. Although the genomes of viruses and some prokaryotes have been sequenced completely, end to end, nearly all eukaryotic genomes have not. Indeed, even the latest version of the high-quality human reference (GRCh38) has hundreds of gaps, mostly in or near centromeres, telomeres, segmental duplications and ribosomal DNA arrays.

The VGP’s first release of reference genomes includes four mammals, three birds, one reptile, one amphibian and five fish, many of which are endangered species. Remarkably, according to the VGP, the September release almost doubles in one fell swoop the number of high-quality vertebrate reference genomes available to the research community.

The effort, part of the Genome 10K Consortium, uses several technologies for genome sequencing and assembly: PacBio long reads, 10x Genomics linked Illumina reads, Hi-C chromatin mapping data and Bionano Genomics optical maps. The consortium aims to standardize the assembly and validation process to avoid systematic biases introduced by any one strategy. The results will provide a unique opportunity to discover what combination of technologies yields the best outcomes.

[