Abstract

The draft assembly of the bovine genome was first released in 2004. Sanger sequencing was used to assemble the genome of the Hereford cow L1 Dominette 01449. The assembly has improved considerably over the years, but the limitations of the Sanger reads that form its basis mean that many issues remain. Recent advances in long-read sequencing technology, combined with new scaffolding technologies, have made it possible to create a completely new de novo assembly of the Dominette genome. Approximately 80x genome coverage of PacBio sequence was assembled de novo with Falcon, followed by scaffolding with Dovetail Genomics Chicago data, the BtOM1.0 optical map, and a recombination map of 59K autosomal SNPs, yielding chromosome-length scaffolds. The scaffolded assembly was then refined with independent de novo assemblies from Canu and MaSuRCA, error-corrected with an independent genetic map, and polished with 50x Illumina reads. Assembly statistics include a contig N50 of 26 Mb with 393 gaps, representing many-fold improvements over UMD3.1 (contig N50 = 0.97 Mb; 72,051 gaps). Additionally, full-length transcripts from 28 Dominette tissues have been sequenced with PacBio using the Iso-Seq method to support improved annotation. A public version of the new ARS-UCD assembly is expected to be available before the start of the conference.

Keywords: cattle, assembly, scaffolding, next-generation sequencing, PacBio, Dovetail
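
For readers unfamiliar with the contig N50 statistic quoted in the abstract, the following is a minimal sketch of how it is conventionally computed, together with the fold changes implied by the reported figures. The function name `n50` and the toy contig lengths are hypothetical and chosen only for illustration; they are not taken from the assembly or from the authors' pipeline.

```python
def n50(lengths):
    """Return the N50 of a collection of contig lengths.

    N50 is the length L such that contigs of length >= L together
    contain at least half of the total assembled bases.
    """
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:
            return length
    raise ValueError("n50() requires at least one contig length")


# Toy contig lengths (hypothetical, arbitrary units), not real assembly data.
toy_contigs = [8, 6, 5, 3, 2, 1]
toy_n50 = n50(toy_contigs)

# Fold changes computed from the figures reported in the abstract.
contig_n50_fold = 26 / 0.97   # new contig N50 (Mb) / UMD3.1 contig N50 (Mb)
gap_fold = 72051 / 393        # UMD3.1 gaps / new assembly gaps

print(f"toy N50: {toy_n50}")
print(f"contig N50 improvement: {contig_n50_fold:.1f}x")
print(f"gap reduction: {gap_fold:.0f}x")
```

Using only the numbers stated in the abstract, this works out to roughly a 27-fold increase in contig N50 and about a 183-fold reduction in the number of gaps relative to UMD3.1.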

Benjamin Rosen, Derek Bickhart, Robert Schnabel, Sergey Koren, Christine Elsik, Aleksey Zimin, Christian Dreischer, Sebastian Schultheiss, Richard Hall, Steven Schroeder, Curtis Van Tassell, Timothy Smith, Juan Medrano

Proceedings of the World Congress on Genetics Applied to Livestock Production, Volume Molecular Genetics 3, 802, 2018

This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.