Abstract

The primary reference genome assemblies for Bos taurus cattle were derived from a cow named L1 Dominette 01449, a Hereford beef cow born in eastern Montana. While Dominette's genome assembly ushered in the genomics era in cattle research, there are many genetic differences between Hereford and other cattle breeds, such as Holstein dairy cattle, due to genetic drift and selection since breed divergence. Because Zoetis has a substantial portfolio devoted to the health and wellness of Holstein dairy cattle, we generated a complete Holstein reference genome to better understand the genetic basis of dairy cattle phenotypes. Semen samples were obtained from a Holstein bull, and high molecular weight DNA was extracted. This DNA was sequenced using multiple approaches to support a robust genome assembly: (A) 174 SMRT cells of PacBio RSII, (B) Dovetail Chicago libraries, (C) 2 kb Illumina Nextera mate-pair libraries sequenced 2x75 on a NextSeq 500, and (D) 300 bp and 500 bp Illumina paired-end libraries sequenced 2x150 on a NextSeq 500. PacBio reads were assembled using the PacBio FALCON assembler and polished using Quiver. Scaffolding was performed using the Dovetail data, combined with the Bos taurus linkage map and the Hereford optical map BtOM1.0. The Illumina mate-pair and paired-end data were used for polishing and closing remaining gaps. Leftover contigs were aligned to the Btau5.0.1 Y reference chromosome using MUMmer and placed on either a putative Y chromosome contig list or an unplaced contig list. The final assembly comprises 30 scaffolds (BTA1-29, X), 271 contigs assigned to the Y chromosome, and 2,958 unplaced contigs. After addition of the mitochondrion, the complete genome size is 2,772,068,867 bp, an increase of 47,088,127 bp relative to the Btau5.0.1 genome at 2,724,980,740 bp. The scaffold N50 is 103.87 Mb with an L50 scaffold count of 11. This genome is 94.2% complete when assessed using Benchmarking Universal Single-Copy Orthologs (BUSCO), which is comparable to the Dominette UMD3.1 assembly. A comparative alignment of the UMD3.1, ARS-UCD1.1, and this assembly, while generally parsimonious, identified numerous rearrangements. Preliminary LD analysis using the Illumina Bovine SNP50 Bead Chip suggests that the distribution of pairwise LD for Holsteins agrees better with intermarker distances on the Holstein genome than with distances based on the UMD3.1 and ARS-UCD1.1 positions. This is the first Holstein assembly derived from long-read data, and it will provide a useful tool for understanding economically relevant traits in dairy cattle.

Keywords: dairy cattle, Holstein, assembly, next-generation sequencing, PacBio, Dovetail
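
For readers unfamiliar with the contiguity statistics quoted in the abstract, the following is a minimal sketch of how N50 and L50 are conventionally computed from a list of scaffold lengths. It is not the authors' pipeline: the function name `n50_l50` and the toy lengths are hypothetical and chosen only for illustration, not taken from the assembly.

```python
def n50_l50(lengths):
    """Return (N50, L50) for a list of sequence lengths.

    N50 is the length of the shortest sequence in the smallest set of
    longest sequences that together cover at least half of the total
    length; L50 is the number of sequences in that set.
    """
    if not lengths:
        raise ValueError("lengths must not be empty")
    ordered = sorted(lengths, reverse=True)  # longest first
    half = sum(ordered) / 2
    running = 0
    for count, length in enumerate(ordered, start=1):
        running += length
        if running >= half:
            return length, count


# Hypothetical toy scaffold lengths (arbitrary units), not real data.
toy_lengths = [80, 70, 50, 40, 30, 20, 10]
n50, l50 = n50_l50(toy_lengths)
print(f"N50 = {n50}, L50 = {l50}")
```

Under this definition, an L50 of 11 means that the 11 longest scaffolds together cover at least half of the assembly, and the N50 is the length of the shortest of those 11.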

Kristina Weber, Rakesh Ponnala, Christian Dreischer, Xi Zeng, Avinash Baktula, Sarah Corum, Prerak Desai, Shannon Smith, Natascha Vukasinovic, Juan Fernando Medrano, Benjamin D Rosen, Timothy P Smith, Russell Golson, Gonzalo Rincon, Sue DeNise

Proceedings of the World Congress on Genetics Applied to Livestock Production, Volume Electronic Poster Session - Molecular Genetics 3, 235, 2018



Creative Commons License
This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.