Abstract

In the age of big data, biological databases must rapidly develop their data infrastructure to accommodate abundant data collection in a better structured manner and thereby improve the power of meta-analysis. Genetic and phenotypic correlation data from studies carried out over the past 70+ years, and quantitative trait loci (QTL) mapping results from studies over the past 25+ years, provide a huge amount of information that can add new types of annotation to genomes. The growth of Animal QTLdb and CorrDB over the past decade gives researchers valuable tools for using a wealth of historical and future phenotype/genotype data to elucidate the genetic mechanisms behind livestock production improvements. Our recent efforts in extensive data curation, data quality maintenance, new web tool development, and collaborative database expansion provide convenient platforms for data query and analysis, serving the phenotype/genotype data collection needs of the livestock genetics/genomics community. Over more than 13 (QTLdb) and 5 (CorrDB) years of development, the applications built for Animal QTLdb and CorrDB have embraced the big data era, in which meta-analysis has begun to demonstrate its power and utility for resynthesizing metadata for updated genetic analysis. To date, 136,137 QTL/associations have been curated from 1,881 journal articles, representing 1,890 different traits in 6 livestock animal species. We map all QTL/correlation trait data to the VT/PT/CMO ontologies so that they can be linked as information networks. Trait-centric and gene-centric views of the QTL/association data now summarize vast amounts of phenotype/genotype data in ways that are easier for human consumption. In addition, we have been expanding the types of data collected over the years. The most recent addition is “supplementary data”, i.e. original genotypes, phenotypes, and near-significant or other association/QTL data from the same experiment that are not part of the official publication. Including such data may add value to the “big data” pool once meta-analysis is applied. The most critical developmental work, though less visible to the public, is the improvement of the curation workflow for better data quality control and maintenance. In addition to existing data statuses such as “temporary”, “initial”, “new”, and “reviewed”, we added the statuses “re-track”, “suspend”, “obsolete”, and “on-hold”, with corresponding procedures, to better manage data flow within the curator/editor pipelines. The goals of our database development are not only to facilitate data collection, curation, and annotation, but also to provide mechanisms that support innovative data structures for new types of data reanalysis, combined analysis, and data mining that may lead to new discoveries.

Keywords: livestock, qtldb, corrdb, phenotype, genotype, trait, ontology, database, big data, curation

Zhiliang Hu, Carissa Park, James Reecy

Proceedings of the World Congress on Genetics Applied to Livestock Production, Volume Molecular Genetics 2, 954, 2018

Creative Commons License
This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.