Current status and future perspectives for sequencing livestock genomes
Journal of Animal Science and Biotechnology volume 3, Article number: 8 (2012)
Draft genome sequences for several agricultural animals have been assembled only in recent years. Assembling an individual animal's entire genome sequence, or specific regions of interest, is increasingly important for agricultural researchers who compare animals with different performance. We review the current status of several sequenced agricultural species and suggest that next generation sequencing (NGS) technology, with its decreasing cost and increasing speed, can benefit agricultural researchers. By taking advantage of advanced NGS technologies, genes and chromosomal regions that are more labile to the influence of environmental factors could be pinpointed. A longer-term goal would be to address how animals respond at the molecular and cellular levels to different environmental conditions (e.g. nutrition). Once important genes and gene-environment interactions are revealed, the rate of genetic improvement can also be accelerated. We expect NGS technologies to help animal scientists raise animals more efficiently and better prevent infectious diseases, so that the overall costs of animal production can be decreased.
1. Current Status of Domestic Animal Reference Sequences
As the new genomics era matures, with large-scale genome research and sophisticated bioinformatics tools that can be applied to the agricultural field, agricultural researchers should take advantage of new sequencing and mapping technologies. In recent years, the genomes of several domesticated livestock species (chicken, pig, cow, sheep, and horse) have been partially or completely sequenced. In this review, we first examine the current sequencing status of these species. Next, we discuss the platforms used for genome sequencing, the tools available for mapping sequences to the genome, and several additional applications of next generation sequencing, together with tools for analyzing the data these applications produce.
Due to the high recombination rate of its micro-chromosomes, the chicken is an ideal model for studying genetic linkage. The chicken, represented by the Red Junglefowl (RJF), was the first livestock species to have its genome sequenced. The first draft of the chicken genome was built from an assembly with 6.6-fold whole-genome shotgun coverage, although the sex chromosomes were poorly annotated in that initial assembly [1, 2]. The updated NCBI build 2.1 was released recently with significantly improved annotation of the sex chromosomes. Roughly 2.8 million chicken SNPs were identified [1, 3, 4] by comparing the base (wild type) RJF sequence assembly with a partial genome scan of three chicken breeds: a female layer (White Leghorn), a male broiler (Cornish), and a female Silkie. A moderate-density (60 k) Illumina SNP BeadChip for commercial chicken (broilers and layers) containing 352,303 SNPs was designed recently, and additional SNPs not covered by the current chicken genome assembly (Gallus_gallus-2.1) were identified and selected. The BBSRC ChickenEST Database (http://www.chick.manchester.ac.uk/) provides the most comprehensive collection of chicken ESTs/cDNAs [6, 7]. The Chicken Variation Database (ChickVD) (http://chicken.genomics.org.cn/), released to geneticists in 2005, contains genes, variants, chicken orthologs of human disease genes, and QTLs (stretches of DNA containing, or linked to, the genes that underlie a quantitative trait). Large-scale breeding research projects are still needed (http://www.nih.gov/science/models/gallus/).
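To make concrete what identifying SNPs "between" a reference assembly and another breed means, the sketch below lists single-base differences between two sequences that are assumed to be already aligned, gap-free, and of equal length. The function name `find_snps` and the example fragments are hypothetical; genome-scale SNP discovery such as that behind the 2.8 million chicken SNPs relies on read alignment, base-quality filtering, and statistical genotype calling rather than a plain string comparison.

```python
def find_snps(reference: str, sample: str) -> list[tuple[int, str, str]]:
    """List single-base differences between two pre-aligned, gap-free sequences.

    Returns (1-based position, reference base, sample base) tuples.
    Positions where either sequence has 'N' (unknown base) are skipped.
    """
    if len(reference) != len(sample):
        raise ValueError("sequences must be aligned to the same length")
    snps = []
    pairs = zip(reference.upper(), sample.upper())
    for pos, (ref_base, alt_base) in enumerate(pairs, start=1):
        if ref_base == "N" or alt_base == "N":
            continue  # an ambiguous call is not evidence of a variant
        if ref_base != alt_base:
            snps.append((pos, ref_base, alt_base))
    return snps


# Hypothetical fragments: an RJF reference and the same region in another breed
reference = "ACGTACGTTAGC"
breed = "ACGTACCTTAGN"
print(find_snps(reference, breed))  # [(7, 'G', 'C')]
```

Skipping `N` positions is a deliberate choice: an unknown base in either sequence says nothing about whether the breeds actually differ there.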
In November 2009, the first draft (98% complete) of the pig (Sus scrofa) genome, assembled through a global collaborative effort, was released. The diploid pig genome has 38 chromosomes (including metacentric and acrocentric ones) and is roughly 2.7 × 10⁹ bp. Both high-throughput fingerprinting and BAC (bacterial artificial chromosome) end sequencing (over 600,000 BAC end sequences) provided the framework for sequencing the whole swine genome. Specifically, the restriction enzyme fingerprinting method was used to construct a physical map of the swine genome from BAC clones. The sequence will be used as the basis for identifying genes that are important to pork production and/or are involved in immune or physiological processes (http://www.sanger.ac.uk/about/press/2009/091102.html). The finished pig assembly will not only help researchers understand the genetic complexity of the pig, but will also change pork production and breeding technology. Because pigs have physiology and nutritional needs similar to those of humans, the completed swine genome is also critical for studying human nutrition and disease (http://www.sanger.ac.uk/).
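The fingerprinting idea can be illustrated with a small in-silico digest. The sketch below assumes HindIII (site AAGCTT, cut after the first A) purely for illustration, computes the fragment lengths ("bands") of each clone, and counts fragment sizes shared by two clones as crude evidence that they overlap. The function names and sequences are hypothetical, and the shared-band count is a simplification: physical-mapping software such as FPC scores overlaps with a probability-based statistic (the Sulston score) and gel-sizing tolerances rather than a raw count.

```python
import re

# Assumption for illustration: HindIII, recognition site AAGCTT, cuts A^AGCTT.
SITE = "AAGCTT"
CUT_OFFSET = 1  # the cut falls after the first base of the site


def digest(seq: str, site: str = SITE, offset: int = CUT_OFFSET) -> list[int]:
    """Fragment lengths from a complete in-silico digest of a linear clone insert."""
    seq = seq.upper()
    # Zero-width lookahead finds every occurrence of the site, including overlaps
    cuts = [m.start() + offset for m in re.finditer(f"(?={re.escape(site)})", seq)]
    bounds = [0] + cuts + [len(seq)]
    return [end - start for start, end in zip(bounds, bounds[1:]) if end > start]


def shared_bands(frags_a: list[int], frags_b: list[int], tolerance: int = 0) -> int:
    """Greedily pair fragments of similar size (within `tolerance` bp) between two clones."""
    remaining = sorted(frags_b)
    shared = 0
    for size in sorted(frags_a):
        for i, other in enumerate(remaining):
            if abs(size - other) <= tolerance:
                shared += 1
                del remaining[i]  # each band in clone B can be matched only once
                break
    return shared


clone_a = digest("GGAAGCTTCCCCAAGCTTGG")  # hypothetical insert -> [3, 10, 7]
clone_b = digest("CCCCAAGCTTGGTTT")       # hypothetical insert -> [5, 10]
print(shared_bands(clone_a, clone_b))     # 1
```

Real fingerprints are measured on gels, so a non-zero `tolerance` is needed in practice; the greedy pairing is adequate for a sketch but is not guaranteed to find the maximum matching.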
The taurine cattle genome was initially sequenced and assembled at approximately 7-fold coverage and was published by the Bovine Genome Sequencing and Analysis Consortium in April 2009. This initial assembly reported roughly 22,000 genes and 14,345 orthologs shared among seven mammalian species. The Bovine Genome Sequencing Project, led by the Baylor College of Medicine Human Genome Sequencing Center in Houston, Texas, released an improved assembly (Btau_4.2) of the cow genome in 2009; this BCM4 assembly was constructed with the Atlas assembly program. The UMD2 assembly, from Steven Salzberg and his colleagues in Baltimore, Maryland, was constructed from NCBI trace data and refined with several modified assembly and mapping tools. Roughly 24 million reads from whole genome sequencing and 11 million reads from BACs were used to create UMD2. The Salzberg lab recently created an updated assembly (UMD3.1) of 2.86 billion base pairs with 9.5x coverage of the genome. Even with all of these efforts, the cow genome is still not completely assembled. The Illumina BovineSNP50 is a high-density, genome-wide genotyping array. The v2 BeadChip contains 54,609 SNPs selected across major breed types, and the probes were validated in 19 common beef and dairy breeds. This enables research such as QTL discovery and genetic improvement (http://www.illumina.com/products/bovine_snp50_whole-genome_genotyping_kits.ilmn). Although BovineSNP50 has been used successfully, several new chips have been or are being designed. In addition to the BovineSNP50 SNPs, the Bovine High-Density (HD) BeadChip (778K SNPs) includes some Y-specific and mitochondrial SNPs.
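Fold coverage relates sequencing effort to genome size: c = N·L / G, where N reads of length L are sequenced from a genome of size G. Under the idealized Lander-Waterman (Poisson) model, the expected fraction of bases with no reads at all is e^(-c). The sketch below applies these formulas to the UMD3.1 figures quoted above (2.86 Gb at 9.5x); the 100 bp read length is an assumption for illustration, and the function names are hypothetical.

```python
import math


def total_bases(genome_size_bp: float, coverage: float) -> float:
    """Sequence needed for a given mean fold coverage: c = N * L / G, so N * L = c * G."""
    return coverage * genome_size_bp


def reads_needed(genome_size_bp: float, coverage: float, read_length_bp: int) -> int:
    """Number of reads N of length L needed to reach coverage c."""
    return math.ceil(coverage * genome_size_bp / read_length_bp)


def fraction_uncovered(coverage: float) -> float:
    """Lander-Waterman / Poisson estimate of the fraction of bases with zero reads."""
    return math.exp(-coverage)


G = 2.86e9  # UMD3.1 assembly size quoted in the text
c = 9.5     # fold coverage quoted in the text
print(f"{total_bases(G, c) / 1e9:.2f} Gb")     # 27.17 Gb
print(reads_needed(G, c, read_length_bp=100))  # 271700000 (assumed 100 bp reads)
print(f"{fraction_uncovered(c):.1e}")          # 7.5e-05
```

The Poisson estimate is a best case: real coverage is uneven (for example in GC-rich or repetitive regions), which is one reason an assembly can remain incomplete even at 9.5x.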
Other chips, such as the Bovine Low-Density (3K) BeadChip, a 96-SNP parentage chip, a 384-SNP chip, and a 700K SNP Affymetrix chip, were designed for different genetic purposes (http://www.slideserve.com/Download/143258/Walking-the-Cattle-Continuum-Moving-From-the-BovineSNP50-to-Higher-and-Lower-Density-SNP-Panels). A new collaborative project between the Australian beef and dairy industries and international partners is constructing a database of functional polymorphisms and sequence information on 1,000 cattle. This will facilitate the identification of genomic features related to economically important traits (http://www.beefcrc.com.au/Assets/819/1/BeefBulletin-September20117-9-11webspreads.pdf). Given the importance of the bovine sequence for genetic gain in the dairy industry, new technology and assembly methods are needed to bring the cow genome annotation to a more complete state and to sequence other cattle breeds faster and more cost-efficiently. Such sequencing projects could help explain variation in disease resistance and lead to improved breeding programs.
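Parentage SNP panels such as the 96-SNP chip above are commonly used for exclusion: a true parent must transmit one of its alleles at every locus, so a parent homozygous for one allele paired with an offspring homozygous for the other (opposing homozygotes) is inconsistent with the claimed relationship. The sketch below counts such conflicts for genotypes coded 0/1/2 (copies of the B allele). The coding, the function names, and the tolerance of one conflict (to absorb genotyping error) are hypothetical choices for illustration, not the rules of any particular chip or breed registry.

```python
from typing import Optional, Sequence

# Hypothetical coding: 0 = AA, 1 = AB, 2 = BB, None = no call
Genotype = Optional[int]


def opposing_homozygotes(parent: Sequence[Genotype],
                         offspring: Sequence[Genotype]) -> tuple[int, int]:
    """Return (conflicts, loci compared) for a putative parent-offspring pair."""
    if len(parent) != len(offspring):
        raise ValueError("both animals must be genotyped on the same SNP panel")
    conflicts = compared = 0
    for p, o in zip(parent, offspring):
        if p is None or o is None:
            continue  # missing calls are neither evidence for nor against
        compared += 1
        if abs(p - o) == 2:  # AA vs BB: parent shares no allele with offspring
            conflicts += 1
    return conflicts, compared


def is_excluded(parent, offspring, max_conflicts: int = 1) -> bool:
    """Exclude the parent if conflicts exceed a tolerance for genotyping error."""
    conflicts, _ = opposing_homozygotes(parent, offspring)
    return conflicts > max_conflicts


sire = [0, 2, 1, 0, 2, None, 1, 0]  # hypothetical genotypes
calf = [1, 2, 0, 2, 0, 1, 1, 0]
print(opposing_homozygotes(sire, calf))  # (2, 7)
print(is_excluded(sire, calf))           # True
```

Reporting the number of loci compared alongside the conflicts matters: a pair with many missing calls offers little power to exclude anyone.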
The interim sheep assembly OARv2.0 was released recently, with the goal of identifying genes associated with production, quality, and disease traits in sheep (http://www.sheephapmap.org/). OARv3.0 is projected for release in late 2011; it is expected to fill chromosomal gaps and to assign correctly to chromosomes many of the sequences left unassigned in v2.0. Transcriptomic and SNP datasets are also expected in the new release (http://sheephapmap.org/news/Scheduled_OARv3.php).
The horse is a model organism for studying biomechanics and exercise physiology (http://www.ncbi.nlm.nih.gov/projects/genome/guide/horse/). The horse genome sequence will also help veterinarians study new therapies for equine laminitis and respiratory diseases. In recent years, there has been progress in identifying mutations in genes related to morphology, immunology, and metabolism in the horse.
Sequencing details for the domestic animals discussed above are listed in Table 1.
2. Next Generation Sequencing Technologies
Next generation sequencing (NGS) technologies, which use massively parallel platforms to produce very large numbers of sequence fragments, have revolutionized genetic and biomedical research and have become increasingly popular in recent years. Several such platforms are currently in widespread use in sequencing centers and laboratories, including the Illumina (formerly Solexa) Genome Analyzer and HiSeq (http://www.illumina.com), the Roche/454 FLX (http://www.454.com), and the Applied Biosystems SOLiD™ System (http://www.appliedbiosystems.com). These platforms can generate millions to billions of reads in a single run, with read lengths ranging from 50 to 500 bp. They differ in many parameters, such as the clonal amplification method, the instrument, the sequencing enzyme/chemistry, and the read length generated. Because the number of reads per run and the run time differ, data throughput also differs among platforms; for example, current Illumina HiSeq technology can generate 150 to 200 Gb of data as 100 bp paired-end reads in 8 days. Base-call accuracy also varies between these platforms (http://kevin-gattaca.blogspot.com/2010/04/comparing-ngs-platforms-454-solexa.html).
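Base-call accuracy is usually reported per base as a Phred quality score, Q = -10·log10(P_error), stored in FASTQ files as one ASCII character per base. The sketch below decodes a quality string assuming the Phred+33 encoding used by Sanger and Illumina 1.8+ pipelines (some older Illumina pipelines used an offset of 64), and sums per-base error probabilities into the expected number of miscalled bases in a read. The function names and the quality string are illustrative.

```python
def phred_scores(quality: str, offset: int = 33) -> list[int]:
    """Decode a FASTQ quality string (Phred+33 by default) into integer scores."""
    return [ord(ch) - offset for ch in quality]


def error_probability(q: int) -> float:
    """Invert Q = -10 * log10(P): the probability that the base call is wrong."""
    return 10 ** (-q / 10)


def expected_errors(quality: str, offset: int = 33) -> float:
    """Expected number of miscalled bases in one read (sum of per-base probabilities)."""
    return sum(error_probability(q) for q in phred_scores(quality, offset))


qual = "II5+"  # hypothetical 4-base quality string
print(phred_scores(qual))               # [40, 40, 20, 10]
print(round(expected_errors(qual), 4))  # 0.1102
```

Summing probabilities, rather than averaging Q scores, keeps a single very poor base from being hidden by many good ones, which is why expected errors is a common read-filtering criterion.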
Several cutting-edge applications, such as targeted exome capture (exome sequencing), chromatin immunoprecipitation sequencing (ChIP-Seq), and whole transcriptome shotgun sequencing (RNA-Seq), have been developed for different biological purposes. Exome sequencing avoids the high cost of sequencing the whole genome by selectively capturing and sequencing the exonic regions, which are often of more immediate interest, while excluding intronic and intergenic regions. ChIP-Seq is used to identify genome-wide binding patterns of a protein of interest, such as a transcription factor, and is a powerful approach for studying protein-DNA interactions. RNA-Seq [16, 17], or transcriptome-wide sequencing, applies NGS technologies to cDNA derived from RNA samples.
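The core idea of ChIP-Seq analysis is to find regions where reads from the immunoprecipitated sample are enriched relative to a control (input) library. The sketch below is a deliberately naive version: it bins read start positions into fixed windows, scales control counts to the ChIP library size, and reports windows that pass hypothetical read-count and fold-enrichment thresholds. Dedicated peak callers model local background statistically and account for fragment length and strand, which this example ignores; all names, coordinates, and thresholds are illustrative.

```python
from collections import Counter


def window_counts(read_starts: list[int], window: int) -> Counter:
    """Number of read start positions falling in each fixed, non-overlapping window."""
    return Counter(pos // window for pos in read_starts)


def enriched_windows(chip_starts, input_starts, window=200,
                     min_reads=5, min_fold=4.0, pseudocount=1.0):
    """Naive enrichment: ChIP count versus library-size-scaled input count per window."""
    chip = window_counts(chip_starts, window)
    control = window_counts(input_starts, window)
    scale = len(chip_starts) / max(len(input_starts), 1)
    hits = []
    for w in sorted(chip):
        observed = chip[w]
        expected = control.get(w, 0) * scale + pseudocount  # pseudocount avoids /0
        fold = observed / expected
        if observed >= min_reads and fold >= min_fold:
            hits.append((w * window, (w + 1) * window, observed, round(fold, 2)))
    return hits


# Hypothetical read start coordinates on one chromosome
chip_reads = [1010, 1020, 1050, 1080, 1100, 1120, 1150, 1190, 300, 4500]
input_reads = [100, 350, 900, 1100, 2000, 2500, 3000, 3500, 4000, 4600]
print(enriched_windows(chip_reads, input_reads))  # [(1000, 1200, 8, 4.0)]
```

The input library matters because some regions yield many reads in any experiment (open chromatin, repeats); comparing against it keeps those from being reported as binding sites.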
Because of their high efficiency-to-cost ratio, these NGS techniques can be employed to reveal variation among strains or across large populations of related samples. According to the National Human Genome Research Institute (NHGRI) (http://genome.gov/sequencingcosts), the cost per megabase of DNA sequence was under 50 cents, and the cost per genome was estimated at $11,000, in March 2011. Sequence mutations and structural variations are commonly searched for in targeted sequencing data (exomes or whole genes). Popular SNP detection tools include SNVMix, SAMtools, and the GATK package [20, 21]. Structural variation (including copy number variation) detection tools and methods, such as CNV-seq, SLOPE, SVDetect, and associated statistical methods, have been developed in recent years to identify INDELs, tandem duplications, and other genetic variations.
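Read-depth approaches to copy number variation compare how many reads fall in each genomic window for a test and a reference sample; CNV-seq, for example, is built on the log2 ratio of window read counts between two individuals. The sketch below computes library-size-normalized log2 ratios with a small pseudocount. It omits CNV-seq's statistical choice of window size and its significance testing, and the function name, window size, and pseudocount are assumptions for illustration.

```python
import math
from collections import Counter


def log2_depth_ratios(test_starts, ref_starts, window, pseudocount=0.5):
    """Library-size-normalized log2(test/reference) read-depth ratio per window.

    Keys are window start coordinates; positive values suggest a gain in the
    test sample and negative values a loss, relative to the reference.
    """
    test = Counter(pos // window for pos in test_starts)
    ref = Counter(pos // window for pos in ref_starts)
    n_test, n_ref = len(test_starts), len(ref_starts)
    ratios = {}
    for w in sorted(set(test) | set(ref)):
        # Normalize by total reads so deeper sequencing of one sample is not read as a gain
        t = (test.get(w, 0) + pseudocount) / n_test
        r = (ref.get(w, 0) + pseudocount) / n_ref
        ratios[w * window] = math.log2(t / r)
    return ratios


# Hypothetical data: the 100-199 window carries twice the depth in the test animal
test_reads = [5] * 10 + [105] * 20 + [205] * 10
ref_reads = [5] * 10 + [105] * 10 + [205] * 10
for start, ratio in log2_depth_ratios(test_reads, ref_reads, window=100).items():
    print(start, round(ratio, 2))
```

With only three windows, the duplicated window inflates the test library total and pulls its neighbors below zero; across a whole genome with thousands of windows this normalization effect becomes negligible, and calls are then made by thresholding or statistically testing the ratios.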
RNA-Seq has become a popular method for quantitative gene expression studies. However, accurate gene expression estimation requires accurate genome annotation. With complete or nearly complete annotated reference genomes, RNA-Seq can help researchers identify differentially expressed genes and novel transcripts in agricultural animals quantitatively and efficiently. RNA-Seq not only helps agricultural researchers select genes that are differentially expressed between samples under different treatment conditions, which could be crucial for certain traits or for disease resistance, but can also reveal isoforms that are missing from the reference annotation. Popular differential expression testing tools for RNA-Seq data include edgeR and DEGseq. Splice junctions can be identified with tools such as TopHat (often used together with Cufflinks for transcript assembly) and Supersplat. RNA-Seq can also help researchers annotate transcription across the genome more completely at different developmental stages.
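Raw read counts are not directly comparable between genes of different lengths or between libraries of different depths. Mortazavi et al. [16] introduced RPKM (reads per kilobase of exon model per million mapped reads) to normalize for both. The sketch below computes RPKM and a log2 fold change for one hypothetical gene; the counts and function names are illustrative. This is normalization only: tools such as edgeR and DEGseq use count-based statistical models (ideally with biological replicates) to decide whether a difference is significant, which this example does not attempt.

```python
import math


def rpkm(gene_reads: int, gene_length_bp: int, total_mapped_reads: int) -> float:
    """Reads Per Kilobase of exon model per Million mapped reads.

    RPKM = 10^9 * C / (N * L): C reads on the gene, N total mapped reads,
    L exonic length of the gene in bp.
    """
    return 1e9 * gene_reads / (total_mapped_reads * gene_length_bp)


def log2_fold_change(baseline: float, treated: float) -> float:
    """Positive values mean higher expression in the treated sample."""
    return math.log2(treated / baseline)


# Hypothetical gene with 2 kb of exonic sequence, in two libraries of different depth
control = rpkm(gene_reads=400, gene_length_bp=2_000, total_mapped_reads=20_000_000)
treated = rpkm(gene_reads=1_200, gene_length_bp=2_000, total_mapped_reads=30_000_000)
print(control, treated, log2_fold_change(control, treated))  # 10.0 20.0 1.0
```

Note that the raw counts differ threefold, but after accounting for the 1.5-fold deeper treated library the expression difference is twofold (log2 fold change of 1).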
Popular NGS tools/algorithms, with descriptions of the biological applications they serve, are listed in Table 2.
3. Challenges and Perspectives for Livestock Sequencing Research
From raw draft assemblies to full-length cDNA/EST resources and BAC libraries, genomic resources for livestock species have been substantially developed and annotated in recent years. The value of sequencing agricultural animals now extends far beyond the original goals of serving as models for human health issues and physiological phenomena and increasing our understanding of the human genome, to studying traits of economic and biological interest for livestock production. We are now at the beginning of an era in which genome sequence analysis of livestock will allow the study of domestication, the selection of better breeds (e.g. for high fertility), and an understanding of quantitative differences due to environmental factors (e.g. nutrition). Gene-gene and gene-environment interactions could be studied quantitatively using modern bioinformatics tools. Sequencing individual animal genomes, or regions of interest, under different treatment conditions should benefit the agricultural community by guiding experimental design and animal disease control and prevention. Livestock are a major source of protein (meat, eggs, and dairy) for humans. Reducing the use of chemicals/antibiotics and improving genetic resistance to pathogens are becoming increasingly important to society and to agricultural scientists. These goals are too time-consuming and/or costly to achieve with traditional genetic approaches. By shortening sequencing time and decreasing cost, NGS technologies will enable breakthroughs in genetic studies and will reveal more of the genetic diversity of many commercial breeds with a short turnaround time. For example, NGS can sequence mutant lines much more efficiently.
By identifying genes/proteins associated with desirable traits (disease resistance and/or high milk/egg/meat production), researchers could better control selection, which in turn would improve both productivity and animal welfare. Sequencing individual agricultural animals will create more opportunities to improve resistance to pathogens that threaten meat/egg/dairy production. Because domestic animals are the leading source of animal protein for humans, this sequencing research will provide valuable information for the efficient production of a leaner, healthier, and more economical source of animal protein.
The breeding of farm animals is entering the post-genome era. Despite some limitations of NGS, such as poor coverage of GC-rich regions and the difficulty of assembly when no good reference genome is available, NGS technologies (RNA-Seq, ChIP-Seq, and genome resequencing) still allow animal scientists to study individual genomes far faster than was previously possible. We believe that sequencing individual animals under different treatment conditions shows great promise. Sequencing micro-organisms and parasites in agricultural animals' organs can also help veterinarians develop new vaccines and therapeutics. NGS will also facilitate the study of gene expression and the regulatory mechanisms underlying milk production and egg/meat flavor. Using NGS approaches and tools, researchers can identify and further analyze individual genes controlling or affecting economic traits in agricultural animals, which will eventually benefit consumers.
Abbreviations
NGS: Next Generation Sequencing
NCBI: National Center for Biotechnology Information
SNP: Single Nucleotide Polymorphism
BBSRC: Biotechnology and Biological Sciences Research Council
EST: Expressed Sequence Tag
cDNA: Complementary Deoxyribonucleic Acid
QTL: Quantitative Trait Loci
BAC: Bacterial Artificial Chromosome
UMD: University of Maryland
WGS: Whole Genome Shotgun
References
[1] Burt DW: Chicken genome: current status and future opportunities. Genome Res. 2005, 15 (12): 1692-1698. 10.1101/gr.4141805.
[2] Hillier LW, et al: Sequence and comparative analysis of the chicken genome provide unique perspectives on vertebrate evolution. Nature. 2004, 432 (7018): 695-716. 10.1038/nature03154.
[3] Wong GK, et al: A genetic variation map for chicken with 2.8 million single-nucleotide polymorphisms. Nature. 2004, 432 (7018): 717-722. 10.1038/nature03156.
[4] Wang J, He X, Ruan J, Dai M, Chen J, Zhang Y, Hu Y, Ye C, Li S, Cong L, Fang L, Liu B, Burt DW, Wong GK, Yu J, Yang H: ChickVD: a sequence variation database for the chicken genome. Nucleic Acids Res. 2005, 33 (Database): D438-441.
[5] Groenen MA, Megens HJ, Zare Y, Warren WC, Hillier LW, Crooijmans RP, Vereijken A, Okimoto R, Muir WM, Cheng HH: The development and characterization of a 60K SNP chip for chicken. BMC Genomics. 2011, 12 (1): 274. 10.1186/1471-2164-12-274.
[6] Boardman PE, Sanz-Ezquerro J, Overton IM, Burt DW, Bosch E, Fong WT, Tickle C, Brown WR, Wilson SA, Hubbard SJ: A comprehensive collection of chicken cDNAs. Curr Biol. 2002, 12 (22): 1965-1969. 10.1016/S0960-9822(02)01296-4.
[7] Hubbard SJ, Grafham DV, Beattie KJ, Overton IM, McLaren SR, Croning MD, Boardman PE, Bonfield JK, Burnside J, Davies RM, Farrell ER, Francis MD, Griffiths-Jones S, Humphray SJ, Hyland C, Scott CE, Tang H, Taylor RG, Tickle C, Brown WR, Birney E, Rogers J, Wilson SA: Transcriptome analysis for the chicken based on 19,626 finished cDNA sequences and 485,337 expressed sequence tags. Genome Res. 2005, 15 (1): 174-183. 10.1101/gr.3011405.
[8] Marra MA, Kucaba TA, Dietrich NL, Green ED, Brownstein B, Wilson RK, McDonald KM, Hillier LW, McPherson JD, Waterston RH: High throughput fingerprint analysis of large-insert clones. Genome Res. 1997, 7 (11): 1072-1084.
[9] Elsik CG, et al: The genome sequence of taurine cattle: a window to ruminant biology and evolution. Science. 2009, 324 (5926): 522-528.
[10] Havlak P, Chen R, Durbin KJ, Egan A, Ren Y, Song XZ, Weinstock GM, Gibbs RA: The Atlas genome assembly system. Genome Res. 2004, 14 (4): 721-732. 10.1101/gr.2264004.
[11] Zimin AV, Delcher AL, Florea L, Kelley DR, Schatz MC, Puiu D, Hanrahan F, Pertea G, van Tassell CP, Sonstegard TS, Marcais G, Roberts M, Subramanian P, Yorke JA, Salzberg SL: A whole-genome assembly of the domestic cow, Bos taurus. Genome Biol. 2009, 10 (4): R42. 10.1186/gb-2009-10-4-r42.
[12] Archibald AL, Cockett NE, Dalrymple BP, Faraut T, Kijas JW, Maddox JF, McEwan JC, Hutton Oddy V, Raadsma HW, Wade C, Wang J, Wang W, Xun X: The sheep genome reference sequence: a work in progress. Anim Genet. 2010, 41 (5): 449-453.
[13] Wade CM, et al: Genome sequence, comparative analysis, and population genetics of the domestic horse. Science. 2009, 326 (5954): 865-867. 10.1126/science.1178158.
[14] Mamanova L, Coffey AJ, Scott CE, Kozarewa I, Turner EH, Kumar A, Howard E, Shendure J, Turner DJ: Target-enrichment strategies for next-generation sequencing. Nat Methods. 2010, 7 (2): 111-118. 10.1038/nmeth.1419.
[15] Johnson DS, Mortazavi A, Myers RM, Wold B: Genome-wide mapping of in vivo protein-DNA interactions. Science. 2007, 316 (5830): 1497-1502. 10.1126/science.1141319.
[16] Mortazavi A, Williams BA, McCue K, Schaeffer L, Wold B: Mapping and quantifying mammalian transcriptomes by RNA-Seq. Nat Methods. 2008, 5 (7): 621-628. 10.1038/nmeth.1226.
[17] Wang Z, Gerstein M, Snyder M: RNA-Seq: a revolutionary tool for transcriptomics. Nat Rev Genet. 2009, 10 (1): 57-63. 10.1038/nrg2484.
[18] Goya R, Sun MG, Morin RD, Leung G, Ha G, Wiegand KC, Senz J, Crisan A, Marra MA, Hirst M, Huntsman D, Murphy KP, Aparicio S, Shah SP: SNVMix: predicting single nucleotide variants from next-generation sequencing of tumors. Bioinformatics. 2010, 26 (6): 730-736. 10.1093/bioinformatics/btq040.
[19] Li H, Handsaker B, Wysoker A, Fennell T, Ruan J, Homer N, Marth G, Abecasis G, Durbin R: The Sequence Alignment/Map format and SAMtools. Bioinformatics. 2009, 25 (16): 2078-2079. 10.1093/bioinformatics/btp352.
[20] McKenna A, Hanna M, Banks E, Sivachenko A, Cibulskis K, Kernytsky A, Garimella K, Altshuler D, Gabriel S, Daly M, DePristo MA: The Genome Analysis Toolkit: a MapReduce framework for analyzing next-generation DNA sequencing data. Genome Res. 2010, 20 (9): 1297-1303. 10.1101/gr.107524.110.
[21] DePristo MA, Banks E, Poplin R, Garimella KV, Maguire JR, Hartl C, Philippakis AA, Del Angel G, Rivas MA, Hanna M, McKenna A, Fennell TJ, Kernytsky AM, Sivachenko AY, Cibulskis K, Gabriel SB, Altshuler D, Daly MJ: A framework for variation discovery and genotyping using next-generation DNA sequencing data. Nat Genet. 2011, 43 (5): 491-498. 10.1038/ng.806.
[22] Xie C, Tammi MT: CNV-seq, a new method to detect copy number variation using high-throughput sequencing. BMC Bioinformatics. 2009, 10: 80. 10.1186/1471-2105-10-80.
[23] Abel HJ, Duncavage EJ, Becker N, Armstrong JR, Magrini VJ, Pfeifer JD: SLOPE: a quick and accurate method for locating non-SNP structural variation from targeted next-generation sequence data. Bioinformatics. 2010, 26 (21): 2684-2688. 10.1093/bioinformatics/btq528.
[24] Zeitouni B, Boeva V, Janoueix-Lerosey I, Loeillet S, Legoix-Ne P, Nicolas A, Delattre O, Barillot E: SVDetect: a tool to identify genomic structural variations from paired-end and mate-pair sequencing data. Bioinformatics. 2010, 26 (15): 1895-1896. 10.1093/bioinformatics/btq293.
[25] Blow N: Transcriptomics: the digital generation. Nature. 2009, 458 (7235): 239-242. 10.1038/458239a.
[26] Roberts A, Pimentel H, Trapnell C, Pachter L: Identification of novel transcripts in annotated genomes using RNA-Seq. Bioinformatics. 2011, 27 (17): 2325-2329. 10.1093/bioinformatics/btr355.
[27] Robinson MD, McCarthy DJ, Smyth GK: edgeR: a Bioconductor package for differential expression analysis of digital gene expression data. Bioinformatics. 2010, 26 (1): 139-140. 10.1093/bioinformatics/btp616.
[28] Wang L, Feng Z, Wang X, Zhang X: DEGseq: an R package for identifying differentially expressed genes from RNA-seq data. Bioinformatics. 2010, 26 (1): 136-138. 10.1093/bioinformatics/btp612.
[29] Trapnell C, Williams BA, Pertea G, Mortazavi A, Kwan G, van Baren MJ, Salzberg SL, Wold BJ, Pachter L: Transcript assembly and quantification by RNA-Seq reveals unannotated transcripts and isoform switching during cell differentiation. Nat Biotechnol. 2010, 28 (5): 511-515. 10.1038/nbt.1621.
[30] Trapnell C, Pachter L, Salzberg SL: TopHat: discovering splice junctions with RNA-Seq. Bioinformatics. 2009, 25 (9): 1105-1111. 10.1093/bioinformatics/btp120.
[31] Bryant DW, Shen R, Priest HD, Wong WK, Mockler TC: Supersplat--spliced RNA-seq alignment. Bioinformatics. 2010, 26 (12): 1500-1505. 10.1093/bioinformatics/btq206.
[32] Hiendleder S, Bauersachs S, Boulesteix A, Blum H, Arnold GJ, Frohlich T, Wolf E: Functional genomics: tools for improving farm animal health and welfare. Rev Sci Tech. 2005, 24 (1): 355-377.
We thank Dr. Dan Schmiesing, who gave valuable suggestions on the manuscript. This work was supported by the National Institutes of Health Grant #U54 DA021519.
The authors declare that they have no competing interests.
YB carried out the review studies and drafted the manuscript. MS and JC participated in drafting the manuscript. All authors read and approved the final manuscript.