Candidate genes for male and female reproductive traits in Canchim beef cattle

Background Beef cattle breeding programs in Brazil have placed greater emphasis on the genomic study of reproductive traits of males and females due to their economic importance. In this study, genome-wide associations were assessed for scrotal circumference at 210 d of age, scrotal circumference at 420 d of age, age at first calving, and age at second calving, in Canchim beef cattle. Data quality control was conducted resulting in 672,778 SNPs and 392 animals. Results Associated SNPs were observed for scrotal circumference at 420 d of age (435 SNPs), followed by scrotal circumference at 210 d of age (12 SNPs), age at first calving (six SNPs), and age at second calving (four SNPs). We investigated whether significant SNPs were within genic or surrounding regions. Biological processes of genes were associated with immune system, multicellular organismal process, response to stimulus, apoptotic process, cellular component organization or biogenesis, biological adhesion, and reproduction. Conclusions Few associations were observed for scrotal circumference at 210 d of age, age at first calving, and age at second calving, reinforcing their polygenic inheritance and the complexity of understanding the genetic architecture of reproductive traits. Finding many associations for scrotal circumference at 420 d of age in various regions of the Canchim genome also reveals the difficulty of targeting specific candidate genes that could act on fertility; nonetheless, the high linkage disequilibrium between loci herein estimated could aid to overcome this issue. Therefore, all relevant information about genomic regions influencing reproductive traits may contribute to target candidate genes for further investigation of causal mutations and aid in future genomic studies in Canchim cattle to improve the breeding program. Electronic supplementary material The online version of this article (doi:10.1186/s40104-017-0199-8) contains supplementary material, which is available to authorized users.


Background
Cattle breeding programs in Brazil have given greater emphasis to the study and selection of reproductive traits due to their economic importance for the production system. In males, scrotal circumference traits are related to the reproductive potential of bulls, because testis size is associated with the production and quality of sperm and the production of sex hormones [1].
In general, female reproductive traits are difficult to measure and, in some cases, are strongly influenced by environmental factors. The reproductive performance of heifers depends on the age at which they calve for the first time; the ones that calve earlier have a more productive life [2]. In addition to first calving, another important factor is that the cow continues producing calves regularly to maintain its productivity and diminish calving interval [3]. Studies have reported that indirect selection of females based on the performance of bulls is possible, considering the favorable genetic correlations between scrotal circumference measures and age at calving [2,4,5].
Many achievements in animal breeding were obtained based on the classical approach, using information from phenotypes and pedigree. Nowadays, molecular data analyses are bringing new insights to the genetic architecture of species. Regarding reproductive traits of beef cattle, genome-wide association studies (GWAS) are useful tools for the identification of candidate genes that could be used to classify precocious or more fertile individuals [6,7].
The identification of candidate genes provides a better understanding of the distribution of genes that affect traits of economic interest, as well as a basis for further studies to identify causal mutations. Despite its potential, important observations are needed for this approach. According to Tabor et al. [8], the difficulty of replication over time or across populations in this approach might indicate that more detailed studies are needed to certify its causality. Therefore, the aim of this study was to perform a GWAS to identify genomic regions and candidate genes to uncover the genetic architecture of scrotal circumference at 210 and 420 d of age and age at first and second calving and aid in the breeding process in Canchim cattle.

Canchim breed
The Canchim is a composite breed (62.5% Charolais and 37.5% Zebu) developed in the 1940s by the Brazilian Agricultural Research Corporation (Embrapa), located in São Carlos city, SP, Brazil [9]. Different crossbreeding schemes have been studied to produce Canchim cattle with different Charolais-Zebu proportions and to achieve greater genetic variability in the population [10][11][12][13]. One of these schemes was used to produce animals of the "MA" genetic group, which was derived from mating Canchim-Zebu animals (F1) with Charolais animals, resulting in approximate contributions of 65.6% Charolais and 34.4% Zebu [14].
The Canchim cattle represents about 3% of the beef cattle produced in Brazil [15]. Indicine breeds are much more representative as purebreds or crossbred animals, being responsible for 80% of the beef cattle industry in the country [16].

Traits analyzed
The estimated breeding values (EBVs) used in our study considered the following traits: scrotal circumference at weaning adjusted for 210 d of age (SC210), scrotal circumference adjusted for 420 d of age (SC420), age at first calving (AFC), and age at second calving (ASC). Linear interpolation was previously used to adjust the scrotal circumferences for 210 and 420 d of age.
The EBVs were obtained from the genetic evaluation carried out by the Brazilian Agricultural Research Corporation (Embrapa) for the Canchim breed which considered 318,307 animals in the relationship matrix and 267,002 animals with valid records. In general, the traits analyzed by Embrapa for the genetic evaluation of Canchim that could be highlighted are body weight traits measured at various ages, reproductive traits of males and females, carcass quality, navel visual score of males and females, and hair coat. The EBVs were estimated by multi-trait analysis using the REMLF90 program [17], under an animal model. For the studied traits, the fixed effects considered in the contemporary groups were year and season of birth (January to March; April to September; and October to December), farm of birth, genetic group, and feeding system.
As described by Mokry et al. [18], the genotyped animals were chosen according to the EBV for some traits (ribeye area, back fat thickness, and productive and reproductive traits), accuracy, family size, and proportion of males and females. The mean EBV values for SC210, SC420, AFC, and ASC were 1. 44  De-regressed proofs were not used due to limited data. Furthermore, de-regressed proofs calculated with low accuracies are expected to have a smaller genomic contribution due to Mendelian sampling and will have more noise added in de-regressed calculations [19].

Genotypes and quality control
The BovineHD BeadChip SNP panel (Illumina Inc., San Diego, CA) was used to genotype 194 males and 205 females: 285 animals of the Canchim breed and 114 animals of the MA genetic group, calves of 49 bulls and 355 cows. The animals were born between 1999 and 2005 and come from seven farms in the States of São Paulo and Goiás. A detailed description of these animals was previously presented by Buzanskas et al. [20] and Mokry et al. [18]. Genotype quality control excluded SNPs with significant deviations (P < 10 -5 ) from the Hardy-Weinberg equilibrium, SNPs with minor allele frequencies of less than 0.05, and call rate for SNPs and animals lower than 0.90. Only autosomal chromosomes and SNPs with known positions, according to the UMD_3.1 bovine genome assembly [21], were used for GWAS.

Genome-wide association study
The Generalized Quasi-Likelihood Score (GQLS) method [22] was used for GWAS. In this method, a logistic regression was used to associate the EBVs (treated as a covariate) to the genotypes (treated as the response variable), one SNP at a time. The EBVs were represented by X i = (X 1 , … , X n ) ' , in which Xi represents the EBVs for the ith animal. The genotype of the animals was coded as "0", "1" or "2" and, as the genotypes were represented by Y i = (Y 1 , … , Y n ) ' , in which Y i = ½ * (number of alleles for animal genotype i), the respective proportions would be equal to 0, ½ and 1. The expected SNP allele frequency is represented by μ, in which μ = (μ 1 , … , μ n ) ' = E(X| Y), thus 0 < μ i < 1.
To associate μ i with X i , the following logistic regression was defined: in which β 0 is the constant term and β 1 is the slope.
Under the null hypothesis, μ i can be interpreted by 1þe β 0 , for all i = 1 , … , n. The mean vector of Y i no longer depends on X i , which becomes μ i = E(Y) = μ1, and "1" is a vector. The solution of the "quasi-likelihood score" results in an estimate for μ, such that To obtain the GQLS, the statistic WG was calculated as in Feng et al. [22], as follows: Under the null hypothesis, WG follows a Chi-squared distribution with one degree of freedom [23], resulting in P-values for each SNP. This method does not account for the genetic variance explained by the marker. The GQLS provides the advantage of accounting for the population structure by means of the pedigree-based relationship among animals (A − 1 ).
The false discovery rate (FDR) was used for multiple testing corrections [24] to verify significant SNPs. The P-values of each SNP were sorted in ascending order and the following formula applied: in which q is the desired level of significance, m is the total number of SNPs, and P is the P-value of the ith SNP. To account for multiple comparisons, a genomewide and chromosome-wise FDR of 5% (significant association) and 10% (suggestive association) was applied. The main reason for considering a minimum FDR of 10% was to obtain a comprehensive number of SNPs, which might assist to comprehend the genetic architecture of the studied traits. Pairwise linkage disequilibrium was estimated using the r 2 measure [25] for the SNPs with significant and suggestive associations. The software Haploview [26] was used to estimate r 2 and plot the results. Due to the high linkage phase consistency between Canchim and MA genetic group (above 0.80 up to 100 kb) [27], analyses were conducted considering Canchim and MA as one population.

Gene mapping and in silico functional analyses
The SNPs associated with the EBVs for SC210, SC420, AFC, and ASC traits were surveyed to their corresponding genes or to surrounding genes. The National Center for Biotechnology Information (NCBI) database [28] and Ensembl Genome Browser [29] were accessed to proceed with in silico functional analyses of the genes. If the SNP was located in the intergenic region (i.e. not assigned to any gene), we observed, through the integrated maps of the NCBI variation viewer, for the closest gene(s) and calculated the distance(s).
The PANTHER tool [30] was used to access and gather biological processes and pathway associations. The AnimalQTLdb database [31] was used to verify previous reports of quantitative trait loci (QTL) in the surroundings of significant SNPs. The main reason to use these tools was to validate our findings.

Results
A total of 672,778 SNPs and 392 animals were used for GWAS. The Manhattan plots for chromosome-wise (in blue) and genome-wide (in red) significantly associated SNPs after FDR correction for SC210, SC420, AFC, and ASC, respectively, are presented in Fig. 1. The greatest number of significant SNPs was observed for SC420 (435 SNPs), followed by SC210 (12 SNPs), AFC (six SNPs), and ASC (four SNPs) when considering FDR 10% for all traits.
In Table 1 are presented SNPs and genes identified for SC210, AFC, and ASC. For SC420, SNPs and genes are presented in Table 2. Due to the large number of SNPs associated with SC420, full information regarding genes, pseudogenes, and non-coding RNA are presented in Additional files 1, 2, and 3. For FDR 5%, a total of 249 (chromosome-wise) and 50 (genome-wide) SNPs were observed for SC420 (Additional file 1). Considering FDR 10%, the regions observed in Table 1 presented a suggestive association; therefore, the genes identified in these regions were surveyed.
For SC210, chromosome-wise significantly associated SNPs (P ≤ 0.00001) were located on chromosomes 20 and 28. Five SNPs were located in the intergenic or intragenic regions of SMIM23, PAPD7, ICE1, and EDARADD genes, and non-coding RNA LOC101907249 (Table 1 and Fig. 1a). On chromosome 28, seven SNPs were located in the EDARADD gene.
For SC420, chromosome-wise and genome-wise associations were observed on chromosomes 5, 9, 13, 14, 18, and 21 (Fig. 1b). The SNPs were close or within a total of 64 genes, 10 non-coding RNAs, 13 pseudogenes, and two transfer RNAs. Full information on SNPs, position, and genes are presented in the Additional files 1, 2, and 3.
Six significantly associated SNPs (P ≤ 0.00001) for AFC (Table 1 and Fig.1c) were located on chromosomes   Fig. 2. For SC420, biological processes and pathway associations are presented in Figs. 3 and 4, respectively. PANTHER tool was able to account for 62 genes overrepresented among pathways when using Bos taurus as background. Most of the biological processes were involved in metabolic (32 genes), cellular (22 genes), developmental (14 genes), biological regulation (13 genes), and localization (11 genes) processes. Biological processes were observed for the immune system (seven genes), multicellular organismal process (seven genes), response to stimulus (seven genes), apoptotic process (six genes), cellular component organization or biogenesis (six genes), biological adhesion (four genes), and reproduction (one gene). Pathway analysis showed that the RAP1B gene is involved in four pathways, while the IFNG, COL12A1, and SRGAP1 genes are involved in two pathways.

Discussion
The suggestive associations observed for SC210, AFC, and ASC, could provide an insight over potential regions responsible for the genetic variability of the traits. The main reason to consider FDR 10% was to study and describe, in a broader point of view, the biological   Fig. 2 Biological processes for scrotal circumference (SC210) and age at first (AFC) and second (ASC) calving pathways to aid in the comprehension of these complex traits. Furthermore, due to the polygenic inheritance of these traits, it was expected that many loci with small effects were responsible for expressing the phenotype (or EBV). In some cases, we observed the lack of information regarding the SNP or gene identified.
We observed high average linkage disequilibrium among significantly or suggestively associated SNPs for most of the traits studied; therefore, there is evidence of direct or indirect associations that could be affecting the traits. Mokry et al. [27], when studying this Canchim population, observed that the linkage disequilibrium estimated could  For SC210, we have found that the SMIM23 gene is located inside QTL regions associated with the rate of nonreturn to estrus in Swedish Red and Swedish Holstein breed animals [32]. In this same region, McClure et al. [33] found QTL for scrotal circumference in American Angus cattle. Studies reported a QTL in the PAPD7, ICE1, and LOC101907249 gene regions associated with percentage abnormal sperm in Holstein bulls [34] and calving ease in Angus breed cows [33]. The EDARADD gene participates in cell differentiation and mutations in this gene can result in abnormal development of tissues and organs of ectodermal origin [35]. In this region, a QTL associated with pregnancy rate trait have been reported by Boichard et al. [36]. No specific biological process for SMIM23, ICE1, and LOC101907249 genes was observed in the literature.
Few suggestive associations were observed for SC210 and none of the SNPs associated with SC210 presented pleiotropic effect with SNP associated with SC420 (discussed below) or other studied traits (discussed as follows), which could be due to low genetic correlation among EBVs for these traits in our dataset.
Regarding the SC420 trait, we have identified the STAU2 gene, which participates in multiple biological processes (Table 2 and Fig. 3), highlighting reproduction, developmental and immune system. This gene was involved in muscle development [37] in Bos taurus indicus, Bos taurus taurus, and Bos taurus taurus x Bos taurus indicus. According to Ramayo-Caldas et al. [37], the STAU2 gene was present in two main networks of PPARGC1A and HNF4G genes, which acts as transcription factors that activate a variety of hormone receptors [38] and binding to fatty acids [39]. Peddinti et al. [40] observed, through genetic network analysis, that one of the main biological networks for bovine germinal vesicle and bovine cumulus granulosa cell proteomes contain the SRGAP1 and TOP1 genes, respectively.
The RAP1B gene is involved in four pathways (Fig. 4), including gonadotropin-releasing hormone receptor pathway. This gene was also observed associated with cell-tocell signaling network (integrin, ephrin receptor, and mitogen-activated protein kinase signaling network) [40]. The MED30 and TRHR genes are strongly related to thyroid hormone, triggering hormonal processes associated with reproductive systems in males and females [41,42].
Genes related to fatty acid, cholesterol, and triacylglycerol, such as FABP5 [43], FABP12 [44], PEX2 [45], and MED30 [46] are critical for energy and hormone production in males and females. Cholesterol is a precursor of steroid hormone production, such as testosterone, and therefore it is involved in male growth and reproductive development.
Important genome-wide associations on chromosome 14 for scrotal circumference were observed by Fortes et al. [5], demonstrating that this chromosome presents regions of interest that could be explored to identify sexually precocious animals. These authors observed associations with age at first corpus luteum in the same regions as scrotal circumference and have attributed their results to the genetic correlation between these traits, thus genes and SNPs associated with puberty in heifers were likely to be relevant for puberty in bulls, and vice versa. On chromosome 14, a large number of SNPs associated with puberty were identified in both bulls and heifers. Urbinati et al. [47] found important selection signatures in Canchim cattle on chromosomes 5 and 14, which were related to pigmentation (strongly selected trait in Charolais and Canchim), productive and reproductive traits.
Reports of QTL associated with reproductive traits of interest may give support to the results found in our study; however, most of the QTL were reported for female traits. On chromosome 5, QTL associated with the concentration of follicle-stimulating hormone in Brahman and Hereford crosses [48], have been described. The QTL were observed as associated with interval to first estrus after calving [7] and dystocia in dairy cattle [49] on chromosome 13. On chromosome 14, QTL associated with gestation period [50], number of inseminations per conception [51], and ovulation rate [52] have been reported.
As favorably correlated responses between scrotal circumference measures and age at first calving traits are expected through selection, whereas the genes previously reported could be highlighted as a candidate to, directly and indirectly, improve the reproductive performance of males and females, respectively. Moreover, fat deposition in cattle could be reflected in sexual precocity and carcass finishing, traits that have become the main concern of beef cattle breeders.
Few genes were observed across the positions of significant SNPs for AFC (Table 1 and Fig. 1c). According to Blaschek et al. [53], an association of the SNP rs110984522 (178.73 kb apart from the SNP rs133411648) was observed for sire fertility in Holstein breed. The EXOC4 gene plays a role in insulin processing, metabolism of proteins, and peptide hormone metabolism pathways. Reports of QTL associated with scrotal circumference [33] have been described in this region. The ZMAT4 gene, located on chromosome 27, participates in the apoptotic, biological, developmental, and metabolic processes. QTL associated with dystocia [49] and calving ease [54] in Holstein cattle and non-return rate [55] in Angus cattle have been described.
Significant SNPs for ASC are presented in Table 1 and Fig. 1d. The FMN1 gene located on chromosome 10 participates in actin cytoskeleton organization and is of great importance for cell and muscle movements [56]. QTL associated with calving ease have been observed in this region [33].
A QTL associated with scrotal circumference was observed in the region of the TMEM182 gene [33]. No function or biological processes have been described for the LOC790871 and TMEM182 genes in the literature or databases consulted. In the region comprising the LOC790871 gene, QTL regions have been reported associated with the following traits: scrotal circumference [33], subcutaneous fat [57], and sperm motility [34] in Angus, Holstein, and Charolais × Holstein crossbred cattle, respectively.
It has been verified that the UBQLN3 gene is expressed in the testes, acts in spermatogenesis in humans and rats [58] and is conserved in mammals (Homo sapiens, Mus musculus, Rattus norvegicus, Canis lupus familiaris, and Bos taurus). This gene is involved in the protein processing in endoplasmic reticulum pathway. A QTL associated with weight and carcass traits has been described [33] in the region in which the UBQLN3 gene is located.

Conclusions
Few associations were observed for SC210, AFC, and ASC, reinforcing their polygenic inheritance and the complexity of understanding the genetic architecture of reproductive traits. Finding many associations for SC420 in various regions of the Canchim genome also reveals the difficulty of targeting specific candidate genes that could act on fertility; nonetheless, the high linkage disequilibrium between loci herein estimated could aid to overcome this issue. Therefore, all relevant information about genomic regions influencing reproductive traits may contribute to target candidate genes for further investigation of causal mutations and aid in future genomic studies in Canchim cattle to improve the breeding program.

Additional files
Additional file 1: Significantly associated single nucleotide polymorphism (SNP) with scrotal circumference at 420 days of age (SC420) after Bonferroni (B) and false discovery rate (FDR) correction at a chromosome-wise (CW) and genome-wide (GW) level. (DOCX 95 kb) Additional file 2: Genome-wide (in bold) and chromosome-wise associations for scrotal circumference at 420 days of age. Gene symbols, SNP reference number, and chromosomes (Chr) and positions (Pos, in megabase) were obtained from NCBI website. Distances to gene (kilobase) are presented from 5′ to gene and 3′ to gene directions. If distance equals zero (0.00), the SNP is on intragenic region. P-values are presented as the minimum (Min) and maximum (Max) significance obtained from the generalized quasilikelihood method. (DOCX 34 kb) Additional file 3: Candidate genes identified for scrotal circumference at 420 days of age. (DOCX 26 kb) Additional file 4: Fig. S1. Linkage disequilibrium for scrotal circumference at 420 d of age on chromosome 5. Fig. S2. Linkage disequilibrium for scrotal circumference at 420 d of age on chromosome 9. Fig. S3. Linkage disequilibrium for scrotal circumference at 420 d of age on chromosome 13. Fig. S4. Linkage disequilibrium for scrotal circumference at 420 d of age on chromosome 14. Fig. S5. Linkage disequilibrium for scrotal circumference at 420 d of age on chromosome 18. Fig. S6. Linkage disequilibrium for scrotal circumference at 420 d of age on chromosome 21. (ZIP 1985 kb) Abbreviations AFC: Age at first calving; ASC: Age at second calving; BTA: Bos taurus chromosome; EBV: Estimated breeding value; FDR: False discovery rate; GQLS: Generalized Quasi-Likelihood Score; GWAS: Genome-wide association study; QC: Quality control; QTL: Quantitative trait loci; r 2 : Linkage disequilibrium measure; SC210: Scrotal circumference at 210 days of age; SC420: Scrotal circumference at 420 days of age; SNP: Single nucleotide polymorphism