The impact of genotyping strategies and statistical models on accuracy of genomic prediction for survival in pigs

Abstract

Background

Survival from birth to slaughter is an important economic trait in commercial pig production. Increasing survival can improve both economic efficiency and animal welfare. The aim of this study is to explore the impact of genotyping strategies and statistical models on the accuracy of genomic prediction for survival in pigs during the total growing period from birth to slaughter.

Results

We simulated pig populations with different direct and maternal heritabilities and used a linear mixed model, a logit model, and a probit model to predict genomic breeding values of pig survival based on individual survival records with binary outcomes (0, 1). The results show that when only alive animals have genotype data, unbiased genomic predictions can be achieved by using variances estimated from a pedigree-based model. Models using genomic information achieved up to 59.2% higher accuracy of estimated breeding values than the pedigree-based model, depending on the genotyping scenario. The scenario of genotyping all individuals, both dead and alive, obtained the highest accuracy. When an equal number of individuals (80%) were genotyped, genotyping a random sample of individuals achieved higher accuracy than genotyping only alive individuals. The linear model, logit model and probit model achieved similar accuracy.

Conclusions

We conclude that genomic prediction of pig survival is feasible when only alive pigs have genotypes, but genomic information of dead individuals can increase the accuracy of genomic prediction by 2.06% to 6.04%.

Background

Survival from birth to slaughter is an important economic trait in commercial pig production. Increased survival also improves the welfare of pigs. According to productivity data, the cumulative survival rate from birth to slaughter is lower than 70% [1], and in addition there has been a downward trend for piglet pre-weaning survival in the past ten years [2]. Use of genomic information in the selection program will be a sustainable and effective way to reduce pig mortality. As a powerful genetic improvement tool, genomic selection has been widely used in animal breeding, such as in cattle [3,4,5], pigs [6,7,8], and chickens [9,10,11]. Genomic selection is especially beneficial for traits with low heritability, which show slow genetic progress when using traditional pedigree-based methods [12,13,14]. Guo et al. [15] studied the accuracy of estimated breeding values for piglet survival rate from birth to day 5 and reported that the accuracy for the single-step method was higher than for the pedigree-based method by 14.2% for Landrace, and by 7.2% for Yorkshire. In a crossbred pig population, Leite et al. [16] compared the accuracies of the estimated breeding values of mortality at five stages from birth to slaughter, and reported that the accuracy for the single-step method was 16.7%–78.9% higher than for the pedigree-based method, with the largest improvement of accuracy for lactation mortality and the smallest improvement for postweaning mortality.

Usually, like litter size, piglet survival is recorded as a trait of the sow or the service sire [15, 16]. However, survival is a complex trait that is also affected by the pig’s own genotype. It may therefore be more appropriate to assess genetic merit of survival at individual level [17]. However, evaluating survival at individual level will introduce problems with genotyping strategies in the sense that, generally, dead individuals do not have genotypes. Using only the genotype data of alive individuals may lead to biased genomic predictions. The influence of the genotype of the dead individuals on the accuracy and unbiasedness of genomic prediction needs to be studied.

Finally, survival at individual level is a binary trait which does not obey a normal distribution, and thus conventional statistical analysis methods may not be suitable [18]. Therefore, when estimating the breeding value, a logit model or a liability threshold model could be more appropriate. However, Koeck et al. [19] evaluated the performance of a linear model and a logit model for genetic analyses of clinical mastitis in Austrian Fleckvieh dual purpose cows and found that there was no difference in the predictive ability between the linear model and the logit model. In the Norwegian Red cows population, Vazquez et al. [20] also compared the genetic evaluation of a liability threshold model with a linear model for clinical mastitis, where the results also showed that there was no difference in the predictive capabilities of the two models. It is necessary to investigate if a logit or a liability threshold model is better than a linear model for predicting breeding value of survival in pig populations.

We hypothesized that different genotyping strategies affect the accuracy and unbiasedness of breeding value estimation. Furthermore, we hypothesized that logit or liability threshold models are more suitable than a linear model for predicting threshold traits, both with and without genomic information. Therefore, this study has two objectives: (1) to explore the impact of genotyping scenarios, especially the absence of genotypes for dead individuals, on genomic prediction of mortality; and (2) to compare linear, logit and liability threshold models for the estimation of breeding values.

Materials and methods

Data simulation

The data were simulated using QMSim software [21] to mimic a pig population. We simulated 18 chromosomes, each 100 cM long with 3100 markers and 50 QTLs. It was assumed that the QTL effects had a normal distribution. The simulation started with a founder population of 200 males and 200 females, and went through 300 non-overlapping historical generations to generate linkage disequilibrium between markers and QTLs. In total, about 45,000 markers and 730 QTLs were segregating in the genome in the last historical generation, with slight differences in the number of markers and QTLs among replicates. After the historical generations, 30 boars randomly selected from the last historical generation and all 200 sows in that generation were used to create a base population. After this, the population went through eight non-overlapping generations. In each generation, 30 sires and 300 dams were randomly selected from alive animals (see below on how survival/death of animals was simulated), each sire was mated to 10 randomly chosen dams, and each dam produced one litter. The litter sizes were 10, 12, 14, 16, or 18 with probabilities 0.02, 0.14, 0.68, 0.14, and 0.02, respectively, and the sex ratio of piglets was 1:1. The data from generations 5 to 8 were used in the analysis.
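The mating design and litter-size distribution described above can be illustrated with a short Python sketch. This is not the QMSim configuration used in the study; the helper names `expected_litter_size` and `simulate_generation` are hypothetical, and only the counts and probabilities are taken from the text.

```python
import random

# Litter sizes and their probabilities as stated in the text.
LITTER_SIZES = [10, 12, 14, 16, 18]
LITTER_PROBS = [0.02, 0.14, 0.68, 0.14, 0.02]

N_SIRES = 30          # sires selected per generation
DAMS_PER_SIRE = 10    # each sire is mated to 10 dams


def expected_litter_size():
    """Probability-weighted mean litter size."""
    return sum(s * p for s, p in zip(LITTER_SIZES, LITTER_PROBS))


def simulate_generation(seed=1):
    """Return one (sire, dam, litter_size) tuple per dam.

    Hypothetical helper: dams are shuffled and dealt out 10 per sire,
    then each dam draws a litter size from the stated distribution.
    """
    rng = random.Random(seed)
    dams = list(range(N_SIRES * DAMS_PER_SIRE))
    rng.shuffle(dams)
    litters = []
    for i, dam in enumerate(dams):
        sire = i // DAMS_PER_SIRE
        size = rng.choices(LITTER_SIZES, weights=LITTER_PROBS, k=1)[0]
        litters.append((sire, dam, size))
    return litters


litters = simulate_generation()
n_piglets = sum(size for _, _, size in litters)
print(expected_litter_size(), len(litters), n_piglets)
```

Under these settings the expected litter size is 14, so each generation is expected to contain about 300 × 14 = 4200 piglets.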

The phenotypic liability of an individual to be alive was generated as the sum of the direct additive genetic effect of the individual, the maternal additive genetic effect of the dam, the litter effect and a random residual. Fixed effects (such as herd-year-month) were not considered. In this study, three survival traits with different variances and covariances were simulated, i.e., direct heritability and maternal heritability were set to 0.04 and 0.04 (T4/4), 0.02 and 0.04 (T2/4), or 0.02 and 0.02 (T2/2), respectively. The genetic correlation between direct and maternal additive genetic effects was 0.30. The variance of the litter effect was the same as the maternal additive genetic variance. The direct and maternal QTL allele effects were sampled from a bivariate normal distribution with the specified correlation. The true breeding values (TBVs) of the direct and maternal additive genetic effects were defined as the sum of the QTL allele effects, and these TBVs were scaled to have the designed variances [22]. The other random effects were sampled from normal distributions with the corresponding variances. The phenotype on the observed scale was scored as 1 if the liability to survive was in the top 80%, and 0 otherwise, i.e., the mortality rate was 20%. Each of the three traits was simulated with 40 replicates.
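The liability-threshold step can be sketched as follows. This is a simplified illustration, not the QTL-based simulation of the study: breeding values are drawn directly from normal distributions for one generation, the phenotypic variance of liability is assumed to be 1, and the residual variance is obtained by subtraction under that assumption. The function name `simulate_survival` and its arguments are hypothetical.

```python
import numpy as np


def simulate_survival(h2_a=0.04, h2_m=0.04, r_am=0.30, mortality=0.20,
                      n_sires=30, dams_per_sire=10, litter_size=14, seed=1):
    """Simulate liabilities and binary survival for one generation."""
    rng = np.random.default_rng(seed)

    # Assumption: total liability variance is 1, so variances equal heritabilities.
    var_a, var_m = h2_a, h2_m
    cov_am = r_am * np.sqrt(var_a * var_m)
    var_l = var_m  # litter variance equals maternal genetic variance (from the text)
    # Covariance between a piglet's direct effect and its dam's maternal
    # effect is 0.5 * cov_am; it enters the phenotypic variance twice.
    var_e = 1.0 - var_a - var_m - cov_am - var_l

    n_dams = n_sires * dams_per_sire
    g0 = np.array([[var_a, cov_am], [cov_am, var_m]])
    dam_bv = rng.multivariate_normal([0.0, 0.0], g0, size=n_dams)  # columns: a, m
    sire_a = rng.normal(0.0, np.sqrt(var_a), size=n_sires)

    dam = np.repeat(np.arange(n_dams), litter_size)  # dam index of each piglet
    sire = dam // dams_per_sire                      # 10 consecutive dams per sire
    n = dam.size

    # Direct effect: parent average plus Mendelian sampling term.
    mendelian = rng.normal(0.0, np.sqrt(0.5 * var_a), size=n)
    a = 0.5 * (sire_a[sire] + dam_bv[dam, 0]) + mendelian
    m = dam_bv[dam, 1]                               # maternal effect of the dam
    litter = rng.normal(0.0, np.sqrt(var_l), size=n_dams)[dam]
    e = rng.normal(0.0, np.sqrt(var_e), size=n)

    liability = a + m + litter + e
    threshold = np.quantile(liability, mortality)    # bottom 20% die
    alive = (liability > threshold).astype(int)
    return liability, alive


liability, alive = simulate_survival()
print(alive.mean())
```

Setting the threshold at the empirical 20% quantile guarantees that the top 80% of liabilities are scored as alive, matching the simulated mortality rate.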

Four genotyping scenarios were studied: (1) all pigs were genotyped (G_all); (2) 80% of pigs randomly selected from the whole population were genotyped (G80_ran); (3) only alive pigs (80%) were genotyped (G_alive); (4) no pig was genotyped (G_none).
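As an illustration of how the four scenarios differ, the sketch below builds a boolean "has genotype" mask per scenario from a 0/1 survival vector. The function name `genotyping_masks` and the toy data are hypothetical; only the scenario definitions come from the text.

```python
import numpy as np


def genotyping_masks(alive, frac_random=0.80, seed=1):
    """Return a dict of boolean masks, one per genotyping scenario."""
    alive = np.asarray(alive).astype(bool)
    n = alive.size
    rng = np.random.default_rng(seed)

    # G80_ran: a random 80% of all pigs, regardless of survival status.
    chosen = rng.choice(n, size=int(round(frac_random * n)), replace=False)
    g80_ran = np.zeros(n, dtype=bool)
    g80_ran[chosen] = True

    return {
        "G_all": np.ones(n, dtype=bool),     # everyone genotyped
        "G80_ran": g80_ran,                  # random 80%
        "G_alive": alive.copy(),             # only survivors
        "G_none": np.zeros(n, dtype=bool),   # pedigree only
    }


# Toy example: 100 pigs with 20% mortality.
toy_alive = np.array([1] * 80 + [0] * 20)
masks = genotyping_masks(toy_alive)
for name, mask in masks.items():
    dead_genotyped = int((mask & (toy_alive == 0)).sum())
    print(name, int(mask.sum()), dead_genotyped)
```

G80_ran and G_alive genotype the same number of animals, but only G80_ran includes dead animals among those genotyped.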

Statistical analysis

A linear, a logit and a probit model (i.e., a liability threshold model) were used for estimation of genetic parameters and breeding values. The models were as follows:

The linear model (LM) is,

$${\varvec{y}}=\boldsymbol{1}{\mu}+{{\varvec{W}}}_{{\varvec{l}}}{\varvec{l}}+{{\varvec{Z}}}_{{\varvec{a}}}{\varvec{a}}+{{\varvec{Z}}}_{{\varvec{m}}}{\varvec{m}}+{\varvec{e}}$$

where y is the vector of binary observations of pig survival with 0 and 1 representing dead and alive, respectively; µ is the overall mean; 1 is the vector of ones; l is the vector of litter effects; a is the vector of direct additive genetic effects; m is the vector of maternal additive genetic effects; and e is the vector of residual effects. The matrices \({{\varvec{W}}}_{{\varvec{l}}}\), \({{\varvec{Z}}}_{{\varvec{a}}}\), \({{\varvec{Z}}}_{{\varvec{m}}}\) are incidence matrixes associating l, a, m with y. In the model, direct and maternal additive genetic effects are correlated, and the other effects are independent of each other. Thus, it is assumed that l, e, a and m have the following distributions: \({\varvec{l}} \sim N\left(0,{\varvec{I}}{\sigma }_{l}^{2}\right)\), \({\varvec{e}} \sim N\left(0,{\varvec{I}}{\sigma }_{e}^{2}\right)\), and \(\left[\begin{array}{c}{\varvec{a}}\\ {\varvec{m}}\end{array}\right]\sim N\left(0,\left[\begin{array}{cc}{\sigma }_{a}^{2}& {\sigma }_{am}\\ {\sigma }_{am}& {\sigma }_{m}^{2}\end{array}\right]\otimes {\varvec{K}}\right)\), where \({\sigma }_{l}^{2}\), \({\sigma }_{e}^{2}\), \({\sigma }_{a}^{2}\), \({\sigma }_{m}^{2}\) and \({\sigma }_{am}\) are litter variance, residual variance, direct additive genetic variance, maternal additive genetic variance, and covariance between direct and maternal additive genetic effects, respectively, and K is an additive genetic relationship matrix based on pedigree and/or genomic information. When using the pedigree-based method for the scenario of no genotyping, K was constructed from pedigree information [23]. When using the single-step GBLUP model (ssGBLUP), K represents the H matrix constructed from pedigree and genome information [24]. The H matrix is as follows,

$${\varvec{H}}=\left[\begin{array}{cc}{{\varvec{G}}}_{{\varvec{\omega}}}& {{\varvec{G}}}_{{\varvec{\omega}}}{{\varvec{A}}}_{11}^{-1}{{\varvec{A}}}_{12}\\ {{{\varvec{A}}}_{21}{{\varvec{A}}}_{11}^{-1}{\varvec{G}}}_{{\varvec{\omega}}}& {{{\varvec{A}}}_{21}{{\varvec{A}}}_{11}^{-1}{\varvec{G}}}_{{\varvec{\omega}}}{{\varvec{A}}}_{11}^{-1}{{\varvec{A}}}_{12}+{{\varvec{A}}}_{22}-{{{\varvec{A}}}_{21}{\varvec{A}}}_{11}^{-1}{{\varvec{A}}}_{12}\end{array}\right]$$

where A11 and A22 are the sub-matrixes of the pedigree-based relationship matrix (A) for relationships between genotyped individuals and between non-genotyped individuals, respectively, A12 and A21 are the sub-matrixes for relationships between genotyped and non-genotyped individuals, and \({{\varvec{G}}}_{{\varvec{\omega}}}=\left(1-\omega \right){{\varvec{G}}}^{\boldsymbol{*}}+\omega {{\varvec{A}}}_{11}\). In this study, ω is set to 0.2. G is the marker-based genomic relationship matrix [25], and G* is G adjusted to be compatible with A11, calculated by the following formula [8],

$${{\varvec{G}}}^{\boldsymbol{*}}={\varvec{G}}\beta +\alpha$$
$$\mathrm{Avg}.\mathrm{diag}\left({\varvec{G}}\right)\beta +\alpha =\mathrm{Avg}.\mathrm{diag}\left({{\varvec{A}}}_{11}\right)$$
$$\mathrm{Avg}.\mathrm{offdiag}\left({\varvec{G}}\right)\beta +\mathrm{\alpha }=\mathrm{Avg}.\mathrm{offdiag}\left({{\varvec{A}}}_{11}\right)$$
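Subtracting the second constraint from the first eliminates α. Provided the average diagonal and the average off-diagonal elements of G differ, the two constraints can therefore be solved in closed form (a rearrangement of the equations above, shown here as a worked step):

$$\beta =\frac{\mathrm{Avg}.\mathrm{diag}\left({{\varvec{A}}}_{11}\right)-\mathrm{Avg}.\mathrm{offdiag}\left({{\varvec{A}}}_{11}\right)}{\mathrm{Avg}.\mathrm{diag}\left({\varvec{G}}\right)-\mathrm{Avg}.\mathrm{offdiag}\left({\varvec{G}}\right)}$$
$$\alpha =\mathrm{Avg}.\mathrm{diag}\left({{\varvec{A}}}_{11}\right)-\beta \,\mathrm{Avg}.\mathrm{diag}\left({\varvec{G}}\right)$$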

In the scenario where all animals are genotyped, \({\varvec{K}}={{\varvec{G}}}_{{\varvec{\omega}}}\).
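The construction of H can be sketched with dense NumPy matrices as below. This is a minimal illustration for small examples, not the implementation in the software used for the study: the function names `adjust_G` and `build_H` are hypothetical, and A11 is inverted explicitly only for readability (practical single-step software usually works with the inverse of H rather than forming H itself).

```python
import numpy as np


def adjust_G(G, A11):
    """Rescale G to G* = G*beta + alpha so that its average diagonal and
    average off-diagonal match those of A11."""
    n = G.shape[0]
    off = ~np.eye(n, dtype=bool)
    d_g, o_g = np.diag(G).mean(), G[off].mean()
    d_a, o_a = np.diag(A11).mean(), A11[off].mean()
    beta = (d_a - o_a) / (d_g - o_g)
    alpha = d_a - beta * d_g
    return G * beta + alpha


def build_H(A, G, genotyped, omega=0.2):
    """Assemble H in the animal order of A.

    `genotyped` is a boolean mask over the rows of A; G must be ordered
    like the genotyped animals.
    """
    genotyped = np.asarray(genotyped, dtype=bool)
    g = np.flatnonzero(genotyped)     # indices of genotyped animals
    u = np.flatnonzero(~genotyped)    # indices of non-genotyped animals

    A11 = A[np.ix_(g, g)]
    A12 = A[np.ix_(g, u)]
    A21 = A12.T
    A22 = A[np.ix_(u, u)]

    G_w = (1.0 - omega) * adjust_G(G, A11) + omega * A11
    T = A21 @ np.linalg.inv(A11)      # A21 * A11^{-1}

    H = np.empty_like(A, dtype=float)
    H[np.ix_(g, g)] = G_w
    H[np.ix_(g, u)] = G_w @ T.T       # G_w * A11^{-1} * A12
    H[np.ix_(u, g)] = T @ G_w
    H[np.ix_(u, u)] = T @ G_w @ T.T + A22 - T @ A12
    return H


# Toy pedigree: two unrelated parents (0, 1) and two full sibs (2, 3).
A_toy = np.array([[1.0, 0.0, 0.5, 0.5],
                  [0.0, 1.0, 0.5, 0.5],
                  [0.5, 0.5, 1.0, 0.5],
                  [0.5, 0.5, 0.5, 1.0]])
mask_toy = np.array([True, False, True, True])
G_toy = A_toy[np.ix_([0, 2, 3], [0, 2, 3])]   # pretend genomic = pedigree
H_toy = build_H(A_toy, G_toy, mask_toy)
print(np.allclose(H_toy, A_toy))
```

A useful sanity check, used in the toy example, is that H reduces to A when the genomic relationships equal the pedigree relationships among genotyped animals.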

The logit model and probit model (also called liability threshold model) are described as,

$${\varvec{\eta}}=\boldsymbol{1}{{\mu}}+{{\varvec{W}}}_{{\varvec{l}}}{\varvec{l}}+{{\varvec{Z}}}_{{\varvec{a}}}{\varvec{a}}+{{\varvec{Z}}}_{{\varvec{m}}}{\varvec{m}}$$

For the logit model (LG), η is the vector of log-odds of the expected pig survival, \({\eta }_{i}={\mathrm{log}}_{\mathrm{e}}\frac{{\upsilon }_{i}}{1-{\upsilon }_{i}}\), where \({\upsilon }_{i}\) is the expected value of \({y}_{i}\). For the probit model (PM), η is the vector of expected liability, \({\eta }_{i}={\Phi }^{-1}\left({\upsilon }_{i}\right)\), where \({\Phi }^{-1}\left( .\right)\) is the inverse cumulative standard normal distribution function. The overall mean µ, the vectors l, a, m, and the matrixes \({{\varvec{W}}}_{{\varvec{l}}}\), \({{\varvec{Z}}}_{{\varvec{a}}}\), \({{\varvec{Z}}}_{{\varvec{m}}}\) are defined as in the linear model.
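To make the two link functions concrete, the following sketch evaluates them with the Python standard library. The function names are illustrative; the example survival probability of 0.8 corresponds to the simulated 20% mortality.

```python
import math
from statistics import NormalDist

_STD_NORMAL = NormalDist()  # mean 0, standard deviation 1


def logit(p):
    """Log-odds of a survival probability p (0 < p < 1)."""
    return math.log(p / (1.0 - p))


def inv_logit(eta):
    """Survival probability implied by a linear predictor on the logit scale."""
    return 1.0 / (1.0 + math.exp(-eta))


def probit(p):
    """Inverse standard normal CDF of a survival probability p."""
    return _STD_NORMAL.inv_cdf(p)


def inv_probit(eta):
    """Survival probability implied by a linear predictor on the liability scale."""
    return _STD_NORMAL.cdf(eta)


p = 0.8
print(logit(p), probit(p))
```

The same survival probability maps to different values of η under the two links, which is one reason why variance components and EBVs from the logit and probit models are on different scales.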

The variance components were estimated using the AI-REML method [26]. The AI-REML procedure did not converge for some ssGBLUP models. Therefore, variance components estimated from pedigree-based models were used in estimation of breeding values in all models. The estimation of variance components and breeding values was performed using the DMU software [27].

Validation of genomic predictions

To validate genomic prediction, generations 5 to 7 were used as the reference population, and generation 8 was used as the validation population. Genomic predictions were evaluated using the following criteria: (1) the correlation between the estimated breeding value (EBV) and the true breeding value (TBV, i.e., a, m or a + m on the liability scale in the simulation), to assess the accuracy of genomic prediction; (2) the average TBV of the individuals ranked in the top 1% and the top 30% by EBV, to assess the realized selection differential, where 1% can be considered the selected proportion for boars and 30% for sows; (3) the regression of EBV from the whole data with genotypes of all animals on the EBV from the reference data for each genotyping scenario, similar to Legarra and Reverter's study [28], to evaluate dispersion bias of a particular model and genotyping scenario. Note that dispersion bias was assessed against the EBV from the full data rather than against the TBV. The reason was that the TBV in the simulation was on the liability scale, whereas the EBV from the linear model was on the observed scale and the EBV from the logit model was on the logit scale. Even for the probit model, the scale of the EBV differed from that of the simulated TBV, because the residual variance is restricted to 1 in the probit model. Thus, the expected regression of TBV on EBV was not equal to one even for an unbiased prediction. A paired t-test was used to test the differences between accuracies of EBV from the four genotyping strategies and from the three models.
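The three validation criteria can be expressed compactly in code. The sketch below is illustrative only; the function names are hypothetical and the inputs are assumed to be arrays of EBV and TBV for the validation animals.

```python
import numpy as np


def accuracy(ebv, tbv):
    """Criterion 1: Pearson correlation between EBV and TBV."""
    return float(np.corrcoef(ebv, tbv)[0, 1])


def mean_tbv_of_top(ebv, tbv, fraction):
    """Criterion 2: mean TBV of the animals ranked in the top `fraction` by EBV."""
    ebv = np.asarray(ebv, dtype=float)
    tbv = np.asarray(tbv, dtype=float)
    k = max(1, int(round(fraction * ebv.size)))
    top = np.argsort(ebv)[::-1][:k]          # indices of the k highest EBV
    return float(tbv[top].mean())


def dispersion_slope(ebv_whole, ebv_partial):
    """Criterion 3: slope of the regression of whole-data EBV on
    reference-data (partial) EBV; a value near 1 indicates no dispersion bias."""
    x = np.asarray(ebv_partial, dtype=float)
    y = np.asarray(ebv_whole, dtype=float)
    cov = np.cov(x, y)                        # 2 x 2 covariance matrix
    return float(cov[0, 1] / cov[0, 0])


# Toy check with made-up numbers.
toy_ebv = np.array([1.0, 2.0, 3.0, 4.0])
toy_tbv = np.array([10.0, 20.0, 30.0, 40.0])
print(accuracy(toy_ebv, toy_tbv),
      mean_tbv_of_top(toy_ebv, toy_tbv, 0.5),
      dispersion_slope(2.0 * toy_ebv + 1.0, toy_ebv))
```

Because a correlation is invariant to linear rescaling, criterion 1 can compare EBV on the observed or logit scale with TBV on the liability scale, whereas the regression slope in criterion 3 is scale dependent and is therefore computed between two sets of EBV from the same model.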

Results

The variance components estimated from the model with pedigree-based relationship matrix were used for estimation of breeding values. Heritabilities estimated using pedigree information are shown in Table 1. Proportions of variances and heritabilities were different among the three models due to different scales. For traits T4/4 and T2/2, when using the logit model and the probit model, the estimated direct heritability ranged from 0.011 to 0.22 and was lower than the estimated maternal heritability, which ranged from 0.019 to 0.039. This was unexpected since direct and maternal heritabilities were the same in the simulation for the two traits. For the three models, the estimates of correlation coefficients between the direct and maternal additive effects ranged from 0.286 to 0.523, and had large standard errors.

Table 1 Estimates of the proportion of litter variance (\(lit^2\)), direct heritability (\(h_a^2\)), maternal heritability (\(h_m^2\)), and correlation between direct and maternal additive genetic effects (\(r_{am}\)) using models incorporating the pedigree-based relationship matrix

Accuracies of EBV were measured as correlation coefficients between EBV and TBV. Accuracies of estimated direct (a), maternal (m) and total (a + m) breeding values are shown in Table 2. Models using genomic information achieved up to 59.2% higher accuracy of estimated breeding value than models using pedigree information, dependent on genotyping scenarios. Accuracies of EBV for a from the three models using only pedigree-based relationship matrix (scenario G_none) ranged from 0.287 to 0.288 for trait T4/4, 0.242 to 0.245 for T2/4 and 0.224 to 0.226 for T2/2. When using genomic data across the three scenarios (G_all, G80_ran, G_alive), the accuracies ranged from 0.375 to 0.459 for T4/4, 0.293 to 0.352 for T2/4 and 0.286 to 0.340 for T2/2. Accuracies of EBV for the maternal effect, m using only pedigree-based relationship matrix ranged from 0.247 to 0.251 for trait T4/4, 0.264 to 0.270 for T2/4 and 0.196 to 0.197 for T2/2. When using genomic data and across all scenarios, the accuracies of maternal effect ranged from 0.385 to 0.409 for T4/4, 0.397 to 0.418 for T2/4 and 0.310 to 0.325 for T2/2. Accuracies of EBV for total genetic effect, a + m using pedigree-based models without genomic information ranged from 0.314 to 0.315 for trait T4/4, 0.310 to 0.311 for T2/4 and 0.249 for T2/2. Across all scenarios with genomic data, the accuracies ranged from 0.447 to 0.500 for T4/4, 0.428 to 0.458 for T2/4 and 0.359 to 0.391 for T2/2.

Table 2 Correlation coefficient between estimated breeding values and true breeding values

As expected, for the three types of EBV (a, m, and a + m), the scenario in which all individuals, including dead individuals, were genotyped (G_all) had the highest accuracy. The composition of the genotyped group affected the accuracies of EBV for a and a + m, but not for m. In scenario G_alive, the accuracies of EBV for a were 0.375 to 0.378 for trait T4/4, 0.293 to 0.299 for T2/4 and 0.286 to 0.288 for T2/2. With the same number of genotyped pigs, the accuracies in G80_ran were higher than those in G_alive by 12.70% to 13.76% for trait T4/4, 10.92% to 12.20% for T2/4 and 10.14% to 11.46% for T2/2. The trend of accuracies for a + m was the same as that for a. Thus, the accuracies of EBV for a + m in G_alive were 0.447 to 0.449 for trait T4/4, 0.428 to 0.429 for T2/4 and 0.359 to 0.360 for T2/2, and the accuracies in G80_ran were higher than those in G_alive by 5.35% to 6.04% for trait T4/4, 2.56% to 2.57% for T2/4 and 3.06% to 3.34% for T2/2. However, the trend of accuracies for m differed from those for a and a + m with respect to the composition of genotyped individuals. The accuracies of EBV for m in G80_ran were similar to those in G_alive, and the differences among them were less than 0.01 for the three traits (P < 0.05).

As shown in Table 2, accuracies of the linear model were very similar to the logit and probit models for the three types of EBV, and the differences among them were less than 0.01 for the three traits. The differences of accuracies for a ranged from 0 to 0.008 for trait T4/4, 0 to 0.008 for T2/4 and 0 to 0.007 for T2/2. The differences of accuracies for m ranged from 0 to 0.008 for trait T4/4, 0.001 to 0.006 for T2/4 and 0 to 0.001 for T2/2. The differences of accuracies for a + m ranged from 0 to 0.002 for trait T4/4, 0 to 0.001 for T2/4 and 0 to 0.001 for T2/2.

In scenarios of G80_ran and G_alive, 20% animals did not have genotype data. Additional file 1: Table S1 shows that the accuracies of genotyped individuals were higher than those of non-genotyped pigs. The differences of accuracies for a ranged from 0.077 to 0.093 for trait T4/4, 0.037 to 0.046 for T2/4 and 0.061 to 0.072 for T2/2. The differences of accuracies for m ranged from 0.058 to 0.090 for trait T4/4, 0.053 to 0.074 for T2/4 and 0.058 to 0.087 for T2/2. The differences of accuracies for the total EBV ranged from 0.094 to 0.109 for trait T4/4, 0.068 to 0.086 for T2/4 and 0.079 to 0.094 for T2/2. In addition, the accuracies of the three types of EBV for non-genotyped animals (Additional file 1: Table S1) were higher than those for animals in scenario of without any genotype information (Table 2, G_none).

The regression coefficients of the EBV from the whole data with all animals having genotypes on the EBV from different reference data are presented in Table 3. The range of the regression coefficients of direct EBV were between 1.046 and 1.132 for T4/4, 1.001 and 1.126 for T2/4, 0.944 and 1.019 for T2/2. The range of the regression coefficients of maternal (m) EBV were between 0.895 and 0.938 for T4/4, 1.057 and 1.085 for T2/4, 1.000 and 1.043 for T2/2. The range of the regression coefficients of the total EBV (a + m) were between 0.974 and 1.026 for T4/4, 1.082 and 1.122 for T2/4, 0.960 and 1.013 for T2/2. The regression coefficients around 1 indicated that dispersions of predictions were unbiased with respect to use of the different reference data. The regression coefficients for validation individuals with or without genotype are presented in Additional file 1: Table S2. The regression coefficients of genotyped individuals were similar to those of non-genotyped individuals for all three traits.

Table 3 Regression coefficient of the EBV from whole data on the EBV from reference data

Table 4 shows the mean total TBV of the top 1% of individuals with the highest total EBV. It was observed that the higher the accuracy of EBV for a + m (Table 2), the higher the TBV. For trait T4/4, the scenario of all individuals with genotypes obtained the highest TBV for a + m (4.498 to 4.553), followed by scenario G80_ran (4.297 to 4.346), then scenario G_alive (4.221 to 4.308), and the lowest was scenario G_none (2.583 to 2.712). The order of TBV for a + m across the four scenarios was the same for the other two traits, T2/4 and T2/2. The order of TBV for a was the same as that for a + m, but the order for m was not: between scenarios G80_ran and G_alive the order was reversed, with G_alive higher than G80_ran for T4/4 and T2/2. When using genomic data, TBVs for a from the linear model were higher than those from the logit and probit models. However, using pedigree-based models without genomic information, TBVs for a from the linear model were lower than those from the logit and probit models. With or without genomic information, TBVs for the maternal effect (m) from the linear model were lower than those from the logit and probit models for all traits.

Table 4 The mean of true breeding value of the top 1% of animals with the highest total estimated breeding value

Table 5 shows the mean total TBV of the top 30% of individuals with the highest total EBV. For all traits, the order of the four scenarios for total TBV of the top 30% of individuals is consistent with that of the top 1%, i.e., scenario G_all obtained the highest TBV, followed by scenario G80_ran, then scenario G_alive, and the lowest was scenario G_none. In the four scenarios, the linear model outperformed the logit and probit models for a, but not for m.

Table 5 The mean of true breeding value of the top 30% of animals with the highest total estimated breeding value

Discussion

In this study, we compared four genotyping strategies and three prediction models when predicting breeding values for three pig survival traits with different direct and maternal heritabilities. When using variance components estimated from a pedigree-based model, genomic predictions were unbiased with respect to dispersion, even in the scenario with genotypes only from alive animals. Genotyping a random sample of individuals led to higher prediction accuracy than genotyping only alive individuals, given the same number of genotyped animals. The linear model achieved genomic prediction ability similar to that of the logit and probit models.

In the current study, variance components were estimated from pedigree-based model and these estimates were used for predicting breeding values in all genotyping scenarios. It has been reported that when selection is based on genomic information, genetic parameters estimated without this information can be biased [29]. Similarly, when selection is based on pedigree information, genetic parameters estimated using ssGBLUP model can also be biased [30]. However, the impact of selection on variance components estimates was not an issue in the current study, because the simulated population was a random selection population. On the other hand, the current study involved the issue of selective genotyping. In a pig breeding program, dead animals are usually not genotyped, which may lead to biased estimation of variance components and genomic prediction when using a genomic model for parameter estimation. We carried out an extra simulation study using models with genomic data and found that parameter estimation using ssGBLUP model with genotypes only from alive animals severely overestimated additive genetic variance and led to a residual variance close to zero (Additional file 1: Table S3). Similarly, Wang et al. [31] reported that selective genotyping severely overestimated additive genetic variance using a ssGBLUP model. Due to problems with convergence and biased estimation of variance components in some scenarios, variances estimated from pedigree-based models were used for predicting breeding values in the current study.

Because the estimates from the three models are on different scales, they cannot be directly compared. After transforming observed-scale heritability to liability-scale heritability [32], the liability-scale heritabilities estimated from the linear model were consistent with those used in simulating the data. However, the logit and probit models underestimated direct heritabilities and overestimated the correlation between direct and maternal additive genetic effects. A possible reason is that including the maternal additive genetic effect increases model complexity, making it difficult to separate direct and maternal additive genetic effects, as reflected by the large standard errors of the estimated correlation between these effects in this study. The logit and probit animal models could be more sensitive to model complexity than the linear animal model. This could also be the reason why the logit and probit models did not predict better than the linear model in the current study, although the two models are more appropriate in theory.
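The scale transformation mentioned above can be illustrated numerically. Reference [32] is not reproduced in this section, so the sketch assumes the classical threshold-model relation \(h_{liab}^{2}=h_{obs}^{2}\,p\left(1-p\right)/{z}^{2}\), where p is the incidence of the trait and z is the height of the standard normal density at the threshold. The function name is hypothetical.

```python
from statistics import NormalDist


def observed_to_liability(h2_observed, prevalence):
    """Convert observed-scale heritability of a binary trait to the
    liability scale, assuming the classical threshold-model formula
    h2_liab = h2_obs * p * (1 - p) / z**2."""
    nd = NormalDist()
    threshold = nd.inv_cdf(1.0 - prevalence)  # liability threshold
    z = nd.pdf(threshold)                     # normal density at the threshold
    return h2_observed * prevalence * (1.0 - prevalence) / z ** 2


# With 80% survival, the conversion factor is about 2.04.
factor = observed_to_liability(1.0, 0.8)
print(round(factor, 3))
```

Under this assumption and the simulated survival rate of 80%, an observed-scale heritability is multiplied by roughly 2.04 to reach the liability scale, so a liability-scale heritability of 0.04 corresponds to about 0.02 on the observed scale.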

In this study, we compared accuracies of total EBV of four genotyping strategies for three traits. The three strategies using genomic information gave higher accuracies of total EBV than the strategy using only pedigree information, and the accuracies of genotyped individuals were higher than those of non-genotyped individuals within the same strategy. Furthermore, since non-genotyped animals benefit from the genomic information of other animals, the accuracies of non-genotyped individuals in scenarios G80_ran or G_alive were higher than those of individuals in scenario G_none. These results are consistent with a previous study of piglet mortality using a ssGBLUP method in Danish Landrace and Yorkshire pigs [15]. Among the three strategies using genomic information, the strategy of genotyping all individuals in the reference population gave higher accuracies of total EBV than the strategies of genotyping only some individuals, which is consistent with theoretical expectations [33]. However, with the same number of genotyped individuals, genotyping both alive and dead pigs gave a higher accuracy than genotyping only alive pigs, indicating that the genotypes of dead pigs have an important influence on the accuracy of genomic prediction. Therefore, it could be a good strategy to genotype dead animals. In the current study, genetic values were generated from 730 QTLs for which the direct and maternal additive genetic effects followed a bivariate distribution, since previous studies [34] have revealed that pig mortality is a complex trait with a polygenic genetic architecture. If pig mortality were controlled by a small number of genes, the frequency of unfavorable alleles would differ greatly between dead and alive animals, implying a greater need to genotype dead animals for genomic prediction of pig mortality. A study based on real data of pig mortality would be of great importance; however, genotype data of dead pigs are currently not available in pig breeding programs.

As expected, the trait with higher heritability had higher prediction accuracy. Further, for traits T4/4 and T2/2, which have equal direct and maternal heritabilities, accuracies of direct EBV (a) were higher than those of maternal EBV (m) in scenarios G_all, G80_ran, and G_none, indicating that the maternal genetic effect is more difficult to estimate in general (Table 2). However, accuracies of maternal EBV were higher than those of direct EBV in scenario G_alive and were similar to those in scenario G80_ran, suggesting that genotyping only alive animals has a small impact on prediction accuracy for the maternal additive genetic effect but a large impact on prediction of the direct additive genetic effect.

We compared the accuracy of genomic prediction of a linear model, a logit model and a probit model for survival in pigs. Using pedigree information, accuracies of total EBV were very similar among the three models; the differences were less than 1% for all traits (T4/4, T2/4 and T2/2). Previous studies have shown that linear, logit and probit models have similar predictive capabilities for threshold traits [19, 20, 36]. In a simulation study, Carlén et al. [36] showed that the prediction abilities of linear and threshold models were very similar for mastitis, defined as a binary trait, in dairy cattle. Koeck et al. [19] evaluated the performance of a linear, a logit and a probit model for genetic analyses of clinical mastitis in Austrian Fleckvieh dual purpose cows and showed that there were very small differences in predictive ability among the three models. In a Norwegian Red cow population, Vazquez et al. [20] also observed similar results when comparing the genetic predictive ability of threshold and linear models for clinical mastitis. Using genomic information, accuracies of total EBV were higher than those using only pedigree information, but, as with pedigree-based prediction, accuracies were very similar among the linear, logit and probit models for all three traits in the current study. Although the logit and probit models were hypothesized to be more suitable for threshold traits, the results indicated that the predictive power of the linear, logit and probit models is similar in genomic prediction for survival traits.

Conclusions

In this study, three survival traits with different heritabilities were simulated to explore the impact of genotyping strategies and statistical models on genomic prediction. The results showed that genomic predictions with genotypes only from alive animals were unbiased when using variance components estimated from a pedigree-based model. Genotyping a random sample of individuals gave higher accuracy than genotyping only alive individuals, given the same number of genotyped individuals. The predictive powers of the linear, logit and probit models were similar. We conclude that the genomic information of dead individuals is very useful, and that the linear model is a good choice for genomic prediction of survival in pigs. It is recommended to use variances estimated from a pedigree-based model for genomic prediction in the case of selective genotyping.

Availability of data and materials

The datasets analyzed during the current study are available from the corresponding author on reasonable request.

Abbreviations

EBV: Estimated breeding value

GBLUP: Genomic best linear unbiased prediction

GEBV: Genomic estimated breeding value

LG: Logit model

LM: Linear model

PM: Probit model

QTL: Quantitative trait locus

ssGBLUP: Single-step GBLUP model

TBV: True breeding value

References

  1. Knauer MT, Hostetler CE. US swine industry productivity analysis, 2005 to 2010. J Swine Health Prod. 2013;21(5):248–52.

  2. Koketsu Y, Iida R, Piñeiro C. A 10-year trend in piglet pre-weaning mortality in breeding herds associated with sow herd size and number of piglets born alive. Porcine Health Management. 2021;7(1):4.

  3. Schaeffer LR. Strategy for applying genome-wide selection in dairy cattle. J Anim Breed Genet. 2006;123(4):218–23.

  4. Luan T, Woolliams JA, Lien S, Kent M, Svendsen M, Meuwissen THE. The accuracy of genomic selection in Norwegian Red cattle assessed by cross-validation. Genetics. 2009;183(3):1119–26.

  5. VanRaden PM, Van Tassell CP, Wiggans GR, Sonstegard TS, Schnabel RD, Taylor JF, et al. Invited review: Reliability of genomic predictions for North American Holstein bulls. J Dairy Sci. 2009;92(1):16–24.

  6. Lillehammer M, Meuwissen THE, Sonesson AK. Genomic selection for maternal traits in pigs. J Anim Sci. 2011;89(12):3908–16.

  7. Ostersen T, Christensen O, Henryon M, Nielsen B, Su G, Madsen P. Deregressed EBV as the response variable yield more reliable genomic predictions than traditional EBV in pure-bred pigs. Genet Sel Evol. 2011;43(1):38.

  8. Christensen OF, Madsen P, Nielsen B, Ostersen T, Su G. Single-step methods for genomic evaluation in pigs. Animal. 2012;6(10):1565–71.

  9. Chen CY, Misztal I, Aguilar I, Tsuruta S, Meuwissen THE, Aggrey SE, et al. Genome-wide marker-assisted selection combining all pedigree phenotypic information with genotypic data in one step: An example using broiler chickens. J Anim Sci. 2011;89(1):23–8.

  10. Wolc A, Arango J, Settar P, Fulton J, O’Sullivan N, Preisinger R, et al. Persistence of accuracy of genomic estimated breeding values over generations in layer chickens. Genet Sel Evol. 2011;43(1):23.

  11. Liu T, Qu H, Luo C, Shu D, Wang J, Lund M, et al. Accuracy of genomic prediction for growth and carcass traits in Chinese triple-yellow chickens. BMC Genet. 2014;15(1):110.

  12. Su G, Guldbrandtsen B, Gregersen VR, Lund MS. Preliminary investigation on reliability of genomic estimated breeding values in the Danish Holstein population. J Dairy Sci. 2010;93(3):1175–83.

  13. Meuwissen THE, Hayes BJ, Goddard ME. Prediction of total genetic value using genome-wide dense marker maps. Genetics. 2001;157(4):1819–29.

  14. Knol EF, Nielsen B, Knap PW. Genomic selection in commercial pig breeding. Anim Front. 2016;6(1):15–22.

  15. Guo X, Christensen OF, Ostersen T, Wang Y, Lund MS, Su G. Improving genetic evaluation of litter size and piglet mortality for both genotyped and nongenotyped individuals using a single-step method. J Anim Sci. 2015;93(2):503–12.

  16. Leite NG, Knol EF, Garcia ALS, Lopes MS, Zak L, Tsuruta S, et al. Investigating pig survival in different production phases using genomic models. J Anim Sci. 2021;99(8):skab217.

  17. Su G, Sorensen D, Lund MS. Variance and covariance components for liability of piglet survival during different periods. Animal. 2008;2(2):184–9.

  18. Gianola D, Foulley JL. Sire evaluation for ordered categorical data with a threshold model. Genet Sel Evol. 1983;15(2):201–24.

  19. Koeck A, Heringstad B, Egger-Danner C, Fuerst C, Fuerst-Waltl B. Comparison of different models for genetic analysis of clinical mastitis in Austrian Fleckvieh dual-purpose cows. J Dairy Sci. 2010;93(9):4351–8.

  20. Vazquez AI, Perez-Cabal MA, Heringstad B, Rodrigues-Motta M, Rosa GJM, Gianola D, et al. Predictive ability of alternative models for genetic analysis of clinical mastitis. J Anim Breed Genet. 2012;129(2):120–8.

  21. Sargolzaei M, Schenkel FS. QMSim: A large-scale genome simulator for livestock. Bioinformatics. 2009;25(5):680–1.

  22. Ma X, Christensen OF, Gao H, Huang R, Nielsen B, Madsen P, et al. Prediction of breeding values for group-recorded traits including genomic information and an individually recorded correlated trait. Heredity. 2021;126(1):206–17.

  23. Henderson CR. Best linear unbiased estimation and prediction under a selection model. Biometrics. 1975;31(2):423–47.

  24. Christensen OF, Lund MS. Genomic prediction when some animals are not genotyped. Genet Sel Evol. 2010;42(1):2.

  25. VanRaden PM. Efficient methods to compute genomic predictions. J Dairy Sci. 2008;91(11):4414–23.

  26. Jensen J, Mäntysaari EA, Madsen P, Thompson R. Residual maximum likelihood estimation of (co)variance components in multivariate mixed linear models using average information. J Indian Soc Agric Stat. 1997;49:215–36.

  27. Madsen P, Su G, Labouriau R, Christensen OF. DMU - a package for analyzing multivariate mixed models. In: 9th World Congress on Genetics Applied to Livestock Production. Leipzig, Germany; 2010. paper 732.

  28. Legarra A, Reverter A. Semi-parametric estimates of population accuracy and bias of predictions of breeding values and future phenotypes using the LR method. Genet Sel Evol. 2018;50(1):53.

  29. Hidalgo J, Tsuruta S, Lourenco D, Masuda Y, Huang Y, Gray KA, et al. Changes in genetic parameters for fitness and growth traits in pigs under genomic selection. J Anim Sci. 2020;98(2):skaa032.

  30. Gao H, Madsen P, Aamand GP, Thomasen JR, Sorensen AC, Jensen J. Bias in estimates of variance components in populations undergoing genomic selection: A simulation study. BMC Genomics. 2019;20(1):956.

  31. Wang L, Janss LL, Madsen P, Henshall J, Huang C-H, Marois D, et al. Effect of genomic selection and genotyping strategy on estimation of variance components in animal models using different relationship matrices. Genet Sel Evol. 2020;52(1):31.

  32. Dempster ER, Lerner IM. Heritability of threshold characters. Genetics. 1950;35(2):212–36.

  33. Daetwyler HD, Villanueva B, Woolliams JA. Accuracy of predicting the genetic risk of disease using a genome-wide approach. PLoS ONE. 2008;3(10):e3395.

  34. Guo X, Su G, Christensen OF, Janss L, Lund MS. Genome-wide association analyses using a Bayesian approach for litter size and piglet mortality in Danish Landrace and Yorkshire pigs. BMC Genomics. 2016;17:468.

  35. Ding R, Qiu Y, Zhuang Z, Ruan D, Wu J, Zhou S, et al. Genome-wide association studies reveals polygenic genetic architecture of litter traits in Duroc pigs. Theriogenology. 2021;173:269–78.

  36. Carlén E, Emanuelson U, Strandberg E. Genetic evaluation of mastitis in dairy cattle using linear models, threshold models, and survival analysis: A simulation study. J Dairy Sci. 2006;89(10):4049–57.

Acknowledgements

Not applicable.

Funding

This study was funded by the “Genetic improvement of pig survival” project from the Danish Pig Levy Foundation (Aarhus, Denmark). The China Scholarship Council (CSC) is acknowledged for providing a scholarship to the first author.

Author information

Contributions

GS and TL conceived and designed the study. TL simulated and analyzed data. TL and GS wrote the manuscript. BN, OFC and MSL helped in interpreting results and improved the manuscript. All authors read and approved the final manuscript.

Corresponding author

Correspondence to Guosheng Su.

Ethics declarations

Ethics approval and consent to participate

Not applicable.

Consent for publication

Not applicable.

Competing interests

The authors declare that they have no competing interests.

Supplementary Information

Additional file 1:

Table S1. Correlation coefficient between the EBV and true breeding values for validation individuals with or without genotypes. Table S2. Regression coefficient of the EBV from whole data on the EBV from reference data for validation individuals with or without genotype. Table S3. Estimates of variances and heritability using a linear model without maternal additive genetic effect for the trait T4/4.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated in a credit line to the data.

About this article

Cite this article

Liu, T., Nielsen, B., Christensen, O.F. et al. The impact of genotyping strategies and statistical models on accuracy of genomic prediction for survival in pigs. J Animal Sci Biotechnol 14, 1 (2023). https://doi.org/10.1186/s40104-022-00800-5

  • Received:

  • Accepted:

  • Published:

  • DOI: https://doi.org/10.1186/s40104-022-00800-5

Keywords