The impact of genotyping strategies and statistical models on accuracy of genomic prediction for survival in pigs
Journal of Animal Science and Biotechnology volume 14, Article number: 1 (2023)
Abstract
Background
Survival from birth to slaughter is an important economic trait in commercial pig production. Increasing survival can improve both economic efficiency and animal welfare. The aim of this study was to explore the impact of genotyping strategies and statistical models on the accuracy of genomic prediction for survival in pigs during the total growing period from birth to slaughter.
Results
We simulated pig populations with different direct and maternal heritabilities and used a linear mixed model, a logit model and a probit model to predict genomic breeding values for pig survival based on individual survival records with binary outcomes (0, 1). The results show that when only alive animals have genotype data, unbiased genomic predictions can be achieved using variances estimated from a pedigree-based model. Depending on the genotyping scenario, models using genomic information achieved up to 59.2% higher accuracy of estimated breeding values than the pedigree-based model. Genotyping all individuals, both dead and alive, gave the highest accuracy. When the same number of individuals (80%) was genotyped, a random sample of genotyped individuals achieved higher accuracy than genotyping only alive individuals. The linear, logit and probit models achieved similar accuracy.
Conclusions
Our conclusion is that genomic prediction of pig survival is feasible when only alive pigs have genotypes, but genomic information from dead individuals can increase the accuracy of genomic prediction by 2.06% to 6.04%.
Background
Survival from birth to slaughter is an important economic trait in commercial pig production. Increased survival also improves pig welfare. According to productivity data, the cumulative survival rate from birth to slaughter is lower than 70% [1], and piglet pre-weaning survival has shown a downward trend over the past ten years [2]. Using genomic information in the selection program would be a sustainable and effective way to reduce pig mortality. As a powerful genetic improvement tool, genomic selection has been widely used in animal breeding, for example in cattle [3,4,5], pigs [6,7,8] and chickens [9,10,11]. Genomic selection is especially beneficial for traits with low heritability, which show slow genetic progress under traditional pedigree-based methods [12,13,14]. Guo et al. [15] studied the accuracy of estimated breeding values for piglet survival from birth to day 5 and reported that the accuracy of the single-step method was higher than that of the pedigree-based method by 14.2% for Landrace and by 7.2% for Yorkshire. In a crossbred pig population, Leite et al. [16] compared the accuracies of estimated breeding values for mortality at five stages from birth to slaughter and reported that the accuracy of the single-step method was 16.7% to 78.9% higher than that of the pedigree-based method, with the largest improvement for lactation mortality and the smallest for post-weaning mortality.
Like litter size, piglet survival is usually recorded as a trait of the sow or the service sire [15, 16]. However, survival is a complex trait that is also affected by the pig's own genotype, so it may be more appropriate to assess genetic merit for survival at the individual level [17]. Evaluating survival at the individual level, however, creates a problem for genotyping strategies: dead individuals generally do not have genotypes. Using only the genotype data of alive individuals may lead to biased genomic predictions, so the influence of the genotypes of dead individuals on the accuracy and unbiasedness of genomic prediction needs to be studied.
Finally, survival at the individual level is a binary trait that does not follow a normal distribution, and thus conventional statistical analysis methods may not be suitable [18]. Therefore, a logit model or a liability threshold model could be more appropriate for estimating breeding values. However, Koeck et al. [19] evaluated the performance of a linear model and a logit model for genetic analyses of clinical mastitis in Austrian Fleckvieh dual-purpose cows and found no difference in predictive ability between the two models. In a Norwegian Red cow population, Vazquez et al. [20] also compared a liability threshold model with a linear model for genetic evaluation of clinical mastitis and again found no difference in the predictive ability of the two models. It is therefore necessary to investigate whether a logit or a liability threshold model is better than a linear model for predicting breeding values for survival in pig populations.
We hypothesized that different genotyping strategies affect the accuracy and unbiasedness of breeding value estimation. Furthermore, we hypothesized that logit or liability threshold models are more suitable than a linear model for predicting threshold traits, both with and without genomic information. Therefore, this study had two objectives: (1) to explore the impact of genotyping scenarios, especially the absence of genotypes for dead individuals, on genomic prediction of mortality; and (2) to compare linear, logit and liability threshold models for estimating breeding values.
Materials and methods
Data simulation
The data were simulated using the QMSim software [21] to mimic a pig population. We simulated 18 chromosomes, each 100 cM long with 3100 markers and 50 QTLs. QTL effects were assumed to be normally distributed. The simulation started with a founder population of 200 males and 200 females and went through 300 non-overlapping historical generations to generate linkage disequilibrium between markers and QTLs. In total, about 45,000 markers and 730 QTLs were segregating in the genome in the last historical generation, with slight differences in the numbers of markers and QTLs between replicates. After the historical generations, 30 boars randomly selected from the last historical generation and all 200 sows of that generation were used to create a base population. The population then went through eight non-overlapping generations. In each generation, 30 sires and 300 dams were randomly selected from alive animals (see below for how survival and death were simulated); each sire was mated randomly to 10 dams, and each dam produced one litter. Litter sizes were 10, 12, 14, 16 or 18 with probabilities 0.02, 0.14, 0.68, 0.14 and 0.02, respectively, and the sex ratio of piglets was 1:1. Data from generations 5 to 8 were used in the analysis.
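The mating design and litter-size distribution described above can be sketched in Python with NumPy. This is a minimal illustration, not QMSim itself: the helper name `simulate_generation` and the within-generation indexing of sires (0 to 29) and dams (0 to 299) are assumptions, and only the sampling of parents, litter sizes and sex is shown; genomes and liabilities are not simulated here.

```python
import numpy as np

# Litter-size distribution and mating design as stated in the text.
LITTER_SIZES = np.array([10, 12, 14, 16, 18])
LITTER_PROBS = np.array([0.02, 0.14, 0.68, 0.14, 0.02])
N_SIRES, DAMS_PER_SIRE = 30, 10  # 30 sires x 10 dams = 300 litters per generation


def simulate_generation(rng):
    """Return (sire_id, dam_id, is_male) for every piglet in one generation.

    Hypothetical helper: sires are indexed 0..29 and dams 0..299 within the
    generation; each dam is mated to exactly one sire and has one litter.
    """
    n_dams = N_SIRES * DAMS_PER_SIRE
    # Random mating: shuffle sire labels so that each sire gets exactly 10 dams.
    dam_to_sire = rng.permutation(np.repeat(np.arange(N_SIRES), DAMS_PER_SIRE))
    # One litter per dam, with size drawn from the stated distribution.
    litter_sizes = rng.choice(LITTER_SIZES, size=n_dams, p=LITTER_PROBS)
    dam_id = np.repeat(np.arange(n_dams), litter_sizes)
    sire_id = dam_to_sire[dam_id]
    # 1:1 sex ratio.
    is_male = rng.random(dam_id.size) < 0.5
    return sire_id, dam_id, is_male


rng = np.random.default_rng(2023)
sire_id, dam_id, is_male = simulate_generation(rng)
# Expected litter size is sum(size x probability), i.e. about 14 piglets.
expected_litter = float(LITTER_SIZES @ LITTER_PROBS)
print(f"expected litter size {expected_litter:.2f}, piglets born {dam_id.size}")
```

With 300 litters of about 14 piglets each, a generation contains roughly 4200 piglets before survival is scored.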
The phenotypic liability of an individual to be alive was generated as the sum of the direct additive genetic effect of the individual, the maternal additive genetic effect of its dam, a litter effect and a random residual. Fixed effects (such as herd-year-month) were not considered. Three survival traits with different variances and covariances were simulated, with direct and maternal heritabilities of 0.04 and 0.04 (T_{4/4}), 0.02 and 0.04 (T_{2/4}), or 0.02 and 0.02 (T_{2/2}), respectively. The genetic correlation between direct and maternal additive genetic effects was 0.30. The variance of the litter effect was equal to the maternal additive genetic variance. The direct and maternal QTL allele effects were sampled from a bivariate normal distribution with the specified correlation. The true breeding values (TBVs) for direct and maternal additive genetic effects were defined as the sums of the QTL allele effects, and these TBVs were scaled to have the designed variances [22]. The other random effects were sampled from normal distributions with the corresponding variances. On the observed scale, the phenotype was scored as 1 if the liability to survive was in the top 80% and 0 otherwise, i.e., the mortality rate was 20%. Each of the three traits was simulated with 40 replicates.
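The variance structure and the 80/20 scoring rule above can be sketched as follows. This is an illustration under explicit assumptions, not the authors' simulation code: we assume heritabilities are relative to a phenotypic liability variance of 1 that includes the direct-maternal covariance term (an animal's direct effect and its dam's maternal effect covary by half of the genetic covariance, which contributes one full covariance to the phenotypic variance), and all function names are hypothetical.

```python
import numpy as np

R_AM = 0.30       # direct-maternal genetic correlation (from the text)
MORTALITY = 0.20  # bottom 20% of liabilities are scored as dead


def variance_components(h2_direct, h2_maternal, var_p=1.0):
    """Liability-scale variances for one simulated trait.

    Assumption for illustration: var_p = 1 and it includes cov_am.
    """
    var_a = h2_direct * var_p
    var_m = h2_maternal * var_p
    cov_am = R_AM * np.sqrt(var_a * var_m)
    var_l = var_m  # litter variance equals maternal additive variance
    var_e = var_p - var_a - var_m - cov_am - var_l
    return {"a": var_a, "m": var_m, "am": cov_am, "l": var_l, "e": var_e}


def sample_qtl_effects(n_qtl, rng):
    """Direct and maternal QTL effects from a bivariate normal with correlation R_AM."""
    cov = np.array([[1.0, R_AM], [R_AM, 1.0]])
    return rng.multivariate_normal(np.zeros(2), cov, size=n_qtl)


def scale_tbv(tbv, target_var):
    """Rescale summed QTL effects so that their variance equals target_var."""
    return tbv * np.sqrt(target_var / tbv.var())


def score_survival(liability):
    """Observed phenotype: 1 (alive) if liability is in the top 80%, else 0 (dead)."""
    threshold = np.quantile(liability, MORTALITY)
    return (liability > threshold).astype(int)


for name, h2a, h2m in [("T_4/4", 0.04, 0.04), ("T_2/4", 0.02, 0.04), ("T_2/2", 0.02, 0.02)]:
    print(name, {k: round(v, 4) for k, v in variance_components(h2a, h2m).items()})

rng = np.random.default_rng(1)
# Simplification: independent N(0, 1) liabilities that ignore pedigree
# structure, used only to demonstrate the thresholding step.
y = score_survival(rng.normal(size=10_000))
print("observed survival rate:", y.mean())
```

Under these assumptions, trait T_{4/4} has a residual liability variance of 0.868, and thresholding at the 20th percentile leaves exactly 80% of animals scored as alive.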
Four genotyping scenarios were studied: (1) all pigs were genotyped (G_all); (2) 80% of pigs randomly selected from the whole population were genotyped (G80_ran); (3) only alive pigs (80%) were genotyped (G_alive); (4) no pig was genotyped (G_none).
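The four genotyping scenarios can be expressed as boolean masks over animals. A minimal sketch follows; the helper name `genotype_masks` is hypothetical. It highlights the key contrast: G80_ran and G_alive genotype the same number of animals, but only G80_ran can include dead pigs.

```python
import numpy as np


def genotype_masks(alive, rng, frac=0.8):
    """Boolean masks of genotyped animals for the four scenarios.

    alive: 0/1 survival indicators (1 = alive). frac is the share genotyped in
    G80_ran; 0.8 matches the 80% survival rate, so G80_ran and G_alive
    genotype the same number of animals. Hypothetical helper.
    """
    n = alive.size
    g80 = np.zeros(n, dtype=bool)
    # Random sample regardless of survival status: dead pigs can be included.
    g80[rng.choice(n, size=int(round(frac * n)), replace=False)] = True
    return {
        "G_all": np.ones(n, dtype=bool),
        "G80_ran": g80,
        "G_alive": alive.astype(bool),
        "G_none": np.zeros(n, dtype=bool),
    }


rng = np.random.default_rng(7)
alive = np.ones(10_000, dtype=int)
alive[rng.choice(10_000, size=2_000, replace=False)] = 0  # exactly 20% dead
masks = genotype_masks(alive, rng)
for name, m in masks.items():
    n_dead_genotyped = int(np.sum(m & (alive == 0)))
    print(f"{name}: {m.sum()} genotyped, {n_dead_genotyped} of them dead")
```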
Statistical analysis
A linear, a logit and a probit model (i.e., a liability threshold model) were used for estimation of genetic parameters and breeding values. The models were as follows:
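The linear model (LM) is
\[ \mathbf{y} = \mathbf{1}\mu + \mathbf{W}_{l}\mathbf{l} + \mathbf{Z}_{a}\mathbf{a} + \mathbf{Z}_{m}\mathbf{m} + \mathbf{e}, \]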
where y is the vector of binary observations of pig survival, with 0 and 1 representing dead and alive, respectively; µ is the overall mean; 1 is a vector of ones; l is the vector of litter effects; a is the vector of direct additive genetic effects; m is the vector of maternal additive genetic effects; and e is the vector of residual effects. W_{l}, Z_{a} and Z_{m} are incidence matrices associating l, a and m with y. In the model, direct and maternal additive genetic effects are correlated, and all other effects are independent of each other. Thus, l, e, a and m are assumed to have the following distributions: \({\varvec{l}} \sim N\left(0,{\varvec{I}}{\sigma }_{l}^{2}\right)\), \({\varvec{e}} \sim N\left(0,\mathbf{I}{\sigma }_{e}^{2}\right)\), \(\left[\begin{array}{c}{\varvec{a}}\\ {\varvec{m}}\end{array}\right]\sim N\left(0,\left[\begin{array}{cc}{\sigma }_{a}^{2} & {\sigma }_{am}\\ {\sigma }_{am} & {\sigma }_{m}^{2}\end{array}\right]\otimes {\varvec{K}}\right)\), where \({\sigma }_{l}^{2}\), \({\sigma }_{e}^{2}\), \({\sigma }_{a}^{2}\), \({\sigma }_{m}^{2}\) and \({\sigma }_{am}\) are the litter variance, residual variance, direct additive genetic variance, maternal additive genetic variance, and covariance between direct and maternal additive genetic effects, respectively, and K is an additive genetic relationship matrix based on pedigree and/or genomic information. When using the pedigree-based method for the scenario without genotyping, K was constructed from pedigree information [23]. When using the single-step GBLUP model (ssGBLUP), K represents the H matrix constructed from pedigree and genomic information [24]. The H matrix is as follows:
\[ \mathbf{H} = \left[\begin{array}{cc} \mathbf{G}_{\omega} & \mathbf{G}_{\omega}\mathbf{A}_{11}^{-1}\mathbf{A}_{12} \\ \mathbf{A}_{21}\mathbf{A}_{11}^{-1}\mathbf{G}_{\omega} & \mathbf{A}_{22} + \mathbf{A}_{21}\mathbf{A}_{11}^{-1}\left(\mathbf{G}_{\omega} - \mathbf{A}_{11}\right)\mathbf{A}_{11}^{-1}\mathbf{A}_{12} \end{array}\right], \]
where A_{11} and A_{22} are the submatrices of the pedigree-based relationship matrix (A) for relationships among genotyped individuals and among non-genotyped individuals, respectively; A_{12} and A_{21} are the submatrices for relationships between genotyped and non-genotyped individuals; and \({{\varvec{G}}}_{{\varvec{\omega}}}=\left(1-\omega \right){{\varvec{G}}}^{\boldsymbol{*}}+\omega {{\varvec{A}}}_{11}\). In this study, ω was set to 0.2. G is the marker-based genomic relationship matrix [25], and G* is the adjusted G, calculated as described by Christensen et al. [8].
In the scenario where all animals are genotyped, \({\varvec{K}}={{\varvec{G}}}_{{\varvec{\omega}}}\).
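The relationship matrices above can be sketched in Python with NumPy. This is a minimal illustration under stated assumptions, not the DMU implementation: A is built with the standard tabular method (we do not know which algorithm [23] uses), G follows VanRaden's first method as an assumed instance of [25], and the G* adjustment of [8] is skipped, so G is passed in directly as `G_star`. Following the notation above, genotyped animals form block 1. The script also checks the well-known closed form of H^{-1} that makes ssGBLUP practical.

```python
import numpy as np


def pedigree_A(sire, dam):
    """Numerator relationship matrix by the tabular method.

    sire, dam: 0-based parent indices, animals ordered so that parents precede
    offspring; -1 marks an unknown parent.
    """
    n = len(sire)
    A = np.zeros((n, n))
    for i in range(n):
        s, d = sire[i], dam[i]
        # Diagonal: 1 + inbreeding coefficient (half the parents' relationship).
        A[i, i] = 1.0 + (0.5 * A[s, d] if s >= 0 and d >= 0 else 0.0)
        for j in range(i):
            # Off-diagonal: average relationship of animal j with i's parents.
            a_ij = (0.5 * A[j, s] if s >= 0 else 0.0) + (0.5 * A[j, d] if d >= 0 else 0.0)
            A[i, j] = A[j, i] = a_ij
    return A


def vanraden_G(M):
    """Genomic relationship matrix, VanRaden's first method (assumed for [25]).

    M: (n_genotyped, n_markers) genotypes coded 0/1/2.
    """
    p = M.mean(axis=0) / 2.0             # allele frequencies from the genotyped animals
    Z = M - 2.0 * p                      # centred genotypes
    return Z @ Z.T / (2.0 * np.sum(p * (1.0 - p)))


def build_H(A, geno, G_star, omega=0.2):
    """Single-step H with genotyped animals first (block 1), as in the text.

    Returns H, the animal order used for its rows/columns, and G_omega.
    """
    idx1, idx2 = np.flatnonzero(geno), np.flatnonzero(~geno)
    A11, A12, A22 = A[np.ix_(idx1, idx1)], A[np.ix_(idx1, idx2)], A[np.ix_(idx2, idx2)]
    G_w = (1.0 - omega) * G_star + omega * A11
    P = np.linalg.solve(A11, A12)        # A11^{-1} A12
    H = np.block([[G_w, G_w @ P],
                  [P.T @ G_w, A22 + P.T @ (G_w - A11) @ P]])
    return H, np.concatenate([idx1, idx2]), G_w


# Toy pedigree: 0 and 1 are unrelated founders, 2 and 3 are their offspring
# (full sibs), and 4 is the inbred offspring of 2 x 3.
A = pedigree_A(sire=[-1, -1, 0, 0, 2], dam=[-1, -1, 1, 1, 3])
geno = np.array([False, False, True, True, True])  # only animals 2-4 genotyped
rng = np.random.default_rng(0)
G = vanraden_G(rng.integers(0, 3, size=(3, 500)).astype(float))
H, order, G_w = build_H(A, geno, G)

# Closed-form inverse used in ssGBLUP (genotyped block first):
# H^{-1} = A^{-1} + [[G_w^{-1} - A11^{-1}, 0], [0, 0]]
A_ord = A[np.ix_(order, order)]
H_inv = np.linalg.inv(A_ord)
H_inv[:3, :3] += np.linalg.inv(G_w) - np.linalg.inv(A_ord[:3, :3])
print(np.allclose(np.linalg.inv(H), H_inv))  # True
```

Blending with ω A_{11} keeps G_ω invertible even when G itself is singular, as it is here with three animals and centred genotypes.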
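The logit model and the probit model (the latter also called a liability threshold model) are described as
\[ \boldsymbol{\eta} = \mathbf{1}\mu + \mathbf{W}_{l}\mathbf{l} + \mathbf{Z}_{a}\mathbf{a} + \mathbf{Z}_{m}\mathbf{m}, \]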
For the logit model (LG), η is the vector of log-odds of expected pig survival, \({\eta }_{i}={\mathrm{log}}_{\mathrm{e}}\frac{{\upsilon }_{i}}{1-{\upsilon }_{i}}\), where υ_{i} is the expected value of y_{i}. For the probit model (PM), η is the vector of expected liabilities, \({\eta }_{i}={\Phi }^{-1}\left({\upsilon }_{i}\right)\), where \({\Phi }^{-1}\left(\cdot\right)\) is the inverse cumulative standard normal distribution function. The vectors µ, l, a and m and the matrices W_{l}, Z_{a} and Z_{m} are defined as in the linear model.
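The two link functions can be illustrated with the Python standard library. This sketch only shows the transformations between expected survival υ_i and η_i; the function names are ours. It also prints the latent residual variance implied by each link (π²/3 for the logistic distribution, fixed at 1 for the probit), a standard result that explains why logit-scale and probit-scale estimates are not directly comparable.

```python
import math
from statistics import NormalDist


def logit(p):
    """LG link: log-odds of the expected survival probability p."""
    return math.log(p / (1.0 - p))


def inv_logit(eta):
    """Inverse LG link: probability from a log-odds value."""
    return 1.0 / (1.0 + math.exp(-eta))


def probit(p):
    """PM link: inverse standard normal CDF, Phi^{-1}(p)."""
    return NormalDist().inv_cdf(p)


# Linear-predictor value corresponding to an 80% survival probability.
print(round(logit(0.8), 4), round(probit(0.8), 4))  # 1.3863 0.8416
# Latent residual variance implied by each link: logistic pi^2/3, probit 1.
print(round(math.pi ** 2 / 3, 4), 1.0)  # 3.2899 1.0
```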
The variance components were estimated using the AI-REML method [26]. Because the AI-REML procedure did not converge for some ssGBLUP models, variance components estimated from pedigree-based models were used to estimate breeding values in all models. Variance components and breeding values were estimated using the DMU software [27].
Validation of genomic predictions
To validate genomic predictions, generations 5 to 7 were used as the reference population and generation 8 as the validation population. Genomic predictions were evaluated using the following criteria: 1) the correlation between the estimated breeding value (EBV) and the true breeding value (TBV, i.e., a, m or a + m on the liability scale in the simulation), to assess the accuracy of genomic prediction; 2) the average TBV of the top 1% and top 30% of all individuals ranked on EBV, to assess the realized selection differential, where 1% can be considered the proportion selected for boars and 30% for sows; 3) the regression of EBV from the whole data, with genotypes of all animals, on EBV from the reference data for each genotyping scenario, similar to the study of Legarra and Reverter [28], to evaluate the dispersion bias of a particular model and genotyping scenario. Note that dispersion bias was assessed against EBV from the full data rather than against TBV. The reason was that TBV in the simulation was on the liability scale, whereas EBV from the linear model was on the observed scale and EBV from the logit model on the logit scale. Even for the probit model, the scale of EBV differed from that of the simulated TBV, because the residual variance is restricted to 1 in the probit model. Thus, the expected regression of TBV on EBV would not equal one even for unbiased predictions. A paired t-test was used to test the differences between accuracies of EBV from the four genotyping strategies and from the three models.
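The three validation criteria and the paired t-test can be sketched as small NumPy functions. The function names are hypothetical and the toy EBVs below are random numbers standing in for DMU output, not results from this study. The paired t statistic is computed by hand; `scipy.stats.ttest_rel` would return the same statistic.

```python
import numpy as np


def accuracy(ebv, tbv):
    """Criterion 1: Pearson correlation between EBV and TBV."""
    return float(np.corrcoef(ebv, tbv)[0, 1])


def mean_tbv_top(ebv, tbv, fraction):
    """Criterion 2: mean TBV of the top `fraction` of animals ranked on EBV."""
    k = max(1, int(round(fraction * len(ebv))))
    top = np.argsort(ebv)[::-1][:k]      # indices of the k highest EBVs
    return float(np.mean(tbv[top]))


def dispersion_slope(ebv_whole, ebv_partial):
    """Criterion 3: OLS slope of whole-data EBV on reference-data EBV (1 = no dispersion bias)."""
    x = ebv_partial - ebv_partial.mean()
    y = ebv_whole - ebv_whole.mean()
    return float(np.sum(x * y) / np.sum(x * x))


def paired_t(acc_x, acc_y):
    """Paired t statistic for accuracies of two methods over the same replicates."""
    d = np.asarray(acc_x) - np.asarray(acc_y)
    return float(d.mean() / (d.std(ddof=1) / np.sqrt(d.size)))


rng = np.random.default_rng(11)
tbv = rng.normal(size=2000)
ebv_partial = 0.5 * tbv + rng.normal(scale=0.5, size=2000)  # toy EBV, not real output
ebv_whole = ebv_partial + rng.normal(scale=0.2, size=2000)
print(accuracy(ebv_partial, tbv), mean_tbv_top(ebv_partial, tbv, 0.01),
      mean_tbv_top(ebv_partial, tbv, 0.30), dispersion_slope(ebv_whole, ebv_partial))
```

Because the slope compares two sets of EBVs on the same scale, it stays interpretable even when TBV and EBV are on different scales, which is the reason given above for not regressing TBV on EBV.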
Results
The variance components estimated from the model with the pedigree-based relationship matrix were used to estimate breeding values. Heritabilities estimated using pedigree information are shown in Table 1. Proportions of variances and heritabilities differed among the three models because they are on different scales. For traits T_{4/4} and T_{2/2}, when using the logit and probit models, the estimated direct heritability ranged from 0.011 to 0.22 and was lower than the estimated maternal heritability, which ranged from 0.019 to 0.039. This was unexpected, since direct and maternal heritabilities were equal in the simulation for these two traits. For the three models, the estimated correlation coefficients between direct and maternal additive effects ranged from 0.286 to 0.523, with large standard errors.
Accuracies of EBV were measured as correlation coefficients between EBV and TBV. Accuracies of estimated direct (a), maternal (m) and total (a + m) breeding values are shown in Table 2. Depending on the genotyping scenario, models using genomic information achieved up to 59.2% higher accuracy of estimated breeding values than models using pedigree information. Accuracies of EBV for a from the three models using only the pedigree-based relationship matrix (scenario G_none) ranged from 0.287 to 0.288 for trait T_{4/4}, 0.242 to 0.245 for T_{2/4} and 0.224 to 0.226 for T_{2/2}. When using genomic data, across the three scenarios (G_all, G80_ran, G_alive), the accuracies ranged from 0.375 to 0.459 for T_{4/4}, 0.293 to 0.352 for T_{2/4} and 0.286 to 0.340 for T_{2/2}. Accuracies of EBV for the maternal effect (m) using only the pedigree-based relationship matrix ranged from 0.247 to 0.251 for trait T_{4/4}, 0.264 to 0.270 for T_{2/4} and 0.196 to 0.197 for T_{2/2}. When using genomic data, across all scenarios, the accuracies for the maternal effect ranged from 0.385 to 0.409 for T_{4/4}, 0.397 to 0.418 for T_{2/4} and 0.310 to 0.325 for T_{2/2}. Accuracies of EBV for the total genetic effect (a + m) from pedigree-based models without genomic information ranged from 0.314 to 0.315 for trait T_{4/4} and 0.310 to 0.311 for T_{2/4}, and were 0.249 for T_{2/2}. Across all scenarios with genomic data, the accuracies ranged from 0.447 to 0.500 for T_{4/4}, 0.428 to 0.458 for T_{2/4} and 0.359 to 0.391 for T_{2/2}.
As expected, for all three types of EBV (a, m and a + m), the scenario in which all individuals, including dead individuals, were genotyped (G_all) had the highest accuracy. The composition of genotyped individuals affected the accuracies of EBV for a and a + m, but not for m. In scenario G_alive, the accuracies of EBV for a were 0.375 to 0.378 for trait T_{4/4}, 0.293 to 0.299 for T_{2/4} and 0.286 to 0.288 for T_{2/2}. With the same number of genotyped pigs, the accuracies in G80_ran were higher than those in G_alive by 12.70% to 13.76% for trait T_{4/4}, 10.92% to 12.20% for T_{2/4} and 10.14% to 11.46% for T_{2/2}. The trend of accuracies for a + m was the same as that for a: the accuracies of EBV for a + m in G_alive were 0.447 to 0.449 for trait T_{4/4}, 0.428 to 0.429 for T_{2/4} and 0.359 to 0.360 for T_{2/2}, and the accuracies in G80_ran were higher than those in G_alive by 5.35% to 6.04% for T_{4/4}, 2.56% to 2.57% for T_{2/4} and 3.06% to 3.34% for T_{2/2}. However, the trend of accuracies for m differed from those for a and a + m with respect to the composition of genotyped individuals. The accuracies of EBV for m in G80_ran were similar to those in G_alive, with differences of less than 0.01 for the three traits (P < 0.05).
As shown in Table 2, accuracies of the linear model were very similar to those of the logit and probit models for all three types of EBV, with differences of less than 0.01 for the three traits. Differences in accuracy for a ranged from 0 to 0.008 for trait T_{4/4}, 0 to 0.008 for T_{2/4} and 0 to 0.007 for T_{2/2}. Differences in accuracy for m ranged from 0 to 0.008 for T_{4/4}, 0.001 to 0.006 for T_{2/4} and 0 to 0.001 for T_{2/2}. Differences in accuracy for a + m ranged from 0 to 0.002 for T_{4/4}, 0 to 0.001 for T_{2/4} and 0 to 0.001 for T_{2/2}.
In scenarios G80_ran and G_alive, 20% of animals did not have genotype data. Additional file 1: Table S1 shows that the accuracies for genotyped individuals were higher than those for non-genotyped pigs. The differences in accuracy for a ranged from 0.077 to 0.093 for trait T_{4/4}, 0.037 to 0.046 for T_{2/4} and 0.061 to 0.072 for T_{2/2}. The differences for m ranged from 0.058 to 0.090 for T_{4/4}, 0.053 to 0.074 for T_{2/4} and 0.058 to 0.087 for T_{2/2}. The differences for the total EBV ranged from 0.094 to 0.109 for T_{4/4}, 0.068 to 0.086 for T_{2/4} and 0.079 to 0.094 for T_{2/2}. In addition, the accuracies of all three types of EBV for non-genotyped animals (Additional file 1: Table S1) were higher than those for animals in the scenario without any genotype information (Table 2, G_none).
The regression coefficients of EBV from the whole data, with all animals genotyped, on EBV from the different reference data are presented in Table 3. The regression coefficients of direct EBV ranged from 1.046 to 1.132 for T_{4/4}, 1.001 to 1.126 for T_{2/4} and 0.944 to 1.019 for T_{2/2}. The regression coefficients of maternal (m) EBV ranged from 0.895 to 0.938 for T_{4/4}, 1.057 to 1.085 for T_{2/4} and 1.000 to 1.043 for T_{2/2}. The regression coefficients of total EBV (a + m) ranged from 0.974 to 1.026 for T_{4/4}, 1.082 to 1.122 for T_{2/4} and 0.960 to 1.013 for T_{2/2}. Regression coefficients around 1 indicate that the dispersion of predictions was unbiased with respect to the use of the different reference data. The regression coefficients for validation individuals with or without genotypes are presented in Additional file 1: Table S2. The regression coefficients of genotyped individuals were similar to those of non-genotyped individuals for all three traits.
Table 4 shows the mean total TBV of the top 1% of individuals with the highest total EBV. The higher the accuracy of EBV for a + m (Table 2), the higher the TBV. For trait T_{4/4}, the scenario with all individuals genotyped obtained the highest TBV for a + m (4.498 to 4.553), followed by scenario G80_ran (4.297 to 4.346), then scenario G_alive (4.221 to 4.308), and the lowest was scenario G_none (2.583 to 2.712). The order of TBV for a + m among the four scenarios was the same for the other two traits, T_{2/4} and T_{2/2}. The order of TBV for a was the same as that for a + m, but not for m: the order of TBV for m between scenarios G80_ran and G_alive was reversed, with G_alive higher than G80_ran for T_{4/4} and T_{2/2}. When using genomic data, TBVs for a from the linear model were higher than those from the logit and probit models. However, for pedigree-based models without genomic information, TBVs for a from the linear model were lower than those from the logit and probit models. With or without genomic information, TBVs for the maternal effect (m) from the linear model were lower than those from the logit and probit models for all traits.
Table 5 shows the mean total TBV of the top 30% of individuals with the highest total EBV. For all traits, the ranking of the four scenarios for total TBV of the top 30% of individuals was consistent with that for the top 1%: scenario G_all obtained the highest TBV, followed by scenario G80_ran, then scenario G_alive, and the lowest was scenario G_none. In all four scenarios, the linear model outperformed the logit and probit models for a, but not for m.
Discussion
In this study, we compared four genotyping strategies and three prediction models for predicting breeding values for three pig survival traits with different direct and maternal heritabilities. When using variance components estimated from a pedigree-based model, genomic predictions were unbiased with respect to the dispersion of predictions, even in the scenario with genotypes only from alive animals. Randomly genotyping individuals led to higher prediction accuracy than genotyping only alive individuals, given the same number of genotyped animals. The linear model achieved genomic prediction ability similar to that of the logit and probit models.
In the current study, variance components were estimated from a pedigree-based model, and these estimates were used for predicting breeding values in all genotyping scenarios. It has been reported that when selection is based on genomic information, genetic parameters estimated without this information can be biased [29]. Similarly, when selection is based on pedigree information, genetic parameters estimated using the ssGBLUP model can also be biased [30]. However, the impact of selection on variance component estimates was not an issue in the current study, because parents in the simulated population were selected at random. On the other hand, the current study involved the issue of selective genotyping. In a pig breeding program, dead animals are usually not genotyped, which may lead to biased estimation of variance components and genomic prediction when a genomic model is used for parameter estimation. We carried out an additional simulation study using models with genomic data and found that parameter estimation using the ssGBLUP model with genotypes only from alive animals severely overestimated the additive genetic variance and led to a residual variance close to zero (Additional file 1: Table S3). Similarly, Wang et al. [31] reported that selective genotyping led to severe overestimation of additive genetic variance using a ssGBLUP model. Because of convergence problems and biased estimation of variance components in some scenarios, variances estimated from pedigree-based models were used for predicting breeding values in the current study.
Because the estimates from the three models are on different scales, they cannot be compared directly. After transforming observed-scale heritability to liability-scale heritability [32], the liability-scale heritabilities estimated from the linear model were consistent with those used in simulating the data. However, the logit and probit models underestimated direct heritabilities and overestimated the correlation between direct and maternal additive genetic effects. A possible reason is that including the maternal additive genetic effect increases model complexity, making it difficult to distinguish direct from maternal additive genetic effects, as reflected by the large standard errors of the estimated correlation between them in this study. The logit and probit animal models could be more sensitive to model complexity than the linear animal model. This could also explain why the logit and probit models did not predict better than the linear model in the current study, even though they are theoretically more appropriate.
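The observed-to-liability transformation mentioned above is commonly done with the Dempster-Lerner formula, h²_l = h²_o p(1 − p)/z², where p is the incidence and z is the standard normal density at the threshold. We assume this is the form meant by [32]; the sketch below, with a hypothetical function name, shows the scaling factor for the 20% mortality simulated here.

```python
from statistics import NormalDist


def observed_to_liability_h2(h2_obs, incidence):
    """Observed-to-liability heritability (Dempster-Lerner form, assumed for [32]).

    incidence: proportion of animals in one category (here 0.20 dead).
    """
    nd = NormalDist()
    t = nd.inv_cdf(1.0 - incidence)   # threshold on the standard-normal liability scale
    z = nd.pdf(t)                     # density at the threshold
    return h2_obs * incidence * (1.0 - incidence) / z ** 2


factor = observed_to_liability_h2(1.0, 0.20)
# Roughly 2.04: liability-scale h2 is about twice the observed-scale h2 at 20% mortality.
print(round(factor, 3))
```

The factor is the same whether the incidence is taken as 20% mortality or 80% survival, because p(1 − p) and z are symmetric around the median.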
In this study, we compared the accuracies of total EBV from four genotyping strategies for three traits. Accuracies of total EBV from the three strategies using genomic information exceeded those using only pedigree information, and within the same strategy the accuracies for genotyped individuals were higher than those for non-genotyped individuals. Furthermore, because non-genotyped animals benefit from the genomic information of other animals, the accuracies for non-genotyped individuals in scenarios G80_ran and G_alive were higher than those for individuals in scenario G_none. These results are consistent with a previous study of piglet mortality using a ssGBLUP method in Danish Landrace and Yorkshire pigs [15]. Among the three strategies using genomic information, the accuracy of total EBV when genotyping all individuals in the reference population was superior to that when genotyping only some individuals, which is consistent with theoretical expectations [33]. However, with the same number of genotyped individuals, genotyping both alive and dead pigs gave higher accuracy than genotyping only alive pigs, indicating that the genotypes of dead pigs have an important influence on the accuracy of genomic prediction. Therefore, genotyping dead animals could be a good strategy. In the current study, genetic values were generated from 730 QTLs whose direct and maternal additive genetic effects followed a bivariate normal distribution, since previous studies [34] have revealed that pig mortality is a complex trait with a polygenic genetic architecture. If pig mortality were controlled by a small number of genes, the frequencies of unfavorable alleles would differ greatly between dead and alive animals, implying a greater need to genotype dead animals for genomic prediction of pig mortality. A study based on real pig mortality data would be of great importance; however, genotype data of dead pigs are currently not available in pig breeding programs.
As expected, traits with higher heritability had higher prediction accuracy. Further, for traits T_{4/4} and T_{2/2}, which have the same heritability for direct and maternal additive genetic effects, accuracies of direct EBV (a) were higher than those of maternal EBV (m) in scenarios G_all, G80_ran and G_none, indicating that the maternal genetic effect is in general more difficult to estimate (Table 2). However, in scenario G_alive, accuracies of maternal EBV were higher than those of direct EBV and similar to those in scenario G80_ran, suggesting that selectively genotyping alive animals has a small impact on prediction accuracy for the maternal additive genetic effect but a large impact on prediction of the direct additive genetic effect.
We compared the accuracy of genomic prediction of a linear model, a logit model and a probit model for survival in pigs. Using pedigree information, accuracies of total EBV were very similar among the three models, with differences of less than 1% for all traits (T_{4/4}, T_{2/4} and T_{2/2}). Previous studies have shown that linear, logit and probit models have similar predictive abilities for threshold traits [19, 20, 36]. In a simulation study, Carlén et al. [36] showed that the predictive abilities of linear and threshold models were very similar for mastitis defined as a binary trait in dairy cattle. Koeck et al. [19] evaluated the performance of a linear, a logit and a probit model for genetic analyses of clinical mastitis in Austrian Fleckvieh dual-purpose cows and found very small differences in predictive ability among the three models. In a Norwegian Red cow population, Vazquez et al. [20] observed similar results when comparing the predictive ability of threshold and linear models for clinical mastitis. Using genomic information, accuracies of total EBV were higher than those using only pedigree information, but, as with pedigree-based prediction, accuracies were very similar among the linear, logit and threshold models for all three traits in the current study. Although the logit and probit models were hypothesized to be more suitable for threshold traits, the results indicate that the linear, logit and probit models have similar predictive power in genomic prediction for survival traits.
Conclusions
In this study, three survival traits with different heritabilities were simulated to explore the impact of genotyping strategies and statistical models on genomic prediction. The results showed that genomic predictions with genotypes only from alive animals were unbiased when using variance components estimated from a pedigree-based model. Randomly genotyping individuals gave higher accuracy than genotyping only alive individuals, given the same number of genotyped individuals. The predictive power of the linear, logit and probit models was similar. We conclude that the genomic information of dead individuals is very useful and that the linear model is a good choice for genomic prediction of survival in pigs. We recommend using variances estimated from a pedigree-based model for genomic prediction in the case of selective genotyping.
Availability of data and materials
The datasets analyzed during the current study are available from the corresponding author on reasonable request.
Abbreviations
EBV: Estimated breeding value
GBLUP: Genomic best linear unbiased prediction
GEBV: Genomic estimated breeding value
LG: Logit model
LM: Linear model
PM: Probit model
QTL: Quantitative trait locus
ssGBLUP: Single-step GBLUP model
TBV: True breeding value
References
Knauer MT, Hostetler CE. US swine industry productivity analysis, 2005 to 2010. J Swine Health Prod. 2013;21(5):248–52.
Koketsu Y, Iida R, Piñeiro C. A 10-year trend in piglet pre-weaning mortality in breeding herds associated with sow herd size and number of piglets born alive. Porcine Health Management. 2021;7(1):4.
Schaeffer LR. Strategy for applying genome-wide selection in dairy cattle. J Anim Breed Genet. 2006;123(4):218–23.
Luan T, Woolliams JA, Lien S, Kent M, Svendsen M, Meuwissen THE. The accuracy of genomic selection in Norwegian Red cattle assessed by cross-validation. Genetics. 2009;183(3):1119–26.
VanRaden PM, Van Tassell CP, Wiggans GR, Sonstegard TS, Schnabel RD, Taylor JF, et al. Invited review: Reliability of genomic predictions for North American Holstein bulls. J Dairy Sci. 2009;92(1):16–24.
Lillehammer M, Meuwissen THE, Sonesson AK. Genomic selection for maternal traits in pigs. J Anim Sci. 2011;89(12):3908–16.
Ostersen T, Christensen O, Henryon M, Nielsen B, Su G, Madsen P. Deregressed EBV as the response variable yield more reliable genomic predictions than traditional EBV in purebred pigs. Genet Sel Evol. 2011;43(1):38.
Christensen OF, Madsen P, Nielsen B, Ostersen T, Su G. Single-step methods for genomic evaluation in pigs. Animal. 2012;6(10):1565–71.
Chen CY, Misztal I, Aguilar I, Tsuruta S, Meuwissen THE, Aggrey SE, et al. Genome-wide marker-assisted selection combining all pedigree phenotypic information with genotypic data in one step: An example using broiler chickens. J Anim Sci. 2011;89(1):23–8.
Wolc A, Arango J, Settar P, Fulton J, O’Sullivan N, Preisinger R, et al. Persistence of accuracy of genomic estimated breeding values over generations in layer chickens. Genet Sel Evol. 2011;43(1):23.
Liu T, Qu H, Luo C, Shu D, Wang J, Lund M, et al. Accuracy of genomic prediction for growth and carcass traits in Chinese triple-yellow chickens. BMC Genet. 2014;15(1):110.
Su G, Guldbrandtsen B, Gregersen VR, Lund MS. Preliminary investigation on reliability of genomic estimated breeding values in the Danish Holstein population. J Dairy Sci. 2010;93(3):1175–83.
Meuwissen THE, Hayes BJ, Goddard ME. Prediction of total genetic value using genome-wide dense marker maps. Genetics. 2001;157(4):1819–29.
Knol EF, Nielsen B, Knap PW. Genomic selection in commercial pig breeding. Anim Front. 2016;6(1):15–22.
Guo X, Christensen OF, Ostersen T, Wang Y, Lund MS, Su G. Improving genetic evaluation of litter size and piglet mortality for both genotyped and nongenotyped individuals using a single-step method. J Anim Sci. 2015;93(2):503–12.
Leite NG, Knol EF, Garcia ALS, Lopes MS, Zak L, Tsuruta S, et al. Investigating pig survival in different production phases using genomic models. J Anim Sci. 2021;99(8):skab217.
Su G, Sorensen D, Lund MS. Variance and covariance components for liability of piglet survival during different periods. Animal. 2008;2(2):184–9.
Gianola D, Foulley JL. Sire evaluation for ordered categorical data with a threshold model. Genet Sel Evol. 1983;15(2):201–24.
Koeck A, Heringstad B, Egger-Danner C, Fuerst C, Fuerst-Waltl B. Comparison of different models for genetic analysis of clinical mastitis in Austrian Fleckvieh dual-purpose cows. J Dairy Sci. 2010;93(9):4351–8.
Vazquez AI, Perez-Cabal MA, Heringstad B, Rodrigues-Motta M, Rosa GJM, Gianola D, et al. Predictive ability of alternative models for genetic analysis of clinical mastitis. J Anim Breed Genet. 2012;129(2):120–8.
Sargolzaei M, Schenkel FS. QMSim: A large-scale genome simulator for livestock. Bioinformatics. 2009;25(5):680–1.
Ma X, Christensen OF, Gao H, Huang R, Nielsen B, Madsen P, et al. Prediction of breeding values for group-recorded traits including genomic information and an individually recorded correlated trait. Heredity. 2021;126(1):206–17.
Henderson CR. Best linear unbiased estimation and prediction under a selection model. Biometrics. 1975;31(2):423–47.
Christensen OF, Lund MS. Genomic prediction when some animals are not genotyped. Genet Sel Evol. 2010;42(1):2.
VanRaden PM. Efficient methods to compute genomic predictions. J Dairy Sci. 2008;91(11):4414–23.
Jensen J, Mäntysaari EA, Madsen P, Thompson R. Residual maximum likelihood estimation of (co)variance components in multivariate mixed linear models using average information. J Indian Soc Agric Stat. 1997;49:215–36.
Madsen P, Su G, Labouriau R, Christensen OF. DMU - a package for analyzing multivariate mixed models. In: 9th World Congress on Genetics Applied to Livestock Production. Leipzig, Germany; 2010. paper 732.
Legarra A, Reverter A. Semi-parametric estimates of population accuracy and bias of predictions of breeding values and future phenotypes using the LR method. Genet Sel Evol. 2018;50(1):53.
Hidalgo J, Tsuruta S, Lourenco D, Masuda Y, Huang Y, Gray KA, et al. Changes in genetic parameters for fitness and growth traits in pigs under genomic selection. J Anim Sci. 2020;98(2):skaa032.
Gao H, Madsen P, Aamand GP, Thomasen JR, Sorensen AC, Jensen J. Bias in estimates of variance components in populations undergoing genomic selection: A simulation study. BMC Genomics. 2019;20(1):956.
Wang L, Janss LL, Madsen P, Henshall J, Huang CH, Marois D, et al. Effect of genomic selection and genotyping strategy on estimation of variance components in animal models using different relationship matrices. Genet Sel Evol. 2020;52(1):31.
Dempster ER, Lerner IM. Heritability of threshold characters. Genetics. 1950;35(2):212–36.
Daetwyler HD, Villanueva B, Woolliams JA. Accuracy of predicting the genetic risk of disease using a genome-wide approach. PLoS ONE. 2008;3(10):e3395.
Guo X, Su G, Christensen OF, Janss L, Lund MS. Genome-wide association analyses using a Bayesian approach for litter size and piglet mortality in Danish Landrace and Yorkshire pigs. BMC Genomics. 2016;17:468.
Ding R, Qiu Y, Zhuang Z, Ruan D, Wu J, Zhou S, et al. Genome-wide association studies reveals polygenic genetic architecture of litter traits in Duroc pigs. Theriogenology. 2021;173:269–78.
Carlén E, Emanuelson U, Strandberg E. Genetic evaluation of mastitis in dairy cattle using linear models, threshold models, and survival analysis: A simulation study. J Dairy Sci. 2006;89(10):4049–57.
Acknowledgements
Not applicable.
Funding
This study was funded by the “Genetic improvement of pig survival” project of the Danish Pig Levy Foundation (Aarhus, Denmark). The China Scholarship Council (CSC) is acknowledged for providing a scholarship to the first author.
Author information
Contributions
GS and TL conceived and designed the study. TL simulated and analyzed the data. TL and GS wrote the manuscript. BN, OFC and MSL helped interpret the results and improve the manuscript. All authors read and approved the final manuscript.
Ethics declarations
Ethics approval and consent to participate
Not applicable.
Consent for publication
Not applicable.
Competing interests
The authors declare that they have no competing interests.
Supplementary Information
Additional file 1:
Table S1. Correlation coefficient between the EBV and true breeding values for validation individuals with or without genotypes. Table S2. Regression coefficient of the EBV from whole data on the EBV from reference data for validation individuals with or without genotypes. Table S3. Estimates of variances and heritability using a linear model without maternal additive genetic effect for the trait T_{4/4}.
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated in a credit line to the data.
About this article
Cite this article
Liu, T., Nielsen, B., Christensen, O.F. et al. The impact of genotyping strategies and statistical models on accuracy of genomic prediction for survival in pigs. J Animal Sci Biotechnol 14, 1 (2023). https://doi.org/10.1186/s40104-022-00800-5
DOI: https://doi.org/10.1186/s40104-022-00800-5
Keywords
 Genomic prediction
 Genotyping strategy
 Simulation
 Statistical models
 Survival