Estimates of genomic inbreeding and identification of candidate regions that differ between Chinese indigenous sheep breeds

A run of homozygosity (ROH) is a consecutive tract of homozygous genotypes in an individual that indicates it has inherited the same ancestral haplotype from both parents. Genomic inbreeding can be quantified based on ROH. Genomic regions enriched with ROH may be indicative of selection sweeps and are known as ROH islands. We carried out ROH analyses in five Chinese indigenous sheep breeds; Altay sheep (n = 50 individuals), Large-tailed Han sheep (n = 50), Hulun Buir sheep (n = 150), Short-tailed grassland sheep (n = 150), and Tibetan sheep (n = 50), using genotypes from an Ovine Infinium HD SNP BeadChip. A total of 18,288 ROH were identified. The average number of ROH per individual across the five sheep breeds ranged from 39 (Hulun Buir sheep) to 78 (Large-tailed Han sheep) and the average length of ROH ranged from 0.929 Mb (Hulun Buir sheep) to 2.544 Mb (Large-tailed Han sheep). The effective population size (Ne) of Altay sheep, Large-tailed Han sheep, Hulun Buir sheep, Short-tailed grassland sheep and Tibetan sheep were estimated to be 81, 78, 253, 238 and 70 five generations ago. The highest ROH-based inbreeding estimate (FROH) was 0.0808 in Large-tailed Han sheep, whereas the lowest FROH was 0.0148 in Hulun Buir sheep. Furthermore, the highest proportion of long ROH fragments (> 5 Mb) was observed in the Large-tailed Han sheep breed which indicated recent inbreeding. In total, 49 ROH islands (the top 0.1% of the SNPs most commonly observed in ROH) were identified in the five sheep breeds. Three ROH islands were common to all the five sheep breeds, and were located on OAR2: 12.2–12.3 Mb, OAR12: 78.4–79.1 Mb and OAR13: 53.0–53.6 Mb. Three breed-specific ROH islands were observed in Altay sheep (OAR15: 3.4–3.8 Mb), Large-tailed Han sheep (ORA17: 53.5–53.8 Mb) and Tibetan sheep (ORA5:19.8–20.2 Mb). Collectively, the ROH islands harbored 78 unique genes, including 19 genes that have been documented as having associations with tail types, adaptation, growth, body size, reproduction or immune response. Different ROH patterns were observed in five Chinese indigenous sheep breeds, which reflected their different population histories. Large-tailed Han sheep had the highest genomic inbreeding coefficients and the highest proportion of long ROH fragments indicating recent inbreeding. Candidate genes in ROH islands could be used to illustrate the genetic characteristics of these five sheep breeds. Our findings contribute to the understanding of genetic diversity and population demography, and help design and implement breeding and conservation strategies for Chinese sheep.


Introduction
Selection is one of the main forces reshaping the genomes of domestic animals. A genomic region subjected to intense selection would leave a footprint as a result of the selection and this is known as a selection signature. These signatures might demonstrate increased frequency of favorable allele(s), and reduced nucleotide diversity around the selected locus [1]. The reduction in genetic variation can be characterized as consecutive segments of homozygous genotypes (i.e. runs of homozygosity; ROH). In animal breeding, selection plays the vital role in achieving genetic gain. Mating among related animals cannot be avoided because only a small proportion of individuals are elite, and these tend to be used more widely than average animals. Inbreeding results in loss of genetic diversity, the emergence of harmful recessive mutations, as well as reducing productive performance, notably fecundity. All these effects impact the profitability and sustainability of livestock and poultry [2,3].
Traditionally, inbreeding was characterized in terms of the inbreeding coefficient, calculated according to pedigree records (F PED ) as proposed by Wright [4]. F PED reflects the probability that a pair of alleles is identical by descent (IBD), which is a statistical expectation [5]. Moreover, F PED is usually underestimated because it assumes the founder animals in the pedigree are unrelated in which case F PED ignores historical inbreeding prior to the founders [6][7][8]. The availability of genome sequencing data and high-density SNP genotypic data provides an opportunity to evaluate inbreeding at a molecular level. Inbreeding evaluation based on genomic information could reflect the true inbreeding value by directly identifying alleles at a locus that are IBD.
An individual inheriting the same haplotype from both parents exhibits ROH [9]. McQuillan et al. [10] first used ROH to compute the genomic inbreeding coefficient (F ROH ) in human. In animal genetics, ROH has been used to estimate whole-genome inbreeding at both the individual and population levels [11,12] and to detect selection signatures [13][14][15]. Forutan et al. [7] reported that based on simulation F ROH provides the closest estimates to true inbreeding [5]. Moreover, the correlations between F PED and F ROH were moderate to high (0.62-0.75) [16][17][18], so F ROH was considered as an alternative approach to evaluate the inbreeding degree of an individual [5], particularly when pedigree information is unavailable or might be unreliable.
The length of IBD segments follows an inverse exponential distribution with a mean of 1/2 t Morgans, where t represents the number of generations from a common ancestor [19]. Therefore, the length of ROH can be used to infer inbreeding history. Shorter ROH are the result of more ancient inbreeding, while longer ROH suggest more recent inbreeding [9]. Hence, the detection and characterization of ROH can provide insight into how population history, structure and demography evolved over time [9,17,20] and to characterize inbreeding levels [15,21]. Genomic regions enriched with ROH tend to generate ROH hotspots, which are also called ROH islands [9]. ROH islands could be used to positionally identify genes under natural or artificial selection in past adaptation and breeding processes [14,15,17,22].
The aim of this study was to investigate the occurrence and distribution of ROH in five Chinese indigenous sheep breeds using genotypes assayed from the Ovine Infinium HD SNP BeadChip. These five sheep breeds are regionally disparate and possess breed specific characterizations. Based on ROH, we calculate genomic inbreeding coefficients and identify candidate genes residing in ROH islands in these Chinese indigenous sheep breeds.
Historical effective population sizes (Ne) of the five sheep breeds were computed as below by rearranging a formula proposed by Sved [25]: where N e(t) is the effective population size t generations prior to the genotyped animals and t is approximately equal to 1 2c [26]. The parameter c represents the genetic distance between two SNPs expressed in Morgans, such that c = 0.5 represents no linkage between two loci. We relate the sheep linkage map distances between pairwise markers from their physical locations on the same autosome according to the ratio of the total physical distance to the total recombinant genetic distance. The total physical distance and total genetic distance of sheep were obtained from the links https://ftp.ncbi.nlm.nih. gov/genomes/refseq/vertebrate_mammalian/Ovis_aries/ all_assembly_versions/suppressed/GCF_000298735.1_ Oar_v3.1/ and https://ftp.ncbi.nlm.nih.gov/genomes/ MapView/Ovis_aries/non_sequence/, respectively. E(r 2 | c t ) is the mean values of r 2 between all pairwise SNPs spanning specific physical distance across all autosomes. In this study, the average ratio of the total physical distance to the total recombinant genetic distance was 1.415, and c = 0.1 M amounted to the average physical distance between SNP pairs of around 7.06 Mb, which can estimate Ne 5 generations previous to the genotyped animals. To better understand the historical change and trend of Ne for each breed, Ne of 1,000, 500, 200, 100, 50, 20, 10 and 5 generations ago were estimated, respectively.

Identification of runs of homozygosity
The R package detectRUNS was used to detect ROH per individual [27]. The following criteria were set to detect ROH: (1) a sliding window of 50 SNPs; (2) a maximum of one heterozygous genotype per window; (3) the default value 0.05 as the threshold of the sliding window; (4) the maximum gap of 500 kb between two consecutive SNPs in ROH; (5) the minimum SNP density per ROH was set to one SNP every 50 kb; (6) the minimum ROH length was set to 500 Kb to exclude short ROH due to LD; (7) To minimize the number of ROH detected by chance, the minimum number of SNPs that constituted a ROH was set based on the method proposed by Lencz et al. [28]. That is: where l is the minimum number of SNPs that must be in a ROH, n SNP is the number of SNPs of each individual, n i is the total number of individuals in the whole population, α is the false positive rate of identified ROH (set to 0.05 in the present study) and N het was the mean heterozygosity of all SNPs. In this study, calculated from our genotypic data, l was equal to 53 SNPs.

ROH size categories
For each sheep breed, the average number of ROH per individual and the average length of ROH were estimated. The identified ROH were divided into five classes based on their length: 0.5-5 Mb, 5-10 Mb, 10-15 Mb, 15-20 Mb and > 20 Mb. Frequency of the ROH numbers in each length category was calculated for the five sheep breeds. For each category, the mean sum of ROH per animal for each breed was calculated by summing all ROH in that category and averaging the sum number of animals in that breed. The ROH number of each chromosome for the five sheep breeds were counted respectively as well as the total length and total number of ROH for each animal.

Estimation of ROH-based inbreeding coefficients
Genomic inbreeding coefficients based on ROH (F ROH ) were computed for each individual using the equation proposed by McQuillan et al. [10]: where ∑L ROH is the total length of all the ROH identified in an individual, L AUTO is the total length of the autosomes covered by SNPs, which was 2,452 Mb in our study. To investigate differences of F ROH on each chromosome, we calculated F ROH for each chromosome. Moreover, inbreeding coefficient based on expected number of homozygous genotypes (F HOM ) was calculated using PLINK v1.9 [23]. Pearson's correlation between F ROH and F HOM was calculated.

Detection of ROH islands
To identify the genomic regions most commonly associated with ROHs, the percentage of occurrence of SNPs in ROH was calculated by counting the number of times that a SNP was detected in an ROH across all the individuals in each breed. In this study, the top 0.1% of the SNPs observed in ROHs was selected as the threshold for identifying the genomic regions most commonly associated with ROH in each breed. A series of adjacent SNPs that exceeded this threshold formed a genomic region which we refer to as an ROH island. In this study, the breed specific thresholds were 40%, 44%, 37%, 39% and 50% of the individuals sharing the overlapping homozygous regions (ROH islands) in Altay sheep, Large-tailed Han sheep, Hulun Buir sheep, Short-tailed grassland sheep and Tibetan sheep, respectively. Gene annotation in ROH islands was performed on the basis of sheep reference genome Ovis_aries.Oar_v3.1. The biological function of the genes residing in ROH islands was conducted by survey of relevant literature.

Results
Estimation of LD and effective population size (Ne)   Table S1.

ROH detection
A total of 18,288 ROHs were identified across the five sheep breeds.  Table 2 shows the percentages of ROH numbers in five length categories of 0.5-5, 5-10, 10-15, 15-20 and > 20 Mb in each breed.
Regardless of breed, most ROH were shorter than 5 Mb. Compared with the other four sheep breeds, Large-tailed Han sheep had a higher proportion of long ROHs (> 5 Mb). Fig. 3 illustrates the mean sum of ROH in each length category of the five sheep breeds. Large-tailed Han had the highest mean sum of ROH in all ROH length categories. Especially in the category of > 20 Mb, the gap between Large-tailed Han and other breeds was more prominent. As seen in Fig. 4, the percentages of ROH numbers on autosomes varied but the trends across the five sheep breeds tended to be similar. The highest percentage was observed on OAR2 in all five sheep breeds, while the lowest percentage of ROH number was on OAR24 in Short-tailed grassland sheep and OAR26 in the other four breeds. On the whole, the numbers of ROH per chromosome tended to increase with chromosome length, with the average correlation coefficient of 0.934 across all sheep breeds.   Genomic inbreeding coefficients  Fig. 7 displays the percentage of occurrence of SNPs in ROH against the position of the SNP along all the autosomes. As seen in Fig. 7, ROH islands were mainly distributed on OARs 2, 9, 10, 12 and 13, with many overlap regions observed among the five sheep breeds. Totally, 49 genomic regions were identified as ROH islands in the five sheep breeds ( Table 3). Three of those genomic regions were common to all the five breeds. They were located on OAR2: 12.  Table S2.

Discussion
Linkage disequilibrium (LD) and effective population size (Ne) affected by demography In this study, we collected five Chinese indigenous sheep breeds with different tail types: short fat-tailed (Short-tailed grassland sheep), medium fat-tailed (Hulun Buir sheep), long fat-tailed (Large-tailed Han sheep), fatrumped (Altay sheep), and thin-tailed sheep (Tibetan sheep). Large-tailed Han sheep possess the fattiest and largest tails of all Chinese local sheep breeds. The conspicuous feature of Large-tailed Han sheep is their long fat tails, which can reach the ground in some extreme individuals. A remarkable feature of Altay sheep is their       [29]. Their Ne were about 253 and 238 at five generation ago, respectively, which were close to Ne of Sunite sheep (207) at seven generations ago in our previous study [30]. Like Hulun Buir sheep and Short-tailed grassland sheep, Sunite sheep also originated from Mongolia sheep and had a similar breed history and management system. These results demonstrate the high genetic diversity of Mongolian sheep.

ROH and ROH-based inbreeding coefficient (F ROH )
Since the length of ROH can be used to infer when inbreeding happened, the number, length and distribution of ROH can provide valuable information about the demography history. Furthermore, we can further utilize the lengths of ROH to estimate the ROH-based inbreeding coefficients. In the current study, ROH identified in all five sheep breeds were unevenly distributed (Fig. 4), with OAR2 having the largest number of ROH among all sheep populations. The number of ROH had high positive correlation with chromosome length (0.934).
Our results were consistent with other sheep breeds [31,32]. However, the smallest number of ROH per chromosome was on different chromosomes in different sheep breeds [31,32]. Moreover, the mean numbers of ROH varied in the five sheep breeds as well as the average   (Fig. 3). Moreover, the most individuals carrying a large number of ROH with a total length ≥ 600 Mb were mainly observed in Largetailed Han sheep. Furthermore, Large-tailed Han sheep showed the highest F ROH in both genome (0.0808) and chromosome level (Fig. 7). These results demonstrate that Large-tailed Han sheep had low genetic diversity, and more recent inbreeding events. This might be due to the uncontrolled mating of related individuals in the national Large-tailed Han sheep conservation of China where we sampled these individuals. On the contrary, the Hulun Buir sheep breeds exhibited the least mean number of ROH per animal (39.08) and the shortest average length of ROH (0.929 Mb). Hulun Buir sheep also showed the lowest F ROH , followed by Short-tailed grassland sheep which was consistent with the results of effective population size. These reflected their low level of inbreeding resulting from management systems based on random mating in the grassland. The difference of mean number per animal and average length of ROH may reflect the demography of the different populations. In general, the results of ROH had reflected the inbreeding and population history of the five sheep breeds, and the results of LD and effective population size basically supported and verified the results of ROH. Our results seemed to indicate that ROH can be used as a useful tool for inbreeding evaluation and livestock conservation.
Candidate genes within ROH islands ROH islands are generated from natural or artificial selection and could be used to identify selection signatures. In the process of long-term domestication and adaptation, sheep breeds have formed breedspecific traits. The high frequency homozygous fragments in the genome representing ROH islands can be used to elucidate the genetic mechanism of the breed specific traits. The thresholds in the present study were more stringent than those of other studies using low-density chips [12,13], which could avoid false positive results. There were three breed-specific ROH islands: in Altay sheep, Large-tailed Han sheep and Tibetan sheep. In Altay sheep, the specific ROH island was located on OAR15: 3.4-3.8 Mb. In that genomic region, the fat-tail sheep breeds (Large-tailed Han sheep, Hulun Buir sheep and Short-tailed grassland sheep) also had peaks close to the top 0.1% threshold line (Fig. 7). This genomic region harbors PDGFD that has been documented as a causal gene for fat deposition in sheep tails [33][34][35][36][37]. Moreover, HOXA10 was identified in overlapped ROH island (OAR4: 68.7-69.1 Mb) of Hulun Buir sheep, Short-tailed grassland sheep and Large-tailed Han sheep populations. HOXA10 was identified as a candidate gene related to tail type by selection signature detection [35] and further validated as a candidate gene strongly linked with fat deposition in sheep tail by RNA Seq [38]. In addition, PCCB resided in the overlapped ROH islands of Largetailed Han sheep and Altay sheep, and is involved in the metabolism of fatty acids in pig [39].
These results were supported by the samples with obvious breed feature in terms of tail types. According to sheep tail morphology, the five sheep breeds can be classified into five classes: long fat-tailed (Large-tailed Han sheep), median fat-tailed (Hulun Buir sheep), short fattailed (Short-tailed grassland sheep), fat-rumped (Altay sheep) or thin-tailed sheep (Tibetan sheep). In the Tibetan sheep population, the breed specific ROH island resided in OAR5: 19.8-20.2 Mb. That genomic region harbored P4HA2, which is related to hypoxic adaptation and can be induced to express in hypoxic conditions [40,41]. This may indicate that P4HA2 gene had been selected in the process of Tibetan sheep adapting to high altitude environment. In Large-tailed Han sheep population, the breed specific ROH island was on the OAR17: 53.5-53.8 Mb. On that region, P2RX7 was also annotated, and that gene had been found to be associated with the final weight and backfat thickness of Landrace pigs [42].
Three ROH islands located on OAR2: 12.2-12.3 Mb, OAR12: 78.4-79.1 Mb and OAR13: 53.0-53.6 Mb were common to all the five sheep breeds. The latter two genomic regions harbored four important candidate genes of TNNI1, CSRP1, EEF1A2 and ABHD16B. TNNI1 has been implicated with carcass, growth and meat quality traits in pigs [43,44] and cattle [45]. CSRP1 was identified as a strong candidate gene associated with growth and carcass traits through SNV and haplotype analysis in the Chinese beef cattle [46]. EEF1A2 was involved in muscle development and lipid metabolism during fetal development in sheep [47]. Furthermore, GJB2, GJB6 and GJA3 were found in overlapping ROH islands of Hulun Buir sheep and Short-tailed grassland sheep, and have documented associations with body size and development by selection signature detection of Egyptian sheep and goat populations [48]. ABHD16B is the potential causative protein-altering variant for male infertility in Holstein cattle [49]. Other genes have documented involvement in reproduction. IFT81 was identified from ROH island in Large-tailed Han sheep population, and played an essential role in spermiogenesis and fertility male mice [50]. NPBWR2 was located on the overlapped ROH islands in Altay sheep, Hulun Buir sheep, Shorttailed grassland sheep and Large-tailed Han sheep, and play a role in modulating the reproductive activity in the pig [51]. HOXA3 resided in the overlapped ROH island in Hulun Buir sheep, Short-tailed grassland sheep and Large-tailed Han sheep populations and was reported to be expressed in bovine oocytes and early-stage embryos and may influence oocyte maturation and the first stages of embryonic development [52]. LATS2 was located on the overlapped ROH islands in Hulun Buir sheep and Short-tailed grassland sheep, and plays an essential role in embryonic development, proliferation control and genomic integrity [53].
In addition, we identified several genes related to immune and inflammatory response. We identified CSF2 and IL3 from the ROH islands of Tibetan sheep. Previous study had shown that CSF2 played an important role in immunity regulation, hematopoiesis and inflammation response [54][55][56]. Furthermore, CSF2 was also reported that played pivotal roles in implantation events during early pregnancy in pigs [57] and influence the reproductive capacity in mice [58]. IFT88 resided in the overlapped ROH islands from Hulun Buir sheep and Short-tailed grassland sheep and had been reported to be involved in the inflammatory response of interleukin-1 [59].

Conclusions
In this study, we used genotypes assayed using an Ovine Infinium HD SNP BeadChip to characterize the pattern of LD, estimate the effective population sizes and investigate the occurrence and distribution of ROH across the genomes of five Chinese indigenous sheep breeds. Different LD and ROH patterns were observed in the five breeds. The large-tailed Han sheep population had the highest genomic inbreeding coefficients and the highest proportion of long ROH fragments which reflect recent inbreeding events. On the contrary, the opposite conditions were present in Hulun Buir sheep. In total, 49 ROH islands were identified. Three ROH islands were common to all the breeds, and three breed-specific ROH islands were in Altay sheep, Large-tailed Han sheep and Tibetan sheep. These ROH islands harbored 78 unique genes, including 19 genes documented as being involved in tail types, adaptation, growth, body size, reproduction or immune response. Our findings contribute to the understanding of genetic diversity, population demography and the underlying genetic mechanism of economically important traits, and help design and implement breeding and conservation strategies for Chinese sheep.
Additional file 1: Table S1 The effective population size across generations for each breed. Table S2 ROH hotspots identified in five Chinese indigenous sheep breeds and candidate genes annotated.