Construction of a transposase accessible chromatin landscape reveals chromatin state of repeat elements and potential causal variant for complex traits in pigs
Journal of Animal Science and Biotechnology volume 13, Article number: 112 (2022)
A comprehensive landscape of chromatin states for multiple mammalian tissues is essential for elucidating the molecular mechanism underlying regulatory variants on complex traits. However, the genome-wide chromatin accessibility has been only reported in limited tissue types in pigs.
Here we report a genome-wide landscape of chromatin accessibility of 20 tissues in two female pigs at ages of 6 months using ATAC-seq, and identified 557,273 merged peaks, which greatly expanded the pig regulatory element repository. We revealed tissue-specific regulatory elements which were associated with tissue-relevant biological functions. We identified both positive and negative significant correlations between the regulatory elements and gene transcripts, which showed distinct distributions in terms of their strength and distances from corresponding genes. We investigated the presence of transposable elements (TEs) in open chromatin regions across all tissues, these included identifications of porcine endogenous retroviruses (PERVs) exhibiting high accessibility in liver and homology of porcine specific virus sequences to universally accessible transposable elements. Furthermore, we prioritized a potential causal variant for polyunsaturated fatty acid in the muscle.
Our data provides a novel multi-tissues accessible chromatin landscape that serve as an important resource for interpreting regulatory sequences in tissue-specific and conserved biological functions, as well as regulatory variants of loci associated with complex traits in pigs.
Eukaryotic genomes are usually packed into nucleosomes, which comprise of 147 bp DNA wrapping around a histone octamer, forming the structural units of chromatin. Chromatin is categorized as active euchromatin or inactive heterochromatin based on its accessibility [1, 2], which is linked to fundamental cellular processes including gene transcription, DNA replication and repair . The accessible chromatin often reflects binding of transcriptional machinery to cis-regulatory elements such as promoters and enhancers [4,5,6,7,8]. Profiling the accessible chromatin in different tissues could help to interpret regulatory basis underlying spatiotemporal transcription of genes [9,10,11], as well as genomic variants causing diseases susceptibility and complex trait variations [12,13,14,15].
Over the past decades, several chromatin accessibility profiling methods including DNase I hypersensitive site sequencing (DNase-seq), micrococcal nuclease sequencing (MNase-seq) and assay for transposase-accessible chromatin using sequencing (ATAC-seq) have been developed. Among these, ATAC-seq has become increasingly popular because of its simplicity, high sensitivity and low cell number requirement , and has been applied to construct regulatory elements landscapes in human and mouse [17,18,19,20]. The international consortiums like, Encyclopedia of DNA Elements (ENCODE) , Roadmap Epigenomics projects  and the Functional Annotation of Animal Genomes (FAANG)  have played important roles in generating the epigenetic data in multiple species. There have been efforts to profile accessible chromatin in pigs [24,25,26]. However, only few tissues including fat, cerebellum, cortex, heart, liver, muscle, spleen, lung and hypothalamus were covered in these studies. Moreover, studies have investigated chromatin states of TEs including LINE, SINE, LTR and DNA transposons in human and mouse [27,28,29], allowing better understanding the regulatory function of TEs. However, the chromatin landscape of pig genomic TEs has not been explored yet. In particular, the PERV, as a type of TE, whose sequences are inserted into the pig genome, can be transmitted to humans after xenotransplantation, causing disease risk [30, 31].
Here, we performed ATAC-seq on 40 samples from 20 tissues in 2 female Duroc-Landrace-Yorkshire (DLY) commercial pigs, including 10 tissues (medulla spinalis, bronchia, parotid gland, pharynx, stomach, small intestine (SI), kidney, ovary, cervix, thymus) that have not been investigated in previous studies. We inferred the open chromatin regions displaying tissue-specific or tissue-conserved accessibility. By integrating transcriptomic data (RNA-seq), we revealed open chromatin regions that significantly correlated with gene transcription. In addition, we examined TEs including the PERVs and different TEs in the open chromatin regions, providing insights into the regulatory role of the TEs and their evolutionary history. The data and analyses in this study provide vital resources to explore the function of cis-regulatory elements in determining tissue-specific biological functions and complex traits in pigs.
Methods and materials
The pigs used in this study were obtained from a slaughter house in Nanchang, Jiangxi province. 20 tissues (brain, cerebellum, hypothalamus, medulla spinalis, lung, bronchia, parotid gland, pharynx, stomach, small intestine, liver, kidney, cervix, ovary, heart, thymus, longissimus dorsi, semimembranosus, backfat, leaf fat) was collected from 2 healthy commercial sows at age of 6 months. After slaughter, tissue samples were collected in sterile tube and stored at − 80 °C.
ATAC-seq library preparation and sequencing
ATAC-seq libraries were prepared as previously described with minor modifications . Briefly, samples were ground to powder with liquid nitrogen using mortar. Then, we homogenized the tissue powder in 2 mL ice-cold homogenization buffer (10% NP40, 500 mmol/L EDTA, 1 mol/L sucrose, 100 mmol/L PMSF, 14.3 mol/L β-mercaptoethanol, 1 mol/L CaCl2, 1 mol/L Mg(Ac)2, 1 mol/L Tris [pH 7.8], nuclease-free H2O) using a 7-mL Dounce homogenizer for 20 strokes. We filtered the solution through a 70-μm cell strainer and gradient centrifugation with different concentrations of iodixanol. After that, approximately 60,000 nuclei from each sample were collected, centrifuged at 500 × g for 5 min at 4 °C, washed the pellet with 1 mL ATAC-RSB (5 mol/L NaCl, 1 mol/L MgCl2, 1 mol/L Tris-HCl [pH 7.4], nuclease-free H2O) containing 0.1% Tween-20, centrifuged at 500 × g for 10 min at 4 °C; and resuspended the pellet in 50-μL of tagmentation reaction mix (2 × Tagmentation buffer, 10 × PBS, 0.5% Digitonin, 10% Tween20, Tn5 transposase and nuclease-free H2O), incubated for 30 min at 37 °C. Libraries were prepared using active motif kit (ATAC-Seq Kit #53150) as previously described . The ATAC-seq DNA was sequenced in the 150 bp paired-end sequencing mode with a NovaSeq 6000 platform. Two biological replicates were performed per tissue.
According to the Fig. 1b, the raw data with sequence adapters and low-quality reads were trimmed with Trim Galore v0.6.6  using default parameters. The high-quality reads were then aligned to the pig genome (Sus scrofa 11.1) using Bowtie2 v188.8.131.52 . Duplicate alignments were removed using Picard toolkit v1.119  and alignments with low mapping quality (mapq < 30) and mitochondrial DNA were removed with SAMtools v1.9 . We used MACS2 v2.1.1  to generate the accessible chromatin regions (peaks) for each sample with the parameter --qvalue 0.01 --nomodel --shift − 100 --extsize 200 -B -SPMR --keep-dup all, and then we merged peaks across all samples within the same tissue in two individuals overlapping peaks using BEDTools merge . We recalculated the number of reads in each merged peak using SAMtools bedcov  and peaks per kilobase million values were determined to measure the peak intensity using R program v3.5.1 . The clustering of the samples based on peak intensity were performed using hcluster function with ward. D method in R program v3.5.1 , and visualized using R package circlize  and dendextend . Genomic annotation for peaks was performed with Genomic Association Tester . And tissue-specific peaks are defined as peak intensity in specific tissue at least twice higher than in the others.
RNA-seq library preparation, sequencing and processing
Total RNA was extracted from 40 frozen samples using Trizol and sequenced using MGI-2000 system with paired end and strand specific. RNA-seq reads of each sample were mapped to pig genome (Sus scrofa 11.1) using STAR v2.7.1a . Expression level of each gene was determined with StringTie  and featureCounts  and transcript per million (TPM) normalization was performed using R program. Only genes expressed (TPM > 0) across the 40 samples in more than 2 samples were included in the subsequent analyses. The clustering of the samples based on gene expression levels were performed using hcluster function with ward.D method in R program, and visualized using R package circlize  and dendextend . And two samples (stomach_sample1 and backfat_sample2) with poor clustering were discarded. And tissue-specific genes are defined as expression level in specific tissue at least twice higher than in the others.
Gene ontology enrichment analysis
The gene ontology enrichment analysis was performed using ClueGO . The P-values for the enrichment of Gene Ontology (GO) terms were corrected using Benjamini-Hochberg approach.
Transcription factor motifs
We tested for enrichment of vertebrates known transcription factor motifs in peaks using HOMER  with default parameters.
Correlation of peaks with gene expression
To clarify the regulatory relationship between peaks and genes, three other data were used in this study. The first study was from Kern et al. . A total of 16 samples were collected from 8 tissues (fat, cerebellum, brain, hypothalamus, liver, lung, muscle and spleen) of yorkshire with 2 biological replicates per tissue. The second was designed by Zhao et al. , which include 5 tissues (muscle, liver, fat, spleen, heart) of MeiShan, LargeWhite, Enshi and Duroc with 1–2 biological replicates per tissue. And the last one was executed by Yang et al. , which we downloaded and only included one tissue (skeletal muscle) of Luchuan and Duroc pigs with 3 biological replicates per breed (Additional file 1: Table S3). In total, combined with our own data, there were 85 samples of ATAC-seq and corresponding RNA-seq. After the above uniform ATAC-seq process analysis, we identified 796,506 peaks with P-value less than 10−5, and 14 samples with poor tissue clustering were discarded. And then 527,718 peaks exist in at least 2 samples were retained. For RNA-seq, after uniform RNA-seq process analysis above, 24,974 genes were identified, then, we selected 17,634 genes with TPM > 1 in at least 10% samples for further analysis. Using 527,718 peaks and 17,634 genes across 71 samples, we predicted target genes for peaks by implementing correlation between peak intensity and gene expression. Genes whose transcription start sites (TSSs) were within 500 Kb from the center of peaks were considered. We identified 827,942 significant correlations at q-value less than 0.01.
Spatial correlation between peaks and TEs
The data of TEs from pig (Sus scrofa 11.1) were analyzed by RepeatMasker v2.9.0  with parameter -s. And correlated interval sets between peaks and TEs were calculated by GenometriCorr (Genometric Correlation)  which was used to calculate the spatial correlation of genome-wide interval datasets. The permutation tests with 100 times were performed to verify whether TEs enriched in the regulatory elements or not.
Identification of accessible TEs
As previously reported , accessible TEs were identified by the overlapping between TEs and peaks, and the evaluation criterion is that at least 50% of the overlap should be TE while at least 20% of the overlap should be peak region. Accessible TEs that were only identified in one tissue was defined as tissue-specific accessible TEs while accessible TEs that were identified in all tissues was defined as shared accessible TEs. GO enrichment analysis of tissue-specific accessible TEs and shared accessible TEs were performed using ClueGO .
Multiple sequence alignment
Virus sequences were obtained from a study (unpublished) by self-assembly and comparison with the database . The conserved accessible TEs sequences and viral sequences were put together. Then, we used MAFFT v7.407  with the automatically detected parameters (--auto) to identified sequence homology. FastTree v2.1.11  was used to infer approximately-maximum-likelihood phylogenetic trees from alignments of nucleotide sequences. Finally, experiment visualization features in FigTree v1.4.4 .
The enrichment of GWAS signals
The pig GWAS data of fatty acid was collected from a published study . A total of 191 significant GWAS loci were obtained. Then, the enrichment analysis of these loci was performed with regulatory elements.
Mapping the open chromatin states of 20 pig tissues
To map genome-wide regulatory elements in pig, we performed ATAC-seq and RNA-seq on 40 samples representing 20 tissues, namely, brain, cerebellum, hypothalamus, medulla spinalis, lung, bronchia, parotid gland, pharynx, stomach, liver, SI, kidney, ovary, cervix, heart, thymus, longissimus muscle (LD), semimembranosus muscle (SM), backfat and leaf fat from two female DLY pigs at 6 months (Fig. 1a), representing two biological replicates per tissue. The data was subjected to analysis following the workflow shown in Fig. 1b.
After quality control, we obtained a total of 2.43G uniquely mapped reads from the 40 samples (Additional file 1: Table S1). The reads showed enrichment around TSSs and a periodical fragment size distribution corresponding to the nucleosome-free regions (NFR) (< 100 bp) and mono-, di-, and tri-nucleosomes (~ 200, 400 and 600 bp, respectively) (Additional file 2: Fig. S1a, S1b). Furthermore, we computed quality related statistics FRIP (fraction of reads in peaks), TSS score, library complexity (NRF (Non-redundancy Fraction), PBC1 and PBC2 (PCR Bottlenecking Coefficient 1 and 2)) and cross correlation (NSC and RSC) (Additional file 1: Table S1). These statistics suggested that the data generated hereby are of good quality. We identified an average of 77,096 peaks with average size of 448 bp per tissue (Additional file 2: Fig. S1d). Peaks from all samples were merged into 557,273 non-redundant peaks, among which 363,638 (65.25%) overlapped with peaks identified in Kern et al.  (Fig. 2c), while the rest of peaks were newly identified owing to more tissue types were profiled in this study.
To obtain a set of peaks with higher confidence, we only kept peaks that were evidenced in both biological replicates for each tissue, and finally retained 267,836 merged peaks for further analysis. Most of the 267,836 peaks located in intronic (54.23%) and intergenic (36.75%) regions, meanwhile, 4.23%, 2.59% and 2.20% peaks were evidenced in CDS, 5′UTR and 3′UTR regions (Fig. 2e). Similar to the results obtained in the previous study , we found that accessible chromatin regions vary greatly across tissues (range: 9729 to 94,946), covering 0.56% to 3.21% sequences of the reference genome (Fig. 2b), with a good overlap with those reported in independent studies by tissues (Additional file 1: Table S4).
Hierarchical clustering based on the peak intensity clearly separated different tissue types (Fig. 2a), indicating that the peak intensities largely reflect tissue identity. We grouped the 267,836 peaks into 20,488 proximal peaks (within 1 Kb of TSS) and 247,248 distal peaks (away from TSS 1 Kb). As expected, the distal peaks displayed stronger tissue specificity comparing to proximal peaks (Fig. 2d and Additional file 2: Fig. S1c), agreeing with the reported characteristics of promoters and enhancers .
To investigate the effect of regulatory elements on gene expression, RNA-seq was successfully performed on 38 out of the 40 ATAC-Seq samples. The summary statistics of each RNA library is listed in Supplementary Table 2. A total of 17,624 genes were identified. The reproducibility of the 38 RNA-seq data was verified by hierarchical clustering of gene expression (Additional file 2: Fig. S2a).
Tissue specificity of the open chromatin peaks
To explore the key regulatory elements that determine specialized functions of tissues, we defined peak whose average intensity was at least two folds in the target tissue relative to any other tissues as tissue-specific peak, and identified 16,704, 7887, 6742, 5960, 5286, 4278, 3089, 884, 649, 407, 326, 277, 261, 177, 118, 84, 21, 16, 6 and 3 tissue-specific peaks in cerebellum, kidney, liver, heart, parotid gland, cervix, thymus, leaf fat, stomach, brain, SI, LD, pharynx, SM, ovary, hypothalamus, medulla spinalis, lung, backfat and bronchia, respectively (Fig. 3a, b). We exemplified a liver specific peak located close to ARG1 (Fig. 3c and Additional file 2: S2e), a protein coding gene expressed primarily in the liver and involved in the urea cycle . The genes locate closest to the tissue-specific peaks were enriched in tissue-specific pathways, such as the heart and circulatory system development for heart specific peaks (Additional file 2: Fig. S2f), such as heart development and circulatory system development in heart. Given that the numbers of tissue-specific peaks were largely affected by the inclusion of physiologically similar tissues, we further determined peaks that specific to a certain tissue system using a similar rule applied on the individual tissues, and identified 11,114, 434, 310, 28, 5053 and 50 specific peaks for locomotor system, reproductive system, endocrine system, respiratory system, nervous system and digestive system, respectively, located close to genes with functions matching the biology of corresponding tissues (Fig. 3a). For example, genes expressed specifically in nervous system were significantly involved in synaptic membrane (q-value = 1.39E-32), while those expressed specifically in locomotor system were significantly associated with muscle structure development (q-value = 5.68E-28) (Additional file 2: Fig. S2d).
Next, we investigated enrichment of TF binding motifs in the tissue-specific peaks using HOMER . We defined the TF motifs with enrichment strength (−log (P-value)) in one tissue were at least two folds relative to the other tissues as tissue-specific, and identified 205 tissue-specific TF motifs (Fig. 3d). These included motif of ATOH1 which controls primary cilia formation that facilitates SHH-Triggered granule neuron progenitor proliferation  was evidenced to be cerebellum specific, and motif of MEF2C which controls cardiac morphogenesis  was enriched in heart.
Ubiquitously accessible chromatin peaks
Meanwhile, we also identified 8734 peaks that consistently active in at least 90% of investigated tissues (Fig. 3e). The genes locate nearest to the conserved peaks were enriched for pathways like cytoskeleton and mitotic cell cycle that have housekeeping functions (Additional file 2: Fig. S2d). The top 5 enriched TF motifs (SP1, NFY, SP5, ELK4 and KLF3) (Additional file 1: Table S5) participate in chromatin remodeling and transcriptional activation (Fig. 3f), which are indispensable transcription factors in growth and development .
Tissue-specific gene expressions
We applied similar approach to investigate tissue specificity of gene expressions, and identified 135, 528, 439, 711, 228, 181, 117, 467, 40, 75, 469, 486, 94, 581, 124, 117, 411, 34, 336 and 1590 tissue-specific genes that were found to be specific in backfat, brain, bronchia, cerebellum, cervix, heart, hypothalamus, kidney, LD, leaf fat, liver, lung, medulla spinalis, ovary, parotid gland, pharynx, SI, SM, stomach and thymus, respectively (Additional file 2: Fig. S2b). For example, MYH7, a gene described to be responsible for hypertrophic cardiomyopathy, was highly expressed in heart than in other tissues (Additional file 2: Fig. S2c). The GO analysis showed these tissue-specific genes were associated with tissue-specific biological processes (Additional file 2: Fig. S2b). For example, genes significantly expressed in cerebellum were related to the function of synaptic signaling, and genes significantly expressed in thymus were related to the function of T cell activation (Additional file 2: Fig. S2b).
Joint profiling of chromatin accessibility and gene expression
To investigate the regulatory elements that link to gene transcription, we examined correlations of gene expressions with intensities of ATAC peaks within 500 Kb from the TSSs of corresponding genes, as majority of promoter-anchored chromatin interactions were within 500 Kb [63, 64]. The analysis was performed on the integrated dataset of 527,718 peaks and 17,634 genes from 71 samples, including 33 samples from public datasets (Additional file 1: Table S3, Additional file 2: Fig. S3). We identified a total of 827,942 significant peak - gene associations (q-value < 0.01) that involves 255,166 peaks and 17,493 genes (Additional file 2: Fig. S4). Majority of the significant correlations (406,896/827,942, 49.15%) are positive only, while we also identified 10.54% negative correlations only. For example, a peak (chr5:10,663,155-10,664,871) has high intensity in liver showed positive correlation with TMPRSS6 (Spearman r2 = 0.77, q-value = 6.43E-12) (Fig. 4f), which encodes a type II transmembrane serine protease exclusively produced by liver . We also exemplified negative peak - gene correlation where a peak (chr1:128,083,325-128,085,034) show strong negative correlation with expression levels of PDIA3 (Spearman r2 = − 0.60, q-value = 1.42E-6) (Fig. 4g). In addition, there are 40.31% peak - gene correlations showed both positive and negative correlations, indicating that the identity of regulatory elements was mutable.
On average, there were 2.87 genes per peak and 35.86 peaks per gene in positive correlations while 2.13 genes per peak and 14.49 peaks per gene in negative correlations (Fig. 4a, c). Interestingly, we observed that the strength of positive and negative correlations showed distinct distribution against peak - gene distances (Fig. 4b). The positive correlations were enriched around the TSSs of genes, while the negative correlations appeared to be underrepresented in TSS regions (Fig. 4d). We identified 1786 tissue-specific genes were associated with 38,574 tissue-specific peaks within 500 Kb from the TSS regions (Fig. 4e, h and Additional file 2: Fig. S4), suggesting important roles of tissue-specific chromatin accessibility in driving the tissue-specific genes expression thereby encoded tissue-specific biological functions.
Open chromatin states of transposable elements in different pig tissues
Over 40% of the sequences in the non-coding regions of the mammalian genome are TEs whose functions were largely unexplored. To understand the functions of TEs in pig tissues, we examined the chromatin accessibility of genome wide TEs. We found 26.91% of the 267,836 open chromatin regions were overlapped with at least one TE.
The spatial correlation analysis between TEs and open chromatin regions calculated by GenometriCorr  suggested underrepresentation of ratio of TEs located in the open chromatin regions than random expectation, indicating that TEs are tend to be epigenetically silenced. Despite this, each tissue contains an average of 17.96% peaks that were overlapped with at least one TE, range from 9.49% to 22.33% (Fig. 5a). The SINE, LINE and LTR accounted for approximately 40%, 30% and 23% of the accessible TEs, respectively, across the 20 tissues (Fig. 5b). We found over 50% of these TEs existed in a single tissue were located close to genes involved in tissue-specific functions (Fig. 5c, d). For example, we observed enrichment of T cell receptor signaling pathway enriched in thymus while muscle system process enriched in LD (Fig. 5d). These results suggested that accessible TEs existed in one tissue exhibit strong tissue-specific regulatory roles, which was consistent with the previous reports [66, 67]. Meanwhile, we observed approximate 0.19% of accessible TEs were commonly accessible in all 20 tissues (Fig. 5c). The conserved accessible TEs were more associated with housekeeping functions (e.g., ribosome assembly) (Fig. 5d).
Mounting evidence suggests that specific TEs in pigs, especially PERV, could potentially lead to immunodeficiency and tumorigenesis, making it difficult for xenotransplantation [68, 69]. Upon further inspection, we found a total of 10 unique PERVs overlapped with 13 peaks (Additional file 1: Table S6). For example, the PERV (sPERV-JX2like-12-23.0 M) was overlapped with the peak (chr12:22,990,219-22,991,950) which showed high intensity in liver (Fig. 5e), significantly correlated with multiple genes (NEUROD2, ENSSSCG00000017507, CACNB1, PLXDC1, PIP4K2B). Among them, PIP4K2B, which participates in 1-phosphatidylinositol-4-phosphate 5-kinase activity and IP6 , the product of 1-phosphatidylinositol-4-phosphate 5-kinase, can facilitates assembly of the immature HIV-1 Gag lattice , shows highly correlation (Fig. 5f), suggesting potential role of the PERVs in facilitating virus infection.
Besides this, we also focused on the functions of other genes regulated by PERVs. For example, GTPBP8 , BCKDK , IDH3B  and IDH1  were related to the function of mitochondrion; CD2BP2  and FUS  involved in mRNA splicing; BOC  and FZD5  participated in signal transduction pathway. Taken together, these results suggested that PERV which act as regulatory element involved in a variety of biological functions.
In this study, we identified a total of 162 TEs that are accessible in all investigated tissues. To trace the origin or evolutionary history of these ubiquitously accessible TEs, we calculated sequence similarity between 124 viral sequences (Additional file 1: Table S8) and 162 conserved accessible TEs sequences, and constructed a phylogenetic tree to reconstruct evolutionary history. Interestingly, we observed some conserved accessible TEs sequences were grouped with some viruses whose host was pig (Fig. 6), such as PBoV, TTSuVK2a, TTSuVK2b, PADV-A, PoBuV, PPV7 and Po-Circo-Like Virus 21 (Additional file 1: Table S9). We performed GO and KEGG enrichment analysis of all genes associated with conserved accessible TEs, and found that the biologic process of mitochondrial depolarization was highly enriched (Additional file 1: Table S10), which indicated that accessible TEs derived from viral sequences have a potential regulatory function in clearance of excess or impaired mitochondrial. Altogether, these evidences suggested that some special viral sequences have been integrated into the host genome and retained as regulatory elements that could play constitutive regulatory functions.
An annotation resource for regulatory mechanisms in complex traits
To demonstrate the utility of such resource in annotating candidate casual variants for complex traits, we examined overlap of 191 significant variants for fatty acid composition identified in a genome sequence based imputation GWAS  with the ATAC-seq peak regions identified in this study. A total of 22 variants were found to be coincident with the ATAC-seq peaks (Additional file 1: Table S7). Among these, a lead SNP (chr2:9,736,686, P = 2.10E-10) for C20:3n-6/C18:2n-6 was located in an ATAC-seq peak (chr2:9,736,357-9,737,907) that showed high intensity in brain and moderate intensity in muscle (Fig. 7a). This peak showed significant correlation with three genes (FADS1, FADS2 and RAB3IL1) (Additional file 1: Table S7). Among them, FADS1 (q = 1.56E-4) which exhibited strongest correlation (Fig. 7b), encodes delta-5 desaturase, one of the rate-limiting enzymes in the endogenous synthesis of polyunsaturated fatty acids (PUFAs) . FADS2 which converts linoleate and alpha-linolenate into PUFAs, is one of the key limiting enzymes in the lipid metabolic pathway  (Fig. 7c). In addition, several other genes related to fatty acid have been identified. Osteoglycin (OGN) is involved in matrix assembly, cellular growth, and migration. Previous study showed that OGN was associated with fat acids composition traits to some extent  and regulated lipid differentiation through the Wnt/β-catenin signaling pathway . TEAD3 which participates in the fatty acid, triacylglycerol, and ketone body metabolism pathway from the Reactome Pathways database , is a member of the transcriptional enhancer factor (TEF) family of transcription factors. Further experiment is needed to validate effect of the variants on the accessibility of the peaks, and in turn the expression of relevant genes and fatty acid composition phenotypes.
Regulatory elements are crucial for gene expression regulation, which involved in the initiation and regulation of transcription in different tissues and organs and provided precise control of genes and restrict gene expression to certain cells and tissues throughout life . Activation of regulatory elements needs ‘pioneer’ factor opening the compacted chromatin to recruit transcription coactivators and chromatin remodeling proteins [86, 87]. Regulatory elements usually contain multiple transcription factors binding sites (TFBS) which recognized by TFs. Genetic variations in regulatory elements affect gene expression usually through disrupting TFBS . Therefore, the generation of multi-tissues atlas of open chromatin enabled exploration of gene expression regulation and non-coding genetic variations.
In this study, we performed ATAC-seq and RNA-seq across 20 tissues in pig at the same development stage to characterized chromatin state landscape, uncovering extensive tissue-specific regulation of gene expression. However, not a single study until now, as far as we are aware, has comprehensively and systematically investigated chromatin accessibility in multi-tissues by ATAC-seq and conjunction with transcriptomic analysis. In our study, we observed that the patterns of TSS proximal regulatory elements were more similar across tissues while distal regulatory elements were more tissue-specific, reflecting the architecture and function of distal regulatory elements have strong tissue specificity .
Consistent with the previous reports that regulatory elements mediate specific differentiation of tissues or cells [90, 91], we identified a large number of tissue-specific open chromatin regions which regulate the genes with function matching the biology of corresponding tissues. In addition, we identified tissue-specific TF motifs based on the enrichment of their cognate binding motifs, such as ATOH1 enriched in cerebellum and FOXA1 enriched in liver. And we acknowledge that TF motifs enrichment analysis needs to verify using other approaches such as ChIP-seq for specific TFs.
Connecting peaks to their target genes remains challenge given that Hi-C data is not available for the samples under study, particularly for distal peaks which are far from target genes, and regarding genes nearest to peaks as targets is error prone [92,93,94]. To solve this problem, we combine other published data to reveal the association between peak intensity and gene expression, which has been proved effective . We observed significant negative peak - gene pairs showed strikingly different peak - gene distances compared to that in positive correlation pairs, which suggested that positive and negative correlation pairs may have different regulatory mechanism. To further understand tissue-specific regulatory relationships between peak intensity and gene expression, we filtered for tissue-specific correlation pairs (Additional file 2: Fig. S4). Our results suggest tissue-specific genes expression may occur through the activation of tissue-specific regulatory elements. Whereas, the above predictions require further support using 3D chromosome conformation data or validated through CRISPRi experiments.
To deepen our understanding of regulatory elements, here we performed profiles in TEs as they can provide binding sites for TFs and behave as alternative promoters and enhancers . In this study, we observed a small proportion of accessible TEs, which were in good accordance with the classical polarization theory that TEs promote genetic innovation but also threaten genome stability . Similar to previous study , we observed most of accessible TEs only present in one tissue. And tissue-specific accessible TEs were enriched in tissue-specific biological pathways, which support the findings of previous studies [52, 97,98,99]. Endogenous retroviruses (ERVs) are LTR retrotransposons (class I of TEs), which originated from exogenous retroviruses that infected the germ line throughout evolution [100, 101], which could manifest their function across multiple organs in an animal, thereby could have fundamental functional impact. In order to trace the potential source of accessible TEs, we performed multiple sequence alignments between the viral sequences and the conserved accessible TEs sequences, and found some candidate viral sequences (PBoV, TTSuVK2a, TTSuVK2b, PADV-A, PoBuV, PPV7 and Po-Circo-Like Virus 21) might be the source of conserved accessible TEs.
The annotation of regulatory elements has proven highly effective for the identification of candidate causative variants and screening candidate genes for complex traits [88, 102, 103]. Similarly, our results demonstrated that variants of fatty acid were enriched in active regulatory elements annotated by this study. Specifically, we speculate that a potential causative SNP (chr2:9,736,686, P = 2.10E-10) that was associated with C20:3n-6/C18:2n-6 and found within a tissue-conserved promoter, may regulate the expression of FADS1 which encodes the Δ5 desaturase enzyme - one of the rate-limiting enzymes in the endogenous synthesis of polyunsaturated fatty acids (PUFAs) . Taken together, our data unveiled the chromatin landscapes for studying the regulation of gene expression, provided potential sources of chromatin accessibility and deconstructed underlying mechanisms of complex traits in pig. Further investigations with additional complementary data—such as single cell and different development stage data—are warranted to fully dissect biological mechanisms of gene expression.
We performed an integrated analysis of transcriptome sequencing and transposase-accessible chromatin with high throughput sequencing and provided a comprehensive understanding of tissue-specific accessible chromatin regions and tissue-specific TF motifs. Then, we constructed the relationship between regulatory elements and their target genes. This large regulatory network will serve as a foundation for understanding the precise regulatory mechanisms in different tissues of pig. We also analyzed accessible TEs which act as candidate regulatory elements. In addition, we analyzed the sequence characteristics of regulatory elements, and found some virus sequences similar to regulatory element sequences, which provided a certain basis for the essential characteristics of regulatory elements. Finally, the regulatory elements identified in this paper were used to annotate the GWAS loci that affect fatty acid traits, providing a reference for genetic analysis of complex traits.
Availability of data and materials
Supporting data produced in this study have been uploaded to National Genomics Data Center with the accession number CRA007240.
Transposase-accessible chromatin with high throughput sequencing
DNase I hypersensitive site sequencing
Fraction of reads in peaks
Micrococcal nuclease sequencing
Normalized strand cross-correlation coefficient
Porcine mastadenovirus A
PCR Bottlenecking Coefficient 1
PCR Bottlenecking Coefficient 2
Parvoviridae Bocaparvovirus Porcine bocavirus
Po-Circo-Like Virus 21
Porcine parvovirus 7
Polyunsaturated fatty acids
Relative strand cross-correlation coefficient
Transcriptional enhancer factor
Transcription factors binding site
Transcripts per million
Transcription start site
Torque teno sus virus k2a
Torque teno sus virus k2b
Simpson RT. Nucleosome positioning can affect the function of a cis-acting DNA element in vivo. Nature. 1990;343(6256):387–9. https://doi.org/10.1038/343387a0.
Schones DE, Cui K, Cuddapah S, Roh TY, Barski A, Wang Z, et al. Dynamic regulation of nucleosome positioning in the human genome. Cell. 2008;132(5):887–98. https://doi.org/10.1016/j.cell.2008.02.022.
Kouzarides T. Chromatin modifications and their function. Cell. 2007;128(4):693–705. https://doi.org/10.1016/j.cell.2007.02.005.
Lee CK, Shibata Y, Rao B, Strahl BD, Lieb JD. Evidence for nucleosome depletion at active regulatory regions genome-wide. Nat Genet. 2004;36(8):900–5. https://doi.org/10.1038/ng1400.
Ozsolak F, Song JS, Liu XS, Fisher DE. High-throughput mapping of the chromatin structure of human promoters. Nat Biotechnol. 2007;25(2):244–8. https://doi.org/10.1038/nbt1279.
Sheffield NC, Furey TS. Identifying and characterizing regulatory sequences in the human genome with chromatin accessibility assays. Genes (Basel). 2012;3(4):651–70. https://doi.org/10.3390/genes3040651.
Yue F, Cheng Y, Breschi A, Vierstra J, Wu W, Ryba T, et al. A comparative encyclopedia of DNA elements in the mouse genome. Nature. 2014;515(7527):355–64. https://doi.org/10.1038/nature13992.
Thurman RE, Rynes E, Humbert R, Vierstra J, Maurano MT, Haugen E, et al. The accessible chromatin landscape of the human genome. Nature. 2012;489(7414):75–82. https://doi.org/10.1038/nature11232.
Rivera CM, Ren B. Mapping human epigenomes. Cell. 2013;155(1):39–55. https://doi.org/10.1016/j.cell.2013.09.011.
Lettice LA, Williamson I, Devenney PS, Kilanowski F, Dorin J, Hill RE. Development of five digits is controlled by a bipartite long-range cis-regulator. Development. 2014;141(8):1715–25. https://doi.org/10.1242/dev.095430.
Reddi PP, Urekar CJ, Abhyankar MM, Ranpura SA. Role of an insulator in testis-specific gene transcription. Ann N Y Acad Sci. 2007;1120:95–103. https://doi.org/10.1196/annals.1411.012.
Smith RP, Eckalbar WL, Morrissey KM, Luizon MR, Hoffmann TJ, Sun X, et al. Genome-wide discovery of drug-dependent human liver regulatory elements. PLoS Genet. 2014;10(10):e1004648. https://doi.org/10.1371/journal.pgen.1004648.
Wang D, Chen H, Momary KM, Cavallari LH, Johnson JA, Sadee W. Regulatory polymorphism in vitamin K epoxide reductase complex subunit 1 (VKORC1) affects gene expression and warfarin dose requirement. Blood. 2008;112(4):1013–21. https://doi.org/10.1182/blood-2008-03-144899.
Smemo S, Tena JJ, Kim KH, Gamazon ER, Sakabe NJ, Gomez-Marin C, et al. Obesity-associated variants within FTO form long-range functional connections with IRX3. Nature. 2014;507(7492):371–5. https://doi.org/10.1038/nature13138.
Kapoor A, Sekar RB, Hansen NF, Fox-Talbot K, Morley M, Pihur V, et al. An enhancer polymorphism at the cardiomyocyte intercalated disc protein NOS1AP locus is a major regulator of the QT interval. Am J Hum Genet. 2014;94(6):854–69. https://doi.org/10.1016/j.ajhg.2014.05.001.
Corces MR, Trevino AE, Hamilton EG, Greenside PG, Sinnott-Armstrong NA, Vesuna S, et al. An improved ATAC-seq protocol reduces background and enables interrogation of frozen tissues. Nat Methods. 2017;14(10):959–62. https://doi.org/10.1038/nmeth.4396.
Markenscoff-Papadimitriou E, Whalen S, Przytycki P, Thomas R, Binyameen F, Nowakowski TJ, et al. A chromatin accessibility atlas of the developing human telencephalon. Cell. 2020;182(3):754–69.e18. https://doi.org/10.1016/j.cell.2020.06.002.
Wu J, Huang B, Chen H, Yin Q, Liu Y, Xiang Y, et al. The landscape of accessible chromatin in mammalian preimplantation embryos. Nature. 2016;534(7609):652–7. https://doi.org/10.1038/nature18606.
Gorkin DU, Barozzi I, Zhao Y, Zhang Y, Huang H, Lee AY, et al. An atlas of dynamic chromatin landscapes in mouse fetal development. Nature. 2020;583(7818):744–51. https://doi.org/10.1038/s41586-020-2093-3.
Corces MR, Granja JM, Shams S, Louie BH, Seoane JA, Zhou W, et al. The chromatin accessibility landscape of primary human cancers. Science. 2018;362(6413):eaav1898. https://doi.org/10.1126/science.aav1898.
The ENCODE Project Consortium. The ENCODE (ENCyclopedia of DNA elements) project. Science. 2004;306(5696):636–40. https://doi.org/10.1126/science.1105136.
Roadmap Epigenomics Consortium, Kundaje A, Meuleman W, Ernst J, Bilenky M, Yen A, et al. Integrative analysis of 111 reference human epigenomes. Nature. 2015;518(7539):317–30. https://doi.org/10.1038/nature14248.
Andersson L, Archibald AL, Bottema CD, Brauning R, Burgess SC, Burt DW, et al. Coordinated international action to accelerate genome-to-phenome with FAANG, the functional annotation of animal genomes project. Genome Biol. 2015;16:57. https://doi.org/10.1186/s13059-015-0622-4.
Pan Z, Yao Y, Yin H, Cai Z, Wang Y, Bai L, et al. Pig genome functional annotation enhances the biological interpretation of complex traits and human disease. Nat Commun. 2021;12(1):5848. https://doi.org/10.1038/s41467-021-26153-7.
Yue J, Hou X, Liu X, Wang L, Gao H, Zhao F, et al. The landscape of chromatin accessibility in skeletal muscle during embryonic development in pigs. J Anim Sci Biotechnol. 2021;12(1):56. https://doi.org/10.1186/s40104-021-00577-z.
Kern C, Wang Y, Xu X, Pan Z, Halstead M, Chanthavixay G, et al. Functional annotations of three domestic animal genomes provide vital resources for comparative and agricultural research. Nat Commun. 2021;12(1):1821. https://doi.org/10.1038/s41467-021-22100-8.
Kunarso G, Chia NY, Jeyakani J, Hwang C, Lu X, Chan YS, et al. Transposable elements have rewired the core regulatory network of human embryonic stem cells. Nat Genet. 2010;42(7):631–4. https://doi.org/10.1038/ng.600.
Ito J, Sugimoto R, Nakaoka H, Yamada S, Kimura T, Hayano T, et al. Systematic identification and characterization of regulatory elements derived from human endogenous retroviruses. PLoS Genet. 2017;13(7):e1006883. https://doi.org/10.1371/journal.pgen.1006883.
Jang HS, Shah NM, Du AY, Dailey ZZ, Pehrsson EC, Godoy PM, et al. Transposable elements drive widespread expression of oncogenes in human cancers. Nat Genet. 2019;51(4):611–7. https://doi.org/10.1038/s41588-019-0373-3.
Sykes M, Sachs DH. Transplanting organs from pigs to humans. Sci Immunol. 2019;4(41):eaau6298. https://doi.org/10.1126/sciimmunol.aau6298.
Denner J, Tonjes RR. Infection barriers to successful xenotransplantation focusing on porcine endogenous retroviruses. Clin Microbiol Rev. 2012;25(2):318–43. https://doi.org/10.1128/CMR.05011-11.
Buenrostro JD, Wu B, Chang HY, Greenleaf WJ. ATAC-seq: a method for assaying chromatin accessibility genome-wide. Curr Protoc Mol Biol. 2015;109:21.29.1–9. https://doi.org/10.1002/0471142727.mb2129s109.
Trim Galore. https://github.com/FelixKrueger/TrimGalore. Accessed 17 June 2022.
Langmead B, Salzberg SL. Fast gapped-read alignment with bowtie 2. Nat Methods. 2012;9(4):357–9. https://doi.org/10.1038/nmeth.1923.
Picard. http://broadinstitute.github.io/picard. Accessed 17 June 2022.
Li H, Handsaker B, Wysoker A, Fennell T, Ruan J, Homer N, et al. The sequence alignment/map format and SAMtools. Bioinformatics. 2009;25(16):2078–9. https://doi.org/10.1093/bioinformatics/btp352.
Zhang Y, Liu T, Meyer CA, Eeckhoute J, Johnson DS, Bernstein BE, et al. Model-based analysis of ChIP-Seq (MACS). Genome Biol. 2008;9(9):R137. https://doi.org/10.1186/gb-2008-9-9-r137.
Quinlan AR, Hall IM. BEDTools: a flexible suite of utilities for comparing genomic features. Bioinformatics. 2010;26(6):841–2. https://doi.org/10.1093/bioinformatics/btq033.
R Core Team. R. A language and environment for statistical computing. Vienna: R Foundation for statistical computing; 2013. URL http://www.R-project.org/
Gu Z, Gu L, Eils R, Schlesner M, Brors B. circlize implements and enhances circular visualization in R. Bioinformatics. 2014;30(19):2811–2. https://doi.org/10.1093/bioinformatics/btu393.
Galili T. dendextend: an R package for visualizing, adjusting and comparing trees of hierarchical clustering. Bioinformatics. 2015;31(22):3718–20. https://doi.org/10.1093/bioinformatics/btv428.
Genomic Association Tester. https://github.com/AndreasHeger/gat/blob/master/doc/contents.rst. Accessed 17 June 2022.
Dobin A, Davis CA, Schlesinger F, Drenkow J, Zaleski C, Jha S, et al. STAR: ultrafast universal RNA-seq aligner. Bioinformatics. 2012;29(1):15–21. https://doi.org/10.1093/bioinformatics/bts635.
Pertea M, Pertea GM, Antonescu CM, Chang TC, Mendell JT, Salzberg SL. StringTie enables improved reconstruction of a transcriptome from RNA-seq reads. Nat Biotechnol. 2015;33(3):290–5. https://doi.org/10.1038/nbt.3122.
Liao Y, Smyth GK, Shi W. featureCounts: an efficient general purpose program for assigning sequence reads to genomic features. Bioinformatics. 2014;30(7):923–30. https://doi.org/10.1093/bioinformatics/btt656.
Bindea G, Mlecnik B, Hackl H, Charoentong P, Tosolini M, Kirilovsky A, et al. ClueGO: a Cytoscape plug-in to decipher functionally grouped gene ontology and pathway annotation networks. Bioinformatics. 2009;25(8):1091–3. https://doi.org/10.1093/bioinformatics/btp101.
Heinz S, Benner C, Spann N, Bertolino E, Lin YC, Laslo P, et al. Simple combinations of lineage-determining transcription factors prime cis-regulatory elements required for macrophage and B cell identities. Mol Cell. 2010;38(4):576–89. https://doi.org/10.1016/j.molcel.2010.05.004.
Zhao Y, Hou Y, Xu Y, Luan Y, Zhou H, Qi X, et al. A compendium and comparative epigenomics analysis of cis-regulatory elements in the pig genome. Nat Commun. 2021;12(1):2217. https://doi.org/10.1038/s41467-021-22448-x.
Yang Y, Fan X, Yan J, Chen M, Zhu M, Tang Y, et al. A comprehensive epigenome atlas reveals DNA methylation regulating skeletal muscle development. Nucleic Acids Res. 2021;49(3):1313–29. https://doi.org/10.1093/nar/gkaa1203.
Chen N. Using RepeatMasker to identify repetitive elements in genomic sequences. Curr Protoc Bioinformatics. 2004. https://doi.org/10.1002/0471250953.bi0410s25 Chapter 4:Unit 4.10.
Favorov A, Mularoni L, Cope LM, Medvedeva Y, Mironov AA, Makeev VJ, et al. Exploring massive, genome scale datasets with the GenometriCorr package. PLoS Comput Biol. 2012;8(5):e1002529. https://doi.org/10.1371/journal.pcbi.1002529.
Miao B, Fu S, Lyu C, Gontarz P, Wang T, Zhang B. Tissue-specific usage of transposable element-derived promoters in mouse development. Genome Biol. 2020;21(1). https://doi.org/10.1186/s13059-020-02164-3.
Roux S, Paez-Espino D, Chen IA, Palaniappan K, Ratner A, Chu K, et al. IMG/VR v3: an integrated ecological and evolutionary framework for interrogating genomes of uncultivated viruses. Nucleic Acids Res. 2021;49(D1):D764–D75. https://doi.org/10.1093/nar/gkaa946.
Katoh K, Misawa K, Ki K, Miyata T. MAFFT: a novel method for rapid multiple sequence alignment based on fast Fourier transform. Nucleic Acids Res. 2002;30(14):3059–66.
FigTree. http://tree.bio.ed.ac.uk/software/figtree. Accessed 17 June 2022.
Gorkin DU, Barozzi I, Zhao Y, Zhang Y, Ren B. Author correction: an atlas of dynamic chromatin landscapes in mouse fetal development. Nature. 2021;589(7842):1.
Zhang J, Zhang Y, Gong H, Cui L, Ma J, Chen C, et al. Landscape of loci and candidate genes for muscle fatty acid composition in pigs revealed by multiple population association analysis. Front Genet. 2019;10:1067. https://doi.org/10.3389/fgene.2019.01067.
Ko JY, Oh S, Yoo KH. Functional enhancers as master regulators of tissue-specific gene regulation and cancer development. Mol Cell. 2017;40(3):169–77. https://doi.org/10.14348/molcells.2017.0033.
Iyer R, Jenkinson CP, Vockley JG, Kern RM, Grody WW, Cederbaum S. The human arginases and arginase deficiency. J Inherit Metab Dis. 1998;21 Suppl 1:86–100. https://doi.org/10.1023/a:1005313809037.
Chang CH, Zanini M, Shirvani H, Cheng JS, Yu H, Feng CH, et al. Atoh1 controls primary cilia formation to allow for SHH-triggered granule neuron progenitor proliferation. Dev Cell. 2019;48(2):184–99.e5. https://doi.org/10.1016/j.devcel.2018.12.017.
Stone NR, Gifford CA, Thomas R, Pratt KJB, Samse-Knapp K, Mohamed TMA, et al. Context-specific transcription factor functions regulate epigenomic and transcriptional dynamics during cardiac reprogramming. Cell Stem Cell. 2019;25(1):87–102.e9. https://doi.org/10.1016/j.stem.2019.06.012.
Halstead MM, Ma X, Zhou C, Schultz RM, Ross PJ. Chromatin remodeling in bovine embryos indicates species-specific regulation of genome activation. Nat Commun. 2020;11(1):4654. https://doi.org/10.1038/s41467-020-18508-3.
Wu Y, Qi T, Wang H, Zhang F, Zheng Z, Phillips-Cremins JE, et al. Promoter-anchored chromatin interactions predicted from genetic analysis of epigenomic data. Nat Commun. 2020;11(1):2061. https://doi.org/10.1038/s41467-020-15587-0.
Javierre BM, Burren OS, Wilder SP, Kreuzhuber R, Hill SM, Sewitz S, et al. Lineage-specific genome architecture links enhancers and non-coding disease variants to target gene promoters. Cell. 2016;167(5):1369–84.e19. https://doi.org/10.1016/j.cell.2016.09.037.
Finberg KE, Heeney MM, Campagna DR, Aydinok Y, Pearson HA, Hartman KR, et al. Mutations in TMPRSS6 cause iron-refractory iron deficiency anemia (IRIDA). Nat Genet. 2008;40(5):569–71. https://doi.org/10.1038/ng.130.
Jin P, Qin S, Chen X, Song Y, Li-Ling J, Xu X, et al. Evolutionary rate of human tissue-specific genes are related with transposable element insertions. Genetica. 2013;140(10-12):513–23. https://doi.org/10.1007/s10709-013-9700-2.
Trizzino M, Kapusta A, Brown CD. Transposable elements generate regulatory novelty in a tissue-specific fashion. BMC Genomics. 2018;19(1):468. https://doi.org/10.1186/s12864-018-4850-3.
Bendinelli M, Matteucci D, Friedman H. Retrovirus-induced acquired Immunodeficiencies. In: Advances in cancer research. 1985;45:125–81. https://doi.org/10.1016/S0065-230X(08)60268-7.
Wegman-Points LJ, Teoh-Fitzgerald ML, Mao G, Zhu Y, Fath MA, Spitz DR, et al. Retroviral-infection increases tumorigenic potential of MDA-MB-231 breast carcinoma cells by expanding an aldehyde dehydrogenase (ALDH1) positive stem-cell like population. Redox Biol. 2014;2:847–54. https://doi.org/10.1016/j.redox.2014.06.006.
York JD. Regulation of nuclear processes by inositol polyphosphates. Biochim Biophys Acta. 2006;1761(5-6):552–9. https://doi.org/10.1016/j.bbalip.2006.04.014.
Dick RA, Zadrozny KK, Xu C, Schur FKM, Lyddon TD, Ricana CL, et al. Inositol phosphates are assembly co-factors for HIV-1. Nature. 2018;560(7719):509–12. https://doi.org/10.1038/s41586-018-0396-4.
Verma Y, Mehra U, Pandey DK, Kar J, Perez-Martinez X, Jana SS, et al. MRX8, the conserved mitochondrial YihA GTPase family member, is required for de novo Cox1 synthesis at suboptimal temperatures in Saccharomyces cerevisiae. Mol Biol Cell. 2021;32(21):ar16. https://doi.org/10.1091/mbc.E20-07-0457.
Sato Y, Tate H, Yoshizawa F, Sato Y. Data on the proliferation and differentiation of C2C12 myoblast treated with branched-chain ketoacid dehydrogenase kinase inhibitor. Data Brief. 2020;31:105766. https://doi.org/10.1016/j.dib.2020.105766.
Duncan DM, Kiefel P, Duncan I. Mutants for drosophila Isocitrate dehydrogenase 3b are defective in mitochondrial function and larval cell death. G3 (Bethesda). 2017;7(3):789–99. https://doi.org/10.1534/g3.116.037366.
Gonzalez-Menendez P, Romano M, Yan H, Deshmukh R, Papoin J, Oburoglu L, et al. An IDH1-vitamin C crosstalk drives human erythroid development by inhibiting pro-oxidant mitochondrial metabolism. Cell Rep. 2021;34(5):108723. https://doi.org/10.1016/j.celrep.2021.108723.
Kofler M, Heuer K, Zech T, Freund C. Recognition sequences for the GYF domain reveal a possible spliceosomal function of CD2BP2. J Biol Chem. 2004;279(27):28292–7. https://doi.org/10.1074/jbc.M402008200.
Orozco D, Edbauer D. FUS-mediated alternative splicing in the nervous system: consequences for ALS and FTLD. J Mol Med (Berl). 2013;91(12):1343–54. https://doi.org/10.1007/s00109-013-1077-2.
Echevarria-Andino ML, Allen BL. The hedgehog co-receptor BOC differentially regulates SHH signaling during craniofacial development. Development. 2020;147(23):dev189076. https://doi.org/10.1242/dev.189076.
Katoh M, Katoh M. Molecular genetics and targeted therapy of WNT-related human diseases (review). Int J Mol Med. 2017;40(3):587–606. https://doi.org/10.3892/ijmm.2017.3071.
Lee JM, Lee H, Kang S, Park WJ. Fatty acid Desaturases, polyunsaturated fatty acid regulation, and biotechnological advances. Nutrients. 2016;8(1):23. https://doi.org/10.3390/nu8010023.
Xie D, Fu Z, Wang S, You C, Monroig Ó, Tocher DR, et al. Characteristics of the fads2 gene promoter in marine teleost Epinephelus coioides and role of Sp1-binding site in determining promoter activity. Sci Rep. 2018;8(1):5305. https://doi.org/10.1038/s41598-018-23668-w.
Mroz TL, Eves-van den Akker S, Bernat A, Skarzynska A, Pryszcz L, Olberg M, et al. Transcriptome analyses of mosaic (MSC) mitochondrial mutants of cucumber in a highly inbred nuclear background. G3 (Bethesda). 2018;8(3):953–65. https://doi.org/10.1534/g3.117.300321.
Donati G, Proserpio V, Lichtenberger BM, Natsuga K, Sinclair R, Fujiwara H, et al. Epidermal Wnt/beta-catenin signaling regulates adipocyte differentiation via secretion of adipogenic factors. Proc Natl Acad Sci U S A. 2014;111(15):E1501–9. https://doi.org/10.1073/pnas.1312880111.
Gillespie M, Jassal B, Stephan R, Milacic M, Rothfels K, Senff-Ribeiro A, et al. The reactome pathway knowledgebase 2022. Nucleic Acids Res. 2022;50(D1):D687–D92. https://doi.org/10.1093/nar/gkab1028.
Hernandez-Garcia CM, Finer JJ. Identification and validation of promoters and cis-acting regulatory elements. Plant Sci. 2014;217-218:109–19. https://doi.org/10.1016/j.plantsci.2013.12.007.
Visel A, Rubin EM, Pennacchio LA. Genomic views of distant-acting enhancers. Nature. 2009;461(7261):199–205. https://doi.org/10.1038/nature08451.
Zhu F, Farnung L, Kaasinen E, Sahu B, Yin Y, Wei B, et al. The interaction landscape between transcription factors and the nucleosome. Nature. 2018;562(7725):76–81. https://doi.org/10.1038/s41586-018-0549-5.
Huo Y, Li S, Liu J, Li X, Luo XJ. Functional genomics reveal gene regulatory mechanisms underlying schizophrenia risk. Nat Commun. 2019;10(1):670. https://doi.org/10.1038/s41467-019-08666-4.
Ong CT, Corces VG. Enhancer function: new insights into the regulation of tissue-specific gene expression. Nat Rev Genet. 2011;12(4):283–93. https://doi.org/10.1038/nrg2957.
Zhu Y, Zhou Z, Huang T, Zhang Z, Li W, Ling Z, et al. Mapping and analysis of a spatiotemporal H3K27ac and gene expression spectrum in pigs. Sci China Life Sci. 2022. https://doi.org/10.1007/s11427-021-2034-5.
Smith AD, Sumazin P, Zhang MQ. Tissue-specific regulatory elements in mammalian promoters. Mol Syst Biol. 2007;3:73. https://doi.org/10.1038/msb4100114.
Mifsud B, Tavares-Cadete F, Young AN, Sugar R, Schoenfelder S, Ferreira L, et al. Mapping long-range promoter contacts in human cells with high-resolution capture hi-C. Nat Genet. 2015;47(6):598–606. https://doi.org/10.1038/ng.3286.
Schoenfelder S, Furlan-Magaril M, Mifsud B, Tavares-Cadete F, Sugar R, Javierre BM, et al. The pluripotent regulatory circuitry connecting promoters to their long-range interacting elements. Genome Res. 2015;25(4):582–97. https://doi.org/10.1101/gr.185272.114.
Sanyal A, Lajoie BR, Jain G, Dekker J. The long-range interaction landscape of gene promoters. Nature. 2012;489(7414):109–13. https://doi.org/10.1038/nature11279.
Pehrsson EC, Choudhary MNK, Sundaram V, Wang T. The epigenomic landscape of transposable elements across normal human development and anatomy. Nat Commun. 2019;10(1):5640. https://doi.org/10.1038/s41467-019-13555-x.
Senft AD, Macfarlan TS. Transposable elements shape the evolution of mammalian development. Nat Rev Genet. 2021;22(11):691–711. https://doi.org/10.1038/s41576-021-00385-1.
Jacques PE, Jeyakani J, Bourque G. The majority of primate-specific regulatory sequences are derived from transposable elements. PLoS Genet. 2013;9(5):e1003504. https://doi.org/10.1371/journal.pgen.1003504.
Sundaram V, Cheng Y, Ma Z, Li D, Xing X, Edge P, et al. Widespread contribution of transposable elements to the innovation of gene regulatory networks. Genome Res. 2014;24(12):1963–76. https://doi.org/10.1101/gr.168872.113.
Hutchins AP, Pei D. Transposable elements at the center of the crossroads between embryogenesis, embryonic stem cells, reprogramming, and long non-coding RNAs. Sci Bull (Beijing). 2015;60(20):1722–33. https://doi.org/10.1007/s11434-015-0905-x.
Garcia-Montojo M, Doucet-O'Hare T, Henderson L, Nath A. Human endogenous retrovirus-K (HML-2): a comprehensive review. Crit Rev Microbiol. 2018;44(6):715–38. https://doi.org/10.1080/1040841X.2018.1501345.
Mager DL, Stoye JP. Mammalian endogenous retroviruses. Microbiol Spectr. 2015;3(1):MDNA3-0009-2014. https://doi.org/10.1128/microbiolspec.MDNA3-0009-2014.
Xiang R, Berg IVD, MacLeod IM, Hayes BJ, Prowse-Wilkins CP, Wang M, et al. Quantifying the contribution of sequence variants with regulatory and evolutionary significance to 34 bovine complex traits. Proc Natl Acad Sci U S A. 2019;116(39):19398–408. https://doi.org/10.1073/pnas.1904159116.
Fang L, Liu S, Liu M, Kang X, Lin S, Li B, et al. Functional annotation of the cattle genome through systematic discovery and characterization of chromatin states and butyrate-induced variations. BMC Biol. 2019;17(1):68. https://doi.org/10.1186/s12915-019-0687-8.
Zhao R, Tian L, Zhao B, Sun Y, Cao J, Chen K, et al. FADS1 promotes the progression of laryngeal squamous cell carcinoma through activating AKT/mTOR signaling. Cell Death Dis. 2020;11(4):272. https://doi.org/10.1038/s41419-020-2457-5.
We are grateful to colleagues in State Key Laboratory of Pig Genetic Improvement and Production Technology, Jiangxi Agricultural University for sample collection.
This work was supported by National Key Research and Development Program of China (2021YFF1000601).
Ethics approval and consent to participate
All of the procedures involving animals are in compliance with the care and use guidelines of experimental animals established by the Ministry of Science and Technology of China. The Ethics Committee of Jiangxi Agricultural University approved this study.
Consent for publication
The authors declare that they have no competing interests.
Summary information on the ATAC-seq samples investigated in this study. Table S2. Summary information on the RNA-seq samples investigated in this study. Table S3. The information of download data. Table S4. Comparison with peaks of previous study designed by zhou et al. Table S5. Motif enrichment analysis of shared peaks. Table S6. The information of PERV overlapped peaks. Table S7. The information of GWAS SNPs which located in peak regions. Table S8. The information of virus data. Table S9. The information of virus data with sequence homology to shared open TEs. Table S10. GO and KEGG enrichment analysis of conserved accessible TEs homologous to viral sequences.
The analysis process and characteristics of peaks. (a) and (b) The ATAC-seq signal enrichment around TSSs and insert size distribution of ATAC-seq for 2 samples (brain_sample1 and brain_sample2). (c) Proportion of tissue-specific peaks that were distal peaks or proximal peaks. (d) Distribution of peak size obtained in this study. Fig. S2. Identifying the functional pathway and tissue specificity via RNA-seq. (a) Hierarchical clustering using gene expression. (b) Functional enrichment of tissue-specific expression genes for each tissue based on Gene Ontology (GO) biological processes (left). Distribution of the number of tissue-specific genes (right). (c) Representative examples of heart specific gene which is highly expressed in heart tissue. (d) Bar plot showing representative biological processes enriched in target genes of the system specific peaks and conserved peaks. Different colors refer to different systems, and light yellow indicates biological processes enriched in target genes of conserved peaks. (e) Bar plot showing peak intensity of liver specific peaks which located close to ARG1 (a protein coding gene expressed primarily in the liver and involved in the urea cycle). (f) Functional enrichment of genes for each tissue based on GO biological processes. The columns represent 20 tissues. The rows represent GO terms in each tissue (upper). Representative examples of enriched biological processes in target genes of heart specific peaks (bottom). Fig. S3. Hierarchical clustering inferred from peak intensity and gene expression. (a) Hierarchical clustering using peak intensity. The top 100,000 peaks with high intensity were used for the analysis. (b) Hierarchical clustering using gene expression. The top 5000 highly variable genes were used for the analysis. Fig. S4. The flowchart and outcomes from the correlation between peak intensity and gene expression.
About this article
Cite this article
Jiang, T., Ling, Z., Zhou, Z. et al. Construction of a transposase accessible chromatin landscape reveals chromatin state of repeat elements and potential causal variant for complex traits in pigs. J Animal Sci Biotechnol 13, 112 (2022). https://doi.org/10.1186/s40104-022-00767-3
- Chromatin accessibility
- Tissue specific
- Transposable elements