INTRODUCTION
Vertical transmission of genetic information in diploid organisms through sexual reproduction ensures an equal genetic contribution from male and female parents for the autosomes. Accordingly, gene expression in offspring is expected to represent maternal and paternal alleles equally. However, a subset of genes deviates from this expected equal representation of parental alleles and preferentially expresses the allele from a single parent, which is referred to as allele-specific expression (ASE) or allele-biased expression. The degree of expression bias varies from complete monoallelic expression to preferential overexpression of an allele from a single parent [1]. Additionally, the pattern of ASE can be parent-of-origin dependent, namely, genomic imprinting [2], or can take the form of autosomal random monoallelic expression (RMAE) [3].
The mechanisms underlying imbalanced allelic expression are diverse and include DNA methylation, histone modification, and the influence of cis- and trans-regulatory elements [4]. ASE can significantly affect the phenotypes of individual organisms. For example, disruption of imprinting control elements results in altered gene expression and phenotypic abnormalities [5]. Therefore, understanding the nature of ASE associated with epigenetic regulation and identifying the loci involved are important in animal genetics and developmental biology.
Genomic imprinting has been observed in therian mammals [2]. More than 180 imprinted genes have been reported in mammals to date, and most of these results came from humans and mice [6]. For livestock species, most studies have been comparative analyses of imprinted genes previously identified in humans and mice [7]. Approximately 20 genes have been confirmed to be imprinted in pigs [6,7]. Therefore, the discovery of ASE and the subsequent understanding of species variation in livestock have been limited.
In animal breeding, genomic imprinting could play an important role in phenotypes related to economically important traits such as body composition [8]. In an attempt to further understand the mechanisms of epigenetic regulation such as imprinting in pigs, the methylation pattern of the pig genome has been analyzed [9,10]. Although many genes imprinted in other species may be conserved in pigs [7], direct genome-wide investigation of ASE in pigs could contribute significantly to illuminating the characteristics of genomic imprinting in this species. However, tracing the parent of origin of expressed genes at the genome level has been a great challenge in outbred animals [11].
The list of genes associated with allelic imbalance in gene expression could be larger than currently identified. High-throughput technologies for genome and transcriptome analyses have been successfully employed to better understand ASE at the genome level [12,13]. High-throughput analysis of the neocortex transcriptome from reciprocal crosses of two different strains of mice showed that far more genes displayed differential allelic expression than expected [12]. A similar study was carried out in pigs without parental genome information [14]; however, studies using both whole genome sequences of parents and the transcriptomes of F1 offspring from reciprocal crosses have not been reported in pigs.
In this study, we sought to determine the expression level of each parental allele in RNA-seq data from F1 offspring of a pair of reciprocal crosses, based on the whole genome sequencing results of the parents. We identified nine genes with allele-biased expression, demonstrating both the potential and the limitations of genome-wide identification of genomic imprinting using the reciprocal cross design in outbred animals. Further studies on the newly identified genes showing allele-biased expression should expand our current understanding of ASE in the porcine genome, including genomic imprinting.
DISCUSSION
Analysis of genes showing ASE using reciprocal crosses of inbred animals is an effective method for discovering genes associated with genomic imprinting [12,13]. However, the use of similar approaches in outbred animals such as pigs is challenging because of the inherent difficulty in distinguishing the parental origin of any given allele, owing to the presence of segregating multiallelic polymorphisms within a breed [11]. To investigate how efficiently ASE can be detected using the reciprocal cross design in outbred animals and to identify new genes showing ASE, we carried out a pair of reciprocal crosses using KNP and Landrace pigs, determined the parental lineage of alleles, and analyzed the presence of ASE in genes from F1 animals. Because of the low-depth read coverage of the parental genomes (~4×) and the offspring transcriptomes (~15×), genome-wide evaluation of biased allelic expression was not achieved. However, we were able to present several genes showing allele-biased expression, including a well-known imprinted gene, PEG10. We also compared the efficiency of two different read mapping strategies for the bioinformatic determination of ASE at low-depth read coverage in outbred animals.
Discovery of a flipped allelic expression pattern at SNP positions in F1 animals from reciprocal crosses can suggest the presence of allele-biased expression patterns such as genomic imprinting. However, in outbred strains or lines, heterozygous SNP positions are not always informative about allele transmission, and not all SNP positions are heterozygous. Therefore, the parent of origin of a given allele often cannot be resolved, which significantly restricts genetic analyses. It has been suggested that a large sample size (>30 informative individuals) is necessary for efficient evaluation of allele-biased expression using RNA-seq in outbred or semi-inbred species to achieve genome-level coverage [24].
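The logic of the flipped pattern can be illustrated with a minimal sketch. The function names, the read counts, the significance threshold, and the use of an exact binomial test against a 50:50 expectation are all assumptions made for illustration and are not taken from the pipeline used in this study. If the preferred allele switches breed between the two reciprocal crosses, the bias follows the sex of the parent (a parent-of-origin effect such as imprinting); if the same breed allele is preferred in both crosses, the bias follows the allele itself (for example, a cis-regulatory effect).

```python
from math import comb

def binom_two_sided_p(k, n, p=0.5):
    """Exact two-sided binomial p-value: total probability of all
    outcomes that are no more likely than the observed count k."""
    probs = [comb(n, i) * p**i * (1 - p)**(n - i) for i in range(n + 1)]
    threshold = probs[k] * (1 + 1e-9)  # small tolerance for float ties
    return min(1.0, sum(q for q in probs if q <= threshold))

def classify_reciprocal(cross_knp_dam, cross_landrace_dam, alpha=0.05):
    """Classify one SNP from allele read counts in two reciprocal crosses.

    Each argument is a hypothetical (knp_allele_reads, landrace_allele_reads)
    pair. In the first cross the dam is KNP; in the second the dam is Landrace.
    """
    a1, b1 = cross_knp_dam
    a2, b2 = cross_landrace_dam
    p1 = binom_two_sided_p(a1, a1 + b1)
    p2 = binom_two_sided_p(a2, a2 + b2)
    if p1 >= alpha or p2 >= alpha:
        return "no consistent bias"
    knp_high_1 = a1 > b1
    knp_high_2 = a2 > b2
    if knp_high_1 == knp_high_2:
        # Same breed allele preferred regardless of which parent carried it.
        return "breed-of-origin bias"
    # Preferred allele flips with the direction of the cross.
    if knp_high_1:
        return "maternal allele preferentially expressed"
    return "paternal allele preferentially expressed"
```

With made-up counts of (28, 2) in the KNP-dam cross and (3, 27) in the Landrace-dam cross, the preferred allele is the dam's in both cases, so the sketch reports maternal-allele preference. At low read depth, few SNPs would reach significance in both crosses, which is consistent with the limited number of genes that could be evaluated here.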
In this study, to reduce the number of RNA-seq analyses, we analyzed four neocortex transcriptome libraries, each consisting of pooled RNA from three individuals, using 12 F1 animals from KNP×Landrace reciprocal crosses. However, the low read depth for mapped genes in our sequencing results did not allow us to clearly determine the parental origin of alleles in the offspring. Thus, we were only able to use variants that were homozygous in the parents to estimate allele-biased expression. Consequently, only a limited number of genes were evaluated in our results despite the use of whole genome sequences of the parents. Our results also suggest that sequencing individuals separately is likely to provide improved results compared to the analysis of pooled samples.
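The restriction to homozygous variants can be sketched as follows. This is a simplified illustration for a single dam and sire, with a hypothetical function name and genotype encoding; it does not account for pooled libraries or genotype uncertainty at low coverage. A SNP is treated as informative only when both parents are homozygous for different alleles, because every F1 animal is then heterozygous and each allele can be assigned to one parent unambiguously.

```python
def parental_origin(dam_genotype, sire_genotype):
    """Map each allele to its parent of origin, or return None.

    Genotypes are hypothetical two-allele tuples such as ("A", "A").
    The SNP is informative only if both parents are homozygous and
    carry different alleles.
    """
    dam_alleles = set(dam_genotype)
    sire_alleles = set(sire_genotype)
    if len(dam_alleles) != 1 or len(sire_alleles) != 1:
        return None  # at least one parent is heterozygous
    if dam_alleles == sire_alleles:
        return None  # parents share the allele; F1 is homozygous
    (dam_allele,) = dam_alleles
    (sire_allele,) = sire_alleles
    return {dam_allele: "maternal", sire_allele: "paternal"}
```

For example, a dam genotype of ("A", "A") and a sire genotype of ("G", "G") yields a mapping of A to maternal and G to paternal, whereas any heterozygous parent makes the position uninformative under this rule.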
Determination of the parental origin of expressed alleles in F1 individuals from RNA-seq data can be achieved efficiently using bioinformatic analysis tools if parent-specific SNPs are clearly distinguishable. However, variant calling from RNA-seq data is still challenging because of experimental limitations, such as biases from library preparation, low sequencing read depth, and experimental errors, and because of biological variation, such as ASE, splicing variation, and RNA editing [25]. Therefore, the results of variant calling may differ significantly depending on the analysis tools and statistical thresholds used.
To overcome the disadvantage of low sequencing depth, we carried out the bioinformatic analysis in two different ways to determine the breed- or parent-specific SNPs: by mapping the genome sequencing results of each parent individually, or by mapping those of the two parents of the same breed together. The joint read mapping showed an approximately two-fold increase in the number of candidate SNPs available for evaluating allele-biased expression, but the increase was still limited (Figure 3A), suggesting that the number of informative individuals is critical for genome-wide analysis in outbred animals. However, we also noticed SNPs unique to each strategy (Figure 3A). The difference could be due to a bias in SNP calling arising from the difference in read depth between the two strategies.
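The comparison of candidate SNPs between the two mapping strategies amounts to a set comparison, sketched below. The function name and the (chromosome, position) encoding are hypothetical, and the example positions in the usage note are invented for illustration; they are not values from Figure 3A.

```python
def compare_snp_sets(strategy_i_snps, strategy_ii_snps):
    """Split candidate SNPs into those shared by both mapping strategies
    and those unique to each. SNPs are (chromosome, position) tuples."""
    set_i = set(strategy_i_snps)
    set_ii = set(strategy_ii_snps)
    return {
        "shared": set_i & set_ii,
        "only_strategy_i": set_i - set_ii,
        "only_strategy_ii": set_ii - set_i,
    }
```

Calling the function with [("1", 100), ("1", 250)] for strategy I and [("1", 250), ("2", 40), ("3", 75)] for strategy II returns one shared SNP, one SNP unique to strategy I, and two unique to strategy II. SNPs falling in the two unique groups are the ones that would warrant manual inspection of the raw variant calls.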
To understand the difference between the strategies, we manually checked the identified candidate SNPs against the raw variant data files. Most conflicts in SNP calling arose either from false-positive SNPs called at homozygous positions or from failure to detect SNPs at heterozygous positions because of the low read depth (data not shown). However, the error rate was lower and the results were more consistent in strategy I than in strategy II, which involved joint mapping of two parents of the same breed.
We identified nine genes showing allele-biased expression in the neocortex of pigs using the described bioinformatic procedure (Table 3). Among them, PEG10 is a known paternally imprinted gene in both humans and pigs [7], and this gene has been reported to be associated with several malignancies in humans, such as hepatocellular carcinoma and B-cell lymphocytic leukemia [26]. ASE of SLC6A17 and MANBA has also been reported in previous studies investigating other species [13,27]. The protein encoded by SLC6A17 is a member of the SLC6 family of transporters, which are responsible for the presynaptic uptake of neurotransmitters [28]. MANBA encodes beta-mannosidase, which localizes to the lysosome [29]. Three out of seven genes (43%) that we observed to show allele-biased expression in this study were reported previously, indicating that the bioinformatic strategy used in this study is suitable for identifying allele-biased expression in outbred strains.
Although further experimental confirmation through independent breeding experiments is needed to clearly prove the ASE, we suggest a list of new candidate genes for ASE in pigs. However, the number of animals used for reciprocal crosses and the sequencing read depth should be increased to cover a large number of genes in a genome-wide analysis.
NUSAP1 is a nucleolar-spindle-associated protein that plays a role in spindle microtubule organization [30]. However, no information has been available regarding its ASE. The expression patterns of FAM83H, SLC6A17, MANBA, ENSSSCG00000010703, and ENSSSCG00000010719 differed from that of genomic imprinting, which could be explained by cis-regulating expression quantitative trait loci [31] or RMAE [3]. In addition, although statistically less significant, PPFIBP1, which encodes the liprin-beta-1 protein functioning in cell adhesion [32], showed an expression pattern consistent with maternal imprinting (Table 3, Supplementary Table S1).
Taken together, our results showed that the strategy and bioinformatics pipeline used in this study are suitable for the identification of genes showing allele-biased expression from reciprocal crosses of outbred animals with some limitations. Experimental validation of candidate genes and further studies on these genes should provide new information on genomic imprinting in pigs.