Genome-wide association study for intramuscular fat content in Chinese Lulai black pigs

Objective Intramuscular fat (IMF) content plays an important role in meat quality. Identification of single nucleotide polymorphisms (SNPs) and genes related to pig IMF, especially using pig populations with high IMF content variation, can help to establish novel molecular breeding tools for optimizing IMF in pork and unveil the mechanisms that underlie fat metabolism. Methods We collected muscle samples of 453 Chinese Lulai black pigs, measured IMF content by Soxhlet petroleum-ether extraction method, and genotyped genome-wide SNPs using GeneSeek Genomic Profiler Porcine HD BeadChip. Then a genome-wide association study was performed using a linear mixed model implemented in the GEMMA software. Results A total of 43 SNPs were identified to be significantly associated with IMF content by the cutoff p<0.001. Among these significant SNPs, the greatest number of SNPs (n = 19) were detected on Chr.9, and two linkage disequilibrium blocks were formed among them. Additionally, 17 significant SNPs are mapped to previously reported quantitative trait loci (QTLs) of IMF and confirmed previous QTLs studies. Forty-two annotated genes centering these significant SNPs were obtained from Ensembl database. Overrepresentation test of pathways and gene ontology (GO) terms revealed some enriched reactome pathways and GO terms, which mainly involved regulation of basic material transport, energy metabolic process and signaling pathway. Conclusion These findings improve our understanding of the genetic architecture of IMF content in pork and facilitate the follow-up study of fine-mapping genes that influence fat deposition in muscle.


INTRODUCTION
Pork is an important dietary source for humans, accounting for about 40% of the world' s meat consumption. Its quality has a great influence on consumer preferences. It is generally accepted that a high level of marbling or intramuscular fat (IMF) content has a positive influence on the eating quality of pork [1]. Over the last decades, pig selection has mainly focused on growth traits and lean meat rate. Due to the reverse correlation between lean meat rate and IMF con tent, current commercial pigs deposit less fat in backfat as well as muscle at market weights than native breeds. Therefore, the pork industry is greatly interested in augmenting the IMF content to satisfy the eating experience of the consumer.
IMF content displays substantial genetic variations between breeds and even lines within breed [2,3]. Although large differences exist, improvement by normal breeding programs is challenging due to the difficulty of their phenotypic measurement. The measurement of the trait is not only timeconsuming and costly, but samples are collected only after slaughtering the pigs. Genetic selection is generally believed to be a promi sing approach to improve the trait. Thus, it is worthwhile to study the genetic mechanisms underlying IMF content, which would allow us to establish novel molecular breeding tools for its optimization in pork. Furthermore, since the pig is an excellent animal model to study various human diseases [4], dissection of the genetic architecture of fat deposition in por cine muscle can also provide information for unveiling the mechanisms that underlie fat metabolism in humans.
Over the past two decades, using microsatellite markers, many quantitative trait loci (QTLs) that contribute to the IMF content [5,6] have been detected in pigs. However, due to the low density of microsatellite markers, these QTLs represent large chromosomal regions, and further finemapping studies are necessary to find the causal mutation responsible for these effects. With the development of highthroughput single nucleo tide polymorphism (SNP) genotyping methods, pigspecific highdensity genotyping chips have become available. Ge nomewide association studies (GWAS), using the chips, makes it possible to detect the genetic variants underlying the traits of interest. Recently, several GWAS have made substantial progresses in identifying genetic factors associated with or underlying a variety of traits, including IMF content [710]. However, compared with other traits, studies dissecting varia tions for IMF content are still rare. Therefore, research should further investigate the genetic mechanism of IMF content, es pecially using pig populations with high IMF content variation.
Lulai black pigs are a newly developed breed which originated from a cross between Chinese Laiwu pigs (a typical Chinese indigenous breed) and Yorkshire pigs, and then artificially selected for more than eight generations. This breed, similar as Laiwu pigs, is known for its high IMF content [11,12]. Here, to unveil the genetic architecture of IMF content of pork, we measured IMF content of Lulai black pigs, and conducted GWAS for the trait.

Ethics statement
The whole study protocols for animal rearing and slaughter were reviewed and approved by the guideline (IACC20060101, 1 January 2006) of the Institutional Animal Care and Use Com mittee of Institute of Animal Science and Veterinary Medicine, Shandong Academy of Agricultural Sciences.

Animal resource
Four hundred and eighty Lulai black pigs were purchased from nucleus farms of the breed in Laiwu, Shandong province. These pigs originated from 12 sires and 95 dams, representing off spring of all boars and most sows in the farm. Boars were castrated before day 60. All pigs had ad libitum access to a corn soybean based diet containing 13.3% crude protein, 12.43 MJ/kg digestible energy and 0.80% lysine under standard manage ment conditions. Out of the 480 pigs, 453 pigs, including 289 males and 164 females, grew well and were slaughtered when their weights were in the range of 70 to 100 kg. About 200 g longissimus dorsi muscle was collected from the last rib of each pig and stored at -20°C for further measurement.

Measurement of intramuscular fat content
After removing adipose and connective tissue, the longissimus dorsi muscle was ground and analyzed for moisture by rou tine ovendrying method. Then, samples that had already been analyzed for moisture were ground into powder, which were further used for measuring IMF content. IMF content was determined using the Soxhlet petroleumether extraction method and expressed as the weight percentage of wet muscle tissue.

Single nucleotide polymorphism array genotyping and quality control
A standard phenol/chloroform method was used to extract genomic DNA from muscle samples for all the pigs. The quality and concentration of genomic DNA fulfilled the require ments for the Illumina SNP genotyping platform. Samples were genotyped with GeneSeek Genomic Profiler Porcine HD BeadChip (Neogen Corporation, Lansing, MI, USA) according to the manufacturer's protocol and genotypes were called using GenomeStudio (version 2011.1; Illumina Inc., San Diego, CA, USA).
Quality controls were implemented by Plink v1.07 [13] ac cording to the following filtering. Firstly, SNPs with GC score below 0.2 were considered failed genotypes, and Fimpute [14] was used to impute the failed loci. Then, SNPs were excluded from the data set if i) SNPs without genome location based on the pig genome assembly Sus scrofa Build 10.2 or located on Y chromosome, ii) its minor allele frequency was <5%, or iii) it departed severely from Hardy-Weinberg equilibrium with a pvalue lower than 10 6 .

Statistical analysis
The association analysis was performed using the software GEMMA [15]. SNPs were individually tested for association with IMF content using the following linear mixed model: where Y is the vector of phenotypic values of IMF content; X is the incidence matrices of fixed effects including population mean, sex and weight, and f is the vector for these fixed effects; Z is the identity matrix, and p is the vector for polygenic effects with distribution of City, State, USA) according to the manufacturer's protocol and genotypes were called using GenomeSt  Quality controls were implemented by Plink v1.07 [13] according to the following filtering. Firstly, SNPs 3 GC score below 0.2 were considered failed genotypes, and Fimpute [14] was used to impute the failed loci. T 4 SNPs were excluded from the data set if i) SNPs without genome location based on the pig genome assembly 5 scrofa Build 10.2 or located on Y chromosome, ii) its minor allele frequency was <5%, or iii) it departed sev 6 from Hardy-Weinberg equilibrium with a p-value lower than 10 6 . The association analysis was performed using the software GEMMA [15]. SNPs were individually tested 10 association with IMF content using the following linear mixed model: where Y is the vector of phenotypic values of IMF content; X is the incidence matrices of fixed effects inclu 15 population mean, sex and weight, and f is the vector for these fixed effects; Z is the identity matrix, and p i at the tested SNP, and g is the additive genetic effect of the tested SNP; and e is the vector of residuals.

20
, where G is the genomic rela tionship matrix that was constructed based on genomewide SNP markers and City, State, USA) according to the manufacturer's protocol and genotypes were called using GenomeStudio  Quality controls were implemented by Plink v1.07 [13] according to the following filtering. Firstly, SNPs with 3 GC score below 0.2 were considered failed genotypes, and Fimpute [14] was used to impute the failed loci. Then, 4 SNPs were excluded from the data set if i) SNPs without genome location based on the pig genome assembly Sus 5 scrofa Build 10.2 or located on Y chromosome, ii) its minor allele frequency was <5%, or iii) it departed severely 6 from Hardy-Weinberg equilibrium with a p-value lower than 10 6 . The association analysis was performed using the software GEMMA [15]. SNPs were individually tested for 10 association with IMF content using the following linear mixed model: where Y is the vector of phenotypic values of IMF content; X is the incidence matrices of fixed effects including 15 population mean, sex and weight, and f is the vector for these fixed effects; Z is the identity matrix, and p is the vector of 0, 1, 2 values, where 0 and 2 correspond to the two homozygous genotypes and 1 represents the heterozygous genotype at the tested SNP, and g is the additive genetic effect of the tested SNP; and e is the vector of residuals. Firstly, permutation was adopted to adjust for multiple test ing for the number of SNPs tested through constructing an empirical distribution of the test statistic under the null hypo thesis. Specifically, the phenotypic observations of each trait were randomly shuffled 10,000 times, and the empirical criti cal values for chromosomewise and genomewise significance were determined by the 95th percentile of the highest test statistic over the 10,000 permutation replicates. Then, to detect significantly associated SNPs in a study with low power, a liberal cutoff (raw pvalue and p<0.001) was used as the cutoff for statistical significance of SNPs. The QQ plot and linkage disequilibrium (LD) analysis of the significant SNPs were con structed using the R software and Haploview [16], respectively.

Characterization of candidate genes
To identify plausible candidate genes, we retrieved annotated genes within a 50 kb region centering each significant SNP from Ensembl Genes 89 Database using BioMart (http://asia. ensembl.org/biomart/martview/). Functional analyses, includ ing gene ontology (GO) and reactome pathway enrichment, were performed using PANTHER 13.1 (Feb. 2018) (http:// www.pantherdb.org/) to reveal the potential biological function of annotated genes. Furthermore, QTLs of IMF were down loaded from the AnimalQTL database (www.animalgenome. org, Release 34, Dec 21, 2017), and compared with those sig nificant SNPs based on the putative location of these QTLs.

Summary of phenotypic and single nucleotide polymorphism data
The mean and variation coefficient of the phenotypic observa tions of the IMF content were 5.18% and 65.07%, respectively. The IMF content increased significantly with the weight in crement (p<0.05), with the average of 4.80% for weight range 70 to 80 kg, 4.94% for 80 to 90 kg and 5.93% for 90 to 100 kg, respectively. On the other hand, out of the 68,516 SNPs geno typed, a final set of 49,383 SNPs passed the filters and was retained in the data set for further statistical analyses. The dis tribution of SNP markers and marker density on autosomes are shown in Figure 1. The number of SNPs on every chro mosome varies from 1,403 on Chr.18 to 4,728 on Chr.1, and the adjacent distance ranges from 32.00 Kb on Chr.12 to 66.65 Kb on Chr.1. Compared with the common used Porcine SNP60 BeadChip in previous study [17,18], this porcine BeadChip has more usable, and evenly distributed SNPs.

Significantly associated single nucleotide polymorphisms identified in genome-wide association study
Linear mixed models implemented in GEMMA [15] were used in the study to perform the association analysis of IMF content and SNPs. To adjust the effect of inbreeding, a genetic relationship matrix was constructed and used during the anal ysis. The profile plot of the p values (in terms of -log 10 (p)) for IMF content is shown in Figure 2. QQ plot of the tested SNPs is provided in Supplementary Figure S1, and the SNPs tested show no evidence of overall systematic bias.
By the cutoff p<0.001, a total of 43 SNPs from 18 autosomes were identified to be significantly associated with IMF con tent, and the details of the significant SNPs are given in Table  1. Among these significant SNPs, the greatest number (n = 19) was detected on Chr.9. The LD block analysis ( Figure 3) by Haploview 4.2 [16] indicated that there are two LD blocks formed among these significant SNPs on Chr.9. Especially in the larger LD, five SNPs are in the complete linkage.

Functional characterization of significant single nucleotide polymorphisms
Among the 43 significant SNPs identified in the study, there are 15 significant SNPs located in the protein coding sequences, while the others are mapped in the noncoding sequences. It is consistent with the previous studies that the clear majority of significantly associated SNPs reside outside the proteincod ing regions, underscoring the important function of noncoding sequences in the genome [19]. Association studies essentially identify a genomic location related to phenotypic traits but provide little functional sig nificance, especially those significant SNPs occurring in non coding sequences. To identify possible candidate genes related with IMF content, a total of 42 annotated genes (Supplemen tary Table S1) centering the significant SNPs detected were obtained from Ensembl database. Overrepresentation test of reactome pathways and GO terms using PANTHER revealed some enriched reactome pathways and GO terms, which are provided in Supplementary Table S2 and S3. Most of the terms and pathways were involved regulation of basic material trans port, energy metabolic process and signaling pathway.
To test whether the significant SNPs were mapping into QTLs of IMF detected in previous studies, we downloaded QTLs of IMF from the pig QTL database and compared by their physical positions. Consequently, these are 17 significant SNPs (Supplementary Table S4) mapped to 6 QTLs of IMF.

DISCUSSION
In the present study, we measured IMF content of 453 Lulai black pigs. Consistent with previous reports [11,20], our results suggested that Lulai black pigs had high capability in deposit ing IMF, with the average of 5.18%. The IMF content increased significantly with the weight increment, which was also reported in the previous studies [20,21], indicating IMF is deposited at a greater rate when pigs get older. Furthermore, the popula tion variation of IMF content was rather large, with variation coefficient of 65.07%, suggesting genotypes of SNPs affecting IMF content are still segregating in the population.
One of the major challenges in GWAS is multiple hypothesis testing. Permutation test is the gold standard in multiple test ing corrections [22]. In our study, permutation was firstly used to adjust multiple testing. However, no significant SNPs were identified to be associated with IMF content at the genome wide significance level, and only three SNPs (green dots in Figure 1) were found at chromosomewide significance. The specific samples used in this study may be one possible reason leading to the few significant SNPs detected. As mentioned above, Lulai black pigs originated from a cross between typical Chinese Laiwu pigs and Yorkshire pigs. So, they have a com plicated genetic background, and strict adjustments were implemented to correct potential effects of population struc ture in the analyses using GEMMA, which may decrease the significant SNPs identified. The other reason may be the smaller sample size used in our study. In our study, the sample size used (453 Lulai pigs) is small, which reduces the power to detect SNPs related to IMF contents. Consequently, to detect sig nificantly associated SNPs in the study, a liberal cutoff (raw p value and p<0.001) was taken instead of generally used multiple testing correction method. This might cause some significant SNPs to be false positives. But, as proved by map ping with previously reported QTLs and functional analyses of annotated genes centering significant SNP, the SNPs de tected by liberal p value in this study can be suggested as SNPs affecting IMF content.
Furthermore, part (n = 17) of the significant SNPs identified are mapped to previously reported QTLs of IMF. However, compared with the previous GWAS, our results do not repli cate the significant SNPs identified by them. The different SNP chips and genome assembly used may be possible reasons. However, even using the same SNP chips, a lack of reproduci bility of genetic associations has been frequently observed in GWAS [23]. For instance, using Illumina PorcineSNP60 Bead chip, Luo et al [8] identified 40 significant SNPs for IMF in a porcine Large White×Minzhu intercross population, while Ma et al [24] detected 7 and 25 significant SNPs for IMF in the Duroc×Erhualian F2 animals and Chinese Sutai pigs. Of note, no common SNPs for IMF were found in the two pop ulations, Duroc×Erhualian F2 animals identified in Large White×Minzhu intercross population was confirmed by Ma et al [24]. One reason for the lower reproducibility is that GWAS can generate some false positive and false negative associa tions, although sophisticated statistical tests have been proposed to reduce false positives. Another possible reason is that IMF content is complex trait, for which genegene and geneenvi ronment interactions possibly have different genetic effects across population. Therefore, further confirmation of these associations in larger independent populations would be war ranted to test if the significant SNPs identified in this study are populationspecific before their incorporation into breeding programs.