INTRODUCTION
Several important evolutionary processes in finite populations, including migration, artificial or natural selection, and genetic drift, lead to non-random association of alleles at two different loci, known as linkage disequilibrium (LD) [1]. Recent genomic methods in animal breeding, such as genome-wide association studies (GWAS) and genomic selection using single nucleotide polymorphism (SNP) data, depend on the extent of LD in the population and its rate of decline with distance between loci. Researchers have applied SNP chips in GWAS [2–4] and genomic selection studies [5–7] in pigs, but have found that most traits of interest are complex and more suitable for genomic selection techniques than for GWAS, which targets significant associations in specific genomic regions [8]. Methods used in animal breeding genetics rely strongly on the extent of LD and on sample size. Therefore, characterization of LD is very important for planning future genomic studies of complex traits relevant to animal breeding.
The LD among loci can provide insights into the evolutionary history of each population through the effective population size (Ne), where Ne is the number of individuals in an idealized population that would yield the same degree of inbreeding as the real population [9]. Thus, we can monitor genetic diversity in each domesticated animal population based on Ne and explain the observed extent and pattern of genetic variation in population genetic terms. Using Ne, we can also predict the loss of genetic variation from a prospective point of view, as well as the accuracy of genomic selection before artificial genomic selection is introduced into domesticated animal breeding. Additionally, we can infer ancestral Ne using the strength of LD at different genetic distances between markers. The pattern of historical Ne in each animal population can increase our understanding of the impact of recent strong artificial selection on population-level genetic variation. If pedigrees are incomplete or unavailable, Ne can still provide information about inbreeding in the populations of interest.
In Korea, several pig breeds are economically significant, including Yorkshire, Landrace, Duroc, and Berkshire. Yorkshire is one of the most important breeds because it has an excellent maternal line index and outnumbers all other maternal line pig breeds in Korean Grand-Grand-Parent (GGP) farms. Patterns of LD in Yorkshire populations in other countries have been characterized previously, and their Ne were predicted using SNP chip data. For example, the extent of LD in multiple pig breeds, including Yorkshire, from the United States, Denmark, and The Netherlands was investigated [10–12]. In addition, Uimari determined the extent of LD and estimated LD-based actual Ne and ancestral Ne of 32 Finnish Yorkshire boars [13]. Although the Korean pig industry is active, Korea has no original Yorkshire breed and relies on significant pig imports. As a result, the Korean Yorkshire population has diverse genetic sources from several countries, so Ne could be an important measurement for both the Korean and global pig industries.
In our current study, LD and Ne in a Korean Yorkshire population were estimated using data generated with the Illumina PorcineSNP60 v2 BeadChip (Illumina Inc., San Diego, CA, USA). We also investigated the ancestral Ne of the population. Together with findings from other studies, our results can inform the establishment and implementation of the most effective animal breeding genomic methods for Korean Yorkshire swine.
DISCUSSION
The four GGP farms in this study were operated by different owners but were connected through shared semen. To show the genetic background of these populations, we produced a principal component analysis plot of PC1 and PC2 (Figure 6). The four groups were clearly distinguishable, but the differences among them were not large, and the variance explained by PC1 and PC2 was very small. We therefore regarded the four representative GGP farms as one Yorkshire population in Korea. Quality control based on the Hardy-Weinberg equilibrium test removed 9,303 SNPs across the 2,470 Yorkshire pigs. We suspect that so many SNPs were removed because of the number of heterozygous genotypes: strong selection in GGP farms would have reduced the effective population size and the number of heterozygous genotypes, which could increase the loss of heterozygosity. Additionally, because our core objective was to estimate Ne of the Yorkshire population in Korea, we needed a single Korean Yorkshire population and high-quality SNPs for that population. The number of SNPs remaining after quality control (33,418) was sufficient for estimating Ne when compared with the Finnish Yorkshire study (Uimari [13]). We therefore considered it appropriate to treat the four GGP farms as one Korean population.
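The Hardy-Weinberg filter discussed above can be sketched as a one-degree-of-freedom chi-square goodness-of-fit test on genotype counts. This is a minimal illustration only: the function name, the example counts, and the significance threshold are hypothetical, and the study may have used an exact test or a different cutoff.

```python
import math

def hwe_chi_square(n_aa, n_ab, n_bb):
    """Return (chi-square, p-value) for a 1-df Hardy-Weinberg test.

    n_aa, n_ab, n_bb are observed genotype counts at one biallelic SNP.
    """
    n = n_aa + n_ab + n_bb
    p = (2 * n_aa + n_ab) / (2 * n)   # frequency of allele A
    q = 1.0 - p
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    observed = (n_aa, n_ab, n_bb)
    # Skip classes with zero expectation (monomorphic SNPs).
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected) if e > 0)
    # Survival function of a chi-square variable with 1 df.
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p_value

# Hypothetical SNP with a heterozygote deficit (assumed threshold: 1e-6).
chi2, p_value = hwe_chi_square(40, 20, 40)
flag_for_removal = p_value < 1e-6
```

A heterozygote deficit of the kind hypothesized for strongly selected GGP herds inflates the statistic, so such SNPs are more likely to be flagged.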
We investigated the extent of LD and changes in Ne of the Korean Yorkshire population based on whole-genome SNP data. The observed LD, measured as r2 between each SNP and its 100 adjacent SNPs, extended over long distances. A previous study used large pedigree datasets and small genomic datasets [13]; instead of pedigree data, we used large-scale genomic data from swine in GGP farms to characterize LD and estimate Ne. Because domesticated pigs have long been subject to strong artificial selection, LD in the Korean Yorkshire population was higher at short genomic distances and more extensive than in human populations [21]. The decline of LD with distance in the Korean Yorkshire population is consistent with that in previously studied pigs [13, 22] and in other domesticated animals [17, 18].
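For readers unfamiliar with the r2 statistic, the following minimal sketch computes it for one SNP pair from phased haplotypes. It assumes alleles coded 0/1 and known phase, which is a simplification: SNP chip genotypes are unphased, and LD software typically estimates haplotype frequencies first. The function name and data are hypothetical.

```python
def r_squared(haplotypes):
    """Squared allelic correlation between two biallelic loci.

    haplotypes: sequence of (allele_at_locus_1, allele_at_locus_2),
    each allele coded 0 or 1.
    """
    n = len(haplotypes)
    p_a = sum(h[0] for h in haplotypes) / n
    p_b = sum(h[1] for h in haplotypes) / n
    p_ab = sum(1 for h in haplotypes if h[0] == 1 and h[1] == 1) / n
    d = p_ab - p_a * p_b                      # disequilibrium coefficient D
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    if denom == 0:
        return float("nan")                   # undefined for monomorphic loci
    return d * d / denom

# Alleles always travel together: complete LD.
complete_ld = r_squared([(1, 1), (1, 1), (0, 0), (0, 0)])
# All four haplotypes equally frequent: no association.
no_ld = r_squared([(1, 1), (1, 0), (0, 1), (0, 0)])
```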
We estimated Ne of Korean Yorkshire swine based on a non-linear regression model that describes the relationship between linkage distance and LD. Estimating Ne using this equation raises difficulties at the limits of the parameter space, because if r2 = 0.0, the estimated Ne is infinite, and if r2 = 1.0, Ne is zero. In this study, we calculated r2 for the 100 SNPs adjacent to each SNP to decrease bias in handling these values. This simplified approach yielded estimates of Ne quite similar to those of previous studies. Another concern related to the relationship between estimated LD and distance between SNPs lies in the accuracy of the porcine reference genome used in this study. The order of and distances between SNPs on the commercial Illumina PorcineSNP60 v2 BeadChip will likely be refined as the reference genome is updated. However, the bias from incorrectly ordered SNPs or wrong distances between SNPs may be minimized by the large number of SNP pairs used in this study, so some over- or underestimation of LD can be tolerated. Because the relationship between genetic and physical distances varies across chromosomes and chromosomal regions, we inferred the Mb/cM ratio per chromosome using physical map positions from the porcine reference genome and from a previous Yorkshire pig study [19]. We then converted physical distances to genetic distances to estimate Ne, and could obtain more reliable Ne estimates with such detailed estimates of genetic distances between SNPs. Finally, one study reported that a limited sample size can bias the estimates of r2 and recommended correcting r2 for sample size n (r2 - 1/(2n)) and using the equation of Sved [15]. However, our sample size was sufficient to correct the estimates of r2. To estimate Ne of Korean Yorkshire pigs, we used an alternative version of this equation further derived by Tenesa [23], which adds a new parameter a to account for mutations. Based on this formula, we set the initial value of parameter a to 2 when estimating parameters with a non-linear regression model in R. Regarding variance heterogeneity of the observed r2, the variance declined with increasing distance between SNPs, which may have affected our estimates of parameter b in Equation (2) (Supplementary Figure S2). A significant negative relationship between chromosome length and estimates of parameter b obtained from a non-linear model has been observed [17], while others have noted a positive relationship in domestic livestock species [24, 25] or did not investigate the direction of the relationship [18]. Because the evolutionary history of each species and breed is different, the relationship between chromosome length and parameter b may also differ among populations. In this study, r2 was calculated for all marker pairs in each bin so that r2 was not affected by chromosome length. These results are in agreement with the Yorkshire LD characterization findings of Uimari [13]. We did not observe a significant relationship between chromosome length and estimates of b in our study population.
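The limiting behaviour and the distance conversion described in this paragraph can be illustrated with a short sketch. It assumes an expectation of the form E[r2] = 1/(a + 4·Ne·c), with c in Morgans and a = 1 for the Sved form; Equation (2) is not reproduced in this section, so that functional form is an assumption. The function names, the Mb/cM ratio, and the r2 values are hypothetical, and solving for Ne pair by pair is a simplification of the non-linear regression actually used.

```python
import math

def morgans_from_bp(distance_bp, mb_per_cm):
    """Convert a physical distance (bp) to Morgans via a per-chromosome Mb/cM ratio."""
    centimorgans = (distance_bp / 1e6) / mb_per_cm
    return centimorgans / 100.0

def ne_from_r2(r2, c_morgans, n_individuals=None, a=1.0):
    """Solve r2 = 1 / (a + 4 * Ne * c) for Ne.

    If n_individuals is given, r2 is first corrected for sample size
    by subtracting 1 / (2n).
    """
    if c_morgans <= 0:
        raise ValueError("genetic distance must be positive")
    if n_individuals is not None:
        r2 = r2 - 1.0 / (2 * n_individuals)
    if r2 <= 0:
        return math.inf            # no LD left: Ne is unbounded
    # With a > 1, r2 values above 1/a give negative (uninterpretable) results.
    return (1.0 / r2 - a) / (4.0 * c_morgans)

c = morgans_from_bp(1_000_000, mb_per_cm=1.0)   # assumed ratio of 1 Mb per cM
ne_mid = ne_from_r2(0.2, c)                     # intermediate LD
ne_zero = ne_from_r2(1.0, c)                    # complete LD gives Ne = 0
ne_inf = ne_from_r2(0.0, c)                     # no LD gives Ne = infinity
```

With n = 2,470 individuals the sample-size term 1/(2n) is about 0.0002, so it shifts r2 only slightly.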
Our estimate of b represents an estimated Ne assuming a constant present population size, because we used genetic data from the Korean Yorkshire population consisting of pigs in major GGP farms. In the calculation of Ne, b in Equation (2) represents a conceptual average of Ne over the period inferred from the SNP pair distance ranges per chromosome [26]. We regarded parameter b combined by meta-analysis as reflecting the current Ne of the Korean Yorkshire population.
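One common way to combine per-chromosome estimates of b is a fixed-effect, inverse-variance weighted mean. The sketch below shows that calculation under the assumption that each chromosome yields an estimate and a standard error; the section does not state which meta-analysis model was used, so this is illustrative only, and the function name and numbers are hypothetical.

```python
import math

def inverse_variance_pool(estimates, standard_errors):
    """Fixed-effect pooled estimate and its standard error."""
    weights = [1.0 / se ** 2 for se in standard_errors]
    total_weight = sum(weights)
    pooled = sum(w * b for w, b in zip(weights, estimates)) / total_weight
    pooled_se = math.sqrt(1.0 / total_weight)
    return pooled, pooled_se

# Hypothetical per-chromosome estimates of b and their standard errors.
b_hat = [100.0, 200.0]
se_hat = [10.0, 20.0]
pooled_b, pooled_se = inverse_variance_pool(b_hat, se_hat)
```

Chromosomes with more precise estimates receive more weight, so the pooled value sits closer to them than a simple average would.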
As shown in Table 1, we produced SNP chip data for 2,470 individuals between 2011 and 2015, and approximately 87% (2,149 individuals) of the SNP data were produced between 2014 and 2015. Because the sampling period was short, we regarded the sampled individuals as the “current generation” of the Korean Yorkshire population, and therefore did not sort the 2,470 individuals by exact generation. Instead, we inferred long-term, generation-related changes in Ne using another method. Hayes [16] reported that the degree of linkage disequilibrium at a given genetic distance reflects the genetic diversity of past generations. After estimating r2 from the SNP data, we binned r2 by distance and inferred Ne of past generations. LD over greater genetic distances reflects a population’s recent history, whereas LD over shorter distances depends on the Ne many generations ago [16, 27] (Supplementary Table 3). Historical Ne estimation suggests a linear change in population size, as reported in a previous study (Figure 5; Supplementary Figure S4) [16]. The observed pattern displayed a consistent decrease in Ne from 100 generations ago to the present, decreasing by 99.6% from 10,000 to 100 generations ago. Several explanations exist for this pattern, including bottlenecks associated with domestication, selection, breed administration, business strategy, and endangerment of the breed. Therefore, our results should be considered in the context of the demographic history of the Yorkshire population in Korea. The reliability of predicting changes in Ne over time depends on both technical implementation and proper iteration based on previous studies using this approach [17, 18].
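The mapping from marker distance to historical time can be sketched as follows, using the widely cited approximation that LD at genetic distance c (in Morgans) reflects Ne roughly 1/(2c) generations ago. The Ne formula assumes the form r2 = 1/(a + 4·Ne·c) with a = 1, which may differ from the equation used in this study; the function name and bin values are hypothetical.

```python
def historical_ne(bins, a=1.0):
    """Map distance bins to (generations_ago, Ne) pairs.

    bins: sequence of (c_morgans, mean_r2) for each distance bin.
    Returns pairs sorted from the most recent to the most distant past.
    """
    results = []
    for c_morgans, mean_r2 in bins:
        generations_ago = 1.0 / (2.0 * c_morgans)
        ne = (1.0 / mean_r2 - a) / (4.0 * c_morgans)
        results.append((generations_ago, ne))
    return sorted(results)

# Hypothetical bins: a long distance (5 cM) and a short distance (0.5 cM).
timeline = historical_ne([(0.05, 0.04), (0.005, 0.25)])
for generations_ago, ne in timeline:
    print(f"{generations_ago:.0f} generations ago: Ne ~ {ne:.0f}")
```

Short-distance bins therefore inform the distant past and long-distance bins the recent past, which is why a single genotyped generation can still yield a trajectory of Ne.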
We aimed to characterize LD and Ne in a Korean Yorkshire population using genomic data from thousands of individuals. Our observed LD patterns are similar to the average values presented by Du for six commercial pig lines and by Uimari for the Finnish Yorkshire breed [13, 22]. The overall LD in the Finnish Yorkshire breed appears to be stronger than in Korean Yorkshire pigs. Because the Korean Yorkshire population consists of seed pigs from several original Yorkshire breeds, the genetic diversity of Korean Yorkshire pigs is greater than that of the single Finnish Yorkshire breed, which is consistent with the higher LD of Finnish Yorkshire pigs.
The minimum number of breeding animals recommended by the UN Food and Agriculture Organization is 50, although Meuwissen suggested this number is the lower limit for a critical population size, proposing that the actual size should be between 50 and 100 [28]. The current Ne of the Korean Yorkshire population is 122.87, which exceeds these thresholds and should be sufficient to maintain the population’s viability. The population’s genetic variation enables an acceptable inbreeding rate while allowing for the trade-off with genetic gain in commercially important traits. This genetic variation is necessary to apply methods that maximize selection efficacy with a fixed rate of inbreeding or optimize the use of genetic resources from the parental generation [29].
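To relate these Ne values to inbreeding, the standard idealized-population result is that the expected inbreeding rate per generation is 1/(2·Ne). The sketch below applies it to the thresholds and estimate mentioned above; it assumes random mating and constant Ne, and the function names are hypothetical.

```python
def inbreeding_rate(ne):
    """Expected increase in inbreeding per generation, delta_F = 1 / (2 * Ne)."""
    return 1.0 / (2.0 * ne)

def heterozygosity_retained(ne, generations):
    """Expected fraction of heterozygosity remaining after a number of generations."""
    return (1.0 - inbreeding_rate(ne)) ** generations

for ne in (50, 100, 122.87):
    rate = inbreeding_rate(ne)
    kept = heterozygosity_retained(ne, 10)
    print(f"Ne = {ne}: delta_F = {rate:.4%}, heterozygosity after 10 generations = {kept:.3f}")
```

Under these assumptions, Ne = 50 corresponds to 1% inbreeding per generation, whereas Ne = 122.87 corresponds to about 0.41%.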
When a new genetic method such as genomic selection is applied to estimate breeding values, Ne may become very small or decrease continually [8]. Therefore, one must carefully consider appropriate breed management methods to avoid inbreeding. Although attenuated selection can affect short-term genetic gain, it is essential for maintaining the long-term genetic variability of the Korean Yorkshire population. Long-term continuous monitoring would also be needed to maintain the pig population and avoid an unintended reduction of Ne. The best way to preserve a sustainable population is to ensure that its production populations maintain a sufficient Ne.