### INTRODUCTION

N_{e}), where N_{e} is the number of individuals in an idealized population that would yield the same inbreeding degree as the real population [9]. Thus, we can monitor genetic diversity in each domesticated animal population based on N_{e} and explain the observed extent and pattern of genetic variation in population genetic terms. Using N_{e}, we can also predict loss of genetic variation from a prospective point of view and the accuracy of natural genomic selection before the emergence of artificial genomic selection in domesticated animal breeding. Additionally, we can infer ancestral N_{e} using the strength of LD at different genetic distances between markers. The pattern of historical N_{e} in each animal population can increase our understanding of the impact of recent strong artificial selection breeding methods on population-level genetic variation. If pedigrees are incomplete or unavailable, we can use inbreeding information of populations of interest with respect to N_{e}.

N_{e} were predicted using SNP chip data. For example, the extent of LD in multiple pig breeds, including Yorkshire, from the United States, Denmark, and the Netherlands was investigated [10–12]. In addition, Uimari determined the extent of LD and estimated the LD-based actual N_{e} and ancestral N_{e} of 32 Finnish Yorkshire boars [13]. Although the Korean pig industry is active, Korea has no original Yorkshire breed and relies on significant pig imports. As a result, the Korean Yorkshire population has diverse genetic sources from several countries, so N_{e} could be an important measurement for both the Korean and global pig industries.

N_{e} in a Korean Yorkshire population were estimated using data generated with the Illumina PorcineSNP60 v2 BeadChip (Illumina Inc., San Diego, CA, USA). We also investigated the ancestral N_{e} of the population. Together with findings from other studies, our results can inform the establishment and implementation of the most effective genomic breeding methods for Korean Yorkshire swine.

### MATERIALS AND METHODS

### Study sampling and selection of genotypic data

N_{e} were characterized.

### Characterization of linkage disequilibrium in Korean Yorkshire pigs

*r*^{2}). The *D* and *r*^{2} are respectively equivalent to the covariance and correlation between alleles at two different loci, computed as:

$$r^{2} = \frac{D^{2}}{P_{A} P_{a} P_{B} P_{b}}$$

where *P*_{A}, *P*_{a}, *P*_{B}, and *P*_{b} are the respective frequencies of alleles *A*, *a*, *B*, and *b*, and *D* is *P*_{AB} − *P*_{A}*P*_{B}. The *r*^{2} calculation was performed on a chromosome-by-chromosome basis, and the relationship between physical distance and *r*^{2} for pairs of loci was illustrated per chromosome (Supplementary Figure S2). Details of the SNPs' physical positions can be found in the Illumina product literature. To determine LD in relation to the physical distance between SNPs, SNP pairs were divided into distance bins. We established two classes (0 to 0.5 Mb and 0 to 5 Mb), and applicable SNP pairs in each class were placed into 1 of 50 distance bins with class-dependent bin ranges (Supplementary Table S1). For each class, the mean *r*^{2} of each distance bin was plotted against the median of the distance bin range (Figure 1).
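The two steps described above, computing *r*^{2} for a SNP pair and averaging it within distance bins, can be sketched as follows. This is a minimal illustration, not the pipeline used in the study: it assumes that the haplotype frequency *P*_{AB} is already known (phasing is not shown), it uses equal-width bins rather than the class-dependent ranges of Supplementary Table S1, and the function names `ld_r2` and `mean_r2_by_bin` and all input values are hypothetical.

```python
from statistics import mean


def ld_r2(p_ab, p_a, p_b):
    """Return r^2 for two biallelic loci.

    p_ab is the frequency of the A-B haplotype; p_a and p_b are the
    frequencies of alleles A and B (so P_a = 1 - P_A and P_b = 1 - P_B).
    """
    d = p_ab - p_a * p_b  # D = P_AB - P_A * P_B
    denom = p_a * (1.0 - p_a) * p_b * (1.0 - p_b)
    if denom <= 0.0:
        raise ValueError("both loci must be polymorphic")
    return d * d / denom


def mean_r2_by_bin(pairs, max_dist_mb, n_bins=50):
    """Average r^2 within equal-width distance bins (an assumption).

    pairs is an iterable of (distance_mb, r2). Returns a list of
    (bin_midpoint_mb, mean_r2) tuples for the non-empty bins.
    """
    width = max_dist_mb / n_bins
    bins = [[] for _ in range(n_bins)]
    for dist, r2 in pairs:
        if 0.0 <= dist < max_dist_mb:
            index = min(int(dist / width), n_bins - 1)  # guard against rounding
            bins[index].append(r2)
    return [((i + 0.5) * width, mean(v)) for i, v in enumerate(bins) if v]


# Hypothetical values, not taken from the study.
print(ld_r2(0.40, 0.50, 0.50))
print(mean_r2_by_bin([(0.005, 0.4), (0.007, 0.6), (0.015, 0.2)], 0.5))
```

Calling `mean_r2_by_bin` once with `max_dist_mb=0.5` and once with `max_dist_mb=5.0` would mirror the two distance classes.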

### Construction model of linkage disequilibrium with distance

*r*^{2} [15]:

$$E(r^{2}) = \frac{1}{1 + 4Nc} \tag{1}$$

where *N* is N_{e}, and *c* is the recombination frequency. We replaced *c* with linkage distance in morgans [16–18], and our calculations were further supported by approximating the more precise equation for *E*(*r*^{2}) given by Sved [15]. Based on this formula, a non-linear least-squares approach to statistically model the observed *r*^{2} was implemented within R using this model:

$$y_{i} = \frac{1}{a + 4bd_{i}} + e_{i} \tag{2}$$

where *y*_{i} is the value of *r*^{2} for the SNP pair *i* at linkage distance *d*_{i} in morgans, and *e*_{i} is the residual. Parameters *a* and *b* were estimated iteratively using the least-squares method. Chromosome-specific Mb-to-centimorgan (cM) conversion rates were calculated from the total physical chromosome length stated on the UCSC Web site (https://genome.ucsc.edu/) and each chromosome's genetic length in a porcine linkage map built from four pedigrees (ILL, UIUC, USDA, ROS) [19]. Because the UIUC pedigree consisted of Meishan and Yorkshire pigs, we selected the maps of this pedigree for use in our study. The model was fitted to the data for each chromosome to estimate the parameters. Similar to Corbin and Shin, we combined the estimated parameters in a meta-analysis using an inverse variance method for pooling and a random-effects approach based on the DerSimonian-Laird method [18,20].
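To make the fitting step concrete, the sketch below fits a decay curve of the form *y* = 1/(*a* + 4*bd*) by non-linear least squares. The study performed this step in R; here a small damped Gauss-Newton routine in plain Python stands in for it, so this should be read as an illustration of the technique under stated assumptions rather than the authors' implementation. The helper `mb_to_morgans`, the function `fit_ld_decay`, the starting value `b0`, and the synthetic data are all hypothetical; the starting value *a* = 2 follows the value mentioned in the Discussion.

```python
def mb_to_morgans(dist_mb, chrom_cm, chrom_mb):
    """Convert a physical distance (Mb) to morgans using a
    chromosome-specific rate: genetic length (cM) / physical length (Mb)."""
    return dist_mb * (chrom_cm / chrom_mb) / 100.0


def _sse(a, b, d, y):
    """Sum of squared residuals for y = 1 / (a + 4 b d)."""
    total = 0.0
    for di, yi in zip(d, y):
        u = a + 4.0 * b * di
        if u <= 0.0:
            return float("inf")  # outside the valid parameter region
        total += (yi - 1.0 / u) ** 2
    return total


def fit_ld_decay(d, y, a0=2.0, b0=100.0, max_iter=100, tol=1e-12):
    """Estimate (a, b) by Gauss-Newton with step halving."""
    a, b = a0, b0
    current = _sse(a, b, d, y)
    if current == float("inf"):
        raise ValueError("starting values give a non-positive denominator")
    for _ in range(max_iter):
        s_aa = s_ab = s_bb = g_a = g_b = 0.0
        for di, yi in zip(d, y):
            f = 1.0 / (a + 4.0 * b * di)
            ja = -f * f              # df/da
            jb = -4.0 * di * f * f   # df/db
            res = yi - f
            s_aa += ja * ja
            s_ab += ja * jb
            s_bb += jb * jb
            g_a += ja * res
            g_b += jb * res
        det = s_aa * s_bb - s_ab * s_ab
        if det == 0.0:
            break
        # Solve the 2x2 normal equations (J^T J) step = J^T residual.
        step_a = (s_bb * g_a - s_ab * g_b) / det
        step_b = (s_aa * g_b - s_ab * g_a) / det
        scale = 1.0
        while scale > 1e-8:
            trial = _sse(a + scale * step_a, b + scale * step_b, d, y)
            if trial <= current:
                break
            scale /= 2.0  # halve the step until the fit does not get worse
        else:
            break  # no improving step found
        a += scale * step_a
        b += scale * step_b
        current = trial
        if abs(scale * step_a) + abs(scale * step_b) < tol:
            break
    return a, b


# Synthetic, noise-free data generated from hypothetical a = 2.7, b = 120.
distances = [0.001 * k for k in range(1, 51)]
observed = [1.0 / (2.7 + 4.0 * 120.0 * x) for x in distances]
print(fit_ld_decay(distances, observed))
```

In practice the fit would be run once per chromosome, with SNP-pair distances first converted to morgans by a call such as `mb_to_morgans(dist_mb, chrom_cm, chrom_mb)`.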

### Ancestral N_{e} estimation

*N*_{T}(*t*) is the N_{e} at *t* generations ago, *c* is the distance between markers in morgans, *r*^{2}_{c} is the mean *r*^{2} for SNP pairs *c* morgans apart, and *c* = 1/(2*t*) when assuming linear growth [16]. To estimate *N*_{T}(*t*), the number of prior generations was selected, and a suitable range for *c* was calculated. The binning process was designed to ensure sufficient SNP pairs within each bin to obtain a representative mean *r*^{2}. This process was performed for SNPs pooled across autosomes. Bin information used for estimating ancestral N_{e} is presented in Supplementary Table S3.
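The relationship between marker distance, generations, and ancestral N_{e} can be sketched as below. The code assumes the commonly used inversion of the LD-distance relationship, N_{T}(t) = (1/(4c))(1/r^{2}_{c} − α), together with *c* = 1/(2*t*). The constant `alpha` is an assumption: α = 1 corresponds to the approximation of Sved [15], and the value actually applied in this study is not stated in the text above. Function names and input values are hypothetical.

```python
def generations_ago(c_morgans):
    """Generations in the past reflected by markers c morgans apart,
    using c = 1 / (2 t) under the linear-growth assumption."""
    return 1.0 / (2.0 * c_morgans)


def distance_for_generation(t):
    """Marker distance (morgans) that reflects N_e t generations ago."""
    return 1.0 / (2.0 * t)


def ancestral_ne(mean_r2, c_morgans, alpha=1.0):
    """Ancestral N_e from the mean r^2 of SNP pairs c morgans apart.

    alpha is an assumed constant (1.0 matches Sved's approximation).
    """
    if not 0.0 < mean_r2 <= 1.0:
        raise ValueError("mean_r2 must lie in (0, 1]")
    return (1.0 / (4.0 * c_morgans)) * (1.0 / mean_r2 - alpha)


# Hypothetical bin: mean r^2 = 0.2 for pairs 0.005 morgans apart.
c = 0.005
print(generations_ago(c), ancestral_ne(0.2, c))
```

Choosing the number of generations first and then deriving a range of *c* around `distance_for_generation(t)` matches the order of operations described in the paragraph.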

### RESULTS

### Characteristics of genotypic data

### Estimation of linkage disequilibrium

*r*^{2}) of 3.2 million SNP pairs (±SD) analyzed in this study was 0.103 (±0.179). Of all pairs studied, 28,969 were separated by less than 50 kb. Overall, 51.22% of these 28,969 SNP pairs had *r*^{2} > 0.3, and 60.75% had *r*^{2} > 0.2. The average LD of distinct autosomes for SNPs at least 50 kb apart varied from *r*^{2} = 0.361 to *r*^{2} = 0.508, and the average LD for SNPs at least 5 Mb apart ranged from *r*^{2} = 0.071 to *r*^{2} = 0.185 (Supplementary Table S4). Although the aim of this study was not to compare LD among chromosomes, we did observe some variation in the extent of LD by chromosome. Chromosomes 1, 13, and 14 had the highest average LD values, while chromosomes 10 and 12 showed the lowest. These results corroborate findings from a previous Yorkshire LD study.

*r*^{2} decreasing by approximately 40% (Figure 1a). The most rapid decrease was seen over the first five bins, with the mean *r*^{2} decreasing by approximately 50% (Figure 1b). Mean *r*^{2} decreased much more slowly as distance increased and was constant beyond 3 Mb. According to our *r*^{2} calculations, 3,965 of the 3.2 million SNP pairs were in complete LD.

### Determination of the relationship between linkage disequilibrium and single nucleotide polymorphism distance

*a* and *b* in Equation (2) are significantly different from zero. For parameter estimation using *r*^{2}, the mean estimates (95% confidence interval) obtained by meta-analysis across autosomes for parameters *a* and *b* were 2.71 (2.61; 2.82) and 122.87 (106.90; 138.84), respectively. Values for estimated parameters *a* and *b* in Equation (2) per chromosome are shown in Figure 3. Parameter *b* showed greater variability between chromosomes than parameter *a*. No relationship was observed between either estimated parameter and chromosome length in centimorgans. We predicted *r*^{2} by distance between SNP pairs using our estimated parameters *a* and *b* in Equation (2) and compared the predicted *r*^{2} with the observed *r*^{2}, as performed in other studies [17,18]. We determined that the predicted *r*^{2} from the non-linear regression equation was similar to the mean observed *r*^{2} (Figure 4), indicating that our estimated parameters in Equation (2) accurately represent Korean Yorkshire population history.
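The pooling of per-chromosome estimates into a single value with a 95% confidence interval can be illustrated with a DerSimonian-Laird random-effects calculation. This is a sketch only: the function name `dersimonian_laird`, the example estimates, and their variances are hypothetical and are not the per-chromosome values of this study.

```python
import math


def dersimonian_laird(estimates, variances):
    """Pool estimates with inverse-variance weights and a
    DerSimonian-Laird between-group variance (tau^2).

    Returns (pooled_estimate, standard_error, tau2).
    """
    k = len(estimates)
    w = [1.0 / v for v in variances]  # fixed-effect weights
    sw = sum(w)
    fixed = sum(wi * yi for wi, yi in zip(w, estimates)) / sw
    # Cochran's Q measures heterogeneity around the fixed-effect mean.
    q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, estimates))
    c = sw - sum(wi * wi for wi in w) / sw
    tau2 = max(0.0, (q - (k - 1)) / c) if c > 0.0 else 0.0
    w_re = [1.0 / (v + tau2) for v in variances]  # random-effects weights
    sw_re = sum(w_re)
    pooled = sum(wi * yi for wi, yi in zip(w_re, estimates)) / sw_re
    return pooled, math.sqrt(1.0 / sw_re), tau2


# Hypothetical per-chromosome estimates of b and their sampling variances.
b_hat = [100.0, 120.0, 140.0]
b_var = [25.0, 25.0, 25.0]
pooled, se, tau2 = dersimonian_laird(b_hat, b_var)
print(pooled, (pooled - 1.96 * se, pooled + 1.96 * se), tau2)
```

The random-effects weights shrink toward equality as `tau2` grows, which is why heterogeneous chromosomes widen the pooled interval.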

### Ancestral N_{e} estimation

N_{e} at *t* generations ago. Based on the genomic data, the current N_{e} of Korean Yorkshire is approximately 122.87 (106.90; 138.84) individuals. Supplementary Figure S4 also shows that the reduction of N_{e} in the Korean Yorkshire population was continuous and gradual over the last 100 generations, from 208.4 individuals to 122.87 individuals as determined in our study. Additionally, we found that Korean Yorkshire N_{e} has decreased by 99.6% over the last 10,000 generations, from an initial 30,380.28 individuals to the present estimate (Figure 5).
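As a quick arithmetic check, the percentage above follows directly from the two quoted estimates. The second ratio, for the last 100 generations, is derived here from the quoted values and is not a figure reported in the text:

$$
\frac{30{,}380.28 - 122.87}{30{,}380.28} \approx 0.996,
\qquad
\frac{208.4 - 122.87}{208.4} \approx 0.410
$$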

### DISCUSSION

N_{e} of the Yorkshire population in Korea, we need a single Yorkshire population in Korea and high-quality SNPs from this population. We considered the number of SNPs remaining after quality control (33,418 SNPs) sufficient for estimating N_{e} when compared with the Finnish Yorkshire study (Uimari [13]). We therefore considered it appropriate to regard the four GGP farms as one Korean population.

N_{e} of the Korean Yorkshire population based on whole-genome SNP data. The observed LD, measured as *r*^{2}, extended for a long distance based on the adjacent 100 SNPs of each SNP studied in the genome. A previous study used large pedigree datasets and small genomic datasets [13], but instead of pedigree data, we used large-scale genomic data from swine in GGP farms to characterize LD and estimate N_{e}. Because domesticated pigs have long been subject to strong artificial selection, the observed LD in the Korean Yorkshire population was higher at shorter genomic distances and more extensive than in human populations [21]. The pattern of declining LD in the Korean Yorkshire population is consistent with that of previously studied pigs [13,22] as well as other domesticated animals [17,18].

N_{e} of Korean Yorkshire swine based on a non-linear regression model that describes the relationship between linkage distance and LD. Estimating N_{e} using this equation raises difficulties in handling values at the limits of the parameter space, because if *r*^{2} = 0.0, the estimated N_{e} is infinite, and if *r*^{2} = 1.0, N_{e} is zero. In this study, we calculated *r*^{2} for the adjacent 100 SNPs of each SNP to decrease bias in handling these values. This simplified approach yielded estimates of N_{e} quite similar to those of previous studies. Another concern related to the relationship between estimated LD and distance between SNPs lies in the accuracy of the porcine reference genome used in this study. The order of and distances between SNPs on the commercial Illumina PorcineSNP60 v2 BeadChip will likely be refined as the reference genome is updated. However, the bias from incorrectly ordered SNPs or incorrect distances between SNPs may be minimized by the large number of SNP pairs used in this study, so some overestimated or underestimated LD can be overlooked. Because the relationship between genetic and physical distances varies across chromosomes and chromosomal regions, we inferred the Mb/cM ratio per chromosome using physical map position information from the porcine reference genome and from a previous Yorkshire pig study [19]. We then used genetic distances derived from physical distances to estimate N_{e} and could obtain more reliable N_{e} estimates with such detailed estimates of genetic distances between SNPs. Finally, one study reported that a limited sample size can bias the estimates of *r*^{2} and recommended correcting the estimates of *r*^{2} for sample size *n* (*r*^{2} − 1/2*n*) and using the equation of Sved [15]. However, our sample size was sufficient to correct the estimates of *r*^{2}. To estimate N_{e} of Korean Yorkshire pigs, we used an alternative version of this equation further derived by Tenesa [23], which adds a new parameter *a* to account for mutations. Based on the new formula, the initial value of parameter *a* was set to 2 when estimating parameters with the non-linear regression model in R. Regarding variance heterogeneity of the observed *r*^{2}, the variance declined with increasing distance between SNPs, which may have affected our results when estimating parameter *b* in Equation (2) (Supplementary Figure S2). A significant negative relationship between chromosome length and estimates of parameter *b* obtained from a non-linear model has been observed [17], while others have noted a positive relationship in domestic livestock species [24,25] or did not investigate the directionality of the relationship [18]. Because the evolutionary history of each species and breed is different, the relationship between chromosome length and parameter *b* also differs between populations. In this study, all marker pairs were calculated in each bin so that *r*^{2} was not affected by chromosome length. These results are in agreement with the Yorkshire LD characterization findings of Uimari [13]. We did not observe a significant relationship between chromosome length and estimates of *b* in our study population.
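The boundary behaviour noted in this paragraph can be seen by inverting the approximation of Sved [15] for N_{e}. The lines below are a worked sketch under that approximation, with *c* the distance between markers in morgans, and are not a restatement of the exact estimator fitted in this study:

$$
\begin{aligned}
r^{2} = \frac{1}{1 + 4N_{e}c} \;&\Longrightarrow\; N_{e} = \frac{1}{4c}\left(\frac{1}{r^{2}} - 1\right),\\
r^{2} \to 0 \;&\Longrightarrow\; N_{e} \to \infty,\\
r^{2} = 1 \;&\Longrightarrow\; N_{e} = 0.
\end{aligned}
$$

If the constant 1 is replaced by a parameter *a*, the same inversion gives N_{e} = (1/(4*c*))(1/*r*^{2} − *a*), so at *r*^{2} = 1 the estimate becomes (1 − *a*)/(4*c*), which is negative whenever *a* > 1.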

*b* represents an estimated N_{e} assuming a constant present population size, because we used genetic data from the Korean Yorkshire population consisting of pigs in major GGP farms. In the calculation of N_{e}, *b* in Equation (2) represents a conceptual average of N_{e} over the period inferred from the SNP pair distance ranges per chromosome [26]. We regarded parameter *b* combined by meta-analysis as reflecting the current N_{e} of the Korean Yorkshire population.

N_{e} using another method in this study. Hayes [16] reported that the degree of linkage disequilibrium at a given genetic distance reflects the genetic diversity of past generations. After estimating *r*^{2} from the SNP data, we grouped *r*^{2} values by distance and inferred the N_{e} of past generations. The LD over greater genetic distances reflects a population's recent history, whereas LD over shorter distances depends on the N_{e} many generations ago [16,27] (Supplementary Table S3). Historical N_{e} estimation suggests a linear change in population size, as reported in a previous study (Figure 5; Supplementary Figure S4) [16]. The observed pattern displayed a consistent decrease in N_{e} from 100 generations ago to the present, decreasing by 99.6% from 10,000 to 100 generations ago. Several explanations exist for this pattern, including bottlenecks associated with domestication, selection, and breed administration; business strategy; and endangerment of the breed. Therefore, our results should be considered in the context of the demographic history of the Yorkshire population in Korea. The reliability of predicting changes in N_{e} over time depends both on technical implementation and proper iteration based on previous studies using this approach [17,18].

N_{e} in a Korean Yorkshire population using genomic data from thousands of individuals. Our observed LD patterns are similar to the average values presented by Du for six commercial pig lines and by Uimari for Finnish Yorkshire pigs [13,22]. The overall LD in the Finnish Yorkshire breed appears to be stronger than in Korean Yorkshire pigs. Because the Korean Yorkshire population consists of seed pigs from several original Yorkshire breeds, the genetic diversity of Korean Yorkshire pigs is greater than that of the single Finnish Yorkshire breed, which is consistent with the higher LD observed in Finnish Yorkshire pigs.

N_{e} of the Korean Yorkshire population is 122.87, which is sufficient to maintain the population's viability. The population's genetic variation enables an acceptable inbreeding rate, including compromising genetic gain in commercially important traits. This genetic variation is necessary to apply methods that maximize selection efficacy with a fixed rate of inbreeding or optimize the use of genetic resources from the parental generation [29].

N_{e} may be very small or continually decrease [8]. Therefore, one must carefully consider appropriate breed management methods to avoid inbreeding. Although attenuated selection can affect short-term genetic gain, it is essential for maintaining the long-term genetic variability of the Korean Yorkshire population. Long-term continuous monitoring is also needed to avoid an unintended reduction of N_{e} in the pig population. The best way to preserve a sustainable population is to ensure that its production populations maintain a sufficient N_{e}.