### INTRODUCTION

_{e}) (Falconer et al., 1996). Using N

_{e}, we can monitor genetic diversity in livestock populations and explain the observed range and pattern of genetic variation with regard to population genetics. In addition, if pedigrees are incomplete or unavailable, N

_{e}provides information into the inbreeding level of a population. The strength of linkage disequilibrium at different genetic distances between makers can be used to infer ancestral N

_{e}. The pattern of historical N

_{e}in animal populations can increase our understanding of the impact of selective breeding strategies on the genetic variation within the framework of population genetics.

_{e}estimated using SNP chip data have already been made. Flury (2010), based on 128 Swiss Eringer breed genotyped using the Illumina BovineSNP50 BeadChip, estimated linkage disequilibrium-based actual N

_{e}and ancestral pedigree-based N

_{e}(Flury et al., 2010). Other studies have evaluated the historical N

_{e}of a variety of cattle breeds, all of which suggested a continuous decrease in N

_{e}since the time of domestication (Thevenon, et al., 2007; De Roos et al., 2008).

_{e}is an important measure in the global dairy cattle industry as well as the Korean dairy cattle industry in identifying the state of genetic resource. As Korea is a major-semen importing country in the dairy cattle industry (Table S4), dairy cattle in Korea have diverse genetic sources. In this study, we used

*r*corrected by the genomic relationship structure based on SNPs, which is more accurate than the existing

^{2}*r*to estimate linkage disequilibrium. We then estimated the current N

^{2}_{e}and inferred an accurate N

_{e}. These results are compared with other studies and considered in the context of current knowledge for the establishment of genomics methods in dairy cattle in Korea.

### MATERIALS AND METHODS

### Genotypic data

### Linkage disequilibrium

*r*and the

^{2}*r*corrected by the sample structure (Mangin et al., 2012). Estimating the

^{2}*r*of all pairwise per chromosome using high-density SNP chips is time-consuming. To reduce computing time, the input for each chromosome was split into files containing 100 loci, and

^{2}*r*

^{2}was calculated for all syntenic marker pairs in each file as has been done in previous studies (Flury et al., 2010).

*r*are respectively equivalent to the covariance and the correlation between alleles at two different loci (Hill and Robertson, 1968), computed as: where P

^{2}_{A}, P

_{a}, P

_{B}, and P

_{b}are the respective frequencies of alleles A, a, B and b, and D is P

_{AB}−P

_{A}P

_{B.}Sample structure information is required for the corrected

*r*. We used the genomic relationship matrix based on SNP data instead of pedigree data. In this study, the genomic relationship value of individual pairs was the number of common SNP between two individuals/number of total allele sites. In this way, we proposed a simple genomic relationship matrix (96×96). A process of calculating

^{2}*r*corrected by the genomic relationship structure based on SNP was identical to the existing

^{2}*r*calculation.

^{2}*r*for each of the distance bins were then plotted against the median of the distance bin range (Mb). This estimation of

^{2}*r*was performed on a chromosome by chromosome basis; the pooled results are presented in Figure 2.

^{2}*r*was calculated for a random selection of non-syntenic SNP. Across the autosome, 700 SNP were randomly selected and

^{2}*r*values were calculated for only non-syntenic marker pairs. This resulted in a total of 250,096 pairwise comparisons that were not corrected by the genomic relationship structure.

^{2}### Construction model of linkage disequilibrium with distance

*r*(Sved, 1971): where N is the effective population size, and c is the recombination frequency. In this study, as in previous studies, c was replaced by linkage distance in Morgans (Hayes et al., 2003; Tenesa et al., 2007; Thevenon et al., 2007; De Roos et al., 2008; Villa-Angulo et al., 2009; Corbin et al., 2010; Flury et al., 2010; Qanbari et al., 2010). This was justified by the approximation of the more precise equation for E(

^{2}*r*) given by Sved (1971). Based on this formula, a non-linear least-squares approach to statistically model the observed

^{2}*r*was implemented within R (nls function) using this model: where y

^{2}_{i}is the value of

*r*for marker pair i, at linkage distance d

^{2}_{i}in Morgans. Parameters a and b were estimated iteratively using the least-squares method. Chromosome-specific megabase-to-centimorgan conversion rates were calculated based on total physical chromosome length as stated on the UCSC Web site, and total chromosome genetic length derived from the bovine linkage map (Arias et al., 2009) (see Table S2). We used marker pairs for which

*r*values were higher than the mean of

^{2}*r*for non-syntenic marker pairs. This model was applied to each chromosome in turn and the parameters were estimated. Similar to Corbin (2010), estimated parameters were combined by meta-analysis in R using an inverse variance method for pooling and random effects method based on the DerSimonian-Laird method (DerSimonian and Laird, 1986; Corbin et al., 2010).

^{2}### Ancestral effective population size estimation

_{T}is the effective population size t generations ago, c is the distance between markers in Morgans,

*r*

_{c}

^{2}is the mean value of

*r*for marker pairs c Morgans apart, and c = (2t)

^{2}^{−1}when assuming linear growth (Hayes et al., 2003). As previously mentioned, marker pairs with linkage distances less than the mean of

*r*for non-syntenic marker pairs (<0.014248) were excluded from this analysis. To estimate N

^{2}_{T}, the number of prior generations was selected and a suitable range for c was calculated. The binning process was designed to ensure sufficient marker pairs within each bin and obtain a representative

*r*mean (see Table S3). This process was performed for markers pooled across autosomes.

^{2}### Estimation of effective population size based on the genomic relationship structure per chromosome

*r*corrected by the genomic relationship structure per chromosome using “LDcorSV” package implemented in R (Mangin et al., 2012). A process of estimating N

^{2}_{e}based on genomic relationship per chromosome and ancestral N

_{e}was identical to that described above.

### RESULTS

### Genotypic data

### Linkage disequilibrium

*r*decreasing by about half. The mean existing

^{2}*r*decreased more slowly with increasing distance and was constant after 0.2 Mb of distance. The mean

^{2}*r*corrected by the genomic relationship structure based on SNP with distance between marker pairs was slightly less than that of the existing

^{2}*r*. However, change patterns in the

^{2}*r*means with distance for both methods were similar. The decline in linkage disequilibrium showed slight differences with log-transformed distance (Figures S1 and S2). According to the existing

^{2}*r*, 23,797 of the 3,385,800 marker pairs were in complete linkage disequilibrium; 12,225 of these were adjacent pairs. For

^{2}*r*corrected by the genomic relationship structure based on SNP, 23,794 of the 3,385,800 marker pairs were in complete linkage disequilibrium; 12,223 of these were adjacent pairs. The mean (±standard deviation)

^{2}*r*between random non-syntenic marker pairs was 0.014248±0.0197. 835,324 using the existing

^{2}*r*measure and 861,676 using

^{2}*r*corrected by the genomic relationship structure based on SNP are less than the mean of non-syntenic marker pairs.

^{2}### Construction model of linkage disequilibrium with distance

*r*, the mean estimate and 95% confidence interval by meta-analysis across autosomes for parameters a and b were 2.95 (2.84; 3.07) and 106.32 (92.95; 119.70), respectively. The parameters a and b for

^{2}*r*corrected by the genomic relationship structure based on SNP were 2.87 (2.75; 2.99) and 122.28 (107.21; 137.34), respectively. The predicted

^{2}*r*from the non-linear regression equation was similar to the mean observed

^{2}*r*of regions with massed bins with discrepancies occurring in other regions for both two types of

^{2}*r*(Figures S1 and S2). Parameter b showed greater variability between chromosomes than parameter a. No such relationship was observed between the estimated parameters (a, b) and chromosome length (cM) as shown in Figure 3 and 4. This reason for the lack of relationship and interpretation of parameter b in this non-linear regression model as an estimated N

^{2}_{e}is demonstrated in the discussion.

### Ancestral effective population size estimation

_{e}over the past 10 generations with the values increasing fivefold (close to 500) by 10 generations ago (Figure 5). From the past 10 generations to the past 100 generations, N

_{e}increased slowly. Our results suggest that a continuous increase in N

_{e}occurred over the last 100,000 generations (Figure S3). Overall, the tendencies for the two different

*r*were similar, but N

^{2}_{e}based on

*r*corrected by the genomic relationship structure based on SNP was slightly higher than N

^{2}_{e}based on the existing

*r*. In Figure 5, although the recent difference between the two estimated measures was small, the difference in value increased with time.

^{2}### Estimation of effective population size based on genomic relationship per chromosome

*r*corrected by the genomic relationship structure per chromosome with distance between marker pairs was less than the two previous mean but the change pattern was similar. (Figure S5) In total, 23,783 of marker pairs were in complete linkage disequilibrium; 12,220 of these were adjacent pairs. For parameter estimation using

^{2}*r*corrected by the genomic relationship structure per chromosome, the mean estimate and 95% confidence interval by meta-analysis across autosomes for parameters a and b were 2.11 (2.04; 2.18) and 361.73 (335; 339.47), respectively. Parameter b was approximately three times greater than the two previous values for parameter b. The decline in linkage disequilibrium was almost linear with log-transformed distance and the decline in linkage disequilibrium with log-transformed distance was better fitted with predicted values than that of the other two. No such relationship was observed between parameters (a, b) and chromosome length (cM) in

^{2}*r*corrected by the genomic relationship structure per chromosome as shown in Figure S6. Similar to previous patterns, we observed a rapid increase in N

^{2}_{e}over the past 10 generations and then a slow increase up to 100 generations ago. However, the values increased fivefold (close to 1,500) at 10 generation than two kinds of

*r*(Figure S7). From 10 generation ago, our results suggest that continuous increase in N

^{2}_{e}has occurs over the 100,000 generations. (Figure S8) Overall, these tendencies were similar with that of the previous two but N

_{e}based on

*r*corrected by the genomic relationship structure per chromosome was greater than the two previous kinds of

^{2}*r*measures. Although recent difference with two estimated measures was big, their difference in value decreased with generations ago.

^{2}### DISCUSSION

*r*correlation between 100 and 1,000 samples is 0.94 (Khatkar et al., 2008). Thus, we can obtain an unbiased picture of linkage disequilibrium (3,385,801 pairs) in our population of 96 dairy cattle. The pattern of decline of linkage disequilibrium in this population is consistent with that reported by Flury (2010) in 128 Swiss cattle (Flury et al., 2010) with existing

^{2}*r*remaining higher than approximately 0.3 for distance up to 0.05 Mb. The linkage disequilibrium observed is higher at short distances and more extensive than that observed in human populations (Shifman et al., 2003). Overall,

^{2}*r*values corrected by the genomic relationship structure based on SNP were slightly lower than the existing

^{2}*r*because the correction process prevented the overestimation of linkage disequilibrium.

^{2}*r*was corrected for by inferring N

^{2}_{e}to take into account the non-independence of loci due to population differentiations (Yu et al., 2005). Not corrected

*r*can be a biased estimation of linkage disequilibrium, which could lead to estimated genetic parameters that reduce the power of analysis (Mangin et al., 2012). We used an R package (“LDcorSV”) to calculate

^{2}*r*corrected by the genomic relationship structure based on SNP, which was slightly smaller than the existing

^{2}*r*. This was expected, as we speculated that the calculated

^{2}*r*corrected by the genomic relationship structure based on SNP was more accurate. Thus we can infer an accurate N

^{2}_{e}of dairy cattle in Korea.

*r*between non-syntenic markers was 0.014248, which provides an approximation of the linkage disequilibrium that can be expected by chance. The value observed here is higher than the value observed by Khatkar (2008) in a sample of over 1,500 cattle (0.0032) (Khatkar et al., 2008). Sample size and genetic sampling (drift) affect the mean of non-syntenic

^{2}*r*values, and hence the mean may be expected to decrease with an increase in sample size. The larger non-syntenic value observed by Khatkar (2008) may be more affected by large populations. Since the sample size was smaller in this study than the other studies, the larger non-syntenic values of our dataset are reasonable. We used marker pairs with more than mean of

^{2}*r*for non-syntenic marker pairs as a standard for estimating linkage disequilibrium. Many low

^{2}*r*can cause overestimated N

^{2}_{e}more than expected. Therefore we decided to use marker pairs more than mean of non-syntenic marker pairs for inferring N

_{e}.

*r*, a nonlinear regression model was fitted to the data to describe the relationship between linkage distance and linkage disequilibrium. Our estimate of parameter a supports an alternative version of Sved’s (1971) equation (Sved, 1971), derived by Tenesa (2007), which accounts for mutation and puts a = 2 (Tenesa et al., 2007). While estimating parameters, the initial value of parameter a was two with this approach. The estimated parameter a ranged from 2 to 3. For the heterogeneity of variance of the observed

^{2}*r*, variance of

^{2}*r*declined with increasing distances between markers, which may have impacted our results. A significant negative relationship between chromosome length (cM) and estimates of parameter b from the nonlinear model have been observed (Corbin et al., 2010), while others have observed a positive relationship in domestic livestock species (Khatkar et al., 2008; Muir et al., 2008). In this study, all maker pairs were calculated in each bin so

^{2}*r*was not affected by chromosome length. Thus, we could not observe a relationship between chromosome length (cM) and estimates of b.

^{2}_{e}assuming a constant population size. However, this assumption is difficult to maintain, b represents a conceptual average of N

_{e}over the period inferred from the range of marker pairs distances (Toosi et al., 2010). Two measures of

*r*resulted in two different estimates of N

^{2}_{e}. N

_{e}based on the existing measure of

*r*is about 106 and N

^{2}_{e}based on

*r*corrected by the genomic relationship structure based on SNP was about 122. Assuming that

^{2}*r*corrected by the genomic relationship structure based on SNP was more accurate than the existing

^{2}*r*, we predict that N

^{2}_{e}of dairy cattle in Korea is about 122. Figure 5, S3, S7, and S8 show historical N

_{e}assuming a linear population following Hayes et al. (2003). The observed pattern shows a rapid increase in N

_{e}up to around 10 generations ago. Several explanations exist for this pattern including bottlenecks associated with domestication, selection and breed formation, and endangerment of the breed. Therefore, it is useful to consider our results in the context of the demographic history of the dairy cattle in Korea. The reliability of this method depends both on the technical implementation and approached used in a previous study approach (Corbin et la., 2010).

_{e}estimation based on

*r*corrected by the genomic relationship structure per chromosome resulting in an underestimation of inbreeding and thus an overestimation of the current population size. The estimates for recent N

^{2}_{e}for dairy cattle in Korea based on

*r*corrected by the genomic relationship structure based on SNP was around 120 individuals in contrast to estimates in the range of 361 using

^{2}*r*corrected by the genomic relationship structure per chromosome. Interesting thing is that recent N

^{2}_{e}for Swiss Eringer breed using pedigree information covering 15 generations (the range of 110 to 321) is three times higher than the linkage disequilibrium-based estimates for recent N

_{e}(around 100 individuals). Thus, we guess that the number of genomic relationship matrix that covers generation depends on how we establish the genomic relationship structure. These differences are important for inferring N

_{e}.

*r*>0.3 will permit reasonable sample sizes for genome-wide association studies (Ardlie et al., 2002; Du et al., 2007; Khatkar et al., 2008). In this data set, markers 200 kb apart achieved an average linkage disequilibrium of

^{2}*r*= 0.303 excluding marker pairs less than the mean of non-syntenic marker pairs. However, marker pairs with

^{2}*r*= 1, which represent the high variability of

^{2}*r*values at small distances are typically excluded in genomic selection. This is likely due to underestimation of the actual number of SNP needed. Genomic selection appears to be effective at lower average

^{2}*r*with simulation results demonstrating accuracies of up to 0.65 with an average

^{2}*r*between adjacent markers as low as 0.2 and a trait heritability of 0.1 (Calus et al., 2008). Deterministic equations demonstrates that the accuracy of genomic selection can be expressed as a function of the effective number of loci in a population (Daetwyler et al., 2010). The effective number of loci in a population relates to the number of independent chromosome segments and assumes a random mating population. Our dataset covered an effective number of loci. However, because the estimated N

^{2}_{e}was greater than our sample size, we required more preparation to predict the potential accuracy of genomic selection in dairy cattle populations.

_{e}for a sample of dairy cattle. In the population studied, linkage disequilibrium extended for long distances, reaching baseline levels at more than 5 Mb. From the decay in linkage disequilibrium with genetic distance, we inferred the ancestral N

_{e}and observed a recent rapid increase in N

_{e}which reached approximately 500, 10 generations ago followed by a decrease until the present time. The final results were that we used correction by the genomic relationship structure to ensure accurate derivation of N

_{e}, resulting in 122 individuals.