Accuracy of genomic-polygenic estimated breeding value for milk yield and fat yield in the Thai multibreed dairy population with five single nucleotide polymorphism sets

Objective The objectives were to compare variance components, genetic parameters, prediction accuracies, and genomic-polygenic estimated breeding value (EBV) rankings for milk yield (MY) and fat yield (FY) in the Thai multibreed dairy population using five single nucleotide polymorphism (SNP) sets from GeneSeek GGP80K chip. Methods The dataset contained monthly MY and FY of 8,361 first-lactation cows from 810 farms. Variance components, genetic parameters, and EBV for five SNP sets from the GeneSeek GGP80K chip were obtained using a 2-trait single-step average-information restricted maximum likelihood procedure. The SNP sets were the complete SNP set (all available SNP; SNP100), top 75% set (SNP75), top 50% set (SNP50), top 25% set (SNP25), and top 5% set (SNP5). The 2-trait models included herd-year-season, heterozygosity and age at first calving as fixed effects, and animal additive genetic and residual as random effects. Results The estimates of additive genetic variances for MY and FY from SNP subsets were mostly higher than those of the complete set. The SNP25 MY and FY heritability estimates (0.276 and 0.183) were higher than those from SNP75 (0.265 and 0.168), SNP50 (0.275 and 0.179), SNP5 (0.231 and 0.169), and SNP100 (0.251and 0.159). The SNP25 EBV accuracies for MY and FY (39.76% and 33.82%) were higher than for SNP75 (35.01% and 32.60%), SNP50 (39.64% and 33.38%), SNP5 (38.61% and 29.70%), and SNP100 (34.43% and 31.61%). All rank correlations between SNP100 and SNP subsets were above 0.98 for both traits, except for SNP100 and SNP5 (0.93 for MY; 0.92 for FY). Conclusion The high SNP25 estimates of genetic variances, heritabilities, EBV accuracies, and rank correlations between SNP100 and SNP25 for MY and FY indicated that genotyping animals with SNP25 dedicated chip would be a suitable to maintain genotyping costs low while speeding up genetic progress for MY and FY in the Thai dairy population.


INTRODUCTION
Genomic selection refers to selection of animals based on genomic estimated breeding values (GEBV) [1,2]. Current genomic selection procedures utilize a combination of phenotypes, pedigree, and genotype information to improve accuracy of genetic predictions [3]. An important advantage of genomic selection is that it can be implemented to evaluate ec onomically important traits that are difficult to measure such as traits that are expensive to measure in a large number of animals (e.g., feed efficiency), expressed very lateinlife (e.g., length of productive life), and sexlimited traits (e.g., milk yield). In addition, because genomic selection can be performed early in life, it can decrease generation interval, increase intensity of selection, and reduce the cost of proving dairy bulls [1]. Genomic selection is currently widely used for dairy ge netic evaluation in many countries. The Thai dairy genomic polygenic evaluation for economically important traits started in 2015 [4]. Implementation of the genomicpolygenic eval uation system increased EBV accuracies for milk yield (MY) and fat yield (FY) from 5.2% to 7.2% in the Thai multibreed population [4]. Continued genotyping and selection of parents based on genomicpolygenic EBV is expected to accelerate the rate of genetic progress for MY and FY and other eval uated traits in this population.
Prices for highdensity genotyping chips are still too high for widespread use in the multibreed dairy cattle population in Thailand. The current option in the Thai dairy popula tion of genotyping sires and highly represented dams in the pedigree with highdensity chips and the rest of the cow popu lation with lowdensity chips may need to remain in place until a suitable cheaper and effective option is found. One such option would be to design lowdensity chips with small num bers of single nucleotide polymorphism (SNP) that account for a reasonable percentage of the genetic variation for the traits of interest. Use of these lowdensity assays containing selected SNPs could potentially be as effective as highdensity chips for a fraction of the price [5,6]. Determination of an appro priate set of SNP needs to consider the genetic architecture of the traits of interest, population structure, amount of link age disequilibrium, proportion of ancestors genotyped with highdensity SNP genotypes, and genetic relationships be tween animals used to estimate SNP effects [5]. However, the optimal number of SNP for a lowdensity chip for dairy cattle in Thailand as well as an appropriate strategy for selecting these SNP have not been studied. Thus, the objectives of this research were to compare estimates of variance components, genetic parameters, accuracies of prediction, and rankings of genomicpolygenic animal EBV for MY and FY in the Thai multibreed dairy population computed using five sets of SNPs from GeneSeek GGP80K chip.

Animals, data and traits
Cattle used in this research were from the Thai multibreed dairy population, a Holsteinupgraded population composed of purebred and crossbred cows, sires, and dams. Breeds repre sented in Thai dairy cattle are Holstein, Brahman, Jersey, Red Dane, Red Sindhi, Sahiwal, and Thai Native [7]. The largest breed fraction for most animals in the population is Holstein. Although, percentage Holstein per animal ranged from 46.9% to 100%, 91% of cattle in the population was over 75% H plus small percentages of other breeds. The dataset contained mon thly testday MYs and FYs of 8,361 firstlactation cows that calved from 1989 to 2014. These cows were daughters of 1,210 sires and 6,992 dams located in 810 farms across five regions in Thailand (North, Northeastern, Western, Central, and Southern). Traits were 305d firstlactation MY (kg) and 305d firstlactation FY (kg). Testday MYs were measured, and milk samples were taken from each individual cow once a month from calving until drying off. Cow testday milk samples were sent to a laboratory (Artificial Insemination and Biotechnology Research Centre laboratory, Saraburi pro vince, Thailand) to determine fat percentages. Testday FYs were computed as the product of testday MYs times testday fat percentages. Monthly testday MYs and FYs were used to compute MY and FY using the testinterval method [8,9].

Climate, management, and nutrition
The majority of Thailand is hot and humid most of the year. Thailand's climate is highly influenced by the seasonal mon soon weather characterized by wind pattern changes and heavy precipitation. Seasons are classified as winter (Novem ber to February), summer (March to June), and rainy (July to October). Yearly means across regions and seasons range from 17°C to 36°C for temperature, 1,200 to 1,600 mm/yr for rainfall, and 73% to 80% for relative humidity [10]. Cows were housed in open barns and milked twice a day (morning at 5 to 6 am; afternoon at 2 to 3 pm). Farmers used either a bucket system or a pipeline system for milking. Morning and afternoon raw milk was collected in bulk tanks before transporting it to a collection center owned by either dairy cooperative or a private organization. Cows were fed a com bination of roughage and concentrate aimed at a rate of 1 kg of feed (16% protein) per 2 kg of milk. Roughage consisted mainly of fresh grasses including Guinea grass (Penicum maximum), Ruzi grass (Brachiaria ruziziensis), Napier grass (Pennisetum purpureum), and Para grass (Brachiaria mutica). Other sources of fiber were rice straw and agricultural by products (corn cobs, cassava leaves, corn silage).

Samples and genotypes
Blood or semen samples were collected from 2,661 animals (89 sires and 2,572 dams). Genomic DNA was extracted from semen samples using a GenElute Mammalian Genomic DNA Miniprep Kit (SigmaAldrich, St. Louis, MO, USA), whereas a MasterPure DNA Purification Kit (Epicentre Biotechnologies, Madison, WI, USA) was used for blood samples. Concen tration and purity of DNA per sample was measured using a Thermo Fisher NanoDrop 2000 spectrophotometer (Thermo Fisher Science Inc., Wilmington, DE, USA). The minimum acceptable DNA concentration was 15 ng/μL with an absor bance ratio of 1.8 at 260/280 nm.
The DNA samples were dried using a Freezedry machine for 12 hours Dried DNA samples of 50 μL were airmailed to GeneSeek (Lincoln, NE 68521, USA) for genotyping with GeneSeek Genomic Profiler BeadChips (GGP). Budgetary restrictions determined the type of genotyping chip used for each DNA sample. The GeneSeek Genomic Profiler 80K chip (GGP80K) was used to genotype sires and highly repre sented cows in the pedigree (n = 139), and lower density chips were used to genotype the remaining cows ( [4]; 1,412 with GGP9K; 570 with GGP20K, and 540 with GGP26K). The numbers of SNP markers were 8,590 for the GGP9K, 19,616 for the GGP20K, 25,979 for the GGP26K, and 76,519 for the GGP80K. Nighty five percent of SNP from GGP20K, 47% of SNP from GGP20K, and 54% of SNP from GGP26K were present in GGP80K. Cows genotyped with the three lowdensity GGP chips were imputed to GGP80K using FImpute 2.2 [11]. The genotype file included SNPs with call rates ≥90% and minor allele frequency ≥0.04 from the 29 autosomes and the X chromosome. After these control restrictions, the genotype file contained 74,148 SNPs per genotyped animal.

Estimation of variance components for five single nucleotide polymorphism sets
The five SNP sets were defined in terms of the values of SNP vari ances for MY and FY [12] estimated with program POSTGSF90 (a member of the BLUPF90 Family of Programs) [13]. Firstly, a 2trait singlestep genomicpolygenic model [3,14] was used to estimate variance components for MY and FY using the complete dataset (phenotypes, pedigree, and genotypes) with program AIREMLF90 [15]. Secondly, SNP variances for MY and FY were computed using program POSTGSF90 and ordered from largest to smallest. Then, the following 5 SNP sets (Table 1) were defined: i) complete SNP set (SNP100; 76,519 SNPs); ii) top 75% set (SNP75; 57,390 SNPs); iii) top 50% set (SNP50; 38,260 SNPs); iv) top 25% set (SNP25; 19,130 SNPs); and v) top 5% set (SNP5; 3,826 SNPs).
Estimates of variance components for MY and FY for the five sets of genotypes were computed with AIREMLF90 [15] using the same 2trait genomicpolygenic model. This model contained herdyearseason, heterozygosity of the cow, and age at first calving as fixed effects, and animal additive genetic and residual as random effects. The mean of the animal ad ditive genetic and the residual effects was assumed to be zero. The variance of the animal additive genetic effects was equal to H 6 romosome. After these control mal.
re computed using program POSTGSF90 and ordered from sets (Table 1)  V e , where I is an identity matrix, and V e is a 2×2 matrix of environmental variances and covariances between MY and FY. The convergence cri terion for program AIREMLF90 was 10 -12 . The estimates of additive genetic and environmental variances and covariances at convergence for MY and FY were used to compute phe notypic variances and covariances, heritabilities, and genetic, environmental, and phenotypic correlations using the usual expressions. Standard errors of additive genetic and environ mental variances and covariances were computed as square roots of diagonal elements of the inverse of the average infor mation matrix computed at convergence. Standard deviations of phenotypic variances and covariances, heritabilities, and genetic, environmental, and phenotypic correlations were computed with a repeated sampling procedure [17] using a set of 5,000 samples.

Animal estimated breeding values, accuracies, and rankings
Animal EBV for MY and FY for the five SNP sets (SNP5, SNP25, SNP50, SNP75, and SNP100) were computed using the estimates of variance components at convergence for each set of genotypes. The EBV accuracies for each trait were ob five SNP sets were compared using the Spearman's rank correlation procedure of SAS 9.0 (SAS Inst.

170
Inc., Cary, NC, USA).  Table 2 for additive genetic effects, Table 3 for environmental effects, and in PEV is the prediction error variance (square of the standard error of prediction) and Animal estimated breeding values, accuracies, and rankings Animal EBV for MY and FY for the five SNP sets (SNP5, SNP25, SNP50, SNP75, and SNP100) were computed using the estimates of variance components at convergence for each set of genotypes. The EBV accuracies for each trait were obtained as the correlation between EBV and true breeding values computed using the formula: √ 1 − 2 ×100, where PEV is the prediction error variance (square of the standard error of prediction) and 2 is the estimate of the additive genomic-polygenic variance for each SNP set. After computing animal EBV for MY and FY with the five SNP sets, animal EBV were ranked from largest to smallest for MY and from smallest to largest for FY. Subsequently, animal rankings from five SNP sets were compared using the Spearman's rank correlation procedure of SAS 9.0 (SAS Inst. Inc., Cary, NC, USA).

Variances components, heritabilities, and correlations
Estimates of variances and covariances for MY and FY using genomic-polygenic models with five sets of SNP are shown in Table 2 for additive genetic effects, Table 3 for environmental effects, and in Table   4  is the estimate of the additive ge nomicpolygenic variance for each SNP set. After computing animal EBV for MY and FY with the five SNP sets, animal EBV were ranked from largest to smallest for MY and from smallest to largest for FY. Subsequently, animal rankings from five SNP sets were compared using the Spearman's rank cor relation procedure of SAS 9.0 (SAS Inst. Inc., Cary, NC, USA).

Variances components, heritabilities, and correlations
Estimates of variances and covariances for MY and FY using genomicpolygenic models with five sets of SNP are shown in Table 2 for additive genetic effects, Table 3 for environmental effects, and in Table 4  Estimates of additive genetic variances and covariances for MY and FY were higher for models with SNP75 (variances:   , and SNP25 were nearly identi cal (differences were near zero or below one percent) to the corresponding values from model with SNP100, whereas the corresponding differences for the model with SNP5 were all negative and mostly higher than one percent (variances: -2.30% for MY, -0.77% for FY; covariance: -1.97%). The slightly higher additive genetic variances and covari ances but lower environmental variances and covariances for MY and FY obtained with SNP75, SNP50, and SNP25 than with the complete SNP set indicated that these SNP subsets may have been able to more accurately accounted for MY and FY additive variability in this population than the complete SNP set. This may have occurred because the SNP markers in these subsets were on the average more closely associated with quantitative trait locus (QTL) affecting MY and FY than the complete set of SNP [18].
Heritabilities and additive genetic, environmental, and phenotypic correlations between MY and FY are presented in Table 5. Heritability estimates ranged from 0.231±0.030 (SNP5) to 0.276±0.039 (SNP25) for MY and from 0.159±0.047 (SNP100) to 0.183±0.044 (SNP25) for FY. The SNP25 MY and FY heritability estimates were slightly higher (0.4% to 20%) than those from SNP100, SNP75, SNP50, and SNP5. These differences among heritability estimates across SNP sets may be related to differences in linkage disequilibria between the SNP in each set and QTL affecting MY and FY determined by number of SNP and proximity of SNP in each set to MY and FY QTL [18]. Higher SNP25 heritability es timates for MY and FY indicated that faster selection responses for these traits could be expected with SNP25 than with SNP100, SNP75, SNP50, and SNP5 in the Thai dairy popu lation.
Heritability estimates for MY and FY across the five SNP sets were similar to values estimated in previous studies in the Thai dairy population using various SNP sets (0.19 to 0.26 for MY; 0.15 to 0.18 for FY [4]). Heritabilities for MY in the Thai dairy population were within the range of heritability esti mated for MY in various Holstein populations in temperate regions (0.25 to 0.30 [6,19,20]), but somewhat higher than an estimate in Holstein under tropical conditions in Brazil (0.13 [21]). Conversely, heritability estimates for FY were somewhat lower than values obtained for Holstein in temperate regions (0.25 to 0.30 [19,20,22]) Estimates of additive genetic, environmental and pheno typic correlations between MY and FY across the five sets of SNP were virtually identical (Table 5). Correlation estimates between MY and FY ranged from 0.718±0.115 (SNP100) to 0.748±0.087 (SNP25) for additive genetic, from 0.673±0.023 (SNP25) to 0.684±0.023 (SNP100) for environmental, and from 0.682±0.009 (SNP5) to 0.686±0.010 (SNP75) for phe notypic. The positive genetic correlations between MY and FY obtained here were similar to values previously reported for this Thai dairy population (0.66 to 0.79 [4]), and in agree ment with estimates for Holstein in other tropical (0.70 to 0.75 [23,24]), and in temperate regions (0.70 to 0.88 [25,26]).
The comparable or slightly higher additive genetic vari ances and heritabilities for MY and FY from SNP25, SNP50, and SNP75 than from SNP100 indicated that selecting a sub set of SNP genotypes with the approach used here would be Table 5. Heritabilities and genetic, environmental, and phenotypic correlations between MY and FY estimated with five SNP sets using genomic-polygenic models

Accuracy of genomic-polygenic estimated breeding values and animal rankings with five single nucleotide polymorphism sets
The accuracies genomicpolygenic EBV for MY and FY with the five sets of SNP genotypes (SNP100, SNP75, SNP50, SNP25, and SNP5) are shown in Figure 1. The SNP25 had the highest mean EBV accuracy for all animals (39.76% for   for FY). Further, the mean EBV accuracies from the four SNP subsets (SNP75, SNP50, SNP25, SNP5) were mostly higher than the mean EBV accuracy from the complete SNP set (SNP100) for all animals, sires, and cows. The percentage superiority of the mean EBV accuracies of SNP75, SNP50, SNP25, and SNP5 over SNP100 for MY were 0.58%, 5.21%, 5.34%, and 4.18% for all animals, 0.54%, 4.92%, 5.01%, and 3.76% for sires, and 0.58%, 5.24%, 5.36%, and 4.21% for cows. Similarly, the percentage superiority of the mean EBV accu racies of SNP75, SNP50, and SNP25 over SNP100 for FY were 0.98%, 1.77%, and 2.21% for all animals, 0.81%, 1.48%, and 1.84% for sires, and 0.99%, 1.80%, and 2.34% for cows. How ever, the mean EBV accuracies of SNP5 for FY were slightly lower (-1.91% for all animals, -2.18% for sires, and -1.89% for cows) than those of SNP100. The mostly higher mean EBV accuracies of the four SNP subsets were largely due to the higher MY and FY additive genetic variances explained by these SNP subsets than by the complete SNP set. Further, the fact that SNP25 yielded the highest mean EBV accuracy in dicated that choosing the top 25% of SNP from GeneSeek GGP80K based on percent of additive genetic variance ex plained for MY and FY (19,130 SNP) would be a suitable alternative to the complete SNP set for genomicpolygenic evaluation and selection in the Thai dairy multibreed popu lation.
The higher mean EBV accuracies obtained with four GG P80K subsets than with the complete SNP set supported the findings from previous research in dairy [5,2729] and in beef cattle [30] that SNP subsets can yield comparable or higher levels of EBV accuracy than complete SNP sets while lower ing genotyping costs.
Pairwise Spearman rank correlations between MY and FY EBV from of the complete SNP set and each of the four SNP subsets are shown in Table 6. All rank correlations between SNP100 and the four SNP subsets were above 0.98 (p<0.0001) for both traits, except for the correlation between SNP100 and SNP5 (MY, 0.93; FY, 0.92; p<0.0001). Rank correlations between SNP75 and SNP100 and between SNP50 and SNP100 for MY and FY were above 0.99 (p<0.0001), followed closely by rank correlations between SNP25 and SNP100 for MY (0.98; p<0.0001). Rank correlations indicated a high degree of agree ment between EBV from genomicpolygenic evaluations with the four SNP subsets and the complete SNP set. The high SNP25 estimates of genetic variances, heritabilities, EBV ac curacies, and rank correlations between SNP100 and SNP25 for MY and FY indicated that SNP25 would be expected to produce higher selection responses for MY and FY than any of the other SNP subsets and the complete GeneSeek 80K set. This indicates that a strategy to keep genotyping costs rea sonably low while speeding up genetic progress for MY and FY would be to genotype animals in the Thai multibreed dairy population with a de dicated chip constructed with the sub set of SNP markers in the SNP25 set. Thai dairy producers could decrease their genotyping costs before the utilization of a dedicated chip likely without reducing their ability to se lect replacement animals based on genomicpolygenic EBV by utilizing lowerdensity commercial genotyping chips.

CONCLUSION
Estimates of additive genetic variances, heritabilities, and EBV accuracies for MY and FY from genomicpolygenic models with SNP subsets were higher than those with complete SNP set. Genomicpolygenic evaluation with SNP25 had the high est estimates of additive genetic variances, heritabilities, and EBV accuracies for MY and FY. Further, genomicpolygenic EBV obtained using SNP subsets and complete SNP set had high rank correlations. Thus, utilization of the SNP25 set would be a suitable alternative to reduce genotyping costs and increase selection response for MY and FY in this dairy population.

CONFLICT OF INTEREST
We certify that there is no conflict of interest with any financial organization regarding the material discussed in the manu script.

ACKNOWLEDGMENTS
This research was a part of the project for development of a dairy geneticgenomic evaluation system in Thailand (P11 00116) funded by the National Science and Technology Development Agency, Kasetsart University, and the Dairy Farming Promotion Organization, and partly supported by the Kasetsart University Research and Development Institute Table 6. Spearman rank correlations between SNP100 and SNP75, SNP100 and SNP50, SNP100 and SNP25, and SNP100 and SNP5 for MY and FY