# Models for Estimating Genetic Parameters of Milk Production Traits Using Random Regression Models in Korean Holstein Cattle

## Article information

## Abstract

The objectives of the study were to estimate genetic parameters for milk production traits of Holstein cattle using random regression models (RRMs), and to compare the goodness of fit of various RRMs with homogeneous and heterogeneous residual variances. A total of 126,980 test-day milk production records of the first parity Holstein cows between 2007 and 2014 from the Dairy Cattle Improvement Center of National Agricultural Cooperative Federation in South Korea were used. These records included milk yield (MILK), fat yield (FAT), protein yield (PROT), and solids-not-fat yield (SNF). The statistical models included random effects of genetic and permanent environments using Legendre polynomials (LP) of the third to fifth order (L3–L5), fixed effects of herd-test day, year-season at calving, and a fixed regression for the test-day record (third to fifth order). The residual variances in the models were either homogeneous (HOM) or heterogeneous (15 classes, HET15; 60 classes, HET60). A total of nine models (3 orders of polynomials×3 types of residual variance) including L3-HOM, L3-HET15, L3-HET60, L4-HOM, L4-HET15, L4-HET60, L5-HOM, L5-HET15, and L5-HET60 were compared using Akaike information criteria (AIC) and/or Schwarz Bayesian information criteria (BIC) statistics to identify the model(s) of best fit for their respective traits. The lowest BIC value was observed for the models L5-HET15 (MILK; PROT; SNF) and L4-HET15 (FAT), which fit the best. In general, the BIC values of HET15 models for a particular polynomial order was lower than that of the HET60 model in most cases. This implies that the orders of LP and types of residual variances affect the goodness of models. Also, the heterogeneity of residual variances should be considered for the test-day analysis. The heritability estimates of from the best fitted models ranged from 0.08 to 0.15 for MILK, 0.06 to 0.14 for FAT, 0.08 to 0.12 for PROT, and 0.07 to 0.13 for SNF according to days in milk of first lactation. Genetic variances for studied traits tended to decrease during the earlier stages of lactation, which were followed by increases in the middle and decreases further at the end of lactation. With regards to the fitness of the models and the differential genetic parameters across the lactation stages, we could estimate genetic parameters more accurately from RRMs than from lactation models. Therefore, we suggest using RRMs in place of lactation models to make national dairy cattle genetic evaluations for milk production traits in Korea.

**Keywords:**Random Regression Model; Test Day Yield; Milk Production; Heritability; Holstein

## INTRODUCTION

Holstein cattle produce the majority of dairy milk in Korea. Similar to the breeders in other countries, Korean local breeders also show great interests for the genetic improvement of milk production traits that are directly related to the profitability of dairy herds. In general, milk yield, fat yield, and protein yield from a dairy cow are considered the primary selection traits (Bahreini Behzadi et al., 2013) in a herd. In this regard, test-day milk production records are widely used in many countries for the genetic analysis of dairy cattle. Genetic evaluation studies using test-day production records through various statistical models are used across numerous countries, such as repeatability models treating equal genetic correlation between each test-day record, multiple-trait models using each test-day record as different traits, or random regression models (RRMs) considering a covariate function of repeated test-day records over time (Meyer et al., 1989; Ptak and Schaeffer, 1993; Schaeffer and Dekkers, 1994).

The RRMs are advantageous over other models in that they analyze each test-day record assuming that genetic and non-genetic variances vary with days in milk and parity, as do genetic and non-genetic correlations (Liu et al., 2000a; Schaeffer et al., 2000). Theoretically, these models can extract more information from the data and allow for a more accurate modeling. Although they involve a large number of records and are more sensitive to the large number of parameters than the lactation model, they are more precise and flexible than the single-trait repeatability models or the multiple-trait models (Henderson, 1982; López-Romero and Carabaño, 2003; Andonov et al., 2013). For this reason, complex RRMs approaches are more popular and tend to be used as an international reference for national genetic evaluation of production traits of dairy cattle in many countries (Rupp and Biochard, 2003; Strabel et al., 2005). Nonetheless, many studies that have used RRMs and have compared models with homogeneous and heterogeneous variance classes, have proposed the use of heterogeneous residual variances by class because it fits the data in the same order of coefficients as Legendre polynomials (LP) (Jamrozik and Schaeffer, 1997; Mayer, 1999; Olori and Hill, 1999; Bignardi et al., 2009; Takma and Akbas, 2009; Hurtado-Lugo et al., 2013).

In Korea, national genetic evaluations for dairy production traits are mostly based on the 305-day lactation average model, using the first five lactation records as separate traits. Therefore, in the present study, we estimated the additive and permanent environmental (co)variances for the first lactation milk production traits and compared the goodnesses of RRMs fit using homogeneous and heterogeneous residual variances from test-day data.

## MATERIAL AND METHODS

### Animals and data recording

A total of 3,190,654 raw data test-day records were collected from first parity of Holstein cows under the Dairy Cattle Improvement Centre of National Agricultural Cooperative Federation (NACF) in South Korea from 2007 to 2014. Test-day records included milk yield (MILK), fat yield (FAT), protein yield (PROT), and solids-not-fat (SNF). Initial edits were performed on the data. Records of cows were discarded if they had not calved between 18 and 48 months of age. In addition, records with test dates collected before 5 days in milk (DIM) and after 305 DIM, and data on cows with less than 8 DIM records from the respective lactation, were removed. Initial edits also included the removal of any data on herd-test day (HTD) classes with less than 10 records, and the removal of data on cows with unknown pedigree information (both parents unknown). This left a total of 126,980 test day records for the study. The total test day period was divided into 60 classes (from 5 to 305 DIM at 5 day intervals) and 15 categories (5 to 9, 10 to 14, 15 to 19, 20 to 29, 30 to 49, 50 to 59, 60 to 164, 165 to 194, 195 to 229, 230 to 259, 260 to 264, 265 to 279, 280 to 294, 295 to 300, and 300 to 305 DIM) of the heterogeneous residual variances.

### Animal pedigree

Animal pedigrees for cows with production records were obtained from the Korean Animal Improvement Association (KAIA). A total of 39,855 animals were found in the pedigree dataset. Respective pedigree was traced back 21 generations. Approximately 46% of the animals were found to be inbred and the average inbreeding coefficient of the inbred animals was 0.019.

### Data analysis

We considered RRMs with both homogeneous and heterogeneous residual variance components. For homogeneous residual variances, the original HTD factor was fitted as a contemporary group. However, a fixed factor of 60 HTD classes (HET60) and 15 HTD classes (HET15) were fitted to obtain heterogeneous residual variances at different stages of lactation. The fit for LP were considered up to the third order (L3), the fourth order (L4), and the fifth order (L5), and also included the intercept for genetic and permanent environments. Thus, a total of 9 models (3 orders of polynomials×3 types of residual variance) were obtained for variance component analyses, such as L3-HOM, L3-HET15, L3-HET60, L4-HOM, L4-HET15, L4-HET60, L5-HOM, L5-HET15, and L5-HET60. Four seasons of calving (spring [March to May]; summer [June to August]; autumn [September to November]; winter [December to February]) were defined and fitted accordingly as a fixed effect of year-season in the model. A term for fixed regression of DIM was also included in the models. The (co)variance components from various models were obtained using the WOMBAT 6.0 software package (Meyer, 2007). The RRMs used were as follows:

where y_{ijklm} is the test-day record *m* of cow *l* made on day *k* within HTD effect *i*, belonging to year-season of calving class *j*; HTD_{i} is the i^{th} HTD effect; YS_{j} is the j^{th} year-season of calving; ϕ_{klm} is the LP for the test day record of cow *l* made on day *k*; β_{k} is the coefficient of fixed regression; u_{kl} is the coefficient for random regression of the additive genetic effect of the *l*^{th} cow; pe_{kl} is the coefficient for random regression of the permanent environment effect of the *l*^{th} cow; e_{ijklm} is the random residual effect; and n is the number of fitted coefficients of LP.

The (co)variance structure for models was assumed to be:

where G and P are the (co)variance matrices of the random regression coefficients for additive genetic and permanent environmental effects, R is a diagonal matrix of residual variance, A is the additive genetic relationship matrix among cows, I is an identity matrix, and ⊗ is the Kronecker product.

The goodnesses of fit among different models were tested using the logarithm of the likelihood function (−2*log*L), and Akaike information criteria (AIC) and Schwarz Bayesian information criteria (BIC) values (−2*log*L+2K [Akaike, 1973]; −2*log*L+K*log*n [Schwarz, 1978]). The K and n in the formula of AIC and BIC indicate the number of parameters and the number of records, respectively. The goal was to identify appropriate models through model selection using AIC or BIC by-production traits.

## RESULTS AND DISCUSSION

Descriptive statistics on the test-day milk production traits are shown in Table 1. The average yields of MILK, FAT, PROT, and SNF were 29.65±6.47 kg, 1.12±0.30 kg, 0.94±0.20 kg, and 2.59±0.56 kg, respectively. Kim et al. (2009) reported similar averages for Korean Holstein cattle: MILK (28.25 kg), FAT (1.07 kg), PROT (0.90 kg), and SNF (2.50 kg). Similar averages for MILK have also been reported by Lee et al. (2003) and Cho et al. (2005). We found relatively higher phenotypic averages in this study, which could be an effect of the increased use of elite breeding bulls and cows over the past few years.

Table 2 represents the deviation between the estimated value (−2logL, AIC or BIC) of a given model and the lowest value for the respective parameters among the models. Most of the estimates decreased as the order of LP fit increased in the models of a particular residual type, except the BIC for FAT. The smallest −2*log*L and AIC estimates were observed in HET60 models (60 heterogeneous residual classes) with fifth order polynomials (LP5) for all traits; this also showed the best fit among other models. Homogeneous residual models, on the other hand, were the least fit models for these above criteria. The best-fit models based on the smallest BIC were slightly different than the best-fit models using −2*log*L and AIC. Also, the smallest BIC estimates on traits were obtained with HET15 models only, even though their LP fits were different. Similarly, in a previous study, the lowest AIC and BIC were obtained from Brazilian Holsteins (Costa et al., 2008) for milk yield using a fifth order LP with 29 heterogeneous classes. They also showed that heterogeneous models with increased polynomial orders (from third to fifth) obtained the best model estimate. This is in agreement with another previous study on Brazilian Holsteins (Bignardi et al., 2009). Generally, the measures of AIC and BIC differ because of the penalty weights which are considered in their calculations. The AIC, as a penalty, considers a value twice the number of total model parameters (2K; see Materials and Methods), whereas the penalty for BIC includes a value of total model parameters multiplied by the natural logarithm of total records (K*log*n; see Materials and Methods). Because the BIC accounts for both the number of model parameters and total records under study, it penalizes a model more rigorously than AIC and obtains a more parsimonious model (Cobuci et al., 2011; Dziak et al., 2012; Hurtado-Lugo et al., 2013). Therefore, we used BIC as the best fit model in this study.

The trends of estimated variances for genetic, permanent environment and residual components for MILK, FAT, PROT, and SNF traits according to DIM are shown in Figure 1. In general, the genetic variances for MILK, FAT, PROT, and SNF yield showed gradual decrease initially (5 to 30 DIM), followed by a somewhat steady variance up to the middle of lactation (30 to 185 DIM), and then showed a subtle increase (185 to 265 DIM) before a final descent until the end of lactation. Decreased genetic variances at the beginning and the end of the lactation curve have been reported for dairy cattle (López-Romero and Carabaño, 2003; Cobuci et al., 2011). We observed the same trend at the two extremes of the lactation curve. Similar trends of permanent environmental variances for studied traits up to mid-lactation were observed, except for the gradual increases until the end of lactation and higher values during the extreme period of lactation compared to the mid-lactation. These results are similar to those observed by Pool et al. (2000), Cobuci et al. (2011), Bignardi et al. (2011), and Kheirabadi et al. (2014) for MILK of Holstein. However, residual variances for heterogeneous models decreased gradually until the end of lactation, although they remained steady during the middle period. Studies by López-Romero et al. (2003) and Herrera et al. (2013) on milk production of Holstein cattle reported similar patterns for residual variances using models of 5 and 30 heterogeneous classes fitting fourth and fifth order LPs, respectively.

Figure 2 shows the changes in heritability estimates for milk production traits derived from the best-fit model variances. The heritabilities for traits varied as lactation progressed, where h^{2} estimates of MILK, PROT, and SNF started with a slight decrease followed by a gradual increase at 35 DIM until reaching the highest estimates at 259 DIM, but again ended with a steady decrease until 305 DIM. These results are in agreement with many previous reports (Liu et al., 2000b; Jakobsen et al., 2002; Bormann et al., 2003; De Roos et al., 2004; Costa et al., 2008; Cobuci et al., 2011) but not with Kheirabadi et al. (2014), which reported higher estimates at the beginning and the end of lactation. The heritability estimated for milk yield using the L5-HET15 model ranged from 0.08 (27 DIM) to 0.15 (255 DIM). Heritabilities in this study were significantly lower than those reported by other studies on Holstein (Olori et al., 1999; Pool et al., 2000; Kim et al., 2009). This is because the genetic variances were relatively lower compared to the permanent environmental variances and residual variances. The FAT heritability estimates obtained using the L4-HET15 model were also low, ranging from 0.06 to 0.14. Heritabilities for PROT ranged between 0.08 at 10 DIM and 0.12 at 305 DIM. Although a higher FAT h^{2} was observed at the beginning of lactation, eventually it decreased to the lowest point at about 59 DIM and then remained somewhat steady after a slight increase until 305 DIM. However, heritability found in SNF yield in the present study ranged between 0.07 and 0.13 at 29 and 252 DIM, respectively. The results of h^{2} estimates for FAT, PROT, and SNF were lower than for MILK. Similar results have been reported by Jakobsen et al. (2002), Kim et al. (2009), and Kheirabadi et al. (2014). Heritabilities for milk production traits in our study were lower than previous estimates that used RRMs on Korean Holstein (Cho et al., 2005; Kim et al., 2009). However, similar results for MILK were observed by Lee et al. (2003). The plausible cause for these differences in h^{2} among studies may be the use of different LP models and sample sizes subjected to data restriction in each study.

The genetic correlation estimates (r_{G}) between DIM for milk production traits are shown in Figure 3. These estimates ranged from almost no correlation to a complete correlation for MILK (−0.01 to 1.00), FAT (0.06 to 1.00), PROT (0.12 to 1.00), and SNF (−0. 04 to 1.00). Genetic correlations were stronger between adjacent DIM records and relatively weaker between distant DIM records. These results are in close agreement with genetic correlations that were reported previously for Holsteins using RRMs with test day records (Cobuci et al., 2011; Herrera et al., 2013; Kheirabadi et al., 2014). However, the genetic correlations for MILK mostly varied during the early lactation stage; for example, between 0.06 and 1.00 until 115 DIM. After 115 DIM, the r_{G} estimates were greater than 0.90 throughout lactation. These higher genetic correlations between 115 and 305 DIM indicate that the estimated breeding values were similar along the later lactation stages. This raises the possibility for inclusion of cow (lacking complete records between 100 to 305 DIM) phenotypes in the model to obtain breeding values, especially when genetic evaluations using test day RRMs are performed. Likewise, high genetic correlations were obtained for other traits until the end of lactation, despite the fact that the length of the stage differed slightly (FAT [151 to 305 DIM], PROT [140 to 305 DIM], SNF [153 to 305 DIM]).

## CONCLUSION

An RRMs analysis of longitudinal trait in dairy cattle to study genetic changes over time can increase understanding of the genetics of lactation. In this study, a variety of models were considered to find the goodness of models for milk production traits. The results of BIC among models showed that the L5-HET15 model fitted data best for MILK, PROT, and SNF, whereas the L4-HET15 model fitted data best for FAT. This suggests that the order of LP and type of residual variances affected the goodness of model. Also, the heterogeneity of residual variances should be considered on the test-day analysis as they varied during the lactation; especially, they declined as lactation progressed. Heritability estimates of the traits, even though were relatively small, showed similar patterns with previous reports, throughout the lactation period. Genetic correlations among various DIMs were concordant with previous reports. Higher genetic correlation estimates during the middle and end lactation stage also suggested the inclusion of cows for breeding value estimation that had missing records for those periods. This RRMs study has provided us a good insight into the changes in genetic merit of milk production traits in Korean Holstein cattle. Regarding the fitness of the models and differential genetic variance components across the lactation stages, parameter estimates of RRMs seemed to be more accurate and more informative than those of lactation models. Therefore, we suggest using RRMs in place of lactation models to make national dairy cattle genetic evaluations for milk production traits in Korea.

## ACKNOWLEDGMENTS

This research was supported by the funding from the project “Development of selection program in dairy bull using genomic information (Project No: PJ009260)” and the “Postdoctoral Fellowship Program” of the National Institute of Animal Science, RDA, Cheonan, Korea.

## Notes

**CONFLICT OF INTEREST**

We certify that there is no conflict of interest with any financial organization regarding the material discussed in the manuscript.