# Random Regression Models Using Legendre Polynomials to Estimate Genetic Parameters for Test-day Milk Protein Yields in Iranian Holstein Dairy Cattle

## Article information

## Abstract

The objective of this study was to estimate the genetic parameters of milk protein yields in Iranian Holstein dairy cattle. A total of 1,112,082 test-day milk protein yield records of 167,269 first lactation Holstein cows, calved from 1990 to 2010, were analyzed. Estimates of the variance components, heritability, and genetic correlations for milk protein yields were obtained using a random regression test-day model. Milking times, herd, age of recording, year, and month of recording were included as fixed effects in the model. Additive genetic and permanent environmental random effects for the lactation curve were taken into account by applying orthogonal Legendre polynomials of the fourth order in the model. The lowest and highest additive genetic variances were estimated at the beginning and end of lactation, respectively. Permanent environmental variance was higher at both extremes. Residual variance was lowest at the middle of the lactation and contrarily, heritability increased during this period. Maximum heritability was found during the 12th lactation stage (0.213±0.007). Genetic, permanent, and phenotypic correlations among test-days decreased as the interval between consecutive test-days increased. A relatively large data set was used in this study; therefore, the estimated (co)variance components for random regression coefficients could be used for national genetic evaluation of dairy cattle in Iran.

## INTRODUCTION

Improvements in the modeling of genetic and environmental effects that influence the productive traits of dairy cattle can be achieved by using the test-day models (TDM). These models evaluate differences in the lactation curves of animals (Bormann et al., 2003). Among the models that consider test-day production, the random regression model (RRM) has been shown to increase the accuracy of breeding value predictions and has already been implemented in official genetic evaluations of dairy cows in many countries (Strabel et al., 2004). Use of the TDM approach allows for the development of a more detailed statistical model that accounts for environmental variation specific to individual TD yields and the genetic effects associated with individual animals.

The general concept of using RRMs for analysis of covariance in an animal breeding context was suggested by Henderson (1982) and Laird and Ware (1982). Schaeffer and Dekkers (1994) suggested their use in dairy cattle breeding for the analysis of test-day production records. There have been numerous studies applying RRM to the genetic evaluation of dairy cattle for test-day milk production. Since then, RRM has become the standard for the genetic analysis of longitudinal data (Schaeffer, 2004).

There are different functions that can be used to fit the yield trajectory. Among those used in random regression, the most often used are parametric functions and Legendre orthogonal polynomials. Legendre polynomials are flexible, enabling the fitting of curves independent from the trait of interest. In turn, parametric functions are based on components of the typical curve and tend to impose a particular shape. This can result in satisfactory fits when the data follows one of the typical curves but fits unsatisfactory when the trait’s data follow a different path. Some studies have also been shown in favor of a lower degree needed for the genetic compared with the permanent environmental components (Van der Werf et al., 1998; Pool et al., 2000). However, all studies considered an equal order of fit in order to provide equal opportunity for variation of both components. In general, the Legendre polynomials have largely been used to fit random curves due to their ability to describe the variation over a period.

Thus, the objective of the present study was to determine the parameters required for the use of RRMs in the evaluation of test-day milk protein yield in first lactation Holstein cows, using the restricted maximum likelihood (REML) method, to estimate breeding values for test-day milk protein yield.

## MATERIALS AND METHODS

A total of 1,112,082 test-day milk protein yield records from 167,269 first lactations of Holstein cows, recorded from 1990 to 2010 by the Animal Breeding Center of Iran, were analyzed. Age at first calving varied from 19 to 35 months. Test days prior to the 5th day and after the 305th day of lactation were excluded. The records were divided into 12 lactation stages, and only data from cows with at least four records were kept in the data set. The structure of the data set after editing is summarized in Table 1.

Analysis was performed using a single-trait RRM. Legendre polynomials were used because they are orthogonal, normalized, and resulted in a better convergence with more accurate results, as compared to conventional polynomials (Kirkpatrick et al., 1990). The model included direct genetic and permanent environmental as random effects and the fixed effects of contemporary group and age of cow as covariables (linear and quadratic effects). The choice of fixed effects to be considered was statistically significant with the general linear model procedure of SAS v9.2 software package (SAS Institute Inc., 2009). Therefore, milking times, herd, age of recording, year, and month of recording were included as fixed effects in the model.

In matrix notation, the model can be represented as:

where y is the vector of the N observations measured in Nd animals; b includes fixed effects; a is the vector of solutions for the coefficients of the additive genetic random effects; pe is the vector of solutions for the coefficients of the permanent environmental random effects; e is the vector of residual effects; and X, Z, W are the correspondent incidence matrices for fixed and additive genetic and permanent environmental random effects, respectively.

The assumptions regarding these components were:

and

Where, Ka and Kpe are the (co)variance matrices between the random regression coefficients of the additive genetic and permanent environmental effects, respectively, A is the relationship matrix between the individuals, Ind is the identity matrix of dimension nd, ⊗ is the Kronecker product, and R represents a diagonal matrix containing residual variances.

The (co)variance components and the genetic parameters were estimated using the REML method with the Wombat program (Meyer, 2007).

The order of fit is usually chosen so that the observed variance-covariance (VCV) structure can be appropriately fitted with as few parameters as possible (Van Der Werf et al., 1998). Convergence was assumed when the difference between the −2log values of the likelihood functions obtained in consecutive iterations was smaller than 10 to 11.

## RESULTS AND DISCUSSION

The correlations and variance and covariance components of regression coefficients for random additive genetic and permanent environmental effects obtained in the data set are presented in Table 2. As shown in Table 2, genetic correlations between the intercept and linear regression coefficients were positive, close to 0.4. The largest genetic variances were associated to the intercept (11.161) and the linear (1.711) coefficient, whereas the variances were close to zero for the other coefficients. In general, absolute values of genetic correlations between coefficients varied from almost null to high and were higher than those between permanent environmental coefficients, except for correlation between the linear and cubic coefficient which were −0.05 and −0.43 for genetic and permanent correlation, respectively. Estimated genetic and permanent environmental variance components for random regression coefficients were in agreement to those obtained by Olori et al. (1999) and El Faro et al. (2008).

To describe the best structure to model the residual variances, the order of fit for the other random effects (genetic direct and permanent environmental) was kept constant and equal to four. In general, other studies of milk yield have shown that a lower order of covariance function could be sufficient to fit the data. Jamrozik and Schaeffer (2002) found that the TDM with Legendre polynomials outperformed the TDM with a lactation curve function, considering the same number of parameters in terms of statistics on the goodness of fit. Legendre orthogonal polynomials seem to efficiently describe the evolution of milk yields, during a complete lactation, of dairy cows in different management conditions (Gengler et al., 1999; Rekaya et al., 1999; Brotherstone et al., 2000). The first Legendre polynomial coefficient (a1), associated with the additive genetic effect, is related to total milk production, and the second (a2) is related to persistence in the lactation curve (Strabel and Jamrozik, 2006). According to this interpretation, the genetic correlation estimates between the additive genetic regression coefficients (a1 and a2) indicate low genetic association between total milk production and persistency. However, selection based on components related to different phases of the lactation curve is complex, because the association between these components and the phases of the lactation curve is not well understood (Rekaya et al., 1999). Meyer (1998) and Albuquerque and Meyer (2001) proposed that high correlations between random regression coefficients can cause some of the eigenvalues to be negligible and set to operational zero.

The estimates of heritability, additive genetic, phenotypic, permanent environmental, and residual variances for milk protein yields are shown in Table 3. Heritabilities for test-day milk protein yields ranged from 0.086 to 0.213. The heritability estimates decreased during the first two stages of lactation and then slowly increased afterward. These results follow the trend reported in the literature for Holsteins (Gengler et al., 1999; Olori et al., 1999). In Brazil, however, Cobuci et al. (2005), who used the exponential function of Wilmink (1987), and Araújo et al. (2006), who used Legendre polynomials, found higher heritability estimates at the end of lactation. Higher heritability estimates for milk yield during different lactation periods were also observed in dairy cows by Jamrozik and Schaeffer (1997), and Kettunen et al. (2000). In addition, Druet et al. (2005) reported the highest estimated values in dairy cows during mid-lactation (0.51) and the lowest during early lactation (0.08). Alternately, Strabel and Misztal (1999) reported that the maximum additive genetic variance estimate was in early lactation of Polish Black and White cattle. However, Pool et al. (2000) and Druet et al. (2003) found higher additive genetic variance estimates in mid-lactation in their Holstein populations. Residual variance was highest at the beginning of lactation and relatively constant in mid-lactation. Uncontrolled environmental factors were more likely to occur at the onset of the lactation process. This might be a possible explanation for the observed trend. The estimates of phenotypic and permanent environmental variances were high in the first and last stages of lactation, which is in accordance with many other similar studies Olori et al. (1999), Pool et al. (2000) and Kettunen et al. (2000). It’s possible that the high estimate for phenotypic variance in the first and last stages was due to the smaller number of records in these stages (in the last stage not all of the cows had test-day records because of the high frequency of short lactations in the breed) (Meyer, 1999).

Generally, the trends for permanent environmental variance were more regular throughout first lactation, compared to residual variances. Based on the results of Zavadilová et al. (2005) in Holstein cows, the highest permanent environmental variance for dairy traits occurred in the first and last days of lactation. The additive genetic, permanent environmental, and phenotypic correlations between test-day protein yields at different stages of lactation are shown in Table 4 and 5. The additive genetic correlations were higher than the phenotypic correlations. While the additive genetic correlations varied from 0.21 to 0.99, the phenotypic correlations for test-day protein yields varied from 0.21 to 0.63. The higher values were observed between adjacent test-day records in the middle and at the end of lactation and the lower estimates between test-day records at the beginning of lactation that was in accordance with study by Druet et al. (2003).

Permanent environmental correlations were always positive with a range from 0.38 to 1.00, which were in agreement with results found in dairy cows by Olori et al. (1999) and Kettunen et al. (2000). These findings might be attributed to the difficulty in modeling test-day protein yields at the beginning of lactation when the cow is still suffering from the stress of calving and presents an energy deficit.

Higher additive genetic correlations between the extreme test-days were also obtained by Cobuci et al. (2005). Pereira et al. (2013), using Legendre polynomials for Gyr dairy cows, found estimates of additive genetic correlations between daily milk yields close to 0.99 between adjacent test days, decreasing to about 0.40 between records obtained at the beginning and at the end of lactation. This result suggests that a fourth order polynomial is sufficient to model the additive genetic effect for the data in the present study, as observed in other studies (Olori et al., 1999; Kettunen et al., 2000). Kirkpatrick and Heckman (1989) suggested the use of covariance functions defined by orthogonal Legendre polynomials to describe the (co)variance structure of longitudinal observations and more precisely estimate the genetic and environmental components of variance. Other procedures have been used in this area such as the character process models (Pletcher and Geyer, 1999) and hierarchical models based on nonlinear functions that use coefficients with more biological sense (Rekaya et al., 2000). These procedures require more complex computing techniques than RRMs and have not been employed in large-scale applications.

Alternative RRMs, including lactational and/or Legendre polynomial submodels, have been compared in several studies (Jamrozik et al., 1997; Van der Werf et al., 1998; Olori et al., 1999; Brotherstone et al., 2000; Kettunen et al., 2000; Pool et al., 2000). For RRMs based on Legendre polynomials, previous studies propose that at least a three coefficient polynomial is needed to model the (co)variance structure of the random components of the data (Olori et al., 1999; Pool et al., 2000).

## IMPLICATIONS

Considering the capacity of RRMs to provide mechanisms for the estimation of individual lactation curves, it seems feasible to predict the genetic merit of animals using random regression coefficients. The results showed that the genetic correlation estimate between adjacent records was high, but with an increase in distance between records, the genetic correlations decreased. Genetic correlations were generally larger than phenotypic correlations. The highest heritability for milk protein yields occurred in the last lactation stage. The existence of 0.37 to 0.96 of additive genetic variability for milk protein trait during lactation stages indicated that the use of RRM is justified for selection on the level and shape of lactation curve in dairy cattle.

## ACKNOWLEDGMENTS

The authors thank the Animal Breeding Centre of Iran for providing the data used in this study.

## Notes

**CONFLICT OF INTEREST**

We certify that there is no conflict of interest with any financial organization regarding the material discussed in the manuscript.

## References

*Bos indicus*) cattle. J Dairy Sci 96:565–574.