INTRODUCTION
Gene pyramiding aims to obtain a superior trait by combining favorable alleles into an ideal genotype. Molecular dissection of complex traits currently strives to explain the genetic architecture of agronomic traits in plants and economic traits in animals (Doerge, 2002; Ljungberg et al., 2002; Chen and Kendziorski, 2007), and many quantitative trait loci (QTL) and linked markers have been identified. This rapidly growing body of molecular information offers great opportunities for the practical improvement of crops and farm animals through marker-assisted selection and marker-assisted gene pyramiding (Fadiel et al., 2005).
Marker-assisted gene pyramiding is an important branch of marker-assisted selection. It has been applied successfully in many plant breeding schemes, most of which involved pyramiding disease resistance genes with major effects (Huang, 1997; Singh et al., 2001; Saghai Marrof, 2008; Kameswara Rao et al., 2010). Although several theoretical studies of marker-assisted selection have appeared in recent years (Lande and Thompson, 1990; Ruane and Colleau, 1995; Moreau et al., 1998; Lange and Whittaker, 2001; Hu, 2007), theoretical work on marker-assisted gene pyramiding has only just begun.
Servin et al. (2004) investigated the theoretical issues of gene pyramiding and proposed general principles for designing gene pyramiding schemes in plants. They proposed that if the locations of a series of genes of interest were known, the selection problem could be reduced to a “building block” problem. Their estimate of pyramiding efficiency is based on gene transmission probabilities and on the minimum population size needed to obtain an individual with the ideal genotype. Taking into account features of animal populations, such as long generation intervals and limited fertility, Zhao et al. (2009) extended these theories to design representative schemes for pyramiding three and four target genes, and proposed two criteria for selecting the optimal scheme under certain conditions. However, these theoretical studies did not take into account either the initial frequencies of the target genes in the base population or the selection strategy. In practice, animal breeding populations are segregating populations, so a favorable allele is unlikely to be completely absent from a breed. Hence, gene pyramiding breeding theory for animals needs further study.
Within the field of evolutionary computation, some studies have used animal breeding strategies to design algorithms that search for optimal solutions to problems (Muhlenbein and Voosen, 1993; Podlich and Cooper, 1998). Inspired by evolutionary computation (David, 1989), we developed gene pyramiding breeding algorithms on the same theoretical foundation as evolutionary algorithms, namely the building block hypothesis. Selection over several generations promotes the pyramiding of superior alleles at all target loci. Because animal breeding populations are segregating populations, we designed four types of cross programs for pyramiding two, three and four target genes. In these programs, we used the population Hamming distance and the superior genotype frequency to measure pyramiding efficiency during the process of gene pyramiding. Other factors considered include the initial favorable allele frequencies in the base populations, the base population sizes and the selection strategies.
MATERIALS AND METHODS
General concept of gene pyramiding breeding
Marker-assisted gene pyramiding aims to produce individuals with superior economic traits, using optimal breeding schemes, by selecting favorable target alleles (or their linked markers) and pyramiding them into a single genotype.
Servin et al. (2004) proposed that gene pyramiding breeding consists of two basic steps: the pyramiding step and the gene fixation step. In our studies, we designed four types of cross programs for pyramiding two, three and four target genes in the pyramiding step (Figure 1). In this step, target genes present in different populations at different favorable allele frequencies are accumulated into one cross population. The fixation step begins with the cross population, and the selected parents are intercrossed to fix all the target genes in an individual with the ideal genotype (Servin et al., 2004; Zhao et al., 2009).
Population and individual genotype simulation
We treat gene pyramiding design as a search for the optimal genotype combination. We assume that the target trait is mainly controlled by several major genes and code each individual's genotype as a binary string. The genotype at one locus is coded by two characters (0 or 1, with 1 denoting the favorable allele), and the set of strings represents the genotypes of all individuals in the population. The initial base population is represented by an N×M matrix, where N is the number of individuals in the base population and M/2 is the number of loci. The favorable allele frequencies of the initial base population are set at various levels. In each generation, individuals are evaluated by genotypic scores or by phenotypic values, depending on the selection strategy.
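A minimal Python sketch of this encoding follows. It is not the authors' MATLAB code. It assumes something the text does not state: each allele is drawn independently, with probability equal to the favorable allele frequency at its locus (Hardy-Weinberg and linkage equilibrium in the base population). The function names `make_base_population` and `allele_frequencies` are hypothetical.

```python
import random

def make_base_population(n_individuals, favorable_freqs, rng=None):
    """Return an N x M binary matrix (list of lists), with M = 2 * loci.

    Each target locus occupies two adjacent columns (one per allele), and 1
    marks the favorable allele. Assumption: alleles are drawn independently
    with probability equal to the locus frequency.
    """
    if rng is None:
        rng = random.Random()
    population = []
    for _ in range(n_individuals):
        individual = []
        for p in favorable_freqs:                     # one frequency per locus
            individual.append(int(rng.random() < p))  # first allele
            individual.append(int(rng.random() < p))  # second allele
        population.append(individual)
    return population

def allele_frequencies(population):
    """Observed favorable-allele frequency at each locus (column pairs)."""
    n = len(population)
    n_loci = len(population[0]) // 2
    return [sum(ind[2 * j] + ind[2 * j + 1] for ind in population) / (2 * n)
            for j in range(n_loci)]

# popA with N = 1000 and A1 = 0.25, A2 = 0.50 (two loci, so M = 4)
popA = make_base_population(1000, [0.25, 0.50], random.Random(1))
print(len(popA), len(popA[0]))   # 1000 4
print(allele_frequencies(popA))  # close to [0.25, 0.5]
```

A frequency of 0 yields no favorable alleles at that locus. This is how the "zero" frequency level used in the cross programs would be represented.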
Discrete recombination is used to mate two selected individuals (parents) and produce new offspring. A crossover mask indicates which parent supplies each bit (allele) to the offspring. The mask has the same length as the individual string, and each of its elements is randomly set to 0 or 1 with equal probability. A mask value of 1 indicates that the offspring's allele at that position is inherited from parent 1, and a value of 0 indicates that it is inherited from parent 2. Applying discrete recombination at every locus produces offspring with new genotype combinations. Offspring 1 is produced with mask 1 and offspring 2 with mask 2; in the accompanying illustration, alleles inherited from parent 1 are underlined.
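The operator can be sketched in pure Python as follows. The function name is hypothetical. We read "offspring 1 is produced with mask 1 and offspring 2 with mask 2" as each offspring receiving its own independently drawn mask; that reading is an assumption.

```python
import random

def discrete_recombination(parent1, parent2, rng):
    """Return (offspring, mask) from two equal-length 0/1 allele strings.

    The crossover mask has the same length as the individual; each bit is
    0 or 1 with equal probability. Mask bit 1 copies the allele at that
    position from parent1, and mask bit 0 copies it from parent2.
    """
    if len(parent1) != len(parent2):
        raise ValueError("parents must have the same length")
    mask = [rng.randint(0, 1) for _ in parent1]
    offspring = [a1 if bit == 1 else a2
                 for bit, a1, a2 in zip(mask, parent1, parent2)]
    return offspring, mask

rng = random.Random(42)
parent1 = [1, 1, 0, 0, 1, 0, 1, 1]   # hypothetical parents, four loci each
parent2 = [0, 1, 1, 1, 0, 0, 1, 0]
offspring1, mask1 = discrete_recombination(parent1, parent2, rng)  # mask 1
offspring2, mask2 = discrete_recombination(parent1, parent2, rng)  # mask 2
print(mask1, offspring1)
print(mask2, offspring2)
```

The mask bits are drawn per allele position, so both alleles at a locus can come from the same parent. This differs from Mendelian segregation, in which each parent contributes exactly one allele per locus.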
In our simulations, the ideal population is the population in which favorable alleles are fixed at all target loci. For example, with four loci the ideal genotype is 11-11-11-11, and the ideal population is coded as a matrix of 1s in which every individual carries the ideal genotype. In information theory, the Hamming distance, named after Richard Hamming, is the number of positions at which two strings of equal length differ. Hamming distance has been used to measure the number of nucleotide differences between two genetic sequences (Pilcher, 2008). Here we borrow this idea to measure the distance between two populations, which we call the population Hamming distance (PHD). The PHD is the total number of alleles at the target loci in the population at a given generation that differ from those of the ideal population. Consider, for example, pop(t) and pop(ideal), both with four target loci (two alleles at each locus) and a population size of 6. The matrix rows represent individuals and the columns represent the target loci (two columns per locus). The PHD between pop(t) and pop(ideal) is 19.
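Both the population Hamming distance and the superior genotype frequency can be computed directly from the binary matrix. Below is a short sketch using a small hypothetical pop(t); the function names are illustrative only.

```python
def population_hamming_distance(population, ideal=None):
    """Number of allele positions at which population differs from ideal.

    By default the ideal population is the all-ones matrix of the same shape
    (favorable alleles fixed at every target locus), so the PHD equals the
    number of unfavorable (0) alleles still present.
    """
    if ideal is None:
        ideal = [[1] * len(ind) for ind in population]
    return sum(a != b
               for row, ideal_row in zip(population, ideal)
               for a, b in zip(row, ideal_row))

def superior_genotype_frequency(population):
    """Fraction of individuals carrying the ideal genotype (all alleles 1)."""
    ideal_count = sum(all(allele == 1 for allele in ind) for ind in population)
    return ideal_count / len(population)

# Hypothetical pop(t): three individuals, two target loci (four allele columns)
pop_t = [[1, 1, 1, 1],
         [1, 0, 1, 1],
         [0, 0, 1, 0]]
print(population_hamming_distance(pop_t))   # 4
print(superior_genotype_frequency(pop_t))   # 0.3333333333333333
```

The ideal population is all 1s, so the PHD reaches 0 exactly when the favorable alleles are fixed. The superior genotype frequency reaches 1.0 at the same point.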
Genotypic selection and phenotypic selection strategy
In the genotypic selection strategy, genotype 11 is scored 2, genotype 10 is scored 1 and genotype 00 is scored 0. The genotypic selection score is the sum of the genotype scores over all loci. This score, which assumes additive genetic effects, is used as the selection criterion in subsequent generations.
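Below is a sketch of the genotypic score and of truncation selection, that is, keeping the highest-scoring individuals as in the "top 500" rule of the cross programs. Random tie-breaking is our assumption, since the text does not say how ties are handled. The function names are hypothetical.

```python
import random

def genotypic_score(individual):
    """Sum of locus scores: 11 -> 2, 10 or 01 -> 1, 00 -> 0 (additive effects)."""
    return sum(individual[j] + individual[j + 1]
               for j in range(0, len(individual), 2))

def select_top(population, n_selected, rng=None):
    """Truncation selection on genotypic score, with ties broken at random."""
    if rng is None:
        rng = random.Random()
    candidates = population[:]
    rng.shuffle(candidates)  # sorting is stable, so shuffling randomizes ties
    candidates.sort(key=genotypic_score, reverse=True)
    return candidates[:n_selected]

pop = [[1, 1, 0, 0],   # score 2
       [1, 0, 1, 1],   # score 3
       [0, 0, 0, 0],   # score 0
       [1, 1, 1, 1]]   # score 4 (ideal genotype for two loci)
print([genotypic_score(ind) for ind in pop])  # [2, 3, 0, 4]
print(select_top(pop, 2))                     # [[1, 1, 1, 1], [1, 0, 1, 1]]
```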
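In the phenotypic selection strategy, the phenotypic observation of each individual is modeled as
$$p_i = \mu_0 + \sum_{j=1}^{m} g_j x_{ij} + \varepsilon_i$$
where $p_i$ is the phenotypic observation of individual $i$, $\mu_0$ is the overall mean, $g_j$ is the gene effect at the $j$th locus ($j = 1, 2, \ldots, m$, where $m$ is the number of target genes), $x_{ij}$ is an indicator variable for the genotype of individual $i$ at locus $j$ with value 0, 1 or 2, and $\varepsilon_i$ is the residual error, distributed as $N(0, \sigma_\varepsilon^2)$. Genotypic values are defined in terms of the midpoint ($m$, distinct from the number of target genes above), additive ($a$) and dominance ($d$) genetic parameters. In model (3), genotypes 11, 10 and 00 are coded numerically as 5, 4 and 1, respectively:
$$G_{11} = m + a = 5, \qquad G_{10} = m + d = 4, \qquad G_{00} = m - a = 1 \tag{3}$$
so that $m = 3$, $a = 2$ and $d = 1$. For an analysis of genotypes in a single environment, heritability on an individual basis is estimated as
$$h^2 = \frac{\sigma_G^2}{\sigma_G^2 + \sigma_\varepsilon^2} \tag{4}$$
where $\sigma_G^2$ is the genotypic variance. Given the defined heritability, $\sigma_\varepsilon^2$ is obtained by calculating $\sigma_G^2$ and rearranging equation (4) to
$$\sigma_\varepsilon^2 = \frac{(1 - h^2)\,\sigma_G^2}{h^2} \tag{5}$$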
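The following is a minimal sketch of how phenotypes could be simulated from this model. It rests on two assumptions the text leaves open. First, the text defines $x_{ij}$ as a 0/1/2 indicator and separately codes genotypes as 5, 4 and 1; here the genotypic value at each locus is taken to be $g_j$ times the coded value (5, 4 or 1). Second, $\sigma_G^2$ is taken as the variance of genotypic values in the current population. The function names are hypothetical.

```python
import math
import random
import statistics

# Coded genotypic values from model (3), keyed by the number of favorable
# alleles at a locus: 11 -> 5, 10 -> 4, 00 -> 1 (m = 3, a = 2, d = 1).
CODED_VALUE = {2: 5.0, 1: 4.0, 0: 1.0}

def genotypic_values(population, gene_effects):
    """G_i = sum_j g_j * coded value at locus j (assumed combination)."""
    return [sum(g_j * CODED_VALUE[ind[2 * j] + ind[2 * j + 1]]
                for j, g_j in enumerate(gene_effects))
            for ind in population]

def simulate_phenotypes(population, gene_effects, h2, mu0=0.0, rng=None):
    """p_i = mu0 + G_i + e_i, with e_i ~ N(0, sigma_e^2) from equation (5)."""
    if not 0.0 < h2 <= 1.0:
        raise ValueError("heritability must lie in (0, 1]")
    if rng is None:
        rng = random.Random()
    g = genotypic_values(population, gene_effects)
    var_g = statistics.pvariance(g)      # sigma_G^2 in the current population
    var_e = (1.0 - h2) * var_g / h2      # equation (5)
    sd_e = math.sqrt(var_e)
    return [mu0 + g_i + rng.gauss(0.0, sd_e) for g_i in g]

rng = random.Random(7)
# Hypothetical population: 2,000 individuals, two loci, favorable frequency 0.5
pop = [[int(rng.random() < 0.5) for _ in range(4)] for _ in range(2000)]
effects = [1.0, 1.0]   # hypothetical equal gene effects g_1 = g_2 = 1
phen = simulate_phenotypes(pop, effects, h2=0.4, mu0=10.0, rng=rng)
g = genotypic_values(pop, effects)
print(statistics.pvariance(g) / statistics.pvariance(phen))  # roughly 0.4
```

With $h^2 = 1$, equation (5) gives a residual variance of zero, so each simulated phenotype is simply $\mu_0$ plus the genotypic value.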
Cross programs and gene pyramiding design breeding
In this study, we designed four types of cross programs for gene pyramiding breeding, denoted II, III, IIII.C and IIII.S. For each cross program, various schemes are designed for different levels of initial favorable allele frequency and trait heritability. The schemes are denoted cross program-X-h/G, where X is a scheme label (A, B, C, etc.), h denotes the trait heritability (0.2, 0.4 or 0.6) and G denotes genotypic selection. II represents pyramiding two target genes from popA and popB (Figure 1a). A1/A2 denote the favorable allele frequencies at the first/second loci in popA, B1/B2 denote those in popB, and N denotes the base population size. The base population sizes of popA and popB are set to 500, 1,000 or 2,000. The initial favorable allele frequencies A1/A2 and B1/B2 at the first/second loci are each set to 0, 0.25 or 0.50. PopAB is produced by crossing popA with popB. The top 500 individuals based on genotypic score are selected for the next generation, and each pair of parents is assumed to produce four offspring with a 1:1 sex ratio. The selected parents are then randomly intercrossed to produce subsequent generations until the two target genes are pyramided into an ideal genotype.
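A compact sketch of cross program II under genotypic selection is shown below. It combines the encoding, discrete recombination, truncation selection and PHD steps described above. It is not the authors' implementation, and several simplifications are assumptions: sexes are not modeled (parents are paired at random), each popA parent mates exactly one popB parent, and the run stops when the first ideal genotype appears or after a hypothetical cap of `max_generations`. All names are hypothetical.

```python
import random

def _base_population(n, freqs, rng):
    """n individuals; two 0/1 alleles per locus, 1 = favorable (independent draws)."""
    return [[int(rng.random() < p) for p in freqs for _ in range(2)]
            for _ in range(n)]

def _recombine(p1, p2, rng):
    """Discrete recombination with a fresh 0/1 mask (1 -> allele from p1)."""
    return [a if rng.random() < 0.5 else b for a, b in zip(p1, p2)]

def _select_top(pop, k, rng):
    """Keep the k individuals with the highest genotypic score (ties random)."""
    pop = pop[:]
    rng.shuffle(pop)
    pop.sort(key=sum, reverse=True)  # sum of 0/1 alleles = genotypic score
    return pop[:k]

def simulate_program_ii(freqs_a, freqs_b, n_base=500, n_selected=500,
                        offspring_per_pair=4, max_generations=50, seed=None):
    """Sketch of cross program II under genotypic selection.

    Generation 1 is popA x popB (popAB); later generations intercross the
    selected parents at random. Returns (generation in which the first ideal
    genotype appears, or None; PHD of the selected parents per generation).
    """
    rng = random.Random(seed)
    pop_a = _base_population(n_base, freqs_a, rng)
    pop_b = _base_population(n_base, freqs_b, rng)
    rng.shuffle(pop_b)
    pairs = list(zip(pop_a, pop_b))      # each popA parent mates one popB parent
    phd_history = []
    for generation in range(1, max_generations + 1):
        offspring = [_recombine(p1, p2, rng)
                     for p1, p2 in pairs for _ in range(offspring_per_pair)]
        parents = _select_top(offspring, n_selected, rng)
        phd_history.append(sum(len(ind) - sum(ind) for ind in parents))
        if any(all(ind) for ind in parents):
            return generation, phd_history
        rng.shuffle(parents)
        pairs = list(zip(parents[0::2], parents[1::2]))  # random intercross
    return None, phd_history

# Scheme with A1 = 0.5, A2 = 0 in popA and B1 = 0, B2 = 0.5 in popB
generation, phd = simulate_program_ii([0.5, 0.0], [0.0, 0.5], seed=1)
print(generation, phd)
```

Running the function with many different seeds and averaging `generation` gives a Monte Carlo estimate of the number of generations needed. The study itself used 1,000 replicates per scheme. If a locus has frequency 0 in every base population, no ideal genotype can appear, and the function returns `None`.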
III represents pyramiding three target genes from popA, popB and popC (Figure 1b), which we call a three-population cascading cross. A1/A2/A3, B1/B2/B3 and C1/C2/C3 denote the favorable allele frequencies at the first/second/third loci in popA, popB and popC, respectively, and each is set to 0, 0.25 or 0.50. The base population sizes of popA, popB and popC are set to 500, 1,000 or 2,000. PopA and popB are crossed to produce popAB, with each pair of parents assumed to have four offspring with a 1:1 sex ratio, and the top 500 individuals are selected on genotypic score for the next generation. The initial size of popC is set to 2×N, and the top 500 individuals of popAB are crossed with popC to produce popABC. The selected parents are then randomly intercrossed to produce subsequent generations until the three target genes are pyramided into an ideal individual.
IIII represents pyramiding four target genes from popA, popB, popC and popD. A1/A2/A3/A4, B1/B2/B3/B4, C1/C2/C3/C4 and D1/D2/D3/D4 denote the favorable allele frequencies at the first/second/third/fourth loci in popA, popB, popC and popD, respectively. The base population size (N) is set to 500 or 1,000. Other breeding parameters are the same as in schemes II and III. For the four-population cascading cross, denoted IIII.C (Figure 1c), the base population sizes of popA, popB, popC and popD are N, N, 2×N and 4×N, respectively. PopA and popB are crossed to produce popAB, the top 500 of popAB are crossed with popC to produce popABC, and the top 500 of popABC are then crossed with popD to produce popABCD. For the four-population symmetric cross, denoted IIII.S (Figure 1d), the base population sizes of popA, popB, popC and popD are all N. PopA and popB are crossed to produce popAB, popC and popD are crossed to produce popCD, and the top 500 of popAB are then crossed with the top 500 of popCD to produce popABCD in the next generation. Each pair of parents is assumed to produce four offspring with a 1:1 sex ratio. In popAB, popCD and popABCD, individuals are selected on genotypic scores or phenotypic values. The top 500 individuals are selected as parents, and the selected parents are randomly intercrossed in subsequent generations until the four target genes are pyramided into an ideal individual.
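The difference in the number of crossing generations between IIII.C and IIII.S follows from the shape of the two crossing trees. Below is a small sketch that assumes one generation per round of crosses and ignores the selection and intercross generations that follow. The function names are hypothetical.

```python
def cascading_cross(populations):
    """Cross populations one after another: ((A x B) x C) x D ...

    Each cross waits for the previous one, so k populations need k - 1
    crossing generations.
    """
    tree = populations[0]
    generations = 0
    for pop in populations[1:]:
        tree = f"({tree} x {pop})"
        generations += 1
    return tree, generations

def symmetric_cross(populations):
    """Cross populations pairwise in parallel rounds: (A x B) x (C x D).

    Crosses within a round happen in the same generation; an unpaired
    population is carried over to the next round.
    """
    level = list(populations)
    generations = 0
    while len(level) > 1:
        merged = [f"({level[i]} x {level[i + 1]})"
                  for i in range(0, len(level) - 1, 2)]
        if len(level) % 2 == 1:
            merged.append(level[-1])
        level = merged
        generations += 1
    return level[0], generations

pops = ["popA", "popB", "popC", "popD"]
print(cascading_cross(pops))  # ('(((popA x popB) x popC) x popD)', 3)
print(symmetric_cross(pops))  # ('((popA x popB) x (popC x popD))', 2)
```

For four populations, this counting gives three crossing generations for the cascading design and two for the symmetric design.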
In summary, we designed four types of cross programs in which the base population size and the initial favorable allele frequency are set at different levels, and trait heritability is also varied under phenotypic selection. The number of gene pyramiding generations, the population Hamming distance and the superior genotype frequency are used to measure the progress of gene pyramiding breeding. We performed Monte Carlo simulations for each cross scheme, with 1,000 replicates per scheme. Our programs were implemented in MATLAB and run on an Intel(R) Core(TM) 2 Duo CPU under Microsoft Windows XP.
DISCUSSION
Our studies provide new insight into pyramiding multiple genes into a single genotype from an evolutionary perspective. The objective of gene pyramiding breeding is to improve a trait across an entire population by selecting optimal genotype combinations. Evolutionary computation (David, 1989; John, 1992) is well suited to studying the combinatorial optimization of genotypes. For gene pyramiding breeding, we assumed that a complex trait is controlled by a series of major genes. We further assumed that gene pyramiding aims to select individuals with the optimal genotype combination, thereby optimizing a target economic trait. Inspired by evolutionary computation (David, 1989), we used the metaphor of hill-climbing to model the dynamic behavior of gene pyramiding and to connect gene pyramiding with evolutionary computation.
Servin et al. (2004) developed a theory of marker-assisted gene pyramiding based on probability and statistics. They calculated gene transmission probabilities through a pedigree and the minimum population sizes necessary to obtain an individual with the ideal genotype.
Zhao et al. (2009) extended these theories to design representative gene pyramiding schemes in animals, taking their reproductive capacity into account. However, their studies made the simplifying assumption that the founding parents were homozygous for the favorable allele at each target locus. This assumption suits laboratory animals rather than farm animals, because in practice animal breeding populations are segregating populations. Therefore, our base populations start with various levels of favorable allele frequency at each target locus. Allele frequencies are set to 0, 0.25 and 0.5 to represent zero, low and medium levels in the base population. This makes it possible to study gene pyramiding from an arbitrary population with variable allele frequencies and population sizes.
Servin et al. (2004) and Zhao et al. (2009) framed the design of gene pyramiding in terms of the minimum population sizes necessary to obtain the ideal genotype. Their strategies work backward from the ideal offspring genotype to the minimum sizes of the base populations. Our studies take the opposite perspective: we predict offspring genotypes by simulating the gene pyramiding breeding process from specified base populations. Our strategies can integrate various populations (differing in size and favorable allele frequency) and different selection strategies.
Compared with plants, gene pyramiding in animals is hampered by lower fertility and longer generation intervals. With the development of animal genome projects and new reproductive technologies (artificial insemination and superovulation), it is possible to produce enough offspring carrying superior genetic information in each generation to facilitate selection in subsequent generations. For demonstration purposes, our studies use discrete recombination to produce offspring with the various recombination types possible from the parental genotypes. Discrete recombination is a basic genetic operator in evolutionary computation, and we use it to keep our gene pyramiding studies consistent with evolutionary computation.
To investigate two-, three- and four-gene pyramiding, we designed four types of cross programs: II, III, IIII.S and IIII.C. These programs may represent general needs in farm animal breeding. Two target genes segregate in the population for program II, three for program III, and four for programs IIII.S and IIII.C.
Under genotypic selection, the simulation results for the four types of gene pyramiding programs indicate that the initial favorable allele frequency, rather than the population size, is the most important factor affecting the process of gene pyramiding. A larger population does, however, increase the chance of selecting top individuals as parents in the first generation. For two-gene and three-gene pyramiding, initial allele frequency and population size do not significantly influence scheme design. For three-gene and four-gene pyramiding, however, the cross order of the parental populations must be considered in scheme design. For four-gene pyramiding, our results show that three generations are needed to obtain popABCD with the cascading cross program (Figure 1c), whereas only two generations are needed with the symmetric cross program (Figure 1d). Because of its symmetric structure, the symmetric cross program does not require the cross order to be considered. In the cascading cross program, parental cross order was shown not to be a very important factor affecting gene pyramiding breeding.
In addition to genotypic selection, we investigated phenotypic selection, because many economic traits of animals are quantitative traits controlled by several major QTL. The two strategies differ in their selection criterion: genotypic selection is based on the genotypic score, and phenotypic selection is based on the phenotypic value predicted by a genotype-phenotype model. We use both strategies to account for the different characteristics of target genes and for trait heritability.
Some geneticists think that traditional mass selection also results in gene pyramiding. We use the phenotypic selection strategy to investigate target genes controlling a quantitative trait, and we compare the process of gene pyramiding under genotypic and phenotypic selection. Initial favorable allele frequencies greatly affect the process of gene pyramiding breeding under phenotypic selection, and trait heritability is another important factor. From Figures 2, 3, 4 and 5, we conclude that gene pyramiding using phenotypic selection needs fewer generations for traits with high heritability and more generations for traits with low heritability. To achieve gene pyramiding successfully, a breeder should select from a large base population with high favorable allele frequencies. In phenotypic selection, we also set trait heritability to 1, which is equivalent to genotypic selection, as follows from model (3) together with equation (5): with $h^2 = 1$ the residual variance is zero, so phenotypic values reduce to genotypic values. The results indicate that genotypic selection is superior to phenotypic selection for gene pyramiding. The design of a cross scheme should take into account the initial favorable allele frequency, the cross order and the trait heritability. Trait heritability is the main factor affecting the efficiency of gene pyramiding breeding for quantitative traits. When genotypic values are preset, trait heritability directly affects the average phenotypic value predicted by the model, and ultimately the process of gene pyramiding. For traits with larger heritability, the gene effects dominate the model, so gene pyramiding breeding becomes a process of selecting individuals with optimized genotype combinations over generations.
In this paper, both genotypic and phenotypic selection ignored gene-gene and gene-environment interactions. Current strategies for revealing the genetic basis of complex traits rely on genome-wide association studies (Wang et al., 2005; McCarthy et al., 2008; Moore et al., 2010). These studies will supply a large amount of genetic information and ultimately help us build precise selection models that capture the complex relationship between genotype and phenotype.
Gene pyramiding in animals is limited by generation interval and reproductive capacity, especially in animals with long generation intervals and low fertility, such as dairy and beef cattle. We expect the potential advantages of gene pyramiding to apply to any farm animal, but realizing them may be challenging in practice.
Our studies made the simplifying assumptions that the animal population is a segregating population and that several favorable target genes exist in different populations. If a multi-tier system (population) meets these assumptions, the process of gene pyramiding can be predicted under different strategies. Our studies did not take the positions of most genes into consideration, because the locations of those genes can be detected through PCR technology. Examples of successful gene pyramiding can be found in plant breeding. In practice, the positions of most genes may not be the key issue; how to choose the target genes or linked markers and how to perform selection are of greater importance.
Our studies provide a flexible simulation platform for exploring gene pyramiding breeding using genotypic and phenotypic selection, in which base population sizes and initial favorable allele frequencies can be set at various levels. The results, expressed as population Hamming distance, superior allele frequency and average phenotypic value, may provide theoretical guidance for breeding practice. Further studies could build and compare additional cross programs and selection strategies.
In marker-assisted gene pyramiding breeding, designing optimal genotype combinations through different cross schemes and selection strategies is of great practical significance, and animal breeders will want to design the optimal cross scheme and selection strategy. We hope that breeding by design will be realized through the collaboration of biologists, bioinformaticians and breeding scientists, aided by powerful computing and user-friendly software.