Whole Genome Association Study to Detect Single Nucleotide Polymorphisms for Behavior in Sapsaree Dog (Canis familiaris)

The purpose of this study was to characterize the genetic architecture of behavior patterns in Sapsaree dogs. The breed population (n = 8,256) has been constructed since 1990 over 12 generations and is managed at the Sapsaree Breeding Research Institute, Gyeongsan, Korea. Seven behavioral traits were investigated for 882 individuals. The traits were classified into quantitative or categorical groups, and heritabilities (h2) and variance components were estimated under the animal model using the ASReml 2.0 software program. In general, the h2 estimates of the traits ranged between 0.00 and 0.16. Strong genetic (rG) and phenotypic (rP) correlations were observed among nerve stability, affability and adaptability, i.e. 0.9 to 0.94 and 0.46 to 0.68, respectively. To detect significant single nucleotide polymorphisms (SNPs) for the behavioral traits, a total of 134 and 60 samples were genotyped using the Illumina 22K CanineSNP20 and 170K CanineHD bead chips, respectively. Two datasets comprising 60 (Sap60) and 183 (Sap183) samples were analyzed, of which the latter was based on the SNPs that were embedded on both the 22K and 170K chips. For the genome-wide association analysis, each SNP was tested against the residuals of each phenotype after adjustment for sex and year of birth as fixed effects. A least squares based single marker regression analysis was followed by a stepwise regression procedure for the significant SNPs (p<0.01), to determine a best set of SNPs for each trait. A total of 41 SNPs were detected with the Sap183 samples for the behavior traits. The significant SNPs need to be verified in other samples before they can be used to improve behavior traits via marker-assisted selection in the Sapsaree population.


INTRODUCTION
Among the domestic animals, the dog holds a significant position in human society because of its close behavioral relationship with humans. Moreover, the dog is one of the most diverse domestic species in terms of morphology and behavior (Wayne and Ostrander, 1999). Although each breed shows general uniformity for behavior and morphology, individuals within a breed have diverse heritages because of different haplotypes related to the traits (Vila et al., 1997), which is supported by analyses of protein alleles (Ferrell et al., 1978) and of hypervariable microsatellite loci (Fredholm and Wintero, 1995).
Currently, there are more than 400 registered dog breeds around the world that have been bred for various purposes, e.g. hunting, guarding, guiding the blind, companionship, etc. The Sapsaree is one of the aboriginal breeds in Korea, with a medium body size and a body height ranging from 49 to 55 cm (Kim et al., 2001). Adult coat hair is long and abundant, with two typical color variations, i.e. blue and yellow. Sapsaree dogs are very gentle, protective and loyal to their owner. Generally, the dogs are not aggressive, but they express aggression if other dogs enter their territory. The Sapsaree population was close to extinction during Japanese colonization. Afterwards, in 1986, eight individuals with color and body conformation similar to the Sapsaree breed were collected across the country by local Sapsaree lovers in Daegu, Korea. Subsequent systematic mating and reproduction generated the current population of about 3,000 individuals, including five hundred at the Sapsaree Breeding Research Institute in Gyeongsan, Gyeongbuk province.
Recent sequencing technologies allowed the sequencing of the entire canine genome (Eggen, 2012), which provides abundant genetic resources to search for genetic variants underlying diseases and phenotypes such as body size (Patterson et al., 1982; Ostrander et al., 1997; Galibert et al., 1998). Whole genome association (WGA) studies are now routinely practiced due to the availability of high density canine single nucleotide polymorphism (SNP) chips such as the Illumina SNP arrays (Karlsson and Lindblad-Toh, 2008; Spady and Ostrander, 2008). Further, most dog breeds are less than 200 years old and thus have high linkage disequilibrium (LD) and long haplotype blocks (Sutter et al., 2004), which enables WGA studies with lower density marker maps than those needed in humans (Sutter and Ostrander, 2004; Karlsson and Lindblad-Toh, 2008). According to Karlsson and Lindblad-Toh (2008), about 15,000 SNPs proved to be sufficient for WGA mapping. After the first report of a WGA study in the dog by Karlsson et al. (2007), numerous WGA studies in dogs have been carried out for disease traits. However, there have been few reports of WGA analysis on morphology and behavior.
The objectives of this study were to genetically characterize the Sapsaree breed and to detect SNPs or quantitative trait loci (QTL) for various behavior traits by WGA analysis in the Sapsaree population.

Pedigree and phenotypes
The pedigree of the Sapsaree population (n = 8,256) was constructed from 1989 to 2007. A set of 1,014 individuals was recorded for morphology and behavior traits. The behavior tests were first introduced in 1998 and were carried out once a year. Each animal was tested only once for nerve stability (NST), affability (AFB), wariness (WRN), adaptability (ADP), sharpness (SRP), activity, energy level or temperament (ACT), and reaction during blood drawn (RBD). Behavior definitions and measurements are described in Tables 1 and 2. All the individuals passed through a sufficient adjustment period before they underwent the behavior tests, and the behavior measures were recorded mainly by one experimenter. The dogs' behaviors were graded according to intensity, with the lowest to the greatest scores reflecting the least to the most desired expression, respectively.

Variance component estimation
The behavior traits that were recorded on hedonic scales were analyzed as either quantitative or categorical measurements. For some behavior traits, the records were transformed to a binary pattern, i.e. absence (0) or presence (1). To do that, two or more categories were merged (Table 2). Sex, season of birth (summer and winter), birth year groups (10 levels) and age at testing (<15, 20, 25, and >25 months) were considered as fixed factors for analysis. Then, the SAS general linear model (GLM) procedure (Release 9.1, SAS Institute Inc., NC, USA, 1999) was applied to determine the fixed effects to be fitted in the animal model at the α = 0.1 level.
Table 1. Definitions of the behavior traits*
Nerve stability (NST)¹: The appropriateness of the dog's reaction to a certain situation. This includes the dog's ability to adapt to various types of situations, to concentrate when highly aroused or in a situation of conflict, as well as its ability to relax and to overcome a frightening situation.
Affability (AFB)¹: The dog's willingness to make contact with people.
Wariness (WRN): The dog's tendency to be cautious about a probable upcoming threat.
Adaptability (ADP): The dog's ability to cope with changes in the physical or natural environment.
Activity/energy level/temperament (ACT): The degree of liveliness. Dogs with high activity or temperament are more responsive to all types of stimuli.¹ In other words, temperament is the physical flexibility and intensity of reactions to the environment with a big radius of action.²
Sharpness (SRP)²: The ability to react in an aggressive way towards a serious or serious looking attack. Sharpness is desired only when the dog is threatened. After the end of the threatening period, the dog has to calm down immediately and has to be friendly towards the participating people and the judge. This kind of sharpness is defined as desired sharpness.
Reaction during blood drawn (RBD): The dog's reactions while it is under treatment, such as reactions after seeing needles during collection of blood from the animal's body.
* The traits were originally scored from high to low but were inverted for all traits to have a more intuitive score from low to high.
¹ Adapted from Wilsson and Sundgren (1997).
² Adapted from Ruefenacht et al. (2002).

A restricted maximum likelihood approach was applied under the animal model using ASReml 2.0 software (Gilmour et al., 2001). A univariate analysis for each trait with quantitative measurements was conducted to estimate variance components and direct and maternal heritability (Equation 1):

$$Y_{ijklmn} = \mu + s_i + sb_j + yb_k + ta_l + a_m + m_n + e_{ijklmn} \quad (1)$$

where $Y_{ijklmn}$ = phenotype, $\mu$ = overall mean, $s_i$ = effect of sex, $sb_j$ = effect of season of birth, $yb_k$ = effect of year of birth, $ta_l$ = effect of age at testing, $a_m$ = animal effect with $N(0, A\sigma^2_d)$, where $A$ = the additive relationship matrix; $m_n$ = maternal genetic effect with $N(0, A\sigma^2_m)$; and $e_{ijklmn}$ = the random error term with $N(0, I\sigma^2_e)$, where $I$ = the identity matrix. $\sigma^2_d$, $\sigma^2_m$, and $\sigma^2_e$ are the direct genetic, maternal genetic and residual (error term) variances, respectively.
No interaction (covariance component) between maternal and direct effects was assumed. Direct and maternal heritabilities were defined as $h^2_d = \sigma^2_d / (\sigma^2_d + \sigma^2_m + \sigma^2_e)$ and $h^2_m = \sigma^2_m / (\sigma^2_d + \sigma^2_m + \sigma^2_e)$, respectively. An alternative model was applied without the maternal effect (Equation 2), in which the overall heritability was $h^2 = \sigma^2_d / (\sigma^2_d + \sigma^2_e)$. Under the model, phenotypic and genetic correlations were calculated using ASReml 2.0, according to Falconer and Mackay (1996). Multivariate analyses were performed to estimate genetic and environmental correlations between the behavior traits. The behavior traits with binary data formats were also modeled with a generalized linear mixed model (GLMM) using the '!logit' function and then fitted under an animal model (Equation 1). The heritabilities for categorical traits with a binary scale (0 or 1) were estimated by Equation 3 (Lynch and Walsh, 1998), which is defined in terms of the variance on the observed scale. However, a multivariate analysis with this binary data set was not available because appropriate logistic models could not be fitted when ASReml 2.0 was applied. Thus, correlation estimates were obtained only between the traits with quantitative measures.
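As a minimal sketch of how heritabilities follow from estimated variance components, the snippet below assumes the standard definitions with no direct-maternal covariance, i.e. each heritability is the corresponding genetic variance divided by the sum of the direct, maternal and residual variances. The function name and the variance values are hypothetical and are not estimates from this study.

```python
def heritabilities(var_direct, var_maternal, var_residual):
    """Return (direct h2, maternal h2), assuming zero direct-maternal covariance."""
    var_phenotypic = var_direct + var_maternal + var_residual
    if var_phenotypic <= 0:
        raise ValueError("total phenotypic variance must be positive")
    return var_direct / var_phenotypic, var_maternal / var_phenotypic


# Hypothetical variance components, chosen only for illustration.
h2_direct, h2_maternal = heritabilities(0.12, 0.03, 0.85)
print(round(h2_direct, 3), round(h2_maternal, 3))

# Dropping the maternal term (var_maternal = 0) gives the overall h2
# of a model fitted without the maternal effect.
h2_overall, _ = heritabilities(0.12, 0.0, 0.85)
print(round(h2_overall, 3))
```

In practice the variance components would come from the REML output rather than being typed in by hand.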

Genome-wide association test
Genotyping was performed using two types of canine SNP chips, i.e. the CanineSNP20 BeadChip and the CanineHD BeadChip. The former and the latter contained more than 22,000 and 170,000 evenly spaced and validated SNPs, respectively, derived from the CanFam2.0 assembly. Initially, a total of 134 individuals were genotyped with the CanineSNP20 BeadChip using Illumina's Infinium Assay. Another 60 individuals were genotyped with the CanineHD BeadChip using the Infinium HD Assay Protocol (Illumina Inc., San Diego, CA, USA). The genotyping analyses were performed at GeneSeek (Ltd.), from which the SNP data were obtained.
A thorough screening of the SNPs was performed on three different data sets, Sap134, Sap60, and Sap183, of which the first two were based on the CanineSNP20 and CanineHD BeadChip panels, respectively, and the last was derived from the SNPs that were embedded on both SNP chips. Genotyped animals were excluded if any particular genotype(s) was entirely missing. SNPs with a minor allele frequency below 0.05 or with more than 10% missing genotypes were excluded from the subsequent association tests. The genome-wide association tests were carried out mainly on the 38 autosomes with the Sap183 data set, because of its greater sample size compared to Sap60 and Sap134.
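The SNP screening thresholds described above (minor allele frequency of at least 0.05 and no more than 10% missing genotypes) can be sketched as follows. This is an illustrative filter only, not the pipeline used in the study; the genotype coding (0, 1 or 2 copies of one allele, with None for a missing call), the function name and the example SNPs are assumptions.

```python
def snp_passes_qc(genotypes, min_maf=0.05, max_missing=0.10):
    """Check one SNP against the missing-rate and minor-allele-frequency thresholds.

    genotypes: list with 0, 1 or 2 copies of one allele per animal, None if missing.
    """
    n_total = len(genotypes)
    called = [g for g in genotypes if g is not None]
    if n_total == 0 or not called:
        return False
    missing_rate = (n_total - len(called)) / n_total
    if missing_rate > max_missing:
        return False
    allele_freq = sum(called) / (2 * len(called))
    maf = min(allele_freq, 1 - allele_freq)
    return maf >= min_maf


# Hypothetical genotype calls for three SNPs in ten animals.
snps = {
    "snp_a": [0, 1, 2, 1, 0, 1, 2, 1, 1, 0],        # polymorphic, fully called
    "snp_b": [0, 0, 0, 0, 0, 0, 0, 0, 0, 0],        # monomorphic, MAF = 0
    "snp_c": [0, 1, None, None, 2, 1, 0, 1, 2, 1],  # 20% missing calls
}
kept = [name for name, calls in snps.items() if snp_passes_qc(calls)]
print(kept)
```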
For association studies with quantitative traits, sex and year of birth were fitted as fixed factors, and the residuals for each trait were obtained in the process. A general linear model procedure was implemented using PROC GLM in SAS statistical software (SAS version 9.1, SAS Inst., Inc., Cary, NC, USA) to calculate the error variances.
In a random mating population with no population structure, the association between a marker and a trait can be tested with single marker regression as:

$$Y = 1_n + \beta_a X_a + \beta_d X_d + e$$

where $Y$ is a vector of phenotypic residuals for any given trait, $1_n$ is a vector of 1s, $X_a$ and $X_d$ are coefficient matrices allocating records to the additive and dominance effects of the SNP, respectively, $\beta_a$ and $\beta_d$ are the respective coefficients of the marker effect, and $e$ is a vector of random deviates, $N(0, \sigma^2_e)$, where $\sigma^2_e$ is the error variance. For $X_a$ ($X_d$), 1, 0, and -1 (0, 1, 0) were assigned for the SNP genotypes AA, AB, and BB, respectively. The null hypothesis is that the marker has no effect on the trait, while the alternative hypothesis is that the marker does affect the trait, due to LD of the SNP with a QTL.
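A small sketch of the single marker test may help make the coding concrete. It assumes that all three genotype classes are observed; in that case the least squares fit with an intercept, the additive code (1, 0, -1) and the dominance code (0, 1, 0) reproduces the three genotype means, so the additive and dominance estimates and the F statistic can be computed directly from those means. The function name and the residual values are hypothetical.

```python
def single_marker_test(residuals, genotypes):
    """Estimate additive (a) and dominance (d) effects and the model F statistic.

    genotypes: one of "AA", "AB", "BB" per animal, aligned with residuals.
    Returns (a, d, F, df_model, df_error).
    """
    groups = {"AA": [], "AB": [], "BB": []}
    for y, g in zip(residuals, genotypes):
        groups[g].append(y)
    n = len(residuals)
    if any(len(v) == 0 for v in groups.values()) or n <= 3:
        raise ValueError("this sketch needs all three genotype classes and n > 3")

    means = {g: sum(v) / len(v) for g, v in groups.items()}
    grand_mean = sum(residuals) / n
    a = (means["AA"] - means["BB"]) / 2                # additive effect
    d = means["AB"] - (means["AA"] + means["BB"]) / 2  # dominance deviation

    ss_model = sum(len(v) * (means[g] - grand_mean) ** 2 for g, v in groups.items())
    ss_error = sum((y - means[g]) ** 2 for g, v in groups.items() for y in v)
    if ss_error == 0:
        raise ValueError("zero error variance; F statistic is undefined")
    df_model, df_error = 2, n - 3
    f_stat = (ss_model / df_model) / (ss_error / df_error)
    return a, d, f_stat, df_model, df_error


# Hypothetical residuals for six animals, two per genotype class.
y = [1.0, 1.2, 0.4, 0.6, -1.0, -1.2]
g = ["AA", "AA", "AB", "AB", "BB", "BB"]
a_hat, d_hat, f_stat, df1, df2 = single_marker_test(y, g)
print(a_hat, d_hat, f_stat, df1, df2)
```

The point-wise p value would then be read from the F distribution with (df1, df2) degrees of freedom, for example with a statistics library.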
The variation explained by each SNP ($S^2_{SNP}$) was $S^2_{SNP} = \sum_i f_i (\alpha_i - \mu)^2$, where $i$ denotes each genotype, $\alpha_i$ is the genotypic value (= $-a$, $d$, and $+a$ for BB, AB, and AA, respectively, in which $a$ and $d$ were estimated from the simple regression analysis for the SNP), $f_i$ is the frequency of the $i$th genotype, and $\mu$ is the population mean, which can be expressed as $(f_{AA} - f_{BB})a + f_{AB}d$ (Falconer and Mackay, 1996). The proportion of phenotypic variance due to the SNP was then estimated as $S^2_{SNP}/S^2_P$, in which $S^2_P$ was obtained from the residual values of the trait after adjusting for fixed effects. Therefore, the estimate of the proportion of phenotypic variance due to all of the significant SNPs was $\sum_i S^2_{SNP_i}/S^2_P$. To determine significant SNPs for the behavior traits, a significance threshold of a 1% point-wise p value from the F distribution was applied for each trait. Then, a best set of significant SNPs was selected from these SNPs using a stepwise regression procedure (Neter et al., 1990), because significant SNPs that are closely linked to each other (in LD) would yield redundant information in a marker-assisted selection program. Inclusion and exclusion of each SNP in the stepwise model were determined at the 0.001 level.
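The variance attributed to a single SNP can be illustrated numerically. The sketch below assumes that this variance is the frequency-weighted squared deviation of the genotypic values (+a, d, -a for AA, AB, BB) from the population mean (f_AA - f_BB)a + f_AB d, following the definitions in the text; the function name and all input values are hypothetical.

```python
def snp_variance(a, d, f_aa, f_ab, f_bb):
    """Variance explained by one SNP from its effects and genotype frequencies."""
    freqs = {"AA": f_aa, "AB": f_ab, "BB": f_bb}
    if abs(sum(freqs.values()) - 1.0) > 1e-8:
        raise ValueError("genotype frequencies must sum to 1")
    values = {"AA": a, "AB": d, "BB": -a}
    mu = (f_aa - f_bb) * a + f_ab * d  # population mean
    return sum(freqs[g] * (values[g] - mu) ** 2 for g in values)


# Hypothetical SNP: purely additive, genotype frequencies 0.25 / 0.50 / 0.25.
s2_snp = snp_variance(a=1.0, d=0.0, f_aa=0.25, f_ab=0.5, f_bb=0.25)
s2_p = 4.0  # hypothetical phenotypic (residual) variance of the trait
proportion = s2_snp / s2_p
print(s2_snp, proportion)
```

For this purely additive case the result agrees with the familiar 2pq a² expression at allele frequency 0.5, which is a quick sanity check on the formula.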

Heritability and correlations between behavior traits
Heritability estimates (h2) for the behavior traits are presented in Tables 3 and 4. The h2 estimates fell within a low range, in general between 0.00 and 0.16. Famula (2001) reported that heritabilities of dog behaviors were low to high (0.10 to 0.60), depending on the characters and breeds. Wilsson and Sundgren (1997), van der Waaij et al. (2008), and Ruefenacht et al. (2002) reported low to medium h2 for NST (0.15 to 0.25), AFB (0.03 to 0.38), and SRP (0.09 to 0.19) of dogs. For the ACT trait studied by Wilsson and Sundgren (1997), and for related behavior traits such as temperament (van der Waaij et al., 2008; Ruefenacht et al., 2002) and energy (Bartlett, 1976), the h2 estimates ranged from 0.05 to 0.53. Reuterwall and Ryman (1973) reported h2 estimates of 0.09 to 0.17 for AFB and 0.00 to 0.04 for ADP. Goddard and Beilharz (1982) estimated an h2 of 0.10 for suspicion in Labrador retrievers. Our estimates were, in general, concordant with the above reports. Ruefenacht et al. (2002) reported that h2 estimates under quantitative measures for behaviors were lower than or similar to those of categorical estimates, which was consistent with this study (Tables 3 and 4). Wilsson and Sundgren (1997) reported small or negligible maternal effects for behaviors in old dogs. Our results showed that, in general, the maternal effects for behavior with quantitative measures were small, except for RBD (Table 3), but were moderate for AFB, ADP, and RBD with binary values (Table 4).
Genetic correlations (rG) were, in general, greater than phenotypic correlations (rP) (Table 5). Moderate positive rP values were observed between NST and AFB (0.68), between NST and ADP (0.6) and between AFB and ADP (0.62). There were strong genetic correlations between all the behavior traits, except for RBD (Table 5). More specifically, NST, AFB, WRN, and ADP had strong genetic correlations, while RBD had negative genetic correlations with other behavior traits, e.g. with ACT (-0.52). Ruefenacht et al. (2002) reported positive rP (0.31 to 0.57) and rG (0.34 to 0.83) values among NST, SRP, and temperament (or ACT in this study) in German Shepherd dogs. Van der Waaij et al. (2008) also reported ranges from 0.12 to 0.36 (rP) and from -0.51 to 0.64 (rG) among the traits comparable to those in this study (i.e., NST, AFB, SRP, and ACT), with the greatest rP (0.36) between NST and temperament and the greatest rG (0.64) between NST and AFB in German Shepherd dogs, which was similar to the results in this study (Table 5).

Significant single nucleotide polymorphisms for behavior by whole genome association studies
For the Sap183 data, a set of 15,825 SNPs was chosen after the quality control tests. The number of available SNPs on each chromosome was proportionate to its chromosomal length, e.g. 865 SNPs on CFA1 and 199 SNPs on CFA38 (results not shown). The physical map of SNPs spanned about 2,181 Mb, with an average distance of 137.8±147.4 kb between adjacent SNPs.
A total of 41 significant SNPs for the seven behavior traits were determined from the stepwise regression analyses (Table 6). The set of SNPs for each trait explained a large proportion of total phenotypic variance (29% to 67%), partly because the SNP effects were overestimated with the small sample size in this study (n = 183). In general, the SNPs for the behavior measures had both additive and dominance effects. However, the SNPs for WRN and SRP had mainly additive and mainly dominance effects, respectively (Table 6). The SNPs were distributed across the canine genome, on 24 chromosomes (CFA). The greatest number (6) of SNPs was detected on CFA18, among which four SNPs for NST, AFB, ADP, and ACT were located at 23 or 26 Mb (Table 6). The significant SNPs in this study need to be verified in other samples before they can be used to improve behavior traits via marker-assisted selection in the Sapsaree population.