### INTRODUCTION

### MATERIALS AND METHODS

### Animals and phenotypes

### Genotyping and SNP quality

### Statistical analysis

*var(g-)*will be re-estimated each time when a chromosome is excluded from calculating the genomic relationship matrix (GRM) and e is the residual. For the ease of computation, the genetic variance,

*var(g-)*, is estimated based on the null model i.e. y = a + g- + e and then fixed while testing for the association between each SNP and the trait. We thereby used a designed option called ‘mlma-loco’ for the association analysis as implemented in the genome-wide complex trait analysis (GCTA) tool [14].

_{10}(

*P*) genome-wide association plots (Manhattan plots) and quantile-quantile (QQ) plots. Manhattan plot represented the p values on the y-axis against genomic order by chromosome and position on the x-axis. Because of local correlation of the genetic variants, arising from infrequent genetic recombination, groups of significant p values tend to rise up high on the Manhattan plot. The QQ plot was used to assess the number and magnitude of observed associations between genotyped SNPs and trait under study, compared to the association statistics expected under the null hypothesis of no association [15]. Observed association statistic (−log

_{10}) p-values were ranked in order from smallest to largest on the y-axis and plotted against the expected values from a theoretical

*x*

^{2}-distribution under the null hypothesis of no association on the x-axis of the QQ plot. Deviations from the identity line suggested either that the assumed distribution was incorrect or that the sample contained values arising in some other manner, such as by true association. Since the underlying assumption in GWA studies is that the clear majority of assayed SNPs are not associated with the trait, strong deviations from the null suggest either a very highly associated and excessively genotyped locus (i.e. an associated gene with many genotyped SNPs) or spurious associations. On the other hand, very little deviation from the expected values could be attributed to cryptic relatedness in the study population, in the genotyping, or either true association.

**y**is an n×1 vector of phenotypes, n is the sample size,

**β**is a vector of fixed covariate herd-year season (HYS) and eigenvectors from principal component analyses (PCAs). The HYS had 4 levels of season (winter, summer, spring, and fall) since 2007 to 2014. Individual GRMs were used to calculate the first 20 eigenvalues and eigenvectors and the eigenvector of the covariance matrix was subsequently used to calculate the PCA (PC1 and PC2), where U is a vector of SNP effects with

**u**~N (0,

**I**σ

^{2}

_{u}),

**I**is an nxn identity matrix, and ɛ is a vector of residual effects with ɛ ~N (0,

**I**σ

^{2}

_{ɛ}).

**W**is a standardized genotype matrix with the ij

^{th}element

_{ij}(coded as 0, 1, or 2) is the number of copies of the reference allele for the i

^{th}SNP of the j

^{th}individual, and p

_{i}is the frequency of the reference allele of SNP i. If we define

**A**=

**WW**

**′**/N, and define σ

^{2}

_{g}as the variance explained by all the SNPs, i.e., σ

^{2}

_{g}= Nσ

^{2}

_{u}, with N being the number of SNPs then the equation I will be equivalent to

^{2}

_{g}), and A is interpreted as the GRM between individuals with its kjth element being

^{2}

_{g}by the restricted maximum likelihood (REML) approach, relying on the GRM estimated from all SNPs, and the model in general is represented by the following equation:

_{i}is a vector of random genetic effects, which could be the total genetic effect for the whole genome or for a single chromosome. In this model, the phenotypic variance (σ

^{2}

_{p}) is partitioned into the variance explained by each of the genetic factors and the residual variance,

### Genome partitioning

^{2}) was estimated by generating one GRM for each chromosome using only SNPs on that chromosome and fitting this GRM alongside a second GRM based on all genome-wide SNPs apart from those on that chromosome. A likelihood- ratio test (LRT) was then carried out between the full model and the model excluding the chromosome-specific component. The threshold of an LRT would depend both on the log-likelihood values as well as parameter numbers of the two models. In case of positive LRT (>10) values the generated probability is often large and conclusive.

### RESULTS AND DISCUSSION

### Genome-wide association

*VRTN*(vertebrae development associated). This gene had been suggested to be related to the total number of vertebrae, a trait correlated with the total number of teats [20]. Additionally, rs81476910 and rs81405825 were found within the reproduction QTL (140,282,531 to 148,491,826 bp) on SSC8 for pig ovary weight. On the other hand, rs81332615 were seen nearby the QTLs for total number born alive (13,012,892 to 13,038,269 bp) and age at puberty (9,843,648 to 45,496,370 bp) on SSC13 for pig reproduction traits (http://www.animalgenome.org/cgi-bin/QTLdb/SS/download?file=gbpSS_10.2). Therefore, significant SNPs along with the genomic regions derived from our genome-wide association analysis should undergo for further studies to explore the genomic underpinnings of total number of teats in Yorkshire pigs.