Genomic selection refers to the use of genome-wide dense markers for breeding value estimation and subsequently for selection. The main challenge of genomic breeding value estimation is the estimation of many effects from a limited number of observations. Bayesian methods have been proposed to cope with this challenge. As an alternative class of models, non- and semiparametric models were recently introduced. The present study investigated the ability of nonparametric additive regression models to predict genomic breeding values. The genotypes were modelled for each marker or pair of flanking markers (i.e. the predictors) separately. The nonparametric functions for the predictors were estimated simultaneously using additive model theory, applying a binomial kernel. The optimal degree of smoothing was determined by bootstrapping. A mutation-drift-balance simulation was carried out. The breeding values of the last generation (genotyped) were predicted using data from the next-to-last generation (genotyped and phenotyped). The results show moderate to high accuracies of the predicted breeding values. Determining a predictor-specific degree of smoothing increased the accuracy.
Using genome-wide SNP data, we calculated genomic inbreeding coefficients (F_ROH>1Mb, F_ROH>2Mb, F_ROH>8Mb and F_ROH>16Mb) derived from runs of homozygosity (ROH) of different lengths (>1, >2, >8 and >16 Mb) as well as from levels of homozygosity (F_HOM). We compared these values of inbreeding coefficients with those calculated from pedigrees (F_PED) of 1422 bulls comprising Brown Swiss (304), Fleckvieh (502), Norwegian Red (499) and Tyrol Grey (117) cattle breeds. For all four breeds, population inbreeding levels estimated by the genomic inbreeding coefficients F_ROH>8Mb and F_ROH>16Mb were similar to the levels estimated from pedigrees. The lowest values were obtained for Fleckvieh (F_PED = 0.014, F_ROH>8Mb = 0.019 and F_ROH>16Mb = 0.008); the highest, for Brown Swiss (F_PED = 0.048, F_ROH>8Mb = 0.074 and F_ROH>16Mb = 0.037). In contrast, inbreeding estimates based on the genomic coefficients F_ROH>1Mb and F_ROH>2Mb were considerably higher than pedigree-derived estimates. Standard deviations of genomic inbreeding coefficients were, on average, 1.3–1.7-fold higher than those obtained from pedigrees. Pearson correlations between genomic and pedigree inbreeding coefficients ranged from 0.50 to 0.62 in Norwegian Red (lowest correlations) and from 0.64 to 0.72 in Tyrol Grey (highest correlations). We conclude that the proportion of the genome present in ROH provides a good indication of inbreeding levels and that analysis based on ROH length can indicate the relative amounts of autozygosity due to recent and remote ancestors.
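As an illustration of how the ROH-based coefficients depend on the length threshold, the following minimal sketch assumes that the coefficient is the summed length of ROH segments longer than the threshold divided by the length of the genome covered by SNPs. The function name, the segment lengths and the genome length are hypothetical and are not taken from the study.

```python
def f_roh(roh_lengths_mb, genome_length_mb, min_length_mb):
    """Return the fraction of the genome covered by ROH longer than min_length_mb."""
    if genome_length_mb <= 0:
        raise ValueError("genome_length_mb must be positive")
    # Only segments strictly longer than the threshold contribute.
    covered = sum(length for length in roh_lengths_mb if length > min_length_mb)
    return covered / genome_length_mb


# Hypothetical ROH segment lengths (Mb) for one animal; not data from the study.
roh_lengths_mb = [0.5, 1.5, 3.0, 9.0, 20.0]
# Assumed length of the genome covered by SNPs (Mb); illustrative only.
genome_length_mb = 2500.0

for threshold in (1, 2, 8, 16):
    value = f_roh(roh_lengths_mb, genome_length_mb, threshold)
    print(f"ROH > {threshold} Mb: F = {value:.4f}")
```

Because a higher threshold can only exclude segments, the coefficient never increases with the threshold, which is in line with the higher estimates reported for the >1 Mb and >2 Mb classes.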
Table S6. Biological networks associated with top diseases and functions that were overrepresented among microRNAs differentially expressed between bovine monocyte-derived macrophages infected with Streptococcus agalactiae strains ST103 or ST12, and the respective uninfected controls. (XLSX 10 kb)
Table S4. The lists of microRNAs identified in blood monocyte-derived macrophages infected in vitro with live Streptococcus agalactiae strains ST103 and ST12, LPS, and uninfected (controls), respectively. Mean read counts normalized across all samples were calculated using DESeq2. (XLSX 71 kb)
In dairy cattle, current genomic predictions are largely based on sire models that analyze daughter yield deviations of bulls, which are derived from pedigree-based animal model evaluations (in a two-step approach). Extension to animal model genomic predictions (AMGP) is not straightforward, because most of the animals that are involved in the genetic evaluation are not genotyped. In single-step genomic best linear unbiased prediction (SSGBLUP), the pedigree-based relationship matrix A and the genomic relationship matrix G are combined in a matrix H, which allows for AMGP. However, as the number of genotyped animals increases, imputation of the genotypes for all animals in the pedigree may be considered. Our aim was to impute genotypes for all animals in the pedigree, construct alternative relationship matrices based on the imputation results, and evaluate the accuracy of the resulting AMGP by cross-validation in the national Norwegian Red dairy cattle population. A large-scale national dataset was effectively handled by splitting it into two sets: (1) genotyped animals and their ancestors (i.e. GA set with 20,918 animals) and (2) the descendants of the genotyped animals (i.e. D set with 4,022,179 animals). This allowed restricting genomic computations to a relatively small set of animals (GA set), whereas the majority of the animals (D set) were added to the animal model equations using Henderson’s rules, in order to make optimal use of the D set information. Genotypes were imputed by segregation analysis of a large pedigree with relatively few genotyped animals (3285 out of 20,918). Among the AMGP models, the linkage and linkage disequilibrium based G matrix (G_LDLA0) yielded the highest accuracy, which on average was 0.06 higher than with SSGBLUP and 0.07 higher than with two-step sire genomic evaluations.
AMGP methods based on genotype imputation on a national scale were developed, and the most accurate method, GLDLA0BLUP, combined linkage and linkage disequilibrium information. The advantage of AMGP over a sire model based on two-step genomic predictions is expected to increase as the number of genotyped cows increases and for species with smaller sire families and more dam relationships.
The success of conservation of genetic variability and/or prediction of breeding values by genomic selection is based on the existence of LD between SNP markers and the QTVs. The existing LD is the result of several driving forces acting in each population throughout its history. Commercial SNP panels have been designed based on the genomic information of a reduced number of breeds. Within the Gen2Farm project framework, commonalities and singularities of LD of 12 breeds from 8 countries were used to improve the design of existing SNP panels to fulfill conservation and/or breeding needs in a breed and/or multibreed context. We analyzed the genomic information provided by Illumina's BovineHD BeadChip for a total of 1534 individuals from: Asturiana de los Valles (AST, N = 75), Avileña-Negra Ibérica (ANI, N = 72), Brown Swiss (BS, N = 418), Bruna del Pirineus (BP, N = 75), Fleckvieh (Fl, N = 317), Guernsey (GUE, N = 28), Morucha (Mo, N = 75), Norwegian Red (NR, N = 100), Pirenaica (Pi, N = 72), Retinta (Re, N = 72), Rubia Gallega (RG, N = 72) and Simmental (Si, N = 158). After editing, 604,551 phased SNP markers per animal were available for the analysis. LD matrices were obtained for each breed-chromosome combination (348 in total). TagSNPs, defined in terms of independence and representativeness, were obtained by a graphical clustering algorithm. After setting aside singletons in each chromosome-breed combination, a minimum set of TagSNPs along the genome was obtained by maximizing the distance between TagSNPs and minimizing the distance between TagSNPs (centers of clusters) and markers of the same LD block. This was done subject to threshold values obtained from the empirical distribution of the LD values. Commonalities and connectivity, the latter defined as the ratio of the number of tight links present to the maximum number possible, were calculated. Connectivity varied between 0.00018 for chromosome 1 in AST and 0.0013 for chromosome 21 in Re.
All breeds shared a total of 17,720 TagSNPs, with values ranging from 421 TagSNPs in chromosome 25 to 1130 in chromosome 19. Moreover, there was also a high number of private TagSNPs present only in one breed, ranging from 1225 in chromosome 21 to 5827 in chromosome 1. Finally, singletons were incorporated into the set of identified TagSNPs. Singletons represented more than 50% of TagSNPs in most cases. However, as the LD between singletons and QTVs is unknown, the maintenance of singletons in the SNP array may be considered as a choice to prevent losing information.
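The connectivity measure described above (tight links present divided by the maximum number possible) can be sketched as follows. This is only an illustration under stated assumptions: a "tight link" is taken to be a marker pair whose LD value reaches a chosen threshold, and a singleton is taken to be a marker with no tight link to any other marker. The threshold, the function names and the small LD matrix are hypothetical; the study derived its thresholds from the empirical distribution of LD values.

```python
def connectivity(ld, threshold):
    """Ratio of marker pairs with LD >= threshold to all possible marker pairs."""
    n = len(ld)
    if n < 2:
        return 0.0
    # Count each unordered pair once (upper triangle of the symmetric matrix).
    tight = sum(1 for i in range(n) for j in range(i + 1, n) if ld[i][j] >= threshold)
    return tight / (n * (n - 1) / 2)


def singletons(ld, threshold):
    """Indices of markers that have no tight link to any other marker."""
    n = len(ld)
    return [i for i in range(n)
            if all(ld[i][j] < threshold for j in range(n) if j != i)]


# Hypothetical symmetric LD matrix for five markers; not data from the study.
ld = [
    [1.00, 0.90, 0.10, 0.05, 0.02],
    [0.90, 1.00, 0.20, 0.00, 0.01],
    [0.10, 0.20, 1.00, 0.85, 0.03],
    [0.05, 0.00, 0.85, 1.00, 0.04],
    [0.02, 0.01, 0.03, 0.04, 1.00],
]
threshold = 0.8  # assumed cut-off for a "tight" link

print(connectivity(ld, threshold))  # 2 tight links out of 10 possible pairs
print(singletons(ld, threshold))    # marker 4 has no tight link
```

With hundreds of thousands of markers per genome, most pairs are far apart and weakly linked, so very small connectivity values such as those reported are expected under this definition.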
Genomic selection uses genome-wide dense SNP marker genotyping for the prediction of genetic values, and consists of two steps: (1) estimation of SNP effects, and (2) prediction of genetic value based on SNP genotypes and estimates of their effects. For the former step, BayesB-type estimators have been proposed, which assume a priori that many markers have no effect, and that some have an effect coming from a gamma or exponential distribution, i.e. a fat-tailed distribution. Whilst such estimators have been developed using Monte Carlo Markov chain (MCMC), here we derive a much faster non-MCMC based estimator by analytically performing the required integrations. The accuracy of the genome-wide breeding value estimates was 0.011 (s.e. 0.005) lower than that of the MCMC based BayesB predictor, which may be because the integrations were performed one-by-one instead of for all SNPs simultaneously. The bias of the new method was opposite to that of the MCMC based BayesB, in that the new method underestimated the breeding values of the best selection candidates, whereas MCMC-BayesB overestimated their breeding values. The new method was computationally several orders of magnitude faster than MCMC based BayesB, which will mainly be advantageous in computer simulations of entire breeding schemes, in cross-validation testing, and in practical schemes with frequent re-estimation of breeding values.
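Step (2) of the two-step scheme, prediction from SNP genotypes and estimated effects, amounts to a weighted sum per animal. The sketch below shows only that step and takes the effect estimates as given; it does not implement BayesB or the non-MCMC estimator. The 0/1/2 allele-count coding, the function name and all numbers are assumptions made for illustration.

```python
def predict_genetic_values(genotypes, snp_effects):
    """Sum genotype code times estimated SNP effect over all SNPs, per animal."""
    predictions = []
    for row in genotypes:
        if len(row) != len(snp_effects):
            raise ValueError("each genotype row needs one entry per SNP effect")
        predictions.append(sum(x * b for x, b in zip(row, snp_effects)))
    return predictions


# Hypothetical effect estimates from step (1); most SNPs have zero effect,
# mirroring the prior assumption that many markers have no effect.
snp_effects = [0.0, 0.5, 0.0, -0.2]

# Hypothetical genotypes coded as allele counts (0, 1 or 2), one row per animal.
genotypes = [
    [0, 2, 1, 1],
    [2, 0, 0, 2],
    [1, 1, 2, 0],
]

for animal, value in enumerate(predict_genetic_values(genotypes, snp_effects)):
    print(f"animal {animal}: predicted genetic value {value:.2f}")
```

Because this step is cheap, the computational cost of genomic prediction lies almost entirely in step (1), which is where a faster estimator pays off.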
With the availability of high-density marker maps and cost-effective genotyping, genomic selection methods may provide faster genetic gain than can be achieved by current selection methods based on phenotypes and the pedigree. Here we investigate some of the factors driving the accuracy of genomic selection, namely marker density and marker type (i.e., microsatellite and SNP markers), and the use of marker haplotypes versus marker genotypes alone. Different densities were tested with marker densities equivalent to 2, 1, 0.5, and 0.25N(e) markers/morgan using microsatellites and 8, 4, 2, and 1N(e) markers/morgan using SNP, where 1N(e) markers/morgan means 100 markers per morgan, if effective size (N(e)) is 100. Marker characteristics and linkage disequilibria were obtained by simulating a population over 1,000 generations to achieve a mutation drift balance. The marker designs were evaluated for their accuracy of predicting breeding values from either estimating marker effects or estimating effects of haplotypes based upon combining 2 markers. Using microsatellites as direct marker effects, the accuracy of selection increased from 0.63 to 0.83 as the density increased from 0.25N(e)/morgan to 2N(e)/morgan. Using SNP markers as direct marker effects, the accuracy of selection increased from 0.69 to 0.86 as the density increased from 1N(e)/morgan to 8N(e)/morgan. The SNP markers required a 2 to 3 times greater density compared with using microsatellites to achieve a similar accuracy. The biases that genomic selection EBV often show are due to the prediction of marker effects instead of QTL effects, and hence, genomic selection EBV may need rescaling for practical use. Using haplotypes resulted in similar or reduced accuracies compared with using direct marker effects. In practical situations, this means that it is advantageous to use direct marker effects, because this avoids the estimation of marker phases with the associated errors. 
In general, the results showed that the accuracy remained responsive with small bias to increasing marker density at least up to 8N(e) SNP/morgan, where the effective population size was 100 and with the genomic model assumed. For a 30-morgan genome and N(e) = 100, this implies that approximately 24,000 SNP are needed.
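The marker count quoted above follows directly from the density definition: a density of k N(e) markers/morgan means k times N(e) markers per morgan, multiplied by the genome length in morgans. A short worked check, with a hypothetical function name:

```python
def markers_needed(density_in_ne, effective_size, genome_length_morgans):
    """Markers required for a density of density_in_ne * N(e) markers per morgan."""
    return density_in_ne * effective_size * genome_length_morgans


# 1 N(e) markers/morgan with N(e) = 100 corresponds to 100 markers per morgan.
print(markers_needed(1, 100, 1))

# 8 N(e) SNP/morgan, N(e) = 100, 30-morgan genome, as stated in the text.
print(markers_needed(8, 100, 30))
```

The second call gives 8 × 100 × 30 = 24,000 SNP, matching the figure in the text.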
The objective of this study was to compare partial least squares regression (PLSR), multivariate regression analysis using least absolute shrinkage and selection operator (LASSO), two Bayesian approaches (BayesA, BayesB) and an ordinary BLUP method (GS-BLUP) for the estimation of genome-wide breeding values for dual-purpose Simmental Fleckvieh in Austria. A forward prediction and cross-validation were carried out for fat percentage, protein yield, somatic cell count, and non-return rate after 56 days in cows. Using cross-validation, accuracies of genome-wide breeding values were in the range of 0.36 to 0.76. In forward prediction, obtained accuracies were between 0.20 and 0.61.