Exploring the performance of genomic prediction models for soybean yield using different validation approaches

2019 
Genomic selection is a valuable breeding tool with great potential for implementation in real breeding programs, provided that prediction model performance is carefully evaluated for each specific scenario. The performance of genomic prediction models is commonly evaluated by standard cross-validation, which can overestimate model performance because it reuses the same genetic material, and their phenotypic records, that were included in model development. In addition to cross-validation, this study explored the efficiency of yield prediction models for soybean (Glycine max (L.) Merr.) by using historical data for external model validation. Historical data are a valuable resource for evaluating model performance because they simulate the real breeding process. In general, the results indicate that the choice of statistical model and the number of markers had only a modest influence on prediction ability in both cross-validation and external validation. In both validation approaches, the non-parametric random forest (RF) model overestimated genomic estimated breeding values (GEBVs). Overall, genomic prediction ability for soybean yield across years of historical data was relatively high (0.60), suggesting that the model has the potential to predict the broad adaptation of breeding lines. However, the model's ability to predict phenotypic performance in individual years was variable, with especially high prediction ability in years not affected by yield-limiting factors, when genetic potential was fully achieved. Model performance in both cross-validation and external validation generally improved with increased phenotyping intensity, which must reflect the variability of the target environment in terms of climatic conditions.
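The two validation approaches contrasted above differ in which lines the model has seen during fitting. The sketch below illustrates the mechanics on simulated data: prediction ability is taken as the Pearson correlation between observed phenotypes and GEBVs, standard k-fold cross-validation predicts each line from a model fitted on the remaining lines of the same set, and external validation fits the model once and predicts a separate set of lines never used in fitting. All helper names (`ridge_fit`, `predict_gebv`, `kfold_cv`, `external_validation`), the ridge-regression (RR-BLUP-style) model, the penalty `lam`, and the simulated marker and phenotype data are illustrative assumptions, not the study's models, data, or code. Because both sets here are drawn from one simulated population, the sketch shows only the procedure and is not expected to reproduce the overestimation described in the abstract.

```python
import numpy as np

# --- hypothetical helpers; a stand-in model, not the study's implementation ---
def ridge_fit(X, y, lam):
    """Fit ridge-regression marker effects (RR-BLUP-style shrinkage)."""
    x_mean = X.mean(axis=0)
    Xc = X - x_mean                      # center markers on training means
    mu = y.mean()
    p = X.shape[1]
    beta = np.linalg.solve(Xc.T @ Xc + lam * np.eye(p), Xc.T @ (y - mu))
    return mu, x_mean, beta

def predict_gebv(model, X):
    """Genomic estimated breeding values for genotyped lines X."""
    mu, x_mean, beta = model
    return mu + (X - x_mean) @ beta

def prediction_ability(y_obs, gebv):
    """Pearson correlation between observed phenotypes and GEBVs."""
    return float(np.corrcoef(y_obs, gebv)[0, 1])

def kfold_cv(X, y, k=5, lam=10.0, seed=0):
    """Standard k-fold CV: each line is predicted by a model fitted on the other folds."""
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(len(y)), k)
    preds = np.empty(len(y))
    for test_idx in folds:
        train_idx = np.setdiff1d(np.arange(len(y)), test_idx)
        model = ridge_fit(X[train_idx], y[train_idx], lam)
        preds[test_idx] = predict_gebv(model, X[test_idx])
    return prediction_ability(y, preds)

def external_validation(X_train, y_train, X_ext, y_ext, lam=10.0):
    """External validation: fit once, then predict lines never used in fitting."""
    model = ridge_fit(X_train, y_train, lam)
    return prediction_ability(y_ext, predict_gebv(model, X_ext))

# --- simulated data (illustrative only) ---
rng = np.random.default_rng(42)
n_lines, n_markers = 300, 100
X = rng.binomial(2, 0.5, size=(n_lines, n_markers)).astype(float)  # 0/1/2 marker codes
beta_true = rng.normal(0.0, 1.0, n_markers)
g = (X - X.mean(axis=0)) @ beta_true                  # true genetic values
y = g + rng.normal(0.0, 0.5 * g.std(), n_lines)       # phenotype = genetic value + noise

X_train, y_train = X[:200], y[:200]   # lines used to develop the model
X_ext, y_ext = X[200:], y[200:]       # held-out lines standing in for external data

cv_r = kfold_cv(X_train, y_train)
ext_r = external_validation(X_train, y_train, X_ext, y_ext)
print(f"cross-validation r = {cv_r:.2f}, external validation r = {ext_r:.2f}")
```

In a real program, the external set would be breeding lines phenotyped in later years or other environments, so any gap between `cv_r` and `ext_r` would reflect genotype-by-environment effects and population differences that this simulation deliberately omits.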