Predictive Modelling with Machine Learning in Plant Breeding

Abstract

Genomic prediction, originally proposed as a solution to the limitations of marker-assisted selection for complex traits, has become the standard for estimating breeding values in both inbred and hybrid crops. Linear models such as genomic best linear unbiased prediction (GBLUP) and ridge regression BLUP (RR-BLUP) remain effective in many cases, especially when an additive genetic architecture can be assumed. Even so, recent years have seen growing interest in applying machine learning (ML) methods to overcome some of their constraints, including their limited capacity to model non-additive effects and nonlinear interactions. This thesis explored how three key aspects influence the success of genomic prediction: the choice of input features, the statistical model, and the target trait or crop.
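The RR-BLUP baseline mentioned above can be illustrated as ridge regression of phenotypes on marker dosages. The following is a minimal pure-Python sketch, not the implementation used in the thesis. It centres each marker column and solves (Z'Z + λI)β = Z'(y − ȳ) for the marker effects β. The function names (`solve`, `rr_blup`, `predict`) and the toy data are hypothetical. λ is assumed to be given; in RR-BLUP it corresponds to the ratio of residual variance to marker-effect variance, which would normally be estimated, for example by REML.

```python
from typing import List, Sequence, Tuple

Model = Tuple[float, List[float], List[float]]  # (mean of y, marker means, marker effects)


def solve(a: List[List[float]], b: List[float]) -> List[float]:
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(a)
    m = [[float(v) for v in row] + [float(b[i])] for i, row in enumerate(a)]
    for col in range(n):
        # Swap in the row with the largest pivot for numerical stability.
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    # Back substitution on the upper-triangular system.
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        x[i] = (m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))) / m[i][i]
    return x


def rr_blup(z: Sequence[Sequence[float]], y: Sequence[float], lam: float) -> Model:
    """Ridge estimates of marker effects: (Zc'Zc + lam*I) beta = Zc'(y - mean(y))."""
    n, p = len(z), len(z[0])
    y_mean = sum(y) / n
    z_means = [sum(row[j] for row in z) / n for j in range(p)]
    zc = [[row[j] - z_means[j] for j in range(p)] for row in z]  # centre each marker
    yc = [v - y_mean for v in y]
    lhs = [[sum(zc[k][i] * zc[k][j] for k in range(n)) + (lam if i == j else 0.0)
            for j in range(p)] for i in range(p)]
    rhs = [sum(zc[k][i] * yc[k] for k in range(n)) for i in range(p)]
    return y_mean, z_means, solve(lhs, rhs)


def predict(model: Model, markers: Sequence[float]) -> float:
    """Predicted value = mean of y + sum of centred dosages times marker effects."""
    y_mean, z_means, beta = model
    return y_mean + sum(b * (x - mu) for b, x, mu in zip(beta, markers, z_means))


if __name__ == "__main__":
    # Hypothetical toy data: 5 lines x 3 markers (dosages 0/1/2),
    # phenotypes generated without noise as y = 10 + 2*z1 - z3.
    Z = [[0, 1, 2], [1, 0, 1], [2, 2, 0], [1, 1, 0], [0, 2, 1]]
    y = [8, 11, 14, 12, 9]
    model = rr_blup(Z, y, lam=1e-8)
    print([round(b, 4) for b in model[2]])
    print(round(predict(model, [1, 1, 1]), 4))
```

On this noiseless toy data with a near-zero λ, the generating effects (2, 0, −1) are recovered. Real data usually have far more markers than individuals. There, a positive λ keeps the system solvable and shrinks all effects towards zero.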
With respect to input features, marker data were compared with minimalist parentage-based models, haplotype blocks, and features generated by autoencoders. It was shown that even simple ML models using only parentage information can rival marker-based GBLUP under certain conditions. This is promising for small breeding programs with large amounts of historical but ungenotyped records. In addition, dimensionality reduction techniques were introduced to compress genomic data while preserving prediction accuracy, and they successfully accelerated model training. Chief among them was a novel haplotype-based autoencoder developed in this thesis.
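Two of the feature types compared above can be sketched in a few lines. The hypothetical helper `parentage_features` represents each cross only by indicators for its two parents, the kind of minimal input that needs no genotyping. The hypothetical helper `haplotype_block_features` compresses markers by replacing each fixed-width window with an integer ID for the allele pattern it contains. This is a simplifying assumption: haplotype blocks are normally defined on phased haplotypes and often by linkage disequilibrium rather than by fixed windows. The autoencoder developed in the thesis is not reproduced here.

```python
from typing import Dict, List, Sequence, Tuple


def parentage_features(crosses: Sequence[Tuple[str, str]]) -> Tuple[List[str], List[List[int]]]:
    """Encode each cross only by the identity of its two parents (no markers)."""
    parents = sorted({p for cross in crosses for p in cross})
    index = {p: i for i, p in enumerate(parents)}
    rows = []
    for a, b in crosses:
        row = [0] * len(parents)
        row[index[a]] += 1  # one indicator per parent;
        row[index[b]] += 1  # a parent used twice (a == b) gets a count of 2
        rows.append(row)
    return parents, rows


def haplotype_block_features(genotypes: Sequence[Sequence[int]], block_size: int) -> List[List[int]]:
    """Replace each window of `block_size` consecutive markers by an integer ID
    of the allele pattern observed in that window (hypothetical fixed-width blocks)."""
    n_markers = len(genotypes[0])
    starts = range(0, n_markers, block_size)
    codebooks: List[Dict[Tuple[int, ...], int]] = [{} for _ in starts]
    encoded = []
    for g in genotypes:
        row = []
        for block, start in enumerate(starts):
            pattern = tuple(g[start:start + block_size])
            ids = codebooks[block]
            if pattern not in ids:  # first occurrence gets the next free ID
                ids[pattern] = len(ids)
            row.append(ids[pattern])
        encoded.append(row)
    return encoded


if __name__ == "__main__":
    # Hypothetical crosses and marker dosages, for illustration only.
    # → (['A', 'B', 'C'], [[1, 1, 0], [1, 0, 1], [0, 1, 1]])
    print(parentage_features([("A", "B"), ("A", "C"), ("B", "C")]))
    # → [[0, 0], [0, 1], [0, 0]]
    print(haplotype_block_features([[0, 1, 1, 0], [0, 1, 2, 2], [0, 1, 1, 0]], block_size=2))
```

The haplotype IDs are categorical labels rather than quantities. They would therefore typically be one-hot encoded before being passed to a model that treats its inputs as numeric.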
Regarding the model, a variety of ML algorithms were benchmarked with different hyperparameter tuning approaches. No single model outperformed the others across all traits and crops. Nevertheless, ensemble approaches typically performed better than the individual models they were built from. Support vector machines appeared relatively unstable compared with other ML algorithms such as tree-based models.
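The ensemble result can be illustrated with the simplest scheme: an unweighted average of several models' predictions. This is scored by the Pearson correlation between predicted and observed values, a common measure of predictive ability in genomic prediction. The abstract does not specify which ensemble method the thesis used, so averaging is an assumption here. The names `average_ensemble` and `pearson` and all numbers are hypothetical. The toy predictions were chosen to show how partially independent errors can cancel when predictions are averaged.

```python
from math import sqrt
from statistics import mean
from typing import List, Sequence


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation between two equally long sequences."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def average_ensemble(*predictions: Sequence[float]) -> List[float]:
    """Unweighted mean of several models' predictions for each individual."""
    return [mean(values) for values in zip(*predictions)]


if __name__ == "__main__":
    # Hypothetical observed values and predictions from two models whose
    # errors happen to point in opposite directions.
    observed = [1, 2, 3, 4, 5]
    model_a = [2, 1, 3, 5, 4]
    model_b = [0, 3, 3, 3, 6]
    ensemble = average_ensemble(model_a, model_b)
    # prints: A 0.8, B 0.894, ensemble 1.0 (one per line)
    for name, pred in [("A", model_a), ("B", model_b), ("ensemble", ensemble)]:
        print(name, round(pearson(observed, pred), 3))
```

In practice, model errors are only partly independent, so the gain from averaging is usually much smaller than in this constructed case.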
Finally, the results showed that the accuracy of genomic prediction depended strongly on the trait, on the crop and its breeding scheme, and on the population. For hybrids, ML performed well when specific combining ability (SCA) contributed more to hybrid yield than general combining ability (GCA). Prediction accuracy differed considerably among fungal diseases in wheat. By contrast, the methods performed relatively similarly for any given disease.
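The GCA/SCA distinction above can be made concrete for a complete, balanced factorial in which every female is crossed with every male and each hybrid has one value. The model is y_ij = μ + GCA_i (female) + GCA_j (male) + SCA_ij. Here GCA mainly captures a parent's additive effects, and SCA captures the non-additive deviation of a specific combination. The sketch below computes least-squares effects. It then splits the total sum of squares into GCA and SCA parts as a rough, descriptive indicator of their relative importance. This is a simplification with no replicates, no mixed model and no variance-component estimation. The names `gca_sca` and `sums_of_squares` and the yield table are hypothetical.

```python
from typing import List, Tuple

Table = List[List[float]]


def gca_sca(table: Table) -> Tuple[float, List[float], List[float], Table]:
    """Least-squares GCA/SCA effects for a complete female x male factorial.

    Model: y_ij = mu + gca_female_i + gca_male_j + sca_ij (one value per hybrid).
    """
    n_f, n_m = len(table), len(table[0])
    mu = sum(sum(row) for row in table) / (n_f * n_m)
    gca_f = [sum(row) / n_m - mu for row in table]
    gca_m = [sum(table[i][j] for i in range(n_f)) / n_f - mu for j in range(n_m)]
    sca = [[table[i][j] - mu - gca_f[i] - gca_m[j] for j in range(n_m)]
           for i in range(n_f)]
    return mu, gca_f, gca_m, sca


def sums_of_squares(table: Table) -> Tuple[float, float]:
    """Split the total sum of squares into a GCA part and an SCA part."""
    _, gca_f, gca_m, sca = gca_sca(table)
    n_f, n_m = len(table), len(table[0])
    ss_gca = n_m * sum(g * g for g in gca_f) + n_f * sum(g * g for g in gca_m)
    ss_sca = sum(s * s for row in sca for s in row)
    return ss_gca, ss_sca


if __name__ == "__main__":
    # Hypothetical yields of 2 x 2 hybrids (rows: females, columns: males).
    yields = [[10.0, 12.0], [14.0, 20.0]]
    ss_gca, ss_sca = sums_of_squares(yields)
    print(gca_sca(yields))
    print("SCA share:", ss_sca / (ss_gca + ss_sca))
```

For this toy table, SCA accounts for 4 of the 56 units of total sum of squares, so GCA dominates. A table with a larger SCA share would correspond to the situation in which ML performed well for hybrids in the thesis.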
Although ML has not yet provided a significant improvement over traditional methods in many scenarios, its flexibility and potential for integrating multi-modal data remain promising. Developing plant-breeding-specific model architectures, such as haplotype-based autoencoders, may be a more fruitful path than applying standard ML models generically.
