Maximizing information content of SNP arrays for genomic prediction







Genomic prediction is a valuable tool for plant breeders and holds promise for improving genetic gain in various crops. SNP arrays are the preferred genotyping tool for breeders of most major crops; however, their limited, predefined number of markers has the potential to impede the prediction accuracy achievable in genomic prediction. The objective of this study was to evaluate cost-effective methods for maximizing the information content of SNP arrays. Three methods were explored, and their information content was assessed using prediction accuracies from six genomic prediction models across diverse crops and agronomic traits. Independently of the method used to increase the information content of SNP arrays, the applied genomic prediction models consistently showed similar prediction accuracy within traits, making them equally suitable for genomic prediction across a variety of crops and traits.
The first method for maximizing the information content of SNP arrays involved constructing haplotype blocks with various methods and parameters and using their haplotypes for genomic prediction. Analyses of data from rapeseed, maize, wheat and soybean revealed only marginal improvements in prediction accuracy across most traits. Notably, haplotype blocks were effective in compensating for poorly performing models in scenarios where prediction accuracies varied widely across prediction models. Nevertheless, no method or parameter for constructing haplotype blocks was consistently ideal, which makes block construction a hyperparameter requiring careful tuning.
Furthermore, failed allele calls from SNP arrays were examined for their information content in genomic prediction of agronomic traits in maize and rapeseed. Two statistical pipelines were developed and tested to separate non-random failed allele calls from random technical errors. Surprisingly, failed allele calls, potentially originating from genome structural variants, exhibited prediction accuracies comparable to those of genome-wide SNP datasets. However, combining SNPs and failed allele calls did not improve genomic prediction.
As an alternative to whole-genome sequencing, imputation of whole-genome sequence marker data from SNP array data was explored. Although LD and marker density improved considerably, no increase in prediction accuracy was observed. This can likely be attributed to erroneous haplotypes and marker calls resulting from imputation errors. A plausible explanation is that these errors are introduced by the high complexity and redundancy of crop plant genomes.
Across all three methods, relationships among genotypes emerged as an explanation for the lack of improvement in genomic prediction accuracy. Relationship estimates obtained from SNP array data were highly correlated with those obtained from the methods for increasing the information content of SNP arrays, so these methods contributed predominantly redundant information. Moreover, it can be assumed that markers on arrays generally exhibit sufficient LD with adjacent QTL. In conclusion, SNP arrays proved to be a reliable genotyping technology, offering a representative sample of the genome for estimating relationships. Furthermore, this study reaffirms the potential of genomic prediction as a breeding tool for improving genetic gain in several crops.
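The redundancy argument above can be illustrated with a minimal sketch. It assumes a VanRaden-type genomic relationship matrix and fully simulated genotypes. The abstract names neither the relationship estimator nor the software used, so the functions `vanraden_grm` and `offdiag_correlation`, the 0/1/2 marker coding and all data below are hypothetical and are not the study's pipeline.

```python
import numpy as np


def vanraden_grm(genotypes):
    """Genomic relationship matrix from a (lines x markers) matrix coded 0/1/2.

    Assumption: VanRaden-type scaling; the estimator used in the study is not
    stated in the abstract.
    """
    M = np.asarray(genotypes, dtype=float)
    p = M.mean(axis=0) / 2.0                 # allele frequency per marker
    keep = (p > 0.0) & (p < 1.0)             # drop monomorphic markers
    Z = M[:, keep] - 2.0 * p[keep]           # centre each marker column
    scale = 2.0 * np.sum(p[keep] * (1.0 - p[keep]))
    return Z @ Z.T / scale


def offdiag_correlation(G1, G2):
    """Pearson correlation between the off-diagonal elements of two GRMs."""
    idx = np.triu_indices_from(G1, k=1)
    return float(np.corrcoef(G1[idx], G2[idx])[0, 1])


# Simulated stand-in data (hypothetical): a dense marker set and an
# "array" that samples every 10th marker from it.
rng = np.random.default_rng(0)
n_lines, n_dense = 50, 2000
freqs = rng.uniform(0.1, 0.9, n_dense)
dense = rng.binomial(2, freqs, size=(n_lines, n_dense))
array = dense[:, ::10]

G_array = vanraden_grm(array)
G_dense = vanraden_grm(dense)
r = offdiag_correlation(G_array, G_dense)
print(f"correlation of relationship estimates: {r:.2f}")
```

The simulated lines are unrelated and the markers independent, so the printed value says nothing about real breeding populations. With real data, a correlation close to 1 between the two matrices would indicate that the denser marker set adds little new information about relationships.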



