Maximizing information content of SNP arrays for genomic prediction

Weber, Sven Ernst

dc.contributor.advisor	Snowdon, Rod
dc.contributor.advisor	Frisch, Mathias
dc.contributor.author	Weber, Sven Ernst
dc.date.accessioned	2024-06-11T12:19:55Z
dc.date.available	2024-06-11T12:19:55Z
dc.date.issued	2023
dc.description.abstract	Genomic prediction is a promising tool for plant breeders to improve genetic gain in various crops. SNP arrays are the preferred genotyping tool for breeders of most major crops; however, the limited, predefined number of markers on SNP arrays can impede the achievable prediction accuracy. The objective of this study was to evaluate cost-effective methods for maximizing the information content of SNP arrays. Three methods were explored, and their information content was assessed using prediction accuracies from six genomic prediction models across diverse crops and agronomic traits. Independently of the method used to increase the information content of SNP arrays, the applied genomic prediction models consistently demonstrated similar prediction accuracy within traits, making them equally suitable for genomic prediction across a variety of crops and traits. The first method involved constructing haplotype blocks with various methods and parameters and using their haplotypes for genomic prediction. Analyzing data from rapeseed, maize, wheat and soybean revealed only marginal improvements in genomic prediction accuracy across most traits. Notably, haplotype blocks were effective in compensating for poorly performing models in scenarios with highly variable prediction accuracies across prediction models. Nevertheless, the absence of a consistently ideal method or parameter for constructing haplotype blocks makes them a hyperparameter requiring careful tuning. In the second method, failed allele calls from SNP arrays were examined for their information content in genomic prediction of agronomic traits in maize and rapeseed. Two statistical pipelines were developed and tested to separate non-random failed allele calls from random technical errors. Surprisingly, failed allele calls, potentially originating from genome structural variants, exhibited prediction accuracies comparable to genome-wide SNP datasets. However, the combination of SNPs and failed allele calls did not enhance genomic prediction. In the third method, imputation of whole-genome sequencing marker data from SNP arrays was explored as an alternative to whole-genome sequencing. While there was a considerable improvement in LD and marker density, no increase in prediction accuracy was observed. This can likely be attributed to erroneous haplotypes and marker calls resulting from imputation errors. A plausible hypothesis is that these errors are introduced by the high complexity and redundancy of crop plant genomes. Across all three methods, relationships emerged as an explanation for the lack of improvement in genomic prediction accuracy. Relationship estimates obtained from SNP array data were highly correlated with those obtained from the methods to increase the information content of SNP arrays, so these methods contributed predominantly redundant information. Moreover, it can be assumed that markers on arrays generally exhibit sufficient LD with adjacent QTL. In conclusion, SNP arrays proved to be a reliable genotyping technology, offering a representative sample of the genome for estimating relationships. Furthermore, this study reaffirms the potential of genomic prediction as a breeding tool to improve genetic gain in several crops.
dc.description.sponsorship	Bundesministerium für Bildung und Forschung (BMBF); ROR-ID:04pz7b180
dc.identifier.uri	https://jlupub.ub.uni-giessen.de/handle/jlupub/19268
dc.identifier.uri	https://doi.org/10.22029/jlupub-18629
dc.language.iso	en
dc.relation.haspart	https://doi.org/10.3389/fpls.2023.1217589
dc.relation.haspart	https://doi.org/10.3389/fpls.2023.1221750
dc.relation.haspart	https://doi.org/10.1139/gen-2023-0126
dc.rights	In Copyright	*
dc.rights.uri	http://rightsstatements.org/page/InC/1.0/	*
dc.subject	genomic prediction
dc.subject	SNP marker
dc.subject	haplotype block
dc.subject	structural variations
dc.subject	breeding
dc.subject.ddc	ddc:500
dc.subject.ddc	ddc:580
dc.title	Maximizing information content of SNP arrays for genomic prediction
dc.type	doctoralThesis
dcterms.dateAccepted	2024-05-17
local.affiliation	FB 09 - Agrarwissenschaften, Ökotrophologie und Umweltmanagement
local.project	BreedPatH
thesis.level	thesis.doctoral

Files

Original bundle

Name: WeberSvenErnst-2024-05-17.pdf
Size: 15.06 MB
Format: Adobe Portable Document Format

License bundle

Name: license.txt
Size: 7.58 KB
Format: Item-specific license agreed upon to submission

Collections

Dissertationen/Habilitationen