Exploring the potential of machine learning methods and selection signature analyses for the estimation of genomic breeding values, the estimation of SNP effects and the identification of possible candidate genes in dairy cattle
The objective of this thesis was to study a variety of factors that affect the accuracy of genomic predictions using random forest (RF), genomic BLUP (GBLUP) and single-step genomic BLUP (ssGBLUP) methods, with a strong focus on training set design. Subsequently, selection signatures were identified through variation in linkage disequilibrium (LD) within and between the dual-purpose black and white (DSN) and Holstein populations.
In chapter 2, a stochastic simulation was applied to genomic predictions of binary disease traits based on cow training sets. The composition of training and testing sets was modified in different allocation schemes. In addition, different scenarios were studied according to the quantitative-genetic background of the trait, the genetic architecture, and low- and high-density SNP chip panels. The highest genomic prediction accuracies were achieved when the disease incidence within training sets was close to the population disease incidence of 0.20. A lower trait heritability and a reduction in the number of QTL were both associated with lower genomic prediction accuracies.
In chapter 3, different disease traits from 6,744 genotyped cows from 58 large-scale contract herds were used to study the impact of training set composition, of the response variable, and of the RF, GBLUP and ssGBLUP methodologies on genomic prediction accuracies. With de-regressed proofs (DRP) as response variables, accuracies were higher than with pre-corrected phenotypes (PCP) for both GBLUP and RF. A further increase in genomic prediction accuracies was realized with the ssGBLUP method compared to the corresponding scenarios with RF or GBLUP.
In addition, RF identified significant SNPs close to potential positional candidate genes, i.e., GAS1, GPAT3, and CYP2R1 for clinical mastitis, SPINK5 and SLC26A2 for laminitis, and FGF12 for infertility.
In chapter 4, genetic variation between the Holstein and DSN populations, as well as between sub-populations, was inferred using the XP-EHH method. The analysis was performed on 2,076 genotyped Holstein cows and 261 genotyped DSN cows. The most outstanding XP-EHH scores, which indicate regions under recent selection, were found on chromosomes 6 and 12 for the DSN population and on chromosome 20 for the Holstein population. Annotation of the selection signature regions revealed various genes associated with production traits, such as CLU and WARS2. Furthermore, several hub genes associated with resistance to dermatitis digitalis were detected, including FARS2, ACTR8 and CRY1.