Transcriptome analysis in preterm infants developing bronchopulmonary dysplasia: data processing and statistical analysis of microarray data

Abstract

Bronchopulmonary dysplasia (BPD) is one of the most common chronic lung diseases and contributes greatly to the morbidity of preterm infants. While moderate and severe forms of BPD are the forms most commonly investigated, little is known about the development of mild BPD. The aim of this work is to identify mechanisms and biomarkers that make it possible to predict at birth whether a preterm infant is prone to develop no BPD, mild BPD, or a more severe form of BPD.

Transcriptome analysis, and microarray analysis in particular, plays an important role in generating hypotheses about underlying mechanisms and diagnostic tools. Microarrays can examine a multitude of transcripts simultaneously. To obtain reliable results, however, a number of data preparation steps are necessary. The statistical analysis has some peculiarities because the number of parameters collected is high and the number of patients is comparatively small. In the present study, a standardized workflow for the statistical analysis of transcriptome data is developed and used to predict BPD in very preterm infants.

First, background correction and normalization steps are performed to prepare the data. This separates signal from noise in the gene expression on the one hand and makes the microarrays comparable on the other. Informative transcripts are then selected iteratively: transcripts are reviewed for missing values, low expression levels, and extreme values and are eliminated if necessary. The remaining missing values are then estimated using an imputation algorithm.

Data preparation was facilitated in particular by implementing and automating the workflow in the programming language R. Compared with a preparation based on several independent programs and tools, this offers a considerable advantage in the amount of data that can be processed, in processing time, and in how up to date the algorithms are.
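
The transcript filtering and imputation steps described above can be illustrated with a minimal sketch. It is written in Python for illustration only; the thesis workflow itself is implemented in R. The function name, the thresholds `max_missing` and `min_level`, and the example data are hypothetical. The abstract does not name the imputation algorithm, so the sketch substitutes a simple per-transcript mean as a stand-in, and it omits the check for extreme values.

```python
from statistics import mean

def filter_and_impute(expr, max_missing=0.2, min_level=6.0):
    """Filter transcripts, then fill remaining gaps.

    expr maps a transcript id to a list of log2 intensities, one per
    array, with None marking a missing value. Both thresholds are
    assumptions chosen for this example.
    """
    prepared = {}
    for transcript, values in expr.items():
        observed = [v for v in values if v is not None]
        n_missing = len(values) - len(observed)
        # Drop transcripts with no data or too many missing values.
        if not observed or n_missing > max_missing * len(values):
            continue
        # Drop transcripts whose expression is low on every array.
        if max(observed) < min_level:
            continue
        # Stand-in imputation: replace each gap with the transcript mean.
        fill = mean(observed)
        prepared[transcript] = [fill if v is None else v for v in values]
    return prepared

# Hypothetical example data: three transcripts measured on five arrays.
expr = {
    "tx_a": [8.0, None, 9.0, 10.0, 9.0],   # one gap, well expressed
    "tx_b": [None, None, None, 7.5, 8.0],  # mostly missing
    "tx_c": [3.0, 4.0, 3.5, 4.2, 3.9],     # low expression throughout
}
prepared = filter_and_impute(expr)
print(prepared)
```

In this example only `tx_a` survives the filters, and its single gap is filled with the mean of its observed values.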
Existing programs have been replaced by Bioconductor packages where possible to avoid data transmission errors.

The instruments for data preparation can be used both for the analysis of predefined groups (supervised) and for analysis without predetermined groups (unsupervised or semi-supervised). In this way the nature and prerequisites of the different statistical analyses can be taken into account. The group-based (supervised) data analysis is used to work out differences between the examined groups. In the presented study, two methods (Limma, PAM) were used to identify differentially regulated genes. While Limma determines individual transcripts that are differentially regulated in isolation from other transcripts, PAM focuses on the interplay of the transcripts to explain the different expressions of the phenotypes.

The aim of transcriptome analysis without prior definition of groups (unsupervised) is to identify groups solely on the basis of gene expression. Because a very large number of transcripts is taken into account in this case, the approach is only suitable for drawing conclusions about underlying diseases that affect gene expression as a whole. In a semi-supervised approach, the data preparation is therefore performed without groups, but only a selection of transcripts is used. The selection is based on clinical data associated with the phenotype. Clustering techniques are then applied to this selection to identify groups. In the present case, the different maturities of the preterm infants at birth caused particular difficulties in forecasting BPD groups, because gene expression patterns frequently differ with maturity. To address this issue, the gestational age (GA) of the preterm infants is used as a secondary variable in the selection of transcripts. In addition, it is beneficial to select only transcripts that show an effect for mechanical ventilation and oxygen requirement but either no effect for GA or an effect in addition to that of GA.
Because this cannot be achieved with the usual methods of gene selection (Limma, PLS), a multiple linear regression is performed here, which makes it possible to retain only transcripts with additional effects.

The gene expression analysis of the present study, which comprises neonates born before 32 weeks of gestation, shows that considering processes at birth significantly improves the understanding of BPD in general and of its classification into different severity grades. With the presented tools for data preparation, data analysis, and functional gene expression analysis, it is possible to predict BPD severity grades at birth and to identify cytokines as biomarkers.

Our results showed that the combination of oxidative stress and inflammation at birth contributes to the severity of BPD. When the duration of mechanical ventilation and the duration of oxygen supply are considered, it becomes evident that processes responsible for T-cell development are associated with the development of BPD. Furthermore, the importance of tumor necrosis factor alpha (TNF alpha), interleukin 6 (IL6), interleukin 1, and interleukin 10 in the regulation of differential gene expression in BPD becomes apparent.
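
The idea of keeping only transcripts whose association with mechanical ventilation goes beyond the effect of gestational age can be sketched as a comparison of two nested linear models. This is one possible reading of the multiple linear regression mentioned in the abstract, not the thesis implementation, which is written in R. The sketch fits expression on GA alone and on GA plus ventilation duration, then computes a partial F statistic for the added term. All names, the example data, and the cutoff are assumptions; 6.61 is roughly the 5% critical value of F(1, 5), and a real analysis would also need a correction for multiple testing.

```python
def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x

def rss(columns, y):
    """Residual sum of squares of a least squares fit with intercept."""
    design = [[1.0] + [col[i] for col in columns] for i in range(len(y))]
    k = len(design[0])
    # Normal equations: (X'X) beta = X'y.
    xtx = [[sum(row[p] * row[q] for row in design) for q in range(k)]
           for p in range(k)]
    xty = [sum(row[p] * yi for row, yi in zip(design, y)) for p in range(k)]
    beta = solve(xtx, xty)
    fitted = [sum(b * v for b, v in zip(beta, row)) for row in design]
    return sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))

def additional_effect_f(y, ga, vent):
    """Partial F statistic for adding vent to a model that already has ga."""
    rss_reduced = rss([ga], y)
    rss_full = rss([ga, vent], y)
    df_resid = len(y) - 3  # intercept, ga, vent
    return (rss_reduced - rss_full) / (rss_full / df_resid)

# Hypothetical data for eight infants.
ga = [24, 25, 26, 27, 28, 29, 30, 31]   # gestational age in weeks
vent = [28, 30, 14, 21, 7, 10, 2, 3]    # days of mechanical ventilation
noise = [0.1, -0.1, -0.1, 0.1, 0.1, -0.1, -0.1, 0.1]
transcripts = {
    # Expression depends on GA only.
    "tx_ga_only": [0.5 * g + e for g, e in zip(ga, noise)],
    # Expression depends on GA and, additionally, on ventilation.
    "tx_vent": [0.2 * g + 0.1 * v + e for g, v, e in zip(ga, vent, noise)],
}

F_CUTOFF = 6.61  # assumed cutoff, see the note above
selected = [name for name, y in transcripts.items()
            if additional_effect_f(y, ga, vent) > F_CUTOFF]
print(selected)
```

In this constructed example only the transcript with a ventilation effect on top of the GA effect passes the cutoff, while the transcript explained by GA alone is filtered out.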
