Application of machine learning for the prediction of stable isotopes of water concentrations in streams and groundwater
This dissertation introduces the application of machine learning to isotope hydrology. The recent development of laser spectroscopy has made it feasible to measure stable isotopes of water at high temporal resolution, down to sub-hourly scales. High-resolution data provide the opportunity to identify fine-scale, short-term transport and mixing ... processes that are not detectable at coarser resolutions. Despite these advantages, routine long-term sampling of streams and groundwater sources at high temporal resolution is still far from widespread. Novel approaches that can predict and interpolate infrequently measured data at multiple sources would be a major breakthrough. This dissertation focuses on the application of machine learning and hyperparameter optimization to efficiently predict high-resolution isotope concentrations of multiple stream and groundwater sources in the Schwingbach Environmental Observatory (SEO), Germany. In a first step, an automated mobile laboratory was used to sample and analyse stable isotopes and water quality in situ for multiple water sources at 20 min intervals. Prompt responses of isotope concentrations to precipitation revealed that shallow subsurface flow pathways rapidly delivered water to the stream. A Spearman rank correlation analysis indicated that precipitation is a main driver of event and pre-event water contributions. In a second step, an Artificial Neural Network (ANN) and a Support Vector Machine (SVM) were optimized to predict maximum event water fractions in streamflow for independent precipitation events using only precipitation, soil moisture and air temperature as input features. The optimized SVM outperformed the optimized ANN, with an RMSE of 9.43%, MAE of 7.89%, R2 of 0.83, and NSE of 0.78. 
A systematic hyperparameter optimization approach showed that an adequate number of hidden nodes and a suitable activation function enhanced the ANN performance, whereas the performance of the SVM was directly related to the selection of the kernel function. Finally, a Long Short-Term Memory (LSTM) deep learning model was optimized using a Bayesian optimization algorithm to predict high-resolution time series of isotope concentrations in multiple stream and groundwater sources from explanatory data that are more straightforward and less expensive to measure than stable isotopes. The LSTM successfully predicted isotope concentrations of stream and groundwater sources using only short-term sequences (6 h) of measured water temperature, pH and electrical conductivity, with an RMSE of 0.7‰, MAE of 0.4‰, R2 of 0.90, and NSE of 0.70. In conclusion, machine learning methods are promising tools for the prediction of variables that are difficult, expensive or cumbersome to measure.
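For readers unfamiliar with the four scores reported above, the following is a minimal sketch of how RMSE, MAE, R2 and NSE are commonly computed. It is not taken from the dissertation. It assumes that R2 denotes the squared Pearson correlation, which would explain why the reported R2 and NSE values differ; the function names and the sample values are hypothetical.

```python
import math

def rmse(obs, sim):
    """Root mean square error between observed and simulated values."""
    return math.sqrt(sum((s - o) ** 2 for o, s in zip(obs, sim)) / len(obs))

def mae(obs, sim):
    """Mean absolute error between observed and simulated values."""
    return sum(abs(s - o) for o, s in zip(obs, sim)) / len(obs)

def r2_pearson(obs, sim):
    """Squared Pearson correlation (assumed meaning of R2 here)."""
    mo = sum(obs) / len(obs)
    ms = sum(sim) / len(sim)
    cov = sum((o - mo) * (s - ms) for o, s in zip(obs, sim))
    var_o = sum((o - mo) ** 2 for o in obs)
    var_s = sum((s - ms) ** 2 for s in sim)
    return cov ** 2 / (var_o * var_s)

def nse(obs, sim):
    """Nash-Sutcliffe efficiency: 1 minus squared error over observed variance."""
    mo = sum(obs) / len(obs)
    err = sum((o - s) ** 2 for o, s in zip(obs, sim))
    spread = sum((o - mo) ** 2 for o in obs)
    return 1.0 - err / spread

# Hypothetical values: the simulation tracks the observations with a constant bias.
observed = [1.0, 2.0, 3.0, 4.0]
simulated = [1.5, 2.5, 3.5, 4.5]

scores = {
    "RMSE": rmse(observed, simulated),
    "MAE": mae(observed, simulated),
    "R2": r2_pearson(observed, simulated),
    "NSE": nse(observed, simulated),
}
print(scores)
```

In this hypothetical case the correlation-based R2 is perfect while the NSE is lower, because the NSE penalizes the constant bias and the squared correlation does not. Under the stated assumption, this is one way an R2 can exceed the NSE for the same predictions.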