Neurohydrological prediction of water temperature and runoff time series

In this paper we give an overview of experiments with artificial neural networks on the Hungarian reach of the Danube River carried out by the Hungarian Hydrological Forecasting Service. Two areas were selected: rainfall-runoff modelling and water temperature simulation. The statistical machine learning method is a universal interpolation and classification tool, but showed poor performance when applied for correlation in complex hydrological situations. Despite very strong learning skills of neural networks even a conceptual model gave more consistent and superior results through validation, and the statistic method is more sensitive to overlearning than deterministic methods. Despite deterministic models being superior artificial neural networks still provide satisfactory results that confirms their application.


Introduction
The machine learning type artificial intelligence approaches have a long history reaching back to the work of McCulloch and Pitts (1943). Later on Rosenblatt (1957) presented the concept of a perceptron but it got great interest only after its rediscovery by Rumelhart et al. (1986). The nonlinearity of Rumelhart's multilayer perceptron (MLP) was a significant advantage compared to the linear behaviour of the Rosenblatt perceptron, and by the back error propagation method it was possible to find the weights of the internal synapses. An artificial neural network (ANN) is non-parametric statistical method, it is considered as a black box model and by opening this box one does not find the representation of the physical processes, and thus method extraction is not yet possible. It is an artificial architecture consisting of neurons and synapses resembling the biological cognition and its information processing characteristics. The information is distributed throughout the network, and the information processing is a parallel process, so even a significant change in some of the weights of the synapses has only a minor effect on the performance of the entire network. The information processing and learning capabilities of the ANN are utilized in the main application areas. General application areas are for example interpolation, categorization, classification, while higher level applications are for example image processing, image classification, language processing and language generation, machine translation, spell checking, named entity recognition, part-of-speech tagging, semantic parsing and question answering, paraphrase detection. However certain application areas require more complex approaches than regular artificial networks, such as the convolutional neural networks for high resolution image processing. The first published hydrological application of the artificial neural network (ANN) was by Daniel (1991) but it was followed by a great number of studies and it is still an intensively studied area. Comprehensive reviews on neural hydrology (or neurohydrology) were made by Govindaraju (2000a and2000b), Tanty and Desmukh (2015) and Lange and Sippel (2020), while numerous case studies are also available (Rabi et al., 2015;Temizyurek and Dadaşer-Çelik, 2018;Zhu et al., 2018;Zhu et al., 2019;Van Leeuwen et al., 2020;Jakab et al., 2021). The aim of this paper is to compare the performance of an artificial neural network with regular hydrological calculations of deterministic and conceptual types. Rainfall-runoff modelling of a small lower mountainous catchment was one of the areas where a conceptual approach was compared to the ANN. The second experiment was the water temperature simulation of a river where both a deterministic and a conceptual method were compared to the ANN.

Artificial neural networks
The feed forward multi-layer perceptron (MLP) stands of at least three layers of neurons: input, hidden and output layers. Classifications and regressions are two possible applications of ANNs. Speaking of either application some inputs are given at the input layer and outputs are received at the output layer. These outputs are compared to target values and based on the errors the ANN is able to learn (Rumelhart et al., 1986). This training method requires a great number of training data carefully selected to describe the general range of the resembled process because the extrapolation capabilities of an ANN are poor (Govindaraju, 2000b). We used the widely applied error back-propagation with added momentum for the training process. The algorithms used for this study were programmed in C# in Microsoft Visual Studio, but nowadays more and more ANN related tools and solutions are available online that need no programming skills to apply. A schematic representation of a neuron is presented on Fig. 1. From left to right the j-th neuron (Nj) has n number of inputs (In) with wjn weights of the synapses and it has a b bias value. These are summed as: where Iinput neuron, wweights of the synapses, bbias value , Ssum of input synapses and bias value.
All neurons have a transfer or activation function (Aj) which is usually of sigmoid type. If we take the aforementioned logistic function, the Oj output of the neuron will be written as: where Ooutput of the neuron, Aactivation function, Ssum of input synapses and bias value.
The feed forward MLP stands of at least three layers of neurons: input, hidden and output layers. The neurons of the input layer have only one input per neuron and boththe input and output layers have linear transfer function, thus: where Ooutput of the neuron, Ssum of input synapses and bias value.
The schematic diagram of a feed forward multilayer perceptron is shown on Fig. 2. Classifications and regressions are two possible applications of ANNs. Speaking of either application some inputs are given at the input layer and outputs are received at the output layer. These outputs are compared to target (T) values and based on the errors (E) the ANN is able to learn (Rumelhart et al. 1986).
This training method requires a great number of training data carefully selected to describe the general range of the resembled process because the extrapolation capabilities of an ANN are poor (Govindaraju, 2000b). We used the widely applied error back-propagation with added momentum for the training process. For any weight (Formula 5) and bias (Formula 6) in the network the formula is written as: where αlearning rate, βmomentum, ttraining step, Eerror value, bbias value. We used the tangent hyperbolic function as an activation function, thus the partial derivative of the error after the application of the chain rule in the output layer is: In the hidden layer it is: where Eerror value, Ttarget value, Ooutput of the neuron, Ssum of input synapses and bias value, wweights of the synapses.
The formulas for the biases (b) are similar except they do not have any inputs so the multiplication with the output (O) of the previous layer is missing.

Rainfall-runoff modelling
The study area for rainfall-runoff modelling was the lower mountainous catchment of Galga River in Hungary upstream of Galgamácsa gauging station. This station is 29.8 kilometres upstream from the confluence with its recipient, the Zagyva River, the area of its catchment is 288 km 2 . It is a small catchment with complex hydrological behaviour where conceptual hydrological modelling has not provided satisfactory results yet. The applied rainfall-runoff model was of conceptual type developed and operated by the Hungarian Hydrological Forecasting Service.
The input of the model is a rainfall time series already processed by a spatially distributed snow calculation to add molten and extract frozen snow water. The potential and the actual evapotranspiration values are then estimated by the Thornthwaite equation (Thornthwaite 1948) and the Mintz-Walker formula (Mintz and Walker 1993). The output time series of the distributed module are surface water income or active precipitation series. The active precipitation values of the distributed model are then summed on every subcatchments of the model domain such as the Galga subcatchment used in this study.
The rainfall-runoff model uses a modified version of the Horton equation (Horton, 1933) to calculate infiltration. The modification in the Horton equation is the substitution of the exponential term with the relation of the saturated water content and the actual water content of the specific soil, and it also takes frozen soil into account. The active precipitation is separated by the above formulas to evapotranspiration, and three runoff components: surface runoff, rootzone runoff and groundwater.
The separated runoff components are then tranformed to the outflow section of the subcatchment by discrete linear cascade models (DLCM). DLCM is a flood routing method originally published by Kalinin and Milyukov (1957), and later by Nash (1960), details are also available in Chow et al. (1988). The analitic solution of the DLCM is done with a state-space approach and with a non-integer number of cascades based on the work of Szilagyi (2006). Calibrations were carried out on the data from 2015 to 2019, validation on 2013-2014 and 2020.

Water temperature simulation
The selected site for water temperature simulations was on the Danube River at Paks gauging station. This station is of high importance due to the cooling water outlet of the nuclear power plant located nearby. Two methods were compared to the artificial neural network, a physically based energy balance estimation and a simplified conceptual model ignoring all the terms but the sensible heat with an added transfer coefficient.

The estimation of the energy balance
First of the methods is the estimation of the terms of the energy balance of water published by Starosolszky (1969) and many other e.g. Mohseni and Stefan (1999). The model uses the observed water temperature as an initial condition and estimates the daily changes based on the energy balance driven mainly by the predicted air temperature of the ECMWF meteorological model. The governing equation of the energy balance is: where Eototal energy emitted or absorbed, Euenergy flux through the river bed, Eswnet solar (shortwave) radiation, Aalbedo, Elwatmospheric (longwave) radiation, Esemitted longwave radiation of water surface, Ecsensible heat transfer on the water-air boundary layer (convective heat flux), Eeenergy loss or gain from evaporation or vapor condensation (latent heat transfer or evaporative heat flux), Epenergy from precipitation.
Equations for direct solar radiation are available (Dozier, 1979), its value can be estimated with sufficient accuracy based on the formulas of spherical trigonometry. However the estimation of the diffuse component of shortwave radiation is not as exact, there are formulas also available (Liu and Jordan, 1960;Gates, 1980;Linacre, 1992;Oliphant et al., 2006). The longwave atmospheric radiation is based on the Stefan-Boltzmann Law (Stefan, 1879;Boltzmann, 1884).

The weighted mean temperatures
The theory, published by the Swedish meteorologist Olof Bertil Rodhe in 1952 andlater in 1955, was an answer to the uncertain approach of temperature sum based methods (Östman, 1950;Nusser, 1950;Palosuo, 1951) of that era. This approach was specifically developed for marine application, to predict the appearance and later the extension of shore ice along the Baltic coastline. The U.S. Army Cold Regions Research and Engineering Laboratory experimented with river application and presented promising results (Bilello, 1963), and also recommended Rodhe's method for further testing on river ice prediction. Liptay (2018a) also applied the method for the Hungarian Danube reach with promising results. Although the theory of energy balance had been known for a long time, meteorological measurements were unable to provide the necessary input data for the calculation of the energy balance till the end of the 1960s (Csoma, 1968). Rhode assumed that all terms of the full energy balance are neglected but the direct energy transfer between water and air.
where After the solution and discretization of the basic differential equation (10), the final form is equation 11.
where In the theory of weighted mean temperatures k is considered to be a constant, and its value is selected to drive Eq. 11. to zero when ice is expected. It is also possible to find a value for k where the series of calculated and observed water temperatures have the highest correlation (Liptay, 2018b). Mohseni and Stefan (1999) presented that the air temperature and water temperature has an S-shaped relation. This relation of the weekly average values from 01. 07. 2008 to 31. 08. 2017 is presented for Paks on Fig. 3. A polynomial trend line is also shown with the coefficient of determination of 0.92. We consider this relation as a sigmoid function (σ(x)) such as the standard logistic function, the normal cumulative distribution function, or the tangent hyperbolic function just to name a few. The first derivative of σ(x) gives the measure of the rate at which water temperature changes with respect to the change of air temperature as a continuous function. Considering a more general form of the logistic function: where σ(x)logistic function, ascale, bsteepness, cshift for the transformation of the standard function.
Finally the formula for k is written as: where ktime inverse coefficient [1 s -1 ] or a constant with inverse dimension of time, kmmaximum value for k at the lowest and highest water temperatures, σ(x)logistic function.
A significant increment in model performance was achieved by substituting the local air temperature with a simple formula representing the convection of wate temperature from an upstream station. Another major advance was the introduction of the appearance on the surface of rivers, because the process of energy exchange is influenced. In order to describe this effect in a simple yet appropriate way we introduced an additional factor for k when the calculated water temperatures are under 0°C. The data set for calibration was from summer 2015 to 2017, while the validation was done on data from summer 2017-2018, and summer 2018-2019 periods.

Rainfall-runoff modelling
The ANN based method provided superior results on the learning data, but showed poor performance during any validation attempts. The conceptual rainfall-runoff calculation showed consistent performance during both of the calibration and the validation runs. The observed and simulated discharges were compared at Galgamácsa station based on the RMSE and water balance (accumulated discharges) values. Results are in Table 1.

Water Temperature Simulation
We used two different neural network architectures for water temperature simulation. The first approach was a single output multilayer perceptron (MLP) predicting only one day at a time. The inputs include the latest observed water temperature and the daily air temperature predictions regarding the lead time. For example for 1 day lead time it had two input neurons, while for 10 days it had 11 neurons on the input layer. This approach resulted in 10 separate neural networks for all the 10 days. The second approach was an architecture for the entire 10 days of lead time, so the number of input neurons was 11 (1 observed water temperature and 10 daily air temperature predictions) and the number of output neurons was 10. Zhu et al. (2019) analysed the possibilities of artificial neural networks in water temperature simulation and documented increased efficiency when introducing streamflow and the day of year (DOY) as inputs. They also highlighted that streamflow has a decreasing impact on the accuracy on rivers with lower bed slope. We also analysed the possibilities of these two additional inputs but did not obtain any positive results. The greater details of the methods and calculations are available in Liptay and Gauzer (2021). The estimation of the energy balance showed consistent performance in every situation always giving satisfactory results. The conceptual model performed slightly poorer and showed less consistency on the validation time series. The ANN reached the tightest fit during calibration even better than the physical model, but did not perform well in the validation scenarios.

Conclusion
The time series forecasting capabilities of feed-forward multilayer neural networks were compared to deterministic models in this paper. The main difference between the non-parametric statistic and deterministic methods is the consistency of the performance. While the ANN is able to achieve superior calibration results, it is easily overtought. ANN water temperature and rainfall-runoff models were good tools to analyse a set of data, but any step outside of the already introduced set of information leads to extrapolation which is a weakness of ANNs. This makes ANNs a very good tool for the quick evaluation of time series and for the analysis of correlation between data. Statistical models do not need any information about the catchment or the physical environment of the data and this advantage is very important in case of insufficiently monitored areas. Time series forecasting with deep learning methods is a continuously growing and developing field such as almost any application of artificial intelligence methods. For example convolutional neural networks (CNNs) are widely applied also in time series forecasting besides image processing and are also discussed by Lange and Sippel (2020).