Comparison of spatial interpolation methods of hydrological data on example of the Pripyat river basin (within Ukraine)

The article considers four methods of spatial interpolation: inverse distance weighting (IDW), triangulation (TIN), Spline interpolation and Kriging. The Pripyat basin was chosen as the study area, and the regularities of the spatial distribution of hydrological characteristics across the territory were assessed. For this territory, maps of the spatial distribution of the specific discharge were created by the four chosen methods, and the accuracy of the obtained results was assessed. The results show that the IDW method with a distance coefficient P=2 gives the best results for the generalization of hydrological data over the studied area. The next most reliable methods are Kriging, which shows small errors, and Spline, which gives smooth transitions. The least suitable among the studied methods is the TIN method. The IDW method is recommended for studying boundary areas and territories that lie outside the limits defined by the GIS from the input data (in this case, the Pripyat basin), while the other methods can be used to study the central part of the catchment, with varying reliability for the boundary territories.


Introduction
To obtain a continuous map of the distribution of runoff characteristics over areas that are insufficiently covered by measuring stations, it is necessary to apply methods of spatial interpolation. Many such methods exist, but the choice of the optimal one depends on the data sample, the density of the observation network, the physical and geographical conditions of the territory and the features of the spatial distribution of each specific characteristic (Ghasemi et al., 2015). Nowadays, the most convenient methods of interpolation involve the use of GIS technologies. In hydrology and engineering hydrology, hydrological data modelling and analysis are necessary at different scales and for various purposes (Sui and Maggio, 1999). GIS technologies are necessary and useful tools in surface water management, as they make it possible to present the relationship between the spatial and hydrological data of river basins (Kusre et al., 2010). A description of this topic and the benefits of using GIS technologies can be found in the works of Ukrainian (Korniienko et al., 2021; Pochaievets et al., 2021) and international researchers (Hammouri and El-Naqa, 2007; Sami et al., 2013). However, despite a number of works devoted to the interpolation of hydrological data within the Pripyat basin, full-fledged comparisons of different interpolation methods have not been carried out in practice. Analyses of this type are carried out all over the world (Dzhalalvand et al., 2019), but their results cannot be accurately extrapolated to areas with other natural conditions. All this determines the relevance of this topic. Research of this type is also important because the Pripyat River basin has an insufficiently dense network of observation stations and an uneven distribution of hydrological posts throughout the basin, so spatial interpolation is required in the study and management of the river basin (Sokolchuk, 2019).
The goal of this paper is to compare the effectiveness of the IDW, Spline, Kriging and TIN methods in the spatial interpolation of specific hydrological data on the territory of the Pripyat River basin within Ukraine, and to determine the optimal settings for each method.

Material and methods
The studied area is the Pripyat River basin within Ukraine. The Pripyat is a river in Ukraine, located in Volyn and partly in the Rivne and Kyiv regions, and in Belarus. It is the largest of the right-bank tributaries of the Dnipro and flows into the Kyiv reservoir. The total area of the river basin is 121 000 km². The Belarusian part of the basin is approximately 43% of the catchment area, the Ukrainian part around 57% (Palamarchuk and Zakorchevna, 2001). The flat nature of the river and the slight slopes of the water surface create difficulties in determining the hydrological parameters of the river and basin, which is why the values of the basin area and river length vary between literature sources (Obodovskii et al., 2012). The Pripyat basin occupies almost the entire northwestern part of Ukraine (Fig. 1). Permanent posts of the hydrological measuring network are located on the right bank of the Pripyat River. The obtained results can therefore be applied only to this part of the basin, which, however, makes up most of the Pripyat basin within Ukraine. The Pripyat basin has a well-developed hydrographical network of approximately 10.5 thousand rivers and streams. Among the right-bank tributaries flowing through the territory of Ukraine, the most significant are the Turia, Stokhid, Styr, Horyn, Stvyha, Ubort, Slovechna, Solon and Uzh. The Pripyat basin is located within two physical and geographical zones and is one of the most swampy and forested areas in Ukraine. The climate of the studied basin is moderately continental, with warm and humid summers and fairly mild winters. Spring is long and unstable, with frequent alternation of cold and warm periods; summer is warm and rainy. The annual amount of precipitation in the basin varies from 550 to 600 mm (EUWI+, 2019).
The water regime is characterized by a long spring flood, a short summer low-water period that is disturbed by rain floods, and almost annual autumn rises in water level. Spring accounts for 65% of the annual runoff (Susidko and Lukianets, 2004). Most of the Pripyat basin is located within the Poliska Lowland; the south-western part of the basin lies on the Volyn Upland. The relief consists mainly of flat and undulating lowlands and plains. Denudation landforms developed on crystalline rocks are widespread (EUWI+, 2019). The relief is homogeneous, and the landscape and meteorological conditions change smoothly. Therefore, the physical and geographical conditions within the Pripyat basin make spatial interpolation possible. Because neighbouring watersheds have similar physical and geographical conditions, data from the Teteriv River basin and the South Bug River were also used.

Methodology and input data
The average annual specific discharge was chosen as the basic hydrological characteristic for this study. It was calculated from data from stationary flow measurement stations. This characteristic is used to create cartographic materials, compare watersheds, etc. The centre points of the individual catchments, defined by the hydrological stations, were calculated using GIS. The general picture of the division is presented in Fig. 2. The study is based on data from 33 permanent hydrological observation posts, 28 within the Pripyat basin and 5 outside it. According to observations, the long-term average specific discharge for the right-bank part of the Pripyat basin ranges from 2.10 l s⁻¹ km⁻² in the upper reaches of the Pripyat (Lyubyaz), in the eastern part of the basin, to 5.94 l s⁻¹ km⁻² for the Radostavka River above Triytsya, in the north-western part of the basin. The area of the basin does not have a predominant effect on the specific discharge; this characteristic depends mainly on the physical and geographical conditions of the territory. Similar patterns of spatial change can be observed on all the obtained maps, which allows the accuracy of the interpolation methods to be compared. Given the uneven development of the hydrological observation network and the subsequent evaluation and averaging of data, all available data from hydrological stations were used. Most observation series begin between the late 1940s and the early 1960s, and the average duration of runoff observations is more than 60 years. Individual gaps were filled using correlation dependencies. The observation series are sufficiently representative, and the runoff parameters are in general reliable and unbiased. This pattern is disturbed for two data series, at the posts Pripyat-Richitsa and Norin-Slovenia.
The physical and geographical features of their basins and anthropogenic activities influenced the average annual runoff at these stations. Because interpolation is carried out from the watershed centres, and the probable causes of abrupt changes in the average annual runoff are specific to the territory of these watersheds, these data were also used to study spatial generalization patterns (Sokolchuk, 2019).
During the research, four spatial interpolation methods were compared; for some of them, different settings were used.
The first method, triangulation (TIN), is based on linear interpolation and is close to manual linear interpolation. The starting points are connected in such a way that the resulting surface is covered by triangles and none of the sides of a triangle intersects the sides of the other triangles (Fig. 3). The surface values at the grid nodes inside a triangle are calculated on the basis that they belong to the plane passing through the vertices of that triangle. The obtained surface necessarily passes through all the starting points, at which the surface values are determined by observation. An insufficient number of starting points leads to the appearance of large rectilinear segments on the map; the result obtained in this case is an error of the method.
The second method is inverse distance weighting (IDW). It is based on the calculation of weights by which the values of the surface at the starting points are "weighed" when the interpolation function is constructed. The weighting factor assigned to a single starting point for calculating the surface value at a grid node is proportional to a power of the inverse distance from the starting point to that grid node. When the surface value at a grid node is calculated, the weights of all the starting points sum to one, and the weight of each starting point is a fraction of this unit total. The isolines can form concentric areas of the same value around known data points, typically those with particularly large or small values. This is known as the "bull's eye" effect, which is the main disadvantage of the IDW method. To reduce the influence of these points when implementing the method, it is necessary to set the value of the parameter that smooths the interpolation function.
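The weighting scheme described above can be illustrated with a minimal Python sketch. It is not the QGIS implementation used in the study: the function name `idw`, the station coordinates and the discharge values are hypothetical, and the exponent `power` is assumed here to play the role of the distance coefficient P.

```python
import math


def idw(stations, target, power=2.0):
    """Estimate a value at `target` by inverse distance weighting.

    stations: list of (x, y, value) tuples (hypothetical catchment centres)
    target:   (x, y) of the grid node to estimate
    power:    exponent applied to the inverse distance (assumed to match P)
    """
    weighted_sum = 0.0
    weight_total = 0.0
    for x, y, value in stations:
        distance = math.hypot(target[0] - x, target[1] - y)
        if distance == 0.0:
            # The node coincides with a station: return the observed value.
            return value
        weight = 1.0 / distance ** power
        weighted_sum += weight * value
        weight_total += weight
    # Dividing by the total normalizes the weights so that they sum to one.
    return weighted_sum / weight_total


# Hypothetical coordinates (km) and specific discharge values (l s^-1 km^-2).
stations = [(0.0, 0.0, 2.0), (10.0, 0.0, 4.0)]
print(idw(stations, (5.0, 0.0)))             # node halfway between the stations
print(idw(stations, (2.0, 0.0), power=2.0))  # node close to the first station
print(idw(stations, (2.0, 0.0), power=5.0))  # same node, larger exponent
```

In this sketch a larger exponent pulls the estimate at a node closer to the value of its nearest station, which is the behaviour behind the concentric "bull's eye" pattern around isolated points.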
The larger the smoothing parameter, the less influence each starting point has on the calculated value at the grid node (Ishchuk et al., 2003). For this research two values were used, P=2 and P=5; they are indicative, and within this range the results are approximately reliable for the available data.
Spline, the third interpolation method, estimates values using a mathematical function that minimizes overall surface curvature, resulting in a smooth surface that passes exactly through the input points. Conceptually, the sample points are extruded to the height of their magnitude, and the spline fits a surface through them while minimizing its total curvature. It fits a mathematical function to a specified number of nearest input points while passing through the sample points. This method is highly useful for generating smoothly varying surfaces such as elevation, water table heights or pollution concentrations. We can assume that the hydrological characteristics of the watersheds have similar distribution features.
The IDW and Spline interpolation tools are referred to as deterministic interpolation methods, as they are based directly on the surrounding measured values or on specified mathematical formulas that determine the smoothness of the resulting surface. Another type of interpolation consists of geostatistical methods, such as Kriging, which are based on statistical models that include autocorrelation, i.e. the statistical relationships among the measured points. Because of this, geostatistical techniques are capable not only of producing a prediction surface but also of providing some measure of the certainty or accuracy of the predictions (ESRI, 2016; Watson and Philip, 1985). Kriging is the fourth method chosen for comparison. It assumes that the distance or direction between sample points reflects a spatial correlation that can be used to explain variation in the surface.
The Kriging tool fits a mathematical function to a specified number of points, or to all points within a specified radius, to determine the output value for each location (ESRI, 2016).
The methods were evaluated by determining the differences between the actual values of the specific discharge and the values taken from the created maps. The evaluation was performed in two ways. The first was an evaluation of the differences at the observation points used in the interpolation procedures (22 "non-control points" out of the total of 28). This type of evaluation aims to determine how accurate the interpolation procedures are with respect to the existing (specified) observation points. In the second type of evaluation, 6 of the 28 points within the study area were selected and excluded from the interpolation procedure ("control points"). The evaluation was based on the comparison between the measured (real) and interpolated values at these control points. This was done to test the feasibility of using the interpolated maps obtained during the research to study hydrologically unexplored watersheds. The points were selected on the principle of even distribution across the territory. The hydrological stations of the catchments adjacent to the Pripyat basin were not taken into account in this assessment.
To calculate the mean squared error (MSE) of the obtained results, all individual residuals are squared and summed, and the sum is divided by the total number of checked points (Azpurua and Ramos, 2010):
$$\mathrm{MSE} = \frac{1}{n}\sum_{i=1}^{n}\left(y_i - y_p\right)^2 \qquad (1)$$
where $y_i$ is the actual value, obtained from observations, $y_p$ is the predicted value, obtained by interpolation at the same point, and $n$ is the number of points.
The square root of this value is denoted as RMSE (Root Mean Square Error):
$$\mathrm{RMSE} = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(y_i - y_p\right)^2} \qquad (2)$$
The deviation is also presented as a percentage, by dividing the average absolute error by the average real specific discharge.
A comparison of this type cannot be based solely on a mathematical assessment of the results. We also assessed the shape of the isolines and the presence of numerical errors on the interpolated surface, known as artefacts: unwanted effects that result from using a high-precision GIS to process low-accuracy spatial data and that arise from positional errors, not attribute errors (Goodchild and Kemp, 1990). The overall agreement of each surface with the distribution of the annual average specific discharge within the studied area was also taken into consideration.
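The accuracy measures described above can be sketched in a few lines of Python. This is an illustration only: the function name `error_metrics` and the sample values are hypothetical, and the "average absolute error" used for the percentage deviation is assumed to be the mean of the absolute residuals.

```python
import math


def error_metrics(actual, predicted):
    """Return MSE, RMSE, mean absolute error and percentage deviation.

    actual:    observed specific discharge at the checked points
    predicted: values read from the interpolated surface at the same points
    """
    n = len(actual)
    residuals = [a - p for a, p in zip(actual, predicted)]
    mse = sum(r * r for r in residuals) / n   # squared residuals, averaged
    rmse = math.sqrt(mse)                     # square root of the MSE
    mae = sum(abs(r) for r in residuals) / n  # assumed "average absolute error"
    mean_actual = sum(actual) / n
    percent = 100.0 * mae / mean_actual       # deviation as a percentage
    return {"mse": mse, "rmse": rmse, "mae": mae, "percent": percent}


# Hypothetical control points (l s^-1 km^-2), not values from Tables 1 and 2.
observed = [2.0, 4.0, 6.0]
interpolated = [2.5, 3.5, 6.0]
metrics = error_metrics(observed, interpolated)
print(metrics)
```

Applying the same function once to the points used in the interpolation and once to the excluded control points would reproduce the two-stage evaluation described in the text.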

Results
As a result of the research, maps of the distribution of the specific discharge were created using different methods of spatial interpolation. The nature and features of the spatial changes of the selected hydrological characteristic within the basin were analysed. The surfaces of calculated values are presented as gradients with extracted isolines in Fig. 4a, 4b and 5a-e. The results of the comparison of actual and interpolated values are shown in Tables 1 and 2. In absolute terms, the Kriging exponential method has the lowest error, and the IDW method is the second most accurate. The highest errors are shown by values taken from the surfaces interpolated by TIN linear and Kriging exponential. Although the difference in accuracy between the methods reaches 100%, the obtained values deviate from the actual values of the specific discharge by less than 6% on average, i.e. all methods give a result close to the actual values. As expected, the accuracy assessment of the obtained maps at the previously excluded control points (Table 2) showed greater errors, but the general trends remain. Kriging shows the most accurate results, followed by IDW, TIN and Spline, in that order. The last shows an error of more than 15% for the assessment of basins within the study area. Of the four methods used, only IDW also allows data to be extrapolated beyond the conditional boundaries formed by the outermost data points (Fig. 4a and 4b). This makes it possible to estimate values for most of the Pripyat basin within Ukraine. However, because observation posts are absent there, the accuracy of these values in the most remote parts is difficult to assess. Fig. 4a and 4b also show the formation of the "bull's eye" effect: information is not interpolated over the entire territory, especially as the distance coefficient increases.
However, given that interpolation is carried out from the centres of catchments and that the physical and geographical conditions of runoff formation in the study area change mostly slowly, which is indirectly confirmed by the small errors, the application of the IDW method is reasonably justified. Interpolation by the other three methods gives results only within the limits that QGIS sets automatically from the input data. This is more mathematically correct, but does not allow a full assessment of watersheds that are not sufficiently covered by the hydrological observation network. Each of the methods has its advantages, disadvantages and features, caused both by its underlying mechanism and by the peculiarities of its implementation in the GIS environment used.
When the Kriging method is used (Fig. 5a and 5b), the data are interpolated over the area and the transitions are smoother. However, a more detailed analysis of the maps and the isolines extracted from them showed that at some points the isolines change direction sharply and that artefacts are widespread. Therefore, the use of this method for a detailed study of the area might require additional operations.
Fig. 5c and 5d show that the isolines obtained by the TIN method are not sufficiently smooth. Some large rectangular areas with the same values of the specific discharge can be seen, i.e. the method might give inaccurate results. The results at the northern boundary of the interpolated area are strongly distorted when TIN cubic is used (Fig. 5d), due to the peculiarities of the interpolation mechanism in QGIS. Reliable values cannot be taken at the boundary points for verification, which was also considered in the calculations for Table 1.
The transitions and isolines obtained by the Spline method are the smoothest; sharp outlines are not present.
The Spline interpolation results are also distorted in the north of the basin due to the mechanism of the algorithm, but to a lesser extent than with TIN cubic.

Conclusions
According to the results of the work, considering both the errors and the visual assessment, we conclude that the IDW method is the best for data interpolation over the study area. The isolines are smooth, and it is possible to obtain an approximate estimate for the entire catchment, which may be particularly important for regions where there are not enough observation stations. Although the error of the values obtained with the distance coefficient P=5 is not the smallest, a higher coefficient is necessary to identify zones with significant deviations from the average, as well as to reconcile the boundaries of the zones of influence of the points with the boundaries of watersheds. The Kriging method yields specific discharge values with relatively small errors; however, the results require significant refinement, or improvement of the source programs that perform the interpolation. The TIN method cannot be used in any of its variants to accurately study the boundaries of the catchment or areas outside the defined area. Given the inability to rapidly increase the number of reference points and the aspects noted above, TIN can be used in these conditions only to refine data in the central part of the right bank of the Pripyat River basin. Compared with linear TIN, TIN cubic showed lower accuracy. The Spline method, despite relatively significant errors and some implementation problems, proved to be a promising option for further research. Its significant advantages are smooth isolines and the absence of artefacts. The use of this method for hydrological interpolation has not been widely reported in the Ukrainian scientific literature, which may be due to the small number of such estimates and the absence of this method from the standard set of tools of one of the most common GIS packages (QGIS).
The obtained results can be used to analyse patterns of the spatial distribution of hydrological characteristics in the Pripyat basin and for preliminary research on catchments that are not sufficiently covered by observation stations. These conclusions can be cautiously applied to catchments of other rivers with similar conditions, i.e. plain catchments with smooth changes in the conditions of flow formation and an average catchment area ranging from 1,000 to 10,000 km². Given the complete absence of rivers with a catchment area of less than 100 km² in the database, the conclusions require separate confirmation of their suitability for the analysis of small rivers. This article could also be used as an illustrative example of the differences between various interpolation methods. An additional assessment of the accuracy of these methods on watersheds with different hydrological conditions is planned.