BIVARIATE JOINT PROBABILITY ANALYSIS OF FLOOD HAZARD AT RIVER CONFLUENCE

Assessing the statistical significance of floods in the complex hydrological conditions that exist at the confluence of the main river and its tributaries, as well as the choice of hydrological design parameters for flood protection in these areas is one of the main tasks in the current hydrology. The main aim of this paper is a joint bivariate frequency analysis of annual peak discharges and synchronously occurred maximum discharges at the main river and its tributary. The annual maximum and daily maximum discharges of the Morava River, as main river, and its tributary Myjava River were analysed. We selected the most appropriate copula function for our bivariate analysis. Selected copulas were used to illustrate the joint occurrence probabilities and joint return periods of the discharge pairs and consequently to determine the joint probabilities of measured discharges. Results of such analyses provide comprehensive information about flood situations where a devastating effect may be increased in the case were floods occur at the same time on the main river and its tributary. And at the same time, the results obtained by the bivariate analysis of the variables which characterize the hydrological regime can contribute to a more reliable assessment of the flood risks.


Introduction
Flood wave is the result of the numerous natural processes, such as rainfall distribution and intensity, catchment characteristics and its area, land use, water reservoirs, etc. The determining of the flood wave significance for design of the water management project is based on adequate mathematical technique such as statistical theory of probability. The basic mathematical technique is based on the evaluation of the selected one dimensional variable, which characterizes some extreme event. It is established that, during the flood wave occurrence there is correlation between some hydrological characteristics (discharge, duration, volume) which may have an impact on the design of flood control measures. Therefore, a multidimensional approach to analysing hydrological characteristics is increasingly preferred in many studies (e.g. Yue, 2000;Hawkes et. al., 2002;Favre et al., 2004;De Michele et al., 2005;Szolgay et al., 2016;etc.). Regulation of rivers as well as adjustments of the river basins often brings changes in concentration of basins drainage as well as increasing of the speed of flood wave. One of a result of such interventions, it may be coincidence of the waves on the main river and its tributary. Therefore, the one dimensional approach gives satisfactory results in the case of simple systems, for example, where the main river does not capture major tributaries. One dimensional approach may not give satisfactory results for the evaluation of flood risk in situations where floods occur on two or more rivers and join together at the same time. The bivariate statistical approach to analysis of flood events should be further developed and defined at neighbouring profiles on the main river and its tributaries. Prohaska et al. (1999) dealt with synchronously occurring flood waves on the Danube and its tributaries. Their analysis was based on the theory of bivariate variable statistics and results confirmed that flood wave genesis is very complex within the Danube basin. In Slovak territory the coincidence of multiple flood waves caused the flood with time return period of 100-year on Tisa and Bodrog River in year 2000. For example the flood occurred in August, 2002 in Czech Republic on the Vltava River and Dyje River showed an increase in return period of the discharges with an increase in area of the basin. It was caused by coincidence of the flood waves in profiles of the river network (Report of T. G. Masaryk Water Research Institute, 2002). Chen et al. (2012) dealt with analysing the flood risk due to the correspondence of flood discharges at the main river and its tributaries using selected marginal probability distributions and multidimensional (4D) copula functions. Authors evaluated X-Gumbel copula function as appropriate for joint conditional distribution function and return periods of joint discharges. Espinoza et al. (2013) analysed formation of the floods on the Amazonia River and its tributary. They focused on the flood occurred in 2012, when coincidence of two large flood waves has occurred. The coincidence of flood peaks from the Changjiang River and four other rivers, and the highest precipitation disaster drivers for Dongting Lake region flood vulnerability was study in Li et al. (2013). Joint probability of two random variables in contrast to the conditional probability reflects the probability with which the two random variables occur simultaneously. Gupta et al. (1976) dealt the using of the joint distribution function in estimating relationship of the large floods and their return period taking into account the seasonal influences. Bender et al. (2016) analysed flood peaks of two streams where was unlikely that all block maxima values occurred simultaneously. They concluded that in the majority of cases four marginal distributions and two distinct joint distribution functions are required to fully describe the stochastic behaviour of the system. Gilja et al. (2018) dealt with joint frequency analysis at the river confluences. The research conducted that flood hazards at the Sava River could be underestimated by traditional univariate analysis. Tadić et al. (2016) analysed the joint occurrence probability of floods on the Rivers Danube and Drava near Osijek. Their results showed that the probability of such situation is low (0.79%) but they reminded that such a situation occurred in 1966 and it was one of the biggest floods. Therefore, the results obtained by the bivariate analysis of the variables which characterize the hydrological regime can contribute to a more reliable assessment of the flood risks. The main aim of this research is to analyse the floods occurred at the Morava River and its tributary Myjava and provide a practical approach for the designed flood estimation in areas of the rivers confluence. In the context of climatic extreme events, statistical techniques such as event coincidence analysis will be relevant for investigating the impacts of anthropogenic climate change on human societies and ecosystems globally. The results obtained by the bivariate (as well as threevariate) analysis of the variables which characterize the hydrological regime can contribute to a more reliable assessment of the flood risks. In this research we focus upon an:  description of the methodological approach;  preparing of the data and identification of the variable pair combination for analysing the relationship between flood wave discharges on the main river and its tributary;  identify univariate distribution function of the variables (discharges) of the main river and tributary;  identify bivariate joint distribution function of the selected pair;  compare joint exceedance probability with univariate flood frequency analysis based on the measured values on downstream of the confluence.

Methodology
Copula functions were used as a mathematical tool for determining a joint cumulative distribution of two dependent variables. According the Sklar (1959) theory for two dimensional (bivariate) distribution function H(x,y), we can write: where F(x)=u and F(y)=v are marginal distribution functions. If F(x) and F(y) are continuous, then the copula function C is unique.
The identification of the univariate (marginal) distribution is the first step of the bivariate analysis. The random variables may have different properties and thus need to be converted to variables having interval of [0, 1] by scaling the data. Knowing the marginal distribution, we are able to separate marginal behaviour and dependence structure. The dependence structure is fully described by the joint distribution of uniform variables obtained from marginal distribution. To determine univariate parametric distribution functions (marginal distributions), standard MLM (maximum likelihood method) method was used. According to the goodness-of-fit tests (Kolmogorov-Smirnov and χ2) the marginal distributions can be selected.
In our study we used the Archimedean class of copula functions. Among existing types of copulas, the Archimedean one is the very popular class used in hydrological application (Zhang and Sing, 2006;Favre et al., 2004;De Michele et al., 2005;Gargouri-Ellouze and Eslamian, 2014;Dehghani, 2019 etc.). This class of copulas is popular in empirical applications for flexibility, easy construction and includes a whole suite of closed-form copulas that covers a wide range of dependency structures, including comprehensive and non-comprehensive copulas, radial symmetry and asymmetry, and asymptotic tail dependence and independence. The Clayton, Gumbel-Hougaard and Frank copulas were selected for this study (Table 1). The copula parameter θ was estimated using a mathematical relationship between the Kendall`s coefficient of rank correlation and the generating function φ(t) (Nelsen, 2006). Testing how well a statistical model (copula in this case) describes a set of observations is discussed as a topic in literature (e.g. Kojadinovic and Yan, 2011;Karmakar and Simonovic, 2009;Shiau et al., 2010;Chowdhary et al., 2011;Genest et al., 2009;Bender 2016). According to Meylan et al. (2012) we can divide these tests into three groups: a) based on probability integral transformation; b) based on the kernel estimation of the copula density and c) based on the empirical process of copulas. The first implies conditioning on successive components of the random vector and has the drawback of depending on the order in which this conditioning is done. The second category of tests depends on various arbitrary choices, such as the kernel, the window size and the weight function, which make their application cumbersome. Several goodness-of-fit tests can be used for comparison of the empirical joint probability population and the probability population derived by parametric copulas (e.g. Kolmogorov-Smirnov, Ci-square, Table 1. Probability functions, parameter space, generating function and relationship of nonparametric dependence measure with association parameter for the selected Archimedean copulas Anderson-Darling or Cramér-von-Mises). The empirical probability (Gringorten, 1963;Cunnane, 1978;Yue et al., 1999;Zhang and Singh, 2006) represents Equation (2).
where N is the total number of the variables, j and i ascending ranks of xi and yi , nml is the number of occurrence of the combinations of xi and yj.
In hydrological frequency analysis, the return period of the hydrological variable that occurs once in a year can be defined as: ( 3) where T is the return period in years and F(x) is univariate cumulative distribution function and  represents average number of events per year. In frequency analysis of annual maximum value of  is equal (1).
In multivariate statistical analysis, we can determine the return period of the phenomenon in two ways. The first is a joint return period, while the second is a conditional return period. The first one defines joint return periods as: the return periods using one random variable equalling or exceeding a certain magnitude and/or using another random variable equalling or exceeding another certain magnitude. The second one is conditional return period for one random variable, given that another random variable equals or exceeds a specific magnitude. Joint return period for two variables defined by more authors (Shiau, 2003;Salvadori and De Michele, 2006) and it can be written in the form of: Equation (4) represents the joint return period of X  x and Y  y. Equation (5) represents joint return period of X  x or Y  y. These relationships indicate, that different combinations of the numbers x and y, can take same return period (relation 8). H(x, y) is the joint cumulative distribution function (can be expressed as copula function).
Conditional return period for X, given Y  y may be expressed as (Shiau, 2003): where x and y are random variables and H(x, y) is the joint cumulative distribution function. Conditional cumulative distribution function of X, given Y  y may be expressed as: where H(x, y) is the joint cumulative distribution function of the random variables X and Y, and F(y) is cumulative distribution function of the variable Y. An equivalent formula for conditional return period of Y  y, given X  x can be obtained.

Study area
The paper is focused on the bivariate analysis of the floods occurred at the main river and its tributary for the designed flood estimation in areas of the rivers confluence. The Morava is a left tributary of the Danube River in Central Europe. The length of the Morava River is 329 km and its basin area covers 26 579.7 km 2 . The river originates on the Králický Sněžník mountain near the border between the Czech Republic and Poland and has a vaguely southward trajectory. The lower part of the river's course forms the border between the Czech Republic and Slovakia and then between Austria and Slovakia.
The Myjava River is a river in western Slovakia and for a small part in the Czech Republic and left tributary of the Morava River. The length of the Myjava River is 79 km and its basin area is 806 km 2 . It rises in the White Carpathians near the village of Nová Lhota in Moravia, but crosses the Czech-Slovak border shortly afterwards and flows in a southern direction until the town of Myjava, where it enters the Myjava Hills and turns west. Near Sobotište it flows into the Záhorie Lowland and turns south until the village of Jablonica, turning northwest and from Senica it flows west, passing through Šaštín-Stráže and finally flowing into the Morava River near Kúty. Table 2 lists selected main river, tributary, gauging stations and measurement period. The first year of the analysed period is the beginning of data measurements in current stations. The scheme of the selected rivers confluence is presented in Fig.1.

Preparing of the data
With regard to flood coincidence analysis, it is necessary to consider gauging stations immediately downstream and upstream from the tributary. In our work we investigated combination of the first pairs of Qamaxup -Qamaxtr and second pairs of Qmaxup -Qcor1tr. Where: Qamaxup, Qamaxdwn, Qamaxtrare annual peak discharges on upstream and downstream on main river and the annual maximum discharges on tributary; Qmaxdwnis maximum daily discharge on the main river downstream from the confluence (same event like in upstream station); Qmaxupis maximum daily discharge on the main river upstream from the confluence; Qcor1tris corresponding discharge on the tributary in the moment of occurrence of the maximum daily discharge on the main river upstream from the confluence (may be moved one or two days).
Discharges time series of the annual peak discharges Morava and Myjava reveal that high annual peaks at the Myjava River do not frequently coincide with high annual peaks at Morava River (Fig. 2). The analysis of flood hazard is based on the continuous record of discharges on a gauging station that reflects water regime in particular section. As the first we analysed pairs of annual peak discharges Qamaxup -Qamaxtr. In this study only the combination of the values of annual peak discharge were analysed not they occurrence within the year. Selected and analysed pairs of annual peak discharges Morava -Myjava confluence are presented in Fig. 3a. The linear Pearson correlation coefficient of this pairs is R=0.63 and Kendall rank correlation coefficient is τ=0.42. The annual maximum (AM) series approach is the most frequently used in probabilistic hydrology. But this data series are limited by two factors: 1) the length of the series of annual maxima can be very short and 2) the annual maxima time-series may be interrupted and thus they may not allow us to infer the antecedent conditions in the basin preceding a given peak. The first limiting factor produces uncertainties in interpreting statistical analyses, while the latter constrain implies that statistical models built on a phenomenological basis must rely on ancillary data in order to validate the underling hypotheses (Claps and Lio, 2003). This situation is avoided in the Peaks Over Threshold method (POT). Data series of the POT method consider all values exceeding a certain predefi-   (Bayliss, 1999;Rao and Hamed, 2000). The POT method has been proposed as an alternative analytical tool to the AM approach in analysis of extreme hydrological events. This method was discussed in a number of papers (Langbein, 1949;Todorovic, 1970;Cunnane, 1973;Rosbjerg, 1977;Madsen at al., 1997a;Madsen at al., 1997b;Lang at al., 1999;Bača and Bačová Mitkova, 2007;Bačová Mitková and Onderka, 2010). Therefore, in practice it seems to be more meaningful to consider not only the annual discharge maxima but also flood events that exceed safety limits. The first threshold can be chosen near the long-term mean discharge. This value is rather low; POT series can have high diffusion and can include some insignificant maxims. Therefore, a threshold value is usually chosen so that POT data series includes in average 4 maximum values per year and flood events must be independent (Bayliss, 1999). The threshold value in our calculations was appointed at the level of 40-50% of the long-term maximum annual discharge in order to ensure the independence of the waves and to include all significant events in the analysed year. On the basis of daily discharges and POT method the flood waves were selected for bivariate statistical analysis. The maximum mean daily discharges and synchronous discharges according to the above mentioned scheme were selected. The pairs of maximum daily discharges and correspondding daily tributary discharges of the Morava -Myjava confluence based on POT method are presented in Fig. 3b. The linear the Pearson correlation coefficient of this pairs are R=0.57 and Kendall rank correlation coefficient is τ=0.37.

Fig. 3. Selected and analysed pairs of a) annual peak discharges Morava -Myjava confluence and b) maximum daily discharges and corresponding daily discharge Morava -Myjava confluence (based on POT method).
The results of the correlation analysis of different combinations of the variables (discharge) show a statistically significant correlation. All of the combinations of the variables can be used on the following bivariate joint frequency analysis to investigate, how the relationship of the hydrological characteristics may affect the size and course of extreme hydrological situations.

Univariate Analysis
In order to determine univariate parametric distribution functions (marginal distributions), standard MLM (maximum likelihood method) method was used. According to the goodness-of-fit test (Kolmogorov-Smirnov) the marginal distributions where selected. The variables of the annual maximum (AM) approach were preferable fitted with Gumbel, Gamma and LogPearson III. The Gumbel distribution is asymmetric, extreme value distribution (EVD) and is used to model the distribution of the maximum (or the minimum). The Gamma two-parametric distribution is a very important model in statistical hydrology. It is a flexible function capable taking many different shapes and has been widely used in many countries for flood series modelling. The Pearson type III distribution is sometimes called three-parameter Gamma distribution, since it can be obtained from the two-parameter Gamma distribution by introducing location parameter τ. It is very flexible since it has three parameters which can produce a wide variety of shapes of density function. The variables derived by POT method was preferably fitted with distributions JohnsonSB, Gamma and LogPearson III. The JohnsonSB distribution is a continuous four-parametric distribution defined on bounded range, and the distribution can be symmetric or asymmetric. The Kolmogorov-Smirnov test was performed to test the assumption that the discharge magnitudes follow the theoretical distributions. The p-value (p≥0.05) was used as a criterion for rejection of the proposed distribution hypothesis. The fitted distributions, p-value of the goodness-of-fit test and calculated designed discharges are presented in Table 3. We cannot reject the hypothesis that selected distributions fit well to the observed data at a 5% significance level. Comparison of the AM and POT approaches shows significant differences (in average 13%) for discharges with return period of 50 and 100 years on tributary. The lower estimated values of designed discharges may be result of the creation of the analysed POT data series, especially for tributary where are corresponding discharges with Qmaxup and not only Qmaxtr. It effects the estimation of QN.

Joint bivariate analysis of the discharges at the rivers confluence using copulas
The joint probability distribution of two hydrologic variables at the confluence was evaluated by copulas to calculate the joint exceedance probability for the analysation of the flood hazard occurred at the confluence of the main river and its tributary. Results of the correlation show that there is a strong positive dependence between the discharges at river confluence with values of Kendall`s coefficient of rank correlation ranked over 0.3. The values of the estimated parameters of selected Archimedean copula functions are listed in Table 4. Results of the comparison of the joint empirical probabilities with fitted parametric copula showed that computed errors of the estimation reached relatively small differences between all three tested Archimedean copula functions (Fig. 4). The Kolmogorov-Smirnov test was performed to test the assumption that the joint pair magnitudes follow the theoretical joint distributions (copula). The p-value (p≥0.05) was used as a criterion for rejection of the proposed distribution hypothesis. The fitted joint distributions (copula) and p-value of the goodness-of-fit test are presented in Table 4. We cannot reject the hypothesis that selected joint distributions fit well to the observed data at a 5% significance level. Based on the results of non-parametric test the all three Archimedean copula functions were used to determine the joint probability distribution of the pair variables. Subsequently, the all selected copula functions were used for simulation of 3000 pairs for all combination of variables on selected river (Fig. 5). The figures show the scatter plot of measured data pairs and simulated values generated from copula models for the discharge pairs at the confluence. Simulated pairs were performed to determine the joint probability distribution using copulas and consequently to determine the joint occurrence probability of the variables. For traditional method, the resulting discharge for the selected return period is   calculated as reciprocal of the probability of exceedance. The discharge calculated using the copulas can lie anywhere on the isoline representing return period and can have infinite combinations. Since the worst-case scenario is regarding flood hazard at confluences, the extreme value of combined discharges was calculated, i.e. the maximum discharge resulting from infinite combinations of discharges at the Morava River and its tributary Myjava River. The comparison of the characteristic designed discharges (QN) calculated using traditional univariate distribution and bivariate copula distribution at annual peak discharges and maximum daily discharges are presented in Table 6.

Conclusions
The paper presents the bivariate joint frequency analysis of the discharges at the main river and its tributary. The annual peak discharges and mean daily discharges of the Morava and Myjava confluence were analysed (period of 1968-2011). For analysis the two statistical approaches were used. The first was traditional univariate approach and the second one was the bivariate joint distribution approach using the copulas. The results in this paper show that:  The type of theoretical probability distribution as well as the type of used input data series significantly affects the estimated QN.  Comparison of the AM and POT approaches shows significant differences (in average 13%) for discharges with return period of 50 and 100 years on tributary.  Comparison of the AM and POT univariate approaches with selected marginal distributions didn't show significant differences for estimated maximum discharges with return period up to 100 years.  All tested Archimedes copula function achieved relatively small differences between calculated errors of estimation.  According to the KS test we cannot reject any of tested Archimedean copula function that the joint pair magnitudes follow the theoretical joint distributions.  Discharges estimated from Gumbel-Hougaard copula are higher than the ones calculated using the univariate method about 12%. Gumbel-Hougaard copula can well describe the multivariate relationship and improve the abnormal crossing phenomena, so it can give more reasonable results and be further applied to hydrological extreme analysis.
In conclusion, we can say, that the selection of the distribution function to estimate T-year discharges and also the processing of the statistical data series affects the results of the estimation. Determining the specific value of a 500-or 1000-year flood for engineering practice is extremely complex and in interpreting the results, it should be kept in mind that estimated values with very high return periods are extrapolated values. Each statistical method includes some uncertainty that may be caused by the method but also the data may be affected by certain measurement error therefore, is also necessary to specify confidence intervals in which the flow of a given 100-, 500-, or 1000-year flood may occur with probability, for example, 90%. Results of the bivariate analysis showed that bivariate copula model can be successfully applied at locations where significant change in flow regime is present, or flood intensity is governed by several variables, such as at river confluen- ces. At river confluences, marginal distributions of inflow discharges rarely follow similar distribution, making copula model especially suitable for flood hazard assessment. When analysing the complex stochastic character of a river system on streams where the extreme event does not occur nearly simultaneous and where the tributary contributes significantly to the main river is necessary to analyse also pairs Qmaxtr -Qcorup (Bender et al., 2016).