An assessment of the uncertainty of the extremity of flood waves with vine copulas

Flood hazard connected with the failure of hydraulic structures and flood risk associated with areas subject to flooding need to be estimated using flood hydrographs. Multivariate statistics of flood wave parameters enable quantitative conclusions about such flood hazards and risks. This study focuses on uncertainties in their estimates using unbounded and bounded marginal distributions of flood durations in the joint modelling of flood peaks, volumes, and durations with vine copulas. We have respected the seasonality of floods by distinguishing between the durations of summer and winter floods. We propose to use the bounded Johnson’s SB distribution to represent the hydrological constraints associated with flood durations. The practical consequences of selecting various unbounded and bounded distributions for modelling flood durations for the joint overall and conditional probabilities of the exceedance of flood peaks, volumes, and durations were demonstrated on data from the Parná River in Slovakia. Differences in modelling joint probabilities due to the tail behaviours of the marginal distributions tested were found. Although these are not critical for practical applications, accepting upper and lower bounds as hydrological constraints improves the quality of the statistical models.


Introduction
Managing flood risks relies, among other inputs, on hydrological design values, which are used for sizing hydraulic structures (reservoir volumes, spillways, flood levees, etc.), protecting residential zoning, and other areas at risk of flooding. The most recognised quantity in hazard estimation is the peak of a design flood, which is assessed by classical flood frequency analysis. Since specific hydraulic structures or flooding zones can only be correctly sized with a design input hydrograph, the knowledge of a such flood hydrograph (or at least of its selected parameters) is advantageous for a reliable design and adequate description of the impact of flooding events (Brunner et al., 2016a;Brunner, 2023;Kotaška and Říha, 2023;Škvarka et al., 2021). Another problem of univariate analysis is, that it does not consider the dependence between the hydrological control variables, e.g., flood peaks and volumes, which leads to inappropriate conclusions about the risks associated with their join impact (Rizwan et al., 2019). Interdependencies inherent among parameters of flood waves, which are of interest for a particular problem, must be accepted and assessed within a multivariate probabilistic framework (Brunner, 2023). Recently efforts were therefore directed towards a multivariate probabilistic analysis of two or more flood wave characteristics (Xiao et al., 2009). Classical applications of bi-and tri-variate probability distributions were initially preferred. Rizwan et al. (2019) provided a short review of these methods. Unfortunately, their inherent limitations, such as the requirement that the marginal distributions of the constituent variables be identical, are challenging to meet in practice since the suitable marginal distributions of the multidimensional analysis usually differ. In addition, the mathematical formulations exceeding two dimensions are increasingly complicated (Rizwan et al., 2019). Multivariate statistical analysis using the copula functions was introduced as a solution; it provides a convenient framework for estimating overall or conditional joint probabilities. Copulas allow for different marginal distributions of their components. These are connected by the copula function and which accounts for the dependence structure between them in the probability space. For a comprehensive overview of these developments, methods, and applications, see Nazeri Tahroudi et al. (2022); Tootoonchi et al. (2022); Größer and Okhrin (2022). Copula methods efficiently enable the assessment of joint and conditional probabilities of exceedance or return periods of a set of variables. Whereas, probabilities of exceedance in a univariate case, are uniquely defined, in a multivariate case we need to prefer one definition out of several possibilities for the practical problem given; an overview is provided by Gräler et al. (2013) and Brunner et al. (2016b). In recent years, vine copulas have been suggested as convenient models of higher dimensions (Tosunoglu et al., 2020). They can establish flexible dependence structures by mixing components modelled by bivariate copulas. Because of its practical convenience, the vine approach has recently received increased attention in studying multivariate hydrological analyses (e.g., Gómez et al., 2018;Brunner et al., 2019;Nazeri Tahroudi et al., 2022;Latif and Simonovic, 2022;Jafry et al., 2022), which has also led to increased efforts to conduct multivariate statistical analyses of entire flood events (e.g. Bačová Mitková et al., 2014;2023;Ganguli et al., 2013;Requena et al., 2013;Rizwan et al., 2019). For example, Medeiro et al. (2010) described an ensemble of design flood hydrographs given by the peaks and volumes of peakvolume probabilistic spaces. A set of hydrographs with the same joint return period was proposed for a reservoir design, which leads to return periods for water levels for different flood peaks and volume combinations. However, similarly to the case of design flood peaks, hydrologic flood wave characteristics and their respective parameters are inherently uncertain. The sources of the uncertainty are rooted in the length and quality of the hydrological records, the selection of uni-and multivariate statistical models, and the estimation of the parameters. To guarantee reliable flood wave estimates for designs, such uncertainties must be quantified and communicated to practitioners (Brunner et al., 2018;Liová et al., 2022). Traditional uncertainty analyses of univariate design variables in hydrology are well established; they focus on many relevant aspects of the problem, such as the choice of the probability distributions, their parameter uncertainties, and comparisons of annual maxima sampling versus peak-over-threshold sampling (Beven and Hall, 2014;Kjeldsen et al., 2014). For a bivariate analysis, the problem becomes more complex. Brunner et al. (2018) presented a complex framework for quantifying and solving uncertainty issues, which could also be generalised to a multivariate case. The framework reflects, among other issues, problems of the length of the record, flood wave sampling, base flow separation, determination of marginal distributions and fitting of probability density functions, dependence modelling between the variables, and estimations of the joint and conditional design variable quantiles. This study aims to invite the attention of practising risk analysts to one particular aspect of uncertainty associated with this new consistent probabilistic framework not tackled in previous studies. In the dependence modelling of peaks, volumes, and durations with vine copulas, we are only focusing on the consequences of respecting the hydrological constraints connected with modelling flood durations and propose to use the bounded Johnson distribution to describe them. While all sources of uncertainty mentioned above are acknowledged, we are ignoring here these uncertainties. Instead we pragmatically accept results of statistical testing when selecting the dependence structures among the peaks, volumes, and durations of the flood hydrographs modelled by vine copulas their marginal distributions. We are also robustly respecting the hydrologic seasonality of floods by distinguishing between summer and winter floods since flood types of diverse origins have different shapes and consequently exhibit other dependence structures between the characteristics of flood waves (e.g., Gaál et al., 2015;Grimaldi et al., 2016;Szolgay et al., 2015). Finally, the practical consequences for the joint overall and conditional probabilities of the exceedance of flood peaks, volumes, and durations of choosing a bounded distribution for the flood durations are compared with the performance of distributions without an upper bound. The comparison was demonstrated on summer and winter flood data from the Parná River above the Horné Orešany reservoir in Slovakia.
Here, the vine copula approach is used to construct copula C by using a graphic vine tool (see, e.g., Asquith (2022); Czado and Nagler (2022)) and bivariate copulas as building blocks, so that a multivariate copula is obtained through conditioning. Since, in practice, the flood peak is usually considered the most critical flood characteristic, we have chosen this variable as being conditioned upon; then the trivariate density is given by: where | ( , ) = ( , )/ ( ) is the conditional density function of Xi given Xj. Further, the copula density cij couples Xi and Xj, while cij|k couples the bivariate conditional distributions of Xi|Xk and Schirmacher and Schirmacher (2008) and Dissmann et al. (2013) for further details. Bivariate copulas, which are often used in hydrology, such as those from the Archimedean class (the Clayton, Gumbel, Frank, Joe and BB1 to BB8 copula families) with their survival counterparts and copulas of elliptically contoured distributions (Gaussian and Student t-copula), were considered as building blocks of the vine copula. Their selection and estimation of the parameters were made using a sequential procedure based on the AIC criterion and the maximum likelihood method (Dissmann et al., 2013). The marginal CDFs of flood volume and discharge were selected from several distribution functions often used in the hydrology of extremes, such as the Generalized Extreme Value (GEV), Pearson 3 (p3), Weibull 3 (wb3) and Log-normal 3 (ln3) distributions. The models of the marginal distributions of flood durations considered with two to five parameters were selected from distribution functions with a lower bound, such as Wakeby (wak), Log-normal 3 (ln3), Gamma (gam), Rayleigh (ray), Rice (rice), Exponential (exp), Govindarajulu (gov), and Generalized Pareto (gpa) distributions. The parameters were estimated using the Lmoments (Hosking, 1990). Additionally, flood durations were also modelled by the bounded distribution of Johnson (1949), specifically Johnson's SB distribution (joh), which showed that a random variable X with an upper and lower bound could be transformed to an approximately normal distribution with the density function given as: The location parameter , the range parameter , and shape parameters  and were estimated by the maximum likelihood method. The best-fitting distribution function for each flood variable was evaluated based on the CDF plots and the statistical Kolmogorov-Smirnov test (at 0.05 of the level of significance). The mean square error (MSE) between the observed and the simulated flood variables were calculated to evaluate the performance of the distribution functions selected. The statistical analysis and the visualizations were performed using the following R packages: Vine Copula ; ggplot2 (Wickham, 2016); metR (Campitelli, 2021); and cubature (Narasimhan et al., 2023).

Trivariate flood wave characteristics on the Parná River
The analysis was carried out on the Parná River inflow into the Horné Orešany reservoir located in western Slovakia (Fig. 1). The gauging station is situated above the reservoir on the Parná River at rkm 26.8 and has a basin area of 37.86 km 2 . A more detailed description of the basin can be found in Liová et al. (2022). The discharge time series in hourly time steps and the maximum annual discharges used in the analysis were provided by the Slovak Hydrometeorological Institute (SHMI) for the period from 01.11.1988 to 31.12.2021. The precipitation and air temperature data in the daily time step, which were collected from the Dolné Orešany rainfall station and SHMI Modra-Piesok climatological station, were used for a better determination of the duration of the flood wave. The flood waves analysed differ significantly in their shape, volume, and duration. For example, winter flood waves have a longer duration and greater volume than summer flood waves, which often arise from storm events and are also associated with melting snow or a combination of snowmelt and rain. On the other hand, summer flood waves are slimmer in shape and have a shorter duration. For this reason, flood waves were analysed separately for the summer (June-October) and winter (November-May) seasons. The separation of the flood waves, base flow, and estimations of the flood wave characteristics have been processed by an algorithm used in the FloodSep method (Valent, 2019). Next, we focused on a more detailed analysis of the flood wave durations and their effect on estimating the joint probability of the exceedance of flood characteristics. Fig. 2 presents the probability density functions and histograms of flood durations in the summer and winter seasons. We can observe a unimodal form of both histograms. The summer flood durations have a range from 1.08 to 8.08 days and a median of 3.5 days.

Estimation of the best-fitting distributions of the flood peaks, volumes and durations and defining marginals
Prior to the modelling, the peak discharges, volumes, and durations were extracted from each flood wave to form the distributions of the extreme values in the summer and winter seasons. The most appropriate distribution functions were then found for the peak discharges, the Generalized Extreme Value distribution (gev), and, for the flood volumes, the Pearson type III distribution (p3). As a means of comparison, the DVWK (1999) methodology, which provided similar results of the distribution fitting, was also used. For the flood durations, the efficiency of several selected distribution models with a lower bound, which are frequently used in hydrological applications sensitive to extreme values, were evaluated in detail. The validity of the estimated models was examined based on the goodness of fit (GoF) using the MSE (Fig. 3) and visually on the CDFs of the flood durations (Fig. 4). This was the rationale for adopting the best types of the distributions of the flood durations in the paper. According to the results, the three appropriate distribution functions of the flood durations were selected to assess the uncertainty of the extremity of the flood events. In Fig. 3, these are shown by colour as joh (red point), ln3 (green point) and rice (blue point) distribution. The Log-normal 3 and Rice distributions represent those with only the lower boundary parameters. However, Johnson's SB distribution can estimate not only the lower bound but also the upper bound of flood durations. The upper bound of the flood durations in the summer season using Johnson's SB distribution at 12 days and in the winter summer season at 37 days was estimated.

Modeling of unconditional joint probability of the exceedance of flood characteristics
To analyze the joint probability of the exceedance of the characteristics of maximum summer and winter floods, we evaluated the joint probability of the exceedance of the flood peaks "AND" the flood volumes "AND" the flood durations denoted as (1-F(Qmax,V,D)) for three various distribution functions the duration of floods (see Tables 1, 2). The most significant differences in the joint probability of the exceedance of flood peak Qmax, flood volume V, and flood duration D for summer flood waves were found for those with the longest durations. For example, for the longest flood wave, i.e., No. 25, with a duration

Modeling of the conditional probability of the exceedance of flood characteristics
In the next step, the uncertainties of the selection of the marginal distribution functions of the flood durations were demonstrated by the joint conditional probability of the exceedance of V and D on the 100-year flood peak Q100, written as S(V,D‫|‬Q100). From the marginal distribution of the flood peaks, the design discharge with the probability of exceedance 0.01 (100-year flood) was estimated as 32.76 m 3 s -1 . The results of the conditional probability of exceedance are illustrated by the isolines of S(V,D‫|‬Q100) in Fig. 5. From the results illustrated by the isolines of the joint probability of the exceedance of V and D, it can be seen that for summer floods, the differences in the joint probabilities of exceedance S(V,D‫|‬Q100) due to various marginal distributions of D are visible for flood waves with durations longer than approximately 2 days. With the decreasing of the joint probability of the exceedance of V and D, the flood durations from which the differences are evident are increasing. While for the joint probability of exceedance S(V,D‫|‬Q100) of 0.9, it is approximately 2 days; for S(V,D‫|‬Q100) = 0.7, it is 3 days; and for S(V,D‫|‬Q100) = 0.3, it is 4 days. We can see similar results in the winter season, but the flood durations from which the differences in S(V,D‫|‬Q100) are visible are longer. Because S(V,D‫|‬Q100) = 0.9, the flood durations are approximately 2.5 days; for S(V,D‫|‬Q100) = 0.7, they are approximately 5 days; and for S(V,D‫|‬Q100) = 0.3; they are approximately 10 days.

Fig. 5.
The isolines of the joint conditional probability of the exceedance of the flood duration with various probability distributions "AND" the volume of the 100-year flood peak in the summer (left) and winter (right) seasons on the Parná River.

Discussion
Besides factors that cause uncertainties in frequency analysis, which are well reflected in the univariate methods (the length of the record, the annual maxima versus the peak-over-threshold sampling, the choice of the probability distributions, and their parameter uncertainties), in the multivariate case additional uncertainty sources occur, such as the sampling of flood waves (including base flow separation), determination of the marginal distributions, and fitting of multivariate probability density functions, dependence modelling between the variables, and estimation of the joint and conditional design variable quantiles (Beven and Hall, 2014;Kjeldsen et al., 2014). This study aimed only at attracting the attention of practising analysts to one particular hydrological aspect of the uncertainty not tackled in previous studies. We acknowledge the importance of all uncertainty sources listed and treated in detail in Brunner et al. (2018). However, as is usual in a practical hydrological design exercise, we intentionally left out their assessments and pragmatically implemented the results of the basic statistical testing when selecting the appropriate models for the frequency and dependence analysis of the peaks, volumes, and durations of the flood hydrographs observed by vine copulas. The effect of baseflow considerations on the beginning and end of a flood was not evaluated here, too. We could also have discussed the options for the wave selection, such as the direct runoff separation or taking the entire hydrograph. Instead, the results from the FloodSep method described in Liová et al. (2022) were implemented. Nevertheless, we acknowledge the immense importance of using hydrologically sound and practically relevant flood wave separation methods for the outcomes of the analyses of volumes and durations. We also restricted the analysis of the typology of flood generation in the pilot catchment only to the basic seasonality of the floods by distinguishing between summer and winter floods. The discharge waves analysed differ significantly in their shape, volume, and duration, which can be seen, e.g., on the bimodal shape of the histogram of the annual maximum flood durations in Fig.6. Winter waves have longer durations associated with melting snow or combinations of snowmelt and rain. Their volumes are more significant than those of summer waves, which often arise from storm events and are slimmer in shape and have a shorter duration. For this reason, the flood waves were analysed separately for the summer (June-October) and winter (November-May) seasons, reaching a median of 3.5 days duration in the summer and 15 days in the winter seasons. The unimodal frequency histograms of the respective summer and winter flood durations showed that such a simplification might be acceptable in the given catchment. Although the seasonal histograms of flood durations are unimodal, we can also see that they are skewed (see Fig. 2), which could indicate that not all floods may have the same origin; e.g., in the winter, a pattern composed of a mix of flood events caused by snowmelt, prolonged rains, and rain on snow events could be hidden in a histogram. In such and similar cases in basins with a richer flood typology, respecting more flood types (short rain, long rain, rain on snow etc., see Gaál et al. (2015)) than in our method may be required since flood types of diverse origins have different shapes and consequently exhibit other dependence structures between the parameters of flood waves . We only focused in this analysis on the consequences of respecting or neglecting the obvious hydrological constraints connected with flood durations. To model a flood duration coming from a finite interval, we proposed to use the bounded Johnson's SB distribution to describe them. The SB Johnson distribution proved to be flexible. When respecting its domain of applicability defined through its third and fourth moments, see Johnson (1949) and Parresol (2003), it also enables the use of hydrologically determined upper and lower bounds. Practical consequences for the joint overall and conditional probabilities of the exceedance of the flood peaks, volumes, and durations of choosing a bounded distribution for the flood durations were compared with the performance of distributions without an upper bound, which is a hydrologically erroneous (but often used) assumption, see Figs. 4 and 7. Differences in the modelling of joint probabilities due to the different tail behaviors of the marginal distributions tested were found. Although these are not located in critical regions of joint probabilities for practical applications, accepting hydrological constraints as upper and lower bounds improves the statistical model's quality and is recommended.

Conclusion
The most recognized quantity in flood hazard estimations is the peak of the design flood. Univariate frequency analysis is used conventionally to assess its value and the associated uncertainties of its estimation. However, specific design tasks require considering the impact of the flood hydrograph, usually described by a set of hydrological control variables, e.g., the flood peaks, volumes or durations. Only by considering the dependence among these variables in the hazard estimation may lead to appropriate conclusions about the risks associated with the joint impact of these in their various combinations. The inherent interdependencies among flood wave shape control variables must be quantified within multivariate statistical assessment frameworks, which can also account for the uncertainties of the estimates (though in a more complex way than in a univariate case). Besides uncertainty-causing factors, which are reflected in univariate cases (such as problems involving the annual maxima versus peak-over-threshold sampling, the choice of the probability distributions, and their parameter uncertainties), additional issues such as flood wave sampling, base flow separation, fitting of multivariate probability density functions, the dependence modelling between the variables, and estimation of the joint and conditional probabilities of the design variable quantiles have to be considered. As previously mentioned, this study only intended to attract the attention of practising analysts to the uncertainty associated with the joint probabilities assessed within the vine copula probabilistic framework. These must be addressed in practice and in connection with statistical flood duration models. Therefore, our analysis did not consider the statistical sources of uncertainties but illustrated those originating in hydrological sources, which could require attention. These are mostly related to the sampling of flood waves and include the knowledge of flood generation processes (short rains, prolonged rains, rain on snow, snowmelt, etc.), the estimation of the flood volume (direct runoff vs. a whole hydrograph), and the determination of the duration of the flood waves. Regarding which flood generation processes to consider, using annual maxima in such an analysis may not meet the IID requirements in general. Instead, we recommend to consider the basic seasonality of the flood regime separately and to look within the seasons at a stepwise approach at the frequency histograms of the durations associated with respective flood types of the events sampled (unimodal vs. multimodal, symmetrical vs. skewed, etc.) and the practical goals of the design or risk analysis task. For example, frequency histograms of the respective durations of flood types may support a decision made on an acceptable grouping of sparser flood types (or eventually leaving them out) in the given catchment and for the specific task. In a decision on the selection of the flood volumes for the analysis, more practical aspects could be preferred and a task-related choice performed. For example, in sizing the retention volumes of multipurpose reservoirs or flood zoning, a whole hydrograph could be required for the design. To only consider direct runoff for flood detentions may not be correct since the detention volume is also filled with the flood's baseflow. On the contrary, direct runoff may be preferred when looking at a regime of event-based rainfall-runoff relationships. Determining the duration of a flood is connected to the decisions to be made on its volume, and it is not a straightforward task, especially when the whole hydrograph is taken as a flood wave. This determination is especially complicated in the winter season for snowmelt and mixed floods. Methods ranging from baseflow separations up to snowmelt and rainfall runoff modelling may be required for a hydrologically plausible solution. In any case, an upper and lower bound must be considered. The marginal distribution of the duration should preferably respect these bounds since, in the dependence modelling of peaks, volumes and durations, it affects the joint and conditional probabilities of the exceedance associated with any combined hazards. Our case study showed differences in the outcomes of modelling the joint probabilities due to the different tail behaviours of the marginal distributions tested. Although these may sometimes only be critical for practical applications, accepting hydrological constraints as upper and lower bounds improves the statistical model's quality. One computationally and conceptually acceptable option is the four-parameter bounded Johnson's SB distribution, as was illustrated in our case study.