Dataset: CLM MODFLOW Uncertainty Analysis


Description

Abstract

The dataset was derived by the Bioregional Assessment Programme from multiple source datasets. The source datasets are identified in the Lineage field in this metadata statement. The processes undertaken to produce this derived dataset are described in the History field in this metadata statement.

This dataset contains all the scripts used to conduct the uncertainty analysis for the maximum drawdown and time to maximum drawdown at the groundwater receptors in the Clarence-Moreton bioregion and all the resulting posterior predictions. This is described in product 2.6.2 Groundwater numerical modelling (Cui et al. 2016). See History for a detailed explanation of the dataset contents.

Dataset History

This dataset uses the results of the design of experiment runs of the MODFLOW groundwater model of the Clarence-Moreton subregion to train emulators that (a) constrain the prior parameter ensembles into posterior parameter ensembles and (b) generate the posterior predictive ensembles of maximum drawdown and time to maximum drawdown. This is described in product 2.6.2 Groundwater numerical modelling (Cui et al. 2016).

A flow chart of the way the various files and scripts interact is provided in CLM_MF_dmax_v02_Flowchart.png (editable version in CLM_MF_dmax_v02_Flowchart.gliffy).

R-script CLM_DoE_Parameters.R creates the set of parameters for the design of experiment in CLM_DoE_Parameters.csv. Each of these parameter combinations is evaluated with the groundwater model (dataset CLM groundwater model V1). Associated with this spreadsheet is file CLM_MF_Parameters.csv, which records for each parameter whether it is included in the sensitivity analysis, whether it is tied to another parameter, its initial value and range, its transformation, and the type of prior distribution with its mean and covariance structure.
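The metadata does not state how CLM_DoE_Parameters.R chooses the parameter combinations. As an illustration only, the Python sketch below assumes a Latin hypercube design drawn in transformed (log10 or untransformed) parameter space. The PARAMETERS table, its parameter names and ranges, and the function latin_hypercube are hypothetical stand-ins for the contents of CLM_MF_Parameters.csv and the R-script.

```python
import math
import random

# Hypothetical stand-in for CLM_MF_Parameters.csv: name -> (min, max, transform).
PARAMETERS = {
    "kh_layer1": (0.1, 10.0, "log10"),
    "sy_layer1": (0.01, 0.3, "none"),
    "rch_mult": (0.5, 1.5, "none"),
}


def latin_hypercube(n_runs, params, seed=0):
    """Draw n_runs parameter sets with a Latin hypercube design.

    Each range is split into n_runs equal-width strata in transformed space;
    one value is drawn uniformly inside every stratum and the stratum order
    is shuffled independently for each parameter.
    """
    rng = random.Random(seed)
    columns = {}
    for name, (lo, hi, transform) in params.items():
        log = transform == "log10"
        lo_t, hi_t = (math.log10(lo), math.log10(hi)) if log else (lo, hi)
        strata = list(range(n_runs))
        rng.shuffle(strata)
        column = []
        for k in strata:
            u = (k + rng.random()) / n_runs  # position inside stratum k, in [0, 1)
            x_t = lo_t + u * (hi_t - lo_t)
            column.append(10.0 ** x_t if log else x_t)  # back-transform
        columns[name] = column
    return [{name: columns[name][i] for name in params} for i in range(n_runs)]


design = latin_hypercube(5, PARAMETERS)
print(design[0])
```

Stratifying in transformed space means a log10-transformed parameter such as hydraulic conductivity is sampled evenly across orders of magnitude rather than being concentrated near its upper bound.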

The results of the design of experiment model runs are summarised in files CLM_MF_dmax_DoE_Predictions.csv, CLM_MF_tmax_DoE_Predictions.csv and CLM_MF_DoE_Observations.csv, which contain, respectively, the maximum additional drawdown for each receptor, the time to maximum additional drawdown for each receptor, and the simulated equivalents to observations. The first two are generated with post-processing scripts in dataset CLM groundwater model V1; for the last file, the additional script CLM_MF_postprocess_riverflux.py is used to summarise the simulated equivalents to the surface water-groundwater exchange flux.
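As a sketch of how a maximum additional drawdown and its timing can be extracted for one receptor, the function below assumes that additional drawdown at each output time is the head in the baseline run minus the head in the coal resource development pathway run, with both series at matching output times. The function max_additional_drawdown and the example heads are hypothetical and do not reproduce the post-processing scripts of dataset CLM groundwater model V1.

```python
def max_additional_drawdown(times, head_baseline, head_crdp):
    """Return (dmax, tmax) for one receptor.

    Assumes additional drawdown at each output time is the baseline head minus
    the head under the coal resource development pathway. dmax is the largest
    additional drawdown and tmax the first output time at which it occurs.
    """
    if not (len(times) == len(head_baseline) == len(head_crdp)) or not times:
        raise ValueError("time series must be non-empty and of equal length")
    drawdown = [hb - hc for hb, hc in zip(head_baseline, head_crdp)]
    i_max = max(range(len(drawdown)), key=drawdown.__getitem__)
    return drawdown[i_max], times[i_max]


# Hypothetical heads (m) at five output years for a single receptor.
years = [2013, 2020, 2040, 2060, 2102]
baseline = [100.0, 99.8, 99.5, 99.4, 99.3]
crdp = [100.0, 99.6, 98.9, 99.0, 99.2]
dmax, tmax = max_additional_drawdown(years, baseline, crdp)  # dmax ≈ 0.6 m, tmax = 2040
```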

Spreadsheets CLM_MF_dmax_Predictions.csv and CLM_MF_tmax_Predictions.csv capture additional information on each prediction: the name of the prediction, its transformation, the minimum, maximum and median of the design of experiment results, a boolean indicating whether the prediction is included in the uncertainty analysis, the layer it is assigned to, and the objective function used to constrain it.

Spreadsheet CLM_MF_dmax_Observations.csv has additional information on each observation: the name of the observation, a boolean indicating whether the observation is used, the minimum and maximum of the design of experiment results, a metadata statement describing whether the observation is steady state (SS) or transient (TR), and the source of the spatial coordinates (from dataset CLM - Bore water level NSW). It also lists the distance from each bore to the blue line network and to each prediction (both in km).

These files are used in script CLM_MF_SI.py to generate sensitivity indices for each group of observations and predictions, using the method of Plischke et al. (2013). The indices are saved in spreadsheets CLM_MF_SI_dmaxL1.csv, CLM_MF_SI_dmaxL2.csv, CLM_MF_SI_dmaxL3.csv, CLM_MF_SI_dmaxL4.csv, CLM_MF_SI_dmaxL6.csv, CLM_MF_SI_hobs.csv, CLM_MF_SI_Qcsg.csv and CLM_MF_SI_objfun.csv.
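Plischke et al. (2013) estimate sensitivity measures directly from a given sample, such as the design of experiment runs, by partitioning the sample along each input. The sketch below is a simplified version of that idea for the first-order variance-based index only, assuming equal-count classes; it is not the implementation in CLM_MF_SI.py, and the function first_order_index and its n_classes setting are illustrative choices.

```python
import random
import statistics


def first_order_index(x, y, n_classes=10):
    """Given-data estimate of S = Var(E[Y | X]) / Var(Y).

    The sample is sorted by x and cut into n_classes classes of (nearly) equal
    size; the size-weighted variance of the class means of y approximates
    Var(E[Y | X]).
    """
    n = len(x)
    if n != len(y) or n < n_classes:
        raise ValueError("x and y need equal length and at least n_classes points")
    order = sorted(range(n), key=lambda i: x[i])
    y_sorted = [y[i] for i in order]
    mean_y = statistics.fmean(y)
    var_y = statistics.pvariance(y, mu=mean_y)
    if var_y == 0.0:
        return 0.0  # constant output: no variance to attribute
    between = 0.0
    for c in range(n_classes):
        cls = y_sorted[c * n // n_classes:(c + 1) * n // n_classes]
        between += len(cls) * (statistics.fmean(cls) - mean_y) ** 2
    return between / n / var_y


# Synthetic check: y depends on x1 but not on x2.
rng = random.Random(42)
x1 = [rng.random() for _ in range(2000)]
x2 = [rng.random() for _ in range(2000)]
y = [a + rng.gauss(0.0, 0.1) for a in x1]
s1 = first_order_index(x1, y, n_classes=20)
s2 = first_order_index(x2, y, n_classes=20)
```

Because the same design of experiment sample is reused for every input, no additional model runs are needed to rank parameters, which is the main appeal of given-data estimators.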

Script CLM_MF_dmax_ObjFun.py calculates the objective function values for the design of experiment runs. Each prediction in layer 1 has a tailored objective function: a weighted sum of the residuals between observed and simulated groundwater levels, with weights based on the distance between each observation and the prediction. In addition, there is an objective function for the baseflow and CSG water production rates. The results are stored in CLM_MF_DoE_ObjFun.csv and CLM_MF_ObjFun.csv.
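The exact form of the distance weighting is not given here. A minimal sketch, assuming normalised inverse-distance weights applied to squared residuals, is shown below; the function distance_weighted_objective, the exponent power and the floor min_distance_km are hypothetical and introduced for illustration only.

```python
def distance_weighted_objective(observed, simulated, distance_km,
                                power=1.0, min_distance_km=0.1):
    """Hypothetical distance-weighted objective for one layer 1 prediction.

    Squared residuals (observed - simulated) are weighted by an inverse power
    of the bore-to-prediction distance, floored at min_distance_km to avoid
    division by zero; the weights are normalised to sum to one.
    """
    if not (len(observed) == len(simulated) == len(distance_km)) or not observed:
        raise ValueError("inputs must be non-empty and of equal length")
    weights = [max(d, min_distance_km) ** -power for d in distance_km]
    total = sum(weights)
    return sum(w / total * (o - s) ** 2
               for w, o, s in zip(weights, observed, simulated))


# A misfit at a nearby bore costs more than the same misfit at a distant bore.
near = distance_weighted_objective([10.0, 10.0], [10.0, 0.0], [100.0, 1.0])
far = distance_weighted_objective([10.0, 10.0], [10.0, 0.0], [1.0, 100.0])
```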

These files are used in scripts CLM_MF_dmax_CreatePosteriorParameters_oo.R and CLM_MF_dmax_CreatePosteriorParameters_gen.R to carry out Markov chain Monte Carlo sampling of the prior parameter distributions with the Approximate Bayesian Computation methodology described in Cui et al. (2016), by generating and applying an emulator for each objective function. The scripts rely on the scripts in dataset R-scripts for uncertainty analysis v01 and are run on the high-performance computing cluster with batch file CLM_MF_dmax_CreatePosterior.slurm. They produce posterior parameter combinations for each objective function, stored in directory PosteriorParameters with filename convention CLM_MF_dmax_Posterior_Parameters_OO_%i_batch.csv, where %i runs from 1 to 982. The general posterior parameter distribution (i.e. without the distance-weighted groundwater level observations) is stored in CLM_MF_dmax_Posterior_Parameters_gen_batch1.csv.
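The actual algorithm is documented in Cui et al. (2016) and in dataset R-scripts for uncertainty analysis v01. As a hedged illustration of the general idea, the sketch below implements a threshold-based ABC-MCMC sampler in the style of Marjoram et al. (2003), assuming a uniform prior over parameter bounds, a symmetric Gaussian random-walk proposal, and an emulator that returns a predicted objective function value. The functions abc_mcmc and toy_emulator are hypothetical.

```python
import random


def abc_mcmc(emulator, threshold, start, bounds, n_steps=5000, step=0.1, seed=0):
    """Threshold ABC-MCMC with an emulated objective function.

    A Gaussian random-walk proposal is accepted only if it lies inside the
    uniform prior bounds and the emulator predicts an objective value at or
    below the threshold. With a uniform prior and a symmetric proposal the
    Metropolis-Hastings prior ratio is one, so no further test is needed.
    """
    rng = random.Random(seed)
    current = list(start)
    if emulator(current) > threshold:
        raise ValueError("start must satisfy the acceptance threshold")
    chain = [list(current)]
    for _ in range(n_steps):
        proposal = [x + rng.gauss(0.0, step * (hi - lo))
                    for x, (lo, hi) in zip(current, bounds)]
        inside = all(lo <= x <= hi for x, (lo, hi) in zip(proposal, bounds))
        if inside and emulator(proposal) <= threshold:
            current = proposal
        chain.append(list(current))  # rejected proposals repeat the current state
    return chain


# Hypothetical emulator: squared distance from a "best fit" point (0.5, 0.5).
def toy_emulator(theta):
    return (theta[0] - 0.5) ** 2 + (theta[1] - 0.5) ** 2


chain = abc_mcmc(toy_emulator, threshold=0.01, start=[0.5, 0.5],
                 bounds=[(0.0, 1.0), (0.0, 1.0)])
```

Because every retained state has an emulated objective at or below the threshold, the chain samples the prior restricted to the acceptable region, which is the ABC approximation of the posterior under these assumptions. Using an emulator instead of the groundwater model makes each proposal cheap enough to run thousands of steps per objective function.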

The same set of spreadsheets is used to test convergence of the emulator performance with script CLM_MF_emulator_convergence.R and batch file CLM_MF_emulator_convergence.slurm to produce spreadsheet CLM_MF_convergence_objfun_qriv.csv.
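One way to test emulator convergence is to hold out a fixed set of design of experiment runs and track the prediction error as the training set grows. The sketch below assumes that procedure and uses a k-nearest-neighbour inverse-distance emulator purely as a stand-in; the functions knn_emulator and convergence_curve are hypothetical, and the emulator used in CLM_MF_emulator_convergence.R may differ.

```python
import math
import random


def knn_emulator(train_x, train_y, k=4):
    """Stand-in emulator: inverse-distance average of the k nearest training runs."""
    pairs = list(zip(train_x, train_y))

    def predict(x):
        nearest = sorted(pairs, key=lambda p: math.dist(x, p[0]))[:k]
        num = den = 0.0
        for xi, yi in nearest:
            d = math.dist(x, xi)
            if d == 0.0:
                return yi  # exact match with a training run
            num += yi / d ** 2
            den += 1.0 / d ** 2
        return num / den

    return predict


def convergence_curve(xs, ys, sizes, n_test, seed=0):
    """Hold out n_test runs and return [(training size, RMSE on held-out runs)]."""
    rng = random.Random(seed)
    idx = list(range(len(xs)))
    rng.shuffle(idx)
    test, pool = idx[:n_test], idx[n_test:]
    if max(sizes) > len(pool):
        raise ValueError("not enough runs for the largest training size")
    curve = []
    for m in sizes:
        train = pool[:m]  # nested training sets: each contains the previous one
        emulator = knn_emulator([xs[i] for i in train], [ys[i] for i in train])
        sq = [(emulator(xs[i]) - ys[i]) ** 2 for i in test]
        curve.append((m, math.sqrt(sum(sq) / len(sq))))
    return curve


# Synthetic "design of experiment": 600 runs of a smooth two-parameter response.
rng = random.Random(3)
xs = [(rng.random(), rng.random()) for _ in range(600)]
ys = [math.sin(3.0 * a) + b for a, b in xs]
curve = convergence_curve(xs, ys, sizes=[20, 100, 500], n_test=50)
```

Keeping the held-out runs fixed and nesting the training sets means any drop in error can be attributed to the extra training runs rather than to a change in the test set.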

The posterior parameter distributions are sampled with scripts CLM_MF_dmax_MCsampler_OO_i.R, CLM_MF_dmax_MCsampler_gen_i.R, CLM_MF_tmax_MCsampler_OO_i.R and CLM_MF_tmax_MCsampler_gen_i.R and the associated .slurm batch files. Scripts ending in OO_i.R sample the predictions that have an objective function constrained by groundwater level observations; scripts ending in gen_i.R sample the predictions that have the general objective function. The scripts create and apply an emulator for each prediction. The emulators and results are stored in directory Emulators. This directory is not part of this dataset but can be regenerated by running the scripts on the high-performance computing clusters.
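To illustrate the create-and-apply step, the sketch below fits a linear least-squares emulator to design of experiment runs and evaluates it at each posterior parameter set to form a posterior predictive ensemble. The linear form and the functions fit_linear_emulator and posterior_predictive are assumptions chosen for brevity; the emulators built by the MCsampler scripts are not described here and may be nonlinear.

```python
def fit_linear_emulator(xs, ys):
    """Least-squares fit of y = b0 + sum_j b_j * x_j via the normal equations."""
    rows = [[1.0] + list(x) for x in xs]
    p = len(rows[0])
    # Augmented normal-equation matrix [X^T X | X^T y].
    a = [[sum(r[i] * r[j] for r in rows) for j in range(p)]
         + [sum(r[i] * y for r, y in zip(rows, ys))] for i in range(p)]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(p):
        piv = max(range(col, p), key=lambda r: abs(a[r][col]))
        if abs(a[piv][col]) < 1e-12:
            raise ValueError("design runs do not determine all coefficients")
        a[col], a[piv] = a[piv], a[col]
        for r in range(p):
            if r != col:
                f = a[r][col] / a[col][col]
                a[r] = [v - f * w for v, w in zip(a[r], a[col])]
    coef = [a[i][p] / a[i][i] for i in range(p)]
    return lambda x: coef[0] + sum(c * v for c, v in zip(coef[1:], x))


def posterior_predictive(emulator, posterior_parameters):
    """Evaluate the emulator at every posterior parameter set."""
    return [emulator(theta) for theta in posterior_parameters]


# Hypothetical design runs of a response that happens to be linear: y = 2 + 3*x1 - x2.
doe_x = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0), (2.0, 3.0)]
doe_y = [2.0 + 3.0 * a - b for a, b in doe_x]
emulator = fit_linear_emulator(doe_x, doe_y)
ensemble = posterior_predictive(emulator, [(0.5, 0.5), (1.5, 2.0)])
```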

Script CLM_MF_collate_predictions.csv collates all posterior predictive distributions in spreadsheets CLM_MF_dmax_PosteriorPredictions.csv and CLM_MF_tmax_PosteriorPredictions.csv. These files are further summarised in spreadsheet CLM_MF_dmax_tmax_excprob.csv with script CLM_MF_exc_prob. For every prediction, this spreadsheet contains the coordinates, the layer, the number of samples in the posterior parameter distribution, the 5th, 50th and 95th percentiles of dmax and tmax, the probability of exceeding 1 cm and 20 cm of drawdown, the maximum dmax value from the design of experiment and, for the predictions in layer 1, the threshold of the objective function and the acceptance rate.
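The summary statistics in CLM_MF_dmax_tmax_excprob.csv can be illustrated with the sketch below, which assumes dmax samples in metres (so 1 cm = 0.01 m and 20 cm = 0.2 m) and computes percentiles by linear interpolation between order statistics. The functions percentile and summarise_dmax and the output labels are hypothetical; the interpolation rule used by CLM_MF_exc_prob is not stated, so results may differ slightly for small ensembles.

```python
import math


def percentile(sorted_vals, q):
    """Percentile q (0-100) by linear interpolation between order statistics."""
    h = (len(sorted_vals) - 1) * q / 100.0
    lo = math.floor(h)
    hi = min(lo + 1, len(sorted_vals) - 1)
    return sorted_vals[lo] + (h - lo) * (sorted_vals[hi] - sorted_vals[lo])


def summarise_dmax(samples_m, thresholds_m=None):
    """5th/50th/95th percentiles and exceedance probabilities of dmax (m)."""
    if not samples_m:
        raise ValueError("need at least one sample")
    if thresholds_m is None:
        thresholds_m = {"P_exceed_1cm": 0.01, "P_exceed_20cm": 0.2}
    vals = sorted(samples_m)
    summary = {f"p{q}": percentile(vals, q) for q in (5, 50, 95)}
    for label, t in thresholds_m.items():
        summary[label] = sum(v > t for v in vals) / len(vals)  # strict exceedance
    return summary


summary = summarise_dmax([i / 100 for i in range(101)])
```

The exceedance probability is simply the fraction of posterior samples above each threshold, so its resolution is limited by the number of samples in the posterior parameter distribution, which is one reason that number is reported alongside it.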

Dataset Citation

Bioregional Assessment Programme (2016) CLM MODFLOW Uncertainty Analysis. Bioregional Assessment Derived Dataset. Viewed 10 July 2017, http://data.bioregionalassessments.gov.au/dataset/25e01e3c-7b87-4200-9ef2-5c5405627130.

Dataset Ancestors

General Information

Distributions