• Nie Znaleziono Wyników

Inference of reactive transport model parameters using a Bayesian multivariate approach

N/A
N/A
Protected

Academic year: 2021

Share "Inference of reactive transport model parameters using a Bayesian multivariate approach"

Copied!
22
0
0

Pełen tekst

(1)

RESEARCH ARTICLE

10.1002/2013WR014156

Inference of reactive transport model parameters using a

Bayesian multivariate approach

Luca Carniato1, Gerrit Schoups1, and Nick van de Giesen1

1Department of Water Management, Delft University of Technology, Delft, Netherlands

Abstract

Parameter estimation of subsurface transport models from multispecies data requires the defi-nition of an objective function that includes different types of measurements. Common approaches are weighted least squares (WLS), where weights are specified a priori for each measurement, and weighted least squares with weight estimation (WLS(we)) where weights are estimated from the data together with the parameters. In this study, we formulate the parameter estimation task as a multivariate Bayesian infer-ence problem. The WLS and WLS(we) methods are special cases in this framework, corresponding to specific prior assumptions about the residual covariance matrix. The Bayesian perspective allows for generalizations to cases where residual correlation is important and for efficient inference by analytically integrating out the variances (weights) and selected covariances from the joint posterior. Specifically, the WLS and WLS(we) methods are compared to a multivariate (MV) approach that accounts for specific residual correlations with-out the need for explicit estimation of the error parameters. When applied to inference of reactive transport model parameters from column-scale data on dissolved species concentrations, the following results were obtained: (1) accounting for residual correlation between species provides more accurate parameter estima-tion for high residual correlaestima-tion levels whereas its influence for predictive uncertainty is negligible, (2) inte-grating out the (co)variances leads to an efficient estimation of the full joint posterior with a reduced computational effort compared to the WLS(we) method, and (3) in the presence of model structural errors, none of the methods is able to identify the correct parameter values.

1. Introduction

Reactive transport models are increasingly used to simulate complex chemical reactions in the subsurface. These models are typically characterized by flow, transport, and kinetic parameters, which need to be esti-mated through inverse modeling. During the last 30 years, inverse modeling methodologies have been extensively applied in the context of groundwater flow and contaminant transport [Yeh, 1986; Kool et al., 1987]. A review of the algorithms and methodologies used to infer flow and transport parameters from lab and field measurements is given by Vrugt et al. [2008b]. Inverse modeling of reactive transport is difficult because it is often ill posed and different models might reproduce the same set of measurements. Hill and Tiedeman [2007] report common issues encountered in the calibration of groundwater and transport mod-els, suggesting that unreasonable parameter estimates and large error variances indicate the presence of model error. Matott and Rabideau [2008b] developed a method where parameters and rate models are cali-brated simultaneously from measurements, effectively demonstrating that different model formulations result in a similar fit to the data. Thus, one major question in reactive transport model applications is how deficiency in model structures can be identified and corrected, allowing the modeler to formulate models capable of providing reliable predictions. Gupta et al. [2012] review how model error (or model structural adequacy) is addressed in different scientific communities (groundwater, unsaturated zone, terrestrial hydrometeorology, and surface water communities), including subsurface reactive transport modeling. In their work, it is acknowledged that identification of reaction networks from data is a new and challenging problem, requiring new methods to identify model inadequacy.

From a Bayesian point of view, parameters are estimated from measurements by their posterior density function (pdf) [Wagner, 1992; Medina and Carrera, 1996; Fienen et al., 2009; Vrugt et al., 2003]. Wagner [1992] showed how groundwater heads and concentration measurements can be integrated in a unique function to estimate flow and transport parameters. In the approach of Wagner [1992], the integration of different measurement groups in a unique objective function requires the estimation of the observation error

Key Points:

Effect of residual correlation is limited to strongly correlated cases

Linearly correlated residuals cannot account for model structural errors

Analytically integrating out covariances leads to efficient inference Supporting Information: Readme Tables S1–S4 Figure S1 Correspondence to: L. Carniato, carniato.luca@gmail.com Citation:

Carniato, L., G. Schoups, and N. van de Giesen (2014), Inference of reactive transport model parameters using a Bayesian multivariate approach, Water Resour. Res., 50, doi:10.1002/ 2013WR014156.

Received 21 MAY 2013 Accepted 15 JUL 2014

Accepted article online 18 JUL 2014

Water Resources Research

(2)

variances, in order to normalize the contribution of different sets. A similar approach was adopted in Barlebo et al. [2004], where error variances were estimated iteratively during parameter estimation. Medina and Car-rera [1996] and Fienen et al. [2009] extended the approach of Wagner by including prior knowledge of the parameter values in the objective function. Vrugt et al. [2003] considered the case where the distribution of the model residuals is not normal, providing a general form of the posterior parameter density function. An important issue in the aforementioned studies is the specification of the covariance matrix of the model residuals. Different assumptions about the structure of the covariance matrix lead to different estimation methodologies. In generalized least squares (GLS) [Draper and Smith, 1998], the residuals (observed values minus simulated counterparts) are assumed to be normally distributed with zero mean and a known covari-ance matrix. The weighted least squares approach (WLS) further assumes that the covaricovari-ance matrix is diag-onal with known variances (with weights inversely proportidiag-onal to the variances), whereas the ordinary least squares approach (OLS) also assumes all variances to be equal (homoscedastic). Several applications of these methods are reported in the literature. Hill et al. [1998] used a WLS method to estimate hydraulic parameters of a synthetic aquifer using heads and flow data. Sim˚unek et al. [1999] and Arora et al. [2011] used a WLS method to estimate the model parameters of a soil column experiment. Barth and Hill [2005] used a WLS method to estimate model parameters for a virus transport model. Bell et al. [2002] applied an OLS method to determine the sorption and diffusion parameters from a laboratory cell experiment. They used a data transformation to ensure that the assumptions of the OLS method were satisfied (uncorrelated and homoscedastic model residuals).

For the Maximum A Posteriori (MAP) estimation of the model parameters with the GLS, WLS, and OLS meth-ods, efficient gradient-based algorithms have been used, such as the Levenberg-Marquardt algorithm (e.g., UCODE [Poeter et al., 2005] and PEST [Doherty, 2011]). However, optimal parameter values found by these algorithms may represent a local minimum. To prevent the estimation of suboptimal parameters, global optimization algorithms can be used instead. The application of heuristic algorithms to the minimization of a WLS and OLS objective function has been less extensively reported. Mertens et al. [2009] and Matott and Rabideau [2008a] compared several algorithms for the minimization of WLS and OLS objective functions to infer the parameters of a complex biochemical model and a pesticide lysimeter test, respectively. Vrugt et al. [2008b] applied a multialgorithm method to infer the parameters of a three-dimensional soil vapour extraction model, relying heavily on parallel computing.

Bayesian approaches aim to integrate the full joint parameter posterior, usually using efficient sampling techniques (such as Markov Chain Monte Carlo or MCMC sampling [Vrugt et al., 2009a, 2008a]) and allowing an accurate quantification of parameter and prediction uncertainty. The cost of integrating the joint poste-rior is much higher compared to identifying only its mode (MAP estimation) and the application of Bayesian approaches in reactive transport is still limited to few studies [Reichert and Schuwirth, 2012; Shi et al., 2014; Zhang et al., 2013].

In most of the applications listed above, the WLS method was applied assuming that the total residual var-iance could be estimated a priori (for example, from an estimate of measurement and model error). In reac-tive transport models, residuals may be dominated by unknown model structural errors, for example, due to the incomplete knowledge of the chemical reaction network. In this case, residual variances (or weights) can be estimated together with model parameters using an iterative two-stage procedure [Sun, 1994], referred to here as WLS with weight estimation, or WLS(we). In the first stage, the model parameters are estimated keeping the residual variances fixed, whereas in the second stage the maximum likelihood method is used to estimate the variances, keeping the model parameters fixed at values obtained in the first stage. The method is described in Sun [1994] and applied in Dai and Samper [2004, 2006].

Finally, residuals may also be correlated (in space, in time, or between chemical species), as explicitly acknowledged in the GLS approach, where a full covariance matrix is specified a priori. However, to the extent that such correlations arise from a priori unknown model errors, it may be necessary to also estimate residual correlations from the data [Kuczera, 1983; Schoups and Vrugt, 2010]. For this purpose, typically sim-ple correlation structures, such as first-order autoregressive models, are postulated a priori and their param-eters estimated from the data.

In this study, we formulate the reactive transport model parameter estimation task as a multivariate Bayes-ian inference problem, thereby unifying and generalizing existing approaches. Specifically, the WLS and

(3)

WLS(we) methods are compared to a multivariate (MV) approach to parameter inference that accounts for general (linear) residual correlations without the need for simultaneous estimation of the covariance matrix. The paper is organized as follows. Section 2 presents the Bayesian framework for parameter inference. Sec-tion 3 introduces the case studies and reactive transport model used here as an applicaSec-tion, secSec-tion 4 com-pares the inference results obtained for the different methods for synthetic and real data, and section 5 provides conclusions.

2. Bayesian Framework

A reactive transport model generally contains a number of parameters, whose values are not known a priori, but which can be estimated from data, for example, from measured dissolved species concentrations. We consider here a data set consisting of n-independent sampling events, where for each sampling event one measures concentrations of m chemical species at a particular location and time. The data can be

assembled into a n times m matrix Y where each row corresponds to a sampling event of m measurements. The data matrix Y is subtracted from the corresponding matrix of simulated concentrations, g(/), to yield a matrix of residuals e, e5FðYÞ2Fðgð/ÞÞ5 e11 . . . e1j . . . e1m ⯗ ⯗ ⯗ ei1 . . . eij . . . eim ⯗ ⯗ ⯗ en1 . . . enj . . . enm 0 B B B B B B B B @ 1 C C C C C C C C A ; (1)

where / is the vector of model parameters, and F is a matrix of monotonic transformations (e.g., log trans-forms) applied to both measured and simulated concentrations, such that for each sampling event i 5 1,. . ., n, data vectors e(i)(corresponding to rows in e) are (approximately) independent and identically distributed according to an m-variate zero-mean Gaussian Nm(0, R) with mxm covariance matrix R. This Gaussian distri-bution accounts for random fluctuations of the (transformed) residuals about zero due to a combination of measurement errors (e.g., sampling and analytical errors), model input errors (e.g., misspecified boundary conditions), and model structural errors (e.g., when an important reaction is not accounted for in the model). Hence, in this formulation errors from various sources are lumped together into an overall residual term, as opposed to explicitly separating out model and observation errors [Kennedy and O’Hagan, 2001; Reichert and Schuwirth, 2012].

Given these assumptions, the likelihood function l(/,RjY) is obtained as the product of n m-variate Gaus-sians [Box and Tiao, 1992],

lð/; RjYÞ  pðej/; RÞ5ð2pÞ2mn=2jRj2n2exp 21

2 Xn i51 eTðiÞR21eðiÞ ! ; (2)

wherejRj is the determinant of the covariance matrix. For example, in the case study reported in the next section we consider measurements of four different chemical species (m 5 4), namely calcium, iron, total inorganic carbon or TIC, and pH, such that the R matrix becomes,

R5 r2

Ca21 covCa21Fe21 covCa21TIC covCa21pH

covFe21Ca21 r2Fe21 covFe21TIC covFe21pH

covTICFe21 covTICCa21 r2TIC covTICpH

covpHCa21 covpHFe21 covpHTIC r2pH

0 B B B B B @ 1 C C C C C A ; (3) where r2

j are the variances of the different species, and cov are the covariances between species. Hence,

whereas residuals are assumed independent between sampling events (different locations and times), resid-ual correlation between species is taken into account. For our application, F consists of log transforms, except for pH, which already is a log-transformed concentration.

(4)

lð/; RjYÞ / jRj2n2exp 21

2trR

21Sð/Þ

 

; (4)

where tr stands for trace or sum of diagonal elements of the S(/) matrix defined in our case study as,

Sð/Þ5 Xn u51 e2u;Ca21 Xn u51 eu;Ca21eu;Fe21 Xn u51 eu;Ca21eu;TIC Xn u51 eu;Ca21eu;pH Xn u51 eu;Fe21eu;Ca21 Xn u51 e2u;Fe21 Xn u51 eu;Fe21eu;TIC Xn u51 eu;Fe21eu;pH Xn u51 eu;TICeu;Ca21 Xn u51 eu;TICeu;Fe21 Xn u51 e2u;TIC X n u51 eu;TICeu;pH Xn u51 eu;pHeu;Ca21 Xn u51 eu;pHeu;Fe21 Xn u51 eu;pHeu;TIC Xn u51 e2 u;pH 0 B B B B B B B B B B B B B B B B @ 1 C C C C C C C C C C C C C C C C A : (5)

The likelihood function is augmented with prior probability density functions (pdf) for / and R, to yield the posterior pdf p(/,RjY), which forms the basis for estimating / and R from the data,

pð/; RjYÞ / lð/; RjYÞpð/Þp Rð Þ; (6)

where / and R are assumed independent a priori. Following Box and Tiao [1992, p. 426], we assume a uni-form prior for /, i.e., pð/Þ / 1, such that the joint posterior becomes,

pð/; RjYÞ / jRj2n2exp 21

2trR

21Sð/Þ

 

p Rð Þ: (7)

Various cases can now be distinguished depending on which prior assumptions are made about the covari-ance matrix R:

R is fixed a priori and assumed diagonal (WLS). In this case, the off-diagonal elements of R are zero, and the diagonal elements (variances) are specified a priori, e.g., based on known measurements errors (as in Table 1 for the case study described in the following section) or estimates of model errors [Hill and Tiedeman, 2007, pp. 300–301]. As such, the posterior in equation (7) reduces to,

pð/jY; RÞ / exp 21 2trR 21Sð/Þ   5exp 21 2 Xm j51 1 r2 j Sjjð/Þ ! ; (8)

with species-specific weights equal tor12 j

. This posterior has been widely applied in inverse modeling studies [see, for example, Hill et al., 1998; Sim˚unek et al., 1999; Barth and Hill, 2005; Arora et al., 2011]. Maximizing (8) Table 1. Experimental Errors for Each of the Measurement Sets

Data Set

Measurement

Error Min Valuea Max Value

Standard Deviation Untransformed Variable rMEij,untr b Transformation F Standard Deviation Transformed Variable rMEj, WLSc Standard Deviation Transformed Variable rj5 ffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffi r2 MEj1r2MOj q , WLS(we)c Standard Deviation Transformed Variable, MV Calcium 14% 60 mg L21 208.29 mg L21 0:14C 1:65 Log 0:141:65 0:14 1:65  2 1r2 MOCa21 h i0:5 Estimated from R MAP Iron 7% 0.57 mg L21 49 mg L21 0:07C 1:65 Log 0:07 1:65 0:07 1:65  2 1r2 MO;Fe21 h i0:5 Estimated from R MAP Total inorganic carbon (TIC) 30% 2.98 mg L21 120 mg L21 0:30C 1:65 Log 1:650:3 0:30 1:65  2 1r2 MO;TIC h i0:5 Estimated from R MAP pH 0.01 pH units 6.97 8.71 0:01 1:65 None 0:011:65 0:01 1:65  2 1r2 MO;pH h i0:5 Estimated from R MAP a

Considering all experiments, detection limits for calcium, iron, and TIC were 0.05, 0.02, and 0.2 mg L21

respectively.

b

The standard deviation was calculated assuming a normal distribution of the residuals and that 90% of the times the residuals lie within the interval defined by the measurement error [Hill and Tiedeman, 2007].

c

Weights for the WLS and WLS(we) were calculated as the inverse of the standard deviations. Note that for the WLS case, rMEjis assumed equal to the coefficient of variation cvjof

(5)

is equivalent to minimizing its negative logarithm, which leads to the WLS objective function [Carroll and Ruppert, 1982; Hill, 1998]: OBFWLS5 Xm j51 1 r2 j Sjjð/Þ: (9)

R is unknown a priori and assumed diagonal (WLS(we)). Often it is not possible to specify R a priori, for example when the residuals are affected by a priori unknown model input and/or model structural errors. In this case, it is useful to estimate R from the data [Lu et al., 2013]. A common approach is to assume that the off-diagonal elements of R are zero (no residual correlation between species), and to then estimate the diagonal elements of R (variances) from the data. If one takes a uniform prior, i.e., p Rð Þ / 1, the posterior becomes, pð/; RjYÞ / jRj2n2exp 21 2trR 21Sð/Þ   5 Y m j51 r2nj ! exp 21 2 Xm j51 1 r2 j Sjjð/Þ ! ; (10)

where variances (one for each measurement set j) and model parameters are inferred jointly from the data. In principle, the variances should be estimated for each residual eij, because errors can be different at each time and location. However, this leads to an underdetermined inverse problem, where the number of parameters (model plus error parameters) exceeds the number of measurements. Therefore, assumptions about the statistical properties of the errors must be introduced [Schoups and Vrugt, 2010]. In this study, is assumed that the residual variance r2

j in the WLS(we) method to have the following form:

r2j5r2MEj1r2MOj j51; . . .; m; (11)

where r2

MEjis the measurement error variance (assumed to be known) and r2MOjis the remaining residual

variance due to unknown model input and/or model structural errors (which typically derives from model application and was estimated). In our approach, r2

j is assumed to be constant within the set j (note that

equation (11) refers to the variance of the transformed residuals). Maximizing the posterior (10) leads to the WLS method with weight estimation or WLS(we), where weights are given by1

r2 j

for each species j. Hence, weights are estimated for each measurement species j, unlike robust regression methodologies that aim to detect and adjust the weights of specific outliers [Huber and Ronchetti, 2009].

An arguably better prior (than uniform) for R, one that is scale-invariant, is the noninformative (Jeffreys) prior suggested by [Box and Tiao, 1992], namely pðRÞ /Qmj51r22

j . Note that this prior prefers small residual

variances r2

j, as opposed to the uniform prior used in (10). It results in the following posterior,

pð/; RjYÞ / Y m j51 r2n22j ! exp 21 2 Xm j51 1 r2 j Sjjð/Þ ! /Y m j51 v22n r 2 j Sjjð/Þ ! Sjjð/Þ2n 221 " # ; (12)

where, given /, each term r

2 j

Sjjð/Þ

has an inverse-chi-square distribution with n degrees of freedom. It is useful to analytically integrate out the variances (and thus R), leading to the following marginal posterior for the model parameters [Box and Tiao, 1992],

pð/jYÞ /Y

m j51



Sjjð/Þ2n2 (13)

An advantage of the marginalization step is that simultaneous joint estimation of model parameters / and the m variances (or weights) can be avoided. Instead, the model parameters can be inferred separately using their marginal posterior (for example, using MCMC sampling), followed by conditional sampling of the variances. In other words, writing the joint posterior of / and R given Y as:

pð/; RjYÞ5pð/jYÞp Rj/; Yð Þ: (14)

We first sample / from the marginal posterior p /jYð Þ using MCMC, and then sample each variance r2 j of

the covariance matrix R from p Rj/; Yð Þ, which amounts to drawing a number from a v22

n distribution and

multiplying it by Sjjð/Þ. Hence, no additional runs with the reactive transport model are required to estimate the full posterior pð/; RjYÞ once samples from the marginal posterior p /jYð Þ are available.

(6)

R is unknown a priori and allowed to be nondiagonal (MV). This is the most general case we consider, which accounts for residual correlation between species (although still assuming independence between sampling events). Following Box and Tiao [1992], we adopt a noninformative (Jeffreys) prior for R, i.e.,

pðRÞ / jRj212ðm11Þ, yielding the posterior (for n m),

pð/; RjYÞ / jRj212ðn1m11Þexp 21 2trR 21Sð/Þ   5W21 m ðR; Sð/Þ; nÞZð/Þ; (15) where W21

m ðR; Sð/Þ; nÞ is an inverse-Wishart distribution for R (given /) with scale matrix Sð/Þ, n degrees

of freedom, and normalizing constant Zð/Þ5cjSð/Þj2n2, with c a constant independent of /. Integrating out

R from the joint posterior then yields the following marginal posterior for / [Box and Tiao, 1992],

pð/jYÞ / jSð/Þj2n2; (16)

with the conditional posterior of R, pðRj/; YÞ, equal to W21

m ðR; Sð/Þ; nÞ and matrix S defined in equation

(5). As in the diagonal case, an advantage of the marginalization step is that joint estimation of model parameters / and all (1/2)m(m 1 1) distinct elements of R can be avoided. Similar to the diagonal case, the full posterior is obtained by MCMC sampling / from p /jYð Þ, followed by sampling a covariance matrix R from W21

m ðR; Sð/Þ; nÞ.

The mode of W21

m ðR; Sð/Þ; nÞ provides a direct estimation of the R for / 5 /MAP, i.e., RMAP

Sð/MAPÞ

n1m11: (17)

An important constraint in arriving at equation (16) is that the number of measurement events n is equal to or greater than the number of different groups m for the posterior in equation (15) to be valid (i.e., not sin-gular). Hence, assigning each data point to its own group is not allowed, since then n 5 1 < m. As indicated above, here we group residuals by species, but, for our case studies, alternatively grouping residuals by location (depth) or time, to account for correlation in space or time, would also comply with the constraint n m. Specifically, only a 4 3 4 covariance matrix was considered in equation (4), neglecting the residual correlations by depth and time. Finally, note that in deriving equation (16), no assumptions were made, beyond grouping the residuals, about the (linear) correlation structure. This is in contrast to using an a priori fixed correlation structure, such as a first-order autoregressive model [Kuczera, 1983; Schoups and Vrugt, 2010].

For the case studies reported below we use the DREAM-ZS (DiffeRentialEvolution Adaptive Metropolis algo-rithm) algorithm developed by Vrugt et al. [2009b] for MCMC sampling from the parameter posteriors, with the total number of MCMC samples set equal to the number of parameters multiplied by 2000. This value was generally sufficient to achieve convergence of the MCMC algorithm.

3. Application to Reactive Transport Model

3.1. Column-Scale Experiment

The WLS, WLS(we), and MV methods were used to infer the parameters of a column-scale reactive transport model, which is described in detail in Carniato et al. [2012]. The laboratory experiment consists of a 0.25 m column filled with zero-valent iron and sand (Figure 1). Polluted groundwater, containing a mixture of vola-tile organic compounds (PCE, TCE, trans-DCE, cis-DCE, and VC) and inorganic ions (dissolved calcium, car-bonate, magnesium, and chloride), was pumped through the column for more than 1 year, at an average flow rate of 8.7 mL h21. Sampling ports at different levels within the column were used to sample aliquots of groundwater and measure organic and inorganic concentrations. Sampling was performed at different times, after various pore volumes (PV) were pumped through the column. The dissolved calcium, iron, total inorganic carbon (TIC), and pH measurements at 0.03, 0.05, 0.1, and 0.25 m and at 5, 55, 160, 209, 244, 326, and 604 PV were used to calibrate and validate the reactive transport model.

3.2. Reactive Transport Model

The reactive transport model used to model the column experiment is MIN3P [Mayer et al., 2002]. The model solves saturated and unsaturated groundwater flow and reaction chemistry is modeled using the

(7)

partial equilibrium approach [Lichtner, 1985; Steefel and Lasaga, 1994]. Only the inor-ganic chemistry was modeled here, since the aim of the study is to compare different method-ologies to infer model parame-ters and not to provide predictions of the barrier con-taminant remediation efficiency. The inorganic part was assumed to be unaffected by low con-taminant concentrations and the flow to be fully saturated. Detailed explanations of the model assumptions are reported in the study of Carniato et al. [2012].

Eight dissolved mobile species (H1, Ca21, Fe21, CO223 , Cl2, H2(aq), O2(aq), and Fe31) and 19 mineral equilibrium reactions were included in the model. Iron corrosion and mineral precipitation were included as kinetic reactions. Iron corrosion by groundwater usually occurs under anaerobic conditions with the consumption of acidity and hydrogen production. We assumed a first-order dependence of corrosion on iron reactive surface area as proposed by Mayer et al. [2001]:

RIrC;W52max kIrC;WSr 12

IAPIrC;W KIrC;W   ;0 ; (18)

where kIrC,Wis the rate coefficient for iron corrosion by water per unit iron reactive surface area (mol m22 s21), IAPIrCis the ion activity product for iron corrosion and KIrC,Wis the equilibrium constant

(logKIrC,W5 211.78 as reported by Mayer et al. [2001]). It was also assumed that iron can be oxidized by car-bonate anions [Mayer et al., 2001; Reardon, 1995] according to the following rate:

RIrC;TIC52kIrC;TICSr CO22 3   khCO22 3 1 CO 22 3   ; (19)

where kIrC,TICis the iron corrosion by carbonate rate coefficient per unit of iron reactive surface area (mol m22s21) and khCO32-is the half saturation constant reported in the MIN3P database (1028mol L21). Mineral precipitation following a pH increase was described by the following rate equation [Lasaga, 1998]:

Rm i 5max keff ;i IAPm i Km i 21   ;0 ; (20)

where keff,iis the rate coefficient for precipitation of mineral i (mol L 21

s21), and IAPm

i and kmi are ion activity

product and solubility of mineral i, respectively. Based on X-ray diffraction data and influent concentrations, it was assumed that relevant minerals to be accounted for in the model include aragonite (CaCO3, i 5 CC), iron hydroxy carbonate (Fe2(OH)2CO3, i 5 IC), and amorphous iron hydroxide (FeOH2(am), i 5 IH). The model also accounts for changes in porosity and permeability over time as mineral precipitation progresses [Mayer, 1999]. Finally, iron reactive surface deactivation by mineral coating was modeled with an exponential equa-tion [Jeen et al., 2007]:

Srðx; tÞ5Sr0exp 2 Xnm i51 aiuiðx; tÞ ! ; (21)

where Sr0is initial iron reactive surface area per unit water volume (m 2

L21), uiis the volume fraction of min-eral i at location x and time t, and aiis the deactivation constant for mineral i (2). In columns that receive

Pump ZVI column 20% iron/80% sand L: 25 cm I.D.: 4 cm Contaminated groundwater

Sealed glass tube Manometer Manometer Waste bottle Sampling port Pump ZVI column 20% iron/80% sand L: 25 cm I.D.: 4 cm Contaminated groundwater

Sealed glass tube Manometer Manometer

Waste bottle Sampling port

(8)

carbonate-rich groundwater, the deactivating effect is mainly caused by carbonate minerals [Jeen et al., 2007]. Therefore, we assumed that iron hydroxy carbonate and aragonite precipitation in the column result in a decrease in iron corrosion rates (nm52), as these minerals form carbonate-rich, deactivating coatings on the iron grains [Jeen et al., 2006].

Equations (18)–(21) are strongly coupled. The iron corrosion reactions cause a pH increase, which shifts the carbonate equilibrium, with conversion of bicarbonate to carbonate. At high carbonate concentrations, the IAPm

i of carbonate minerals increases (equation (20)) and calcium carbonate and iron hydroxy carbonate

start to precipitate. This causes an increase in the volume fractions uiin equation (21) and a decrease of the available reactive surface Srfor iron corrosion (equations (18) and (19)). As the reactive surface decreases, the corrosion reaction and the subsequent mineral precipitation and reactive surface deactivation are slowed down.

Based on a preliminary sensitivity analysis, seven parameters were considered to affect the simulation results the most: the rate coefficient for iron corrosion by water (kIrC,Win equation (18)), the rate coefficient for iron corrosion by carbonate (kIrC,TICin equation (19)), the precipitation rate coefficients for aragonite and iron hydroxy carbonate (keff,CCand keff,ICin equation (20)), the deactivation constants for aragonite and iron hydroxy carbonate (aCCand aICin equation (21)) and the solubility of iron hydroxy carbonate (KICmin

equa-tion (20)). The solubility of iron hydroxy carbonate was assumed adjustable because the logarithmic value of this parameter has been reported to vary significantly from 21.56 [Lee and Wilkin, 2010] to 0.075 [Jeen et al., 2007].

The column was discretized using a 0.01 m space interval, resulting in 25 finite volumes. In contrast to the PHAST geochemical model [Parkhurst et al., 2010] used in our previous study, MIN3P adjusts the time incre-ments adaptively depending on the stiffness of the geochemical integration problem. Moreover, MIN3P uses a globally implicit approach to couple transport and reactions, which suppresses the numerical diffu-sion introduced by the operator splitting approach of PHAST, allowing increasing the time stepping with small loss of accuracy when an upstream spatial weighting scheme is used. These technical details are important in that they greatly reduce the CPU time required for one forward simulation, compared to the implementation in PHAST. Since in our system the major geochemical changes occur at the beginning of the simulation, with only gradual changes at later times, time steps in MIN3P can be increased without a substantial loss of accuracy. In our runs, MIN3P was up to 200 times more efficient than PHAST for the simu-lation of the first 209 pore volumes. Some preliminary simusimu-lations were performed to compare the simula-tion obtained with an accurate numerical scheme (central spatial weighting with grid Peclet number equal to 1.5 and grid Courant number equal to 0.5) with those obtained with the upstream spatial weighting used in the numerical experiments. Small differences were found between the results produced by the two schemes, indicating that the parameter values inferred by the different estimation methods can safely be considered independent of the space and time discretization.

Flow boundary conditions for the model were specified flux at the column inlet and zero-pressure head at the column outlet. The initial flow condition was zero-pressure head in the whole domain. For transport, a Cauchy boundary condition was specified at the column inlet (specified mass flux) and a free-exit boundary condition at the column outlet. For the real column test case (see below), the variability of the inflow con-centrations was taken into account in the model. The column was assumed to be initially filled with water at a pH of 7 and with low concentrations for all the dissolved inorganic species (1e-10 mol L21). Longitudi-nal dispersivity was fixed at a value of 1e-3 m, in line with similar column experiments [Jeen et al., 2007].

3.3. Case Studies

The methods presented in section 2 were first tested on two synthetic data sets generated from the model described above using the following procedure. First, model simulations of each species were generated at 0.03, 0.075, 0.15, and 0.25 m from the column inlet at 5, 55, 110, 160, and 209 pore volumes (PV) using the correct parameter values, yielding a 20 3 4 model simulation matrix g(/) (20 measurements events for each species). Then, the model simulations were log transformed and corrupted with measurement errors generated from a multivariate Gaussian normal distribution Nm(0, R) with known R (20 independent sam-ples were extracted from Nm(0, R), yielding a 20 3 4 matrix of residuals e, which were added to the model simulation matrix g(/) to obtain the measurement matrix Y). The diagonal elements of R were set equal to the estimated measurement error variances reported in Table 1, seventh column. For the first synthetic data

(9)

set (low correlation), the off-diagonal elements of R were calculated such that the off-diagonal elements of the correlation matrix were all equal to 0.1. For the second synthetic data set (high correlation), the off-diagonal elements of the correlation matrix R were calculated to obtain off-off-diagonal elements of the corre-lation matrix equal to 0.9. Due to the small sample size (20 draws), the Rresmatrix estimated from the resid-uals [Davis, 2002] was different from the R used to generate the data set. The maximum absolute off-diagonal value of Rreswas 0.44 for the low correlation case, whereas the minimum absolute off-diagonal value of Rreswas 0.82 for the high correlation case.

In the first two cases (synthetic case with low and high residual correlation), we used the WLS, WLS(we), and MV methods to infer the rate coefficients of iron carbonate precipitation, iron corrosion by water, arag-onite precipitation, and iron corrosion by carbonate (keff,IC, kIrC,W, keff,CC, and kIrC,TICin equations (18–21)) from the two synthetic data sets. In these cases, the model includes all the reactions used for the generation of the synthetic data and the only source of error is the measurement error. In the third case (synthetic case with model error), the precipitation of iron carbonate was excluded from the model and the iron corrosion by water, aragonite precipitation, and iron corrosion by carbonate (kIrC,W, keff,CC, and kIrC,TIC) were calibrated using the low correlation data set. This numerical experiment was performed to assess the performance of the three methods in a case where model error is introduced.

In the fourth case, actual measurements of the laboratory experiment [Carniato et al., 2012] were used to infer all sensitive parameters. In addition to the parameters inferred in the synthetic cases, the deactivation constants for iron carbonate and calcium carbonate (aICand aCCin equation (21)) and the equilibrium con-stant for iron hydroxy carbonate (Km

IC in equation (20)) were included in the parameter inference. Therefore,

seven parameters were estimated using the measurements at 0.03, 0.075, 0.15, and 0.25 m from the column inlet at 5, 55, 110, 160, and 209 pore volumes (PV), similarly to the synthetic cases. Additional measurements available at 244, 325, and 604 PV were used as a validation data set.

3.4. Assessing Model Structural Error

When residuals are not only affected by measurement errors, but also by model structural errors (and/or model input errors), estimated residual variances r2

j become larger than corresponding measurement error

variances r2

MEj, and the excess variance r2MOjcan be used as a measure of model error. The WLS(we) method

is the only one that explicit estimate the variances r2

MOj(equation (11)). For the other methods r2MOjwere

estimated subtracting r2

MEj(assumed to be known a priori) from r2j. For the WLS method the total residual

variance r2

j was estimated from the residuals eijcomputed at the MAP parameter estimation. For the MV method, the total residual variances at the MAP parameter estimations were obtained from the diagonal elements of the RMAPmatrix (equation (17)). Another measure of model error is the standard error (SE),

SE5 ffiffiffiffiffiffiffiffiffiffiffiffiffiffi OBFWLS mn2p s ; (22)

where OBFWLSis defined in equation (9) (with rj5rMEj) and p is the number of model parameters. As reported by Hill and Tiedeman [2007], SE should be equal to one in the absence of model error and when the weighting scheme reflects data accuracy. In this study, we calculated the standard error to indicate the presence of errors beyond the estimated measurement error and used the estimated r2

MOjto indicate the

species that were affected the most by it. The SE was also used to measure goodness of fit.

4. Results

Results are organized into three subsections. First, we investigate the effect of residual correlation on parameter and predictive uncertainty by means of synthetically generated data sets (first and second case study from section 3.3). Second, we evaluate performance of the various methods in the presence of model structural errors by means of both a synthetically generated data set and a real data set (third and fourth case study from section 3.3). Finally, computational efficiency of the different methods is compared.

4.1. Effect of Residual Correlation

In Figures 2a–2d, the profiles simulated with the correct parameters and the synthetic data corrupted with low correlated noise level are shown. The profiles obtained with the MAP parameter set identified by each

(10)

method are hardly distinguishable from the correct simulations and the different results are not shown (the corresponding RMSE values were almost identical to those indicated in Figure 2). From the figure it can be seen that larger residuals were generated for calcium and TIC near the influent and lower ones for iron and pH, which agrees with the specified rMEij,untrreported in Table 1 (larger for calcium and TIC and proportional to the magnitude of the measurement). No clear correlation is visually evident between residuals for differ-ent species.

In Figures 2e–2h the simulated profiles with the correct parameter values and high correlated residuals are shown (as for the low correlations case, the simulations with the MAP parameters were very similar for all methods). From the figure it is evident that a positive residual correlation exists. An example is the sampling event at 55 PV at 0.075 m from the column inlet, where the calcium, iron, and TIC measurements are lower than the correct concentration profiles. In real applications, systematic overestimation or underestimation might be caused by model errors and/or misspecified boundary conditions, for example due to unac-counted variations of inflow rates and in field applications due to well construction problems. Normal probability plots [Chambers et al., 1983; Draper and Smith, 1998; Hill and Tiedeman, 2007] for weighted residuals are used to check the Gaussian assumption underlying all methods studied here; weighted residuals are considered to be normally distributed if they plot on a straight line. Weights for each measurement set were specified inversely proportional to the residual standard deviations computed as reported in Table 1, columns 7–9. For the MV method, graphical checks of multivariate normality can also be performed using multivariate normality plots based on the Mahalanobis distance which accounts for residual correlation [Mahalanobis, 1936]. However, these plots are not directly comparable to normality plots of the WLS and WLS(we) methods, because the multivariate normality plots display a point for each measurement event (consisting of measured concentration for each of the four species), resulting in a plot with only 20 points instead of a normality plot with 80 points. Normality plots can be used to check normal-ity of the weighted residuals also for the MV method, because if the prior assumption of the multivariate residual distribution Nm(0, R) holds, the marginal distribution of the residuals belonging to the set j must be normal with zero mean and standard deviation rj. Substantial deviations of the weighted residuals from the normality line certainly indicate that the multivariate normality assumption is not satisfied. In addition to normality plots, Lilliefors tests [Lilliefors, 1967] at 5% significance level were also used to test for

0 0.05 0.1 0.15 0.2 0.25 50 100 150 200 (a) Ca2+ [13.28] mgL −1 Distance (m) 0 0.05 0.1 0.15 0.2 0.25 0 10 20 30 40 (b) Fe2+ [1.28] mgL −1 Distance (m) 0 0.05 0.1 0.15 0.2 0.25 0 20 40 60 80 100 120 (c) TIC [9.34] mgL −1 Distance (m) 0 0.05 0.1 0.15 0.2 0.25 6.5 7 7.5 8 8.5 9 (d) pH [0.01] pH Distance (m) Model 5 PV Lab 5 PV Model 55 PV Lab 55 PV Model 110 PV Lab 110 PV Model 160 PV Lab 160 PV Model 209 PV Lab 209 PV 0 0.05 0.1 0.15 0.2 0.25 50 100 150 200 (e) Ca2+ [13.38] mgL −1 Distance (m) 0 0.05 0.1 0.15 0.2 0.25 0 10 20 30 40 (f) Fe2+ [0.99] mgL −1 Distance (m) 0 0.05 0.1 0.15 0.2 0.25 0 20 40 60 80 100 120 (g) TIC [8.23] mgL −1 Distance (m) 0 0.05 0.1 0.15 0.2 0.25 6.5 7 7.5 8 8.5 9 (h) pH [0.01] pH Distance (m)

Figure 2. Species concentrations simulated with the correct parameter values (solid lines) and synthetic measurements (symbols) based on artificial residuals with either (top row) low or (bottom row) high correlation. In square brackets, the RMSEs for the correct parameter values are reported.

(11)

normality. The results in Figure 3 show that the residual distributions can be considered normal for all meth-ods, for both high and low correlation levels.

At low correlation, residuals are more aligned to the normality line compared to the high correlation case, with the WLS method having mean and standard deviation closest to 0 and 1, respectively (Figure 3a). The adjustment of the weights in the WLS(we) method improved the alignment to the normality line for the cal-cium residual (209 PV, 0.03 m) located at the positive tail of the WLS residual distribution (Figure 3b). A simi-lar effect can also be seen for the MV method, where the residual variances (i.e., the weights) were

integrated out from the joint posterior (Figure 3c). In this case, the standard deviation of the normalized residual distribution was slightly larger than 1. Similar results were also obtained for the high correlation case.

In Figures 4a–4d and 4j–4m, the posterior cumulative distribution functions (CDFs) of the model parameters for the low and high correlation cases are shown. As can be seen, the methods provide similar results at low correlation level, with the lowest L2norm of the bias vector (defined as the difference between the correct and the MAP parameter vectors) obtained for the MV method. Also for high correlation levels the MV method provides the best MAP parameter estimation and lower parameter uncertainties, as can be noted from the steep green curves close to the correct values. Such result is due to the fulfilment of the statistical assumptions about the residual correlation, which were considered only in the MV method. An additional experiment with an increased measurement error variance (50% increase) was performed, with the MV method providing the best estimates of the correct parameter values with narrower confidence intervals. In this test, the true value of the log keff,ICwas included in the 5–95% confidence interval only for the MV

−2 −1 0 1 2 0.01 0.02 0.05 0.10 0.25 0.50 0.75 0.90 0.95 0.98 0.99 Probability Residuals (a) WLS low correlation

Mean 3.33e−02 St. dev. 1.01e+00 Lil. test: Normal

−2 −1 0 1 2 0.01 0.02 0.05 0.10 0.25 0.50 0.75 0.90 0.95 0.98 0.99 Probability Residuals (b) WLS(we) low correlation

Mean 5.87e−02 St. dev. 9.83e−01 Lil. test: Normal

−2 −1 0 1 2 0.01 0.02 0.05 0.10 0.25 0.50 0.75 0.90 0.95 0.98 0.99 Probability Residuals (d) WLS high correlation Mean 1.06e−01 St. dev. 9.11e−01 Lil. test: Normal

−2 0 2 0.01 0.02 0.05 0.10 0.25 0.50 0.75 0.90 0.95 0.98 0.99 Probability Residuals (e) WLS(we) high correlation

Mean 6.98e−02 St. dev. 9.04e−01 Lil. test: Normal

−2 −1 0 1 2 0.01 0.02 0.05 0.10 0.25 0.50 0.75 0.90 0.95 0.98 0.99 Probability Residuals (c) MV low correlation Mean 6.49e−02 St. dev. 1.12e+00 Lil. test: Normal

−2 −1 0 1 2 0.01 0.02 0.05 0.10 0.25 0.50 0.75 0.90 0.95 0.98 0.99 Probability Residuals (f) MV high correlation Mean 8.64e−02 St. dev. 1.12e+00 Lil. test: Normal

Figure 3. Normal probability plots of the weighted residuals for the synthetic case without model error: the residual distributions for the (a–c) low correlation case and (d–f) high correla-tion case. Black circles represent calcium residuals, red asterisks iron residuals, green crosses TIC residuals, and blue squares pH residuals. Lillefors tests at 5% significance level indicate normal residual distributions for all cases.

(12)

method. From these results it is clear that the MV method is superior to the WLS and WLS(we) methods in identifying the correct parameters in the case of highly correlated residuals and in absence of model errors. The posterior CDFs of the estimated variances are shown in Figures 4e–4h and 4n–4q. As can be seen, the WLS method uses fixed values of the residual variances, based on the prior knowledge of the measurement errors (Table 1), which might differ from the variances computed from the correct residuals (in these syn-thetic cases such difference is attributable to the small sample size). Such result is also confirmed in Tables S1b and S2b of the supporting information, where positive rMOjwere estimated for the WLS method despite the absence of model errors. For the WLS(we) method, the residual variance was estimated by add-ing a positive rMOjto the measurement error variance (equation (11)). For the cases where the sample var-iance was lower than the measurement error varvar-iance rMEj, adding a positive value did not improve the estimations (as can be seen in Figures 4g for the low correlation case and Figures 4o–4q for high correlation case). Making fewer assumptions on the residual variances as in the MV method has the advantage of pro-viding more reliable estimations, with the correct variances always included in the 5–95% confidence inter-vals. The MV method was also able to provide a reliable estimation of the covariance elements of R matrix (equation (3)), as shown in Figures 4i and 4r for the covCa21Fe21element and in Figure S1 of the supporting information for the other elements. As can be seen from the figure, covCa21Fe21is zero for the WLS(we) and WLS methods, as assumed a priori.

Figure 5 shows the predictive QQ plot and predictive uncertainty bounds for all measurements used in cali-bration. When the empirical cumulative distribution function of the computed quantile values at each observation agrees with the uniform distribution (correspondence with the 1:1 line), the hypothesis made −7.2 −7 −6.8 0 0.2 0.4 0.6 0.8 1 (j) log k eff,IC −11.6 −11.4 −11.2 (k) log k IrC,W −6.8 −6.6 −6.4 (l) log k eff,CC −7.32 −7.31 −7.3 (m) log k IrC,TIC 0 .015 .03 (n) σCa22+ F(x) 0 .005 .01 (o) σFe22+ F(x) 0 .05 .1 (p) σTIC2 F(x) 0 .0001 (q) σpH2 F(x)

WLS (L2 norm 0.063) WLS(we) (L2 norm 0.036) MV (L2 norm 0.021)

0 .005 (r) cov Ca2+Fe2+ −7.2 −7 −6.8 0 0.2 0.4 0.6 0.8 1 (a) log k eff,IC −11.6 −11.4 −11.2 (b) log k IrC,W −6.8 −6.6 −6.4 (c) log k eff,CC −7.32 −7.31 −7.3 (d) log k IrC,TIC 0 .015 .03 (e) σCa22+ F(x) 0 .005 .01 (f) σFe22+ F(x) 0 .05 .1 (g) σTIC2 F(x) 0 .0001 (h) σpH2 F(x) WLS (L

2 norm 0.068) WLS(we) (L2 norm 0.055) MV (L2 norm 0.043)

0 .005 (i) cov

Ca2+Fe2+

Figure 4. Posterior cumulative distribution functions for model parameters and residual variances for the (a–i) low and (j–r) high correlation cases. The black dashed lines represent the correct parameter values and the variances computed form the correct residuals. In the legend boxes, the L2norms of the bias parameter vectors are reported.

(13)

in the calibration framework are consistent with the data [Thyer et al., 2009]. For the low correlation case (Figure 5a), all methods approach the 1:1 line. For the high correlation case (Figure 5b), the different meth-ods also perform similarly, with the MV method marginally closer to the 1:1 line. In both cases, the MV method slightly overestimates predictive uncertainty, as can be seen in Figures 5c and 5d, where slightly larger predictive bounds were obtained for calcium and iron. Such slight overestimation of predictive uncer-tainty could also be guessed from Figure 4, where wider posteriors for the variance parameters were

0 0.2 0.4 0.6 0.8 1 0 0.2 0.4 0.6 0.8 1 Theoretical quantile Computed quantile

(a) Synthetic case low correlation

WLS (97.50% SE 1.029) WLS(we) (98.75% SE 1.057) MV (100.00% SE 1.034) 0 0.2 0.4 0.6 0.8 1 0 0.2 0.4 0.6 0.8 1 Theoretical quantile Computed quantile

(b) Synthetic case high correlation

WLS (93.75% SE 0.935) WLS(we) (96.25% SE 0.958) MV (97.50% SE 0.935) 50 100 150 200 Ca2+ mgL −1 0 20 40 Fe2+ mgL −1 0 50 100 TIC mgL −1 0 5 10 15 20 7 8 9 pH pH Sampling event (c) Prediction bounds low correlation

50 100 150 200 Ca2+ mgL −1 0 20 40 Fe2+ mgL −1 0 50 100 TIC mgL −1 0 5 10 15 20 7 8 9 pH pH Sampling event (d) Prediction bounds high correlation

Figure 5. Predictive QQ plot and uncertainty bounds for the synthetic case (a–c) with low residual correlation and (b–d) with high residual correlation. In the legend boxes, the percentage of measurements contained in the 95% prediction bounds and the standard error values are reported. Dashed lines delimit the pore volumes (5, 55, 110, 160, and 209 PV). Blue, red, and green lines correspond to the WLS, WLS(we), and MV methods, respectively.

(14)

obtained in the MV method. Overall, Figure 5 shows that accounting for correlation has a negligible effect on predictive uncertainty in these cases.

The SE measure (Figures 5a and 5b) correctly indicates the absence of model errors, since all values are close to one both at high and low correlation levels. The rMOjvalues for each measurement set are reported in Tables S1b and S2b of the supporting information; these values are close to zero, indicating no model error, as expected, but not exactly zero due to the small sample size.

4.2. Effect of Model Structural Error 4.2.1. Synthetic Case With Model Error

In this case, the methods were used to infer parameters related to iron corrosion by water, aragonite precip-itation, and iron corrosion by carbonate (kIrC,W, keff,CC, and kIrC,TIC), while excluding precipitation of iron hydroxy carbonate from the model. In Figure 6, the simulated profiles provided by the different methods are shown. As can be seen, iron and TIC concentrations were largely overestimated, whereas a relatively good fit was obtained for calcium. More carbonate (TIC) remains in solution because carbonate precipitation in the form of iron hydroxy carbonate was excluded from the model. As a result, the ion activity product (IAP) of aragonite is higher and parameter keff,CCis underestimated by more than 1 order of magnitude (see Table S3 in combination with equation (20) which shows that the precipitation rate of calcium carbonate is proportional to keff,CCand IAP). Moreover, the kIrC,Wparameter almost reaches its lower bound (log

kIrC,W5 212), which already suggests the presence of model error. In this case, none of the methods was able to retrieve the correct parameters.

The residual distributions at the MAP parameter estimates (Figures 7a–7c) deviate from normality for all methods, with heavy negative and positive tails (corresponding to an overestimation of the dissolved iron and the underestimation of pH). The WLS method, which uses a fixed weight approach, deviates the most from normality, whereas the WLS(we) and MV methods give better results, but with the Lilliefors test sug-gesting nonnormality for all methods. Therefore, the statistical assumptions made in the calibration phase were not met in this synthetic case with model error.

The posterior parameter CDFs provided by the different estimation methods are shown in Figures 8a–8c. The WLS method provides the sharpest CDFs, underestimating parameter uncertainty. Such underestima-tion can be attributed to the assumpunderestima-tion that the residual variance only consists of measurement errors, whereas in this test model error is also an important component of the residual variance. The WLS(we) and MV methods provided different CDFs with the correct value of the log kIrC,Wparameter included in the 5–95% confidence interval. The correct value of keff,CCwas included in the 5–95% confidence interval only for the MV method (Figure 8b), and none of the methods were able to identify the correct value for kIrC,TIC, indicating that the proposed methods can not correct the large errors induced by omitting iron hydroxy carbonate precipitation from the model. This is also confirmed by the large L2norms of the bias vector reported in the top legend of Figure 8.

Different parameter estimates between the WLS(we) and MV methods can mostly be attributed to the dif-ferent assumptions made about residual correlation. This hypothesis was confirmed by performing an addi-tional experiment in which only the diagonal elements of the S(/) matrix were considered a priori (referred to as MV(diagonal) in Figure 8), corresponding to an assumption of no correlation). MV(diagonal) is equiva-lent to the WLS(we) method when Jeffreys prior for the covariances r2

j is used, resulting in the posterior in

equation (13). Neglecting residual correlation resulted in parameter CDFs that were indeed very similar to those obtained with the WLS(we) method (Figures 8a–8c). Small differences between the WLS(we) and MV(diagonal) parameter posteriors may be due to different assumptions about the prior of the covariance matrix (uniform in the WLS(we) case and Jeffreys prior in the MV methods). The MV(diagonal) method con-sistently finds r2

j values that are smaller than the WLS(we) method (Figures 8d–8g), which may be related

to the Jeffreys prior preferring small variances. Note that using the Jeffreys prior in the MV method does not always improve the estimation of the variances when model error is present, as can be seen in Figure 8d, where the MV methods underestimates the correct calcium residual variance. Moreover, the introduc-tion of model error did not allow the correct inference of the covariance parameters in the MV method (Fig-ure 8h).

The WLS(we) and MV methods adjust the residual variances to decrease the weights of extreme outliers (mostly iron measurements) and increase the value of the respective likelihood functions (Table S3a of the

(15)

supporting information). The WLS method underestimates the correct residual variances, which are fixed at the measurement error variance (Figures 8d–8g). The weight adjustment resulted in a lower overestimation of the iron measurements from the WLS to the WLS(we) and MV methods (Figures 6b, 6f, and 6j), as opposed to the simulation with the correct parameter values where a severe overestimation of iron and pH was observed (with dissolved iron concentrations up to 180 mg L21and pH values up to 9.2). Therefore, estimating the weights from the data in presence of model bias did not allow the correct simulation of the concentration profiles.

The predictive QQ plot presented in Figure 9a indicates that the WLS method underestimates prediction uncertainty, with most of the measurements outside the prediction bounds (most of the computed quan-tiles close to 0 and 1). Consistent with the large estimated residual variances (Figure 8), the WLS(we) method provides the largest prediction bounds with all measurement contained in the prediction bounds and a typical S-shape of the QQ plot, indicating a severe overestimation of predictive uncertainty. For the MV methods (MV and MV(diagonal)), the computed quantiles are the closest to the 1:1 line. However, a reli-able quantification of predictive uncertainty in the synthetic case with model error also requires the statisti-cal description of the introduced model error, which was not included in our methods. For example, Reichert and Schuwirth [2012] indicate that neglecting model bias in the description of the model output can lead to an incorrect estimation of predictive uncertainty. In our case, large uncertainty bounds were pre-dicted for dissolved iron, with values up to 300,000 mg L21for the WLS(we) method and 8000 mg L21for

Model 5 PV Lab 5 PV Model 55 PV Lab 55 PV Model 110 PV Lab 110 PV Model 160 PV Lab 160 PV Model 209 PV Lab 209 PV 0 0.05 0.1 0.15 0.2 0.25 0 50 100 150 200 (a) Ca2+ [25.29] mgL −1 Distance (m) 0 0.05 0.1 0.15 0.2 0.25 0 40 80 120 160 (b) Fe2+ [61.26] mgL −1 Distance (m) 0 0.05 0.1 0.15 0.2 0.25 0 40 80 120 (c) TIC [27.48] mgL −1 Distance (m) 0 0.05 0.1 0.15 0.2 0.25 6.5 7 7.5 8 8.5 9 (d) pH [0.11] pH Distance (m) 0 0.05 0.1 0.15 0.2 0.25 0 50 100 150 200 (e) Ca2+ [25.99] mgL −1 Distance (m) 0 0.05 0.1 0.15 0.2 0.25 0 40 80 120 160 (f) Fe2+ [58.34] mgL −1 Distance (m) 0 0.05 0.1 0.15 0.2 0.25 0 40 80 120 (g) TIC [29.75] mgL −1 Distance (m) 0 0.05 0.1 0.15 0.2 0.25 6.5 7 7.5 8 8.5 9 (h) pH [0.12] pH Distance (m) 0 0.05 0.1 0.15 0.2 0.25 60 110 160 200 (i) Ca2+ [27.65] mgL −1 Distance (m) 0 0.05 0.1 0.15 0.2 0.25 0 40 80 120 160 (j) Fe2+ [47.37] mgL −1 Distance (m) 0 0.05 0.1 0.15 0.2 0.25 0 40 80 120 (k) TIC [32.30] mgL −1 Distance (m) 0 0.05 0.1 0.15 0.2 0.25 6.5 7 7.5 8 8.5 9 9.5 (l) pH [0.25] pH Distance (m)

(16)

the MV method. Obviously, these values are not realistic and are caused by the presence of large model errors.

The presence of model error is correctly indicated by the large estimated SEs (Figure 9a). The estimated rMOjvariances are reported in Table S3b of the supporting information. Compared to cases without model error, larger positive values were obtained for all measurement sets and methods, with the largest values obtained for iron and TIC. Such information could be used as a guide toward improving the model struc-ture, in this case indicating the need of including the omitted iron carbonate precipitation reaction which consumes dissolved iron and TIC.

4.2.2. Real Case

Finally, we also test and compare the different methods on a real data set. In this case, the measurements of a real column experiment were used to infer seven model parameters, as described in section 3.3. In Fig-ure 10, the model fits for the different MAP parameter estimates are shown together with the correspond-ing RMSE values. The WLS method fits the pH measurements accurately because large weights were assigned to pH residuals and the pH measurements were sensitive to mineral precipitation and iron corro-sion rates. In the WLS(we) method, the adjustment of the variances resulted in a redistribution of the weights with lower RMSE for calcium, iron, and TIC measurements and a higher RMSE for pH. Similar results were obtained for the MV method. The simulated concentrations shown in Figure 10 can be explained by analyzing the MAP parameter estimates reported in Table S4a of the supporting information. The calcium carbonate precipitation rate coefficient (keff,CC) was estimated about 2 orders of magnitude higher in the WLS(we) and MV methods compared to the WLS method. This leads to more precipitation of calcium

−100 −50 0 0.01 0.02 0.05 0.10 0.25 0.50 0.75 0.90 0.95 0.98 0.99 Probability Residuals

(a) WLS synthetic model error

Mean −8.17e+00 St. dev. 2.49e+01 Lil. test: Not normal

−1 0 1 0.01 0.02 0.05 0.10 0.25 0.50 0.75 0.90 0.95 0.98 0.99 Probability Residuals

(b) WLS(we) synthetic model error

Mean −2.34e−01 St. dev. 4.76e−01 Lil. test: Not normal

−2 0 2 0.01 0.02 0.05 0.10 0.25 0.50 0.75 0.90 0.95 0.98 0.99 Probability Residuals (c) MV synthetic model error

Mean −3.89e−01 St. dev. 1.05e+00 Lil. test: Not normal

−100 0 100 0.01 0.02 0.05 0.10 0.25 0.50 0.75 0.90 0.95 0.98 0.99 Probability Residuals (d) WLS real case Mean −3.50e−01 St. dev. 2.32e+01 Lil. test: Not normal

−2 0 2 0.01 0.02 0.05 0.10 0.25 0.50 0.75 0.90 0.95 0.98 0.99 Probability Residuals (e) WLS(we) real case

Mean −1.26e−01 St. dev. 1.03e+00 Lil. test: Normal

−5 0 5 0.01 0.02 0.05 0.10 0.25 0.50 0.75 0.90 0.95 0.98 0.99 Probability Residuals (f) MV real case Mean −4.88e−01 St. dev. 1.61e+00 Lil. test: Not normal

Figure 7. Normal probability plots of the weighted residuals for the synthetic case with model error and the real case: the residual distributions for the (a–c) synthetic case with model error and (d–f) real case. Black circles represent calcium residuals, red asterisks iron residuals, green crosses TIC residuals, and blue squares pH residuals. Lilliefors tests at 5% significance level indicate a normal residual distribution only for the WLS(we) method in the real case.

(17)

−12 −110 −10 0.2 0.4 0.6 0.8 1

(a) log kIrC,W

−8.5 −7.5 −6.5 (b) log keff,CC −9 −8 −7 (c) log kIrC,TIC 0 2 4 (d) σCa2+ 2 F(x) 0 20 40 (e) σFe2+ 2 F(x) 0 5 10 15 (f) σTIC 2 F(x) 0 1 2 (g) σpH 2 F(x) −2 −1 0 1 (h) covCa2+ Fe2+

WLS (L2 norm 1.312) WLS(we) (L2 norm 1.523) MV (L2 norm 1.424) MV(diagonal) (L2 norm 1.540)

−100 −8 −6 0.2 0.4 0.6 0.8 1

(i) log keff,IC

−14 −12 −10 (j) log kIrC,W −8 −6 (k) log keff,CC 0 .3 .6 (l) σCa2+ 2 F(x) 0 2 4 (m) σFe2+ 2 F(x) 0 .5 1 (n) σTIC 2 F(x) 0 .2 .4 (o) σpH 2 F(x) WLS WLS(we) MV 0 .5 1 (p) covCa2+ Fe2+

Figure 8. Posterior cumulative distribution functions for model parameters and residual variances (a–h) for the synthetic case with model error and (i–p) for the real case. The black dashed lines represent the correct parameter values and the variances estimated from the correct residuals. In the top legend box, the L2norms of the bias parameter vectors are

reported. 0 0.2 0.4 0.6 0.8 1 0 0.2 0.4 0.6 0.8 1 Theoretical quantile Computed quantile

(b) Real case calibration

WLS (27.63% SE 24.194) WLS(we) (94.74% SE 29.697) MV (96.05% SE 28.640) 0 0.2 0.4 0.6 0.8 1 0 0.2 0.4 0.6 0.8 1 Theoretical quantile Computed quantile

(c) Real case validation

WLS (41.67% SE 25.662) WLS(we) (100.00% SE 28.765) MV (100.00% SE 23.632) 0 0.2 0.4 0.6 0.8 1 0 0.2 0.4 0.6 0.8 1 Theoretical quantile Computed quantile

(a) Synthetic case model error

WLS (27.50% SE 26.605) WLS(we) (100.00% SE 27.418) MV (96.25% SE 31.437) MV(diagonal) (97.50% SE 26.843)

Figure 9. Predictive QQ plots (a) for the synthetic case with model error and for the real case in (b) calibration and (c) validation. In the legend boxes, the percentage of measurements contained in the 95% prediction bounds and the standard error values are reported.

(18)

carbonate (aragonite), thereby removing carbonate from solution, reducing precipitation of iron hydroxy carbonate, and thus leaving more iron in solution.

Normality plots of the weighted residuals are shown in Figures 7d–7f. As in the synthetic case with model error, the WLS residual distribution deviates the most from normality, with heavy tails consisting of pH residuals (at 5 PV and 0.15 and 0.25 m from the influent). The normality assumption holds only for the WLS(we) method. This does not necessarily imply the absence of model errors, which might still be present as indicated by a high SE value (Figure 9b). Overall, the tails of the residual distribution in the real case devi-ate less from normality than in the synthetic case with model error, indicating a better simulation of meas-ured concentrations (as it can be confirmed by comparing Figure 10 to Figure 6).

Posterior CDFs for keff,IC, kIrC,W, keff,CCparameters are shown in Figures 8i–8k. Parameter uncertainty increases considerably from the WLS method to the other methods, due to residuals affected by other errors than just measurement errors. Accounting for residual correlation in the MV method further increases parameter uncertainty (as indicated by the confidence intervals reported in Table S4a of the supporting information) and in this case also increases the estimated residual variances (Figures 8l–8o).

As for the synthetic case with model error, the WLS method underestimates prediction uncertainty in bration (Figure 9b). For the WLS(we) and MV methods, larger 95% prediction bounds were estimated in cali-bration, with a closer correspondence of the computed quantiles to the theoretical quantiles (1:1 line). Accuracy of the estimated prediction bounds crucially depends on the presence of model bias, since none

Model 5 PV Lab 5 PV Model 55 PV Lab 55 PV Model 110 PV Lab 110 PV Model 160 PV Lab 160 PV Model 209 PV Lab 209 PV 0 0.05 0.1 0.15 0.2 0.25 50 100 150 200 (a) Ca2+ [61.65] mgL −1 Distance (m) 0 0.05 0.1 0.15 0.2 0.25 0 20 40 60 (b) Fe2+ [14.29] mgL −1 Distance (m) 0 0.05 0.1 0.15 0.2 0.25 0 30 60 90 120 (c) TIC [29.55] mgL −1 Distance (m) 0 0.05 0.1 0.15 0.2 0.25 6.5 7 7.5 8 8.5 9 9.5 (d) pH [0.25] pH Distance (m) 0 0.05 0.1 0.15 0.2 0.25 50 100 150 200 (i) Ca2+ [38.37] mgL −1 Distance (m) 0 0.05 0.1 0.15 0.2 0.25 0 20 40 60 (j) Fe2+ [10.05] mgL −1 Distance (m) 0 0.05 0.1 0.15 0.2 0.25 0 30 60 90 120 (k) TIC [21.03] mgL −1 Distance (m) 0 0.05 0.1 0.15 0.2 0.25 6.5 7 7.5 8 8.5 9 9.5 (l) pH [0.31] pH Distance (m) 0 0.05 0.1 0.15 0.2 0.25 50 100 150 200 (a) Ca2+ [27.23] mgL −1 Distance (m) 0 0.05 0.1 0.15 0.2 0.25 0 20 40 60 (c) Fe2+ [6.52] mgL −1 Distance (m) 0 0.05 0.1 0.15 0.2 0.25 0 30 60 90 120 (e) TIC [18.36] mgL −1 Distance (m) 0 0.05 0.1 0.15 0.2 0.25 6.5 7 7.5 8 8.5 9 9.5 (g) pH [0.32] pH Distance (m)

(19)

of the proposed error models explicitly account for it. The large SEs estimated in calibration suggests the presence of model errors also in the real case.

Compared to the calibration step, lower RMSEs were generally obtained in validation (Figure 11), probably because at early stages the geochemical conditions in the column were not fully stabilized (see, for exam-ple, the TIC measurements in Figure 10 at 5 PV at 0.03 and 0.05 m). Moreover, similar concentration profiles were simulated by the WLS(we) and MV methods, with the MV method providing the best fit for iron and pH measurements and slightly worse fit for calcium and TIC. The better fit obtained in validation is also con-firmed by the lower WLS(we) and MV standard errors (Figure 9c), with the MV method providing the lowest value. For the WLS method, the underestimation of predictive uncertainty is less severe compared to cali-bration whereas the WLS(we) and MV methods overestimate uncertainty. The rMOjstandard deviations reported in Table S4b of the supporting information indicate that iron residuals are the most affected by model errors, suggesting that model improvements should be sought in the description of iron corrosion and precipitation processes.

In all studied cases, the MV method was applied considering correlation between different species, since a strong interplay between different ions was expected (for example, a pH increases promote mineral precipi-tation and causes a decrease of dissolved TIC and calcium). As indicated in section 2, residuals must be grouped to obtain a valid posterior density for the MV method. Grouping the residuals is equivalent to neglect some residual correlations, in our case those by depth and time. As a result, the MV method allows

Model 244 PV Lab 244 PV Model 325 PV Lab 325 PV Model 604 PV Lab 604 PV 0 0.05 0.1 0.15 0.2 0.25 50 100 150 200 (a) Ca2+ [31.07] mgL −1 Distance (m) 0 0.05 0.1 0.15 0.2 0.25 0 20 40 60 (b) Fe2+ [12.36] mgL −1 Distance (m) 0 0.05 0.1 0.15 0.2 0.25 30 60 90 120 (c) TIC [13.93] mgL −1 Distance (m) 0 0.05 0.1 0.15 0.2 0.25 6.5 7 7.5 8 8.5 (d) pH [0.26] pH Distance (m) 0 0.05 0.1 0.15 0.2 0.25 50 100 150 200 (e) Ca2+ [24.89] mgL −1 Distance (m) 0 0.05 0.1 0.15 0.2 0.25 0 20 40 60 (f) Fe2+ [13.60] mgL −1 Distance (m) 0 0.05 0.1 0.15 0.2 0.25 0 30 60 90 120 (g) TIC [11.11] mgL −1 Distance (m) 0 0.05 0.1 0.15 0.2 0.25 6.5 7 7.5 8 8.5 (h) pH [0.26] pH Distance (m) 0 0.05 0.1 0.15 0.2 0.25 50 100 150 200 (i) Ca2+ [28.32] mgL −1 Distance (m) 0 0.05 0.1 0.15 0.2 0.25 0 20 40 60 (j) Fe2+ [8.93] mgL −1 Distance (m) 0 0.05 0.1 0.15 0.2 0.25 0 30 60 90 120 (k) TIC [13.33] mgL −1 Distance (m) 0 0.05 0.1 0.15 0.2 0.25 6.5 7 7.5 8 8.5 (l) pH [0.25] pH Distance (m)

(20)

integrating out from the joint posterior only a simplified covariance matrix, where specific covariances are set equal to 0. Using the residuals at the MAP parameter estimates of the MV method (Figures 10i–10l), the maximum off-diagonal correlation coefficient between different species was 0.86 (iron and TIC couple), whereas when the residuals were grouped by depth the maximum off-diagonal correlation coefficient was 0.84, for the measurements taken at 0.05 and 0.10 m from the column influent. When the residuals were grouped by pore volumes the maximum significant correlation coefficient was 0.95 (between pore volume 110 and 160). Based on these results, the MV method with residuals grouped by pore volumes could also be considered and the obtained results compared with those obtained in this study.

4.3. Computational Efficiency

In Table 2, the CPU hours required to achieve convergence of DREAM-ZS (with convergence indicated by the Gelman-Rubin R statistic below 1.2) are reported together with the number of model runs. Except for the synthetic case with high correlation, the MV method outperforms the WLS(we) method, with a speed up factor ranging from 1.2 (for the synthetic case with low correlation) to 3.8 (for the synthetic case with model error), thanks to analytical marginalization of the covariance matrix before running the MCMC algo-rithms. However, for the synthetic case with high correlation, chain convergence was reached faster in the WLS(we) method. In a hypothetical case where all the elements of the covariance matrix are explicitly esti-mated from the data (yielding a total of 14 parameters in our synthetic case), it is likely that the WLS(we) method will become more computational demanding than the MV method. For the WLS method the con-vergence was found to be slow and reached only after the specified maximum number of model runs (except for the synthetic case with high correlation).

5. Conclusions

A multivariate Bayesian methodology was presented for estimating reactive transport model parameters from multispecies data. It was shown that existing approaches, specifically the weighted least squares method (WLS) and the weighted least squares method with weight estimation (WLS(we)), fit within this framework and are obtained by making specific prior assumptions about the residual covariance matrix. The Bayesian perspective allows for generalizations to cases where residual correlation is important and for more efficient parameter estimation by analytically integrating out the variances and selected covariances from the joint posterior. One such method, referred to here as the MV method (for MultiVariate), incorpo-rates these two benefits. For the cases studied here, analytically integrating out the variances and selected covariances from the posterior leads to estimation of parameter and predictive uncertainty that is more effi-cient, in terms of CPU time, compared to the alternative of using MCMC to estimate model parameters and the covariance matrix together.

Using numerical experiments, the MV method was compared to WLS and WLS(we) to evaluate the effect of residual correlation (between species) on parameter and predictive uncertainty. It was found that residual correlation does not have a strong effect on predictive uncertainty and that parameter estimates and their posteriors were only affected at relatively high levels of residual correlation between species (0.9). Our results also show that all methods can be considered robust only for zero-mean residual distributions, as they all rely on this a priori assumption. In the presence of model structural errors, for example, due to an unaccounted for or improperly described reaction, strongly biased residual distributions and parameters values were obtained for all methods, as confirmed by large misfits between simulated and observed data and by estimated values for the parameters that reach prior bounds. The complete reaction network is rarely known in real systems and model structural errors are likely to be present. In such cases, more effort

Table 2. CPU Hours Required to Achieve Convergence of DREAM-ZSa Method

Synthetic Case Low Correlation

Synthetic Case High Correlation

Synthetic Case With

Model Error Real Case

WLS 32.1 (8000) 19.8 (4788) 21.9 (6000) 154.2 (14000)

WLS(we) 18.1 (4788) 24.7 (6384) 16.5 (4893) 111.0 (12078)

MV 15.0 (3591) 36.5 (7182) 4.3 (1200) 55.7 (6291)

a

Cytaty

Powiązane dokumenty

Volume 4 of the yearbook Sztuka Europy Wschod- niej • Искусство Восточной Европы • The Art of Eastern Europe is titled Henryk Siemiradzki and academism.. It

In Article I.1 we read, “Refl ecting the will of the citizens and States of Europe to build a common future, this Constitution establishes the European Union, on which the

The Hellenic National Cadastre: An Elemental Institution for Nurturing and Promoting Innovation in 3D Geospatial Data.. 5 th International FIG 3D Cadastre Workshop 18-20

Trudno jest jedno- znacznie rozstrzygnąć, którą interpretację tego imiesłowu powinniśmy przyjąć, ponieważ z jednej strony księżyc przedstawiony został jako ciało niebieskie

Zakład Badań nad Antykiem Chrześcijańskim KUL jako propagator myśli wczesno- chrześcijańskiej w Polsce w 30-lecie działalności, VoxP 19 (1999) t. na sympozjum Zakładu Antyku

The language of preference for the majority of scholarly papers published by “Religious and Sacred Poetry: An International Quarterly of Religion, Culture and Education”

In addition, the existence proof by means of a shooting argument gives rise to simple numerical algorithm, requiring only the solution of an initial value problem for an

W syntezie mariologii szkolnej autor nawet wprost podaje zasady odnowy czci Maryi według Marialis cultus. W ostatnim temacie książki autor poruszył zagadnienie maryjne­ go