BMA for ECOOP: Technical documentation

(1)

(2)

(3)

1 Background...1–1

2 BMA Theory ...2–1

3 Program description...3–1

4 Conclusions and recommendations...4–1

5 References ...5–1

Appendices

(4)

1 Background

The Dutch coastline is highly exposed to flooding and erosion under extreme weather

conditions. In order to foresee the thread of flooding, a highly sophisticated hydrodynamic

model: the Dutch Continental Shelf Model (DCSM, reviewed by Gerritsen, 1994), for

forecasting of water levels has been developed, as part of a warning system for extreme

events. This system has been operational for the last decades and has served its purpose

well.

In order to protect low-lying areas in South East England and the German Bight, similar

warning systems for extreme of water levels have been developed in the UK and Germany.

In fact, all countries bordering the North West European Shelf have established an efficient

storm surge prediction system. Recently, the national responsible agencies have agreed on a

closer co-operation with the purpose of improving the storm surge forecasts and thereby the

entire storm surge warning system.

The NOOS consortium (North West Shelf Operational Oceanographic System) is a

collaboration between partners from nine countries bordering the North Sea and North West

European Shelf (Belgium, Denmark, France, Germany, Ireland, Netherlands, Norway,

Sweden, and the UK). One of its goals is to develop an ocean observation and forecasting

systems for the North West Shelf area, making use of already existing building blocks as

much as possible. The NOOS partners are exchanging real time operational data, such as

water level observations and model forecasts.

ECOOP (European COastal-shelf sea OPerational monitoring and forecasting system) is an

EU project aiming to integrate and further develop existing operational observing and

forecasting systems for European coastal and regional seas. One of the goals of ECOOP is

to develop a storm surge forecasting system using pre-existing forecasting instruments as

building blocks. The work described in this paper is a step towards this integrated

forecasting system.

(5)

Table 1: NOOS partners and the forecasting models used in this study.

NOOS Partner

Country

Model

Bundesamt für Seeschifffahrt

und Hydrographie

Germany

BSH

Danish Meteorological Institute Denmark

DMI

Management Unit of the North

Sea Mathematical Models,

Belgian Royal Institute of

Natural Sciences

Belgium

MUMM

UK MetOffice

United Kingdom

UKMO

Norwegian Meteorological

Institute

Norway

DNMI

RIKZ, National Institute for

Coastal and Marine

Management and KNMI,

Koninklijk Nederlands

Meteorolologisch Instituut

Netherlands

DCSM, DCSMK

The hydrodynamic models solve shallow water equations on a rectangular or curvilinear

grid of several kilometres mesh size, which encloses the area of interest. Harmonic tidal

components are imposed at the boundaries. For the DCSM this area is depicted in Figure 1.

The other models in the NOOS community enclose different areas, but some include the

Dutch coast line and can thus be used as forecasts in this study.

(6)

The result of the calculation is a predicted water level at several stations along the coast. The

stations that were used in this study are shown in figure 2.

Figure 2: Water level stations used in this study.

(7)

2 BMA Theory

Bayesian Model Averaging (BMA) is a standard statistical approach for post-processing

ensemble forecasts from multiple competing models (Laemer, 1978). It has been widely

used in social and health sciences and was first applied to dynamical weather forecasting

models by Raftery (2005). The basic idea is to generate an overall forecast Probability

Distribution Function (PDF) by making a weighted average of the forecast PDF's from the

individual models. The weights represent the estimated posterior model probabilities, i.e. the

probability that a model will give the correct forecast PDF. In a dynamical model

application, the weights are continuously updated and determined by investigating the

performance of the competing models over a recent training period. In the current

application this training period is typically one to two weeks.

The mean of the overall forecast PDF has a mean that is generally better than the individual

model forecasts. Moreover, the overall forecast PDF provides a confidence interval, which

is valuable for many practical applications. The variance of the overall forecast PDF is the

result of two components. The first component is associated with the spread of the ensemble

members. The second component is the variance of the individual model forecast PDFs.

This latter component should also be determined over a training period, which can be

different from the training period mentioned earlier. In the present study identical training

periods were used to determine both the BMA weight and the variance of the individual

models.

The BMA method assumes that the probability of an observation y

obs

_{(s,t) at location s and}

time t is given by a weighted sum over a number of probability distributions g(y

fc

_(k,s,t))

from the different forecasting models k.

( , )

( )

( , , )

obs fc

k

P y

s t

w k g y k s t

(1)

The forecast distribution of each individual model is assumed a normal distribution with

variance (k).

2 2

( , )

( , , )

1 ( , , ) |

( , ), ( )

exp

2 ( )

( ) 2

obs fc fc obs

y

s t

y k s t

g y k s t y

s t

k

(2)

The BMA algorithm finds the optimal values for w(k) and (k), such that the likelihood of

the overall PDF (equation (1)) is maximal, given a set of historical forecasts and

observations.

(8)

( , ) |

( , , ), ( )

( , , )

( , ) |

( , , ), ( )

obs fc obs fc k

g y

s t y k s t

k

z k s t

g y

s t y k s t

k

(3)

The second step in the iterative algorithm is to determine the weights w(k) and variances

(k) of each of the models k, based on the values of z(k,s,t):

,

1 ( )

( , , )

s t

w k

z k s t

n

(4)

where n is the number of observations in the training period (s,t).

2 , 2 ,

( , )

( , , )

( )

( , , )

obs fc s t s t

z s t y

s t

y k s t

k

z k s t

(5)

The two steps are alternated to convergence, i.e. when w(k) and (k) no longer change after

a recalculation of z(k). One can use a convergence criterion or alternatively use a fixed

number of iteration cycles that should guarantee convergence. In the current setup it is found

that around 10 iteration cycles are generally sufficient, w(k) and (k) have reached a stable

value. To be sure that the iteration error is negligible, 12 cycles were used for the production

runs so far. For the dynamical model applications one can use the weights and variances

from the previous time step as a starting point for the new iteration.

A bias correction for the forecasts greatly improves the performance of the BMA. In fact,

without bias correction the BMA is forced to make a linear regression without a constant.

Adding a constant gives a much better result. The bias correction can be applied to the

forecasts over the training period before

After the iteration has converged an overall forecast mean for each of the stations can be

computed from equation (1). Due to missing forecasts, the sum of the weights can deviate

from unity. Therefore, a second normalization is applied to correct for missing forecast

values. This amounts to giving the remaining models a larger weight if one model is

missing.

The overall forecast confidence intervals are computed by integrating the weighted sum of

the individual forecast PDFs. For instance, the 10% quantile Q10(s,t_fc) is given by:

10

0.1 ( )

( , ,

)

Q fc fc k

w k g y k s t

dh

(6)

(9)

3 Program description

The BMA method is implemented as a collection of Perl script in Matroos, using

observations and forecasts for five locations along the Dutch coast (Delfzijl, Harlingen, Den

Helder, Hoek van Holland, Vlissingen) and forecasts from seven hydrodynamic models

(Table 1). The BMA module ‘main.pl’ is called every time a new DCSM forecast is received

in Matroos, which amounts to about every six hours.

The BMA script is an independent module within the Matroos environment. The Matroos

server treats the BMA module as a new source of forecasts: ‘bma_noos’. A pre-processing

module creates a ‘bma.inc’ and starts the BMA module. This module then reads the

necessary data from the Matroos database, using ‘readMatroos’. The output is written in

NOOS format to temporary files. The general Matroos module for reading data then reads

this information as if it were just another forecast.

The BMA method can be applied to water level or surge data. The latter is preferred,

because of the relatively large errors that are observed for models that are not calibrated for

the locations along the Dutch coast.

The observation and forecast data is not always complete in the sense that for locations

and/or time steps data values are missing. The BMA method can deal with this: equations

(4) and (5) allow for missing values in the data set (s,t). As long as there is at least one

forecast-observation combination in the training set a weight w(k) and a (k) can be

determined. Missing data causes the effective training period to be shorter and could lead to

problems if only very few data points remain. This is unlikely to happen, because missing

data over longer periods of time are usually noticed and solved in an operational

environment. Therefore, the current BMA implementation gives no special warning for this.

The algorithm is run by a collection of Perl modules, which will be described below. The

‘main.pl’ script is the central program that calls the other modules.

main.pl - Parameter settings

The ‘main.pl’ script starts off by defining a number of global parameters. They are stored in

a structure ‘BMA’:

my %BMA = ( 'models' => ['knmi_noos','knmi_noos_kalman','bsh_oper', 'mumm_oper','dmi_oper','dnmi_oper', 'ukmo_oper'], 'locations' => ['Delfzijl','HoekVanHolland','Vlissingen', 'Den Helder','Harlingen'],

'start_training' => 200709250000, # Start of the training period 'stop_training' => 200710020000, # end of the training period 'start_forecast' => 200710020000, # start of the forecast period 'stop_forecast' => 200710040000, # end of the forecast period 'forecast_horizon' => 1440, # minimum forecast horizon 'max_forecast_horizon' => 6000, # maximum forecast horizon 'interval' => 10, # minutes

'n_iterations' => 12, # number of iterations 'uncertainty_low' => 0.05,

(10)

The array ‘models’ defines the models that will be used in the BMA forecast. The array

‘locations’ defines the locations that will be considered. Note that only a single weight is

calculated for each model. If we include two locations that are very different, we will get a

forecast that gives, on average, the best result for both locations.

The parameter ‘start_training’ defines the start of the training period. This may include

forecasts from earlier times, giving a forecast for a time just within the training period. The

parameter ‘stop_training’ defines the end of the training period. For most operational

applications this will be the current time, including the most recent observations in the

training period. The length of the training period can be varied to optimize the BMA

forecast.

The parameter ‘start_forecast’ defines the start of the forecast. For most operational

applications this will be the current time. The parameter ‘stop_forecast’ defines the end of

the forecast. It can be any length, although the maximum forecast horizon of the models is a

sensible limit.

The parameter ‘forecast_horizon’ defines a lower limit to the time (in minutes) between the

issue of the forecast (the analysis time) and the forecast time (the time for which a water

level is predicted). A forecast horizon of 1440 (24 hrs) means that the most recent forecast at

24 hrs before the forecast from all models was taken. The average forecast time is therefore

more than 24 hrs, depending on the forecast rate of the model. For 12 hr update rates the

average forecast time is 30 hrs. Some test calculations were done on 12 hr forecast, giving

similar results. The parameter ‘max_forecast_horizon’ defines the maximum forecast time

(in minutes) for any of the models. It is used only to select forecasts that could be relevant

for the training period. For the actual training, the most recent forecast is used.

The parameter ‘nr_iterations’ defines the (fixed) number of iterations in the

expectation-maximization scheme. In the current setup we found that around 10 iteration cycles are

generally sufficient and we use 12 cycles for the production runs. The parameters

‘uncertainty_low’ and ‘uncertainty_high’ define the bounds of the confidence interval.

The above parameter settings in the ‘main.pl’ script are default values. They can be

overruled in a ‘bma.inc’ file, which has the following format:

$BMA{'start_training'} = '200709250600'; $BMA{'stop_training'} = '200710020600'; $BMA{'start_forecast'} = '200710020600'; $BMA{'stop_forecast'} = '200710040600'; $BMA{'locations'} = ['Delfzijl'];

If the ‘BMA.inc’ file is non-existent. The default values will be used.

main.pl - Calculations

(11)

The array “forecast” contains the BMA mean forecast (expectation value). The hash key

is “<location>:<minutes>”, where minutes is the number of minutes since January

1

st

_{2000. Model and location are entries from the arrays BMA::models and}

BMA::locations.

Array “h_fc” contains the model forecasts. Hash key:

“<model>:<location>:<minutes>”.

The array “bias” contains the bias over the training period per model. The hash key is

“<model>:<location>“.

The array “sigma” contains the computed spread for each model forecast, calculated over

the training period. The hash key is “<model>“.

The array “weight” contains the computed BMA weight for each model, calculated over

the training period. The hash key is “<model>“.

my ($forecast,$h_fc,$bias,$sigma,$weight) = BMA::BMA_forecast(\%BMA);

The routine ‘BMA_forecast’ is described in detail in the next section.

After this, the confidence interval is computed numerically in the routine

compute_uncertainty

.

This

returns

a

hash

table

confidence

, key is:

“<lower,upper>:<location>:<minutes>”. The confidence interval is, in fact,

two new time series: one for the lower bound and one for the upper bound.

my %confidence =

BMA::compute_uncertainty($uncertainty_low,$uncertainty_high, \%BMA,$forecast,$h_fc,$bias,$sigma,$weight);

BMA_forecast - Calculations

The routine ‘BMA_forecast’ starts by reading observations and forecasts from the

Matroos database:

my $dtg1 = $BMA{'start_training'};

my $dtg2 = $BMA{'stop_training'};

my ($observed,$nr_obs) = readdata::readMatroos($locations,

['observed'],$dtg1,$dtg2,-1,-1,'');

my ($h_fc,$nr_fc) =

readdata::readMatroos($locations,$models,$dtg1,

$dtg2,$forecast_horizon,$max_forecast_horizon,'');

The observations and forecasts are stored in ASCII files, carrying the name of the analysis

time and the name of the model (or ‘observed’ for the observations. The extension can be

either sealev or surge, depending on the quantity that we are training on.

Next, the bias is computed for each model over the training period (compute_bias), the

BMA EM-algorithm is run (compute_weights) and the BMA forecast is made, using

these weights (make_forecast).

%bias = compute_bias($BMA,$observed,$h_fc);

(12)

The result is stored in a table called BMA_fc, which is returned to ‘main.pl’.

main.pl - Output

Finally, the ‘main.pl’ script writes the output to file. Two new directories are created for the

lower and upper bounds of the confidence interval.

`mkdir -p bma_$percentage_low`; `mkdir -p bma_$percentage_high`;

Then, for each location, a file is created with the name <location>.sealev. Some

header information is printed:

"# BMA $percentage_low % confidence\n";

Next, the

time series from start_forecast to stop_forecast is printed. Finally, the files

are closed.

foreach my $s (@locations) { my $s1 = $s;

$s1 =~ s/\s//g;

open(BMA05,">bma_$percentage_low/$s1.sealev") or die "Could not open output file: $!\n";

open(BMA95,">bma_$percentage_high/$s1.sealev") or die "Could not open output file: $!\n";

print BMA05 "# BMA $percentage_low % confidence\n"; print BMA95 "# BMA $percentage_high % confidence\n";

for(my $t=$start_forecast; $t<$stop_forecast; $t+=$interval) { $dtg = dtg::min2dtg($t);

my $fc_low = $confidence{"lower:$s:$t"}; my $fc_up = $confidence{"upper:$s:$t"}; print BMA05 "$dtg $fc_low\n";

print BMA95 "$dtg $fc_up\n"; }

close (BMA05); close (BMA95); }

dtg.pm - Time units and conversions

The observation and forecast files are identified by a dtg (date-time-group). A dtg is a

10-digit number that contains a year, month, day, hour and minutes. For example:

200710030910 represents October 10, 2007, 10 minutes past 9 AM. Within the BMA

program, this dtg is converted to an integer for convenience reasons. This integer is the

number of minutes since January 1

st

1970 (Julian time). This reference date can be altered to

another date if necessary.

The dtg and number of minutes can be converted to each other by two routines in the

module dtg.

(13)

4 Conclusions and recommendations

The BMA method is a promising method for generating probabilistic water level forecasts

from an ensemble of model forecasts. The method is applicable in the Matroos environment,

making use of the information provided by NOOS partners. The main added value of the

BMA method is to produce a probabilistic forecast, rather than an additional deterministic,

or mean value forecast.

A test version of the BMA module is running within the VMWare version of Matroos since

October 2007. The BMA probabilistic forecasts (5, 50 and 95% levels) are being issued as

additional time series for the Matroos database. Figure 3 is a screendump of the Matroos

system with the BMA forecast (mean and confidence interval) as three additional time

series.

Figure 3: The Matroos surge ensemble prediction system with the BMA forecast (5, 50 and 95%) as three additional time series (orange).

Further activities within the scope of the current project (beginning of 2008) are:

Several small adjustments will be made to improve the quality if the BMA forecast. For

instance, the training period can be varied to optimize the performance.

From the forecasts up to the time of writing, it is observed that the bounds of the

confidence interval are sometimes unstable (see Figure 3). This problem needs to be

solved.

An evaluation and validation of the BMA method will take place by the end of the storm

season (spring 2008). The evaluation will focus on three aspects of the BMA forecast:

–

The correct prediction of the confidence interval. Ideally, after 100 events, 90

observations will be within the 90% confidence interval of the BMA forecast. Five

observations will be below the lower confidence bound and five events will be

above the upper bound.

(14)

square error (RMSE). If the RMSE of the BMA is larger than that of the DCSM

model, the BMA forecast is of little value to the forecasters.

–

Finally, the performance of the BMA method during extreme events should be

evaluated. A number of storm events will be studied.

As a limited extension of the current project, the following activities are recommended for

the beginning of 2008:

At the moment, a single weight is used for every forecast horizon. Using different

weights for different forecast horizons (e.g. 0-6 hrs, 6-12 hrs, >12 hrs) can improve the

quality of the BMA forecast. This will, however, introduce discontinuities in the BMA

forecast at the regime boundaries.

Within ECOOP, the suggestion has been made that an exceedance probability of certain

critical water levels are calculated as time series.

Instead of forecasting and training on the water levels or surge, one could train the

models on the values of the high tides (i.e. the maximum water level), since this is the

quantity that we are specifically interested in.

The source code should be revised such that it meets the demands of an operational

forecasting system. This should be done by professional IT people.

Further suggestions beyond the scope of the current project are:

In many cases a storm surge develops within only one or two days. The performance of

the forecasting models over the relatively quiet period before the storm is somewhat

irrelevant to the performance during the storm. One would prefer to determine the

predictive power of the individual models on the basis of similar situations and to train

the BMA on comparable storms. Similar situations can be selected by:

–

taking into account only the extremely high tides

–

similar meteorological situations, based on a few characteristics

Some of the NOOS partners are running their deterministic models in an ensemble,

using an ensemble of weather forcing input. Incorporating these ensembles in the BMA

method is not straightforward. Simply treating every member of the ensemble as a new

model ignores the fact that these forecasts were generated by the same model and they

should get the same weight. Some research is needed to find a method to include the

ensemble in the BMA method in a proper way.

(15)

5 References

Gerritsen (1994), H. Gerritsen, J.W. de Vries, M. Philippart – The Dutch Continental Shelf Model, Quantitative

Skill Assessment for Coastal Ocean Models, Coastal and Estuarine Studies, vol 48

NOOS,

www.noos.cc

Heemink (1990), A.W. Heemink, H. Kloosterhuis – Data assimilation for non-linear tidal models. Int. J. of numerical methods in fluids, 11 1097-1112

Leamer (1978), E.E. Leamer - Specification searches. Wiley, NY

Rafetery (2005), A.E. Raftery, T. Gneiting, F. Balabdaoui, M. Polakowski - Using Bayesian Model Averaging to

(16)

(17)

A

Source Code

Main.pl

#! /usr/bin/perl use FindBin ('$Bin'); use lib "$FindBin::Bin"; use BMA; use dtg; use readdata; use Cwd; my $debug = 0; ######################################################################## # make one forecast, based on a training period.

######################################################################## ######################################################################## ### set parameters ######################################################################## my %BMA = ( 'models' => ['knmi_noos','knmi_noos_kalman','bsh_oper','mumm_oper', 'dmi_oper','dnmi_oper','ukmo_oper'], 'locations' => ['Delfzijl','HoekVanHolland','Vlissingen','Den Helder', 'Harlingen'],

'start_training' => 200709250000, # Start of the training period 'stop_training' => 200710020000, # end of the training period 'start_forecast' => 200710020000, # start of the forecast period 'stop_forecast' => 200710040000, # end of the forecast period 'forecast_horizon' => 1440, # minimum forecast horizon used in training period (24 hrs)

'max_forecast_horizon' => 6000, # maximum forecast horizon for any of the models (100 hrs)

'interval' => 10, # minutes

'n_iterations' => 12, # number of iterations in BMA algorithm

'uncertainty_low' => 0.05, 'uncertainty_high' => 0.95, );

# Include user options my $pwd = cwd;

my $incfile = "$pwd/bma.inc"; if ( -f $incfile ) {

print "Including non-default options from file $incfile\n";

open INC,"<$incfile" or die "Cannot open include file $incfile $!\n"; my @lines = <INC>;

close (INC); eval "@lines"; }

(18)

my @models = @{$BMA{'models'}}; my @locations = @{$BMA{'locations'}}; my $nmodels = scalar(@models); my $nlocations = scalar(@locations);

print "models: @models\nlocations:@locations\n" if $debug; # Get uncertainty_low and _high

my $uncertainty_low = $BMA{'uncertainty_low'}; my $uncertainty_high = $BMA{'uncertainty_high'};

my $percentage_low = sprintf("%2.2d",$uncertainty_low *100); my $percentage_high = sprintf("%2.2d",$uncertainty_high*100);

######################################################################## ### do the actual calculations:

my ($forecast,$h_fc,$bias,$sigma,$weight) = BMA::BMA_forecast(\%BMA);

my %confidence = BMA::compute_uncertainty($uncertainty_low,$uncertainty_high,

\%BMA,$forecast,$h_fc,$bias,$sigma,$weight);

######################################################################## ### write debug output

### read observations for comparison: if ( $debug ) { my $dtg1 = $BMA{'start_forecast'}; my $dtg2 = $BMA{'stop_forecast'}; my ($observed,$nr_obs) = readdata::readMatroos(\@locations,['observed'], $dtg1,$dtg2,-1,-1,''); my %observed = %$observed; open(OUTFILE,">output/BMAforecast.txt") or die "Could not open output file: $!\n"; print OUTFILE "BMA forecast:\ntime\t";

for my $s(@locations) {

print OUTFILE "$s\t\t\tobs\t"; }

my $start_forecast = dtg::dtg2min($BMA{'start_forecast'}); my $stop_forecast = dtg::dtg2min($BMA{'stop_forecast'}); my $interval = $BMA{'interval'};

for(my $t=$start_forecast; $t<$stop_forecast; $t+=$interval) { $dtg = dtg::min2dtg($t); print OUTFILE "\n$dtg"; foreach my $s(@locations) { my $fc = $forecast{"$s:$t"}; my $fc_low = $confidence{"lower:$s:$t"}; my $fc_up = $confidence{"upper:$s:$t"}; my $obs = $observed{"$s:$t"};

print OUTFILE "\t$fc\t$fc_low\t$fc_up\t$obs"; }

(19)

my $start_forecast = dtg::dtg2min($BMA{'start_forecast'}); my $stop_forecast = dtg::dtg2min($BMA{'stop_forecast'}); my $interval = $BMA{'interval'}; `mkdir -p bma_$percentage_low`; `mkdir -p bma_$percentage_high`; foreach my $s (@locations) { my $s1 = $s; $s1 =~ s/\s//g; open(BMA05,">bma_$percentage_low/$s1.sealev") or die "Could not open output file: $!\n";

open(BMA95,">bma_$percentage_high/$s1.sealev") or die "Could not open output file: $!\n";

print BMA05 "# BMA $percentage_low % confidence\n"; print BMA95 "# BMA $percentage_high % confidence\n";

for(my $t=$start_forecast; $t<$stop_forecast; $t+=$interval) { $dtg = dtg::min2dtg($t);

my $fc_low = $confidence{"lower:$s:$t"}; my $fc_up = $confidence{"upper:$s:$t"}; print BMA05 "$dtg $fc_low\n";

print BMA95 "$dtg $fc_up\n"; } close (BMA05); close (BMA95);

}

readdata.pm

#!/usr/bin/perl package readdata; use strict; use dtg; my $debug = 0; #################################################################### # module readdata, functions:

# remove_no_obs removes the forecasts for which there is no observation. # readobservations() reads observation data.

# readMatroos() reads forecast data. #

# observations in %observed {location:minutes} # forecasts in %h_fc {model:location:min} #################################################################### #################################################################### sub readMatroos ($$$$$$$) { #################################################################### my ($locs,$sources,$start,$stop,$fc_min,$fc_max,$anal_time) = @_; $sources = ['observed'] unless $sources;

$fc_min = -1 unless $fc_min; $fc_max = -1 unless $fc_max;

(20)

">> locs : @$locs\n". ">> fc_min: $fc_min\n".

">> fc_max: $fc_max\n" if $debug; # declare return hash %values

my %values;

my $overallcount = 0;

# do for each requested location foreach my $loc (@$locs) {

# do for each requested source foreach my $source (@$sources) {

# Set up the script call for getting the data my $matroos = "wget --proxy=off -O - ".

"'http://matroos/direct/get_series.php?". "loc=$loc&source=$source&unit=waterlevel&". "tstart=$start&tstop=$stop&fc_min=$fc_min&". "fc_max=$fc_max&anal_time=$anal_time'"; # Get the data from Matroos

my $ret = `$matroos 2>/dev/null`; my @data = split ("\n",$ret); my $count= 0;

my $model= "$source:";

$model= '' if ($source eq 'observed'); # Interpret the output

foreach my $line (@data) {

if($line =~ m/^\s*(\d+)\s+([^\s]+)/) { my $time = dtg::dtg2min($1); my $waterlevel = $2; $values{"$model$loc:$time"} = $waterlevel; $count++; } }

print "read $count values for location $loc, source $source\n" if $debug; $overallcount += $count; } } return (\%values,$overallcount); } ### end readMatroos() #################################################################### sub remove_no_obs($$) { #################################################################### my ($observed,$h_fc) = @_; my %observed = %$observed; my %h_fc = %$h_fc; my $count = 0;

(21)

} else { if(abs($observed{$indx2} - $h_fc{$indx}) > 5) { undef($h_fc{$indx}); $count++; } } } else {

print "??? could not recognise indx: $indx\n"; exit(1); } } return $count } ################################################################## sub check_data($$$$$$) { ################################################################## my ($start,$stop,$model,$locations,$observed,$h_fc) = @_; my @locations = @$locations; my %observed = %$observed; my %h_fc = %$h_fc; $start = dtg::dtg2min($start); $stop = dtg::dtg2min($stop); ### check observed and forecasts open(CHECK1, ">output/checkobs.txt"); print CHECK1 "time";

foreach my $loc (@locations) { print CHECK1 "\t$loc"; }

print CHECK1 "\n";

for(my $time=$start; $time<$stop; $time+=10) { my $dtg = dtg::min2dtg($time);

print CHECK1 $dtg;

foreach my $loc (@locations) { my $obs = $observed{"$loc:$time"}; print CHECK1 "\t$obs";

} print CHECK1 "\n"; } close CHECK1; ### check forecast open(CHECK2, ">output/checkforecast.txt"); print CHECK2 "time";

foreach my $loc (@locations) { print CHECK2 "\t$loc"; }

print CHECK2 "\n";

for(my $time=$start; $time<$stop; $time+=10) { my $dtg = dtg::min2dtg($time);

print CHECK2 $dtg;

foreach my $loc (@locations) {

my $fc = $h_fc{"$model:$loc:$time"}; print CHECK2 "\t$fc";

}

(22)

close CHECK2; return; } 1; BMA.pm #!/usr/bin/env perl package BMA; use readdata; use dtg; use normal; my $debug = 0; %observed = ();

%h_fc = (); # water level forecasts in %h_fc{model:location:min}

%weight = (); # weight for each forecast model

%sigma = (); # sigma for each forecast model

%BMA_forecast = (); %avg_forecast = (); %confidence= ();

#################################################################### # make one forecast, based on a training period.

#################################################################### sub BMA_forecast($) {

my $BMA = shift; my %BMA = %$BMA;

# Get settings from %BMA

my $locations = $BMA{'locations'}; my $models = $BMA{'models'}; my $dtg1 = $BMA{'start_training'}; my $dtg2 = $BMA{'stop_training'}; my $interval = $BMA{'interval'}; my $forecast_horizon = $BMA{'forecast_horizon'}; my $max_forecast_horizon = $BMA{'max_forecast_horizon'}; ### read observation and forecast data

my ($observed,$nr_obs) = readdata::readMatroos($locations,['observed'], $dtg1,$dtg2,-1,-1,'');

print "read total of $nr_obs observations from $dtg1 to $dtg2\n"; %observed = %$observed;

my ($h_fc,$nr_fc) = readdata::readMatroos($locations,$models,$dtg1,$dtg2, $forecast_horizon,$max_forecast_horizon,'');

print "read total of $nr_fc forecasts from $dtg1 to $dtg2\n"; %h_fc = %$h_fc;

my $n_removed = readdata::remove_no_obs($observed,$h_fc);

(23)

### make forecast based on the Bayes weights (set forecast horizon to zero) my $dtg1 = $BMA{'start_forecast'}; my $dtg2 = $BMA{'stop_forecast'}; my $forecast_horizon = -1; ($h_fc,$nr_fc) = readdata::readMatroos($locations,$models,$dtg1,$dtg2, $forecast_horizon,$max_forecast_horizon,$dtg1); print "read total of $nr_fc forecasts from $dtg1 to $dtg2\n"; if ( $debug ) {

my ($obs2,$nr) = readdata::readMatroos($locations,['observed'], $dtg1,$dtg2,-1,-1,'');

print "read total of $nr observed from $dtg1 to $dtg2\n";

readdata::check_data($dtg1,$dtg2,"knmi_noos_kalman",$locations,$obs2,$h_fc) ; } %BMA_fc = make_forecast($BMA,\%weight,$h_fc,\%bias); return (\%BMA_fc,$h_fc,\%bias,$sigma,\%weight); } #################################################################### ### compute uncertainties ####################################################################sub compute_uncertainty($$$$$$$$) { my $lowerperc = shift; my $upperperc = shift; my ($BMA,$fc_BMA,$h_fc,$bias,$sigma,$weight) = @_; my %BMA = %$BMA; my %fc_BMA = %$fc_BMA; my %h_fc = %$h_fc; my %bias = %$bias; my %sigma = %$sigma; my %weight = %$weight; my $start_forecast = dtg::dtg2min($BMA{'start_forecast'}); my $stop_forecast = dtg::dtg2min($BMA{'stop_forecast'}); my $interval = $BMA{'interval'}; my @locations = @{$BMA{'locations'}}; my @models = @{$BMA{'models'}};

if($lowerperc <= 0) {print "error in compute_uncertainty\n"; exit(1); } if($upperperc >= 1) {print "error in compute_uncertainty\n"; exit(1); } if($lowerperc > $upperperc) {print "error in compute_uncertainty\n"; exit(1); }

print "compute confidence intervals for $lowerperc .. $upperperc.\n"; my $largest_sigma = 0;

my $smallest_sigma = 1; foreach my $k(@models) {

if(defined($sigma{$k}) && ($sigma{$k}> 0)) {

if($sigma{$k} < $smallest_sigma) { $smallest_sigma = $sigma{$k}; } if($sigma{$k} > $largest_sigma) { $largest_sigma = $sigma{$k}; } }

}

my $dHW = $smallest_sigma*0.01;

(24)

my $HW = $fc_BMA{"$s:$t"} - 3*$largest_sigma; my $CDF = 0;

my $count = 0;

while (($CDF < $upperperc) && ($count < 10000)) { my $totalweight = 0; my $addtoCDF = 0; foreach my $k(@models) { my $indx = "$k:$s:$t"; if(defined($h_fc{$indx})) { my $fc = $h_fc{$indx} - $bias{"$k:$s"}; my $PDF = normal::normal_PDF($HW, $fc, $sigma{$k}); $addtoCDF += $weight{$k}*$PDF; $totalweight += $weight{$k}; } } if($totalweight > 0) { $CDF += $dHW * $addtoCDF / $totalweight; } else { my $dtg = dtg::min2dtg($t);

print "no weights for $s at t=$t\n"; my $done = <>; exit(1);

}

if(($CDF > $lowerperc) && (!defined($confidence{"lower:$s:$t"} ))) { $confidence{"lower:$s:$t"} = $HW - 0.5*$dHW; } $HW += $dHW; $count++; } $confidence{"upper:$s:$t"} = $HW - 0.5*$dHW; } } return %confidence; } #################################################################### ### compute bias #################################################################### sub compute_bias($$$) { my ($BMA,$observed,$h_fc) = @_; my %BMA = %$BMA; my %observed = %$observed; my %h_fc = %$h_fc; my $t_start = dtg::dtg2min($BMA{'start_training'}); my $t_stop = dtg::dtg2min($BMA{'stop_training'}); my $t_intv = $BMA{'interval'}; my @locations = @{$BMA{'locations'}}; my @models = @{$BMA{'models'}}; my %bias = (); my %n_bias = ();

my @l = sort keys %observed; foreach my $s(@locations) {

(25)

$bias{"$k:$s"} = $h_fc{$indx} - $obs; $n_bias{"$k:$s"} = 1;

} else {

$bias{"$k:$s"} += $h_fc{$indx} - $obs; $n_bias{"$k:$s"}++; } } } } } }

foreach my $indx (keys %bias) { $bias{$indx} /= $n_bias{$indx};

print "bias $indx = $bias{$indx}\n" if $debug; }

return %bias; }

#################################################################### ### iteration: expectation-maximization algorithm

#################################################################### sub compute_weights() { my ($BMA,$observed,$h_fc,$bias) = @_; my %BMA = %$BMA; my %observed = %$observed; my %h_fc = %$h_fc; my %bias = $bias; my $start_training = dtg::dtg2min($BMA{'start_training'}); my $stop_training = dtg::dtg2min($BMA{'stop_training'}); my $interval = $BMA{'interval'}; my @locations = @{$BMA{'locations'}}; my @models = @{$BMA{'models'}}; my $n_iterations = $BMA{'n_iterations'}; ### general variables, used in the BMA algorithm

my %weight = (); # weights for each forecast model

my %sum_weight = (); # average weights for each forecast model

my %n_weight = (); # average weights for each forecast model

my %z; # z-function (see literature)

my $nmodels = scalar(@models); ### initialise

foreach my $k (@models) {

$weight{$k} = 1/$nmodels; # intitially the weight of all models is equal

$sum_weight{$k} = 0; $n_weight{$k} = 0; if($sigma{$k} <= 0) {

$sigma{$k} = 0.2; # if unknown, sigma is initialised to 20 cm. }

}

### iteration

for(my $i=0; $i<$n_iterations; $i++) { print "iteration $i\n";

(26)

foreach my $s (@locations) {

for (my $t = $start_training; $t<$stop_training; $t+=$interval) { if(defined($observed{"$s:$t"})) { my $norm = 0; my $obs = $observed{"$s:$t"}; foreach my $k (@models) { my $indx = "$k:$s:$t"; if(defined($h_fc{$indx})) { my $fc = $h_fc{$indx} - $bias{"$k:$s"};

$z{$indx} = normal::normal_PDF($obs, $fc, $sigma{$k}); $norm += $z{$indx};

} }

foreach my $k(@models) { ### normalise z

my $indx = "$k:$s:$t"; if(defined($h_fc{$indx})) { if($norm <= 0) {

print "norm=$norm, indx=$indx, k=$k\n";

print "fc=$fc, obs=$obs, sigma=$sigma{$k}, z=$z{$indx}\n"; } else { $z{$indx} /= $norm; } } } if($norm <= 0) { my $dtg = dtg::min2dtg($t);

print "no forecasts for $s at t=$dtg, exit.\n"; my $done = <>; exit(1);

} } } }

### step 2: maximization, compute sigma for my $k (@models) {

my $sigma2_k = 0; my $sum_z = 0;

for my $s (@locations) {

for (my $t = $start_training; $t<$stop_training; $t+=$interval) { if(defined($observed{"$s:$t"})) { my $indx = "$k:$s:$t"; $sigma2_k += $z{$indx}*normal::sqr($h_fc{$indx} -$bias{"$k:$s"} - $observed{"$s:$t"}); $sum_z += $z{$indx}; } } } if($sum_z > 0) { $sigma{$k} = sqrt($sigma2_k/$sum_z); } else { $sigma{$k} = 0; }

(27)

$weight{$k} = 0; my $nst = 0;

for my $s(@locations) {

for (my $t = $start_training; $t<$stop_training; $t+=$interval) { my $indx = "$k:$s:$t"; if(defined($z{$indx})) { $weight{$k} += $z{$indx}; $nst++; } } } if($nst > 0) { $weight{$k} /= $nst; } else { $weight{$k} = 0; } $sum_w += $weight{$k}; }

### sum weights may not be unity due to missing observations, make a correction for my $k(@models) { if($weight{$k} > 0) { $weight{$k} /= $sum_w; $sum_weight{$k} += $weight{$k}; $n_weight{$k}++; }

print "weight($k) = $weight{$k}\n" if $debug; }

}

return (\%sigma,%weight); }

#################################################################### ### make the actual BMA forecasts, using the Bayes weights

#################################################################### sub make_forecast() { my ($BMA,$weight,$h_fc,$bias) = @_; my %BMA = %$BMA; my %weight = %$weight; my %h_fc = %$h_fc; my %bias = $bias; my $start = dtg::dtg2min($BMA{'start_forecast'}); my $stop = dtg::dtg2min($BMA{'stop_forecast'}); my $interval = $BMA{'interval'}; my @locations = @{$BMA{'locations'}}; my @models = @{$BMA{'models'}}; print "Make forecast using weights:\n"; for my $k(@models) {

my $w = $weight{$k}; print "$k\t$w\n"; }

for(my $t=$start; $t<=$stop; $t+=$interval) { for my $s(@locations) {

(28)

my $indx = "$k:$s:$t"; if(defined($h_fc{$indx})) { my $fc = $h_fc{$indx} - $bias{"$k:$s"}; $nmodels++; $sumweights += $weight{$k}; $avg_forecast{"$s:$t"} += $fc; $BMA_forecast{"$s:$t"} += $weight{$k} * $fc; } } if($sumweights > 0) { $avg_forecast{"$s:$t"} /= $nmodels; $BMA_forecast{"$s:$t"} /= $sumweights; } else { my $dtg = dtg::min2dtg($t);

print "no forecasts for $s at t=$dtg.\n"; $avg_forecast{"$s:$t"} = 0; $BMA_forecast{"$s:$t"} = 0; } } } return %BMA_forecast; } 1; dtg.pm #!/usr/bin/env perl package dtg; use strict; use Time::Local; ######################################################################## # module dtg

# dtg2min() calculate minutes since jan 1 2000

# min2dtg() calculate dtg from minutes since jan 1 2000 #

######################################################################## sub dtg2min($) {

# Purpose: convert gregorian time to julian minutes # args: date/time string in format YYYYMMDDhhmm # output: julian minutes

my $dtg =shift; my ($year,$month,$day,$hour,$min) = $dtg=~/^(\d\d\d\d)(\d\d)(\d\d)(\d\d)(\d\d)/; my $secs = timegm(0,$min,$hour,$day,$month-1,$year); my $minutes = $secs/60; return $minutes; } sub min2dtg($) {

(29)

(30)

BMA for ECOOP: Technical documentation

Contents

1

Background...1–1

2

BMA Theory ...2–1

3

Program description...3–1

4

Conclusions and recommendations...4–1

5

References ...5–1

Appendices

1

Background

The Dutch coastline is highly exposed to flooding and erosion under extreme weather

conditions. In order to foresee the thread of flooding, a highly sophisticated hydrodynamic

model: the Dutch Continental Shelf Model (DCSM, reviewed by Gerritsen, 1994), for

forecasting of water levels has been developed, as part of a warning system for extreme

events. This system has been operational for the last decades and has served its purpose

well.

In order to protect low-lying areas in South East England and the German Bight, similar

warning systems for extreme of water levels have been developed in the UK and Germany.

In fact, all countries bordering the North West European Shelf have established an efficient

storm surge prediction system. Recently, the national responsible agencies have agreed on a

closer co-operation with the purpose of improving the storm surge forecasts and thereby the

entire storm surge warning system.

The NOOS consortium (North West Shelf Operational Oceanographic System) is a

collaboration between partners from nine countries bordering the North Sea and North West

European Shelf (Belgium, Denmark, France, Germany, Ireland, Netherlands, Norway,

Sweden, and the UK). One of its goals is to develop an ocean observation and forecasting

systems for the North West Shelf area, making use of already existing building blocks as

much as possible. The NOOS partners are exchanging real time operational data, such as

water level observations and model forecasts.

ECOOP (European COastal-shelf sea OPerational monitoring and forecasting system) is an

EU project aiming to integrate and further develop existing operational observing and

forecasting systems for European coastal and regional seas. One of the goals of ECOOP is

to develop a storm surge forecasting system using pre-existing forecasting instruments as

building blocks. The work described in this paper is a step towards this integrated

forecasting system.

NOOS Partner

Country

Model

Bundesamt für Seeschifffahrt

und Hydrographie

Germany

BSH

Danish Meteorological Institute Denmark

DMI

Management Unit of the North

Sea Mathematical Models,

Belgian Royal Institute of

Natural Sciences

Belgium

MUMM

UK MetOffice

United Kingdom

UKMO

Norwegian Meteorological

Institute

Norway

DNMI

RIKZ, National Institute for

Coastal and Marine

Management and KNMI,

Koninklijk Nederlands

Meteorolologisch Instituut

Netherlands

DCSM, DCSMK

The hydrodynamic models solve shallow water equations on a rectangular or curvilinear

grid of several kilometres mesh size, which encloses the area of interest. Harmonic tidal

components are imposed at the boundaries. For the DCSM this area is depicted in Figure 1.

The other models in the NOOS community enclose different areas, but some include the

Dutch coast line and can thus be used as forecasts in this study.

The result of the calculation is a predicted water level at several stations along the coast. The

stations that were used in this study are shown in figure 2.

2

BMA Theory

Bayesian Model Averaging (BMA) is a standard statistical approach for post-processing

ensemble forecasts from multiple competing models (Laemer, 1978). It has been widely

_{(s,t) at location s and}

_(k,s,t))