Contents
1
Background...1–1
2
BMA Theory ...2–1
3
Program description...3–1
4
Conclusions and recommendations...4–1
5
References ...5–1
Appendices
1
Background
The Dutch coastline is highly exposed to flooding and erosion under extreme weather
conditions. In order to foresee the thread of flooding, a highly sophisticated hydrodynamic
model: the Dutch Continental Shelf Model (DCSM, reviewed by Gerritsen, 1994), for
forecasting of water levels has been developed, as part of a warning system for extreme
events. This system has been operational for the last decades and has served its purpose
well.
In order to protect low-lying areas in South East England and the German Bight, similar
warning systems for extreme of water levels have been developed in the UK and Germany.
In fact, all countries bordering the North West European Shelf have established an efficient
storm surge prediction system. Recently, the national responsible agencies have agreed on a
closer co-operation with the purpose of improving the storm surge forecasts and thereby the
entire storm surge warning system.
The NOOS consortium (North West Shelf Operational Oceanographic System) is a
collaboration between partners from nine countries bordering the North Sea and North West
European Shelf (Belgium, Denmark, France, Germany, Ireland, Netherlands, Norway,
Sweden, and the UK). One of its goals is to develop an ocean observation and forecasting
systems for the North West Shelf area, making use of already existing building blocks as
much as possible. The NOOS partners are exchanging real time operational data, such as
water level observations and model forecasts.
ECOOP (European COastal-shelf sea OPerational monitoring and forecasting system) is an
EU project aiming to integrate and further develop existing operational observing and
forecasting systems for European coastal and regional seas. One of the goals of ECOOP is
to develop a storm surge forecasting system using pre-existing forecasting instruments as
building blocks. The work described in this paper is a step towards this integrated
forecasting system.
Table 1: NOOS partners and the forecasting models used in this study.
NOOS Partner
Country
Model
Bundesamt für Seeschifffahrt
und Hydrographie
Germany
BSH
Danish Meteorological Institute Denmark
DMI
Management Unit of the North
Sea Mathematical Models,
Belgian Royal Institute of
Natural Sciences
Belgium
MUMM
UK MetOffice
United Kingdom
UKMO
Norwegian Meteorological
Institute
Norway
DNMI
RIKZ, National Institute for
Coastal and Marine
Management and KNMI,
Koninklijk Nederlands
Meteorolologisch Instituut
Netherlands
DCSM, DCSMK
The hydrodynamic models solve shallow water equations on a rectangular or curvilinear
grid of several kilometres mesh size, which encloses the area of interest. Harmonic tidal
components are imposed at the boundaries. For the DCSM this area is depicted in Figure 1.
The other models in the NOOS community enclose different areas, but some include the
Dutch coast line and can thus be used as forecasts in this study.
The result of the calculation is a predicted water level at several stations along the coast. The
stations that were used in this study are shown in figure 2.
Figure 2: Water level stations used in this study.
2
BMA Theory
Bayesian Model Averaging (BMA) is a standard statistical approach for post-processing
ensemble forecasts from multiple competing models (Laemer, 1978). It has been widely
used in social and health sciences and was first applied to dynamical weather forecasting
models by Raftery (2005). The basic idea is to generate an overall forecast Probability
Distribution Function (PDF) by making a weighted average of the forecast PDF's from the
individual models. The weights represent the estimated posterior model probabilities, i.e. the
probability that a model will give the correct forecast PDF. In a dynamical model
application, the weights are continuously updated and determined by investigating the
performance of the competing models over a recent training period. In the current
application this training period is typically one to two weeks.
The mean of the overall forecast PDF has a mean that is generally better than the individual
model forecasts. Moreover, the overall forecast PDF provides a confidence interval, which
is valuable for many practical applications. The variance of the overall forecast PDF is the
result of two components. The first component is associated with the spread of the ensemble
members. The second component is the variance of the individual model forecast PDFs.
This latter component should also be determined over a training period, which can be
different from the training period mentioned earlier. In the present study identical training
periods were used to determine both the BMA weight and the variance of the individual
models.
The BMA method assumes that the probability of an observation y
obs(s,t) at location s and
time t is given by a weighted sum over a number of probability distributions g(y
fc(k,s,t))
from the different forecasting models k.
( , )
( )
( , , )
obs fc
k
P y
s t
w k g y k s t
(1)
The forecast distribution of each individual model is assumed a normal distribution with
variance (k).
2 2( , )
( , , )
1
( , , ) |
( , ), ( )
exp
2 ( )
( ) 2
obs fc fc obsy
s t
y k s t
g y k s t y
s t
k
k
k
(2)
The BMA algorithm finds the optimal values for w(k) and (k), such that the likelihood of
the overall PDF (equation (1)) is maximal, given a set of historical forecasts and
observations.
( , ) |
( , , ), ( )
( , , )
( , ) |
( , , ), ( )
obs fc obs fc kg y
s t y k s t
k
z k s t
g y
s t y k s t
k
(3)
The second step in the iterative algorithm is to determine the weights w(k) and variances
(k) of each of the models k, based on the values of z(k,s,t):
,
1
( )
( , , )
s tw k
z k s t
n
(4)
where n is the number of observations in the training period (s,t).
2 , 2 ,
( , )
( , )
( , , )
( )
( , , )
obs fc s t s tz s t y
s t
y k s t
k
z k s t
(5)
The two steps are alternated to convergence, i.e. when w(k) and (k) no longer change after
a recalculation of z(k). One can use a convergence criterion or alternatively use a fixed
number of iteration cycles that should guarantee convergence. In the current setup it is found
that around 10 iteration cycles are generally sufficient, w(k) and (k) have reached a stable
value. To be sure that the iteration error is negligible, 12 cycles were used for the production
runs so far. For the dynamical model applications one can use the weights and variances
from the previous time step as a starting point for the new iteration.
A bias correction for the forecasts greatly improves the performance of the BMA. In fact,
without bias correction the BMA is forced to make a linear regression without a constant.
Adding a constant gives a much better result. The bias correction can be applied to the
forecasts over the training period before
After the iteration has converged an overall forecast mean for each of the stations can be
computed from equation (1). Due to missing forecasts, the sum of the weights can deviate
from unity. Therefore, a second normalization is applied to correct for missing forecast
values. This amounts to giving the remaining models a larger weight if one model is
missing.
The overall forecast confidence intervals are computed by integrating the weighted sum of
the individual forecast PDFs. For instance, the 10% quantile Q10(s,t_fc) is given by:
10
0.1
( )
( , ,
)
Q fc fc kw k g y k s t
dh
(6)
3
Program description
The BMA method is implemented as a collection of Perl script in Matroos, using
observations and forecasts for five locations along the Dutch coast (Delfzijl, Harlingen, Den
Helder, Hoek van Holland, Vlissingen) and forecasts from seven hydrodynamic models
(Table 1). The BMA module ‘main.pl’ is called every time a new DCSM forecast is received
in Matroos, which amounts to about every six hours.
The BMA script is an independent module within the Matroos environment. The Matroos
server treats the BMA module as a new source of forecasts: ‘bma_noos’. A pre-processing
module creates a ‘bma.inc’ and starts the BMA module. This module then reads the
necessary data from the Matroos database, using ‘readMatroos’. The output is written in
NOOS format to temporary files. The general Matroos module for reading data then reads
this information as if it were just another forecast.
The BMA method can be applied to water level or surge data. The latter is preferred,
because of the relatively large errors that are observed for models that are not calibrated for
the locations along the Dutch coast.
The observation and forecast data is not always complete in the sense that for locations
and/or time steps data values are missing. The BMA method can deal with this: equations
(4) and (5) allow for missing values in the data set (s,t). As long as there is at least one
forecast-observation combination in the training set a weight w(k) and a (k) can be
determined. Missing data causes the effective training period to be shorter and could lead to
problems if only very few data points remain. This is unlikely to happen, because missing
data over longer periods of time are usually noticed and solved in an operational
environment. Therefore, the current BMA implementation gives no special warning for this.
The algorithm is run by a collection of Perl modules, which will be described below. The
‘main.pl’ script is the central program that calls the other modules.
main.pl - Parameter settings
The ‘main.pl’ script starts off by defining a number of global parameters. They are stored in
a structure ‘BMA’:
my %BMA = ( 'models' => ['knmi_noos','knmi_noos_kalman','bsh_oper', 'mumm_oper','dmi_oper','dnmi_oper', 'ukmo_oper'], 'locations' => ['Delfzijl','HoekVanHolland','Vlissingen', 'Den Helder','Harlingen'],'start_training' => 200709250000, # Start of the training period 'stop_training' => 200710020000, # end of the training period 'start_forecast' => 200710020000, # start of the forecast period 'stop_forecast' => 200710040000, # end of the forecast period 'forecast_horizon' => 1440, # minimum forecast horizon 'max_forecast_horizon' => 6000, # maximum forecast horizon 'interval' => 10, # minutes
'n_iterations' => 12, # number of iterations 'uncertainty_low' => 0.05,
The array ‘models’ defines the models that will be used in the BMA forecast. The array
‘locations’ defines the locations that will be considered. Note that only a single weight is
calculated for each model. If we include two locations that are very different, we will get a
forecast that gives, on average, the best result for both locations.
The parameter ‘start_training’ defines the start of the training period. This may include
forecasts from earlier times, giving a forecast for a time just within the training period. The
parameter ‘stop_training’ defines the end of the training period. For most operational
applications this will be the current time, including the most recent observations in the
training period. The length of the training period can be varied to optimize the BMA
forecast.
The parameter ‘start_forecast’ defines the start of the forecast. For most operational
applications this will be the current time. The parameter ‘stop_forecast’ defines the end of
the forecast. It can be any length, although the maximum forecast horizon of the models is a
sensible limit.
The parameter ‘forecast_horizon’ defines a lower limit to the time (in minutes) between the
issue of the forecast (the analysis time) and the forecast time (the time for which a water
level is predicted). A forecast horizon of 1440 (24 hrs) means that the most recent forecast at
24 hrs before the forecast from all models was taken. The average forecast time is therefore
more than 24 hrs, depending on the forecast rate of the model. For 12 hr update rates the
average forecast time is 30 hrs. Some test calculations were done on 12 hr forecast, giving
similar results. The parameter ‘max_forecast_horizon’ defines the maximum forecast time
(in minutes) for any of the models. It is used only to select forecasts that could be relevant
for the training period. For the actual training, the most recent forecast is used.
The parameter ‘nr_iterations’ defines the (fixed) number of iterations in the
expectation-maximization scheme. In the current setup we found that around 10 iteration cycles are
generally sufficient and we use 12 cycles for the production runs. The parameters
‘uncertainty_low’ and ‘uncertainty_high’ define the bounds of the confidence interval.
The above parameter settings in the ‘main.pl’ script are default values. They can be
overruled in a ‘bma.inc’ file, which has the following format:
$BMA{'start_training'} = '200709250600'; $BMA{'stop_training'} = '200710020600'; $BMA{'start_forecast'} = '200710020600'; $BMA{'stop_forecast'} = '200710040600'; $BMA{'locations'} = ['Delfzijl'];
If the ‘BMA.inc’ file is non-existent. The default values will be used.
main.pl - Calculations
The array “forecast” contains the BMA mean forecast (expectation value). The hash key
is “<location>:<minutes>”, where minutes is the number of minutes since January
1
st2000. Model and location are entries from the arrays BMA::models and
BMA::locations.
Array “h_fc” contains the model forecasts. Hash key:
“<model>:<location>:<minutes>”.
The array “bias” contains the bias over the training period per model. The hash key is
“<model>:<location>“.
The array “sigma” contains the computed spread for each model forecast, calculated over
the training period. The hash key is “<model>“.
The array “weight” contains the computed BMA weight for each model, calculated over
the training period. The hash key is “<model>“.
my ($forecast,$h_fc,$bias,$sigma,$weight) = BMA::BMA_forecast(\%BMA);
The routine ‘BMA_forecast’ is described in detail in the next section.
After this, the confidence interval is computed numerically in the routine
compute_uncertainty
.
This
returns
a
hash
table
confidence, key is:
“<lower,upper>:<location>:<minutes>”. The confidence interval is, in fact,
two new time series: one for the lower bound and one for the upper bound.
my %confidence =
BMA::compute_uncertainty($uncertainty_low,$uncertainty_high, \%BMA,$forecast,$h_fc,$bias,$sigma,$weight);
BMA_forecast - Calculations
The routine ‘BMA_forecast’ starts by reading observations and forecasts from the
Matroos database:
my $dtg1 = $BMA{'start_training'};
my $dtg2 = $BMA{'stop_training'};
my ($observed,$nr_obs) = readdata::readMatroos($locations,
['observed'],$dtg1,$dtg2,-1,-1,'');
my ($h_fc,$nr_fc) =
readdata::readMatroos($locations,$models,$dtg1,
$dtg2,$forecast_horizon,$max_forecast_horizon,'');
The observations and forecasts are stored in ASCII files, carrying the name of the analysis
time and the name of the model (or ‘observed’ for the observations. The extension can be
either sealev or surge, depending on the quantity that we are training on.
Next, the bias is computed for each model over the training period (compute_bias), the
BMA EM-algorithm is run (compute_weights) and the BMA forecast is made, using
these weights (make_forecast).
%bias = compute_bias($BMA,$observed,$h_fc);
The result is stored in a table called BMA_fc, which is returned to ‘main.pl’.
main.pl - Output
Finally, the ‘main.pl’ script writes the output to file. Two new directories are created for the
lower and upper bounds of the confidence interval.
`mkdir -p bma_$percentage_low`; `mkdir -p bma_$percentage_high`;
Then, for each location, a file is created with the name <location>.sealev. Some
header information is printed:
"# BMA $percentage_low % confidence\n";Next, the
time series from start_forecast to stop_forecast is printed. Finally, the files
are closed.
foreach my $s (@locations) { my $s1 = $s;
$s1 =~ s/\s//g;
open(BMA05,">bma_$percentage_low/$s1.sealev") or die "Could not open output file: $!\n";
open(BMA95,">bma_$percentage_high/$s1.sealev") or die "Could not open output file: $!\n";
print BMA05 "# BMA $percentage_low % confidence\n"; print BMA95 "# BMA $percentage_high % confidence\n";
for(my $t=$start_forecast; $t<$stop_forecast; $t+=$interval) { $dtg = dtg::min2dtg($t);
my $fc_low = $confidence{"lower:$s:$t"}; my $fc_up = $confidence{"upper:$s:$t"}; print BMA05 "$dtg $fc_low\n";
print BMA95 "$dtg $fc_up\n"; }
close (BMA05); close (BMA95); }
dtg.pm - Time units and conversions
The observation and forecast files are identified by a dtg (date-time-group). A dtg is a
10-digit number that contains a year, month, day, hour and minutes. For example:
200710030910 represents October 10, 2007, 10 minutes past 9 AM. Within the BMA
program, this dtg is converted to an integer for convenience reasons. This integer is the
number of minutes since January 1
st1970 (Julian time). This reference date can be altered to
another date if necessary.
The dtg and number of minutes can be converted to each other by two routines in the
module dtg.
4
Conclusions and recommendations
The BMA method is a promising method for generating probabilistic water level forecasts
from an ensemble of model forecasts. The method is applicable in the Matroos environment,
making use of the information provided by NOOS partners. The main added value of the
BMA method is to produce a probabilistic forecast, rather than an additional deterministic,
or mean value forecast.
A test version of the BMA module is running within the VMWare version of Matroos since
October 2007. The BMA probabilistic forecasts (5, 50 and 95% levels) are being issued as
additional time series for the Matroos database. Figure 3 is a screendump of the Matroos
system with the BMA forecast (mean and confidence interval) as three additional time
series.
Figure 3: The Matroos surge ensemble prediction system with the BMA forecast (5, 50 and 95%) as three additional time series (orange).
Further activities within the scope of the current project (beginning of 2008) are:
Several small adjustments will be made to improve the quality if the BMA forecast. For
instance, the training period can be varied to optimize the performance.
From the forecasts up to the time of writing, it is observed that the bounds of the
confidence interval are sometimes unstable (see Figure 3). This problem needs to be
solved.
An evaluation and validation of the BMA method will take place by the end of the storm
season (spring 2008). The evaluation will focus on three aspects of the BMA forecast:
–
The correct prediction of the confidence interval. Ideally, after 100 events, 90
observations will be within the 90% confidence interval of the BMA forecast. Five
observations will be below the lower confidence bound and five events will be
above the upper bound.
square error (RMSE). If the RMSE of the BMA is larger than that of the DCSM
model, the BMA forecast is of little value to the forecasters.
–
Finally, the performance of the BMA method during extreme events should be
evaluated. A number of storm events will be studied.
As a limited extension of the current project, the following activities are recommended for
the beginning of 2008:
At the moment, a single weight is used for every forecast horizon. Using different
weights for different forecast horizons (e.g. 0-6 hrs, 6-12 hrs, >12 hrs) can improve the
quality of the BMA forecast. This will, however, introduce discontinuities in the BMA
forecast at the regime boundaries.
Within ECOOP, the suggestion has been made that an exceedance probability of certain
critical water levels are calculated as time series.
Instead of forecasting and training on the water levels or surge, one could train the
models on the values of the high tides (i.e. the maximum water level), since this is the
quantity that we are specifically interested in.
The source code should be revised such that it meets the demands of an operational
forecasting system. This should be done by professional IT people.
Further suggestions beyond the scope of the current project are:
In many cases a storm surge develops within only one or two days. The performance of
the forecasting models over the relatively quiet period before the storm is somewhat
irrelevant to the performance during the storm. One would prefer to determine the
predictive power of the individual models on the basis of similar situations and to train
the BMA on comparable storms. Similar situations can be selected by:
–
taking into account only the extremely high tides
–
similar meteorological situations, based on a few characteristics
Some of the NOOS partners are running their deterministic models in an ensemble,
using an ensemble of weather forcing input. Incorporating these ensembles in the BMA
method is not straightforward. Simply treating every member of the ensemble as a new
model ignores the fact that these forecasts were generated by the same model and they
should get the same weight. Some research is needed to find a method to include the
ensemble in the BMA method in a proper way.
5
References
Gerritsen (1994), H. Gerritsen, J.W. de Vries, M. Philippart – The Dutch Continental Shelf Model, Quantitative
Skill Assessment for Coastal Ocean Models, Coastal and Estuarine Studies, vol 48
NOOS,
www.noos.cc
Heemink (1990), A.W. Heemink, H. Kloosterhuis – Data assimilation for non-linear tidal models. Int. J. of numerical methods in fluids, 11 1097-1112
Leamer (1978), E.E. Leamer - Specification searches. Wiley, NY
Rafetery (2005), A.E. Raftery, T. Gneiting, F. Balabdaoui, M. Polakowski - Using Bayesian Model Averaging to
A
Source Code
Main.pl
#! /usr/bin/perl use FindBin ('$Bin'); use lib "$FindBin::Bin"; use BMA; use dtg; use readdata; use Cwd; my $debug = 0; ######################################################################## # make one forecast, based on a training period.
######################################################################## ######################################################################## ### set parameters ######################################################################## my %BMA = ( 'models' => ['knmi_noos','knmi_noos_kalman','bsh_oper','mumm_oper', 'dmi_oper','dnmi_oper','ukmo_oper'], 'locations' => ['Delfzijl','HoekVanHolland','Vlissingen','Den Helder', 'Harlingen'],
'start_training' => 200709250000, # Start of the training period 'stop_training' => 200710020000, # end of the training period 'start_forecast' => 200710020000, # start of the forecast period 'stop_forecast' => 200710040000, # end of the forecast period 'forecast_horizon' => 1440, # minimum forecast horizon used in training period (24 hrs)
'max_forecast_horizon' => 6000, # maximum forecast horizon for any of the models (100 hrs)
'interval' => 10, # minutes
'n_iterations' => 12, # number of iterations in BMA algorithm
'uncertainty_low' => 0.05, 'uncertainty_high' => 0.95, );
# Include user options my $pwd = cwd;
my $incfile = "$pwd/bma.inc"; if ( -f $incfile ) {
print "Including non-default options from file $incfile\n";
open INC,"<$incfile" or die "Cannot open include file $incfile $!\n"; my @lines = <INC>;
close (INC); eval "@lines"; }
my @models = @{$BMA{'models'}}; my @locations = @{$BMA{'locations'}}; my $nmodels = scalar(@models); my $nlocations = scalar(@locations);
print "models: @models\nlocations:@locations\n" if $debug; # Get uncertainty_low and _high
my $uncertainty_low = $BMA{'uncertainty_low'}; my $uncertainty_high = $BMA{'uncertainty_high'};
my $percentage_low = sprintf("%2.2d",$uncertainty_low *100); my $percentage_high = sprintf("%2.2d",$uncertainty_high*100);
######################################################################## ### do the actual calculations:
my ($forecast,$h_fc,$bias,$sigma,$weight) = BMA::BMA_forecast(\%BMA);
my %confidence = BMA::compute_uncertainty($uncertainty_low,$uncertainty_high,
\%BMA,$forecast,$h_fc,$bias,$sigma,$weight);
######################################################################## ### write debug output
### read observations for comparison: if ( $debug ) { my $dtg1 = $BMA{'start_forecast'}; my $dtg2 = $BMA{'stop_forecast'}; my ($observed,$nr_obs) = readdata::readMatroos(\@locations,['observed'], $dtg1,$dtg2,-1,-1,''); my %observed = %$observed; open(OUTFILE,">output/BMAforecast.txt") or die "Could not open output file: $!\n"; print OUTFILE "BMA forecast:\ntime\t";
for my $s(@locations) {
print OUTFILE "$s\t\t\tobs\t"; }
my $start_forecast = dtg::dtg2min($BMA{'start_forecast'}); my $stop_forecast = dtg::dtg2min($BMA{'stop_forecast'}); my $interval = $BMA{'interval'};
for(my $t=$start_forecast; $t<$stop_forecast; $t+=$interval) { $dtg = dtg::min2dtg($t); print OUTFILE "\n$dtg"; foreach my $s(@locations) { my $fc = $forecast{"$s:$t"}; my $fc_low = $confidence{"lower:$s:$t"}; my $fc_up = $confidence{"upper:$s:$t"}; my $obs = $observed{"$s:$t"};
print OUTFILE "\t$fc\t$fc_low\t$fc_up\t$obs"; }
my $start_forecast = dtg::dtg2min($BMA{'start_forecast'}); my $stop_forecast = dtg::dtg2min($BMA{'stop_forecast'}); my $interval = $BMA{'interval'}; `mkdir -p bma_$percentage_low`; `mkdir -p bma_$percentage_high`; foreach my $s (@locations) { my $s1 = $s; $s1 =~ s/\s//g; open(BMA05,">bma_$percentage_low/$s1.sealev") or die "Could not open output file: $!\n";
open(BMA95,">bma_$percentage_high/$s1.sealev") or die "Could not open output file: $!\n";
print BMA05 "# BMA $percentage_low % confidence\n"; print BMA95 "# BMA $percentage_high % confidence\n";
for(my $t=$start_forecast; $t<$stop_forecast; $t+=$interval) { $dtg = dtg::min2dtg($t);
my $fc_low = $confidence{"lower:$s:$t"}; my $fc_up = $confidence{"upper:$s:$t"}; print BMA05 "$dtg $fc_low\n";
print BMA95 "$dtg $fc_up\n"; } close (BMA05); close (BMA95);
}
readdata.pm
#!/usr/bin/perl package readdata; use strict; use dtg; my $debug = 0; #################################################################### # module readdata, functions:# remove_no_obs removes the forecasts for which there is no observation. # readobservations() reads observation data.
# readMatroos() reads forecast data. #
# observations in %observed {location:minutes} # forecasts in %h_fc {model:location:min} #################################################################### #################################################################### sub readMatroos ($$$$$$$) { #################################################################### my ($locs,$sources,$start,$stop,$fc_min,$fc_max,$anal_time) = @_; $sources = ['observed'] unless $sources;
$fc_min = -1 unless $fc_min; $fc_max = -1 unless $fc_max;
">> locs : @$locs\n". ">> fc_min: $fc_min\n".
">> fc_max: $fc_max\n" if $debug; # declare return hash %values
my %values;
my $overallcount = 0;
# do for each requested location foreach my $loc (@$locs) {
# do for each requested source foreach my $source (@$sources) {
# Set up the script call for getting the data my $matroos = "wget --proxy=off -O - ".
"'http://matroos/direct/get_series.php?". "loc=$loc&source=$source&unit=waterlevel&". "tstart=$start&tstop=$stop&fc_min=$fc_min&". "fc_max=$fc_max&anal_time=$anal_time'"; # Get the data from Matroos
my $ret = `$matroos 2>/dev/null`; my @data = split ("\n",$ret); my $count= 0;
my $model= "$source:";
$model= '' if ($source eq 'observed'); # Interpret the output
foreach my $line (@data) {
if($line =~ m/^\s*(\d+)\s+([^\s]+)/) { my $time = dtg::dtg2min($1); my $waterlevel = $2; $values{"$model$loc:$time"} = $waterlevel; $count++; } }
print "read $count values for location $loc, source $source\n" if $debug; $overallcount += $count; } } return (\%values,$overallcount); } ### end readMatroos() #################################################################### sub remove_no_obs($$) { #################################################################### my ($observed,$h_fc) = @_; my %observed = %$observed; my %h_fc = %$h_fc; my $count = 0;
} else { if(abs($observed{$indx2} - $h_fc{$indx}) > 5) { undef($h_fc{$indx}); $count++; } } } else {
print "??? could not recognise indx: $indx\n"; exit(1); } } return $count } ################################################################## sub check_data($$$$$$) { ################################################################## my ($start,$stop,$model,$locations,$observed,$h_fc) = @_; my @locations = @$locations; my %observed = %$observed; my %h_fc = %$h_fc; $start = dtg::dtg2min($start); $stop = dtg::dtg2min($stop); ### check observed and forecasts open(CHECK1, ">output/checkobs.txt"); print CHECK1 "time";
foreach my $loc (@locations) { print CHECK1 "\t$loc"; }
print CHECK1 "\n";
for(my $time=$start; $time<$stop; $time+=10) { my $dtg = dtg::min2dtg($time);
print CHECK1 $dtg;
foreach my $loc (@locations) { my $obs = $observed{"$loc:$time"}; print CHECK1 "\t$obs";
} print CHECK1 "\n"; } close CHECK1; ### check forecast open(CHECK2, ">output/checkforecast.txt"); print CHECK2 "time";
foreach my $loc (@locations) { print CHECK2 "\t$loc"; }
print CHECK2 "\n";
for(my $time=$start; $time<$stop; $time+=10) { my $dtg = dtg::min2dtg($time);
print CHECK2 $dtg;
foreach my $loc (@locations) {
my $fc = $h_fc{"$model:$loc:$time"}; print CHECK2 "\t$fc";
}
close CHECK2; return; } 1; BMA.pm #!/usr/bin/env perl package BMA; use readdata; use dtg; use normal; my $debug = 0; %observed = ();
%h_fc = (); # water level forecasts in %h_fc{model:location:min}
%weight = (); # weight for each forecast model
%sigma = (); # sigma for each forecast model
%BMA_forecast = (); %avg_forecast = (); %confidence= ();
#################################################################### # make one forecast, based on a training period.
#################################################################### sub BMA_forecast($) {
my $BMA = shift; my %BMA = %$BMA;
# Get settings from %BMA
my $locations = $BMA{'locations'}; my $models = $BMA{'models'}; my $dtg1 = $BMA{'start_training'}; my $dtg2 = $BMA{'stop_training'}; my $interval = $BMA{'interval'}; my $forecast_horizon = $BMA{'forecast_horizon'}; my $max_forecast_horizon = $BMA{'max_forecast_horizon'}; ### read observation and forecast data
my ($observed,$nr_obs) = readdata::readMatroos($locations,['observed'], $dtg1,$dtg2,-1,-1,'');
print "read total of $nr_obs observations from $dtg1 to $dtg2\n"; %observed = %$observed;
my ($h_fc,$nr_fc) = readdata::readMatroos($locations,$models,$dtg1,$dtg2, $forecast_horizon,$max_forecast_horizon,'');
print "read total of $nr_fc forecasts from $dtg1 to $dtg2\n"; %h_fc = %$h_fc;
my $n_removed = readdata::remove_no_obs($observed,$h_fc);
### make forecast based on the Bayes weights (set forecast horizon to zero) my $dtg1 = $BMA{'start_forecast'}; my $dtg2 = $BMA{'stop_forecast'}; my $forecast_horizon = -1; ($h_fc,$nr_fc) = readdata::readMatroos($locations,$models,$dtg1,$dtg2, $forecast_horizon,$max_forecast_horizon,$dtg1); print "read total of $nr_fc forecasts from $dtg1 to $dtg2\n"; if ( $debug ) {
my ($obs2,$nr) = readdata::readMatroos($locations,['observed'], $dtg1,$dtg2,-1,-1,'');
print "read total of $nr observed from $dtg1 to $dtg2\n";
readdata::check_data($dtg1,$dtg2,"knmi_noos_kalman",$locations,$obs2,$h_fc) ; } %BMA_fc = make_forecast($BMA,\%weight,$h_fc,\%bias); return (\%BMA_fc,$h_fc,\%bias,$sigma,\%weight); } #################################################################### ### compute uncertainties ####################################################################sub compute_uncertainty($$$$$$$$) { my $lowerperc = shift; my $upperperc = shift; my ($BMA,$fc_BMA,$h_fc,$bias,$sigma,$weight) = @_; my %BMA = %$BMA; my %fc_BMA = %$fc_BMA; my %h_fc = %$h_fc; my %bias = %$bias; my %sigma = %$sigma; my %weight = %$weight; my $start_forecast = dtg::dtg2min($BMA{'start_forecast'}); my $stop_forecast = dtg::dtg2min($BMA{'stop_forecast'}); my $interval = $BMA{'interval'}; my @locations = @{$BMA{'locations'}}; my @models = @{$BMA{'models'}};
if($lowerperc <= 0) {print "error in compute_uncertainty\n"; exit(1); } if($upperperc >= 1) {print "error in compute_uncertainty\n"; exit(1); } if($lowerperc > $upperperc) {print "error in compute_uncertainty\n"; exit(1); }
print "compute confidence intervals for $lowerperc .. $upperperc.\n"; my $largest_sigma = 0;
my $smallest_sigma = 1; foreach my $k(@models) {
if(defined($sigma{$k}) && ($sigma{$k}> 0)) {
if($sigma{$k} < $smallest_sigma) { $smallest_sigma = $sigma{$k}; } if($sigma{$k} > $largest_sigma) { $largest_sigma = $sigma{$k}; } }
}
my $dHW = $smallest_sigma*0.01;
my $HW = $fc_BMA{"$s:$t"} - 3*$largest_sigma; my $CDF = 0;
my $count = 0;
while (($CDF < $upperperc) && ($count < 10000)) { my $totalweight = 0; my $addtoCDF = 0; foreach my $k(@models) { my $indx = "$k:$s:$t"; if(defined($h_fc{$indx})) { my $fc = $h_fc{$indx} - $bias{"$k:$s"}; my $PDF = normal::normal_PDF($HW, $fc, $sigma{$k}); $addtoCDF += $weight{$k}*$PDF; $totalweight += $weight{$k}; } } if($totalweight > 0) { $CDF += $dHW * $addtoCDF / $totalweight; } else { my $dtg = dtg::min2dtg($t);
print "no weights for $s at t=$t\n"; my $done = <>; exit(1);
}
if(($CDF > $lowerperc) && (!defined($confidence{"lower:$s:$t"} ))) { $confidence{"lower:$s:$t"} = $HW - 0.5*$dHW; } $HW += $dHW; $count++; } $confidence{"upper:$s:$t"} = $HW - 0.5*$dHW; } } return %confidence; } #################################################################### ### compute bias #################################################################### sub compute_bias($$$) { my ($BMA,$observed,$h_fc) = @_; my %BMA = %$BMA; my %observed = %$observed; my %h_fc = %$h_fc; my $t_start = dtg::dtg2min($BMA{'start_training'}); my $t_stop = dtg::dtg2min($BMA{'stop_training'}); my $t_intv = $BMA{'interval'}; my @locations = @{$BMA{'locations'}}; my @models = @{$BMA{'models'}}; my %bias = (); my %n_bias = ();
my @l = sort keys %observed; foreach my $s(@locations) {
$bias{"$k:$s"} = $h_fc{$indx} - $obs; $n_bias{"$k:$s"} = 1;
} else {
$bias{"$k:$s"} += $h_fc{$indx} - $obs; $n_bias{"$k:$s"}++; } } } } } }
foreach my $indx (keys %bias) { $bias{$indx} /= $n_bias{$indx};
print "bias $indx = $bias{$indx}\n" if $debug; }
return %bias; }
#################################################################### ### iteration: expectation-maximization algorithm
#################################################################### sub compute_weights() { my ($BMA,$observed,$h_fc,$bias) = @_; my %BMA = %$BMA; my %observed = %$observed; my %h_fc = %$h_fc; my %bias = $bias; my $start_training = dtg::dtg2min($BMA{'start_training'}); my $stop_training = dtg::dtg2min($BMA{'stop_training'}); my $interval = $BMA{'interval'}; my @locations = @{$BMA{'locations'}}; my @models = @{$BMA{'models'}}; my $n_iterations = $BMA{'n_iterations'}; ### general variables, used in the BMA algorithm
my %weight = (); # weights for each forecast model
my %sum_weight = (); # average weights for each forecast model
my %n_weight = (); # average weights for each forecast model
my %z; # z-function (see literature)
my $nmodels = scalar(@models); ### initialise
foreach my $k (@models) {
$weight{$k} = 1/$nmodels; # intitially the weight of all models is equal
$sum_weight{$k} = 0; $n_weight{$k} = 0; if($sigma{$k} <= 0) {
$sigma{$k} = 0.2; # if unknown, sigma is initialised to 20 cm. }
}
### iteration
for(my $i=0; $i<$n_iterations; $i++) { print "iteration $i\n";
foreach my $s (@locations) {
for (my $t = $start_training; $t<$stop_training; $t+=$interval) { if(defined($observed{"$s:$t"})) { my $norm = 0; my $obs = $observed{"$s:$t"}; foreach my $k (@models) { my $indx = "$k:$s:$t"; if(defined($h_fc{$indx})) { my $fc = $h_fc{$indx} - $bias{"$k:$s"};
$z{$indx} = normal::normal_PDF($obs, $fc, $sigma{$k}); $norm += $z{$indx};
} }
foreach my $k(@models) { ### normalise z
my $indx = "$k:$s:$t"; if(defined($h_fc{$indx})) { if($norm <= 0) {
print "norm=$norm, indx=$indx, k=$k\n";
print "fc=$fc, obs=$obs, sigma=$sigma{$k}, z=$z{$indx}\n"; } else { $z{$indx} /= $norm; } } } if($norm <= 0) { my $dtg = dtg::min2dtg($t);
print "no forecasts for $s at t=$dtg, exit.\n"; my $done = <>; exit(1);
} } } }
### step 2: maximization, compute sigma for my $k (@models) {
my $sigma2_k = 0; my $sum_z = 0;
for my $s (@locations) {
for (my $t = $start_training; $t<$stop_training; $t+=$interval) { if(defined($observed{"$s:$t"})) { my $indx = "$k:$s:$t"; $sigma2_k += $z{$indx}*normal::sqr($h_fc{$indx} -$bias{"$k:$s"} - $observed{"$s:$t"}); $sum_z += $z{$indx}; } } } if($sum_z > 0) { $sigma{$k} = sqrt($sigma2_k/$sum_z); } else { $sigma{$k} = 0; }
$weight{$k} = 0; my $nst = 0;
for my $s(@locations) {
for (my $t = $start_training; $t<$stop_training; $t+=$interval) { my $indx = "$k:$s:$t"; if(defined($z{$indx})) { $weight{$k} += $z{$indx}; $nst++; } } } if($nst > 0) { $weight{$k} /= $nst; } else { $weight{$k} = 0; } $sum_w += $weight{$k}; }
### sum weights may not be unity due to missing observations, make a correction for my $k(@models) { if($weight{$k} > 0) { $weight{$k} /= $sum_w; $sum_weight{$k} += $weight{$k}; $n_weight{$k}++; }
print "weight($k) = $weight{$k}\n" if $debug; }
}
return (\%sigma,%weight); }
#################################################################### ### make the actual BMA forecasts, using the Bayes weights
#################################################################### sub make_forecast() { my ($BMA,$weight,$h_fc,$bias) = @_; my %BMA = %$BMA; my %weight = %$weight; my %h_fc = %$h_fc; my %bias = $bias; my $start = dtg::dtg2min($BMA{'start_forecast'}); my $stop = dtg::dtg2min($BMA{'stop_forecast'}); my $interval = $BMA{'interval'}; my @locations = @{$BMA{'locations'}}; my @models = @{$BMA{'models'}}; print "Make forecast using weights:\n"; for my $k(@models) {
my $w = $weight{$k}; print "$k\t$w\n"; }
for(my $t=$start; $t<=$stop; $t+=$interval) { for my $s(@locations) {
my $indx = "$k:$s:$t"; if(defined($h_fc{$indx})) { my $fc = $h_fc{$indx} - $bias{"$k:$s"}; $nmodels++; $sumweights += $weight{$k}; $avg_forecast{"$s:$t"} += $fc; $BMA_forecast{"$s:$t"} += $weight{$k} * $fc; } } if($sumweights > 0) { $avg_forecast{"$s:$t"} /= $nmodels; $BMA_forecast{"$s:$t"} /= $sumweights; } else { my $dtg = dtg::min2dtg($t);
print "no forecasts for $s at t=$dtg.\n"; $avg_forecast{"$s:$t"} = 0; $BMA_forecast{"$s:$t"} = 0; } } } return %BMA_forecast; } 1; dtg.pm #!/usr/bin/env perl package dtg; use strict; use Time::Local; ######################################################################## # module dtg
# dtg2min() calculate minutes since jan 1 2000
# min2dtg() calculate dtg from minutes since jan 1 2000 #
######################################################################## sub dtg2min($) {
# Purpose: convert gregorian time to julian minutes # args: date/time string in format YYYYMMDDhhmm # output: julian minutes
my $dtg =shift; my ($year,$month,$day,$hour,$min) = $dtg=~/^(\d\d\d\d)(\d\d)(\d\d)(\d\d)(\d\d)/; my $secs = timegm(0,$min,$hour,$day,$month-1,$year); my $minutes = $secs/60; return $minutes; } sub min2dtg($) {