Data mining, knowledge discovery and data-driven modelling


Acknowledgement

The authors acknowledge all partners mentioned in the report.

Conditions of (re-)use of this publication

Parts of this report may be cited only with the written permission of the authors.

Title:

Data mining, knowledge discovery and data-driven modelling

Author:

D.P. Solomatine

Institute:

IHE Delft

Author:

S. Velickov

Institute:

IHE Delft

Author:

B. Bhattacharya

Institute:

IHE Delft

Author:

Bas van der Wal

Institute:

STOWA

June 2003

Number of pages

Keywords (3-5) :

Data mining, data-driven modelling, neural networks, fuzzy systems, chaos theory, cone penetration tests, water levels, sedimentation, polder management

DC-Publication-number: DC1-752-1

Institute Publication-number (optional):

Report Type: Intermediary report or study / Final project report

DUP-publication Type: DUP Standard


Abstract

The project was aimed at exploring the possibilities of a new paradigm in modelling: data-driven modelling, often referred to as “data mining”. Several application areas were considered: sedimentation problems in the Port of Rotterdam, automatic soil classification on the basis of cone penetration tests, prediction of water levels in the ocean, and others. The methods applied included artificial neural networks, fuzzy systems, M5 model trees and chaos theory. The knowledge was disseminated by the following means: several publications at conferences and in peer-reviewed journals; a symposium for 80 Dutch participants from various civil engineering organisations (April 2002); and a Web-based platform with the main project results, software, and a knowledge base of best practices and publications. The research methodologies developed were used in practice in the Delft Cluster research Themes and by external partners. Two PhD studies are underway and four MSc projects were successfully completed. A comprehensive Knowledge Platform (http://datamining.ihe.nl) was built; it includes educational components, links to news and events in the area of data-driven modelling, a knowledge base with publications, software and best practices, and software with examples of real case studies.

The results achieved make it possible to conclude that data-driven methods can be used effectively in a wide range of civil engineering problems, complementing and sometimes replacing more traditional simulation models.

PROJECT NAME: Data mining, knowledge discovery and data-driven modelling

PROJECT CODE: 07.05.02

BASEPROJECT NAME: Encapsulated knowledge systems

BASEPROJECT CODE: 07.05

THEME NAME: Knowledge management

THEME CODE: 07


Executive Summary

Data mining, machine learning and data-driven modelling are becoming effective approaches to solving modelling problems. The project was aimed at exploring the possibilities of this new paradigm. A number of applications were considered: sedimentation problems in the Port of Rotterdam, automatic soil classification on the basis of cone penetration tests, prediction of water levels in the ocean, and others. The methods applied included artificial neural networks, fuzzy systems, M5 model trees and chaos theory. The knowledge was disseminated by the following means: several publications at conferences and in peer-reviewed journals; a symposium for 80 Dutch participants from various civil engineering organisations (April 2002); and a Web-based platform with the main project results, software, and a knowledge base of best practices and publications. The research methodologies developed were used in practice in the Delft Cluster research Themes and by external partners. Two PhD studies are underway and four MSc projects were successfully completed.

For dissemination of knowledge, the Knowledge Platform (http://datamining.ihe.nl) was built; it includes the educational components, links to the news and events in the area of data-driven modelling, knowledge base with the publications, software and best practices, and software with the examples of real case studies.

It can be concluded that the data-driven methods can be effectively used in a wide range of civil engineering problems, complementing and sometimes replacing more traditional simulation models.



Applicability for the sector

The methods developed are clearly applicable in industry. Wherever there is enough data describing the processes under study, data-driven modelling methods can be applied. They can successfully complement or even replace some traditional physically-based (behavioural) models.

Some of the applications (sedimentation modelling, real-time control of water systems, identification of soil types on the basis of cone penetration tests, and others) are described in detail in the report.


Societal Relevance of the research

The societal relevance of this research stems from the fact that engineers have better tools to model the relationships between the various factors influencing civil engineering objects. This has a positive influence on the way management is done, since managers will have more accurate estimates of the consequences of their actions and, consequently, people will benefit.

The knowledge generated could be interesting to mass media since it demonstrates how methods developed in one area of science (computational intelligence) can be successfully used in another, much more traditional, area – civil engineering.


1 Basic information about the project

This section is based on the Project Proposal but is reformulated and complemented in a way that gives indication of the achieved results.

1.1 Project objectives

1.1.1 Short term

• to initiate work packages and pilots together with other DC Themes and external parties oriented towards the use of data mining tools in solving practical civil engineering problems;

• to initiate Ph.D. research into specialised data mining and data-driven modelling techniques and tools that will constitute components of the KM platform.

1.1.2 Long term

• to study and select the existing data mining, knowledge discovery and data-driven techniques, and practically apply them to solve practical problems posed by the DC Themes and partners;

• to design and develop prototypes of data mining, knowledge discovery and data-driven modelling tools, and to make components of them available on the Internet;

• through the practical use in Themes and external partners, to introduce these procedures and tools into the research and engineering practice of DC organisations and external partners, taking into account the possible cultural and sociological aspects;

• to conduct Ph.D. and MSc studies into specialised data mining and data-driven modelling techniques and their practical applications.

1.2 Relationship to the objectives and programme of DC

1.2.1 Relationship to the objectives of DC

The DC programme is seen as support for strategic long-term research in the designated areas. It is widely accepted that such research and development activities must be supported by state-of-the-art computer-based tools, including those supporting knowledge management and modelling activities. Such modelling systems are seen as electronically encapsulated knowledge. The basisproject Encapsulated knowledge systems is subdivided into two main parts: one oriented at improving the methods and efficiency of the physically-based modelling efforts, and the other oriented at the practical application of the so-called data-driven modelling and data mining efforts. Data mining, knowledge discovery in datasets and data management are an important part of knowledge management. The availability of, and efficient access to, advanced data mining and knowledge discovery tools is of crucial importance for many DC partners. It is important to research the availability of such techniques and tools, analyse their applicability to DC-related problems and, on this basis, develop the suite of tools to be included in the Knowledge Platform.

To date, most of the DC organisations (WL, IHE, TNO) have carried out applied research in some areas related to specific engineering applications of data mining and data-driven modelling techniques. In general, however, it must be admitted that the civil engineering community is lagging behind in the acceptance and use of the most advanced and efficient tools, and is hence missing the considerable benefits of using them. Development, wider exploration and acceptance of generic on-line data-driven modelling (DDM) tools, and their connection to the available physically-based models, will make it possible to improve the modelling component of the working culture in DC-related areas and to raise the level of research and development, and are hence clearly needed.

Overall, advances in data mining and data-driven modelling will considerably contribute to achieving the final goals of the basisproject and DC as a whole.

By drawing on the results achieved in the area of data mining and knowledge discovery, by making modelling and data mining components available over the Internet as part of the Knowledge Platform, and by integrating them with other applications in the Internet environment, the DC TGs and partners as well as their customers will benefit from:

• more accurate representation and modelling of data-rich processes in civil engineering, which would lead to more effective engineering decisions and resulting commercial benefits;

• adoption of data-driven modelling techniques in work practices as a complement to the more traditional physically-based models;

• sharing knowledge and components encapsulating this knowledge across networked environments, thus reducing development and maintenance costs.

1.2.2 Activities in cooperation with the other Theme groups

Success of this project depends on how other DC themes will use its results. As planned, several activities oriented towards very practical applications of the DDM technologies were run.

TG5 (two projects)

Project 05.02.03 "Subsurface structure prediction (semi-automatic interpretation of CPTs)". Datasets related to soil properties were analysed together with TG5, with the involvement of TUDelft and GeoDelft. In this activity it was planned to select, test and demonstrate the use of data mining techniques in the interpretation of cone penetration tests (CPT), namely semi-automatic analysis of CPTs for the prediction of soil types and parameters. This was done on the basis of statistical correlation of geological, geomechanical and geohydrological parameters. As a result, a better image of the subsurface structure and properties can be drawn using site investigation results that are already widely available in the Netherlands.

Project 05.02.07 "Sedimentation model for the port of Rotterdam". Exploratory data analysis of the sedimentation characteristics of the approach channel of the port of Rotterdam was performed. Most of the effort went into building a data-driven model of the sedimentation of Rotterdam harbour. This model was based on artificial neural networks.

TG6

Project 06.02.07 "Kennishoudelijk aanvulling Standaard Raamwerk Water": investigation of various techniques that can be used in "data-rich" automatic calibration routines. This work was conducted together with the partners of the Standaard Raamwerk Water (SRW) consortium (TNO-NITG, Alterra, WL, STOWA, RIZA). Various randomised search techniques were investigated (controlled random search, genetic algorithms, etc.) and a framework for implementing one or more of these techniques in the SRW framework was set up. The reader is referred to the Final Report of project 06.02.07.

All themes involved in the "Open Systems" project (TG1, TG5, TG7). This work was aimed mainly at establishing a common understanding of standards for data exchange between the different modelling components developed in Open Systems and in this project.


present an alternative to the latter. Instead of using physical description and relevant (differential) equations, DDM is based on the analysis of historical data sets describing the system and often aims at establishing functional relationships between input and output and at various ways of analysing time series.

DDM draws on the results achieved in computer science and in areas such as database systems, statistics, machine learning, wavelets, non-linear dynamics (chaos theory), data visualisation, pattern recognition, artificial neural networks, fuzzy logic and global optimization (e.g., genetic algorithms). A large set of data analysis methods has been developed in statistics, and machine learning has also contributed significantly to classification and induction problems. Neural networks have shown their effectiveness in classification, prediction and clustering tasks. More recently, chaos theory has made it possible to predict natural and human-generated processes (such as geophysical processes, precipitation etc.) with unprecedented accuracy.

During the last decade, large organisations and companies (business, marketing, medical companies, telecommunications, banks, infrastructural companies etc.) started to actively use advanced data mining, knowledge discovery and data-driven modelling (DDM) tools in their everyday activities. Research in the public and private sectors in this area is very active. However, it must be admitted that the civil engineering community, especially in Europe, is lagging behind in the acceptance and use of the most advanced and efficient tools. Development of generic on-line DDM tools, their wider exploration, their connection to the available physically-based models and their wider acceptance are in line with the overall DC objectives and are urgently needed.

1.3.1 Commercial and technology benefits

The USA is clearly ahead of Europe in this regard: there, large organisations and companies (business, marketing, medical companies, telecommunications, banks, infrastructural companies etc.) already benefit from data mining and the availability of modelling tools on the Web. Fundamental research in data mining (classification, support vector machines, chaos, neuro-fuzzy methods) is done by many large corporations, such as IBM and AT&T, and by governmental bodies, such as the Environmental Protection Agency, with considerable benefits for them and their clients.

There are known examples where data mining is used successfully for data-driven modelling and decision support in the fields of water distribution, irrigation, sewage systems and environmental protection:

• convincing applications of neural networks and fuzzy systems in hydrodynamic and hydrological modelling, in pattern recognition in aerial photos, in water level prediction etc.;

• using pattern recognition algorithms, neural and fuzzy systems in water management and optimal control;

• using non-linear dynamics (chaos theory) in accurate prediction of floods and of surge water levels in coastal waters.

It would be of utmost importance to advance these methods and approaches further and to adapt them to the particular needs of civil engineering in the context of Delft Cluster. Attention will also be given to the proper relation of data-driven models to physically-based models. The commercial success of such undertakings in other areas of human activity is too convincing to ignore.

In the framework of DC, joint work has already started with Directie Noordzee (Rijkswaterstaat) and STOWA, and very promising results were achieved. Considerable interest was also expressed by the Dept. of Tidal Research of Rijkswaterstaat and by water boards. Discussions on using data-driven methods in CPT analysis (together with TG5) were conducted with W+B and other companies.


1.4 Relation to other national and international projects

The main participants of the project have wide experience in the development and application of data mining and data-driven modelling methods, actively publish in peer-reviewed journals and participate in international conferences. These contacts made it possible to stay in touch with renowned experts in this field worldwide and, through them, with related international projects.

There are several opportunities for co-operation with other ICES programmes that are already established, or with organisations that have ambitions in line with those of the DC members. A number of research projects applying specific data mining techniques (such as neural networks, genetic algorithms, optimisation techniques, fuzzy logic) to particular water-related problems are being carried out in Europe, and in the Netherlands especially.

Therefore, the project had close relations and information exchange with Dutch and EC-funded projects such as ELTRAMOS and LWI, with ongoing STW-funded projects at TU Delft and through SNN, with projects involving Rijkswaterstaat, and with other research projects. Discussions on using DDM for ecological problems have been conducted with representatives of Wageningen University.

1.5 Results and deliverables

1.5.1 Final Results

Finally, the long term objectives (see above) were achieved. The following final deliverables were produced:

• report, including the research results and practical applications;

• working prototypes of tools oriented towards data mining and knowledge discovery, together with the Web-enabled Knowledge Base for the better knowledge dissemination;

• publications in international journals and conferences;

• MSc. theses;

• intermediate results related to the PhD thesis;

• workshops proceedings, newsletters and Web-based information sources for the interested parties.

1.5.2 Project exploitation

The main users of the results will be researchers and practitioners from the DC projects mentioned and from external parties. The components developed in this project were integrated into the final Knowledge Base platform, so that the users of the Platform have access to DDM-related knowledge.

The main users of the project results in industry will be the organisations that have already shown commitment to contributing to possible follow-ups of the project: North Sea Directorate (Rijkswaterstaat), STOWA, Gemeentewerken Rotterdam and the Port of Rotterdam.


• software and Knowledge Base development and integration, including the testing and tuning of available commercial packages;

• research activities in the framework of PhD and MSc programmes.

Activities were grouped in the following work packages.

1.6.1 WP1. Identification of problem areas, selection of methods and applied research.

• Overview of the existing methods, identification of the problem gaps, selection of adequate methods

• In-depth research and identification of civil engineering applications of the promising data-driven methods

1.6.2 WP2. Software and Web development and integration.

It is foreseen that the prototypes of the software components will be developed and a Web-enabled Knowledge Base implemented.

1.6.3 WP Coast. Joint research with TG3.

To investigate the applicability of data mining techniques for predicting the behaviour of the coastline (sand waves) on the basis of yearly RWS measurements. Preliminary results show the applicability of the proposed methods and are reported in a separate report by IHE participant Mr. Jeng-Fei Li, “Using Data-driven Methods in the Investigation of the Dutch Coastal Morphology”, available at IHE (abstracts can be found in the Appendix).

1.6.4 WP CPT. Joint research with 05.02.03 "Data clustering to better predict the structure of the subsurface".

To select, test and demonstrate the use of data mining techniques in the interpretation of cone penetration tests (CPT), namely (semi-)automatic analysis of CPTs for the prediction of soil parameters. This will be done on the basis of statistical correlation and data mining of geological, geomechanical and geohydrological parameters. As a result, a better image of the subsurface structure and properties can be drawn using site investigation results that are already widely available in the Netherlands.

The case study concerned the Holocene river and tidal deposits in the Western Netherlands. In case of successful application, a simple prototype tool for automated CPT interpretation was to be developed. An important objective was to demonstrate the applicability of DM techniques to the interested parties in the sector.

1.6.5 WP.Sedimentation. Joint research with 05.02.07 "Sedimentation model for the port of Rotterdam".

To investigate the possibilities of using artificial neural networks in modelling the sedimentation process in the Rotterdam harbour.

1.6.6 WP.Hydrology.

Exploration and application of data-driven methodologies and techniques in the area of hydrological modelling. Two MSc reports were prepared in the framework of this project: “Application of data-driven techniques in hydrological modeling” by Khada N. Dulal and “Flexibility and optimality in model tree learning with applications to water-related problems” by M. Baskara Siek (abstracts can be found in the Appendix).


1.6.7 WP.Calibration. Joint with 06.02.07 "Kennishoudelijk aanvulling Standaard Raamwerk Water".

To identify appropriate automatic calibration techniques to be used in SRW. The results are covered in the Project 06.02.07 Final Report.

1.6.8 WP.DNZ. Joint with Directie Noordzee.

To review the existing methods in the area of neural networks, non-linear dynamics and other data analysis methods that are applicable to the prediction of water levels and currents. Apart from the section in this report covering this topic, two separate IHE MSc reports are available: “Application of Data Mining Techniques and Chaos Theory for Surge Water Level Predictions (Hook of Holland Case Study)” by Carlos Rojas Pupo and “Exploratory analysis of tidal current in the Maas channel” by Mr. Shaakeel Hasan (abstracts can be found in the Appendix).

1.6.9 WP.STOWA. Joint with STOWA.

The existing methods in the area of neural networks and fuzzy logic that are applicable to the problems of water management were reviewed. Two case studies were undertaken (Overwaard and Salland). The results are covered in a separate STOWA report that is distributed to the STOWA members.


2 Data-driven modelling: paradigm, methods, experiences

Abstract: Management and control of water resources is normally based on behaviour-driven, or physically-based, models built on equations describing the behaviour of water bodies. Recently, models built on the basis of large amounts of collected data have been gaining popularity. This modelling approach can be called data-driven modelling; it borrows methods from various areas related to computational intelligence: machine learning, data mining, soft computing etc. The chapter gives an overview of successful applications of several data-driven techniques to problems of water resources management and control. Conclusions are drawn on the applicability of the methods mentioned and on the future role of computational intelligence in modelling in civil engineering and water resources.

2.1 Modelling: knowledge of processes and data

A model can be defined as a simplified representation of reality with the objective of explaining or predicting it. Modelling includes studying the system, posing the problem, collecting and preparing data, building the model, testing it, using it, interpreting the results and, possibly, reiterating. In engineering, the term model is traditionally used in one of two senses:

• (a) a mathematical model based on the description of the behaviour (often physics, or first principles) of a phenomenon or system under study, referred to below as a behavioural (also process, or physically-based) model;

• (b) a model built of material components or objects, which is often referred to as a scale (or physical) model (since it is usually smaller than the real system).

These views of a model are widely adopted and taught. Understandably, in social and economic studies scale modelling would be a difficult undertaking, but behavioural models based on mathematical descriptions of processes are widespread and widely used.

Traditionally, management and control of water resources was based on a good understanding of the underlying processes and used the so-called physically-based (or knowledge-driven, behavioural, process, simulation) models. These could be, for example, models based on the Navier-Stokes equations describing the behaviour of water in particular circumstances.

Another approach is based on the analysis of all the data characterising the system under study. A model can then be defined on the basis of connections between the system state variables (input, internal and output variables) with only a limited knowledge of the details about the "physical" behaviour of the system. Statistical models, like a linear regression model, follow this approach. Such models can be called data-driven models. During the last decade, due to the availability of data, such models became quite popular. The most popular technique by far is an artificial neural network (ANN). But it is not the only one.
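As an illustration of the simplest data-driven model mentioned above, a linear regression, the following minimal Python sketch estimates an input-output relationship purely from samples. The variable names (rainfall, discharge), the synthetic numbers and the use of NumPy are assumptions made for this example only; they are not taken from the project.

```python
import numpy as np

# Hypothetical observations: rainfall (input) and river discharge (output).
# The numbers are synthetic and serve only to illustrate the idea.
rng = np.random.default_rng(0)
rainfall = rng.uniform(0.0, 50.0, size=200)                    # input variable x
discharge = 3.0 + 0.8 * rainfall + rng.normal(0.0, 1.0, 200)   # output variable y

# Build the design matrix [1, x] and estimate the coefficients by least squares.
design = np.column_stack([np.ones_like(rainfall), rainfall])
coeffs, *_ = np.linalg.lstsq(design, discharge, rcond=None)
intercept, slope = coeffs

# Use the fitted relationship to predict the output for a new input value.
predicted = intercept + slope * 20.0
print(f"intercept={intercept:.2f}, slope={slope:.2f}, prediction at x=20: {predicted:.2f}")
```

No knowledge of the physical process enters the model; only the recorded pairs of input and output values are used.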

2.2 Data mining, knowledge discovery, neural networks, machine learning, computational intelligence: is it all the same?

Several terms (expressions) in the title compete to name the same interdisciplinary area. It is difficult, if not impossible, to accommodate in a formal definition disparate areas with their own established individualities, such as fuzzy sets, neural networks, evolutionary computation, machine learning, Bayesian reasoning, etc. Following a good academic tradition, an individual or a group of researchers often identifies an area that is slightly different from an already existing one and introduces terminology, organises a new conference, a journal, professorship positions, a school of thought, etc. This is what has been happening with areas close to artificial intelligence (AI) during the last two decades. Data mining (DM), knowledge discovery in databases (KDD), computational intelligence (CI), machine learning (ML), intelligent data analysis (IDA), soft computing and pattern recognition are all areas that overlap considerably, with a similar focus and similar application areas. It is really difficult to find a clear-cut difference between them. Still, certain differences can be formulated:

• CI is seen as “a new name” for AI embodying all other areas;

• ML is an area of computer science, a sub-area of AI concentrating on the theoretical foundations. Classification (pattern recognition) problems are addressed by ML more often than regression (numerical prediction) problems. Technically speaking, most ML problems can be formulated as problems of function approximation;

• DM and KDD often focus on very large databases and are associated with applications in banking, financial services and customer relationship management (CRM). DM is seen as a part of the wider KDD. The methods used are mainly from statistics and ML;

• IDA is relatively new and seems to concentrate more on data analysis in medicine and research. The methods used are also from statistics and ML;

• soft computing, and in particular fuzzy rule-based systems induced from data.

In this connection we see the data-driven modelling (DDM) as an approach to modelling that focuses on using the ML methods in building models (often of natural systems) that would complement or replace the “knowledge-driven” models describing behaviour of physical systems. DDM uses methods developed in the fields mentioned above and tunes them to particular application areas.

2.3 Machine learning is the basis for data-driven modelling

We see machine learning as the main source of methods for data-driven modelling (Fig. 1). A machine learning method is an algorithm that estimates a hitherto unknown mapping (or dependency) between a system's inputs and its outputs from the available data (Mitchell 1998). By data we understand the known samples, which are combinations of inputs and corresponding outputs. Once such a dependency is discovered, it can be used to predict (or effectively deduce) the system's future outputs from the known input values.
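One common way of writing this estimation problem down is given below. The notation is ours and is only a sketch: $F$ denotes the data-driven model, $\theta$ its adjustable parameters, $(x_i, y_i)$, $i = 1, \ldots, N$, the known samples, and the squared error is just one possible measure of the difference between observed and predicted outputs.

```latex
\hat{\theta} = \arg\min_{\theta} \sum_{i=1}^{N} \bigl( y_i - F(x_i; \theta) \bigr)^2 ,
\qquad
y'_{\mathrm{new}} = F(x_{\mathrm{new}}; \hat{\theta})
```

The first expression corresponds to learning (fitting the model to the samples), the second to using the learned dependency to predict the output $y'$ for a new input.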

There are four main styles of learning considered:

• classification - on the basis of classified examples, a way of classifying unseen examples is to be found;

• association - association between features (which combinations of values are most frequent) is to be identified;

• clustering - groups of objects (examples) that are "close" are to be identified;

• numeric prediction (regression) - outcome is not a class, but a numeric (real) value.

The oldest area of estimating dependencies from data is statistics, as represented by multivariate regression and classification. In the 60s and 70s, new techniques which were often not based on the assumptions of "well-behaved" statistical distributions of random processes started to emerge, and these were used in many successful applications. Among these techniques were: pattern recognition and cluster analysis, methods trying to imitate the human brain and perception like neural networks

[Fig. 1: diagram. Input data X enter both the modelled (real) system, which gives the actual (observed) output Y, and the data-driven model, which gives the predicted output Y’. Learning is aimed at minimizing the difference between Y and Y’.]


In statistics the following four types of data are considered: nominal, ordinal, interval and ratio (real-valued). In machine learning, for simplicity, we often speak only of two data types: nominal (classes) and real-valued.

2.4 Models driven by nominal data: cluster and classify

Classification is often treated as finding classes of data points $\{a_i\} \in R^n$. Classes must be such that points in a class are close to each other in some sense, and classes are far from each other. Clustering is finding groups (subsets) of data without assigning them to particular classes. The resulting model is a mapping from the space of input data to classes or groups.

Among the most important methods currently used, the following can be mentioned: partition-based clustering (K-means and fuzzy C-means, based on Euclidean distance); density-based spatial clustering, DBSCAN (for clusters of arbitrary shapes); clustering by self-organizing feature maps (SOFM, Kohonen neural networks); Bayesian classification; decision tree classification (Quinlan 1992, Witten & Frank 2000); support vector machine (SVM) classification (Vapnik 1998).
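To make the idea of partition-based clustering concrete, the following minimal Python sketch implements plain K-means with Euclidean distance. It is an illustration under stated assumptions rather than the implementation used in the project: the function name k_means, the synthetic two-group data set and the fixed number of iterations are all chosen for this example.

```python
import numpy as np

def k_means(points, k, n_iter=50, seed=0):
    """Plain K-means with Euclidean distance; returns (centres, labels)."""
    rng = np.random.default_rng(seed)
    # Start from k distinct data points chosen at random.
    centres = points[rng.choice(len(points), size=k, replace=False)]
    for _ in range(n_iter):
        # Assign each point to its nearest centre.
        dists = np.linalg.norm(points[:, None, :] - centres[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Move each centre to the mean of its points (keep it if the cluster is empty).
        new_centres = np.array([
            points[labels == j].mean(axis=0) if np.any(labels == j) else centres[j]
            for j in range(k)
        ])
        if np.allclose(new_centres, centres):
            break
        centres = new_centres
    return centres, labels

# Synthetic two-dimensional data: two well-separated groups (illustrative only).
rng = np.random.default_rng(1)
group_a = rng.normal(loc=[0.0, 0.0], scale=0.5, size=(100, 2))
group_b = rng.normal(loc=[5.0, 5.0], scale=0.5, size=(100, 2))
data = np.vstack([group_a, group_b])

centres, labels = k_means(data, k=2)
print(centres)
```

The resulting model is exactly the mapping described above: any new point can be assigned to the group whose centre is nearest.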

A number of examples of using classification and clustering in water management and control were reported:

• Hall et al. (2000) used SOFM for classifying catchments into groups based on 12 of their characteristics, and then applied ANN to model the regional flood frequency.

• Hannah et al. (2000) used clustering for finding groups of hydrographs on the basis of their shape and magnitude; clusters are then used for classification by experts.

• Harris et al. (2000) applied clustering to identify the classes of river regimes.

• Frapporti et al. (1993) used the method of fuzzy c-means clustering in the problem of classifying shallow Dutch groundwater sites into homogeneous groups.

• The use of fuzzy classification in the problem of soil classification on the basis of cone penetration tests (CPTs) is addressed by Zhang et al. (1999).

Our experience of using classification methods includes the following:

• using self-organizing feature maps (Kohonen neural networks) as clustering methods, and SVM as a classification method, in the interpretation of aerial photos. In this application various classification methods were used to interpret an aerial photo of 4387x2786 pixels. Four land cover classes were identified: wood and scrub, agriculture, meadow, and urban area (Velickov, Solomatine, Yu and Price, 2000);

• using decision trees in classifying surge water levels in the coastal zone depending on the hydrometeorological data (Solomatine, Velickov, Rojas and Wust, 2000);

• classification of the river flow levels according to their severity in the problem of flood control.

2.5 Models driven by real-valued data: combine simple functions

Most engineering problems are formulated in terms of real-valued data. The problem of predicting real-valued variables is also called a regression problem. Since machine learning aims at finding a function that best approximates some given function, it can also be seen as a problem of function fitting, and this prompts the use of corresponding methods that are already available, such as linear regression, polynomial functions in splines, or orthogonal polynomial functions, e.g. Chebyshev polynomials. The current trend in data-driven modelling, however, is to combine many simple functions.

Radial basis functions (RBF) can be seen as a sensible alternative to the use of complex polynomials. Consider a function z = f(x), where x is a vector {x1,..., xI} in I-dimensional space. The idea is to approximate the function z = f(x) by another function F(x) in the proximity of some “representative” locations (centers) wj, j = 1,..., J. Finding the positions of the centers wj and the “height parameters” of the functions F(x) can be done by building a radial basis function neural network; training the network identifies the unknown parameters.
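The RBF idea can be sketched in a few lines of NumPy. This is a simplified illustration, not the training procedure of an RBF network as used in the project: it assumes Gaussian basis functions, one-dimensional input, centers fixed on a regular grid, a common width `sigma`, and height parameters found by linear least squares. All names and the sine test function are hypothetical.

```python
import numpy as np

def rbf_design(x, centers, sigma):
    """Gaussian basis values exp(-(x - w_j)^2 / (2 sigma^2)) for 1-D inputs."""
    return np.exp(-((x[:, None] - centers[None, :]) ** 2) / (2.0 * sigma ** 2))

def fit_rbf(x, z, centers, sigma):
    """Least-squares estimate of the height parameter of each basis function."""
    heights, *_ = np.linalg.lstsq(rbf_design(x, centers, sigma), z, rcond=None)
    return heights

def predict_rbf(x, centers, sigma, heights):
    """F(x): weighted sum of the basis functions."""
    return rbf_design(x, centers, sigma) @ heights

x = np.linspace(0.0, 2.0 * np.pi, 60)
z = np.sin(x)                                  # stand-in for the unknown f(x)
centers = np.linspace(0.0, 2.0 * np.pi, 8)     # J = 8 assumed centers w_j
heights = fit_rbf(x, z, centers, sigma=1.0)
max_error = float(np.max(np.abs(predict_rbf(x, centers, 1.0, heights) - z)))
```

In a full RBF network the positions of the centers (and often the widths) are also estimated from the data, for example by clustering the inputs first.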


Multilayer perceptron (MLP) is another type of ANN. It has been mathematically proven that adding up simple functions, as an ANN does, allows for universal approximation of functions (Kolmogorov 1957). After the mid-1980s methods for finding these functions (training an MLP) became available, and this has made the MLP the most popular machine learning method of today. Various types of ANNs are widely used in numerical prediction and in classification.

M5 Model trees (regression splines). Decision trees, widely used in classification problems, can be generalised to regression trees and model trees that can deal with continuous attributes. Tree-structured regression is built on the assumption that the functional dependency is not constant over the whole domain, but can be approximated as such on smaller subdomains. For continuous variables, these subdomains can then be searched for and characterized with some “local” model. Depending on the nature of this model, there are several types of trees used for numerical prediction:

• if the local model gives an average value of the instances for this local scope, then the overall approach is called a regression tree. Regression trees were introduced in the CART system of Breiman et al. (1984).

• if a local model is a linear regression function of the input variables then the overall approach is called a model tree. Two (similar) approaches are known: (a) M5 model trees (Quinlan 1992), implemented in the Cubist software (www.rulequest.com) and, with some changes, in the Weka software (Witten and Frank 2000), and (b) the approach of Friedman (1991) in his MARS (multivariate adaptive regression splines) algorithm, implemented as the MARS software (www.salford-software.com).

The construction of a model tree is similar to that of a decision tree, although the splitting criterion is different. Each leaf then represents a local model and in principle could be (locally) more accurate than a global model (even a non-linear one, e.g. a neural network) trained on the whole data set. The splitting criterion of M5 model trees is SDR (standard deviation reduction), which is used to determine which attribute is the best one for splitting the portion of the training data that reaches a particular node.
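The SDR criterion can be illustrated with a short sketch. It assumes the commonly quoted form SDR = sd(T) - sum(|Ti|/|T| * sd(Ti)), where T is the set of target values reaching the node and Ti are the subsets produced by a candidate split. The use of the population standard deviation, the function names and the toy rainfall/flow values are assumptions of this example.

```python
from statistics import pstdev

def sdr(targets, subsets):
    """Standard deviation reduction: sd(T) - sum(|T_i| / |T| * sd(T_i))."""
    n = len(targets)
    return pstdev(targets) - sum(len(s) / n * pstdev(s) for s in subsets if s)

def best_split(values, targets):
    """Try midpoints between sorted attribute values; return (threshold, SDR)."""
    pairs = sorted(zip(values, targets))
    best = (None, 0.0)
    for i in range(1, len(pairs)):
        if pairs[i - 1][0] == pairs[i][0]:
            continue                      # no threshold between equal values
        threshold = (pairs[i - 1][0] + pairs[i][0]) / 2.0
        left = [t for v, t in pairs if v <= threshold]
        right = [t for v, t in pairs if v > threshold]
        gain = sdr(targets, [left, right])
        if gain > best[1]:
            best = (threshold, gain)
    return best

# Hypothetical node data: one input attribute (rainfall) and the target (flow).
rain = [0.0, 0.5, 1.0, 6.0, 7.0, 8.0]
flow = [1.0, 1.2, 1.1, 9.0, 9.5, 10.0]
threshold, gain = best_split(rain, flow)
```

In a tree builder this search would be repeated for every attribute at a node, and the attribute and threshold with the largest SDR would define the split.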

The linear regression method is based on the assumption of linear dependencies between input and output. An M5 model tree makes a step towards non-linearity, since it builds a model that is locally linear but non-linear overall. In fact, an M5 tree is a modular model: it consists of modules, each responsible for modelling a particular subspace of the input space. Model trees may serve as an alternative to ANNs (which are global models); they are often almost as accurate as ANNs and have important advantages:

• training of MT is much faster than ANN, and it always converges;

• the results can be easily understood by decision makers;

• by applying pruning (that is, making trees smaller by combining subtrees into one node) it is possible to generate a range of MTs, from an inaccurate but simple linear regression (one leaf only) to a much more accurate but complex combination of local models (many branches and leaves).

In order to compare the performance of ANNs and M5 model trees, both were applied to a flow prediction problem. Hourly data on rainfall and flow in a catchment were available for 3 months. The training set comprised 1854 instances and the verification set 300 instances. The problem was to predict the discharge Q_{t+1} for the next hour (and also Q_{t+3} for 3 hours ahead). Analysis of the catchment and of the mutual dependencies between variables led to the selection of the following input variables for the 1-hour prediction: effective rainfall (RE) at times t, t-1, t-2, t-3, t-4, t-5, and discharge Q at times t, t-1, t-2, t-3. An example of a model tree pruned (reduced) from a total of 16 rules to 3 rules is shown below.


LM1: Q_{t+1} = 0.0388 + 0.0108 RE_t + 0.0535 RE_{t-1} + 0.0173 RE_{t-2} + 0.0346 RE_{t-3} + 1.01 Q_t - 0.0127 Q_{t-1} + 0.00311 Q_{t-2}

LM2: Q_{t+1} = -0.221 + 0.0108 RE_t + 1.68 RE_{t-1} + 0.0626 RE_{t-2} + 7.3 RE_{t-3} + 1 Q_t - 0.0127 Q_{t-1} + 0.00311 Q_{t-2}

LM3: Q_{t+1} = 3.04 + 2.46 RE_t + 4.97 RE_{t-1} - 0.04 RE_{t-2} + 1.75 Q_t - 1.08 Q_{t-1} + 0.265 Q_{t-2}

It can be seen that three linear models were actually built: for low, medium and high flows. This makes it easy to analyse the influence of previous flows. On the verification set this M5 model had an RMSE of 3.6 m³/s, which is lower than the 5.3 m³/s of the ANN.
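The three linear models LM1 to LM3 quoted above can be evaluated with a short sketch such as the following. The coefficients are copied from the text; the splitting conditions that decide which model applies to a given instance are not reproduced in the text, so the caller has to name the model explicitly. The function names and the input values are hypothetical.

```python
import math

# Coefficients copied from LM1-LM3; inner keys are lags (0 = time t, 1 = t-1, ...).
MODELS = {
    "LM1": {"const": 0.0388,
            "RE": {0: 0.0108, 1: 0.0535, 2: 0.0173, 3: 0.0346},
            "Q": {0: 1.01, 1: -0.0127, 2: 0.00311}},
    "LM2": {"const": -0.221,
            "RE": {0: 0.0108, 1: 1.68, 2: 0.0626, 3: 7.3},
            "Q": {0: 1.0, 1: -0.0127, 2: 0.00311}},
    "LM3": {"const": 3.04,
            "RE": {0: 2.46, 1: 4.97, 2: -0.04},
            "Q": {0: 1.75, 1: -1.08, 2: 0.265}},
}

def predict(model, rain, flow):
    """One-hour-ahead discharge; rain[lag] and flow[lag] hold lagged inputs."""
    m = MODELS[model]
    q = m["const"]
    q += sum(c * rain[lag] for lag, c in m["RE"].items())
    q += sum(c * flow[lag] for lag, c in m["Q"].items())
    return q

def rmse(observed, predicted):
    """Root mean squared error, the measure used to compare the models."""
    return math.sqrt(sum((o - p) ** 2 for o, p in zip(observed, predicted)) / len(observed))

# Hypothetical inputs: no effective rainfall and a steady flow of 10 m3/s.
q_next = predict("LM1", rain=[0.0, 0.0, 0.0, 0.0], flow=[10.0, 10.0, 10.0])
```

With these inputs LM1 returns a value close to the current flow, which reflects the dominant coefficient of about 1 on Q_t in the low-flow model.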

Data-driven methods, especially neural networks, have found dozens of successful applications in the water sector. For example, the use of ANNs to model the rainfall-runoff process is addressed in the works of Hsu et al. (1995), Minns & Hall (1996), Abrahart and See (2000), and in a collection of papers edited by Govindaraju and Ramachandra Rao (2000).

Our experience of using machine learning methods in real-valued prediction includes:

• replicating behaviour of hydrodynamic/hydrological model of a river basin where ANN is used in model-based optimal control of a reservoir (Solomatine & Torres, 1996);

• modelling a channel network using ANN (Price et al. 1998);

• building ANN-based intelligent controller for real-time control of water levels in a polder (Lobbrecht & Solomatine, 1999);

• modelling rainfall-runoff process with ANNs (Dibike et al., 1999);

• surge water level prediction in the problem of ship guidance using ANN;

• modelling stage-discharge relationship with ANN (Bhattacharya & Solomatine, 2000);

• using M5 model trees to predict discharge in a river;

• using SVMs in prediction of water flows for flood management (Dibike et al., 2001).

Chaos theory (formulated by Lorenz in 1963) has proved to be an excellent predictive tool for time series. The model uses only the time series itself, without other related variables, so it is applicable when the time series carries enough information about the behaviour of the system. The predictive capacity of chaos theory, which rests on the idea that the system will behave in the future in a manner similar to the past, exceeds that of linear models such as ARIMA. We used chaos theory to predict the surge water level in the North Sea close to Hook of Holland; the data set comprised 5 years of surge measurements at 10-minute intervals. For two-hour predictions the error was as low as 10 cm, which was better than that of the methods used earlier (Solomatine et al. 2000).
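The idea that the future resembles similar situations in the past can be sketched as a nearest-neighbour forecast in a time-delay embedding. This is only one simple variant of chaos-theory-based prediction and is not claimed to be the method used for the surge study: the embedding dimension, the delay, the number of neighbours and the logistic-map test series are all assumptions of this example.

```python
def embed(series, dim, delay):
    """Delay vectors (x[t], x[t-delay], ..., x[t-(dim-1)*delay]) with their index t."""
    start = (dim - 1) * delay
    return [(t, tuple(series[t - k * delay] for k in range(dim)))
            for t in range(start, len(series))]

def predict_next(series, dim=2, delay=1, neighbours=3):
    """Average the successors of the past states closest to the current state."""
    vectors = embed(series, dim, delay)
    _, current = vectors[-1]
    history = vectors[:-1]            # states whose successor is already known
    history.sort(key=lambda item: sum((a - b) ** 2 for a, b in zip(item[1], current)))
    nearest = history[:neighbours]
    return sum(series[t + 1] for t, _ in nearest) / len(nearest)

# Stand-in chaotic series (logistic map), not the surge data of the report.
x = [0.3]
for _ in range(500):
    x.append(3.9 * x[-1] * (1.0 - x[-1]))

forecast = predict_next(x[:-1])       # predict the last value from the rest
error = abs(forecast - x[-1])
```

Only the series itself enters the forecast, which is why the approach depends on the series carrying enough information about the system.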

Fuzzy rule-based systems. Fuzzy rule-based systems (FRBS) can be built by interrogating humans, or by processing historical data. The basics of the latter approach are described in the books of Bardossy and Duckstein (1995), and Kosko (1997). Applications of FRBS in water resources can be found in Bardossy et al. (1995) and many others.
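How a small fuzzy rule base turns an input into an output can be sketched as follows. The three rules, the triangular membership functions and the weighted-average defuzzification are hypothetical choices made for illustration; they are not taken from the cited books or from the project's polder controller.

```python
def triangle(x, left, peak, right):
    """Triangular membership function with support (left, right)."""
    if x <= left or x >= right:
        return 0.0
    if x <= peak:
        return (x - left) / (peak - left)
    return (right - x) / (right - peak)

# Hypothetical rules: (membership of the water level in m, pump setting in 0..1).
RULES = [
    (lambda h: triangle(h, -1.0, 0.0, 1.0), 0.0),   # IF level is LOW    THEN pump off
    (lambda h: triangle(h, 0.0, 1.0, 2.0), 0.5),    # IF level is MEDIUM THEN pump half
    (lambda h: triangle(h, 1.0, 2.0, 3.0), 1.0),    # IF level is HIGH   THEN pump full
]

def infer(level):
    """Weighted average of the rule outputs, weighted by rule activation."""
    fired = [(membership(level), output) for membership, output in RULES]
    total = sum(w for w, _ in fired)
    if total == 0.0:
        raise ValueError("level outside the range covered by the rules")
    return sum(w * out for w, out in fired) / total

pump = infer(1.5)    # level halfway between MEDIUM and HIGH
```

When rules are derived from historical data rather than from experts, it is the membership functions and rule outputs that are fitted to the observations.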

Our experience in using fuzzy systems includes:

• prediction of precipitation events (Abebe et al., 1999);

• analysis of groundwater model uncertainty (Abebe et al., 2000);

• control of water levels in polder areas (Lobbrecht and Solomatine, 1999).

2.6 Conclusion

Data-driven methods (in other words, methods of machine learning and data mining) have proven their applicability in many areas, including the financial sector, customer resource management, engineering, etc. Our experience shows their applicability to a wide range of problems associated with the management of water resources. Normally a particular domain area will benefit from data-driven modelling if: there is a considerable amount of data available; there are no considerable changes to the modelled system during the period covered by modelling; it is difficult to build knowledge-driven simulation models, or in particular cases they are not adequate enough; there is a necessity to validate the results of simulation models with other types of models.

Successful analysis and prediction should always be based on the use of various types of models. For example, our experience shows that M5 model trees, which combine local and global properties, can be close in accuracy to ANNs or surpass them (ANNs being global models, that is, trained on the whole data set), and are more easily accepted by decision makers because of their simplicity.

The future is seen in hybrid models that combine models of different types and follow different modelling paradigms, including combination with physically-based models. It can be foreseen that computational intelligence (machine learning) will be used not only for building data-driven models, but also for building optimal and adaptive structures of such hybrid models.

2.7 References

A.J. Abebe, D.P. Solomatine & R. Venneker (1999). Application of adaptive fuzzy rule-based models for reconstruction of missing precipitation events. Hydrological Sciences Journal, vol 45.

A.J. Abebe, V. Guinot & D.P. Solomatine (2000). Fuzzy alpha-cut vs. Monte Carlo techniques in assessing uncertainty in model parameters. Proc. 4th Int. Conference on Hydroinformatics, USA.

R.J. Abrahart & L. See (2000). Comparing neural network and autoregressive moving average techniques for the provision of continuous river flow forecast in two contrasting catchments. Hydrological Processes, 14, pp. 2157-2172.

B. Bhattacharya & D.P. Solomatine (2000). Application of artificial neural network in stage-discharge relationship. Proc. 4th Int. Conference on Hydroinformatics, USA.

B. Bhattacharya, A.H. Lobbrecht & D.P. Solomatine (2002). Control of water levels of regional water systems using reinforcement learning. Proc. 5th Int. Conference on Hydroinformatics, Cardiff, UK.

Y. Dibike, D.P. Solomatine & M.B. Abbott (1999). On the encapsulation of numerical-hydraulic models in artificial neural network. Journal of Hydraulic Research, 37, No. 2, 147-161.

Y.B. Dibike, S. Velickov, D.P. Solomatine & M.B. Abbott (2001). Model induction with support vector machines: introduction and applications. ASCE J. of Computing in Civil Engineering, 15 (3), 208-216.

R.S. Govindaraju & A. Ramachandra Rao, eds. (2001). Artificial neural networks in hydrology. Kluwer, Dordrecht.

M.J. Hall & A.W. Minns (1999). The classification of hydrologically homogeneous regions. Hydrol. Sci. J., 44, 693-704.

K.L. Hsu, H.V. Gupta & S. Sorooshian (1995). Artificial neural network modelling of the rainfall-runoff process. Water Resources Research, 31 (10), 2517-2530.

A.N. Kolmogorov (1957). On the representation of continuous functions of several variables by superposition of continuous functions of one variable and addition. Doklady Akademii Nauk SSSR, 114, 953-956.

B. Kosko (1997). Fuzzy engineering. Prentice-Hall.

A.H. Lobbrecht & D.P. Solomatine (1999). Control of water levels in polder areas using neural networks and fuzzy adaptive systems. In Water Industry Systems: Modelling and Optimization Applications. D. Savic, G. Walters (Eds). Research Studies Press Ltd., pp. 509-518.

T.M. Mitchell (1998). Machine learning. McGraw-Hill.

R.K. Price, J. Samedov & D.P. Solomatine (1998). Network modelling using artificial neural networks. Proc. 3rd Int. Conference on Hydroinformatics, Balkema, Rotterdam.

J.R. Quinlan (1992). C4.5: programs for machine learning. Morgan Kaufmann.

D.P. Solomatine & L.A. Torres (1996). Neural network approximation of a hydrodynamic model in optimizing reservoir operation. Proc. 2nd Int. Conference on Hydroinformatics, Zurich, 201-206.


S. Velickov, D.P. Solomatine, X. Yu & R.K. Price (2000). Application of Data Mining Techniques for Remote Sensing Image Analysis. Proc. 4th Int. Conference on Hydroinformatics, USA.

S. Velickov & D.P. Solomatine (2000). Predictive data mining: practical examples. 2nd Workshop on AI methods in Civil Engineering, Cottbus, March.

C. Watkins & P. Dayan (1992). Q-learning, Machine Learning, 3 (8), 279-292.

I.H. Witten & E. Frank (2000). Data mining. Morgan Kaufmann Publishers.


3 Sedimentation modelling for the port area of Rotterdam

Abstract: The DC projects 07.05.02 and 05.02.07 had an overlapping area of interest in predicting the sedimentation in the harbour basins of the Rotterdam Port. Sedimentation in shipping channels is influenced by several factors such as upland discharge, grain-size distribution of sediments, meteorological conditions, environmental (including biological) causes, dredging, training measures etc. To safeguard navigation and provide a stable navigable depth, port authorities throughout the world resort to dredging. Sound planning of dredging can only be derived from a reasonable estimate of sedimentation that takes into account all the important causes affecting it. In the two above-mentioned DC projects the sedimentation problem in the approach channel of the Port of Rotterdam was investigated. The factors affecting sedimentation were studied, their time series data were analysed, missing data were filled in with artificial neural networks (ANN) and, finally, the salient features of the causes that affect sedimentation were parameterised. These features were used to encapsulate the existing knowledge of the physical process in data-driven models such as an ANN and an M5 model tree. The data-driven models were used to predict sedimentation in the approach channel of the port for a period of about three years, and promising results were obtained.

The project work was carried out in close collaboration with the DC project 05.02.07. The members of the project team are:

Name                                                   Organization
M. Maurenbrecher                                       Delft University of Technology
A.K. Turner                                            Delft University of Technology
R. Vuurens                                             Delft University of Technology
S. Karstens                                            Geodelft
I. Deibel                                              Port of Rotterdam
R. Bol (in September 2001 replaced by A. de Gelder)    Rijkswaterstaat, Directie Noordzee
H. van der Gouwe                                       Rijkswaterstaat, Directie Zuid-Holland
B. Bhattacharya                                        UNESCO-IHE Institute for Water Education
D.P. Solomatine                                        UNESCO-IHE Institute for Water Education
G. Kant                                                WL | Delft Hydraulics


3.1 Introduction

Sedimentation in a channel has various causes, which are influenced by several factors such as upland discharge, grain-size distribution of sediments, meteorological causes, environmental (including biological) causes, human intervention (in the form of dredging and training measures) etc. The quantity and type of sediments brought by upland flow are characteristics of the catchment and depend on the soil composition, degree of afforestation, climate and many other factors.

Extreme meteorological conditions move large amounts of sediment in coastal areas. Tides bring sediments from the sea. The quantity and type of such sediments also depend on the location, strength of the astronomical tide, upland flow etc. Hydrodynamic conditions may cause erosion of the channel bed and banks, which contributes to sedimentation at other places. Human intervention comes in the form of dredging, disposal of dredge spoil, training measures, shipping etc. Dredging causes artificial deepening of channels, which then act as silt traps. Recirculation from the points of disposal of dredge spoil to the main channels is often a cause of sedimentation. Training measures to guide the flow can cause flow concentration at some places at the cost of reduced flow at other places. The former causes erosion and the latter affects sedimentation. Movements of ships generate waves, which also affect sedimentation.

Assessment of sedimentation in the bars and bends of shipping channels leading to a port, or in rivers in general, is carried out frequently. Such a morphological assessment is not an easy task because of the complexity of the problem. Part of the complexity is natural, and part is due to human intervention. Because of these complexities a satisfactory estimation of sedimentation is extremely difficult. On the other hand, prediction of sedimentation in a channel is very important for forecasting draughts and planning shipping and dredging activities. To safeguard navigation and provide a stable navigable depth, port authorities throughout the world resort to dredging. Sound planning of dredging can only be derived from a reasonable estimate of sedimentation that takes into account all the important causes affecting it.

Figure 1: Location plan showing the harbour basin at Maasmond and the wave and wind measuring stations (labels on the map: Europlatform; Approach channel)


The Port of Rotterdam, which ranks first among all ports in terms of tonnage handled, has its approach channel through the Maasmond (Fig. 1), where the navigable depth is 24 m. Sedimentation in the Maasmond area depends on several of the factors mentioned above and has a high degree of variability. The sedimentation rate in the Maasmond area is governed by the availability of (suspended) sediment and by the transport processes. The availability of the sediment has strong stochastic behaviour driven by erratic meteorological effects: sediment is mobilised by wave action and carried by river floods. The transport of the suspended sediment towards the areas prone to siltation has important deterministic components (tide, density currents, etc.), but also stochastic components because of the meteorological effects. Owing to the complex nature of the problem, developing a deterministic model for assessing sedimentation in the Maasmond area is an extremely difficult task.

On the other hand, many water engineering problems have been successfully modelled with data-driven modelling techniques such as artificial neural networks (ANN). Examples include rainfall-runoff modelling (Minns and Hall, 1996), stage-discharge relationships (Bhattacharya and Solomatine, 2000, 2002), hydrodynamic modelling (Solomatine and Torres, 1996) and water control problems (Bhattacharya, Lobbrecht and Solomatine, 2002). Based on these success stories it was decided to investigate the applicability of data-driven modelling approaches to the sedimentation problem.

The project work was based on experience with previous numerical models and on knowledge of data-driven modelling techniques and their successful application to complex, non-linear response systems. In particular, the basic premise of this project developed from previous research conducted within the H-Sense project, which used ANN techniques to predict sediment distributions in Swedish harbours (Rosenbaum, 2000).

3.1.1 The objectives set for the project:

to predict the sedimentation in harbour basins and to try to find a relation with possible causes of sedimentation.

A better prediction of the sedimentation may result in savings on dredging volumes/costs, or survey costs, or both. Savings may not necessarily result from less dredging or surveys but by making more effective use of available resources.

The research objectives were set as the following:

1. Quantification of significant relations between one or more variables of the selected port areas and the sedimentation;

2. Evaluation and selection of important parameters of influence on the sedimentation process;

3. Application of data-driven modelling methods to construct a predictive model;

4. Prediction of the accretion of the dredging material and the composition of the dredging material (sand/ sludge);

5. Evaluation of the existing data collection and data management procedures and techniques.

3.2 Sedimentation characteristics in the port area of Rotterdam

The Port of Rotterdam is situated on the bank of the river Rhine. The Rhine discharges into the North Sea, and the area near its mouth, known as the Maasmond, is studied in the present research (Fig. 1). The sedimentation characteristics of the Maasmond area depend largely on the sedimentation and meteorological conditions of the North Sea. Almost all of the sediment deposited in the Maasmond area comes from the North Sea. It is therefore important to understand the sedimentological characteristics of the North Sea.


Dover Strait. Rivers add sediments brought from their catchments. Dredged material from the bars along the shipping channels of the Netherlands, United Kingdom, Belgium, France and Germany is also disposed of in the North Sea. Portions of these sediments settle in several sedimentation traps during the fair-weather season and act as a sediment source during rough weather. Several researchers (Eisma and Irion, 1988; McManus and Prandle, 1997) have attempted to quantify the sediments reaching the North Sea from these sources; an overview is provided in Table 1.

Table 1: Estimated sediment influx (in million ton/year) to the North Sea from different sources (Eisma and Irion, 1988)

Dover Strait    Atlantic Ocean and Baltic Sea    Coastal erosion    Bed erosion    River    Total
20 to 30        10.5                             2.2                9 to 13.5      4.8      46.5 to 61.0

Around 80% of the sediment volume that accumulates in the Maasmond area originates in the English Channel and along the French coast (Vuurens, 2001). In this region enormous amounts of material become available every year through coastal erosion. The northward longshore currents transport this material along the French, Belgian and Dutch coastlines into the North Sea basin. On its way north the material encounters several ‘sedimentation traps’, some of which are listed below (Fig. 2):

• The first major trap is located offshore at Zeebrugge (Belgium) at the mouth of the Western Scheldt River. Sediment settles in the shallow waters of this area and can either stay there or be eroded again, and taken further north.

• The second entrapment area is at the Eastern Scheldt mouth. West of the storm surge barrier a system of sandbars is present and the water is shallow.

Figure 2. Location map showing the sedimentation traps (labels on the map: Longshore current; Maasmond area; Haringvliet sluices; Eastern Scheldt outlet; Western Scheldt outlet; Sediment from French coast; Sediment from Rhine river; Sediment traps; UK; Belgium; France)

• The third major entrapment area is the Haringvliet outlet. This outlet discharges a major volume of the water from Rhine and Meuse into the North Sea. No water is allowed to enter the sluices from the seaside, so only river-discharge is allowed. Especially in relatively quiet summer conditions large amounts of silt are deposited in this area (to be eroded again in winter storms).


• The fourth major entrapment area is the Maasmond, the area of our investigation. A deep (24 m) harbour entrance channel allows large ships to pass to the Port of Rotterdam. The whole channel acts as a sediment trap and has to be dredged continuously to maintain the present nautical depth of 24 m. The Maasmond is the second outlet of the Rhine and Meuse rivers. The amount of sediment from the sea (mainly sands) exceeds the amount of river sediment (mainly clays and silts).

• A fifth major sediment trap further north of Maasmond is the Wadden Sea along the Dutch and German coast.

Sediment in the Maasmond is mainly marine, with some fluvial sediment. The marine sediments originate from erosion of downstream coastal areas, whereas the fluvial sediments originate from the Alps and the low-altitude mountain ranges in Northern France and Belgium. The bed material in the Maasmond is silty sand with a D50 of 0.055 mm. The landward side of the Maasmond contains more silt than the seaward side.

The tide in the North Sea is semi-diurnal. The amplitude of the M2 component is much larger than that of other components such as N2 and S2. The tide plays a significant role in the movement of water and sediment in the North Sea. During the first 2 to 2.5 hours of the rising tide the predominant flow is towards the south. During the first 2 to 2.5 hours of the falling tide the flow is more or less northward. Residual flows generated by the tide are anti-clockwise throughout the North Sea. The magnitude of the residual flow can be up to a few cm/s, except near the surface, where wind and stratification increase its value substantially.

The discharge of fresh water from the river Rhine creates sediment-driven density currents. A freshwater body known as the Coastal River, about 10 to 20 km wide, is formed along the Dutch coast by the density currents and the Coriolis force (WL | Delft Hydraulics, 2001). Within the Coastal River a gravitational circulation perpendicular to the coast is generated, which results in an accumulation of fine-grained sediments in the coastal zone, with significantly higher sediment concentrations near the coast than further offshore. The density currents draw sediments towards the coast, which increases the suspended sediment concentration there. The Coastal River exhibits a strong vertical stratification induced by the difference between fresh and saline water.

Tidal currents, wind-induced currents and wave-induced stirring govern the sediment transport in the North Sea. Transport of this sediment to the Maasmond area is additionally influenced by the Rhine discharge. About 95% of the sediment deposited in the Maasmond area comes from the North Sea; the contribution of sediment carried by the Rhine is limited to only 5% (de Kok, 2002). In areas further upstream (such as Botlek) the portion of sediment carried from the North Sea decreases. The North Sea bed merely acts as a buffer zone for the sediments that are deposited in the Maasmond: sediments released by downstream coastal erosion are deposited on the North Sea bed, to be eroded later.

Biological activity can initiate flocculation, which can increase the settling velocity and thus affects the sediment dynamics. This is particularly important for deeper water, which is indeed the case for the Maasmond. During the tranquil periods of summer, flora and fauna cause consolidation of the sediment deposited on the sea bed, raising its impermeability substantially. This causes easier erosion of sediment during the early summer than during the early winter.

Winterwerp et al. (1998) examined the causes of rapid siltation in harbour basins. On the basis of simulation studies they argued that during rough weather conditions the suspended sediment concentration (often > 10 mg/litre) reaches its saturation limit (for most areas of the coast). During slack water a thin, temporary layer of fluid mud is formed, which is entrained rapidly during accelerating tides. These concentration profiles collapse, forming a fluid mud layer. Sediment-driven flows along the approach channel carry this collapsed mud layer to the harbour basins, causing serious sedimentation.

3.3 Time history of sedimentation in the Maasmond area

The Maasmond area is traditionally schematised by the port authorities as consisting of two boxes, E and F (see Fig. 3), which are regularly surveyed by the Directorate of Public Works and Water Management, South-Holland Directorate. An echosounder with a single-beam transducer with a frequency of 210 kHz is used. The navigable depth in the Maasmond is defined as the level where the density is equal to 1.2 × 10³ kg/m³. Above this level, where densities are lower, ships are able to manoeuvre without difficulties. The usual frequency of hydrographic surveys is once every 2-3 weeks in winter and once every 3-8 weeks in summer. The source data of the hydrographic surveys have been used to create raster maps with a cell size of 5 m x 5 m in an IDRISI32 GIS (for details see Vuurens, 2001).

Accumulation between successive surveys is calculated with the IDRISI32 GIS, which provides the volumetric changes between successive surveys. Merckelbach (1995) used bed samples from the Maasmond area to arrive at the bulk density of the bed material below the 1.2 × 10³ kg/m³ density level, considering the maximum attainable depth to be 26 m, and suggested using 1.213948 × 10³ kg/m³ as the average density. This density value has been used in converting the volumetric changes between successive surveys to mass. Using a fixed bulk density probably introduces an error in the computed mass, and the variation of bulk density in space and time deserves further attention.
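The conversion from two successive survey rasters to an accumulated mass can be sketched as follows. The 5 m x 5 m cell size and the average density are taken from the text; the function name, the sign convention (depth positive downward), the use of `None` for cells without data and the small example grids are assumptions of this sketch, which does not reproduce the IDRISI32 processing itself.

```python
CELL_AREA_M2 = 5.0 * 5.0        # raster cell size stated in the text
BULK_DENSITY = 1.213948e3       # kg/m3, average value quoted from Merckelbach (1995)

def accumulated_mass_tonnes(depth_before, depth_after):
    """Mass change between two surveys; depths in m (positive down), same grid."""
    volume = 0.0
    for row_before, row_after in zip(depth_before, depth_after):
        for d0, d1 in zip(row_before, row_after):
            if d0 is None or d1 is None:          # cell not covered by both surveys
                continue
            volume += (d0 - d1) * CELL_AREA_M2    # shallower bed means accretion
    return volume * BULK_DENSITY / 1000.0         # kg to metric tonnes

# Hypothetical 2 x 2 survey grids (depths in m).
before = [[24.0, 24.2], [24.1, None]]
after = [[23.8, 24.1], [24.1, 24.0]]
mass = accumulated_mass_tonnes(before, after)
```

A negative result would indicate net erosion (or dredging) between the two surveys. Replacing the constant by a density that varies per cell or per survey would address the fixed-density limitation noted above.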

Dredging in the Maasmond area is carried out with trailing suction hopper dredgers by dredging contractors, who are responsible for maintaining a navigable depth of 24 m. Dredging is planned on the basis of the hydrographic surveys. Dredged spoil is disposed of in the North Sea at a location northwest of Hoek van Holland. Since 1992 the dredged quantities have been measured in tons of dry solids (TDS); prior to 1992 the measurements were in m³. Merckelbach (1995) used several samples of dredged spoil to determine its bulk density, which he found to be 1594 kg/m³ (for box E). This density has been used in the present research to find the mass of the dredged quantity from the volumetric measurements for the year 1991. The annual quantity of dredging varies considerably, depending upon the sedimentation in that year, and on average matches the sedimentation quantity. During the period 1991-2000 the mean, maximum, minimum and standard deviation of the annual dredging were 1.65, 2.40, 0.73 and 0.52 million ton, respectively.

Figure 3: Box E and F of Maasmond (labels on the map: Box F; Box E; Hoek van Holland; Maasvlakte)

Deposition/erosion of sediments due to hydrodynamic reasons and transport of sediments from nearby vicinity causes accumulation/erosion. Accumulated sediments are removed through dredging to
