
Advanced GPR Data Processing Algorithms for

Detection of Anti-Personnel Landmines

Dissertation

for the degree of doctor at the Technische Universiteit Delft,

under the authority of the Rector Magnificus Prof. dr. ir. J. T. Fokkema, chairman of the Board for Doctorates,

to be defended in public on Thursday, 7 December 2006, at 12:30

by

Vsevolod KOVALENKO

Specialist (Master of Science) in Applied Mathematics, Karazin Kharkiv National University, Ukraine


This dissertation has been approved by the promotors:
Prof. dr. ir. L. P. Ligthart
Prof. dr. sci. O. G. Yarovyi

Composition of the doctoral committee:

Rector Magnificus, chairman
Prof. dr. ir. L. P. Ligthart, Technische Universiteit Delft, promotor
Prof. dr. sci. O. G. Yarovyi, Karazin Kharkiv National University, promotor
Prof. ir. P. van Genderen, Technische Universiteit Delft
Prof. dr. ir. A. P. M. Zwamborn, Technische Universiteit Eindhoven
Prof. dr. M. Sato, Tohoku University
Prof. dr. J.-Y. Dauvignac, Université de Nice-Sophia Antipolis
Prof. dr. sci. V. V. Sazonov, Moscow Institute for Physics and Technology
Prof. dr. ir. M. H. G. Verhaegen, Technische Universiteit Delft

This research has been supported by the Technology Foundation STW, applied science division of the Dutch Organization for Scientific Research NWO.

ISBN-10: 90-76928-11-8 ISBN-13: 978-90-76928-11-1

Keywords: ground penetrating radar data processing, antipersonnel mine detection

Copyright © 2006 by Vsevolod Kovalenko

All rights reserved. No part of the material protected by this copyright notice may be reproduced or utilized in any form or by any means, electronic or mechanical, including photocopying, recording or by any information storage and retrieval system, without written permission from the author.


CONTENTS

CHAPTER 1 INTRODUCTION
1.1 GPR TECHNOLOGY FOR MINE DETECTION
1.1.1 Hand-held devices
1.1.2 Vehicle-mounted devices
1.1.3 IRCTR Scanner-mounted GPR
1.1.4 The main challenge in mine detection with GPR
1.2 DATA PROCESSING ALGORITHMS
1.2.1 Clutter suppression algorithms
1.2.2 Focusing algorithms
1.2.3 Feature based target detection algorithms
1.2.4 Feature Fusion
1.3 CONCLUSION FOR CHAPTER 1

CHAPTER 2 STATISTICAL APPROACH TO FEATURE-BASED MINE DETECTION USING GPR
2.1 AUTOMATED DATA PROCESSING SCHEMES
2.1.1 Clutter suppression
2.1.2 Feature extraction and construction of a confidence map
2.1.3 Local Maxima Automated Detector
2.1.4 Parameters of decisive feature distributions
2.1.5 Target/clutter dichotomy and ROC curves
2.2 FEATURE FUSION TECHNIQUES
2.2.1 Maps Reconciliation and Hard Fusion
2.2.2 Parameters of feature vector distributions
2.2.3 Linear and quadratic fusion rules
2.3 FEATURE NORMALIZATION
2.3.1 Johnson's Transform
2.3.2 Pseudo-normalization of the clutter distribution
2.3.3 Simultaneous pseudo-normalization of the distributions of target and clutter classes
2.4 CONCLUSION FOR CHAPTER 2

CHAPTER 3 WAVEFORM BASED CLUTTER SUPPRESSION ALGORITHMS
3.1 TARGET RESPONSE WAVEFORM AS A FEATURE
3.1.1 Amplitude of APM responses to GPR probing pulse
3.1.2 Shape of APM responses to GPR probing pulse
3.1.3 Similarity Measure
3.1.4 Procedure to determine the reference wavelet
3.2 PENALTY FUNCTIONAL
3.2.2 Gaussian penalty functional
3.2.3 Modifications of PLSM as an input to SAR
3.3 SUPERPOSITION OF SAR AND PLSM ALGORITHMS
3.4 CONCLUSION FOR CHAPTER 3

CHAPTER 4 FEATURE GENERATION AND SELECTION
4.1 IMPROVED PROJECTION ALGORITHMS
4.1.1 Windowed energy projection
4.1.2 Alternating sign windowed energy projection
4.1.3 Two- and three-dimensional ASWEP
4.1.4 Modification of ASWEP for cross-polar radar
4.2 AUXILIARY FEATURES
4.2.1 Features based on local statistics of confidence maps
4.2.2 Non-statistical features
4.3 PROCEDURE TO DETERMINE OPTIMAL FEATURES
4.4 CONCLUSION FOR CHAPTER 4

CHAPTER 5 IMPROVEMENT OF THE MINE DETECTION PROVIDED BY THE ADVANCED TECHNIQUES
5.1 APPLICATION TO THE DATA FROM MC'02
5.1.1 Dry sandy lane
5.1.2 Wet sandy lane
5.1.3 Grass lane
5.2 APPLICATION TO THE DATA FROM MC'04
5.3 CONCLUSION FOR CHAPTER 5

CHAPTER 6 CONCLUSIONS AND RECOMMENDATIONS

APPENDIX A IRCTR GPR
A.1 DESIGN CONSIDERATIONS
A.2 DATA PRE-PROCESSING

APPENDIX B MEASUREMENT CAMPAIGN OF SUMMER 2002
B.1 DRY SANDY LANE
B.2 WET SANDY LANE
B.3 GRASS LANE

APPENDIX C MEASUREMENT CAMPAIGN OF 2004
C.1 GOALS OF THE CAMPAIGN
C.2 MEASUREMENT SITE AND SET-UP
C.3 TECHNICAL EVALUATION OF MEASURED DATA

LIST OF ACRONYMS

REFERENCES

SUMMARY


Chapter 1 Introduction

Antipersonnel landmines (APM) represent a constant threat to civilians and military units in many countries all over the world. Landmines are used to secure disputed borders and to restrict enemy movement in times of war. In military theory, APM serve a purpose similar to barbed wire or concrete dragon's teeth vehicle barriers, channeling the movement of attacking troops in ways that permit the defenders to engage them more easily. From a military perspective, landmines serve as force multipliers, allowing an organized force to overcome a larger enemy. Being comparatively cheap, however, APM are dispensed over large areas, where they remain long after a war has ended. The presence of APM often causes civilian deaths or mutilation, and the fear of them results in economic disasters in rural areas affected by wars. According to anti-landmine campaigners, in Cambodia alone mines have resulted in 35,000 amputees after the cessation of hostilities. Removal of landmines is dangerous, slow and costly. The total number of deposited APM is estimated in excess of 100 million and is growing [1, 2].

Although they come in different shapes and sizes, the majority of the most widely spread APM fit the description of a small plastic cylinder, 5 to 12 cm in diameter and 4 to 8 cm in height, filled with TNT [3]. The inner composition of APM varies from type to type, and some of the developed technologies are in principle able to differentiate between the types. However, the main problem at the practical level remains the low or absent metal content of some modern types of APM. It is generally perceived that versions of APM which contain a considerable amount of metal represent a lesser detection challenge.

The detection and clearance of APM is a very difficult and very important task. An area is counted as 'cleared of mines' if 99.6% of all mines formerly present in it have been detected and removed [2]. Although there is no strict restriction on the false alarm rate (FAR), it is generally understood that the higher the FAR, the more time-consuming, tiresome, and expensive the demining becomes. Currently, the main tool to locate an APM is still a prod. Prodding, during which the whole area is meticulously tested by manually inserting a prod into the ground, is a slow and dangerous process; the FAR here often amounts to 100 false alarms per correct detection. Therefore prodding is often preceded by the use of an inductive metal detector. This device allows fast and reliable detection of metal. There are two problems related to it: a) some of the mines contain little or no metal, and b) former battlefields often contain so much metallic debris that the detector cannot be tuned to low metal content.


This thesis is focused on the development of algorithms for improved detection of plastic-cased landmines with Ground Penetrating Radar. The introductory chapter continues with a state-of-the-art description of GPR technology. The approach taken to the developments of the thesis, based upon this description, is outlined in the conclusion to the chapter.

1.1 GPR technology for mine detection

Ground penetrating radar is a noninvasive electromagnetic geophysical technique for subsurface exploration, characterization and monitoring [4]. Its operation is based on emitting electromagnetic waves into the ground, where they are scattered by inhomogeneities; the scattered energy is registered by the radar. The data acquired during these measurements are used to infer the structure of the subsurface. The reason for the applicability of GPR to the problem of APM detection coincides with the source of the main difficulty in its use: GPR is sensitive to any inhomogeneity in the ground. Therefore any APM, regardless of metal content, can be detected. On the other hand, all the inhomogeneities which do not represent mines show up as clutter in GPR images.

By the way GPR devices emit their probing waves, they can be divided into two groups: stepped-frequency continuous-wave radar and video-impulse radar. In the first case the radar constantly radiates a sinusoidal electromagnetic wave whose frequency is varied according to a certain law. In the second case a short electromagnetic video-pulse is emitted. Both types of GPR have been developed for use in mine detection. It is generally accepted that the pros and cons of the two approaches are primarily of a technical nature and balance each other well.

The other way to divide mine-detection GPR devices is by their mode of operation: GPR are made hand-held, vehicle-mounted or platform-mounted. This is a more fundamental division, and radars developed for different operation modes normally solve different problems.

1.1.1 Hand-held devices

The hand-held devices are supposed to be operated much like inductive metal detectors. An operator of such a radar covers the interrogated area with sweeping motions, trying to maintain a constant velocity of movement and height of the antennas. As neither of these requirements is easy to adhere to, the data processing and feature extraction algorithms are difficult to develop. On the other hand, such devices are easily transportable, may be used in difficult terrain conditions and, what is also important, are cheap [5].

For hand-held devices, the use of data processing based on hidden Markov models [6-8] and even on Synthetic Aperture Radar [9] has been reported. Moreover, the hand-held devices are often paired with inductive metal detectors into sensor arrays [5, 9]. In the latter case it has been reported in [8] that the performance of the automated EMI/HH GPR sensor array was close to, if not surpassing, that of a human expert.

GPR of this class are aimed at mine detection in terrain that is difficult to access, where no other solution is possible. On the other hand, there exist inherent limitations on the size and weight of these devices, as well as problems related to their operation mode. These problems and limitations can be overcome by larger devices mounted on different types of vehicles.

1.1.2 Vehicle-mounted devices

The main alternative to the hand-held devices are the vehicle-mounted ones. These devices are mounted on various types of vehicles and may be used where the interrogated terrain allows the movement of the carriers. GPR of this type are divided into two groups: forward-looking devices and downward-looking ones.

The forward-looking devices, such as NIITEK-Wichman [10-12], FLGP-SAR [13], the GPR developed at Stanford Research Institute [14], and some others, operate in a predicting mode. They illuminate the interrogated terrain obliquely and receive part of the energy scattered by the mines. In such a set-up, plastic-cased APM are very difficult to detect, and the whole approach is more characteristic of anti-tank mine-detecting devices. At the same time, the technology of detecting anti-tank mines with forward-looking GPR devices has already been developed to the industrial level [15].

As the main alternative, vehicle-mounted GPR may be downward-looking. In this case the device is fixed on a frame in front of, or to the side of, the carrier, and interrogates the terrain directly underneath it. Some of the systems implementing this approach have been developed to the industrial or industrial-prototype level [16-20]. Most university-level research is done with systems that emulate the downward-looking vehicle-mounted GPR approach, where the vehicle is often replaced with an accurate scanning platform [21-28]. The popularity of the approach is explained by the fact that, where technically feasible, it represents the best possible conditions from the radar point of view.

1.1.3 IRCTR Scanner-mounted GPR


The resulting GPR simultaneously uses two orthogonally polarized transmit antennas [30] and four receive antennas connected to independent channels. This set-up results in four co-polar and four cross-polar receiving channels. Identical dielectric wedge antennas are used as transmitters; identical loops are used as receivers. The receive antennas are placed directly underneath the transmitters in order to narrow the footprint of the antenna system and thus to enhance its spatial resolution. One of the consequences of this design is that the direct wave is the largest signal in the system, which defines the upper limit of the system's dynamic range. The radar is capable of collecting single echo returns (called A-Scans) in stop-and-go and continuous modes. Continuous data acquisition results in echo profiles (called B-Scans). Collections of adjacent B-Scans constitute 3D datasets (called C-Scans), which are the primary format of the raw data for the device.

The radar (Figure 1.1) has 4 quasi-monostatic transmitter-receiver pairs (T1R2 and T2R1 co-polar; T1R3 and T2R4 cross-polar) and 4 significantly bistatic ones (T1R1 and T2R2 co-polar; T1R4 and T2R3 cross-polar). The radar is therefore capable of collecting fully polarimetric data. The height of the receiving loops over the surface varies from 10 to 30 cm, depending on the scanner platform settings. The distance between the receiver loop and the aperture of the transmit antennas is 28 cm.


The high quality of the hardware and the high stacking number provide sub-sample stability of the firing times of the generator. The jitter of the system has been determined to be 6.47 ps over an entire C-Scan. After the slow trend of the generator towards earlier firing times has been removed from the data, the jitter becomes smaller and equals approximately 3.5 ps. In both cases the jitter is well within one sample. The radar and scanner used to acquire the data considered in this thesis are detailed in Appendix C.
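The trend-removal step described above can be illustrated with a short numerical sketch. All numbers here are simulated for illustration, not the actual IRCTR measurements: a slow linear drift of the firing times inflates the apparent jitter, and a least-squares linear fit isolates the random component.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical firing-time offsets (in ps) for a sequence of A-Scans:
# a slow linear drift towards earlier firing plus random jitter.
n_scans = 2000
t = np.arange(n_scans)
drift = -0.005 * t                               # slow trend of the generator
offsets = drift + rng.normal(0.0, 3.5, n_scans)  # 3.5 ps "true" jitter

jitter_raw = offsets.std()                       # the trend inflates the spread

# Remove the slow trend with a least-squares linear fit.
slope, intercept = np.polyfit(t, offsets, 1)
residual = offsets - (slope * t + intercept)
jitter_detrended = residual.std()                # close to the true 3.5 ps

print(f"spread before detrending: {jitter_raw:.2f} ps")
print(f"spread after detrending:  {jitter_detrended:.2f} ps")
```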

1.1.4 The main challenge in mine detection with GPR

It has been shown by the results of many studies that the problem of APM detection cannot be solved at a straightforward hardware level [2, 5, 12, 15, 23]. That is, unlike the 'signal detection in noise' situation, the mine-detecting radar faces the following problem: the more sensitive and accurate the hardware is, the more of the ever-present soil inhomogeneities and friendly objects it detects. Moreover, the energy scattered by plastic-cased APM is very often lower than that scattered by friendly objects and, in some cases, even by the surface roughness. This renders 'level threshold' detection inapplicable, and the necessity of developing feature-based detection becomes apparent. In turn, any type of feature-based detection requires the development and application of sophisticated data processing algorithms.

From a humanitarian demining perspective, I see the following main challenge in mine detection with GPR: the false alarm rate at the (nearly) 100% detection level must be lowered to an acceptable level. An acceptable level supposes that the operation of mine detection becomes faster in comparison with currently existing techniques (i.e., metal-detector-aided prodding). In order to achieve this goal, a framework of data processing, feature selection and feature fusion algorithms must be built.

The issues connected with building up such a framework will be treated in the present thesis.

1.2 Data processing algorithms


1.2.1 Clutter suppression algorithms

As mentioned above, the main problem for the detection of APM with GPR is clutter. Clutter, which is defined as any electromagnetic phenomenon not associated with targets, cannot in general be treated as white additive noise. This significantly complicates the issue of clutter suppression. Most clutter suppression algorithms are based on background subtraction in various forms [32-37]. The main idea of algorithms of this type is the definition of a background model, which is then subtracted from the measured signals. The simplest form of such an algorithm is mean value elimination, often used in civil GPR systems in generic or windowed format. In the latter format it has been implemented with some success for mine detection GPR [32]. In [32] a 3-D realization of this approach is used, in which the averaging is performed over an elliptic cylinder centered on the A-Scan in question. The technique has been shown to successfully eliminate the direct coupling and, to a large extent, suppress the surface bounce. Success in the latter aspect generally depends on an apt selection of the size of the averaging cylinder. Various implementations of the same approach that use median filtering etc. were also considered [33-35]. All these approaches, however, fail at least partially if the surface roughness is such that an apt selection of the averaging (or median determination etc.) radius leads to the disruption of the target signatures.
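A minimal sketch of windowed background subtraction follows. It is a 2-D simplification of the elliptic-cylinder averaging cited above; the function name, window size and guard zone are illustrative assumptions, not the thesis's actual implementation.

```python
import numpy as np

def subtract_windowed_background(bscan, half_width=10, guard=3):
    """Windowed background subtraction on a B-Scan.

    bscan: 2-D array (time samples x scan positions).  For every scan
    position the background is estimated as the mean of the neighbouring
    A-Scans inside a window, excluding a small guard zone around the
    current position so that a localized target does not contaminate
    its own background estimate.
    """
    n_t, n_x = bscan.shape
    cleaned = np.empty_like(bscan, dtype=float)
    for i in range(n_x):
        lo, hi = max(0, i - half_width), min(n_x, i + half_width + 1)
        cols = [j for j in range(lo, hi) if abs(j - i) > guard]
        background = bscan[:, cols].mean(axis=1)
        cleaned[:, i] = bscan[:, i] - background
    return cleaned

# Synthetic example: flat 'surface bounce' plus one weak localized target.
bscan = np.ones((64, 80))            # direct coupling / surface reflection
bscan[30, 40] += 5.0                 # localized target response
cleaned = subtract_windowed_background(bscan)
# The flat background is removed; the target response survives.
```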

Thus more sophisticated approaches to clutter characterization have been developed. A characteristic example of such developments is reported in [36], where the scattering associated with the background and with targets is represented with damped exponential models. Differences in the model parameters between the cases of surface roughness and localized objects are successfully used to suppress the background clutter better than mean elimination does.

Another approach has been suggested in [37], where a Gaussian model for surface roughness was assumed and used for the construction of a clutter suppression algorithm. The algorithm uses the arrival times of clutter/target wavelet phenomena for their optimal projection onto an 'ideal flat surface' space. The differences in the resulting projections between the 'clutter alone' and 'clutter and target' situations are used for suppressing the former.


1.2.2 Focusing algorithms

Additive noise itself can be a problem for the detection of plastic-cased APM due to their small size and weak scattering contrast. On the other hand, it is well known that electromagnetic phenomena associated with any localized object, including APM, leave hyperboloid-like traces in C-Scans. Exploiting this information, it is possible to collapse these traces into localized blobs using some kind of focusing transformation [4]. Focusing is a common technique which ultimately uses a sort of Radon transformation [38] to coherently sum the signals scattered by localized objects. The improvement in signal-to-noise ratio (SNR) is caused by self-cancellation of the non-coherent noise phenomena, whilst object-related phenomena are summed coherently and thus reinforced. The Synthetic Aperture Radar (SAR) technique is a closely related concept from remote sensing [39] that uses a similar transformation to increase image resolution. It can be shown that the SAR formulation implements the optimal filtering technique for imaging a point object on the basis of a B-Scan or a C-Scan [40]. In [40] the 2D SAR technique has been successfully applied to GPR data to improve the resolution of images of elongated objects. The generic form of a focusing algorithm in the time domain can be represented by:

F(\mathbf{r}) = \iint_{(\xi,\gamma)\in A} C_{\xi,\gamma}(T)\, d\xi\, d\gamma \qquad (1.1)

where \mathbf{r} = (x, y, z) is the point being imaged, C_{\xi,\gamma}(\cdot) and F(x, y, z) represent the initial and focused C-Scans, A is the 2D aperture of the radar to be synthesized, and T is the time needed for a probing pulse to travel from the transmit antenna to the point being imaged and back to the receive antenna. Techniques of the same type can be applied in the frequency domain, which leads to similar formulations [13, 41]. Other developments in this direction include the combination of delay-and-sum techniques with those originating in the wave equation [42] and attempts to speed up the computation by a technique resembling the FFT in the Cooley-Tukey formulation [43].
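A discrete sketch of the delay-and-sum summation in Eq. (1.1) is given below, under simplifying assumptions: monostatic geometry, a homogeneous medium with known velocity, and idealized delta-like echoes. All names and numerical values are illustrative, not the thesis's SAR implementation.

```python
import numpy as np

def focus_point(cscan, dt, coords, r, v):
    """Discrete delay-and-sum evaluation of Eq. (1.1) at one image point.

    cscan:  2-D array of A-Scans, [aperture position, time sample]
            (the 2-D aperture flattened into a list of positions)
    dt:     time sampling step (s)
    coords: (n_pos, 3) antenna positions (monostatic approximation)
    r:      (3,) coordinates of the point being imaged
    v:      propagation velocity in the medium (m/s)
    """
    n_pos, n_t = cscan.shape
    total = 0.0
    for k in range(n_pos):
        dist = np.linalg.norm(coords[k] - r)
        T = 2.0 * dist / v                  # two-way travel time
        idx = int(round(T / dt))
        if 0 <= idx < n_t:
            total += cscan[k, idx]          # coherent summation over the aperture
    return total

# Synthetic monostatic data: a single point scatterer buried at 0.3 m.
v, dt = 1.0e8, 1.0e-10                      # soil-like velocity, 0.1 ns sampling
coords = np.array([[x, 0.0, 0.0] for x in np.linspace(0.0, 1.0, 21)])
target = np.array([0.5, 0.0, 0.3])

cscan = np.zeros((len(coords), 200))
for k, p in enumerate(coords):
    T = 2.0 * np.linalg.norm(p - target) / v
    cscan[k, int(round(T / dt))] = 1.0      # idealized delta-like echo

on_target = focus_point(cscan, dt, coords, target, v)        # coherent: large
off_target = focus_point(cscan, dt, coords, target + 0.2, v) # incoherent: small
```

At the true scatterer location all aperture positions contribute in phase, so the focused value is large; away from it the contributions miss each other and largely cancel, which is exactly the SNR mechanism described in the text.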

Another approach to focusing is wave-equation based migration, a technique originating from geophysics [44], where it is used as a primary imaging tool. Some connections between migration and SAR-based techniques have been established [45, 46], but the former is less known in APM detection.


1.2.3 Feature based target detection algorithms

As already discussed, the detection of APM with GPR on the basis of an energy threshold alone leads to an unacceptably high false alarm rate. Therefore, the detection must be made on the basis of one or more features. A feature in this context means any scalar parameter f which can be associated with any object (or set of coordinates) suspected of being (or hosting) a target. This definition describes a generic target detection algorithm in hypothesis testing terms:

\begin{cases} H_0(f): \text{clutter only} \\ H_1(f): \text{target signature} + \text{clutter} \end{cases} \qquad (1.2)

This is a fairly common formulation of the detection problem, so most of the publications referred to above describe their approaches in these terms. It is understood that the more distinguishable the distribution densities of a feature f are for the target and clutter classes, the better the achievable detection level. It is also understood that more than one feature is employed for the decision making. However, for the two-class hypothesis testing problem (1.2) the optimal decision is made on the basis of exactly one feature [45]. This means that the generic fusion process takes as input an arbitrary number of scalar features and results in a single fused (decisive) one.
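The target/clutter dichotomy (1.2) can be illustrated with an empirical ROC computed for a simple threshold test on a scalar feature. The Gaussian class densities below are purely illustrative assumptions.

```python
import numpy as np

def roc_points(f_target, f_clutter, thresholds):
    """Empirical ROC for the test (1.2): declare a target when the
    scalar feature f exceeds the threshold."""
    pts = []
    for th in thresholds:
        pd = np.mean(f_target > th)     # detection probability (under H1)
        pfa = np.mean(f_clutter > th)   # false alarm probability (under H0)
        pts.append((pfa, pd))
    return pts

rng = np.random.default_rng(1)
# Hypothetical, reasonably well-separated feature densities for the two classes.
f_clutter = rng.normal(0.0, 1.0, 5000)
f_target = rng.normal(3.0, 1.0, 5000)

pts = roc_points(f_target, f_clutter, [0.0, 1.5, 3.0])
for pfa, pd in pts:
    print(f"Pfa = {pfa:.3f}  Pd = {pd:.3f}")
```

Sweeping the threshold traces out the ROC curve; the more separated the two class densities, the closer the curve gets to the ideal point (Pfa = 0, Pd = 1).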

A vast variety of features is used to detect APM. In every particular case the features are selected taking into account the nature of the problem and the characteristics of the targets and the hardware. Hand-held devices require very robust approaches. In [6] the detection with hand-held GPR is considered in depth. As mentioned before, only a limited set of features is robust enough to be used in this set-up. In that reference, a measure of the deviation of the measured data from what is predicted by a background model is used as the numerical parameter. While obtaining quite promising results in terms of detection, the authors had to settle for a rather high false alarm rate. They developed a more sophisticated technique [8] combining the hand-held GPR with a metal detector. The main feature used to discriminate between targets and clutter was the closeness of the observed figure in the frequency domain to a predefined 'target pattern'. This feature has shown supremacy over energy detection in the presence of friendly objects.


In [17] a fuzzy logic approach is used to detect the hyperboloid traces left by targets. The detection is quite successful, but since any localized object can be characterized by hyperboloid traces, no robustness to the presence of friendly objects is outlined in this reference. Analogous results are obtained with the use of the level of polynomial fitness of the object traces as a primary feature [11]. Further developments of the algorithms based on the hyperbola-distributed phenomena include the use of hidden Markov models [18, 19] and higher order statistics [12].

Almost entirely different features are used when the detection is made in focused GPR images. In this case all localized objects are represented in the images by blobs of various shapes and intensities. In [43] the time-frequency analysis apparatus is employed to distinguish between targets and friendly objects: coefficients of the Choi-Williams distribution are used to form the decisive feature (see subchapter 1.2.4 for details of the fusion process). In [44], essentially the waveform of the object image in the focused A-Scan is used to distinguish between targets and clutter.

Seemingly, in all the cases described in the references, detection of plastic-cased APM is achievable. On the other hand, achieving an acceptably low false alarm rate is much more challenging and remains an open problem. It is also quite clear that this problem cannot be solved with one measured feature. Therefore: a) methods of clutter suppression should be further developed, and b) feature fusion techniques should be studied.

1.2.4 Feature Fusion

It has been seen already that the problem of detection of APM requires the use of multiple features. The optimal way of combining the information these features provide is called feature fusion. Besides, the problem of fusion of different features for target detection always arises when any multi-feature system is used. A system may comprise sensors of different nature, or just independently acting radars, or the same radar operating in different modes, or even the same flow of the raw data from the radar can be processed by different algorithms, which produce different features.

Three of the examples considered in the previous subchapter required fusion of the used features. In particular, the hand-held GPR combined with an EMI sensor in [19] ultimately provided a set of two confidences, one coming from each of the sensors. In that reference the fusion is organized by simply taking their geometric mean. In [43] the coefficients of the time-frequency distribution, taken as individual features, are combined into the decisive feature by means of linear discriminant analysis [45]. The waveform based detection in [44] is organized via a maximum deflection criterion based quadratic classifier [46].
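The geometric-mean rule mentioned above can be sketched in a few lines; the confidence values below are invented for illustration.

```python
import numpy as np

def geometric_mean_fusion(conf_a, conf_b):
    """Point-wise geometric mean of two confidence maps (values in [0, 1]).
    A location keeps a high fused confidence only if both sensors
    assign it a high individual confidence."""
    return np.sqrt(conf_a * conf_b)

# Toy confidence maps of two sensors over the same 2 x 2 grid (invented values).
gpr = np.array([[0.9, 0.2],
                [0.8, 0.1]])
emi = np.array([[0.8, 0.9],
                [0.1, 0.1]])
fused = geometric_mean_fusion(gpr, emi)
# Only the top-left cell, where both sensors agree, stays highly confident.
```

The geometric mean is a conservative 'soft AND': a single sensor with near-zero confidence pulls the fused value down, which suppresses false alarms that only one sensor reports.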


Other works apply generalized maximum likelihood theory to the problem. Multivariate target and clutter feature distributions are evaluated from training data by means of Monte Carlo-like modeling. The performance of the resulting classifier is quite high when evaluated via the leave-one-out method (all of the data except for one target are used for training, and this one target is then detected). As no blind test is provided, it is impossible to assess the robustness of the algorithms.

One of the two main alternatives to the generalized likelihood ratio approach in feature fusion is the use of the Dempster-Shafer evidence theory [48]. In [49] this theory is used to build up a two-level belief function model for an EMI/GPR/IR sensor array. The result of the fusion clearly demonstrates the superiority of the sensor array over any single sensor. It is difficult to judge the robustness of the algorithm to changes of environment and targets, as no blind test is provided. In [50] the same theory is used to fuse the data from an EMI/GPR sensor array. Although the belief system in this case has one level, the gain given by the use of fusion is clearly demonstrated. The research is done in laboratory conditions, which prevents assessment of the robustness of the algorithms.

In [51] the problem of data fusion in a polarimetric IR/GPR sensor array is considered. The author uses the second main alternative to the generalized likelihood test, which is a development of the k-NN (k nearest neighbors) approach. A learning vector quantization (LVQ) classifier provides very impressive results when tested by the leave-one-out approach, but the performance was less impressive when the blind test was done. The variability of the features from the training to the test site is given as the main reason for the deterioration of the performance, but overtraining of the algorithm may also occur, as k-NN based algorithms often have quite complex decision borders [45].

Use of the generalized likelihood ratio approach (or Bayesian approach) is the optimal way to perform feature fusion [45]. However, it is hampered by the necessity of evaluating complex integrals to compute the feature densities. Moreover, the approach can also be plagued by the overtraining issue. At the same time, the cheap alternative provided by the maximum likelihood based quadratic classifier is only optimal for normally distributed features and fails when the feature distributions are skewed. However, there exist methods of quasi-normalization of distributions [52], and these methods will be adapted in this thesis to aid mine detection with GPR.
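As a minimal illustration of why such quasi-normalization helps, consider a skewed, energy-like feature: a logarithm (the transform to which the log-normal, S_L, branch of Johnson's family reduces in this idealized case) brings its sample skewness close to zero, restoring the conditions under which the quadratic classifier is well matched. The data here are simulated.

```python
import numpy as np

def skewness(x):
    """Third standardized moment; near zero for a Gaussian sample."""
    x = np.asarray(x, dtype=float)
    return np.mean((x - x.mean()) ** 3) / x.std() ** 3

rng = np.random.default_rng(2)
# A strongly skewed, energy-like feature (simulated as log-normal).
feature = rng.lognormal(mean=0.0, sigma=1.0, size=10000)

# A logarithm maps this feature to an exactly Gaussian one.
normalized = np.log(feature)

print(f"skewness before: {skewness(feature):.2f}")     # strongly positive
print(f"skewness after:  {skewness(normalized):.2f}")  # near zero
```

Real clutter and target features are rarely exactly log-normal, which is why the general Johnson transforms treated in Chapter 2 fit the transform parameters to the data instead of assuming a fixed functional form.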

1.3 Conclusion for Chapter 1


• Statistics-based multi-feature detection;
• Waveform-based clutter suppression.

The suggested approach is sound, since it rests on a firm theoretical basis in both the feature-fusion [45] and clutter-suppression [53] aspects. At the same time the approach is novel, since algorithms of feature pre-normalization and feature fusion are not found in the literature in application to landmine detection. Also, while the importance and prospects of waveform analysis in landmine detection are mentioned in several published papers [26, 43], this aspect of GPR data processing has not yet been explored.

The following statements will be substantiated in the thesis:

1. An open automated framework for multi-feature detection of APM using GPR can be developed.

2. This framework should use the Bayesian approach to feature fusion, expressed by means of a linear-quadratic classifier.

3. In order to enforce the effectiveness of such classifiers, certain measures should be undertaken to ensure the (quasi-)normality of the decisive features.

4. The current state-of-the-art approach to clutter suppression is based on the concept of describing the clutter and detecting any deviation from it. This approach is important but in itself unsatisfactory for lowering the false alarm rate.

5. Waveform-based clutter suppression algorithms can be developed. In such an approach the clutter suppression would operate from the target model and suppress all waveforms which cannot represent targets. It is supposed that such an approach, when paired with the classical one, can significantly improve the detectability of targets and suppress the false alarm rate.

The present thesis is built on the following assumptions:

1. It is supposed that any given decisive feature forms a confidence map in which the targets may be detected independently. Auxiliary features may be constructed using such a map.

2. It is supposed that an optimal set of features representing targets can be devised from a training site and then used at a blind test site.

The thesis is organized in the following way. The core of the statistical approach and the feature normalization techniques are described in Chapter 2. The waveform-based clutter suppression techniques are discussed in Chapter 3. The process of forming confidence maps is treated in Chapter 4; in that chapter the procedure of selecting the best features to detect the targets in the given conditions is also discussed. The algorithms suggested and described in Chapters 2-4 are tested on the measured data in Chapter 5. Chapter 6 presents the conclusions of the thesis. Descriptions of a technical character, such as details on the operating hardware, data pre-processing, measurement sites and set-ups, are given in the Appendix.


Chapter 2 Statistical Approach to Feature-based Mine Detection Using GPR

As has been shown in Chapter 1, GPR technology can be successfully used for the detection of plastic-cased APM. However, the false alarm rate corresponding to a 99.6% detection level may be unacceptably high for any single feature or even a single sensor. Therefore it is necessary to come up with a framework that allows detection of APM on the basis of several different features extracted from the GPR data. The GPR data in turn can be processed in several different ways to allow the extraction of various features.

In this chapter I suggest such a framework. It represents a statistics-based approach to mine detection incorporating a set of features, each having its own distribution for the classes of mines and clutter. The feature characterization approach is based on sample probability density functions for the classes of targets and clutter learned from a training set. In this approach, for each feature map the sample probability density functions are modeled either parametrically or non-parametrically, based on the data acquired at a training site.

A short description of the framework reads: a 3-D GPR dataset is processed to produce a 2-D map of a certain decisive feature; each point on the resulting map represents a confidence that a mine is present in (or underneath) the given location according to the given feature; the probability of mine or clutter presence in the location is computed on the basis of the confidences and their distributions learned from a training dataset. Using the probability distributions of individual features, a quadratic classifier is built based on either the maximum likelihood (ML) or the maximum deflection (MD) criterion. In order to ensure that the quadratic classifier does indeed correspond to one of these criteria, the individual features are transformed in such a way as to ensure their normal distribution. The resulting classifier is used as a plug-in rule on a test site.

Unlike the approaches suggested in [50-52, 54-60], the current one makes use of quadratic classifiers [45]. These classifiers stem from the generalized likelihood test approach and are more robust than nearest-neighbor based approaches when used as plug-in rules. The approach is illustrated in Figure 2.1 in the form of a flow diagram. Following the diagram, one starts by acquiring the data with one or multiple data acquisition scenarios, thus producing P raw C-Scans R. These raw C-Scans undergo a data pre-processing step, which enhances the quality of the data in the technical sense (for the details see Appendix A.2). One or several Data Processing Schemes are applied to each of the pre-processed C-Scans, thus yielding:


of these features, each possibly supplemented with some additional co-features derived from its map, constitutes the output of the data processing scheme. Together they form the pool of decisive features. The features constituting this pool are independently transformed in such a way that their distribution densities become (quasi-)normal for the training data. The normalized features are used for feature fusion, resulting in one fused decisive feature. The final decision is made by comparing this feature against a threshold.

The suggested approach is therefore a non-supervised one, in which all available data are processed independently to form the feature pool, then normalized, and then subjected to classification by a quadratic classifier.


2.1 Automated data processing schemes

As was shown in Chapter 1.1, data processing schemes for automated detection of APM typically contain steps aimed at clutter suppression, feature extraction and amplification, feature-based target detection, labeling of the locations suspected of containing targets, assigning a decisive feature value to each suspect spot, and finally target/clutter dichotomy by means of a threshold. Thus a generic DPS follows the paradigm:

Clutter Suppression → Feature Extraction → Confidence Mapping → Initial Detection   (2.2)

Further in this subchapter I describe the particular steps of (2.2) as they are developed for this thesis.

2.1.1 Clutter suppression

As was discussed in subchapter 1.2.1, the state of the art in clutter suppression for mine detection with GPR is set on background removal. Successful clutter suppression means removal of the direct wave, the ground bounce, and background scattering phenomena which are not connected to localized objects. For the purposes of this study I select a cylindrical moving window average (MWAE, [23]) as the background clutter suppression tool. The MWAE operator is given by:

C̃_xy(t) = C_xy(t) − (1/N) Σ_{αξ² + βγ² ≤ 1} C_ξγ(t)   (2.3)

where C_xy(t) is an A-Scan measured at the location <x, y>, N is the number of A-Scans in the averaging window, and the parameters α and β define the circular window over which the averaging is made. The values of the parameters are selected in such a way that the circle they form defines a 'fairly constant' area on the ground. The opposing criterion is that (2.3) should disturb the responses of targets as little as possible. The actual radius of this circle is subject to change from site to site, depending on the ground roughness and the size of the mines to be detected. For example, for the sandy minefield simulation sites considered in this study a 16 cm radius is shown to produce good results. This radius is used in the examples throughout this thesis unless explicitly stated otherwise.
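As an illustration, the averaging-and-subtraction of (2.3) can be sketched in NumPy as follows. This is a minimal sketch, not the thesis implementation; the (x, y, t) array layout and the window radius expressed in pixels are my assumptions:

```python
import numpy as np

def mwae(c_scan, radius_px):
    """Moving window average background removal (cf. eq. 2.3):
    from each A-Scan subtract the mean of the A-Scans inside a
    circular window of the given radius (clipped at the edges)."""
    nx, ny, _ = c_scan.shape
    out = np.empty_like(c_scan, dtype=float)
    for x in range(nx):
        for y in range(ny):
            acc = 0.0
            count = 0
            for dx in range(-radius_px, radius_px + 1):
                for dy in range(-radius_px, radius_px + 1):
                    if dx * dx + dy * dy > radius_px * radius_px:
                        continue  # outside the circular window
                    xi, yi = x + dx, y + dy
                    if 0 <= xi < nx and 0 <= yi < ny:
                        acc = acc + c_scan[xi, yi]
                        count += 1
            out[x, y] = c_scan[x, y] - acc / count
    return out
```

For a laterally constant background the output is identically zero, while the hyperbolic signature of a localized target, being confined to a few A-Scans, survives the subtraction largely intact.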


objects. The non-trivial issue of the suppression of this kind of clutter is treated separately in Chapter 3.

2.1.2 Feature extraction and construction of a confidence map

As discussed in 2.1, some kind of focusing algorithm is usually applied in a DPS unless a requirement of causality is posed. In the case when a real-time and/or causality condition is imposed, other hyperbola detection algorithms are used [11, 13, 62-64]. All these algorithms, just like focusing, exploit the hyperbola-like spatial distribution of electromagnetic waves scattered by a localized object. In this research no causality restriction was in place, and therefore I use a scalar stack migration SAR algorithm for the data focusing in the thesis.

The SAR procedure used here takes into account the refraction of the probing pulse at the air/ground interface. A spatial interpolation of the raw A-Scans is also used to ensure that the smaller objects are correctly imaged [65].

An optimal construction of a confidence map from a focused C-Scan is a topic for a separate discussion, which takes place in Subchapter 4.1. Here I confine myself to the note that the energy contained in a focused A-Scan with coordinates <x, y> is a robust feature for determining whether or not an object is placed at this location at some depth [8]. The energy based map is obtained by simple integration along the depth direction:

E_xy = ∫_{z₀}^{z₁} F²_xy(z) dz   (2.4)

using some kind of approximation of the integral. In (2.4), called the Energy Projection (EP), F_xy(z) represent focused A-Scans, which are smooth and quite slowly changing functions of depth. This approximation therefore does not present any difficulty. In the thesis these types of integrals are obtained by trapezoidal numerical integration.
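As a sketch, the trapezoidal evaluation of (2.4) in NumPy (the (x, y, z) array layout and the uniform depth step dz are my assumptions):

```python
import numpy as np

def energy_projection(focused, dz):
    """Energy projection (eq. 2.4): trapezoidal integration of the
    squared focused A-Scans F_xy(z) along the depth (last) axis."""
    e = np.asarray(focused, dtype=float) ** 2
    # trapezoidal rule: dz * sum of (e[k] + e[k+1]) / 2 over depth samples
    return dz * (e[..., :-1] + e[..., 1:]).sum(axis=-1) / 2.0
```

The result is a 2-D map indexed by <x, y>, which is then treated as a confidence map.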

In the suggested approach I treat the output of (2.4), E_xy, or any other projection discussed later, as a confidence map. Namely, the value taken by E_xy in any location <x, y> represents a confidence that a target is placed in it. Note that the confidence may or may not linearly correspond to the probability of a target presence in the given location [51]. If the definition of the confidence level is reduced to the sensor output value (e.g. (2.4)), it requires the introduction of a specific tool for the initial target detection.

2.1.3 Local Maxima Automated Detector


The Local Maxima Detector (LMD) places a detection label in each point of a confidence map where the confidence feature finds its local maximum:

B_xy = { M_xy, if M_xy = max_{(χ,γ)∈Ω} M_χγ; 0, otherwise }   (2.5)

where B_xy is the resulting detections map, M_xy is the energy or any other confidence map, and Ω is the vicinity defining the system resolution.
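A minimal sketch of the LMD rule (2.5), with the vicinity Ω taken as a square (2w+1)×(2w+1) window; the square shape of the vicinity is a simplification I introduce for illustration:

```python
import numpy as np

def lmd(conf_map, w):
    """Local Maxima Detector (eq. 2.5): keep a confidence value only
    where it equals the maximum of its (2w+1)x(2w+1) vicinity
    (plateaus of equal values would all be labeled in this sketch)."""
    nx, ny = conf_map.shape
    b = np.zeros_like(conf_map, dtype=float)
    for x in range(nx):
        for y in range(ny):
            patch = conf_map[max(0, x - w):x + w + 1,
                             max(0, y - w):y + w + 1]
            if conf_map[x, y] == patch.max():
                b[x, y] = conf_map[x, y]
    return b
```

The non-zero entries of the returned map give the initial detection list; their values are the associated scalar feature.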

The <x, y> coordinates of the non-zero entries of the map B_xy form the initial detection list. The values of B_xy represent the scalar feature value, or the sensor output. As discussed later in Chapter 4.2, a few additional scalar parameters may be computed from the statistical properties of the immediate vicinities of the initial detections. The list of the initial detections and one or more scalar features associated with each entry of the list constitute the output of a data processing scheme.

The effective performance of the LMD is guaranteed for the projection (2.4), or any other type of confidence map construction, provided that the decisive feature takes non-negative values. Use of the LMD is of great importance when dealing with high-resolution GPR data. As discussed in Chapters 1.1.3, Chapter 1 and Appendix 1.1, a typical confidence map consists of approximately 40000 pixels. It is obvious that no testing that requires considerable computation per pixel is possible before their number is severely reduced. The LMD provides this reduction, producing 15-30 labels per square meter. This number is still unacceptably high for testing by a human expert, but it allows fast automated processing of polynomial computational complexity.

2.1.4 Parameters of decisive feature distributions

If the 99.6% detection requirement is relaxed, or the amount of false alarms is allowed to be arbitrarily high, the mine detection problem can be solved using any one particular map with one decisive feature. To predict the quality of such a solution, one has to possess knowledge of the distributions of the decisive feature for the classes to separate. Application of a DPS results in an independent random variable x that can be described in terms of its probability distribution function

Π(f) = Pr{x ≤ f}   (2.6)

or its probability density function

π(f) = ∂Π(f)/∂f   (2.7)


[66] or non-parametric kernel estimation [67]. In the latter case the probability density of a feature f is estimated from the measured dataset {x_i}_{i=1}^{n} via:

π̂(f) = (1/(nh)) Σ_{i=1}^{n} K((f − x_i)/h)   (2.8)

where π̂ is the estimate of the density, n is the number of samples taken for the estimation, h is the window width, and K(·) is a differentiable kernel function satisfying

∫_R K(x) dx = 1   (2.9)

Gaussian or Epanechnikov kernels are used most often. The kernel estimation technique is more robust than one based on normalized histograms. Moreover, it produces a result that is differentiable, which is of importance for some algorithms.
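For illustration, (2.8) with a Gaussian kernel reduces to the following sketch (function name and the fixed bandwidth are my own choices):

```python
import numpy as np

def kde(f, samples, h):
    """Kernel density estimate (eq. 2.8) with a Gaussian kernel K."""
    u = (np.asarray(f, dtype=float)[..., None] - np.asarray(samples)) / h
    k = np.exp(-0.5 * u ** 2) / np.sqrt(2 * np.pi)  # K integrates to 1
    return k.sum(axis=-1) / (len(samples) * h)
```

Evaluating the estimate on a dense grid and checking that it integrates to one is a quick sanity test; bandwidth selection (h) is the main practical tuning knob.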

The feature distribution parameters most important for this study are conveyed by its first four moments. They can be estimated after the feature distribution function is described, or alternatively computed directly from the measured data. The first two moments are estimated via:

μ₁ = (1/N) Σ_{k=1}^{N} x_k   (2.10)

μ₂ = (1/N) Σ_{k=1}^{N} (x_k − μ₁)²   (2.11)

where N is the size of the dataset. These moments define the centrality and the spread of the data, and define the data completely if they are normally distributed. If the latter is not the case, the higher order moments must be taken into account. In this study considerations will be confined to the moments of the third and fourth order. These moments will be used in the variance-normalized form, in which they are called skewness and kurtosis. Their sample estimates are defined by

Sq = (1/(N μ₂^{3/2})) Σ_{k=1}^{N} (x_k − μ₁)³   (2.12)

Kr = (1/(N μ₂²)) Σ_{k=1}^{N} (x_k − μ₁)⁴   (2.13)
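The sample estimates (2.10)-(2.13) amount to the following sketch:

```python
import numpy as np

def sample_moments(x):
    """Mean, variance, skewness and kurtosis per eqs. (2.10)-(2.13)."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    mu1 = x.sum() / n                                  # (2.10) mean
    mu2 = ((x - mu1) ** 2).sum() / n                   # (2.11) variance
    sq = ((x - mu1) ** 3).sum() / (n * mu2 ** 1.5)     # (2.12) skewness
    kr = ((x - mu1) ** 4).sum() / (n * mu2 ** 2)       # (2.13) kurtosis
    return mu1, mu2, sq, kr
```

For a symmetric sample the skewness estimate is zero, which is the behavior exploited later when checking the (quasi-)normality of the decisive features.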


2.1.5 Target/clutter dichotomy and ROC curves

It follows from the Bayes theorem that the optimal separation of L classes can be achieved using L−1 features [45]. For our two-class case this means that a) feature fusion must result in exactly one fused feature and b) the measure of the classes' separability can be assessed in the 1-D case. Feature fusion is addressed in the separate subchapter 2.2. This subchapter addresses the issue of representation of the separation achieved.

Several possibilities exist to measure the separation of two classes provided by a feature. If the distributions of the decisive feature for the two classes are given by functions p_i(x), one can obtain the Bayes error, which is the quantity describing the amount of misclassification errors in the case of the optimal separation:

ε_B = ∫_{−∞}^{x*} p₂(x) dx + ∫_{x*}^{∞} p₁(x) dx   (2.14)

where x* is the optimum decision boundary and p_i are the probability densities of the classes to separate. This is illustrated in Figure 2.2, where the probability densities for the classes of target and clutter are shown. The densities are computed using the kernel estimation approach (2.6)-(2.7) from samples of actually measured data. The optimal decision point x* is at the intersection of the densities, and the Bayes error for the given case is ε_B ≈ 0.027. The decision point is marked with a black line in the figure.


For humanitarian demining, however, the decision point is preset at the 0.996 detection probability, and therefore the error should be calculated via

ε_HD = ∫_{x₀}^{∞} p₁(x) dx   (2.15)

where x₀ is the point corresponding to the 99.6% detection level. The point x₀ is marked with a pale line in Figure 2.2; the corresponding error is ε_HD ≈ 0.022 for the example given. Note that the definition (2.15) does not include the error due to the 0.4% of missed targets.

The numbers given by (2.14) and (2.15) are only estimates, since they are computed on the basis of estimations of the probability density functions, which oftentimes are only given in measured samples. Alternatively, one may obtain bounds for the Bayes error using the first two moments estimated from the measured samples, without making any assumptions about the distributions. Namely, the generalized deflection

d(α) = (μ₁^T − μ₁^C)² / (α μ₂^T + (1 − α) μ₂^C)   (2.16)

and the Bhattacharyya distance

B = (1/4) (μ₁^T − μ₁^C)² / (μ₂^T + μ₂^C) + (1/2) log( (μ₂^T + μ₂^C) / (2 √(μ₂^T μ₂^C)) )   (2.17)

also represent the separation of the classes provided by the feature. It is of importance, however, that these estimates are optimal only when the distributions under test are normal. For the example given above these estimates are equal to 57 (for α = 0.5) and 5.7 respectively, which corresponds to high separability. It must be taken into account, though, that these estimates are optimistically biased, since they do not take into account the skewness of the distribution of class 1 to the right.
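Both separability measures are computed directly from sample moments; a sketch (function names are mine, and the 1/N variance normalization of (2.11) is assumed):

```python
import numpy as np

def deflection(t, c, alpha=0.5):
    """Generalized deflection (eq. 2.16) between target and clutter samples."""
    m_t, v_t = np.mean(t), np.var(t)
    m_c, v_c = np.mean(c), np.var(c)
    return (m_t - m_c) ** 2 / (alpha * v_t + (1 - alpha) * v_c)

def bhattacharyya_1d(t, c):
    """Bhattacharyya distance (eq. 2.17) from the first two moments."""
    m_t, v_t = np.mean(t), np.var(t)
    m_c, v_c = np.mean(c), np.var(c)
    return (0.25 * (m_t - m_c) ** 2 / (v_t + v_c)
            + 0.5 * np.log((v_t + v_c) / (2 * np.sqrt(v_t * v_c))))
```

Both quantities vanish for identical class distributions and grow with increasing separation of the class means relative to the class spreads.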

The deflection criterion (2.16) is used further in the thesis for the optimization of mask parameters in projection algorithms (Section 4.1), while the Bhattacharyya distance is used for the optimization of the parameters of a clutter suppression algorithm (Section 3.1). These numbers should in general be used to compare different features and their discriminative power to each other. The performance of a classifier is better described by a functional called the Receiver Operator Characteristics (ROC) curve. There are two distinctive types of ROC curves:

• The parametric curve (threshold is the parameter) for which the probability of false alarm and probability of detection are the X- and Y- coordinates.

• The parametric curve (threshold is the parameter) for which the amount of false alarms is the X- coordinate and amount of correct detections is the Y- coordinate.


The latter approach is used more widely in the characterization of demining systems: most papers describing mine-detection devices include ROC curves of this type [6-14, 17-28 etc.].

The construction of the probability-probability (PP) ROC curve is only possible if one knows the distributions of the clutter and the mines. In other words, to build such a curve it is necessary to characterize the probability density functions of the distributions of false alarms and mines. Then, moving a threshold, one obtains the theoretical ROC curve by integrating the probability density functions from the threshold value to infinity:

x(θ) = ∫_θ^∞ f_clutter(ξ) dξ,   y(θ) = ∫_θ^∞ f_target(ξ) dξ   (2.18)

where θ is the threshold value. Such a ROC curve can be built using densities estimated with the kernel estimation approach or straight from the histogram. The advantage of this definition is that it allows prediction of the classifier performance on other sites, provided that the classes to separate are drawn from the same distributions as the corresponding classes at the training site. The disadvantage is the necessity to estimate the distributions for the classes.

The second definition of the curve does not require distribution modeling, and the performance of the classifier is judged on the site by counting the amounts of correctly and incorrectly classified objects:

x(θ) = Amount{O_clutter | f(O) > θ},   y(θ) = Amount{O_target | f(O) > θ}   (2.19)

where O_class represents objects from the classes of clutter and target, and f(O) is the corresponding value of the decisive feature. Often the amount of correct detections is given as a percentage of the total number of targets, while the amount of false alarms is normalized per unit of scanned area to yield a frequency of occurrence. Even though the normalized entities on the axes are sometimes called 'probabilities' in the literature [6-14], in fact these normalizations do not change the nature of the approach. It is of importance that the curve defined by (2.19) does not allow a comparison of the performances of different demining systems unless the tests are made on the same ground. Moreover, even prediction of the performance of the system on a test site is difficult on the basis of this curve built for a training site. This is due to the fact that in this curve the information on the mutual distributions of the decisive feature for mines and clutter is mixed with the information on the amount of false-alarm generating objects present in the given lane.
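The counting curve (2.19) can be sketched as follows; thresholds are swept over the observed feature values (a minimal illustration, not the thesis code):

```python
import numpy as np

def counting_roc(f_target, f_clutter):
    """ROC per eq. (2.19): for every threshold, count the target and
    clutter objects whose decisive feature exceeds it."""
    thresholds = np.sort(np.concatenate([f_target, f_clutter]))[::-1]
    x = [(np.asarray(f_clutter) > th).sum() for th in thresholds]  # false alarms
    y = [(np.asarray(f_target) > th).sum() for th in thresholds]   # detections
    return np.array(x), np.array(y)
```

Dividing y by the total number of targets and x by the scanned area yields the detection-percentage versus FA-rate form used in most demining papers.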



Figure 2.3 Probability-probability (a) and detection percentage-FA rate (b) ROC curves

The practical curve may be predicted from the theoretical one using appropriate scaling and the mean density of false alarms. In this thesis the curves built according to (2.18) are used to characterize algorithm performances on training sites. The curves built according to (2.19) are used mostly in the training phase for building detectors.

The following scalar parameters are used in the literature to characterize how good a theoretical ROC curve is: the area above the ROC curve and the probability of FA corresponding to the 0.996 probability of detection. The latter parameter is introduced straightforwardly for practical curves, while the former requires rescaling of the FA rate axis [70].

2.2 Feature Fusion Techniques

In this subchapter I introduce a feature fusion technique in application to landmine detection using GPR. The subchapter covers the reconciliation of the individual detection maps and the determination of the fusion rules to be applied to the reconciled map.

The feature fusion procedure is slightly different for the phases of training and testing. In the training phase the process consists of 3 main steps:

1. The detection lists following from the different measurement scenarios and DPS are reconciled. This means that each detected location in the lists is associated with the locations in the other lists. The procedure is equivalent to a decision-level fusion and is detailed in subchapter 2.2.1. It results in two sets of vectors representing the classes of target and clutter by features.

2. The random vectors resulting from step 1 are described parametrically or non-parametrically. Possible ways of describing the feature vectors are briefed in 2.2.2.


based on the first two moments of the multivariate distributions. These steps are described in 2.2.3.

In the testing phase, steps 2 and 3 are omitted. Step 1 results in only one set of vectors. The separation of the classes of target and clutter is made by applying the fusion rule defined in step 3 of the training phase and comparing the result against a threshold.

2.2.1 Maps Reconciliation and Hard Fusion

A set of local coordinates must be imposed on the (training or testing) site, and linear mappings established between it and each of the confidence maps. Once these preconditions are met, the confidence maps resulting from several measurements and/or several DPS can be reconciled. The process of reconciliation, although technical by nature, may present a significant challenge, as noted in [47, 50, 51, 55]. To resolve this challenge I apply the following strategy in my study.

The initial detection by the LMD (2.5), applied to each of the confidence maps at hand, produces several detection lists. These must be reconciled, resulting in a list of reconciled feature vectors, which is initially empty. The list is filled according to the reconciliation algorithm depicted in Figure 2.4.

As a starting point in the reconciliation I define a hello radius R_H and a fusion Rule. The hello radius is a scalar defining the maximum distance between the coordinates of two detections found in two different confidence maps that still allows associating them with the same physical object. Introduction of the hello radius is necessary due to a) the unavoidable imperfectness of the local ↔ computer coordinate mappings and b) the fact that different measurement scenarios (MS) and DPS produce object images whose maximum intensity varies within a range of several pixels. Throughout this thesis I allow a 3 cm hello radius, which is about the radius of the smallest target. The fusion Rule is a number: 1 ≤ Rule ≤ P_D. It defines the minimal number of detection lists in which any particular detection must be present in order to be retained in the reconciled list. Introduction of the Rule makes the reconciliation process equivalent to decision-level fusion. Obviously, setting the Rule equal to P_D makes the reconciliation equivalent to hard fusion.


Figure 2.4 Flow diagram of the reconciliation algorithm


The core of the reconciliation process is wrapped in 4 nested loops, as seen in Figure 2.4. One starts with any one of the detection lists (arbitrary, L_J, J ∈ {1:P_D}) and for each detection λ_I^J, I = 1:N_J, in this list forms an address vector A_I^J ∈ R^{P_D}. The vector is zeroed except for the Jth position, where the index I appears. One then checks whether or not a detection appears inside the hello radius in all other lists, j = 1:P_D, j ≠ J. For each hit the address vector is updated by inserting at the jth position the index of the detection in the corresponding list. Once all the lists are checked, the quantity S of non-zero components of the address vector is compared against the Rule. If the Rule is surpassed, the detection must be retained and the contributing detections discarded from the lists using their indexes. The retained detection is added to the list {F_n}_1^N of the reconciled detections.
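The loop structure described above can be sketched as follows. This is a simplified illustration, not the thesis code: detections are (x, y, value) tuples, the first unused detection inside the hello radius is associated (rather than the nearest), and elements missing from a map are left as None instead of being filled by the virtual-detection procedure:

```python
import math

def reconcile(detection_lists, hello_radius, rule):
    """Simplified reconciliation of P detection lists: detections closer
    than hello_radius across maps are associated into one address vector;
    an association is retained if it appears in at least `rule` lists."""
    p = len(detection_lists)
    used = [set() for _ in range(p)]           # detections already consumed
    reconciled = []
    for j_main, main_list in enumerate(detection_lists):
        for i_main, (x, y, v) in enumerate(main_list):
            if i_main in used[j_main]:
                continue
            address = [None] * p               # address vector A
            address[j_main] = i_main
            for j in range(p):                 # scan all other lists
                if j == j_main:
                    continue
                for i, (xj, yj, vj) in enumerate(detection_lists[j]):
                    if i in used[j]:
                        continue
                    if math.hypot(x - xj, y - yj) <= hello_radius:
                        address[j] = i
                        break
            hits = sum(a is not None for a in address)
            if hits >= rule:                   # the fusion Rule
                for j, a in enumerate(address):
                    if a is not None:
                        used[j].add(a)
                feature = [detection_lists[j][a][2] if a is not None else None
                           for j, a in enumerate(address)]
                reconciled.append(((x, y), feature))
    return reconciled
```

With `rule` equal to the number of lists this degenerates to hard fusion; with `rule = 1` every detection from every map survives.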

Definition of those elements of the resulting feature vector F_n ∈ R^P that correspond to the confidence maps where a detection was found is straightforward. For the elements of the vector corresponding to maps in which the detection was not found, virtual detection coordinates must be constructed:

(ξ, γ) = mean_{p=1:P} {(x_p, y_p)}   (2.20)

where (x_p, y_p) are the detection coordinates in those maps where the detections are present. Once (2.20) is established, the corresponding element of the feature vector is taken to be

Ξ_j = M_j(ξ, γ)   (2.21)

The procedure (2.20)-(2.21) finalizes the forming of the reconciled detection. The algorithm continues the loop over the detections λ_i^J found in the Jth list until it is exhausted, and then over the detection lists themselves. After all P lists are checked, the retained vectors are supplemented with the additional features computed in the confidence maps, as mentioned in 2.1.2 and detailed in 4.2.

The described algorithm of map reconciliation results in a list of detections represented by feature vectors. In this list all detections are represented by their optimally measured features; that is, all the features take the locally maximum values. Let P be the total number of the features used. Then each element of the list of reconciled detections can be treated as an independent observation X ∈ R^P. In the training phase a certain amount of information must be acquired about the distribution of X knowing its class affiliation. In the testing phase the affiliation of X must be inferred on the basis of the knowledge gained in the training phase.

2.2.2 Parameters of feature vector distributions


be treated as a random variable and thus can be described in terms of its probability distribution function

Π(f₁, …, f_P) = Pr{x₁ ≤ f₁, …, x_P ≤ f_P}   (2.22)

or its probability density function

π(f₁, …, f_P) = ∂^P Π(X) / (∂f₁ … ∂f_P)   (2.23)

Just like in the 1-D case treated in 2.1.4, the functions (2.22) or (2.23) must be estimated from the data measured in the training phase. The problem of a mathematically sound description of these functions could be more difficult than mine detection per se. As discussed in 1.2, no theory is available for adequate modeling of the behavior of at least some of the electromagnetic-bound features peculiar to targets in an arbitrary medium. This generally prohibits a parametric description of the multivariate feature distributions in the form of (2.22) or (2.23) for the case under consideration. In principle, non-parametric approaches like the kernel estimation techniques of (2.6)-(2.7) allow their expansion to P dimensions. However, serious and often prohibitive difficulties arise as P grows. Firstly, the training set must grow significantly for the procedures to be effective. Secondly, technical problems in the computation of high-dimensional integrals may arise. And lastly, even if the procedure of representing the training site was successful, the effect of overtraining may arise, due to which the performance on a test site becomes unpredictable.

On the other hand the moment parameters of the multivariate distributions can be easily estimated from the measured data without assuming any particular model. The following are the unbiased and consistent estimates [45] for centroids and co-variances:

M = \frac{1}{N}\sum_{k=1}^{N} x_k    (2.24)

S = \frac{1}{N-1}\sum_{k=1}^{N} F_k' F_k    (2.25)

where F_k = x_k - M \in R^P is a row (string) vector, so that the product in (2.25) is a P \times P matrix. The parameters given by (2.24) and (2.25) are not enough to build the Bayes classifier in the general case, but they constitute sufficient data to build linear-quadratic classification (fusion) rules.
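As a minimal sketch (with synthetic data standing in for measured feature vectors), the estimates (2.24) and (2.25) map directly onto a few lines of NumPy:

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.normal(size=(100, 3))      # N = 100 samples of a P = 3 feature vector

M = x.mean(axis=0)                 # centroid, eq. (2.24)
F = x - M                          # centred row vectors F_k
S = F.T @ F / (len(x) - 1)         # unbiased covariance estimate, eq. (2.25)

# NumPy's built-in estimator agrees with the explicit sum
assert np.allclose(S, np.cov(x, rowvar=False))
```

Note that the 1/(N−1) divisor is what makes the covariance estimate unbiased; `np.cov` uses the same divisor by default.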

The quality of separation that might be achieved with these rules can be predicted with the use of the multidimensional analogs of (2.16) with α = 0.5 and of (2.17), which become the Mahalanobis and Bhattacharyya distances given by:

\Delta = (M_T - M_C)' S^{-1} (M_T - M_C)    (2.26)

B = \frac{1}{8}(M_T - M_C)'\left(\frac{S_T + S_C}{2}\right)^{-1}(M_T - M_C) + \frac{1}{2}\log\frac{\left|\tfrac{1}{2}(S_T + S_C)\right|}{\sqrt{|S_T|\,|S_C|}}    (2.27)

respectively.
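A minimal numerical sketch of the two distances; the two-dimensional test vectors and identity covariances are illustrative assumptions:

```python
import numpy as np

def mahalanobis(m_t, m_c, s):
    """Mahalanobis distance between two class centroids, given a covariance s."""
    d = m_t - m_c
    return float(d @ np.linalg.inv(s) @ d)

def bhattacharyya(m_t, s_t, m_c, s_c):
    """Bhattacharyya distance between two Gaussian classes."""
    s = (s_t + s_c) / 2.0
    d = m_t - m_c
    term1 = d @ np.linalg.inv(s) @ d / 8.0
    term2 = 0.5 * np.log(np.linalg.det(s)
                         / np.sqrt(np.linalg.det(s_t) * np.linalg.det(s_c)))
    return float(term1 + term2)

m_t, m_c = np.array([1.0, 0.0]), np.array([0.0, 0.0])
s_t = s_c = np.eye(2)
print(mahalanobis(m_t, m_c, np.eye(2)))     # 1.0
print(bhattacharyya(m_t, s_t, m_c, s_c))    # 0.125
```

For equal class covariances the log-term vanishes and the Bhattacharyya distance reduces to one eighth of the Mahalanobis distance, which the printed values confirm.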

2.2.3 Linear and quadratic fusion rules

The solution of the problem of defining the optimal classification of the random vector X on the basis of its probability density follows from Bayes' theorem. This solution relates X to the class of either target or clutter on the basis of the likelihood ratio test [45]:

l(\vec{f}) = \frac{p_1(\vec{f})}{p_2(\vec{f})} \underset{H_0}{\overset{H_1}{\gtrless}} \frac{P_2}{P_1}    (2.28)

where l is the likelihood ratio, \vec{f} = f(X) is the feature vector, p_i are the a-posteriori probability densities, and P_i are the a-priori probabilities of X belonging to the i-th class. The a-priori probabilities are normally not known in humanitarian demining and will be assumed equal to ½ unless specified otherwise. The discriminant function associated with the likelihood test (2.28) is then defined as

h(\vec{f}) = -\log l(\vec{f}) = -\log p_1(\vec{f}) + \log p_2(\vec{f}) \underset{H_1}{\overset{H_0}{\gtrless}} 0    (2.29)

These equations define the so-called Bayesian test for a minimum error, which also gives the optimal solution for our problem. But, as stated above, we rarely possess adequate knowledge about the distributions p_i. Since our knowledge about

these distributions is confined to their first two moments estimated from the samples, the fusion rule must be based solely on them. Such a rule exists and is given by the following expression [45]:

(\vec{f} - M_T)' S_T^{-1} (\vec{f} - M_T) - (\vec{f} - M_C)' S_C^{-1} (\vec{f} - M_C) + \log\frac{|S_T|}{|S_C|} \underset{H_1}{\overset{H_0}{\gtrless}} \theta    (2.30)

where θ is a threshold. Moreover, this rule represents the Bayes solution itself in the case where both classes are normally distributed. If, further, the classes share the covariance, S_1 = S_2 = S, the quadratic rule (2.30) reduces to a linear rule.
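As a sketch, the quadratic fusion rule (2.30) can be implemented directly from the class centroids and covariances; the threshold θ = 0 and the synthetic class parameters below are illustrative assumptions:

```python
import numpy as np

def quadratic_rule(f, m_t, s_t, m_c, s_c, theta=0.0):
    """Quadratic fusion rule of (2.30): decide 'target' when the statistic
    falls below the threshold theta."""
    dt, dc = f - m_t, f - m_c
    stat = (dt @ np.linalg.inv(s_t) @ dt
            - dc @ np.linalg.inv(s_c) @ dc
            + np.log(np.linalg.det(s_t) / np.linalg.det(s_c)))
    return "target" if stat < theta else "clutter"

# illustrative class centroids and covariances
m_t, m_c = np.array([1.0, 1.0]), np.array([-1.0, -1.0])
s_t, s_c = np.eye(2), 2.0 * np.eye(2)

print(quadratic_rule(np.array([0.8, 1.2]), m_t, s_t, m_c, s_c))    # target
print(quadratic_rule(np.array([-1.5, -0.5]), m_t, s_t, m_c, s_c))  # clutter
```

In practice m_t, s_t, m_c, s_c would come from the estimates (2.24)-(2.25) over the training set, and θ would be tuned to trade detection rate against false alarms.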
