Analysis of Point Processes Observed with Noise with Applicational Example


ACTA UNIVERSITATIS LODZIENSIS

FOLIA OECONOMICA 194, 2005

Jerzy Korzeniewski*


Abstract

An example of the application of point processes observed with noise is provided by aerial photographs of forests taken with the aim of estimating the actual number of trees in a given area. Lund and Rudemo (2000) proposed a model useful in this context, based on the number of "tree candidates" visible in the photograph. The parameters of the conditional likelihood function were estimated taking into account such kinds of noise as point thinning, point displacement and the appearance of extra ghost points. The approach proposed does not solve the problem of estimating the actual number of trees.

In this paper a new algorithm that directly estimates the number of actual trees is proposed. The only assumption on which the new measure depends is the natural assumption that forest density is locally constant. The results achieved with the help of the new measure can be regarded as interesting.

Key words: point process, maximum likelihood method, noise, incomplete observation, image data, computer algorithm.

I. INTRODUCTION

Figure 1 depicts a map of part of a forest with 206 small circles and 171 dots. The circles were found on the basis of an aerial photograph of this part of the forest with the help of a template constructed by Larsen and Rudemo (1998), and they represent candidates for trees (Norway spruce). Basically, the idea of the template construction is to select those pixels of a black-and-white photograph whose elliptical neighbourhood gives a suitably high correlation between the grey levels of the neighbourhood pixels and the grey levels of the ideal template. The circles represent pixels for which the correlation was high enough. The dots represent true trees found in the same region of the forest by manual inspection. One pixel on the photograph corresponds to a ground area of 0.15 m × 0.15 m.

* Ph.D., Chair of Statistical Methods, University of Łódź.

Figure 1. True tree tops (dots) and candidates for trees (circles)
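The template-matching step can be illustrated with a minimal sketch. It assumes that a template (a grey-level array with odd side lengths) is already given and simply computes the Pearson correlation between the template and the elliptical neighbourhood of every interior pixel; the function names and the threshold are illustrative assumptions, and Larsen and Rudemo's (1998) optimisation of the template itself is not reproduced here.

```python
import numpy as np

def elliptical_mask(a, b):
    """Boolean mask of an axis-aligned ellipse with semi-axes a (rows) and b (columns)."""
    yy, xx = np.mgrid[-a:a + 1, -b:b + 1]
    return (yy / a) ** 2 + (xx / b) ** 2 <= 1.0

def template_candidates(image, template, threshold):
    """Return (row, col) pixels whose elliptical neighbourhood has a Pearson
    correlation with the template of at least `threshold` (hypothetical helper)."""
    th, tw = template.shape
    if th % 2 == 0 or tw % 2 == 0:
        raise ValueError("template side lengths must be odd")
    a, b = th // 2, tw // 2
    mask = elliptical_mask(a, b)
    t = template[mask].astype(float)
    t = (t - t.mean()) / t.std()               # standardised template grey levels
    hits = []
    for r in range(a, image.shape[0] - a):
        for c in range(b, image.shape[1] - b):
            w = image[r - a:r + a + 1, c - b:c + b + 1][mask].astype(float)
            sd = w.std()
            if sd == 0:
                continue                       # flat patch: correlation undefined
            corr = np.mean((w - w.mean()) / sd * t)
            if corr >= threshold:
                hits.append((r, c))
    return hits
```

A Gaussian blob pasted into an otherwise empty image is recovered at its centre pixel, where the correlation equals one.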

The statistician's task is to investigate all the phenomena that shape the picture we obtain in relation to the true number of trees growing in the photographed area and, if possible, to estimate the number of trees from the number of candidates. Lund and Rudemo (2000) propose an approach whose idea is to treat tree candidates as realisations of a point process contaminated by noise of different kinds. The authors derived the conditional likelihood function and analysed its behaviour to estimate some parameters of interest. This approach is briefly outlined in the next section. However, although mathematically elegant, it does not answer the question that is most interesting to foresters and ecologists, i.e. what is the approximate number of true trees? In the third section we propose a new measure which tries to tackle this problem directly.

II. POINT PROCESSES WITH NOISE

Lund and Rudemo proposed to consider two point processes, one of which, Y, is an imperfect observation of the other process, X. We assume that X and Y are point processes on a subset A of the d-dimensional Euclidean space $\mathbb{R}^d$ with a finite number of points, $X = \{X_i : i \in M\}$, $M = \{1, \dots, m\}$, $Y = \{Y_j : j \in N\}$, $N = \{1, \dots, n\}$. Assume further that A is bounded with a positive d-dimensional volume $|A|_d$. Suppose that Y is generated from the X process by the following disturbance mechanisms.

1. Thinning. Each point $X_i$, for $i \in M$, is thinned with probability $1 - p(X_i)$ and retained with probability $p(X_i)$. If an X-point is thinned, there will be no corresponding Y-point. Thinnings are assumed to be independent for different points.

2. Displacement. For each remaining point $X_i$ a corresponding Y-point is generated by displacement to a position with probability density $k(\cdot \mid X_i)$ with respect to Lebesgue measure on $\mathbb{R}^d$. Given X, the displacements of different points are independent of each other and of the thinnings.

3. Censoring. The displaced points are observed if they are within the observation region A; otherwise they are censored and not observed. Thus censoring of an unthinned point generated by $X_i$ occurs with probability $\int_{A^c} k(y \mid X_i)\,dy$. Here $A^c$ denotes the complement of the set A.

4. Superposition of ghost points. In addition to the points generated as described above, extra "ghost" points are superposed. These points are assumed to arise from a Poisson process on A with intensity $g(\cdot \mid X)$, where X, as above, denotes the entire X-process.
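The four disturbance mechanisms can be simulated directly. The sketch below assumes the special case used later in this section: a constant retention probability p, a bivariate normal displacement density k with mean `mu` and covariance `cov`, and a homogeneous ghost intensity λ (`lam`), on a square window A = [0, side]²; the function name and its parameters are hypothetical.

```python
import numpy as np

def simulate_noisy_observation(X, p, mu, cov, lam, side, rng):
    """Generate Y from X on A = [0, side]^2. Also returns, for each Y point,
    the index of the X point that produced it (-1 for ghost points)."""
    X = np.asarray(X, dtype=float)
    # 1. Thinning: keep each X point independently with probability p.
    kept = np.flatnonzero(rng.random(len(X)) < p)
    # 2. Displacement: add an independent bivariate normal vector to each kept point.
    moved = X[kept] + rng.multivariate_normal(mu, cov, size=len(kept))
    # 3. Censoring: displaced points outside the window A are not observed.
    inside = np.all((moved >= 0) & (moved <= side), axis=1)
    moved, origin = moved[inside], kept[inside]
    # 4. Ghosts: homogeneous Poisson process with intensity lam on A.
    n_ghost = rng.poisson(lam * side ** 2)
    ghosts = rng.uniform(0.0, side, size=(n_ghost, 2))
    Y = np.vstack([moved, ghosts])
    source = np.concatenate([origin, np.full(n_ghost, -1)])
    return Y, source
```

Keeping `source` makes it easy to check the true correspondence, which in the real data is of course unknown.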

The initial and basic result is the following theorem, which gives the formula for the conditional likelihood of a point process Y given another process X.

Theorem 1. Let X and Y be two finite point processes as specified above, on a bounded set A. Suppose that $g(y \mid X)$ and $k(y \mid X_i)$, for $i \in M$, are continuous functions of $y \in A$. Then the conditional likelihood of Y given X is

$$L(Y \mid X) = \exp\Big\{ |A|_d - \int_A g(y \mid X)\,dy \Big\} \sum_{M_1 \subseteq M} \; \sum_{N_1 \subseteq N} \; \sum_{\pi \in P(M_1, N_1)} L_1 L_2 L_3, \qquad (1)$$

where

$$L_1 = \prod_{i \in M_1} p(X_i)\, k(Y_{\pi(i)} \mid X_i), \qquad L_2 = \prod_{i \in M \setminus M_1} \Big\{ p(X_i) \int_{A^c} k(y \mid X_i)\,dy + 1 - p(X_i) \Big\}, \qquad L_3 = \prod_{j \in N \setminus N_1} g(Y_j \mid X),$$

and the reference measure corresponds to the Poisson process on A with intensity 1. Looking at this formula we can see that all possible noise "combinations" are taken into account, because the symbol $P(M_1, N_1)$ denotes the set of all one-to-one mappings $\pi$ from $M_1$ onto $N_1$ (empty unless $|M_1| = |N_1|$).
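For a handful of points, formula (1) can be evaluated by brute force, which makes the role of the sums over $M_1$, $N_1$ and $\pi$ concrete. In this sketch the caller supplies p, k and g as Python functions, together with the two integrals the formula needs (`g_integral` for $\int_A g$ and `censor_prob(x)` for $\int_{A^c} k(y \mid x)\,dy$); these argument names are assumptions, and because the number of terms grows factorially it is only meant as a check on tiny examples.

```python
import math
from itertools import combinations, permutations

def likelihood_theorem1(X, Y, p, k, g, g_integral, censor_prob, area):
    """Brute-force evaluation of the conditional likelihood (1) of Y given X.

    p(x): retention probability; k(y, x): displacement density;
    g(y): ghost intensity (X held fixed); area: |A|_d."""
    m, n = len(X), len(Y)
    total = 0.0
    for size in range(min(m, n) + 1):
        for M1 in combinations(range(m), size):
            # Each ordered choice of `size` distinct Y indices is one bijection
            # pi: M1 -> N1; together they cover every N1 of that size.
            for targets in permutations(range(n), size):
                L1 = math.prod(p(X[i]) * k(Y[j], X[i]) for i, j in zip(M1, targets))
                L2 = math.prod(p(X[i]) * censor_prob(X[i]) + 1 - p(X[i])
                               for i in range(m) if i not in M1)
                L3 = math.prod(g(Y[j]) for j in range(n) if j not in targets)
                total += L1 * L2 * L3
    return math.exp(area - g_integral) * total
```

For instance, with no X-points only the ghost term survives, so with $g \equiv \lambda$ the value reduces to $\exp\{|A|_d - \lambda|A|_d\}\,\lambda^{n}$.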


The formula given in Theorem 1 is too complicated to analyse in order to find its maxima; therefore, we proceed with a couple of simplifications. The first simplification is that of a homogeneous intensity of the ghost points, i.e. $g(\cdot \mid X) = \lambda$, with the retention probability also taken to be constant, $p(\cdot) = p$. Then the likelihood function (1) simplifies to

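$$L(Y \mid X) = \sum_{\substack{M_1 \subseteq M,\ N_1 \subseteq N \\ |M_1| = |N_1|}} \; \sum_{\pi \in P(M_1, N_1)} T(M_1, N_1, \pi) \qquad (2)$$

with the summed terms given by

$$T(M_1, N_1, \pi) = p^{|M_1|}\, \lambda^{|N \setminus N_1|} \exp\{(1 - \lambda)|A|_d\} \prod_{i \in M_1} k(Y_{\pi(i)} \mid X_i) \prod_{i \in M \setminus M_1} \Big\{ p \int_{A^c} k(y \mid X_i)\,dy + 1 - p \Big\}. \qquad (3)$$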

The second simplification refers to the fact that nearly all $X_i$ points are so far from the boundary of the observation region A that we can safely assume they are not censored, i.e.

$$\int_{A^c} k(y \mid X_i)\,dy = 0. \qquad (4)$$

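In the first approximation of (3) we assume that (4) holds for all X-points, and thus replace (3) by

$$T(M_1, N_1, \pi) = p^{|M_1|} (1 - p)^{|M \setminus M_1|}\, \lambda^{|N \setminus N_1|} \exp\{(1 - \lambda)|A|_d\} \prod_{i \in M_1} k(Y_{\pi(i)} \mid X_i). \qquad (5)$$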

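For $s = (M_1, N_1, \pi)$ note that (5) may be considered as a function of the parameter vector

$$\theta = (p, \lambda, \mu_1, \mu_2, \sigma_1, \sigma_2, \rho), \qquad (6)$$

where the displacement density $k(y \mid X_i)$ is taken to be a two-dimensional normal density of $y - X_i$ with means $\mu_1, \mu_2$, standard deviations $\sigma_1, \sigma_2$ and correlation $\rho$. Let us denote by

$$\hat\theta(s) = (\hat p, \hat\lambda, \hat\mu_1, \hat\mu_2, \hat\sigma_1, \hat\sigma_2, \hat\rho)$$

the vector which maximises it, where

$$\hat p = |M_1| / |M|, \qquad \hat\lambda = |N \setminus N_1| / |A|_d,$$

and $(\hat\mu_1, \hat\mu_2, \hat\sigma_1, \hat\sigma_2, \hat\rho)$ are the standard maximum likelihood estimates of the parameters of a two-dimensional normal distribution based on the sample $(Y_{\pi(i)} - X_i,\ i \in M_1)$.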
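For a fixed state, the maximiser $\hat\theta(s)$ has the closed form given above. A minimal sketch, assuming X and Y are NumPy arrays of shape (m, 2) and (n, 2) and that the state is stored as a hypothetical dictionary `matching` mapping each $i \in M_1$ to $\pi(i)$ (at least two matched pairs are needed for the normal estimates):

```python
import numpy as np

def theta_hat(X, Y, matching, area):
    """Maximiser of (5) for a fixed state s = (M1, N1, pi).

    Returns p_hat, lam_hat and the bivariate normal ML estimates of the
    displacements Y[pi(i)] - X[i]: means, standard deviations, correlation."""
    M1 = sorted(matching)
    D = np.array([Y[matching[i]] - X[i] for i in M1], dtype=float)
    m, n = len(X), len(Y)
    p_hat = len(M1) / m                  # |M1| / |M|
    lam_hat = (n - len(M1)) / area       # |N \ N1| / |A|_d, since |N1| = |M1|
    mu = D.mean(axis=0)
    sd = D.std(axis=0)                   # ddof=0: maximum likelihood, not unbiased
    rho = np.mean((D[:, 0] - mu[0]) * (D[:, 1] - mu[1])) / (sd[0] * sd[1])
    return p_hat, lam_hat, mu, sd, rho
```

Note that the estimates of p and λ depend on the state only through $|M_1|$, while the normal estimates depend on which pairs are matched.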


The search for the maximum is still a problematic task, and we cope with it by considering the function value at all possible "neighbours" of the current state. For a state $s = (M_1, N_1, \pi)$ we define $s' = (M_1', N_1', \pi')$ to be its neighbour if it can be obtained from s in one of the following five ways.

1. Addition of a pair of X- and Y-points: $M_1' = M_1 \cup \{i'\}$, where $i' \in M \setminus M_1$; $N_1' = N_1 \cup \{j'\}$, where $j' \in N \setminus N_1$; $\pi'(i) = \pi(i)$ for $i \in M_1$ and $\pi'(i') = j'$. The number of such neighbours is $|M \setminus M_1|\,|N \setminus N_1|$.

2. Removal of a pair of X- and Y-points: $M_1' = M_1 \setminus \{i'\}$, where $i' \in M_1$; $N_1' = N_1 \setminus \{j'\}$, where $j' \in N_1$ and $\pi(i') = j'$; $\pi'(i) = \pi(i)$ for $i \in M_1'$. This can be done in $|M_1| = |N_1|$ ways.

3. Swapping an X-point: $M_1' = (M_1 \setminus \{i'\}) \cup \{i''\}$, where $i' \in M_1$ and $i'' \in M \setminus M_1$; $N_1' = N_1$; $\pi'(i) = \pi(i)$ for $i \in M_1 \setminus \{i'\}$ and $\pi'(i'') = \pi(i')$. There are $|M_1|\,|M \setminus M_1|$ such neighbours.

4. Swapping a Y-point: $M_1' = M_1$, $N_1' = (N_1 \setminus \{j'\}) \cup \{j''\}$, where $j' \in N_1$ and $j'' \in N \setminus N_1$; $\pi'(i) = \pi(i)$ for $i \in M_1 \setminus \{i'\}$, where $\pi(i') = j'$, and $\pi'(i') = j''$. Swapping a Y-point can be done in $|N_1|\,|N \setminus N_1|$ ways.

5. Exchange between two pairs: $M_1' = M_1$, $N_1' = N_1$; $\pi'(i) = \pi(i)$ for $i \in M_1 \setminus \{i', i''\}$, where $i', i'' \in M_1$, $i' \neq i''$, and $\pi'(i') = \pi(i'')$, $\pi'(i'') = \pi(i')$. The number of such neighbours is $|M_1|(|M_1| - 1)/2$.

Now we can search for the maximum of function (2) by iteratively evaluating it at all states that are neighbours of some state from the previous iteration. In this way Lund and Rudemo found the maximum of the conditional likelihood function and studied the behaviour of this function. This approach, though quite attractive from the mathematical point of view, does not solve the main problem of estimating the extent of forest depletion on the basis of the available noisy version, i.e. the realisation of the Y process.
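The five moves and the iterative search can be sketched as follows. A state is stored as a frozenset of matched pairs $(i, \pi(i))$, so $M_1$ and $N_1$ are implicit. The search shown is a plain steepest-ascent variant that stops at the first state with no better neighbour, which is an assumption and need not coincide with the exact scheme of Lund and Rudemo; `score` is a placeholder for the objective, which in the setting above would be the logarithm of (5) evaluated at $\hat\theta(s)$.

```python
from itertools import combinations

def neighbours(state, m, n):
    """All neighbours of a state (frozenset of pairs (i, pi(i))) under the five moves."""
    pi = dict(state)
    used_j = set(pi.values())
    free_i = [i for i in range(m) if i not in pi]      # M \ M1
    free_j = [j for j in range(n) if j not in used_j]  # N \ N1
    out = []
    for i in free_i:                       # 1. add a pair (i', j')
        for j in free_j:
            out.append(state | {(i, j)})
    for pair in state:                     # 2. remove a pair
        out.append(state - {pair})
    for i, j in state:                     # 3. swap an X-point: i' -> i''
        for i2 in free_i:
            out.append((state - {(i, j)}) | {(i2, j)})
    for i, j in state:                     # 4. swap a Y-point: j' -> j''
        for j2 in free_j:
            out.append((state - {(i, j)}) | {(i, j2)})
    for (i1, j1), (i2, j2) in combinations(sorted(state), 2):  # 5. exchange partners
        out.append((state - {(i1, j1), (i2, j2)}) | {(i1, j2), (i2, j1)})
    return out

def local_search(score, m, n, start=frozenset()):
    """Move to the best neighbour until no neighbour improves the score."""
    state, best = start, score(start)
    while True:
        candidate = max(neighbours(state, m, n), key=score, default=None)
        if candidate is None or score(candidate) <= best:
            return state, best
        state, best = candidate, score(candidate)
```

The number of neighbours returned equals the sum of the five counts listed above, e.g. 9 for the state $\{(0, 0)\}$ with m = n = 3.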

III. DIRECT ASSESSMENT OF FOREST DEPLETION

When assessing the number of true trees intuitively, one feels that the positions of false trees are not independent of the positions of true trees (obviously they cannot be; e.g. a false and a true tree cannot occupy the same position) or, in other words, that the average distance between a false tree and its nearest tree (either false or true) is smaller than the same distance for the true trees only. This is probably caused by the fact that false spruces, for some reason, tend to be located close to true spruces without being spruces themselves. Therefore, the quadratic dependence between the forest area and the number of trees it contains (if we assume uniform forest density) should be violated for an area chosen in a suitable way.


The area on which we require a quadratic dependence between its size and the number of spruces it contains is constructed in the following way. For all pixels representing either true or false spruces we consider circles of the same radius r centred at these pixels. We let r grow and, for each value of r (e.g. successive integers), we calculate the number of trees which fall within at least one circle. The function describing the dependence of the number of trees on r should be quadratic. We cannot go too far with the radius length, because for large values of r the circles overlap one another and some trees would be counted twice. In Table 1 we calculated the numbers of trees comprised by circles with a radius of at most 20 pixels (3 m on the ground). The particular radii ending in .1 were chosen so as to make the series of numbers of trees as smooth as possible. In fact, whatever the fractional part of the successive radius values, the conclusions are exactly the same, but for the ending chosen, i.e. .1, the series are free of big "jumps". Next, we calculate the coefficients of determination for the least-squares quadratic regression of the number of trees comprised by all circles on r. The coefficients presented in Table 1 were calculated for regressions based on 10 successive values of r. For the reasons mentioned above we cannot go too far with the radius length; if we choose a different number of observations for the quadratic regression, e.g. 8, 9, 11 or 12, the results are almost identical (R² is the same up to 0.01).
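The counting and regression steps can be sketched as below. How trees "falling within a circle" are counted is not fully specified in the text; this sketch assumes that a tree lying in the circle of another tree is counted once per such circle, i.e. it counts unordered pairs of trees at distance at most r, which under a uniform density grows roughly like r². Counting instead the trees that have at least one other tree within r would be the other natural reading. Both function names are hypothetical.

```python
import numpy as np

def close_pair_counts(points, radii):
    """For each r in radii, the number of unordered pairs of trees at distance <= r
    (assumed reading: a tree is counted once per other tree's circle it lies in)."""
    P = np.asarray(points, dtype=float)
    diff = P[:, None, :] - P[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))
    d = dist[np.triu_indices(len(P), k=1)]   # each unordered pair once
    return np.array([(d <= r).sum() for r in radii])

def quadratic_r2(radii, counts):
    """Coefficient of determination of the least-squares fit counts ~ a + b*r + c*r^2."""
    r = np.asarray(radii, dtype=float)
    y = np.asarray(counts, dtype=float)
    coef = np.polyfit(r, y, 2)
    resid = y - np.polyval(coef, r)
    return 1.0 - (resid ** 2).sum() / ((y - y.mean()) ** 2).sum()
```

Applied to 10 successive radii, `quadratic_r2` gives one R² value of the kind reported in Table 1.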

Table 1. Numbers of spruces (all and true) within circles of radius r and coefficients of determination (R²) for quadratic regression

Radius r   Trees within circles       R² of quadratic regression
           all        true            all        true
4.1        5          1               0.902      0.803
5.1        9          2               0.930      0.873
6.1        10         2               0.952      0.904
7.1        16         2               0.963      0.914
8.1        24         5               0.975      0.943
9.1        34         6               0.985      0.966
10.1       50         13              0.990      0.979
11.1       69         23              0.990      0.983
12.1       92         36              0.991      0.987
13.1       109        42              0.994      0.992
14.1       140        64
15.1       169        84
16.1       197        104
17.1       234        126
18.1       261        145
19.1       297        171

Source: Author's investigation.


Looking at the numbers presented in Table 1, one can see that the coefficient of determination for true trees only reaches the ideal value (i.e. one) "later" than the coefficient for all trees. "Later" refers to larger values of r and, consequently, to a greater number of trees comprised by all circles. To some extent this phenomenon may seem natural, because the density of true trees is smaller. In our opinion, however, this discrepancy is also caused by the particular location of false trees. The value of the determination coefficient for all trees that may be appropriate as the "threshold point" should lie in the interval (0.985, 0.990), preferably close to 0.985, because beyond it the differences between successive values of the determination coefficient become very small (even less than 0.001), too small to be considered informative signs of the dependence of the determination coefficient on radius length. If we take 0.985 as the threshold point (reached at r = 9.1), we can see that for true trees we would have to throw away only 6 trees out of 171, which is a tolerable mistake (3.5%), and for all trees we would have to throw away 34 trees, which is very close to the ideal number of 35 false trees that should be rejected. This method of assessing the number of true trees is not very precise, because it leaves some room for a subjective choice of the threshold point, but it is very simple and free of any assumptions apart from uniform forest density.
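The threshold rule applied above can be reproduced from the numbers in Table 1. The sketch simply finds the first radius at which R² for all trees reaches 0.985 and reads off the counts at that radius; the threshold 0.985 is the author's choice, and the helper name `first_reaching` is hypothetical.

```python
# Values transcribed from Table 1 for the radii at which R^2 is reported.
radii = [4.1, 5.1, 6.1, 7.1, 8.1, 9.1, 10.1, 11.1, 12.1, 13.1]
all_counts = [5, 9, 10, 16, 24, 34, 50, 69, 92, 109]
true_counts = [1, 2, 2, 2, 5, 6, 13, 23, 36, 42]
r2_all = [0.902, 0.930, 0.952, 0.963, 0.975, 0.985, 0.990, 0.990, 0.991, 0.994]

def first_reaching(values, threshold):
    """Index of the first value that is >= threshold, or None if there is none."""
    return next((idx for idx, v in enumerate(values) if v >= threshold), None)

k = first_reaching(r2_all, 0.985)
share_true = 100 * true_counts[k] / 171   # 171 true trees in total
print(radii[k], all_counts[k], true_counts[k], round(share_true, 1))
# prints: 9.1 34 6 3.5
```

Choosing a threshold higher in the interval (e.g. 0.990) would move the radius to 10.1 and increase both counts, which illustrates the sensitivity to the threshold choice mentioned above.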

REFERENCES

Larsen M., Rudemo M. (1998), Optimizing templates for finding trees in aerial photographs, Pattern Recognition Letters, 19, 1153-1162.

Lund J., Rudemo M. (2000), Models for point processes observed with noise, Biometrika, 87, 235-249.

Jerzy Korzeniewski

ANALYSIS OF POINT PROCESSES WITH NOISE, WITH AN APPLICATION EXAMPLE

Summary

An example of the application of point processes observed together with noise is provided by aerial photographs of forests taken in order to estimate forest losses in a given area. Lund and Rudemo (2000) proposed a model that may be useful for this purpose, making use of the number of "tree candidates" visible in the photograph. The parameters of the conditional likelihood function were estimated taking into account such kinds of noise as the disappearance of points, the displacement of points and the appearance of false points. This approach does not solve the problem of estimating the actual number of trees.

This paper proposes a new algorithm that directly estimates the actual number of true trees. The only assumption required is that of constant forest density within a given area of the forest. The results obtained with the new algorithm can be regarded as interesting.