A description of visual structure: Relations with human information processing mechanisms

A Description of Visual Structure

Ben J. A. Kröse

THESIS

submitted for the degree of doctor

in the technical sciences

at Delft University of Technology (Technische Universiteit Delft),

by authority of the Rector Magnificus,

prof. dr. J.M. Dirken,

to be defended in public

before the Board of Deans

on Tuesday 21 October at 16.00 hours

by

Bernardus Johannes Anthonius

Kröse

born in Delft,

physics engineer (natuurkundig ingenieur).

1986

Offsetdrukkerij Kanters B.V.

Alblasserdam

This thesis has been approved by the promotors prof. dr. ir. E. Backer

Preface

To carry out research in which various disciplines meet is an interesting and exciting, but complicated, enterprise. The research described in this thesis contains aspects of image processing and pattern recognition, psychology, and physiology. Being a physics engineer, I frequently had to consult colleagues who have specialized in other disciplines. I feel privileged to have worked in the Ergonomics group at the Department of Industrial Design Engineering of Delft University of Technology. The diversity of disciplines present in this group has been of great importance to the completion of my research.

I would like to express my gratitude to Prof. R. den Buurman and my direct colleagues Theo Boersema, Jack Gerrissen and Pynan Hoekstra for their valuable comments. I must also mention Prof. E. Backer and Piet Verbeek, who have helped me with their professional advice concerning image processing techniques. Finally, I am most grateful for the support I have received from all those not mentioned here.


General introduction

1.1 Survey

This thesis deals with the intriguing process of visual perception. All organisms, whether bacteria, plants or animals, must adapt to their environment if they are to survive. An essential prerequisite for survival is the ability to detect structures and events in the surroundings. We call this ability "visual perception" if the animal is sensitive to energy in the form of "light" as a medium of information about the environment. The question of how information about the external world is taken in by means of the visual system has been an important issue of study for many centuries. The current state of knowledge of visual perception can be considered to be built up from research within two traditions.

First, there is the physiological tradition, which studies the nervous system directly and tries to find an answer to the question of how the pattern of light falling on receptor cells is transformed by networks of nerve cells into patterns of electrical activity. This approach has provided us with important knowledge concerning the retinal, retino-cortical and cortical processes. Descriptions of the retinal structure, in particular of the distribution of rods and cones, are given by Pirenne (1967). The organization of the different retinal cells (receptor, bipolar, horizontal, amacrine and ganglion cells) into neural networks is described in various textbooks on visual perception such as Cornsweet (1970) and Bruce and Green (1985). From the retina, neural signals are transmitted to different parts of the brain. The "primary" visual pathway transmits retinal signals via the optic chiasma and the lateral geniculate nucleus to the visual cortex. The uniformity of the visual cortex together with the inhomogeneity of the retina results in a non-linear mapping of the retina onto the cortical cells (Hubel and Wiesel, 1974; Schwartz, 1977). In view of the enormous amount of literature in this field, a discussion "in extenso" of all physiological findings is beyond the scope of this introduction. Furthermore, it must be realized that an understanding of the "hardware" of early visual processes does not necessarily result in an understanding of "perception".


The second approach to visual perception moves away from the physiological level and studies the perceptual experience in relation to properties of the visual stimulus. The focus of much of this research has been on human perception and has therefore mainly been carried out by psychologists and psychophysicists. In their experiments, psychophysicists generally use simple visual stimuli with well defined physical properties in order to study elementary perceptual phenomena such as detection or discrimination. The early psychophysicists Fechner and Helmholtz determined detection thresholds for intensity and contrast, while more recently experiments have been carried out to study other properties of the visual system such as the critical flicker frequency (De Lange, 1958) or the modulation transfer function (Campbell and Robson, 1968). However, when more complex stimuli are used, psychophysicists have always faced the problem of describing the stimulus: which features cause the corresponding perceptual experiences? Theories on the perception of form may be based on different approaches to the description of form (for example: Leeuwenberg, 1971; Uttal, 1975). The objective of much contemporary psychophysical research is to develop a (computational) model for the description of the visual stimulus and to relate parameters of the model to perceptual phenomena.

Practical applications of visual perception date back many centuries: the ancient Greek architects were familiar with some of the geometrical illusions and made use of these in projects such as the Parthenon. Nowadays, knowledge about perception is used in many "engineering" disciplines, but in particular in the field of ergonomics: knowledge about perception is a prerequisite for the optimal design of tools for communication between man and his surroundings.

Recent technological developments have caused a shift in the presentation of information from tactile and auditory to merely visual. The driver of an old Citroen 2CV will, when exceeding the 100 km/h limit, be reminded of his speed by a variety of processes, not only visual but in this case especially tactile and auditory. If this driver borrows the car of his father-in-law, a Citroen CX Pallas, only the display of a small (say 1/2° visual angle) numeral in a window indicates his speed, which may lead to more or less dramatic "surprises". Also the operator in a large control room and the modern aircraft pilot are confronted with an abundance of visually presented information. The ergonomist's crucial problem concerns the design of displays in such a way that the presented information is correctly detected, discriminated and understood.

An increasing interest in the study of human vision is apparent in the relatively young disciplines of pattern recognition, image processing and robot vision. Granrath (1981) discusses the role of human visual models in digital image processing and points out that during the last decades a symbiotic relationship between psychophysicists and image processing engineers has developed: as image processing capabilities have grown, so has the complexity of models for the description of human visual behaviour. An illustration of this relationship is the vast number of image processing algorithms which may be used by the psychophysicist to describe his stimulus material.

1.2 On this thesis

The objective of the research which is described in this thesis is to answer the questions posed by the psychophysicist or by the ergonomist: (1) What is the nature of the basic characteristics of the visual data which yield perceptual experiences and (2) How can these characteristics be manipulated to modulate, in a controlled manner, the transfer of visual information?

In order to answer the first question, a model for the description of the visual data has to be developed. It is obvious that such a model must comply with what is known about human information processing.

Visual information will be processed at different locations along the human visual pathway. Although a strict separation into a series of subprocesses cannot be made a priori, examination of contemporary experimental literature suggests that multiple stage information processing is operational in human vision.

Our main point of interest is the information processing at the input of the visual system. Current models on early information processing propose an automatic, nonselective, parallel mechanism for the extraction of "features" (Treisman and Gelade, 1980) or "textons" (Julesz, 1981). The perceptual process at the input of the visual system can thus be considered as the organization of the visual data into a representational model in which the basic features of the data can be described. In Chapter 2 of this thesis a model will be introduced which is based on redundancies within the visual data. The notion of structure is equated with non-randomness in the visual data, in particular non-randomness in spatial relations between pattern points. Knowledge concerning the "hardware" of the early visual processor, i.e. the non-uniform distribution of the human image sampling units, has been incorporated in the model.

The specific representation of image structure enables us to distinguish between structures at different levels of detail. In Chapter 2 it will be shown that the structure description is particularly useful for the description of visual stimuli used in research on the order of structure level in visual processing (Navon, 1977; Kinchla et al., 1983). In the same chapter an example of an application in the field of image processing will be given: through manipulations in the structure representation space we are able to indicate which points are important for image structure at a certain level of detail.

A global description of structure is not preferable in all situations. Many recent theories on texture or pattern discrimination are based on the description of local features. In the last section of Chapter 2 a measure of structure dissimilarity will be introduced, based on local structure descriptions.

Research on the human visual information intake parameters will be discussed in the subsequent chapters. Results of experiments with briefly presented stimuli, described in Chapter 3, show that it seems justified to relate the detectability of a target pattern in a field of background patterns to the value of the structure dissimilarity measure for target and background pattern. Recent theories on early information processing suggest that, as a result of the limited capacity of the visual system, an "orienting" or "focussing" of (spatial) attention precedes detection of the target (Posner, 1980; Julesz, 1981). This orienting of attention can be a conscious, concept driven process or an unconscious, stimulus driven process. The structure dissimilarity measure can be considered as a stimulus factor influencing the attentional process which performs information selection. In the case of briefly presented stimuli this attentional process will be carried out by a covert mechanism (focal attention), while an overt process which involves eye movements can be used if the stimuli are presented longer. Parameters of eye movements will be discussed in Chapter 4 in relation to the structure dissimilarity measure and in relation to earlier results on target detectability by means of focal attention. It will be conjectured that a study of the parameters of the covert attentional mechanism is an essential prerequisite for correct interpretation of eye movements.

Finally, an application of theories on pattern discrimination is shown by means of an experiment in which the discriminability of pictographic symbols is studied.

2 Description of the structure model

2.1 Introduction

As stated in the general introduction of this thesis, the visual information processing at the input of the visual system is considered to be a process of organizing the raw sense data into a representation from which the basic structural features of the data can be extracted.

This approach seems similar to the first stage of the widely used scheme to describe an automatic pattern recognition system. Duda and Hart (1973) divide the overall pattern recognition problem into the separate problems of sensing, feature extraction and optimal decision making. Features can be extracted directly from the sense data or, after a transformation of the sense data, from the information in the new representational domain.

Gerrissen (1982) points out that structure is not an objective quantity but that a given measure of structure can only be applied within a context induced by the nature of the structure analysis. In order to describe the process of structure extraction within the perceptual process he uses the characterization of systems in terms of a hierarchy of epistemological levels as introduced by Klir and Uyttenhove (1977).

Sense data from a given data system at a low epistemological level are transformed into a structure system at a higher epistemological level by means of a set of structure identification functions. The choice of the data system and of a particular structure identification function defines the representation of the visual scene in the structure system.

In attempting to develop a model for the description of visual structure we face two questions. First, what is the nature of the sense data which are represented in the data system? Second, what sort of structure identification function must be applied to describe the structure?

Results of experimental research on the perception of structure such as texture discrimination (Julesz, 1962) or the detection of Moiré patterns (Glass and Switkes, 1976; Prazdny, 1984) show that structure is only perceived if the pattern points have equal or adjacent values for their visual quality (luminance, colour, contrast etc.). The data systems, as a basis for our structure system, are seen as the "slices" of perceptual adjacency after the decomposition of the visual scene into categories of points having similar values for their visual qualities.

The second question can be answered by equating our notion of structure with non-randomness in the sense data. For an efficient transfer of visual information the visual system should act upon redundancies within the sense data. The redundancy in an image depends on the predictability of the relations between the image points. Total predictability, i.e. certainty about the positions of all points after observing only one point, occurs when the total correlation of the image has been determined. Watanabe (1960) shows that this total correlation can be decomposed in terms of lower order correlations and their interrelations.

The detection of structure in the sense data of the data system now becomes a matter of detecting non-uniformity in the various probability distributions of points (first order), point pairs (second order) or higher order point configurations.

Mathematical formulations using spatial relations between image points of similar visual quality have proven to be very useful in predicting perceptual phenomena. Uttal (1975) uses the autocorrelation function in his description of form and speculates on neurophysiological autocorrelators. Moore et al. (1975) use an autocorrelation-like structure function to describe features of a visual scene and Julesz (Julesz et al., 1973) uses second order statistics in his, now abandoned, "iso-dipole statistics" theories on human texture discrimination. Stevens (1978) uses virtual lines in his computation of locally parallel structure as a determinant of the perception of "Moiré patterns".

In this chapter a model for the description of visual structure based on second order relations between points in the data system is introduced.

A self-similar stack model is used as image sampling system. Such a model is strongly supported by results of psychophysical experiments (Koenderink and Van Doorn, 1978) and it has been shown that a self-similar stack model enables efficient procedures for the calculation of spatial correlations (Burt, 1981). The resulting structure description is independent of position, size and orientation.

Although the main objective of this research is to relate various aspects of human visual perception to parameters of the structure model, in this chapter an example will be given of an application of the structure model in the field of image processing.

From the huge amount of visual information which is presented to the human observer, only those parts will be processed which are essential for fulfilling a certain task. Two attentional mechanisms play an important role in this process. First, it has been shown (Kinchla et al., 1983) that the observer is able to direct his attention towards a processing of structure on a certain level of detail. In Section 2.3 the findings of Kinchla and other authors are discussed in the context of the structure description. A second attentional mechanism is the observer's ability to direct the attention to a given location in the visual field. The latter can be a covert process ('focal attention', as introduced by Julesz (1981)), or an overt process which involves eye movement. This spatial attention is closely related to structural differences in the visual field. In Section 2.5 a more elaborate discussion on local structure will be given, while in Chapters 3 and 4 the experiments which have been carried out to test relations between local structure features and spatial attention will be discussed.

2.2 The structure feature space

To arrive at a formal description of the structure model it is necessary to define the sense data in a data system. As mentioned in the introduction of this chapter, the data in a data system consist of image points having identical or adjacent visual quality. Such a binary pattern P can be described by the characteristic function g(x,y) with

\[
g(x,y) = \begin{cases} 1 & \text{for } (x,y) \in P, \\ 0 & \text{for } (x,y) \notin P. \end{cases} \tag{2.1}
\]

The structure analysis is carried out within a retinal region R, defined to be a finite, convex region on the image plane. All patterns are assumed to fall within a


2.2.1 Second order structure

In order to describe relations between image points, the 'chord' concept, previously introduced by Moore (1975), has been used.

A chord is a virtual line between two pattern points, defined by the quadruple (x,y,u,v), where (x,y) and (x+u,y+v) are the coordinates of the points as shown in Figure 2.1. An alternative and useful description can be given by using polar coordinates (r,θ) for the displacement and defining a chord by (x,y,r,θ).

Figure 2.1. A chord in two different coordinate systems.

The function

\[
h(x,y,u,v) = g(x,y)\, g(x+u,y+v) \tag{2.2}
\]

characterizes the relation between image points; if two points (x,y) and (x+u,y+v) both belong to the pattern P, the value of h(x,y,u,v) will be 1, in all other cases 0.

The total number of chords in a retinal region R as a function of u and v indicates the presence of second order structure:

\[
f(u,v) = \sum\sum_{x,y \in R} h(x,y,u,v). \tag{2.3}
\]
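The chord count of Eq. 2.3 can be illustrated with a minimal sketch. It assumes that the binary pattern is stored as a small NumPy array indexed as `g[y, x]` and that points outside the region contribute nothing; the function name `chord_count` and the example pattern are hypothetical and not taken from the thesis.

```python
import numpy as np

def chord_count(g, u, v):
    """Sum over all (x, y) of g(x, y) * g(x + u, y + v) for one displacement (u, v)."""
    height, width = g.shape  # rows index y, columns index x
    total = 0
    for y in range(height):
        for x in range(width):
            x2, y2 = x + u, y + v
            # Assumption: positions outside the region count as background (0).
            if 0 <= x2 < width and 0 <= y2 < height:
                total += int(g[y, x]) * int(g[y2, x2])
    return total

# Hypothetical pattern: three points in the top row and one point below the first.
g = np.zeros((4, 4), dtype=int)
g[0, 0] = g[0, 1] = g[0, 2] = 1
g[1, 0] = 1

print(chord_count(g, 1, 0))  # horizontal chords of unit length: 2
print(chord_count(g, 0, 1))  # vertical chords of unit length: 1
print(chord_count(g, 0, 0))  # zero displacement counts the pattern points: 4
```

In this sketch a displacement and its opposite give the same count, which is why orientations can later be folded into a range of 180°.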

It is clear that Eq. 2.3 constitutes the autocorrelation function of the binary pattern P:

\[
f(u,v) = g \otimes g = \sum\sum_{x,y \in R} g(x,y)\, g(x+u,y+v). \tag{2.4}
\]

The autocorrelation function expressed in polar coordinates is defined by:

\[
f(r,\theta) = \sum\sum_{x,y \in R} h(x,y,r,\theta) = \sum\sum_{x,y \in R} g(x,y)\, g(x + r\cos\theta,\, y + r\sin\theta) \tag{2.5}
\]

and can be considered as the number of chords in the image having the same orientation and length. This autocorrelation function is represented in the (r,θ) chord space. Clusters of high intensity in the chord space are characteristic for the structure of the image. We call these clusters "structure features".

Considering the measure of self information associated with the occurrence of f(r,θ) chords in the image, Gerrissen (1982) introduces a measure of feature strength. A more elaborate discussion of his model is given in Appendix A.
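A chord space of this kind can be sketched for a pattern given as a list of point coordinates. The sketch below is only an illustration under stated assumptions: each unordered point pair is counted once, orientations are folded into the range from 0° to 180°, and lengths and angles are left unquantized apart from rounding. The name `chord_space` and the example points are hypothetical.

```python
import math
from collections import Counter

def chord_space(points):
    """Count chords by (length, orientation in degrees folded into [0, 180))."""
    counts = Counter()
    for i, (x1, y1) in enumerate(points):
        for (x2, y2) in points[i + 1:]:
            dx, dy = x2 - x1, y2 - y1
            r = round(math.hypot(dx, dy), 6)
            theta = math.degrees(math.atan2(dy, dx)) % 180.0
            theta = round(theta, 6) % 180.0  # guard against rounding up to 180
            counts[(r, theta)] += 1
    return counts

# Hypothetical pattern: the four corners of a unit square.
square = [(0, 0), (1, 0), (0, 1), (1, 1)]
for (r, theta), n in sorted(chord_space(square).items()):
    print(f"r = {r:.3f}, theta = {theta:5.1f} deg, chords = {n}")
```

For the square, the two horizontal and two vertical unit chords form the largest entries, which is the kind of cluster the text calls a structure feature.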

2.2.2 The image sampling system

By means of the chord space a transformation of the visual scene represented in the data system into a representation in the structure system has been obtained. This hierarchy of representational levels is consistent with the processing hierarchy found in the visual system, starting with the cell layers in the retina via the lateral geniculate nucleus into the visual cortex.

At the retinal level, an important sensing process is accomplished by three layers of cells (receptor, bipolar and ganglion cells), which have a complex system of "through" and "across" pathways. Measuring the output of the ganglion cells while stimulating the retina with a small spot of light demonstrates that each ganglion cell has a "receptive field": a region of the retina, usually roughly circular, in which stimulation affects the ganglion cell's firing rate.

Boycott et al. (1974) observed three morphological classes of ganglion cells, α, β and γ, which are correlated with the electrophysiological types transient (Y), sustained (X) and (W) cells. Of these, the X-cells are small and sensitive to continuous stimulation. The X channel is commonly assumed to perform pattern analysis and resolution of fine details.

An important characteristic of the human retina is the inhomogeneous distribution of photoreceptors. Based on considerations from information theory, Gerrissen (1982) proposes a description in which the retina is constructed from concentric rings of sampling units with increasing density in the direction of the centre. Wilson (1983) describes the retina as a receptor matrix consisting of so-called 'datafields', of which the diameter increases with increasing distance from the fovea. As field size he uses the diameter of the receptive fields as introduced by Hubel and Wiesel (1974). In the parafoveal area (between 2° and 20° eccentricity) Hubel and Wiesel find an almost linear relation between the average field size d and eccentricity e, both expressed in degrees visual angle:

\[
d = 0.06\,e. \tag{2.6}
\]

A description in which the retina is constructed from concentric rings of receptors with increasing receptor density in the direction of the centre corresponds with the results of Schwartz (1977) concerning the retinal ganglion cell distribution. This 'sunflower heart' distribution of receptors can be seen as a particular case of a model which represents the retina as a stack of self-similar detector arrays (Crettez and Simon, 1982; Koenderink and Van Doorn, 1978). Koenderink introduces a model in which the visual sampling system consists of a number (=40) of sheets, where each sheet contains receptors of similar size. The receptors fill a sheet without overlap or gaps. The sheets have different sizes and can be seen as magnified replicas of each other; the larger the radius R_s of the sheet, the larger the diameter d_s of its receptors:

\[
\frac{d_s}{R_s} = \lambda = \text{constant}. \tag{2.7}
\]

The smallest sheet is about the size of the human fovea, the largest sheet fills the whole visual field. A schematic representation is given in Figure 2.2a.

This specific configuration of the receptor array implies two types of transformation distortion. The finiteness of resolution of a certain sheet in the stack causes quantization errors when determining the length and orientation of the chords. Relatively short chords suffer the most from this type of distortion. The second type of transformation distortion is a result of the limited size of the sheet. The probability that a chord of a certain length falls within this sheet decreases with increasing chord length, while chords with a length greater than the diameter of the sheet will not be detected by this sheet. From these considerations we assume that each sheet will process chords of a certain length r'_s, proportional to the radius of the sheet:

\[
r'_s = a R_s. \tag{2.8}
\]

From Figure 2.2b it can be seen that the maximal angular quantization error for this chord is equal to

\[
\Delta\theta = 2 \arctan \frac{1.15\, d_s}{r'_s}. \tag{2.9}
\]

Using (2.7) and (2.8) we find that this angular quantization error is constant. The maximal length quantization error is, as can be seen from Figure 2.2, equal to:

\[
\Delta r'_s = 2 d_s \tag{2.10}
\]

and depends on the sheet in which the chord length is determined. Regarding the relative length quantization error, however, it can be shown that this error is independent of the chord length:

\[
\frac{\Delta r'_s}{r'_s} = \frac{2 d_s}{r'_s} = \text{constant}. \tag{2.11}
\]

Figure 2.2. A schematic "exploded" view of the stack distribution as introduced by Koenderink and Van Doorn (1978) (a) and the maximal angular and length quantization errors for a chord with length r' (b).

The quantization of the (r,θ) autocorrelation space should be in agreement with these quantization errors. A constant relative length quantization error is obtained by the quantization of the logarithm of the chord length with constant steps, since

\[
\Delta \ln r'_s = \ln (r'_s + \Delta r'_s) - \ln r'_s = \ln \left( 1 + \frac{\Delta r'_s}{r'_s} \right) = \text{constant}. \tag{2.12}
\]

The orientation θ is quantized in steps of 5° using Hubel and Wiesel's relation (2.6) between eccentricity and average receptor size.

Figure 2.3. Simple dot pattern (a), the distribution of the chord frequencies in the chord space (b) and the isometric projection of the chord space (c).
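The effect of quantizing the logarithm of the chord length in constant steps can be checked numerically. The following sketch is an illustration only: the smallest length `r_min` and the logarithmic step `step` are hypothetical values, not taken from the thesis, and only the 5° orientation step comes from the text.

```python
import math

def log_length_bin(r, r_min, step):
    """Index of the logarithmic length bin containing chord length r (r >= r_min > 0)."""
    return int(math.floor(math.log(r / r_min) / step))

def bin_edges(index, r_min, step):
    """Lower and upper chord length of one logarithmic bin."""
    return r_min * math.exp(index * step), r_min * math.exp((index + 1) * step)

def orientation_bin(theta_deg, step_deg=5.0):
    """Index of the orientation bin, with orientations folded into [0, 180)."""
    return int((theta_deg % 180.0) // step_deg)

r_min, step = 1.0, 0.1  # hypothetical values chosen for the illustration
for r in (2.0, 20.0, 100.0):
    lo, hi = bin_edges(log_length_bin(r, r_min, step), r_min, step)
    # The relative bin width (hi - lo) / lo is the same for short and long chords.
    print(r, round(lo, 3), round(hi, 3), round((hi - lo) / lo, 5))
```

The relative width of every bin equals exp(step) - 1, so short and long chords are represented with the same relative length precision.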

In Figure 2.3a, a simple pattern is shown, built from dots in a 128 x 128 grid. The corresponding log polar autocorrelation space is given in Figure 2.3b. The area of a square is proportional to f(r,θ). Figure 2.3c shows the log polar autocorrelation space in isometric projection. For this simple pattern the dominant short chord orientations are observed to be the horizontal (0°) and the vertical (90°) orientations. The longest chords form a cluster around the vertical direction.

2.2.3 Position, size and orientation

As a consequence of using second order relations, the transformation to the autocorrelation space is shift invariant (Figure 2.4a). When the size of the pattern is changed, the directions of all occurring chords remain the same while the length of the chords changes. Irrespective of the chord length there will be a fixed shift along the log r axis. Dilation (or shrinking) of the pattern causes the structure features to shift along the log r axis; their configuration remains undisturbed (Figure 2.4b).

When the pattern is rotated over an angle φ, the length of the chords does not change but the orientation of all chords changes by this angle φ. A rotation thus causes a shift of the structure features along the orientation axis while again the configuration remains undisturbed (Figure 2.4c). Due to the finite resolution of the grid in which the pattern is represented, a slight distortion will occur after rotation. As can be seen, this distortion concerns only the structure in the short chord area of the autocorrelation space. With respect to the orientation the autocorrelation space is circular; chords with an orientation θ greater than 180° are identical to chords with an orientation θ − 180°. The transform to the autocorrelation space thus is invariant for a rotation of the pattern over an angle of 180°.

Because a rotation or dilation of the pattern only causes a shift of structure features in the log polar autocorrelation space while the configuration remains undisturbed, it is easy to see that by means of a second autocorrelation transform (in fact any position invariant transform can be used), we obtain a transform which is position, size and orientation invariant.
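The shift behaviour described in this section can be checked on a small point pattern. The sketch below is a hedged illustration: it uses unquantized chords on continuous coordinates (so the grid distortion mentioned above does not appear), and the helper names `log_polar_chords`, `transform` and `circ_diff` as well as the example points are hypothetical.

```python
import math

def log_polar_chords(points):
    """List of (ln r, orientation in degrees modulo 180) for all point pairs, in pair order."""
    out = []
    for i, (x1, y1) in enumerate(points):
        for (x2, y2) in points[i + 1:]:
            dx, dy = x2 - x1, y2 - y1
            out.append((math.log(math.hypot(dx, dy)),
                        math.degrees(math.atan2(dy, dx)) % 180.0))
    return out

def transform(points, scale=1.0, angle_deg=0.0, shift=(0.0, 0.0)):
    """Rotate about the origin, then scale, then translate every point."""
    a = math.radians(angle_deg)
    return [(scale * (x * math.cos(a) - y * math.sin(a)) + shift[0],
             scale * (x * math.sin(a) + y * math.cos(a)) + shift[1])
            for x, y in points]

def circ_diff(a, b):
    """Signed difference between two orientations on the circular 180 degree axis."""
    return ((a - b + 90.0) % 180.0) - 90.0

pattern = [(0.0, 0.0), (3.0, 0.0), (0.0, 2.0), (4.0, 5.0)]  # hypothetical points
base = log_polar_chords(pattern)
scaled = log_polar_chords(transform(pattern, scale=2.0, shift=(5.0, -3.0)))
rotated = log_polar_chords(transform(pattern, angle_deg=30.0))

# Dilation and translation: every chord moves by ln 2 along the log r axis only.
print([round(s[0] - b[0], 6) for b, s in zip(base, scaled)])
# Rotation: every chord moves by 30 degrees along the orientation axis only.
print([round(circ_diff(r[1], b[1]), 6) for b, r in zip(base, rotated)])
```

Because every chord is shifted by the same amount, the configuration of clusters is preserved, which is what makes a second position invariant transform applicable.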

[Figure 2.4: panels (a), (b) and (c).]

2.3 Attending to different levels of visual structure

In our visual world, structure can be perceived on many different levels of detail. To indicate the level of structure detail, the term "level of globality" is frequently used in contemporary literature. However, this term is confusing; a global structure description such as the chord model comprises both detailed and non-detailed features. For this reason we will avoid the term "globality" and use the term "level of structure" as introduced by Kinchla et al. (1983). Detail information is equated with a "low" level of structure and non-detail information with a "high" level of structure. How does our perceptive system process this multi-level visual information?

Evidence has been found that speed or order of processing is related to the level of the visual structure. Three possibilities for such an ordering have been reported:

- lower levels (details) are processed first, followed by progressively higher levels: a "bottom-up" sequence,

- higher levels first, followed by progressively lower levels: a "top-down" sequence,

- a "middle-out" sequence: some intermediate level first, followed by processing of both progressively higher and progressively lower levels.

Many of the early theories on human pattern recognition incorporate a bottom-up sequence of processing. Lower level "features" (lines, points, edges etc.) are processed first and subsequently synthesized into progressively higher order forms. Navon (1977), on the other hand, finds a "global precedence" effect. Using stimuli in which a larger form consisted of smaller forms, he found that larger forms are always processed first: subjects were able to respond to larger forms before the more slowly processed smaller forms produced interference, while the opposite did not occur. Subsequent studies employing similar experimental tasks (Miller, 1982) or employing a modified procedure (Boer and Keuss, 1982) have shown interference effects from the larger form as well as interference from the smaller constituents of the larger form.

Kinchla et al. (1983) have argued that processing order is neither consistently bottom-up nor consistently top-down. Rather, forms at some intermediate level of structure are processed first with subsequent processing of forms at both higher and lower level. The level at which the structure is processed first is variable and can be manipulated by setting the "Attentional Operating Characteristic" or AOC of the perceiver.

Figure 2.5. Distinguishable configurations of structure features on different structure levels.

All these authors use stimuli which consist of a large character built from smaller characters. For some of these stimuli the log polar chord space has been determined. Figure 2.5a and Figure 2.5b respectively show the configuration of structure features for a single H and a single F. In Figure 2.5c a representation is given of the log polar chord space of a large H composed of small H's, while Figure 2.5d shows the log polar chord space of a large H composed of small F's. If a larger form is built from detailed forms we can clearly distinguish the low level and the high level structure feature configurations in the chord space.

Under the hypothesis that a manipulation of the 'AOC' or 'Attentional Frame' (Ward, 1982) causes a change in the perceiver's chord intake characteristic (the length of the chords optimally received can be varied), attention to a certain structure level can be represented by a function which emphasizes structure features around this favoured chord length. The configuration of structure features around the favoured chord length would be taken in better, after which an attentional shift may take place in the detailed or non-detailed direction (shorter or longer chords). By changing his chord intake characteristic, which can be achieved by selecting a specific sheet (or a few adjacent sheets) for the extraction of chords, the perceiver is able to shift his attention from one structure level to another.

2.4 The structure function

In Section 2.2 a transformation of the visual image to the chord space has been introduced. Excluding the invariances mentioned in Section 2.2, the configuration of clusters in the chord space is characteristic for the structure of the image. The presence of structure on different levels, which for instance occurs if the pattern consists of larger and smaller patterns, results in different configurations of structure features which are distinguishable along the log r axis, as has been shown in the previous section. In this section the findings in the chord space are related back to parts or points in the image, yielding an application of the structure model in the field of image processing.

Optical autocorrelation techniques have been used for several image processing applications. Fiskin (1977) uses an optical filtering in the autocorrelation space to detect orientation and position of specific projections of protein molecules as observed by electron microscopy. A similar but digital method will be employed to indicate structure at a given level of detail.

In the chord space, clusters of high intensity represent strong partial correlations and are considered as structure features. In order to quantify the contribution of a point (x,y) to these clusters we define:

$$C(x,y) = \sum_{r} \sum_{\theta} h(x,y,r,\theta)\, f(r,\theta). \qquad (2.11)$$

This expression is identical to the definition of the "autocorrelation structure measure" as introduced by Moore et al. (1977).

For many applications we are interested in the structure on a certain level of detail. The contribution of a point (x,y) to the structure at level L (defined by a minimal chord length r_1 and a maximal chord length r_2) can be expressed by:

$$C_L(x,y) = \sum_{r} \sum_{\theta} h(x,y,r,\theta)\, f(r,\theta)\, L(r), \qquad (2.12)$$

in which

$$L(r) = \begin{cases} 1 & \text{for } r_1 < r < r_2, \\ 0 & \text{elsewhere.} \end{cases} \qquad (2.13)$$
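A minimal sketch of how Eq. 2.12 might be evaluated for a dot pattern is given below. Several choices are assumptions of this sketch and not taken from the text: the chord space f(r,θ) is approximated by a histogram of all point pairs over logarithmic length bins and orientation bins, h(x,y,r,θ) is taken as the number of chords in a bin that have (x,y) as an end point, and the window L(r) is applied to the actual chord length. The names `chord_space` and `structure_function` are hypothetical.

```python
import numpy as np

def chord_space(points, n_r=8, n_theta=12, r_max=64.0):
    """Histogram f[r_bin, theta_bin] of the chords between all point pairs.

    Assumptions: distinct points, log-spaced length bins up to r_max,
    orientation taken modulo 180 degrees.
    """
    pts = np.asarray(points, dtype=float)
    i, j = np.triu_indices(len(pts), k=1)          # every pair once
    d = pts[j] - pts[i]
    r = np.hypot(d[:, 0], d[:, 1])
    theta = np.degrees(np.arctan2(d[:, 1], d[:, 0])) % 180.0
    r_bin = np.clip((np.log(r) / np.log(r_max) * n_r).astype(int), 0, n_r - 1)
    t_bin = np.minimum((theta / 180.0 * n_theta).astype(int), n_theta - 1)
    f = np.zeros((n_r, n_theta))
    np.add.at(f, (r_bin, t_bin), 1)                # count chords per bin
    return f, (i, j, r, r_bin, t_bin)

def structure_function(points, r1, r2, **kwargs):
    """C_L per point: sum of f(r, theta) L(r) over the chords ending there."""
    f, (i, j, r, r_bin, t_bin) = chord_space(points, **kwargs)
    in_level = (r > r1) & (r < r2)                 # L(r) of Eq. 2.13
    weight = f[r_bin, t_bin] * in_level
    c = np.zeros(len(points))
    np.add.at(c, i, weight)                        # a chord contributes to
    np.add.at(c, j, weight)                        # both of its end points
    return c

# Five equally spaced collinear dots plus one isolated dot.
dots = [(0, 0), (3, 0), (6, 0), (9, 0), (12, 0), (40, 30)]
c_low = structure_function(dots, r1=2, r2=5)
```

In this toy pattern the isolated dot has no chords within the selected level and therefore receives the value zero, while the dots of the regular row receive high values.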

Figure 2.6a shows a pattern consisting of a large form with noise elements, built from smaller forms. The corresponding log polar chord space is given in Figure 2.6b. The structure function C_L(x,y) has been calculated using only features at a low structure level (r_1 = 2, r_2 = 5) and the result is depicted in Figure 2.6d. The higher the value for C_L(x,y), the larger the size of the squares. Because the dominant micro

[Figure 2.6: panels (a)-(d); the chord space panel (b) has axes orientation (degrees) and chord length.]

Figure 2.6 Dot pattern (a) and corresponding chord space (b). The structure function C_L(x,y), using features at two different structure levels, is depicted in (c) and (d). The area of a square is proportional to the value of C_L(x,y).

[...] which form the H's will have a high value for C_L(x,y) while points from other small forms will have low values for C_L(x,y).

If we calculate the structure function using structure features at a higher structure level (r_1 = 12, r_2 = 35), the pattern points which belong to the large 'L' pattern have high values for C_L(x,y), irrespective of the shape of the smaller pattern, while pattern points which do not belong to this large pattern have low values for C_L(x,y); the large pattern can be discriminated from the noise elements (Figure 2.6c).

By means of the structure function we are able to indicate which points are important for the dominant structure at a given level of detail. In image processing this can be used for filtering out noise elements or enhancing the dominant structure of the image.

2.5 Local structure features

In the introduction of this chapter it has been posed that the characteristics of the structure identification function are induced by the context of the structure analysis. The intention of the observer will therefore determine the way the information is processed.

An important objective within human perception is the segmentation of the visual world into different meaningful parts. The two processes underlying this segmentation, detection and discrimination, are generally believed to be carried out at an early stage of visual perception. In this section the structure model will be used to characterize parameters which cause texture or pattern discrimination.

Texture has always been an important subject in automatic pattern recognition and image analysis. Haralick (1979) divides texture analyzing algorithms into two categories: a statistical approach and a structural approach. From the statistical point of view texture is defined by a set of statistics extracted from a large ensemble of measurements made on the entire image; a "global" procedure. The structural point of view on the other hand considers texture to be defined by subpatterns which occur repeatedly within the overall pattern.

A similar partitioning can be found in theories on human texture perception. A class of theories is based on the findings of Campbell (1968) who stated that the visual system is selectively sensitive to various frequency bands of the spatial Fourier spectrum. According to these theories a texture can be described by its spatial frequency components. Our structure model also gives a global representation of image structure, though it has to be emphasized that, as a result of the choice of the data system (image points having similar visual quality), the chord space is not identical to the power spectrum of the image. A specific example of a global description is the statistical approach of Julesz et al. (1973) in which textures were found to be indiscriminable if they match in their (first and) second order statistics. In his collaboration with Julesz, Caelli further investigated this conjecture and developed a series of counter examples; textures which were effortlessly discriminable in spite of having identical dipole (chord) length statistics. Caelli and Julesz (1978) proposed that, apart from the statistical global features, local figure-like features should play a role in texture discrimination. In a later paper Julesz (1981) abandons the approach of texture as a statistical, global percept and introduces a set of local features, called "textons", which characterize a texture. Does this mean that an autocorrelation approach, and in particular our structure model, is in contradiction with the feature approach?

In order to answer this question we have to focus on the nature of the algorithm which is used for characterization of texture: a global or local approach. The model for the description of visual structure, as introduced in Section 2.2, gives a representation of the visual structure of the image data which fall within a certain retinal region R. As to the size and nature of this region R, different ideas exist. Moore et al. (1975) and Uttal (1975), who particularly study the structure features of separate patterns such as simple geometric forms or characters, consider the autocorrelation space of a single pattern; the size of the region R is determined by the size of the pattern.

Julesz (Julesz et al., 1973) in his research on texture discriminability determines the "dipole" characteristic over a much larger region, which consists of a multitude of patterns. His experimental results show that two textures are not (effortlessly) discriminable if their dipole characteristics are identical. Since the definition of a dipole is identical to our "chord", this implies that for these two textures the log polar chord spaces are also identical, if those chord spaces give a (global) representation of the texture in region R. As a result of the nature of the textures, which consist of randomly rotated micro patterns, the orientations of the short chords are random and provide no information for texture discrimination. A more correct nomenclature, "dipole length characteristic", is given by Caelli and Julesz (1978), who find combinations of micropatterns composing iso-dipole length textures which are effortlessly discriminable. Local features such as "collinearity", "corner" or "closure" were thought to induce texture discriminability. These findings finally resulted in the texton theory, as presented by Julesz (1981). Textons, restricted to line drawings, are defined to be non-overlapping line segments with specific length, orientation and width, terminators of line segments, crossings of line segments and (virtual) blobs which are formed by flanking line segments. Locations where texton gradients (i.e. differences in textons, or differences in texton densities) occur are effortlessly discriminable from the background. By means of numerous experiments Julesz et al. (1981, 1983, 1985) showed that the texton theory appeared to be a strong tool for predicting texture or pattern discriminability. However, what kind of a process extracts these local features from the sense data?

Figure 2.7. Examples of two micropatterns with identical dipole characteristics which yield effortless texture discrimination (after Caelli and Julesz (1978)).

[Figure 2.8: two panels, each showing a pattern and its chord space; axes: orientation (degrees) and chord length.]

Figure 2.8. Chord space of two patterns which form distinguishable textures.

As mentioned before, Moore et al. (1975) and Uttal (1975) confine their autocorrelation analysis to a restricted region R in order to characterize pattern (or form) features. Analogous to this approach, the local chord space can be determined in order to extract texture features. Figure 2.7 shows two of the patterns used by Caelli and Julesz (1978) which cause distinguishable textures with identical dipole length statistics. The log polar chord space representation of the (local) structure of these patterns is given in Figure 2.8. From this representation the chord length distribution can be obtained by integration over θ:

$$f(r) = \int f(r,\theta)\, d\theta. \qquad (2.14)$$

Though for the patterns the f(r) is identical, the f(r,θ) is clearly not. By integration over θ and thus considering the distribution of chord lengths only, important structural information has been discarded. In a useful model for pattern or texture discrimination all structural information will have to be used.
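The loss of information caused by Eq. 2.14 can be made concrete with a small numerical sketch. The two arrays below are invented toy chord spaces (rows are chord length bins, columns are orientation bins) and do not correspond to the patterns of Figure 2.7; the integral is replaced by a sum over orientation bins.

```python
import numpy as np

# Toy chord spaces f[r, theta]; the values are hypothetical.
f_a = np.array([[4, 0, 0, 0],
                [0, 2, 0, 0]], dtype=float)
f_b = np.array([[2, 0, 2, 0],
                [0, 0, 1, 1]], dtype=float)

# Discrete counterpart of Eq. 2.14: sum over the orientation axis.
f_r_a = f_a.sum(axis=1)
f_r_b = f_b.sum(axis=1)

same_length_distribution = np.array_equal(f_r_a, f_r_b)   # True
same_chord_space = np.array_equal(f_a, f_b)               # False
```

Both chord spaces collapse to the same chord length distribution although their orientation structure differs.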

Figure 2.9 gives a schematic representation of our model for the processing of local structure.

[Figure 2.9: block diagram with parallel local auto-correlators, each with feature detectors, and a figure-ground system.]

The model is based on a parallel system of local structure analyzers. The location and the size of the regions R in which the local structure is analyzed can be determined by a layer of the stack model which has large receptors (Figure 2.2) while for the determination of the local structure a layer with higher resolution will be used. After a representation of structure by means of a multitude of local chord spaces f_i(r,θ), features can be extracted on the basis of which textures or patterns will be discriminated. Which features will be extracted depends upon the nature of the problem. If for instance a part of the image has a different distribution of chord lengths f(r), this f(r) will be an optimal feature: discrimination is caused by different dipole length characteristics. If, on the other hand, the chord orientation in a part of the image differs from the rest, the distribution of chord orientations:

$$f(\theta) = \int f(r,\theta)\, dr \qquad (2.15)$$

will be an important feature.

Structure dissimilarity function

In the following chapters of this thesis the detectability of a target pattern embedded in a field of background patterns will be studied. In particular, relations with feature differences will be investigated. In this section a structure dissimilarity function is introduced with which structure differences can be quantified.

Since in the experiments combinations of target and background patterns are used which have identical chord length distribution (Caelli and Julesz (1978)), it is expected that the f(r) will not provide an adequate feature for discrimination. Also the distribution of chord orientations only, f(θ), will not be a feature by which pattern discrimination can be explained: all patterns are randomly rotated.

For these reasons the entire local chord space of the target f_t(r,θ) will be compared with the chord space of the background pattern f_b(r,θ). Because the measure of structural dissimilarity has to be independent of orientation of the pattern (all patterns have a random orientation) the chord space f_t(r,θ) of the target pattern is compared with the chord space of the rotated background pattern f_b(r,θ-φ). By varying φ the best "fit" can be found. The structure dissimilarity function D_tb is defined as:

$$D_{tb} = \left[ \frac{\min_{\phi} \sum_{r} \sum_{\theta} \{ f_t(r,\theta) - f_b(r,\theta-\phi) \}^2}{\sum_{r} \sum_{\theta} \{ f_t^2(r,\theta) + f_b^2(r,\theta) \}} \right]^{1/2} \qquad (2.16)$$

As a result of the normalization, D_tb can take values between zero and one. D_tb = 0 occurs if two chord spaces are identical or differ only with respect to their offset on the θ-axis.
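A minimal sketch of Eq. 2.16 for chord spaces stored as arrays is given below. It assumes that the orientation bins are uniform and cover the full orientation range, so that a rotation by φ becomes a cyclic shift along the orientation axis; the function name `structure_dissimilarity` is hypothetical.

```python
import numpy as np

def structure_dissimilarity(f_t, f_b):
    """D_tb of Eq. 2.16 for chord spaces sampled as arrays f[r, theta].

    Assumption: rotating the background pattern corresponds to a cyclic
    shift of its chord space along the theta axis (axis 1).
    """
    f_t = np.asarray(f_t, dtype=float)
    f_b = np.asarray(f_b, dtype=float)
    norm = np.sum(f_t ** 2 + f_b ** 2)
    if norm == 0.0:
        return 0.0                      # two empty chord spaces are identical
    # Try every offset phi on the theta axis and keep the best fit.
    best = min(np.sum((f_t - np.roll(f_b, k, axis=1)) ** 2)
               for k in range(f_b.shape[1]))
    return float(np.sqrt(best / norm))

target = np.array([[3.0, 0.0, 1.0, 0.0],
                   [0.0, 2.0, 0.0, 0.0]])
rotated_copy = np.roll(target, 1, axis=1)
d_same = structure_dissimilarity(target, rotated_copy)   # zero: offset only
```

For chord spaces without any overlap at any offset, for example all chords short in one pattern and all chords long in the other, the function reaches its maximum of one.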

By means of this function D_tb, a quantification of structure differences between target and background is obtained. In the next chapter relations between values of the structure dissimilarity function D_tb and the human ability to detect a target pattern in a field of background patterns will be studied.

2.6 Discussion

This chapter has shown that in a study of relations between perceptual phenomena and parameters from the structure model a number of approaches can be employed. The current debate on "top-down" or "bottom-up" processing orders, leads to a characterization of visual structure at different levels of detail. The log polar chord space as a representational model of (global) image structure enables us to separate the structure features at these levels of detail.

In Section 2.5 it has been brought forward that a strategy by which the visual information is processed through a parallel system of local structure analyzers will give optimal results in a pattern or texture discrimination task. This model is highly inspired by the models of Treisman and Gelade (1980) or Julesz (1981) who suggest a parallel system of local feature or "texton" extractors operating on the visual information. Julesz states that texton differences or texton density differences act as determinants of texture discrimination, while the exact locations of the textons are of less importance. The latter implies that (local) phase information can be discarded. Our structure model, based on local chord structure, is therefore expected to be particularly suited for the description of features which cause texture or pattern discrimination. In the following chapters we will discuss experiments which have been carried out to investigate relations between pattern discriminability and local structure differences.

The vast amount of visual information which is presented to the observer will only partially be taken in. In this chapter we have shown that different mechanisms are operational for the selection of the required information. Structure can be taken in according to its level of detail (Section 2.3) or the observer can direct his "focal attention" to select visual information at a certain location (Section 2.5). The parameters of the structure identification function will be tuned to the context of the task for which the perceptual process is meant. This tuning process is a clear example of what is called in psychophysical literature a "concept driven" information processing strategy. A discussion about the way in which the concept determines the parameters of the structure model would go beyond the scope of this thesis. An interesting suggestion is to represent the various "settings" of the system, i.e. level of structure detail, global or local analysis, choice of feature detectors etc. by means of a "conceptual framework" as defined by Minsky (1975).

Two final remarks have to be made about models for early visual information processing which are somewhat similar to our model because they are either based on the non-uniformity of receptors or based on a spatial frequency analysis of the image.

Using experimental data of the retino-cortical projection, Schwartz (1977) proposed that the projection of the central retinal region (up to 20° visual angle) into the visual cortex can be described approximately by a log polar mapping function. There have been discussions whether the log polar mapping plays a role in scale and rotation invariant pattern recognition (Cavanagh, 1981; Schwartz, 1981). The transformation into the log polar image domain implies that a rotation or dilatation of the pattern in the image domain will result in a shift in the new representational domain. By means of a second, this time translation invariant, transformation (for example an R-transform, as proposed by Reitboeck and Altmann (1984)), size and orientation invariant properties are obtained. Compared with our model, which is translation invariant, the most important shortcoming of the Reitboeck and Altmann model is its sensitivity to translations of the image. Size and orientation invariant properties are only valid if the pattern is centered or has another suitable normalized position with respect to the origin of the image plane. A shift of the image distorts the representation in the log polar image domain. It is very unlikely that the human perceptual system, which is able to recognize patterns not projected directly on the fovea, can be described by such a model.
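The behaviour of the log polar mapping under rotation, dilatation and translation can be checked with a short sketch. The three sample points and the helper names `log_polar` and `transform` are hypothetical and serve only as an illustration of the argument.

```python
import numpy as np

def log_polar(points):
    """Map image points (x, y) to (log r, theta) about the origin."""
    pts = np.asarray(points, dtype=float)
    r = np.hypot(pts[:, 0], pts[:, 1])
    theta = np.arctan2(pts[:, 1], pts[:, 0])
    return np.column_stack([np.log(r), theta])

def transform(points, scale=1.0, angle=0.0, shift=(0.0, 0.0)):
    """Rotate by angle (radians) and scale about the origin, then translate."""
    c, s = np.cos(angle), np.sin(angle)
    rotation = np.array([[c, -s], [s, c]])
    return scale * np.asarray(points, dtype=float) @ rotation.T + np.asarray(shift)

pattern = np.array([[1.0, 0.0], [2.0, 1.0], [3.0, 2.0]])

# Dilatation by 2 and rotation by 0.3 rad: every point moves by the same
# offset (log 2, 0.3) in the log polar domain.
offset_similarity = log_polar(transform(pattern, scale=2.0, angle=0.3)) - log_polar(pattern)

# Translation: the offsets differ from point to point, so the
# representation is distorted rather than shifted.
offset_translation = log_polar(transform(pattern, shift=(5.0, 0.0))) - log_polar(pattern)
```

The rotation angle is kept small here so that no orientation wraps around at ±π.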

A second remark has to be made about models which are based on a spatial frequency analysis of the image. At the beginning of this chapter the chord concept was defined to be a relation between two image points of identical visual quality (i.e. colour, gray level). A Fourier transform of an image however also takes into account relations between points of different gray level. A chord space representation of a sinusoidal grating would therefore be fundamentally different from the power spectrum of this grating. However, considering the issue of global versus local processing we do find parallels between models based on a spatial frequency approach and our structure model. Though a Fourier transform of the image is a global operation, recent literature suggests that a local frequency analysis is more likely to be carried out. Perizonius et al. (1985) found evidence on the local character of spatial frequency channels in the human visual system. Instead of a description of image structure by means of a (global) Fourier transform, a description in terms of local spatial frequency spectra is presented. Such a description has been proposed already by Gabor (1946) who used what are now called "Gabor elementary functions": sine waves, the amplitude of which is modulated by a Gaussian envelope. Rentchler et al. (1985) denominate the character of these Gabor functions as a new type of "textons". Though Gabor functions have proved to work well in audition, more research has to be carried out to show their use in visual perception.
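For reference, a one-dimensional Gabor elementary function can be sketched as follows. The parameter names `sigma`, `frequency` and `phase` are chosen for this illustration, and a cosine carrier with adjustable phase is assumed (a phase of -π/2 gives the sine carrier).

```python
import numpy as np

def gabor(x, sigma, frequency, phase=0.0):
    """Sinusoid of the given spatial frequency under a Gaussian envelope."""
    x = np.asarray(x, dtype=float)
    envelope = np.exp(-x ** 2 / (2.0 * sigma ** 2))
    return envelope * np.cos(2.0 * np.pi * frequency * x + phase)

x = np.linspace(-4.0, 4.0, 801)
g = gabor(x, sigma=1.0, frequency=1.5)
```

The width `sigma` of the envelope sets how local the frequency analysis is: a wide envelope approaches a global sinusoid, a narrow one approaches a single point.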

3 Relations between pattern detectability and local structure features

3.1 Introduction

In Chapter 2 a model for the description of visual structure has been introduced based on spatial relations between pattern points (chords). Clusters in the chord space, as a representational domain for image structure, indicate global structure in the visual data. The use of a multi-layer, multi-resolution sampling array has resulted in a model which enables us to characterize visual structure at a variable level of detail.

A global description of image structure is not preferable in all situations. Recent theories on pattern or texture discrimination tend to consider local features as determinants of discriminability. The introduction of a local structure description in Section 2.5 makes it possible to quantify differences between local structures by means of a measure of structural dissimilarity. In this chapter we investigate a possible relation between this structure dissimilarity measure and the human ability to detect a target pattern in a field of background patterns.

As for the perception of form, Zusne (1970) mentions a variety of perceptual tasks: detection, discrimination, recognition, identification and judgement. In a detection task the subject judges whether a stimulus is present or not. In a discrimination task a decision has to be made whether a form is different from some other form or set of forms. In a recognition task the subject judges whether he has seen the form previously. In identification, judgement is made whether a specific stimulus is present or not: the form is named. In the judgemental task judgement is made regarding a specific point on a continuum to which a form should be assigned. Dember (1966) places these tasks in a hierarchy which, from low to high, runs as follows: detection, discrimination, recognition and identification. The position of the judgement task is somewhat indeterminate, but presumably located between discrimination and recognition. As Dember points out, the lower tasks need less information than the tasks higher in the hierarchy.

In our research we have restricted ourselves to detection and discrimination of forms. Which perceptual mechanisms take care of these tasks? Trevarthen (1968) distinguishes two processing channels: one "ambient", determining space at large around the body, the other "focal" which examines detail in small areas of space. From experiments with split brain monkeys, Trevarthen found evidence that the two mechanisms, vision of space and vision of object identity, were served by anatomically distinct brain mechanisms.

More recently psychologists have suggested a rather different kind of two-visual-systems theory. A preattentive system is supposed to extract features automatically and in parallel across the visual field while discriminating between objects that are defined by a combination of features requires serial search by focal attention (Treisman and Gelade, 1980; Julesz, 1981).

Based on this division between a parallel, preattentive and a serial, attentive mechanism, Julesz predicts an effortless discrimination of a target pattern from its background if the target differs from the background with respect to the density of local features or "textons". Restricted to line drawings, these textons are defined as:

- line segments with given orientation and length;
- terminators, or ends of line segments;
- crossings of line segments.

We are interested in whether differences between local structure features of the target pattern and local structure features of the background pattern, expressed in the structure dissimilarity measure D_tb, affect the detectability of a target pattern embedded in a field of background patterns. To measure the detectability, various methods are available in the literature. Treisman and Gelade (1980) use the visual search time, which is the time elapsed between stimulus onset and the subject pressing one of the target-nontarget response keys. Though this method gives a criterion for the detectability of the target pattern, it must be realized that eye movements will occur. Since our structure description is based on a retina model in which a non-uniform distribution of the sampling units is assumed, we have to incorporate the influence of the (retinal) position of the target. For this reason a method has been chosen in which eye movements do not interfere with information intake. Such a method has been previously described by Bergen and Julesz (1983).

A stimulus, consisting of a hexagonal grid of (background) patterns in which in 60% of the presentations a target is embedded, is presented briefly (too short to allow for eye movements) and after a variable interval time masked by a stimulus consisting of elements which are the union of the two patterns to be discriminated. The percentage of correct responses (corrected for false alarms) is taken as a measure for the detectability.

An important aspect of pattern detectability is the eccentricity of target presentation: the distance of the retinal target position to the fovea. Elaborate research on this aspect has been carried out by Engel (1971), who introduces the 'conspicuity area' as a measure for target detectability. This conspicuity area is defined as the retinal area within which a target can be detected with brief stimulus presentation. In our experimental set up the eccentricity of target presentation is considered as an independent variable.

In order to study target detectability as a function of structural dissimilarity between target and background pattern on the one hand, and target position on the other, three experiments have been carried out.

The first experiment is designed to investigate target detectability as a function of the interval time between stimulus and mask. The stimulus material consists of a number of target background pattern combinations, some of which have been used in the experiments described by Bergen and Julesz (1983) and some patterns for which the previously described structure dissimilarity measure predicts either a high or a low detectability. In this experiment also the influence of target eccentricity on the detectability will be studied.

In a second experiment a target background combination is used for which the values of the structure dissimilarity measure can be manipulated. The effect of a gradual increase of the values of the structure dissimilarity measure on the detectability of the target will be investigated.

In a third experiment we particularly investigate the role of the eccentricity of the patterns, by a stimulus which consists of only one ring (with a variable diameter) of patterns around the central fixation point.

[...] structure dissimilarity measure D_tb have been determined. In Section 3.5 the values of D_tb are compared with the experimental results on target detectability. It is shown that a quantitative relation between the two variables exists.

3.2 Experiment 1

In order to investigate the influence on target detectability of respectively target background combination, eccentricity of target presentation, and interval time between stimulus and mask, an experiment has been carried out in which these parameters were varied. As for the choice of target and background patterns, we have restricted ourselves to simple geometric forms. Some of these patterns were identical to the target background combinations used by Julesz and Bergen (1983) to enable a comparison of the results.

3.2.1 Stimuli

Seven series of stimuli have been composed with different combinations of target and background patterns. Figure 3.1 shows the target-background combinations, as well as their corresponding mask patterns, which were used in the experiment. The first four combinations have been previously used by Julesz & Bergen (1983), who found a low detectability for combinations 2 and 3 and a high detectability for combinations 1 and 4. These results were attributed to the greater difference in the number of textons between target and background pattern for the combinations 1 and 4 than for the combinations 2 and 3.

[Figure 3.1: table with columns comb., target, ground and mask, listing combinations 1 to 7.]

Figure 3.1. Combinations of target and background patterns and the corresponding mask patterns as used in experiment 1.

Combinations 5 and 6 were chosen to investigate the hypothesis that a high target detectability is always caused by a large difference in textons. For both combinations the number of endpoints and crossings of the target pattern is identical to the number of endpoints and crossings of the background pattern.

Two patterns which do not share identical line segments and differ considerably with respect to their textons (combination 7) were included to serve as a reference combination.

To keep the first order effects constant, for all combinations the number of pattern points (or pixels) of the target pattern was equal to the number of points of the background pattern.

The patterns, each having a random orientation, were arranged in a hexagonal grid with a blank at the central fixation point. The stimulus can be considered as three 'rings' of patterns around the central fixation position, as depicted in Figure 3.2.

3.2.2 Apparatus

The stimuli were displayed on a Barco CD 351 video monitor with short persistence phosphor. The display was controlled by a PDP 11/23 computer via a Matrox video interface. The same PDP 11/23 controlled the experiment and collected the observer responses from a joystick.

The stimuli consisted of white (luminance l_w = 6.366 cd/m²) patterns on a black (luminance l_b = 0.318 cd/m²) background. These luminance values were chosen after a series of experiments in which the influence of luminance and contrast on the detectability of the target pattern has been studied. A discussion of these experiments is given in appendix B.

The observers were seated at a distance of 0.85 m from the display, so that a single pattern subtended a visual angle of 1 degree and the entire hexagonal stimulus field 13.4 degrees.
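The physical sizes on the display that correspond to these visual angles follow from elementary geometry. The sketch below assumes a flat display viewed perpendicularly with the object centred on the line of sight; the function name `size_from_visual_angle` is hypothetical.

```python
import math

def size_from_visual_angle(angle_deg, distance_m):
    """Size (m) of an object subtending angle_deg at viewing distance distance_m."""
    return 2.0 * distance_m * math.tan(math.radians(angle_deg) / 2.0)

pattern_size = size_from_visual_angle(1.0, 0.85)    # about 0.015 m
field_size = size_from_visual_angle(13.4, 0.85)     # about 0.20 m
```

Under these assumptions a single pattern measured roughly 1.5 cm on the screen and the stimulus field roughly 20 cm.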


3.2.3 Subjects

The experiment has been carried out with 11 subjects, students of the Department of Industrial Design Engineering, who were paid for their cooperation. All subjects had normal or corrected-to-normal vision.

3.2.4 Procedure

Each trial started with the presentation of a central fixation cross which remained in view for 1 s. The subject was instructed to look at this fixation cross. After a 200 ms interval, the stimulus was presented for 80 ms. The afterimage was erased by the presentation of the mask for 80 ms; the (variable) interval between stimulus onset and mask onset is called stimulus onset asynchrony, or SOA (Figure 3.2). The patterns of the mask were formed by the sum of the target and the background patterns.

[Figure 3.2: panels (a) and (b) show arrays of patterns; panel (c) shows the time course of stimulus, SOA and mask (80 ms) along the axis t (ms).]

Stimuli were presented in blocks within which the SOA and target background combination were kept constant. During an experimental session five blocks of stimuli with identical target background combination have been used. The first block with an SOA of 680 ms served as a training series. The remaining four blocks were presented with successively shorter SOA: 480, 320, 240 and 180 ms. A block consisted of 70 trials, in which in 60% of the trials a target was present in the array. The position of the target was random, but uniformly distributed over the three rings. In this way an average of 12 target presentations per ring was achieved. Subjects were instructed to press a joystick to the right if they detected a target and to the left otherwise. During the experiment, no feedback to the subject was given. Because the subject guesses in case of uncertainty, the observed percentage of correct responses will be an overestimation of the real percentage of correct responses. The corrected percentage of hits has been calculated with the formula given by Blackwell (1953):

$$P^*(S \mid s) = \frac{P(S \mid s) - P(S \mid n)}{1 - P(S \mid n)} \qquad (3.1)$$

in which $P(S \mid s)$ is the observed percentage of hits and $P(S \mid n)$ the observed percentage of false alarms. This corrected percentage, averaged over the subjects, is called the 'discrimination factor' $\delta_{tb}$ for the target background combination:

$$\delta_{tb} = \frac{1}{N_s} \sum_{i=1}^{N_s} P_i^*(S \mid s) \qquad (3.2)$$

in which $N_s$ is the number of subjects and $P_i^*(S \mid s)$ is the corrected percentage of hits of subject $i$.
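As a rough illustration of equations (3.1) and (3.2), the sketch below applies the guessing correction per subject and then averages over subjects. The function names are hypothetical, the rates are expressed as proportions between 0 and 1 rather than percentages, and the numbers in the example are made up for illustration only; they are not data from the experiment.

```python
def corrected_hit_rate(hit_rate, false_alarm_rate):
    """Guessing correction of equation (3.1), with rates as proportions."""
    if not (0.0 <= hit_rate <= 1.0 and 0.0 <= false_alarm_rate < 1.0):
        raise ValueError("rates must lie in [0, 1], false alarm rate below 1")
    return (hit_rate - false_alarm_rate) / (1.0 - false_alarm_rate)


def discrimination_factor(rates_per_subject):
    """Equation (3.2): mean corrected hit rate over the subjects.

    rates_per_subject is a list of (hit_rate, false_alarm_rate) pairs,
    one pair per subject.
    """
    corrected = [corrected_hit_rate(h, f) for h, f in rates_per_subject]
    return sum(corrected) / len(corrected)


# Made-up example: two subjects for one target background combination.
example = [(0.8, 0.2), (0.5, 0.0)]
print(discrimination_factor(example))  # close to 0.625
```

A subject who never produces a false alarm keeps the observed hit rate unchanged, while a subject with many false alarms has the hit rate reduced accordingly.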

A more elaborate discussion on the determination of detection thresholds is given in appendix C.

3.2.5 Results

Figure 3.3 shows, for each target background combination, the discrimination factor $\delta_{tb}$, averaged over the three rings, as a function of the SOA.

For small SOA the $\delta_{tb}$ increases strongly with increasing SOA, while for larger


combination but equal for the combinations 2 and 5.

The $\delta_{tb}$ does not exceed 50% for three combinations (2, 3 and 5), while only for combination 7 does the $\delta_{tb}$ reach 100%. Since the $\delta_{tb}$ for the latter combination remained at 100% at the selected measuring points in our experiment (SOA = 480, 320, 240 and 180 ms), we have included three extra measuring points: SOA = 120, 80 and 60 ms. In order to obtain the last two values we had to use a stimulus presentation time of less than 60 ms; a presentation time of 40 ms was chosen. The $\delta_{tb}$ of combination 6 also remained nearly constant at the selected measuring points; for this reason a measurement at SOA = 120 ms has been included for this combination.

Figure 3.3. The discrimination factor $\delta_{tb}$, averaged over the rings, as a function of stimulus onset asynchrony for the various target background combinations. [Horizontal axis: SOA (ms); vertical axis: discrimination factor (%); the legend lists the ground and target patterns of combinations 1 to 7.]

The data points in Figure 3.3 are obtained by first determining, for each subject, the mean discrimination factor over the three rings and subsequently averaging over the subjects. To give an impression of the intersubject variation, Table 3.1 gives the values of the discrimination factor together with their standard deviations.
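The two-step averaging described above (first over the rings per subject, then over the subjects) can be sketched as follows. This is a minimal sketch: the function name `summarize_combination` is hypothetical, the input values are made up for illustration, and the use of the sample standard deviation (division by the number of subjects minus one) is an assumption, since the text does not say which estimator was used.

```python
from statistics import mean, stdev


def summarize_combination(values_per_subject):
    """Return (mean, standard deviation) over subjects for one data point.

    values_per_subject holds, for each subject, the corrected hit rates
    for the three rings at a single SOA.
    Assumption: the intersubject spread is the sample standard deviation.
    """
    # Step 1: mean over the three rings, separately for each subject.
    subject_means = [mean(rings) for rings in values_per_subject]
    # Step 2: average the per-subject means and compute their spread.
    return mean(subject_means), stdev(subject_means)


# Made-up example with two subjects and three rings each.
example = [[0.2, 0.4, 0.6], [0.6, 0.8, 1.0]]
average, spread = summarize_combination(example)
print(f"mean {average:.2f}, standard deviation {spread:.2f}")
```

Computing the per-subject means first keeps the standard deviation a measure of variation between subjects rather than between rings.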
