
DOI: 10.2478/v10006-010-0021-7

VISUAL SIMULTANEOUS LOCALISATION AND MAP-BUILDING SUPPORTED BY STRUCTURED LANDMARKS

ROBERT BĄCZYK, ANDRZEJ KASIŃSKI

Institute of Control and Information Engineering

Poznań University of Technology, Pl. Marii Skłodowskiej-Curie 5, 60-965 Poznań, Poland; e-mail: robert.baczyk@put.poznan.pl

Visual simultaneous localisation and map-building systems which take advantage of landmarks other than point-wise environment features are not frequently reported. In this paper a method of using an operational map of the robot's surroundings, complemented with visible structured passive landmarks, is described. These landmarks are used to improve the self-localisation accuracy of the robot camera and to reduce the size of the Kalman-filter state vector with respect to the vector involving point-wise environment features only. Structured landmarks reduce the drift of the camera pose estimate and improve the reliability of the map which is built on-line. Results of simulation experiments are described, demonstrating the advantages of such an approach.

Keywords: mobile robots, navigation, SLAM, artificial landmarks.

1. Introduction

The goal of the research is to investigate the tracking properties of the Simultaneous Localisation And Map-building (SLAM) algorithm, which reconstructs the trajectory of a mobile camera navigating in an unknown environment. By tracking properties we understand here the ability to properly extend the camera trajectory based on images of the surroundings; as the course continues, some landmarks escape the field of view while new ones appear. Usually such a SLAM system works according to a sequential estimation scheme.

At the beginning of the mission, the reference coordinates of the starting point have to be initialized. Then features of the environment are extracted from the current image, and characteristic objects are recognized. As such, they are put on the dynamically constructed map, which has been empty so far.1 At the next stage, as the camera moves, the following tasks have to be accomplished:

• determination of the updated camera position and orientation (shortly: pose) with respect to the global reference system set at the beginning of the mission,

1 In the case of bearing-only SLAM there is a need to bootstrap the system by injecting into the map some known 3D points or by recognising objects of known dimensions.

• refinement of the location of features/objects on the map with respect to the global reference frame.

Moreover, there is usually the need to add to the map new items which have been discovered from the new camera position. The next step is to move the camera forward along a trajectory and to establish a new viewer pose.

Then such a sequence is repeated.

Self-localisation of a robot with a camera fixed on board is a problem tightly coupled with the issue of map building while navigating in an unknown (new) neighbourhood. Neither can be solved on its own; they have to be worked out simultaneously.

Recent years have witnessed substantial progress in mobile robot navigation systems (Durrant-Whyte and Bailey, 2006; Bailey and Durrant-Whyte, 2006; Neira et al., 2008). Most approaches solve the SLAM task using sequential Bayesian schemes; the Kalman filter is a good example (Bar-Shalom et al., 2002). Working implementations of navigation systems based on such an approach are emerging (Davison et al., 2007), and it can be observed that the quality of solutions is correlated with the type and number of sensors used in a particular implementation (Thrun et al., 2006).

The requested characteristics, fundamental to a working SLAM system, include its ability to work on-line. A SLAM system having such a property can be applied whenever the current camera pose estimate is used within the closed-loop navigational decision scheme of the robot controller.

SLAM can be used to guide autonomous agents (mobile robots) on the shop-floor, in hospitals or in offices, to support the mobility of humanoid robots or blind people, and, last but not least, in military applications. Other application areas include augmented reality, city-map building and extraterrestrial navigation systems, in which the communication latency prohibits on-line remote control.

Most of the reported SLAM methods involve multiple sensors to observe the environment, while in this paper a system using only a single camera is described. With such an approach, hardware complexity is reduced; therefore, system reliability grows and portable SLAM systems involving embedded PC solutions become feasible.

2. Discussion and original elements of the approach

Here, related earlier works will be presented as the background of the contribution.

Single-camera SLAM systems reported in (Davison and Murray, 2002; Davison, 2003; Davison et al., 2007) allow exploring compact indoor spaces. A map generated by the SLAM process is sparse there and usually consists of a finite set of coordinates of point-wise environment features (shortly: point-features). The map is primarily used to establish the camera pose with respect to the global reference system.

The problem is that the computational cost of the SLAM algorithm grows with the square of the number of map items. It is thus wise to reduce the dimension of the system state vector, which consists of the camera state vector and the coordinates of the environment map items. Exceeding the upper bound of the state-vector size can disable the on-line operation of the system, given the constrained on-board computing resources.

The second important issue while navigating is viewer trajectory drift, which complicates the closing of the robot path loop (Clemente et al., 2007). Sometimes it is necessary to explore the neighbourhood of some reference point along multiple exploratory trajectories. In such a case the viewer prefers to come back to the starting point following a new path, in order to acquire some new map items. Loop closing is one of the most important properties required in SLAM, since it allows reducing the accumulation of errors (it enables position resetting).

Looking for a solution which would keep the complexity of the map under control and compensate for the trajectory drift puts on the agenda the issue of sparse map management involving not only point-features. To manage the map efficiently it is wise not to rely on the environment point-features alone (natural landmarks), but to account also for purposefully located, simple artificial landmarks (for example, hue-planar, passive patterns). A particular design of such a landmark,2 aimed at providing efficient recognition results, was described in (Bączyk et al., 2003). Similar solutions are used in augmented reality systems. Such landmarks are introduced there because of the critical importance of the precise collocation of the coordinates of virtual/artificial objects with the real-scene object poses. In a SLAM system it is of general interest to establish the pose of a free-flying camera with respect to the current environment model, which is built on-line during the camera travel. By augmenting a SLAM system with the ability to recognise structured landmarks, some well-known limitations of the single camera as a bearing-only SLAM sensor can be overcome.

In known visual SLAM systems it is rare to account for map items other than natural point-wise image features. Line segments were accounted for in the system of Smith et al. (2006). There the goal was to improve the reliability of point-feature recognition (important while solving the feature correspondence problem in a sequence of scene images). From the viewpoint of the filtration process performance, such an extension was not really substantial, as the extracted segment endings were finally treated simply as point-features again.

Edgelet landmarks introduced in (Eade and Drummond, 2009) were not a substantial modification either, as in their algorithm they were treated as point-features as well.

Higher-level structures in visual SLAM were also proposed in (Gee et al., 2008). Discovered lines and planes in space were fitted to points contained in the map, but images were not directly used for this purpose. Higher-level objects allow reducing the state vector dimension and give more meaning to the SLAM map compared to collections of points only. On the other hand, this procedure is complex and there are a number of coefficients to be set. The authors mention that their approach can lead to inconsistency of the estimates, and its ability to reduce errors is questionable.

In (Castle et al., 2007a; 2007b), some known planar patterns were used together with point-features. The planar pattern recognition sequence runs on the second core of a PC as a separate thread at a rate of around 1.5 Hz. In the recognition scheme the authors use SIFT descriptors and the RANSAC method, which are too computationally expensive for video frame-rate operation, and thus the insertion of planar structures into the map is delayed. The authors decompose the homography (computed as the output of RANSAC) between the planar structure and the current image into the camera calibration matrix, rotation and translation.

2 It is important to distinguish point-features from landmarks (in some papers the former are called landmarks as well). By landmarks we mean here the structured patterns purposefully introduced to the explored space.


Extrinsic parameters are then determined using singular value decomposition. In contrast, here we show the advantages of embedding landmark measurement into the filtration process at frame-rate frequency. The landmarks' rectangular shape and unique code allow the use of a fast and robust recognition and pose evaluation method (Bączyk et al., 2003).

Another drawback of the mentioned method (Castle et al., 2007a; 2007b) is that the planar object was represented on the created map merely by three boundary points. It is not clear whether three boundary points treated as distinct point-features in space can preserve the rigid nature of a planar object during the filtration process. It is not mentioned there how the comparison of prediction and measurement is performed in the case of planar structure recognition.

Here we propose to treat planar patterns as a new class of environment objects and, as such, to put them on the map. Thus a SLAM state vector containing a representation of two categories of features is maintained, and in a single system a measurement in the image space for the first feature type and in 3D space for the second is performed.

The next drawback of the Castle approach is that the covariance matrix (statistically characterising the uncertainty of the evaluated planar object position) is not formally obtained. Its diagonal elements were evaluated there only in a trial-and-error experiment, by testing the influence of their values on the convergence of the filtration process.

With our method, the landmark position uncertainty with respect to the camera reference frame is formally obtained by propagating the uncertainty of the image coordinates.

3. SLAM fundamentals

Assumptions. The following assumptions have been made at the beginning:

• The agent (a hand-held camera or a mobile robot with a fixed camera) is moving with three degrees of freedom (planar motion case). Its references are the x, y coordinates and the orientation angle θ on the plane. However, the scheme presented in this paper can easily be generalised to the case of six degrees of freedom (free 3D camera motion).

• The only external sensor available in the system is a single camera.

• The guidance of the camera is not known a priori (free motion).

• Camera motion is smooth (continuous acceleration, no jerks).

• Intrinsic camera parameters are known a priori (the camera has been calibrated beforehand).

[Figure 1: flowchart of the EKF cycle. From the current state vector estimate and its covariance matrix, the system predicts the state vector and its covariance matrix, evaluates the innovation covariance matrix and the expected image coordinates of the point-wise features, searches for and recognises in the image the point-wise features already on the map (the measurement result being the image coordinates of the recognised point-wise features), and finally updates the state vector and its covariance matrix.]

Fig. 1. Cyclic tracking sequence using the EKF scheme in the case of point-wise environment features put on the map.

• The map consists of point-wise, stationary natural features fixed to the observation space and of freely distributed rectangular, planar landmarks.

• The positions of landmarks are not known a priori; however, the short edges of the landmark frame are kept perpendicular to the motion plane.

• The raw measurement data of the SLAM system are image-coordinates of the point-features and of the landmark corners.

System goals. The SLAM system uses an Extended Kalman Filter (EKF), working sequentially (Fig. 1) on consecutive image frames. Its role is to track the observer trajectory, consisting of a position, the camera optical axis orientation, and the linear and angular velocities. Moreover, the EKF tracks fixed features, puts them on the map and steadily refines the coordinates of those already on the map. The EKF calculates on-line the covariance matrix of the measurement data to provide an uncertainty measure of the tracked variables and of the stationary coordinates.

It is a typical situation that the camera pose uncertainty grows with the trajectory length; however, the uncertainty of the stationary data (such as feature and landmark coordinates) always decreases with the length of the observation sequence (Dissanayake et al., 2001).

The EKF state vector, which is the critical data item in the navigation system, can be defined as follows:

$$\mathbf{x} = \begin{pmatrix} \mathbf{c}^{G\,T} & \theta^G & \dot{\mathbf{c}}^{G\,T} & \dot{\theta}^G & \mathbf{p}_1^{G\,T} & \cdots & \mathbf{p}_n^{G\,T} & \mathbf{l}_1^{G\,T} & \beta_1^G & \cdots & \mathbf{l}_m^{G\,T} & \beta_m^G \end{pmatrix}^T. \quad (1)$$


Fig. 2. Geometrical relations in a global reference system between the camera and the point-wise feature (a) and the landmark (b); landmark template (c).

It consists of the observer trajectory subvector, whose coordinates are the observer position on the motion plane $\mathbf{c}^G = (x^G_c\; y^G_c)^T$, the orientation $\theta^G$, the linear velocity subvector $\dot{\mathbf{c}}^G = (\dot{x}^G_c\; \dot{y}^G_c)^T$ and the angular velocity $\dot{\theta}^G$. The subsequent elements of the state vector are triples of absolute Cartesian coordinates of the $n$ observed point-features $\mathbf{p}^G_i = (x^G_{p_i}\; y^G_{p_i}\; z^G_{p_i})^T$, $i = 1, \ldots, n$, fixed to the environment. The next elements of the state vector are the poses of $m$ landmarks, also expressed in the global reference frame:

$$\left( \mathbf{l}^{G\,T}_j\; \beta^G_j \right)^T = \left( x^G_{l_j}\; y^G_{l_j}\; z^G_{l_j}\; \theta^G + \alpha^C_j \right)^T, \quad (2)$$

$j = 1, \ldots, m$. In turn, the position of the $j$-th landmark expressed in the camera coordinates is denoted by $\mathbf{l}^C_j = (x^C_{l_j}\; y^C_{l_j}\; z^C_{l_j})^T$ and its orientation by $\alpha^C_j$. The geometrical relations are shown in Fig. 2.

The state transition model is the following:

$$\hat{\mathbf{x}}(k+1|k) = f(\hat{\mathbf{x}}(k|k), \mathbf{n}_x(k+1)) = \begin{pmatrix} \hat{x}^G_c(k|k) + \left( \hat{\dot{x}}^G_c(k|k) + \tfrac{1}{2} n_{\dot{x}}(k+1) \right) \Delta t \\ \hat{y}^G_c(k|k) + \left( \hat{\dot{y}}^G_c(k|k) + \tfrac{1}{2} n_{\dot{y}}(k+1) \right) \Delta t \\ \hat{\theta}^G(k|k) + \left( \hat{\dot{\theta}}^G(k|k) + \tfrac{1}{2} n_{\dot{\theta}}(k+1) \right) \Delta t \\ \hat{\dot{x}}^G_c(k|k) + n_{\dot{x}}(k+1) \\ \hat{\dot{y}}^G_c(k|k) + n_{\dot{y}}(k+1) \\ \hat{\dot{\theta}}^G(k|k) + n_{\dot{\theta}}(k+1) \\ \hat{\mathbf{p}}^G_1(k|k) \\ \vdots \\ \hat{\mathbf{p}}^G_n(k|k) \\ \hat{\mathbf{l}}^G_1(k|k) \\ \hat{\beta}^G_1(k|k) \\ \vdots \\ \hat{\mathbf{l}}^G_m(k|k) \\ \hat{\beta}^G_m(k|k) \end{pmatrix}. \quad (3)$$

As velocity changes are not available from direct measurements, a simplified constant-velocity model is used. The model accounts for the random noise reflecting the velocity changes, $\mathbf{n}_x(k) = (n_{\dot{x}}(k)\; n_{\dot{y}}(k)\; n_{\dot{\theta}}(k))^T$. In the state vector prediction step, a new camera position is calculated and the integration of the noise is performed with the trapezoid averaging rule instead of the rectangle method used in (Davison et al., 2007). That approach provides better convergence of the computations.

It is worth noticing that in (3) only the subvector of the state vector which represents the camera pose is modified. The predicted coordinates of the point-features and landmarks are left unmodified (as they are assumed to be stationary); their estimates are modified only in the filtration update step.

4. SLAM scheme for point-features

4.1. Main algorithm.

The system covariance matrix $\mathbf{P}$ is predicted according to the following formula:

$$\mathbf{P}(k+1|k) = J_f(k) \begin{bmatrix} \mathbf{P}(k|k) & 0 \\ 0 & \mathbf{P}_{n_x}(k) \end{bmatrix} J_f^T(k), \quad (4)$$

where $J_f(k)$ is the Jacobian matrix of the right-hand side function of (3) at the time $k$ and $\mathbf{P}_{n_x}(k)$ is the covariance matrix of the noise $\mathbf{n}_x(k)$ mentioned above.
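To make the prediction step concrete, the following minimal numerical sketch implements (3) and (4) for the planar constant-velocity model. It is not the authors' implementation; the state layout, time step and noise values are illustrative assumptions.

```python
import numpy as np

DT = 0.04  # assumed frame period in seconds (25 frames per second)

def ekf_predict(x, P, Pn, dt=DT):
    """Prediction (3)-(4). The state starts with the camera subvector
    [xc, yc, theta, vx, vy, vtheta]; the remaining entries (point-features
    and landmarks) are stationary and pass through unchanged."""
    x_pred = x.copy()
    x_pred[0:3] += x[3:6] * dt                 # pose advanced by the velocities
    n = x.size
    # Jacobian of f w.r.t. the state: identity plus pose/velocity coupling.
    Jx = np.eye(n)
    Jx[0:3, 3:6] = dt * np.eye(3)
    # Jacobian of f w.r.t. the velocity noise n_x: by (3) the noise enters
    # the pose with weight dt/2 (trapezoid rule) and the velocity with 1.
    Jn = np.zeros((n, 3))
    Jn[0:3, :] = 0.5 * dt * np.eye(3)
    Jn[3:6, :] = np.eye(3)
    # Block form of (4): P(k+1|k) = Jf [P 0; 0 Pn] Jf^T.
    P_pred = Jx @ P @ Jx.T + Jn @ Pn @ Jn.T
    return x_pred, P_pred

# Usage: a camera moving at 0.5 m/s along x, with one point-feature on the map.
x = np.zeros(9); x[3] = 0.5
P = np.eye(9) * 1e-4
Pn = np.eye(3) * 1e-2                          # covariance of the velocity noise
x1, P1 = ekf_predict(x, P, Pn)
```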

Measurement prediction. The coordinates of point-features put on the map are measured in images. Thus, based on the predicted state vector $\hat{\mathbf{x}}(k+1|k)$, the expected image coordinates of the features have to be evaluated. This is done by using the following pin-hole camera model:

$$\hat{\mathbf{h}}_i(k+1|k) = \mathbf{h}_i\!\left( \hat{\mathbf{x}}(k+1|k), \mathbf{n}_{p_i}(k+1) \right) = \begin{pmatrix} \lambda_u \dfrac{\hat{x}^C_{p_i}(k+1|k)}{\hat{z}^C_{p_i}(k+1|k)} + u_c + n_{u_{p_i}}(k+1) \\ \lambda_v \dfrac{\hat{y}^C_{p_i}(k+1|k)}{\hat{z}^C_{p_i}(k+1|k)} + v_c + n_{v_{p_i}}(k+1) \end{pmatrix}, \quad (5)$$

where $\lambda_u, \lambda_v$ denote the focal lengths for the image $u, v$ coordinates expressed in pixels, $u_c, v_c$ denote the image coordinates of the optical centre, $\hat{\mathbf{h}}_i(k+1|k)$ denotes the predicted image coordinates of the $i$-th map point, and the variables

$$\left( \hat{x}^C_{p_i}(k+1|k)\; \hat{y}^C_{p_i}(k+1|k)\; \hat{z}^C_{p_i}(k+1|k) \right)^T = \hat{\mathbf{p}}^C_i(k+1|k) \quad (6)$$

denote its coordinates calculated at the $(k+1)$-th prediction step, expressed in the camera reference frame (see Fig. 2):

$$\hat{\mathbf{p}}^C_i(k+1|k) = \mathbf{C}_G(k+1|k) \left( \hat{\mathbf{p}}^G_i(k+1|k) - \left( \hat{x}^G_c(k+1|k)\; \hat{y}^G_c(k+1|k)\; 0 \right)^T \right). \quad (7)$$

The rotation matrix $\mathbf{C}_G(k+1|k)$ can be calculated knowing the angle $\hat{\theta}^G(k+1|k)$, which is an element of the predicted state $\hat{\mathbf{x}}(k+1|k)$. The model (5) accounts for the fact that the readouts of the image coordinates are corrupted by the measurement noise vector $\mathbf{n}_{p_i}(k) = (n_{u_{p_i}}(k)\; n_{v_{p_i}}(k))^T$. The function $\mathbf{h}_i\!\left( \hat{\mathbf{x}}(k+1|k), \mathbf{n}_{p_i}(k+1) \right)$ is defined by the formulas (5)-(7) and models the camera in the case of the $i$-th point-feature put on the map.
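A sketch of the measurement prediction (5)-(7) for a single map point is given below. The calibration constants and the axis conventions (camera z-axis forward, image v-axis downwards, global z-axis up) are assumptions made for illustration only.

```python
import numpy as np

LAMBDA_U, LAMBDA_V = 700.0, 700.0   # assumed focal lengths in pixels
UC, VC = 320.0, 240.0               # assumed optical centre in pixels

def predict_image_coords(cam_xy, theta, p_global):
    """Equations (6)-(7): express the feature in the camera frame, then
    project it with the pin-hole model (5) (noise-free prediction)."""
    c, s = np.cos(theta), np.sin(theta)
    # Assumed rotation C_G from the global frame to the camera frame for a
    # camera looking along (cos(theta), sin(theta)) in the motion plane.
    C_G = np.array([[  s,  -c, 0.0],     # camera x-axis (image right)
                    [0.0, 0.0, -1.0],    # camera y-axis (image down)
                    [  c,   s, 0.0]])    # camera z-axis (optical axis)
    p_cam = C_G @ (p_global - np.array([cam_xy[0], cam_xy[1], 0.0]))  # (7)
    u = LAMBDA_U * p_cam[0] / p_cam[2] + UC                           # (5)
    v = LAMBDA_V * p_cam[1] / p_cam[2] + VC
    return np.array([u, v])

# Usage: a feature 3 m in front of a camera at the origin looking along +x.
print(predict_image_coords(np.array([0.0, 0.0]), 0.0, np.array([3.0, 0.0, 0.7])))
```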

The innovation vector represents the difference between the measured and predicted feature coordinates in the image domain:

$$\mathbf{v}(k+1) = \mathbf{h}(k+1) - \hat{\mathbf{h}}(k+1|k), \quad (8)$$

where $\mathbf{h}(k+1)$ denotes the measurement result at the $(k+1)$-th step, and $\hat{\mathbf{h}}(k+1|k)$ denotes the result calculated according to (5) with the state vector estimate predicted at the $(k+1)$-th step based on the data from the $k$-th step.

The innovation vector covariance matrix represents the measurement and prediction uncertainty. It is obtained as a combination of the state-vector uncertainty $\mathbf{P}(k+1|k)$ and the measurement uncertainty $\mathbf{P}_{n_p}(k+1)$:

$$\mathbf{S}(k+1) = J_h(k+1) \begin{bmatrix} \mathbf{P}(k+1|k) & 0 \\ 0 & \mathbf{P}_{n_p}(k+1) \end{bmatrix} J_h^T(k+1), \quad (9)$$

where $J_h(k+1)$ is the Jacobian matrix consisting of the partial derivatives of the functions $\mathbf{h}_i$ (5) at the $(k+1)$-th step.

At each step, the measurement vector $\mathbf{h}(k+1)$ can vary in dimension, as at the current observation pose only some subset of the map elements is visible, while some of them can remain undetected. Thus, at each sequential filtration step, the rows of the vector $\hat{\mathbf{h}}(k+1|k)$ in (8) and of the Jacobian matrix $J_h(k+1)$ having no correspondents in the measurement vector $\mathbf{h}(k+1)$ have to be removed.

EKF filtering scheme. In the general filtering scheme, the update of the state vector and of its covariance matrix is defined by the following formulas:

$$\hat{\mathbf{x}}(k+1|k+1) = \hat{\mathbf{x}}(k+1|k) + \mathbf{W}(k+1)\,\mathbf{v}(k+1), \quad (10)$$

$$\mathbf{P}(k+1|k+1) = \mathbf{P}(k+1|k) - \mathbf{W}(k+1)\,\mathbf{S}(k+1)\,\mathbf{W}^T(k+1), \quad (11)$$

while the weighting matrix $\mathbf{W}(k+1)$, appearing in (10) and (11), is established from

$$\mathbf{W}(k+1) = \mathbf{P}(k+1|k) \left[ \frac{\partial \mathbf{h}\!\left( \hat{\mathbf{x}}(k+1|k), 0 \right)}{\partial \hat{\mathbf{x}}} \right]^T \mathbf{S}^{-1}(k+1). \quad (12)$$
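For additive image noise, the block form of (9) reduces to the familiar EKF update sketched below; h_pred, the stacked Jacobian H (i.e. $J_h$ restricted to the state) and the noise covariance R are assumed to be produced by the measurement-prediction step for the currently visible features.

```python
import numpy as np

def ekf_update(x_pred, P_pred, z, h_pred, H, R):
    """EKF update (8)-(12); with additive image noise, (9) reduces to
    S = H P H^T + R."""
    v = z - h_pred                              # innovation (8)
    S = H @ P_pred @ H.T + R                    # innovation covariance (9)
    W = P_pred @ H.T @ np.linalg.inv(S)         # weighting matrix (12)
    x_upd = x_pred + W @ v                      # state update (10)
    P_upd = P_pred - W @ S @ W.T                # covariance update (11)
    return x_upd, P_upd
```

Rows of h_pred and H corresponding to undetected features are assumed to have been removed beforehand, as required above.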

4.2. Point-feature initialization method. The procedure aims at calculating a rough position estimate of a new point-feature which appears in the image. Further, making use of empirical knowledge of the size of the observed space, one can determine the probability distribution of the spatial position of the feature which has been detected (as a new one) in the current image. A similar approach was used in (Gee and Mayol-Cuevas, 2006), but a detailed explanation was missing. A much more complex initialization method, discussed in (Civera et al., 2008), contrasts with the one presented below.

A point is put on the map together with its uncertainty measure instantly, at the same step at which the point-feature is selected from the set of points detected in the current view-field. Assuming that adding a point-feature to the map was undertaken at the instant $k$, just after the update stage (10), (11), the new, current system state vector has the following value:

$$\hat{\mathbf{x}}_{new}(k|k) = f_2\!\left( \hat{\mathbf{x}}(k|k), u_{new}(k), v_{new}(k) \right) = \begin{pmatrix} \hat{\mathbf{x}}(k|k) \\ f_1\!\left( \hat{\mathbf{c}}^G(k|k), \hat{\theta}^G(k|k), u_{new}(k), v_{new}(k), 0, 0 \right) \end{pmatrix}, \quad (13)$$

where $f_1$ represents the inverse camera model. It determines the spatial coordinates of the newly emerging point-feature:

$$f_1\!\left( \hat{\mathbf{c}}^G(k|k), \hat{\theta}^G(k|k), u_{new}(k), v_{new}(k), \mathbf{n}_{p_{new}}, n_{dst} \right) = \left( \hat{x}^G_c(k|k)\; \hat{y}^G_c(k|k)\; 0 \right)^T + \mathbf{G}_C(k|k) \begin{pmatrix} \dfrac{\left( u_{new}(k) - u_c + n_{u_{p_{new}}}(k) \right) \left( dst_{init} + n_{dst}(k) \right)}{\lambda_u} \\ \dfrac{\left( v_{new}(k) - v_c + n_{v_{p_{new}}}(k) \right) \left( dst_{init} + n_{dst}(k) \right)}{\lambda_v} \\ dst_{init} + n_{dst}(k) \end{pmatrix}. \quad (14)$$

Similarly to (5), this model also accounts for the image measurement noise, denoted here by $\mathbf{n}_{p_{new}} = (n_{u_{p_{new}}}\; n_{v_{p_{new}}})^T$. The rotation matrix $\mathbf{G}_C(k|k)$ is calculated with $\hat{\theta}^G(k|k)$ picked from the state vector after the update step. The constant $dst_{init}$ is empirical. It represents the most likely value of the $z$-coordinate (in the $Z^C$ direction) of the point-feature. The uncertainty of that coordinate is represented by the random variable $n_{dst}(k)$. The system covariance matrix augmented with the new point's position uncertainty is calculated from

$$\mathbf{P}_{new}(k|k) = J_{f_2}(k) \begin{bmatrix} \mathbf{P}(k|k) & 0 & 0 \\ 0 & \mathbf{P}_{n_{p_{new}}}(k) & 0 \\ 0 & 0 & \sigma^2_{n_{dst}}(k) \end{bmatrix} J_{f_2}^T(k), \quad (15)$$

where $\sigma_{n_{dst}}$ is the standard deviation of $n_{dst}$, which has to be evaluated empirically.
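The initialization (13)-(14) can be sketched as a back-projection of the new image point at the assumed most likely depth dst_init; the calibration values, the depth and the axis conventions follow the assumptions of the earlier sketches and are not taken from the paper.

```python
import numpy as np

LAMBDA_U, LAMBDA_V, UC, VC = 700.0, 700.0, 320.0, 240.0  # assumed calibration
DST_INIT = 3.0   # assumed most likely depth of a new feature, in metres

def init_point_feature(cam_xy, theta, u_new, v_new, dst=DST_INIT):
    """Inverse camera model f1 of (14) with the noise terms set to zero."""
    # Ray through the pixel, scaled to the assumed depth (camera frame).
    ray_cam = np.array([(u_new - UC) * dst / LAMBDA_U,
                        (v_new - VC) * dst / LAMBDA_V,
                        dst])
    c, s = np.cos(theta), np.sin(theta)
    # G_C: camera-to-global rotation, the transpose of C_G used for projection.
    G_C = np.array([[  s,  -c, 0.0],
                    [0.0, 0.0, -1.0],
                    [  c,   s, 0.0]]).T
    return np.array([cam_xy[0], cam_xy[1], 0.0]) + G_C @ ray_cam  # (14)

# Usage: the returned triple is appended to the state as in (13), and P is
# then grown with the Jacobian of f2 as in (15).
print(init_point_feature(np.array([0.0, 0.0]), 0.0, 320.0, 100.0))
```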

5. Introducing landmarks to SLAM

5.1. Determining the landmark pose. As to the landmarks, the measurement result is the pose of the landmark in the global reference frame. This pose is obtained from the image coordinates of the landmark corners. The process of landmark pose determination is described below.

The landmark rotation angle $\alpha^C$ (cf. Fig. 2) with respect to the camera reference frame can be calculated from the following formula:

$$\alpha^C = f_4(u_1, v_1, u_2, v_2, u_3, v_3, u_4, v_4) = \operatorname{atan}\!\left( \frac{\lambda_u}{f_3(u_1, v_1, u_2, v_2, u_3, v_3, u_4, v_4)} \right) = \operatorname{atan}\!\left( \frac{\lambda_u}{u_{vp}} \right), \quad (16)$$

where the value of the function $f_3$ equals the image coordinate $u_{vp}$ of the vanishing point. It is obtained as the crossing of the lines fitted to the horizontal landmark sides, as shown in Fig. 3. The arguments of $f_3$ are the image coordinates of the landmark corners $u_1 v_1 \ldots u_4 v_4$.

Fig. 3. Geometrical relations used for the evaluation of landmark rotation.
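A sketch of (16) follows: the two lines through the horizontal landmark sides are intersected to obtain the vanishing-point coordinate u_vp, from which alpha^C is recovered. The corner ordering (1-2 the top side, 4-3 the bottom side) and the focal length are assumptions made for illustration.

```python
import numpy as np

LAMBDA_U = 700.0   # assumed focal length in pixels

def landmark_rotation(corners):
    """Equation (16). corners: 4x2 array of (u, v) image coordinates of the
    landmark corners relative to the optical centre (u_c = v_c = 0), with
    corners 1-2 on the top horizontal side and 4-3 on the bottom one.
    A frontal view (parallel sides) has no finite vanishing point and
    would need separate handling."""
    (u1, v1), (u2, v2), (u3, v3), (u4, v4) = corners
    a_top = (v2 - v1) / (u2 - u1)          # slope of the top side
    a_bot = (v3 - v4) / (u3 - u4)          # slope of the bottom side
    # Intersection of the two side lines: the vanishing point coordinate f3.
    u_vp = ((v4 - v1) + a_top * u1 - a_bot * u4) / (a_top - a_bot)
    return np.arctan(LAMBDA_U / u_vp)      # alpha^C = atan(lambda_u / u_vp)
```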

The landmark position $\mathbf{l}^C = (x^C_l\; y^C_l\; z^C_l)^T$ can be calculated by using the camera mathematical model:

$$\begin{bmatrix} \lambda_u & 0 & 0 & 0 \\ 0 & \lambda_v & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \end{bmatrix} \begin{bmatrix} \mathbf{r}_1 & x^C_l \\ \mathbf{r}_2 & y^C_l \\ \mathbf{r}_3 & z^C_l \\ \mathbf{0} & 1 \end{bmatrix} \begin{pmatrix} x^L_e \\ y^L_e \\ z^L_e \\ 1 \end{pmatrix} = \begin{pmatrix} z^C_e u_e \\ z^C_e v_e \\ z^C_e \\ 1 \end{pmatrix}, \quad (19)$$

where the rows $\mathbf{r}_1, \mathbf{r}_2, \mathbf{r}_3$ form the rotation matrix ${}^{C}\mathbf{R}_{L}$ and $e = 1, 2, 3, 4$ stands for the appropriate corner index of the rectangular landmark. To simplify the formulas, it has been assumed that the origin of the image coordinate frame is at the optical centre of the image (compare (5) and (14) with $u_c = v_c = 0$). Having the value of $\alpha^C$ from (16), the elements of the rotation matrix ${}^{C}\mathbf{R}_{L}$ can be calculated. One can shift all expressions to one side of Eqn. (19) and denote by $du_e$ and $dv_e$ the obtained differences:

$$\begin{aligned} du_e &= u_e \left( \mathbf{r}_3 \left( x^L_e\; y^L_e\; z^L_e \right)^T + z^C_l \right) - \lambda_u \left( \mathbf{r}_1 \left( x^L_e\; y^L_e\; z^L_e \right)^T + x^C_l \right), \\ dv_e &= v_e \left( \mathbf{r}_3 \left( x^L_e\; y^L_e\; z^L_e \right)^T + z^C_l \right) - \lambda_v \left( \mathbf{r}_2 \left( x^L_e\; y^L_e\; z^L_e \right)^T + y^C_l \right). \end{aligned} \quad (20)$$

Having the values of $du_e$ and $dv_e$, one can determine the translation vector of the landmark $\mathbf{l}^C = (x^C_l\; y^C_l\; z^C_l)^T$ by minimizing the sum of the squares of $du_e$ and $dv_e$, denoted by $f_5$:

$$f_5(u_1, v_1, \ldots, u_4, v_4, x^C_l, y^C_l, z^C_l) = f_5(X, \Theta) = du_1^2 + dv_1^2 + \cdots + du_4^2 + dv_4^2, \quad (21)$$

where $X = (u_1\, v_1\, u_2\, v_2\, u_3\, v_3\, u_4\, v_4)$ and $\Theta = (x^C_l\; y^C_l\; z^C_l)$.
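Since $du_e$ and $dv_e$ in (20) are linear in the unknown translation once ${}^{C}\mathbf{R}_{L}$ is fixed by $\alpha^C$, minimizing $f_5$ in (21) reduces to a small linear least-squares problem. The sketch below assumes an illustrative 0.4 × 0.3 m landmark template; it is not the authors' solver.

```python
import numpy as np

LAMBDA_U, LAMBDA_V = 700.0, 700.0   # assumed focal lengths in pixels

def landmark_translation(corners, R):
    """Minimize f5 of (21). corners: 4x2 (u_e, v_e) relative to the optical
    centre; R: the 3x3 rotation C_R_L built from alpha^C."""
    # Corner coordinates in the landmark frame (assumed 0.4 x 0.3 m template).
    XL = np.array([[-0.2,  0.15, 0.0], [ 0.2,  0.15, 0.0],
                   [ 0.2, -0.15, 0.0], [-0.2, -0.15, 0.0]])
    A, b = [], []
    for (u, v), Xe in zip(corners, XL):
        r1Xe, r2Xe, r3Xe = R @ Xe
        # du_e = 0: -lambda_u * x_l + u * z_l = lambda_u * r1.Xe - u * r3.Xe
        A.append([-LAMBDA_U, 0.0, u]); b.append(LAMBDA_U * r1Xe - u * r3Xe)
        # dv_e = 0: -lambda_v * y_l + v * z_l = lambda_v * r2.Xe - v * r3.Xe
        A.append([0.0, -LAMBDA_V, v]); b.append(LAMBDA_V * r2Xe - v * r3Xe)
    lC, *_ = np.linalg.lstsq(np.array(A), np.array(b), rcond=None)
    return lC   # (x_l^C, y_l^C, z_l^C) minimizing the sum of squares
```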


$$f_7\!\left( \hat{\mathbf{c}}^G(k+1|k), \hat{\theta}^G(k+1|k), \hat{\mathbf{l}}^G(k+1|k), \hat{\beta}^G(k+1|k), \mathbf{l}^C(k+1), \alpha^C(k+1) \right) = f_7(k+1) = \begin{pmatrix} \begin{pmatrix} x^C_l(k+1) \\ y^C_l(k+1) \\ z^C_l(k+1) \end{pmatrix} - \mathbf{C}_G(k+1|k) \left( \begin{pmatrix} \hat{x}^G_l(k+1|k) \\ \hat{y}^G_l(k+1|k) \\ \hat{z}^G_l(k+1|k) \end{pmatrix} - \begin{pmatrix} \hat{x}^G_c(k+1|k) \\ \hat{y}^G_c(k+1|k) \\ 0 \end{pmatrix} \right) \\ \arcsin\!\left( \sin\!\left( \tfrac{1}{2} \left( \alpha^C(k+1) - \hat{\beta}^G(k+1|k) - \hat{\theta}^G(k+1|k) \right) \right) \right) \end{pmatrix}, \quad (17)$$

$$\mathbf{S}(k+1) = \begin{bmatrix} \dfrac{\partial f_7(k+1)}{\partial \hat{\mathbf{x}}} & \dfrac{\partial f_7(k+1)}{\partial \mathbf{l}^C} & \dfrac{\partial f_7(k+1)}{\partial \alpha^C} \end{bmatrix} \begin{bmatrix} \mathbf{P}(k+1|k) & 0 \\ 0 & \mathbf{P}_{(\mathbf{l}^C \alpha^C)}(k+1) \end{bmatrix} \begin{bmatrix} \dfrac{\partial f_7(k+1)}{\partial \hat{\mathbf{x}}} & \dfrac{\partial f_7(k+1)}{\partial \mathbf{l}^C} & \dfrac{\partial f_7(k+1)}{\partial \alpha^C} \end{bmatrix}^T \quad (18)$$

Finally, the pose vector of the landmark in the global coordinate frame is established with the function $f_6$:

$$\begin{pmatrix} \mathbf{l}^G \\ \beta^G \end{pmatrix} = \begin{pmatrix} x^G_l \\ y^G_l \\ z^G_l \\ \beta^G \end{pmatrix} = f_6\!\left( x^G_c, y^G_c, \theta^G, x^C_l, y^C_l, z^C_l, \alpha^C \right) = \begin{pmatrix} \begin{pmatrix} x^G_c \\ y^G_c \\ 0 \end{pmatrix} + {}^{G}\mathbf{R}_{C} \begin{pmatrix} x^C_l \\ y^C_l \\ z^C_l \end{pmatrix} \\ \theta^G + \alpha^C \end{pmatrix}. \quad (22)$$

5.2. Uncertainty of the evaluated landmark pose. The rotation angle $\alpha^C$ and the translation vector $\mathbf{l}^C = (x^C_l\; y^C_l\; z^C_l)^T$ of the landmark with respect to the camera coordinate frame are both calculated from the same uncertain data set $(u_1\, v_1\, u_2\, v_2\, u_3\, v_3\, u_4\, v_4)$, i.e., the image coordinates of the four landmark corners. Thus, there is a correlation between the angle and position errors. A complete covariance matrix can be established in a single step:

$$\mathbf{P}_{(\mathbf{l}^C \alpha^C)} = \begin{bmatrix} \left( \dfrac{\partial g(X, \Theta)}{\partial \Theta} \right)^{-1} \dfrac{\partial g(X, \Theta)}{\partial X} \\ J_{f_4} \end{bmatrix} \mathbf{P}_{n_l} \begin{bmatrix} \left( \dfrac{\partial g(X, \Theta)}{\partial X} \right)^T \left( \dfrac{\partial g(X, \Theta)}{\partial \Theta} \right)^{-T} & J_{f_4}^T \end{bmatrix}, \quad (23)$$

where

$$g(X, \Theta) = \left( \frac{\partial f_5(X, \Theta)}{\partial \Theta} \right)^T,$$

and the matrix $\mathbf{P}_{n_l}$ stands for the uncertainty measure of the four corners' image coordinates (it is an $8 \times 8$ matrix in the case of a single landmark). As the translation vector $\mathbf{l}^C$ is not evaluated explicitly but as a result of iterative computations, the translational uncertainty in (23) is calculated according to (Haralick, 2000).
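In code, (23) amounts to pushing the corner covariance through the sensitivity of the least-squares optimum (by the implicit function theorem, at the optimum $g(X, \Theta) = 0$, so $\partial\Theta/\partial X = -(\partial g/\partial\Theta)^{-1}\, \partial g/\partial X$) stacked with the Jacobian of the angle function $f_4$. A minimal sketch, assuming the four Jacobians are computed elsewhere (numerically or analytically):

```python
import numpy as np

def landmark_pose_covariance(dg_dTheta, dg_dX, J_f4, P_nl):
    """Equation (23). dg_dTheta: 3x3, dg_dX: 3x8, J_f4: 1x8 Jacobian of the
    rotation f4, P_nl: 8x8 covariance of the four corners' image coordinates.
    Returns the full 4x4 covariance of (l^C, alpha^C), cross terms included."""
    dTheta_dX = -np.linalg.solve(dg_dTheta, dg_dX)   # 3x8 translation sensitivity
    J = np.vstack([dTheta_dX, J_f4])                 # 4x8 stacked sensitivity
    return J @ P_nl @ J.T
```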

5.3. Landmark innovation vector and its uncertainty.

In the case of a landmark, it is wise to compare its predicted Cartesian coordinates and rotation angle with the corresponding measurement. The most frequently encountered EKF implementations calculate the innovation vector as a difference of the prediction and measurement results. For point-features put on the map, such a definition of the innovation vector is used (8). For landmarks, the innovation vector is calculated with (17) as the vector function $f_7$. Its covariance (18) is treated as the innovation covariance. Thus, once $j$ landmarks are visible in the image, the matrix $\mathbf{S}$ is $4j \times 4j$.

5.4. Filtering. The update of the state-vector covariance matrix $\mathbf{P}$ is performed according to (11). In the case of landmarks, the EKF weight matrix $\mathbf{W}$ is calculated in the following way:

$$\mathbf{W}(k+1) = \mathbf{P}(k+1|k) \left[ \frac{\partial f_7(k+1)}{\partial \hat{\mathbf{x}}} \right]^T \mathbf{S}^{-1}(k+1). \quad (24)$$

The update of the state-vector estimate is defined by (25):

$$\hat{\mathbf{x}}(k+1|k+1) = \hat{\mathbf{x}}(k+1|k) - \mathbf{W}(k+1)\, f_7(k+1). \quad (25)$$

Compared with (10), in this equation the sum has been replaced by the difference. This can be explained in the following way. The enhancement of the estimate $\hat{\mathbf{x}}$ based on the measurement $x$ can be expressed as

$$\hat{\mathbf{x}}_{new} = \hat{\mathbf{x}} - k \frac{\partial f(x, \hat{\mathbf{x}})}{\partial \hat{\mathbf{x}}} f(x, \hat{\mathbf{x}}), \quad (26)$$

where $f(x, \hat{\mathbf{x}})$ is an available monotonous comparison function. Such a function attains zero once $x$ and $\hat{\mathbf{x}}$ are equal. The factor $k$ is some positive coefficient determining the strength of the enhancement of the estimate $\hat{\mathbf{x}}$. The formula (26) is similar to (25). The definiteness of the weight matrix $\mathbf{W}$ depends on the definiteness of the gradient $\partial f_7(k+1)/\partial \hat{\mathbf{x}}$, since the matrices $\mathbf{P}$ and $\mathbf{S}$ are both positive definite.
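The landmark innovation (17) and the subtractive update (24)-(25) can be sketched as follows; C_G is the global-to-camera rotation used earlier, J7x stands for the Jacobian of f7 with respect to the state, and all names are illustrative placeholders, not the authors' code.

```python
import numpy as np

def landmark_innovation(cam_xy, theta, l_g_pred, beta_pred,
                        l_c_meas, alpha_meas, C_G):
    """Vector function f7 of (17): the position part in the camera frame plus
    an angle part wrapped by arcsin(sin(.../2)) to keep it continuous."""
    pos = l_c_meas - C_G @ (l_g_pred - np.array([cam_xy[0], cam_xy[1], 0.0]))
    ang = np.arcsin(np.sin(0.5 * (alpha_meas - beta_pred - theta)))
    return np.append(pos, ang)

def landmark_update(x_pred, P_pred, f7_val, J7x, S):
    """Update (24)-(25): note the minus sign replacing the plus of (10)."""
    W = P_pred @ J7x.T @ np.linalg.inv(S)            # weight matrix (24)
    x_upd = x_pred - W @ f7_val                      # subtractive update (25)
    P_upd = P_pred - W @ S @ W.T                     # covariance update (11)
    return x_upd, P_upd
```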


5.5. Adding a landmark to the map. In much the same way as in the case of point-features, adding a landmark to the map is performed after the update stage ((25), (11)). The function adding the landmark to the map is denoted by $f_8$. If the operation takes place at the $k$-th step, it is defined by the following formula:

$$\hat{\mathbf{x}}_{new}(k|k) = f_8\!\left( \hat{\mathbf{x}}(k|k), x^C_l(k), y^C_l(k), z^C_l(k), \alpha^C(k) \right) = \begin{pmatrix} \hat{\mathbf{x}}(k|k) \\ \mathbf{l}^G(k) \\ \beta^G(k) \end{pmatrix} = \begin{pmatrix} \hat{\mathbf{x}}(k|k) \\ f_6(\hat{\mathbf{c}}^G, \hat{\theta}^G, x^C_l, y^C_l, z^C_l, \alpha^C) \end{pmatrix}, \quad (27)$$

where $f_6$ is defined in (22). The new covariance matrix of the system is defined as in (28):

$$\mathbf{P}_{new}(k|k) = J_{f_8}(k) \begin{bmatrix} \mathbf{P}(k|k) & 0 \\ 0 & \mathbf{P}_{(\mathbf{l}^C \alpha^C)}(k) \end{bmatrix} J_{f_8}^T(k). \quad (28)$$

The complete data processing scheme for landmarks and point-features can easily be worked out based on the formulas presented so far.
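A sketch of the augmentation (27)-(28): the landmark's global pose is appended to the state and the covariance is grown with the block Jacobian of f8. Here J_x and J_meas, the Jacobians of f6 with respect to the state and to (l^C, alpha^C), are assumed to be computed elsewhere.

```python
import numpy as np

def add_landmark(x, P, lG_betaG, J_x, J_meas, P_lc_alpha):
    """Equations (27)-(28). lG_betaG: the 4-vector f6(...) in global
    coordinates; J_x: 4xN, J_meas: 4x4; P_lc_alpha: the 4x4 matrix of (23)."""
    x_new = np.concatenate([x, lG_betaG])                       # (27)
    N = x.size
    Jf8 = np.block([[np.eye(N), np.zeros((N, 4))],              # old state copied
                    [J_x,       J_meas]])                       # new landmark rows
    big = np.block([[P,                np.zeros((N, 4))],
                    [np.zeros((4, N)), P_lc_alpha]])
    return x_new, Jf8 @ big @ Jf8.T                             # (28)
```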

6. Simulation experiment

The goal of the computational experiments is to evaluate the impact that the presence of landmarks has on the convergence of the SLAM system operation. To this end, a simulator has been designed with the ability to generate runs of a number of variables characterising SLAM system performance. The simulator also generates a graphical output visualising the system state at any selected time instant.

6.1. Simulation tools. A typical indoor test environment has been used. The dimensions of the building interior are 6 × 25 m. The orthographic projections of the environment are presented in Figs. 4 and 7. The ground-truth camera pose and its estimate are visualised in those figures as pairs of line segments forming V shapes, delimiting their view-fields. The real trajectory is marked with a thick, solid line. The calculated camera trajectory is marked with a thin, dotted line. It is assumed that the navigation space is delimited by vertical walls on which point-features have been distributed randomly. They are marked in Figs. 4 and 7 with grey circles. The distribution density of the point-features equals one feature per 0.5 m of wall length on average. The height of the features is random in the range (0 ÷ 1.4) m. The camera moves horizontally at a height of 0.7 m.

With a single camera only, in an unknown environment, the initialization of the SLAM system is not possible. Thus, it is assumed that at the beginning of the mission some four point-features with precisely known coordinates are visible (they are marked as circles with grey filling).

Apart from point-features, some artificial landmarks were placed in the environment. In the experiment under consideration, a single landmark was fixed to the rightmost wall. In Fig. 7 it is marked as a thick black spot. It is assumed that the system is able to extract point-features from images acquired during the mission, as well as to recognise an encountered landmark. In Figs. 4 and 7, the main axes of the uncertainty ellipsoids of the objects' locations are represented with black line segments.

In Fig. 5, the working step numbers of the system are marked along the real trajectory of the camera. The trial trajectory is the following: the camera moves to the right along the wall up to a distance of roughly 20 m and then (without turning) it is withdrawn approximately to the starting position.

In Figs. 6 and 8, some variables of the SLAM system are visualised as functions of the navigation time. There are four groups of variables:

(a) the first one, representing the x-coordinates of the real position and real velocity vector of the camera and their estimates (which are visualized indirectly as trajectory envelopes ±3σ wide3),

(b) the second one, comprising the y-coordinates of the position and velocity vector and their uncertainty bands,

(c) the third one, consisting of the real rotation angle and angular velocity of the camera as well as of their estimated values returned by SLAM (also marked as a ±3σ band),

(d) the fourth one, showing some additional parameters characterising the operation of the system, such as the number of point-features recognised in the current image, the number of false negatives4, the time instants at which a new point-feature has been put on the map5, the number of point-features kept in the map, and the age of the earliest point-feature in the set of currently visible point-features. Moreover, in Fig. 8, the number of recognised landmarks at a given time instant and the number of landmarks already put on the map are shown.

3 It is worth noting that the position uncertainty in the experiment under study is relatively small. In the case of the y-coordinate, the uncertainty band is so narrow that it is invisible on the graph; it is covered by the thick line of the true y-coordinate.

4 The number of unrecognised point-features in Figs. 6 and 8 is shown as a negative value on the Y-axis.

5 The system is able to put a single point-feature on the map within a single iteration step. During on-line operation, 25 iterations per second are performed to keep up with the TV frame-rate standard.


Fig. 4. Orthographic projections of the system state at the 920-th (a) and the 2000-th (b) simulation step (without landmarks). The navigation space is delimited with vertical walls (broken lines). Point-features randomly distributed on walls are marked with grey circles. The ground truth camera pose and its estimate are visualised as pairs of line-segments forming V shapes, restricting their view-fields. The real and calculated camera trajectories are marked with a thick solid line and a dotted line. The main axes of the objects’ location uncertainty ellipsoids are marked with black line-segments.


Fig. 5. Real camera trajectory with the filtration step numbers marked.


Some of the values have been rescaled to make the graph more readable. The scaling factors are given in Figs. 6 and 8 in the comments accompanying the appropriate lines.

Having such a characteristic of SLAM behaviour, one can draw a number of conclusions. For example, while coming back and accounting for features that were put on the map earlier, the life-span of the oldest extracted feature can be a good measure of the system's ability to recognise the locations already visited before. A similar role can be played by the number of recognised/unrecognised point-features, as well as by the need for new point-feature initialisation.6 The advantage of the simulation method is also that it can be used to judge the ability of trajectory reconstruction by comparing ground-truth data with the estimated ones. This is frequently not possible with real systems if the real camera position data are not available. In particular, the obtained results demonstrate the discrepancies between the real trajectory of the camera and the uncertainty bounds around the SLAM-estimated trajectory.

The next two subsections present simulation results for two cases: one with only natural point-features accounted for, and one with the landmark introduced into the environment. The simulations are carried out according to the same scenario, i.e., the trial trajectory to pass is identical, the distribution of point-features is the same and the same noise disturbs the system in both cases.

6 The SLAM system attempts to add a new point-feature to the map whenever fewer than eight point-features from those already on the map have been recognised in the image.


[Figure 6 legend: the x-coordinate and its uncertainty; 10× the x- and y-coordinates of the velocity vector and the uncertainties of their estimates; the camera rotation angle, the angular velocity and the uncertainties of their estimates; new point-feature initializations; the number of recognized and unrecognized point-features; 0.01× the age of the oldest recognised point-feature; 0.1× the number of point-features in the map.]

Fig. 6. Runs of real and estimated variables of the SLAM system in the absence of landmarks: x-coordinates (a) and y-coordinates (b) of the camera position/velocity vector; rotation angle and angular velocity of the camera (c). Estimates are visualized indirectly as trajectory envelopes ±3σ wide. Additional parameters characterising the operation of the system (d).

6.2. Testing the SLAM system without landmarks.

In Fig. 4(a) the system state at the 920-th simulation step is displayed, namely, the distribution of the point-features put on the map so far and the camera pose. Two camera models are visible: the rightmost one represents the real camera and the second one (closer to the starting point) represents the estimated pose of the camera. Dotted lines marked outside the nominal field of view of the estimated camera represent the view-field enlargement caused by the ±3σ orientation uncertainty of the camera. At the 920-th simulation step, only a single point-feature appeared in the image, and thus the camera orientation uncertainty grows quickly. In turn, the position uncertainty ellipsoids for the camera and for the point-features are symbolically represented by their main axes spanning the ±3σ range. In the case of the camera, thick lines are visible at the apex of the V shape. Most frequently only a single axis is visible, as the second one is too short.

In Fig. 4(b) the system state at the 2000-th simulation step (the final step) is displayed. From Fig. 6 it can be noticed that the final estimate of the camera pose equals the real pose, and the earliest point-features put on the map have been recognized correctly. It is worth noting that, as a consequence of recognising the old point-features already present on the map, the system is able to correct the position estimates of all point-features put on the map. Thus the resulting map at this time step is better than the one available in the middle of the mission.

6.3. Testing the SLAM system in an environment with landmarks. In Fig. 7(a) the SLAM system state at the 920-th step is visualized; here the system is able to recognize the landmark and to put it on the map.


Fig. 7. Orthographic projections of the system state at the 920-th (a) and the 2000-th (b) simulation step (with a landmark). The navigation space is delimited with vertical walls (broken lines). A single landmark fixed to the rightmost wall is marked as a thick black spot. Point-features randomly distributed on walls are marked with grey circles. The ground truth camera pose and its estimate are visualised as pairs of line-segments forming V shapes, restricting their view-fields. The real and calculated camera trajectories are marked with a thick solid line and a dotted line. The main axes of the objects’ location uncertainty ellipsoids are marked with black line-segments.

The obtained map is much better than the one shown in Fig. 4(a).

The estimated camera position and the estimated point-feature coordinates are closer here to the ground-truth data than in the previous simulation experiment. This is due to the appearance of the landmark within the camera view-field, which allows a more efficient estimation of the camera velocity. Due to the visibility of the landmark, the navigation system implicitly receives information on the underestimation of the camera velocity (which was based on point-feature data only so far). It is interpreted in the system in such a way that the calculated position of the camera is more advanced than the value of the position estimate available before the recognition of the landmark. As a consequence, the camera position as well as the positions of all point-features already on the map are corrected. Such system operation is visible between the 800-th and 900-th simulation steps in Fig. 8. Especially the correction of the camera x-coordinate estimate is remarkable. Surprisingly enough, the recognition of the landmark results in a decrease in the camera position estimate uncertainty despite the growing distance of the camera from the starting point of the trajectory. In the same time period, when the landmark was absent (Fig. 6), an excessive rise in the camera orientation and y-coordinate uncertainties was observed.

The next remark is that in Fig. 6, between the 1600-th and 1800-th simulation steps, one can observe a number of false negatives, while in the case with landmarks (Fig. 8) during the same time period the system is able to recognise almost all the points that are in the field of view and were previously placed on the map. Such an improvement is a consequence of better trajectory reconstruction, especially for the x-coordinate of the camera.

In Fig. 7(b) the system state at the 2000-th simulation step is visualized together with the real and estimated camera motion trajectories. It is evident that in this case, with the system able to recognize the landmark and put it on the map, the system-estimated trajectory is closer to the real one. Also, the resulting map is of better quality than the one presented in Fig. 4(b); namely, some uncertainty ellipsoids have shorter main axes.

7. Conclusions

The reported experiments demonstrate the advantages of embedding landmark measurement into the filtration process at frame-rate frequency. By implementing landmark-based navigation in the visual SLAM system, one is able to get rid of the camera position estimate drift. Here we have aimed at treating planar patterns as a new class of environment objects and, as such, at including them in the map. Thus a SLAM state vector containing a representation of two categories of features is maintained, and within the same system measurements in the image space for the first and in 3D space for the second feature type are performed. Also a scheme of propagating the uncertainty of the image coordinates onto the landmark pose estimate has been presented.
