
DOI: 10.2478/v10006-010-0021-7

VISUAL SIMULTANEOUS LOCALISATION AND MAP-BUILDING SUPPORTED BY STRUCTURED LANDMARKS

ROBERT BĄCZYK, ANDRZEJ KASIŃSKI

Institute of Control and Information Engineering

Poznań University of Technology, Pl. Marii Skłodowskiej-Curie 5, 60-965 Poznań, Poland; e-mail: robert.baczyk@put.poznan.pl

Visual simultaneous localisation and map-building systems which take advantage of landmarks other than point-wise environment features are not frequently reported. In this paper a method of using an operational map of the robot's surroundings, complemented with visible structured passive landmarks, is described. These landmarks are used to improve the self-localisation accuracy of the robot camera and to reduce the size of the Kalman-filter state vector with respect to the vector involving point-wise environment features only. Structured landmarks reduce the drift of the camera pose estimate and improve the reliability of the map which is built on-line. Results of simulation experiments are described, demonstrating the advantages of such an approach.

Keywords: mobile robots, navigation, SLAM, artificial landmarks.

1. Introduction

The goal of the research is to investigate the tracking properties of the Simultaneous Localisation And Map-building (SLAM) algorithm, which reconstructs the trajectory of a mobile camera navigating in an unknown environment. By tracking properties we understand here the ability to properly extend the camera trajectory based on images of the surroundings; as the course continues, some landmarks escape the field of view while new ones appear. Usually such a SLAM system works according to a sequential estimation scheme.

At the beginning of the mission, the reference coordinates of the starting point have to be initialized. Then features of the environment are extracted from the current image, and characteristic objects are recognized. As such, they are put on the dynamically constructed map, which has been empty so far.1 At the next stage, as the camera moves, the following tasks have to be accomplished:

• determination of the updated camera position and orientation (shortly: pose) with respect to the global reference system set at the beginning of the mission,

1 In the case of bearing-only SLAM there is a need to bootstrap the system by injecting into the map some known 3D points or by recognising objects of known dimensions.

• refinement of the location of features/objects on the map with respect to the global reference frame.

Moreover, there is usually the need to add to the map new items which have been discovered from the new camera position. The next step is to move the camera forward along a trajectory and to establish a new viewer pose.

Then such a sequence is repeated.

Self-localisation of a robot with a camera fixed on board is a problem tightly coupled with the issue of map building while navigating in an unknown (new) neighbourhood. Neither can be solved on its own; they have to be worked out simultaneously.

Recent years have witnessed substantial progress in mobile robot navigation systems (Durrant-Whyte and Bailey, 2006; Bailey and Durrant-Whyte, 2006; Neira et al., 2008). Most approaches solve the SLAM task using sequential Bayesian schemes; the Kalman filter is a good example (Bar-Shalom et al., 2002). Working implementations of navigation systems based on such an approach are emerging (Davison et al., 2007), and it can be observed that the quality of solutions is correlated with the type and number of sensors used in a particular implementation (Thrun et al., 2006).

The requested characteristics, fundamental to a working SLAM system, include its ability to work on-line. A SLAM system having such a property can be applied whenever the current camera pose estimate is used within the closed-loop navigational decision scheme of the robot controller.

SLAM can be used to guide autonomous agents (mobile robots) on the shop-floor, in hospitals or in offices, to support the mobility of humanoid robots or blind people, and, last but not least, in military applications. Other application areas include augmented reality, city-map building and extraterrestrial navigation systems, in which the communication latency prohibits on-line remote control.

Most of the reported SLAM methods involve multiple sensors to observe the environment, while in this paper a system using only a single camera is described. With such an approach, hardware complexity is reduced; therefore, system reliability grows and portable SLAM systems involving embedded PC solutions become feasible.

2. Discussion and original elements of the approach

Here, related earlier works will be presented as the background of the contribution.

Single-camera SLAM systems reported in (Davison and Murray, 2002; Davison, 2003; Davison et al., 2007) allow exploring compact indoor spaces. A map generated by the SLAM process is sparse there and usually consists of a finite set of coordinates of point-wise environment features (shortly: point-features). The map is primarily used to establish the camera pose with respect to the global reference system.

The problem is that the computational cost of the SLAM algorithm grows with the square of the number of map items. It is thus wise to reduce the dimension of the system state vector, which consists of the camera state vector and the coordinates of the environment map items. Exceeding the upper bound of the state-vector size can disable the on-line operation of the system, given the constrained on-board computing resources.

The second important issue while navigating is viewer trajectory drift, which complicates the closing of the robot path loop (Clemente et al., 2007). Sometimes it is necessary to explore the neighbourhood of some reference point along multiple exploratory trajectories. In such a case the viewer prefers to come back to the starting point following a new path, in order to acquire some new map items. Loop closing is one of the most important properties required in SLAM, since it allows reducing the accumulation of errors (it enables position resetting).

Looking for a solution which would keep the complexity of the map under control and compensate for the trajectory drift puts on the agenda the issue of sparse map management involving not only point-features. To manage the map efficiently it is wise not to rely on the environment point-features alone (natural landmarks), but to account also for purposefully located, simple artificial landmarks (for example, hue-planar, passive patterns). A particular design of such a landmark,2 aimed at providing efficient recognition results, was described in (Bączyk et al., 2003). Similar solutions are used in augmented reality systems. Such landmarks are introduced there because of the critical importance of the precise collocation of the coordinates of virtual/artificial objects with the real-scene object poses. In a SLAM system it is of general interest to establish the pose of a free-flying camera with respect to the current environment model, which is built on-line during the camera travel. By augmenting a SLAM system with the ability to recognise structured landmarks, some well-known limitations of the single camera as a bearing-only SLAM sensor can be overcome.

In known visual SLAM systems it is rare to account for map items other than natural point-wise image features. Line segments were accounted for in the system of Smith et al. (2006). There the goal was to improve the reliability of point-feature recognition (important while solving the feature correspondence problem in a sequence of scene images). From the viewpoint of the filtration process performance, such an extension was not really substantial, as the extracted segment endings were finally treated simply as point-features again.

Edgelet landmarks introduced in (Eade and Drummond, 2009) were not a substantial modification either, as in their algorithm they were treated as point-features as well.

Higher-level structures in visual SLAM were also proposed in (Gee et al., 2008). Discovered lines and planes in space were fitted to points contained in the map, but images were not directly used for this purpose. Higher-level objects allow reducing the state vector dimension and give more meaning to the SLAM map compared to collections of points only. On the other hand, this procedure is complex and there are a number of coefficients to be set. The authors mention that their approach can lead to inconsistency of the estimates, and its ability to reduce errors is questionable.

In (Castle et al., 2007a; 2007b), some known planar patterns were used together with point-features. The planar pattern recognition sequence runs on the second core of a PC as a separate thread at a rate of around 1.5 Hz. In the recognition scheme the authors use SIFT descriptors and the RANSAC method, which are too computationally expensive for video frame-rate operation, and thus the insertion of planar structures into the map is delayed. The authors decompose the homography (computed as the output of RANSAC) between the planar structure and the current image into the camera calibration matrix, rotation and translation.

2 It is important to distinguish point-features from landmarks (in some papers the former are called landmarks as well). By landmarks we mean here the structured patterns purposefully introduced to the explored space.


Extrinsic parameters are then determined using singular value decomposition. In contrast, here we show the advantages of embedding landmark measurement into the filtration process at frame-rate frequency. The landmarks' rectangular shape and unique code allow the use of a fast and robust recognition and pose evaluation method (Bączyk et al., 2003).

Another drawback of the mentioned method (Castle et al., 2007a; 2007b) is that the planar object was represented on the created map merely by three boundary points. It is not clear whether three boundary points treated as distinct point-features in space can preserve the rigid nature of a planar object during the filtration process. It is not mentioned there how the comparison of prediction and measurement is performed in the case of planar structure recognition.

Here we propose to treat planar patterns as a new class of environment objects and, as such, to put them on the map. Thus a SLAM state vector containing a representation of two categories of features is maintained, and in a single system a measurement in the image space for the first feature type and in 3D space for the second is performed.

The next drawback of the Castle approach is that the covariance matrix (statistically characterising the uncertainty of the evaluated planar object position) is not formally obtained. Its diagonal elements were evaluated there only in a trial-and-error experiment, by testing the influence of their values on the convergence of the filtration process.

With our method, the landmark position uncertainty with respect to the camera reference frame is formally obtained by propagating the uncertainty of the image coordinates.

3. SLAM fundamentals

Assumptions. The following assumptions have been made at the beginning:

• The agent (a hand-held camera or a mobile robot with a fixed camera) is moving with three degrees of freedom (planar motion case). Its references are the x, y coordinates and the orientation angle θ on the plane. However, the scheme presented in this paper can easily be generalised to the case of six degrees of freedom (free 3D camera motion).

• The only external sensor available in the system is a single camera.

• The guidance of the camera is not known a priori (free motion).

• Camera motion is smooth (continuous acceleration, no jerks).

• Intrinsic camera parameters are known a priori (the camera has been calibrated beforehand).

[Figure 1: flowchart of the EKF cycle. From the current state vector estimate and its covariance matrix, the system predicts the state vector and its covariance matrix, evaluates the innovation covariance matrix and the expected image coordinates of the point-wise features, searches for and recognises in the image the point-wise features already on the map (the measurement result being the image coordinates of the recognised point-wise features), and finally updates the state vector and its covariance matrix.]

Fig. 1. Cyclic tracking sequence using the EKF scheme in the case of point-wise environment features put on the map.

• The map consists of point-wise, stationary natural features fixed to the observation space and of freely distributed rectangular, planar landmarks.

• The positions of landmarks are not known a priori; however, the short edges of the landmark frame are kept perpendicular to the motion plane.

• The raw measurement data of the SLAM system are image-coordinates of the point-features and of the landmark corners.

System goals. The SLAM system uses an Extended Kalman Filter (EKF), working sequentially (Fig. 1) on consecutive image frames. Its role is to track the observer trajectory, consisting of a position, the camera optical axis orientation, and the linear and angular velocities. Moreover, the EKF tracks fixed features, puts them on the map and steadily refines the coordinates of those already on the map. The EKF calculates on-line the covariance matrix of the measurement data to provide an uncertainty measure of the tracked variables and of the stationary coordinates.

It is a typical situation that the camera pose uncertainty grows with the trajectory length; however, the uncertainty of the stationary data (such as feature and landmark coordinates) always decreases with the length of the observation sequence (Dissanayake et al., 2001).

The EKF state vector, which is the critical data item in the navigation system, can be defined as follows:

$$\mathbf{x} = \begin{pmatrix} \mathbf{c}^{G\,T} & \theta^G & \dot{\mathbf{c}}^{G\,T} & \dot{\theta}^G & \mathbf{p}_1^{G\,T} & \cdots & \mathbf{p}_n^{G\,T} & \mathbf{l}_1^{G\,T} & \beta_1^G & \cdots & \mathbf{l}_m^{G\,T} & \beta_m^G \end{pmatrix}^T. \quad (1)$$


Fig. 2. Geometrical relations in a global reference system between the camera and the point-wise feature (a) and the landmark (b); landmark template (c).

It consists of the observer trajectory subvector, whose coordinates are the observer position on the motion plane $\mathbf{c}^G = (x^G_c\; y^G_c)^T$, the orientation $\theta^G$, the linear velocity subvector $\dot{\mathbf{c}}^G = (\dot{x}^G_c\; \dot{y}^G_c)^T$ and the angular velocity $\dot{\theta}^G$. The subsequent elements of the state vector are triples of absolute Cartesian coordinates of the $n$ observed point-features $\mathbf{p}^G_i = (x^G_{p_i}\; y^G_{p_i}\; z^G_{p_i})^T$, $i = 1, \ldots, n$, fixed to the environment. The next elements of the state vector are the poses of $m$ landmarks, also expressed in the global reference frame:

$$\left( \mathbf{l}^{G\,T}_j\; \beta^G_j \right)^T = \left( x^G_{l_j}\; y^G_{l_j}\; z^G_{l_j}\; \theta^G + \alpha^C_j \right)^T, \quad (2)$$

$j = 1, \ldots, m$. In turn, the position of the $j$-th landmark expressed in the camera coordinates is denoted by $\mathbf{l}^C_j = (x^C_{l_j}\; y^C_{l_j}\; z^C_{l_j})^T$ and its orientation by $\alpha^C_j$. The geometrical relations are shown in Fig. 2.

The state transition model is the following:

$$\hat{\mathbf{x}}(k+1|k) = f(\hat{\mathbf{x}}(k|k), \mathbf{n}_x(k+1)) = \begin{pmatrix} \hat{x}^G_c(k|k) + \left( \hat{\dot{x}}^G_c(k|k) + \tfrac{1}{2} n_{\dot{x}}(k+1) \right) \Delta t \\ \hat{y}^G_c(k|k) + \left( \hat{\dot{y}}^G_c(k|k) + \tfrac{1}{2} n_{\dot{y}}(k+1) \right) \Delta t \\ \hat{\theta}^G(k|k) + \left( \hat{\dot{\theta}}^G(k|k) + \tfrac{1}{2} n_{\dot{\theta}}(k+1) \right) \Delta t \\ \hat{\dot{x}}^G_c(k|k) + n_{\dot{x}}(k+1) \\ \hat{\dot{y}}^G_c(k|k) + n_{\dot{y}}(k+1) \\ \hat{\dot{\theta}}^G(k|k) + n_{\dot{\theta}}(k+1) \\ \hat{\mathbf{p}}^G_1(k|k) \\ \vdots \\ \hat{\mathbf{p}}^G_n(k|k) \\ \hat{\mathbf{l}}^G_1(k|k) \\ \hat{\beta}^G_1(k|k) \\ \vdots \\ \hat{\mathbf{l}}^G_m(k|k) \\ \hat{\beta}^G_m(k|k) \end{pmatrix}. \quad (3)$$

As velocity changes are not available from direct measurements, a simplified constant-velocity model is used. The model accounts for the random noise reflecting the velocity changes, $\mathbf{n}_x(k) = (n_{\dot{x}}(k)\; n_{\dot{y}}(k)\; n_{\dot{\theta}}(k))^T$. In the state vector prediction step, a new camera position is calculated and the integration of the noise is performed with the trapezoid averaging rule instead of the rectangle method used in (Davison et al., 2007). That approach provides better convergence of the computations.

It is worth noticing that in (3) only the subvector of the state vector which represents the camera pose is modified. The predicted coordinates of the point-features and landmarks are left unmodified (as they are assumed to be stationary); their estimates are modified only in the filtration update step.

4. SLAM scheme for point-features

4.1. Main algorithm.

The system covariance matrix $\mathbf{P}$ is predicted according to the following formula:

$$\mathbf{P}(k+1|k) = J_f(k) \begin{bmatrix} \mathbf{P}(k|k) & 0 \\ 0 & \mathbf{P}_{n_x}(k) \end{bmatrix} J_f^T(k), \quad (4)$$

where $J_f(k)$ is the Jacobian matrix of the right-hand side function of (3) at the time $k$ and $\mathbf{P}_{n_x}(k)$ is the covariance matrix of the noise $\mathbf{n}_x(k)$ mentioned above.
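To make the prediction step concrete, the following minimal numerical sketch implements (3) and (4) for the planar constant-velocity model. It is not the authors' implementation; the state layout, time step and noise values are illustrative assumptions.

```python
import numpy as np

DT = 0.04  # assumed frame period in seconds (25 frames per second)

def ekf_predict(x, P, Pn, dt=DT):
    """Prediction (3)-(4). The state starts with the camera subvector
    [xc, yc, theta, vx, vy, vtheta]; the remaining entries (point-features
    and landmarks) are stationary and pass through unchanged."""
    x_pred = x.copy()
    x_pred[0:3] += x[3:6] * dt                 # pose advanced by the velocities
    n = x.size
    # Jacobian of f w.r.t. the state: identity plus pose/velocity coupling.
    Jx = np.eye(n)
    Jx[0:3, 3:6] = dt * np.eye(3)
    # Jacobian of f w.r.t. the velocity noise n_x: by (3) the noise enters
    # the pose with weight dt/2 (trapezoid rule) and the velocity with 1.
    Jn = np.zeros((n, 3))
    Jn[0:3, :] = 0.5 * dt * np.eye(3)
    Jn[3:6, :] = np.eye(3)
    # Block form of (4): P(k+1|k) = Jf [P 0; 0 Pn] Jf^T.
    P_pred = Jx @ P @ Jx.T + Jn @ Pn @ Jn.T
    return x_pred, P_pred

# Usage: a camera moving at 0.5 m/s along x, with one point-feature on the map.
x = np.zeros(9); x[3] = 0.5
P = np.eye(9) * 1e-4
Pn = np.eye(3) * 1e-2                          # covariance of the velocity noise
x1, P1 = ekf_predict(x, P, Pn)
```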

Measurement prediction. The coordinates of point-features put on the map are measured in images. Thus, based on the predicted state vector $\hat{\mathbf{x}}(k+1|k)$, the expected image coordinates of the features have to be evaluated. This is done by using the following pin-hole camera model:

$$\hat{\mathbf{h}}_i(k+1|k) = \mathbf{h}_i\!\left( \hat{\mathbf{x}}(k+1|k), \mathbf{n}_{p_i}(k+1) \right) = \begin{pmatrix} \lambda_u \dfrac{\hat{x}^C_{p_i}(k+1|k)}{\hat{z}^C_{p_i}(k+1|k)} + u_c + n_{u_{p_i}}(k+1) \\ \lambda_v \dfrac{\hat{y}^C_{p_i}(k+1|k)}{\hat{z}^C_{p_i}(k+1|k)} + v_c + n_{v_{p_i}}(k+1) \end{pmatrix}, \quad (5)$$

where $\lambda_u, \lambda_v$ denote the focal lengths for the image $u, v$ coordinates expressed in pixels, $u_c, v_c$ denote the image coordinates of the optical centre, $\hat{\mathbf{h}}_i(k+1|k)$ denotes the predicted image coordinates of the $i$-th map point, and the variables

$$\left( \hat{x}^C_{p_i}(k+1|k)\; \hat{y}^C_{p_i}(k+1|k)\; \hat{z}^C_{p_i}(k+1|k) \right)^T = \hat{\mathbf{p}}^C_i(k+1|k) \quad (6)$$

denote its coordinates calculated at the $(k+1)$-th prediction step, expressed in the camera reference frame (see Fig. 2):

$$\hat{\mathbf{p}}^C_i(k+1|k) = \mathbf{C}_G(k+1|k) \left( \hat{\mathbf{p}}^G_i(k+1|k) - \left( \hat{x}^G_c(k+1|k)\; \hat{y}^G_c(k+1|k)\; 0 \right)^T \right). \quad (7)$$

The rotation matrix $\mathbf{C}_G(k+1|k)$ can be calculated knowing the angle $\hat{\theta}^G(k+1|k)$, which is an element of the predicted state $\hat{\mathbf{x}}(k+1|k)$. The model (5) accounts for the fact that the readouts of the image coordinates are corrupted by the measurement noise vector $\mathbf{n}_{p_i}(k) = (n_{u_{p_i}}(k)\; n_{v_{p_i}}(k))^T$. The function $\mathbf{h}_i\!\left( \hat{\mathbf{x}}(k+1|k), \mathbf{n}_{p_i}(k+1) \right)$ is defined by the formulas (5)-(7) and models the camera in the case of the $i$-th point-feature put on the map.
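A sketch of the measurement prediction (5)-(7) for a single map point is given below. The calibration constants and the axis conventions (camera z-axis forward, image v-axis downwards, global z-axis up) are assumptions made for illustration only.

```python
import numpy as np

LAMBDA_U, LAMBDA_V = 700.0, 700.0   # assumed focal lengths in pixels
UC, VC = 320.0, 240.0               # assumed optical centre in pixels

def predict_image_coords(cam_xy, theta, p_global):
    """Equations (6)-(7): express the feature in the camera frame, then
    project it with the pin-hole model (5) (noise-free prediction)."""
    c, s = np.cos(theta), np.sin(theta)
    # Assumed rotation C_G from the global frame to the camera frame for a
    # camera looking along (cos(theta), sin(theta)) in the motion plane.
    C_G = np.array([[  s,  -c, 0.0],     # camera x-axis (image right)
                    [0.0, 0.0, -1.0],    # camera y-axis (image down)
                    [  c,   s, 0.0]])    # camera z-axis (optical axis)
    p_cam = C_G @ (p_global - np.array([cam_xy[0], cam_xy[1], 0.0]))  # (7)
    u = LAMBDA_U * p_cam[0] / p_cam[2] + UC                           # (5)
    v = LAMBDA_V * p_cam[1] / p_cam[2] + VC
    return np.array([u, v])

# Usage: a feature 3 m in front of a camera at the origin looking along +x.
print(predict_image_coords(np.array([0.0, 0.0]), 0.0, np.array([3.0, 0.0, 0.7])))
```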

The innovation vector represents the difference between the measured and predicted feature coordinates in the image domain:

$$\mathbf{v}(k+1) = \mathbf{h}(k+1) - \hat{\mathbf{h}}(k+1|k), \quad (8)$$

where $\mathbf{h}(k+1)$ denotes the measurement result at the $(k+1)$-th step, and $\hat{\mathbf{h}}(k+1|k)$ denotes the result calculated according to (5) with the state vector estimate predicted at the $(k+1)$-th step based on the data from the $k$-th step.

The innovation vector covariance matrix represents the measurement and prediction uncertainty. It is obtained as a combination of the state-vector uncertainty $\mathbf{P}(k+1|k)$ and the measurement uncertainty $\mathbf{P}_{n_p}(k+1)$:

$$\mathbf{S}(k+1) = J_h(k+1) \begin{bmatrix} \mathbf{P}(k+1|k) & 0 \\ 0 & \mathbf{P}_{n_p}(k+1) \end{bmatrix} J_h^T(k+1), \quad (9)$$

where $J_h(k+1)$ is the Jacobian matrix consisting of the partial derivatives of the functions $\mathbf{h}_i$ (5) at the $(k+1)$-th step.

At each step, the measurement vector $\mathbf{h}(k+1)$ can vary in dimension, as at the current observation pose only some subset of the map elements is visible, while some of them can remain undetected. Thus, at each sequential filtration step, the rows of the vector $\hat{\mathbf{h}}(k+1|k)$ in (8) and of the Jacobian matrix $J_h(k+1)$ having no correspondents in the measurement vector $\mathbf{h}(k+1)$ have to be removed.

EKF filtering scheme. In the general filtering scheme, the update of the state vector and of its covariance matrix is defined by the following formulas:

$$\hat{\mathbf{x}}(k+1|k+1) = \hat{\mathbf{x}}(k+1|k) + \mathbf{W}(k+1)\,\mathbf{v}(k+1), \quad (10)$$

$$\mathbf{P}(k+1|k+1) = \mathbf{P}(k+1|k) - \mathbf{W}(k+1)\,\mathbf{S}(k+1)\,\mathbf{W}^T(k+1), \quad (11)$$

while the weighting matrix $\mathbf{W}(k+1)$, appearing in (10) and (11), is established from

$$\mathbf{W}(k+1) = \mathbf{P}(k+1|k) \left[ \frac{\partial \mathbf{h}\!\left( \hat{\mathbf{x}}(k+1|k), 0 \right)}{\partial \hat{\mathbf{x}}} \right]^T \mathbf{S}^{-1}(k+1). \quad (12)$$
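For additive image noise, the block form of (9) reduces to the familiar EKF update sketched below; h_pred, the stacked Jacobian H (i.e. $J_h$ restricted to the state) and the noise covariance R are assumed to be produced by the measurement-prediction step for the currently visible features.

```python
import numpy as np

def ekf_update(x_pred, P_pred, z, h_pred, H, R):
    """EKF update (8)-(12); with additive image noise, (9) reduces to
    S = H P H^T + R."""
    v = z - h_pred                              # innovation (8)
    S = H @ P_pred @ H.T + R                    # innovation covariance (9)
    W = P_pred @ H.T @ np.linalg.inv(S)         # weighting matrix (12)
    x_upd = x_pred + W @ v                      # state update (10)
    P_upd = P_pred - W @ S @ W.T                # covariance update (11)
    return x_upd, P_upd
```

Rows of h_pred and H corresponding to undetected features are assumed to have been removed beforehand, as required above.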

4.2. Point-feature initialization method. The procedure aims at calculating a rough position estimate of a new point-feature which appears in the image. Further, making use of empirical knowledge of the size of the observed space, one can determine the probability distribution of the spatial position of the feature which has been detected (as a new one) in the current image. A similar approach was used in (Gee and Mayol-Cuevas, 2006), but a detailed explanation was missing. A much more complex initialization method, discussed in (Civera et al., 2008), contrasts with the one presented below.

A point is put on the map together with its uncertainty measure instantly, at the same step at which the point-feature is selected from the set of points detected in the current view-field. Assuming that adding a point-feature to the map was undertaken at the instant $k$, just after the update stage (10), (11), the new, current system state vector has the following value:

$$\hat{\mathbf{x}}_{new}(k|k) = f_2\!\left( \hat{\mathbf{x}}(k|k), u_{new}(k), v_{new}(k) \right) = \begin{pmatrix} \hat{\mathbf{x}}(k|k) \\ f_1\!\left( \hat{\mathbf{c}}^G(k|k), \hat{\theta}^G(k|k), u_{new}(k), v_{new}(k), 0, 0 \right) \end{pmatrix}, \quad (13)$$

where $f_1$ represents the inverse camera model. It determines the spatial coordinates of the newly emerging point-feature:

$$f_1\!\left( \hat{\mathbf{c}}^G(k|k), \hat{\theta}^G(k|k), u_{new}(k), v_{new}(k), \mathbf{n}_{p_{new}}, n_{dst} \right) = \left( \hat{x}^G_c(k|k)\; \hat{y}^G_c(k|k)\; 0 \right)^T + \mathbf{G}_C(k|k) \begin{pmatrix} \dfrac{\left( u_{new}(k) - u_c + n_{u_{p_{new}}}(k) \right) \left( dst_{init} + n_{dst}(k) \right)}{\lambda_u} \\ \dfrac{\left( v_{new}(k) - v_c + n_{v_{p_{new}}}(k) \right) \left( dst_{init} + n_{dst}(k) \right)}{\lambda_v} \\ dst_{init} + n_{dst}(k) \end{pmatrix}. \quad (14)$$

Similarly to (5), this model also accounts for the image measurement noise, denoted here by $\mathbf{n}_{p_{new}} = (n_{u_{p_{new}}}\; n_{v_{p_{new}}})^T$. The rotation matrix $\mathbf{G}_C(k|k)$ is calculated with $\hat{\theta}^G(k|k)$ picked from the state vector after the update step. The constant $dst_{init}$ is empirical. It represents the most likely value of the $z$-coordinate (in the $Z^C$ direction) of the point-feature. The uncertainty of that coordinate is represented by the random variable $n_{dst}(k)$. The system covariance matrix augmented with the new point's position uncertainty is calculated from

$$\mathbf{P}_{new}(k|k) = J_{f_2}(k) \begin{bmatrix} \mathbf{P}(k|k) & 0 & 0 \\ 0 & \mathbf{P}_{n_{p_{new}}}(k) & 0 \\ 0 & 0 & \sigma^2_{n_{dst}}(k) \end{bmatrix} J_{f_2}^T(k), \quad (15)$$

where $\sigma_{n_{dst}}$ is the standard deviation of $n_{dst}$, which has to be evaluated empirically.
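The initialization (13)-(14) can be sketched as a back-projection of the new image point at the assumed most likely depth dst_init; the calibration values, the depth and the axis conventions follow the assumptions of the earlier sketches and are not taken from the paper.

```python
import numpy as np

LAMBDA_U, LAMBDA_V, UC, VC = 700.0, 700.0, 320.0, 240.0  # assumed calibration
DST_INIT = 3.0   # assumed most likely depth of a new feature, in metres

def init_point_feature(cam_xy, theta, u_new, v_new, dst=DST_INIT):
    """Inverse camera model f1 of (14) with the noise terms set to zero."""
    # Ray through the pixel, scaled to the assumed depth (camera frame).
    ray_cam = np.array([(u_new - UC) * dst / LAMBDA_U,
                        (v_new - VC) * dst / LAMBDA_V,
                        dst])
    c, s = np.cos(theta), np.sin(theta)
    # G_C: camera-to-global rotation, the transpose of C_G used for projection.
    G_C = np.array([[  s,  -c, 0.0],
                    [0.0, 0.0, -1.0],
                    [  c,   s, 0.0]]).T
    return np.array([cam_xy[0], cam_xy[1], 0.0]) + G_C @ ray_cam  # (14)

# Usage: the returned triple is appended to the state as in (13), and P is
# then grown with the Jacobian of f2 as in (15).
print(init_point_feature(np.array([0.0, 0.0]), 0.0, 320.0, 100.0))
```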

5. Introducing landmarks to SLAM

5.1. Determining the landmark pose. As to the landmarks, the measurement result is the pose of the landmark in the global reference frame. This pose is obtained from the image coordinates of the landmark corners. The process of landmark pose determination is described below.

The landmark rotation angle $\alpha^C$ (cf. Fig. 2) with respect to the camera reference frame can be calculated from the following formula:

$$\alpha^C = f_4(u_1, v_1, u_2, v_2, u_3, v_3, u_4, v_4) = \operatorname{atan}\!\left( \frac{\lambda_u}{f_3(u_1, v_1, u_2, v_2, u_3, v_3, u_4, v_4)} \right) = \operatorname{atan}\!\left( \frac{\lambda_u}{u_{vp}} \right), \quad (16)$$

where the value of the function $f_3$ equals the image coordinate $u_{vp}$ of the vanishing point. It is obtained as the crossing of the lines fitted to the horizontal landmark sides, as shown in Fig. 3. The arguments of $f_3$ are the image coordinates of the landmark corners $u_1 v_1 \ldots u_4 v_4$.

Fig. 3. Geometrical relations used for the evaluation of landmark rotation.
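A sketch of (16) follows: the two lines through the horizontal landmark sides are intersected to obtain the vanishing-point coordinate u_vp, from which alpha^C is recovered. The corner ordering (1-2 the top side, 4-3 the bottom side) and the focal length are assumptions made for illustration.

```python
import numpy as np

LAMBDA_U = 700.0   # assumed focal length in pixels

def landmark_rotation(corners):
    """Equation (16). corners: 4x2 array of (u, v) image coordinates of the
    landmark corners relative to the optical centre (u_c = v_c = 0), with
    corners 1-2 on the top horizontal side and 4-3 on the bottom one.
    A frontal view (parallel sides) has no finite vanishing point and
    would need separate handling."""
    (u1, v1), (u2, v2), (u3, v3), (u4, v4) = corners
    a_top = (v2 - v1) / (u2 - u1)          # slope of the top side
    a_bot = (v3 - v4) / (u3 - u4)          # slope of the bottom side
    # Intersection of the two side lines: the vanishing point coordinate f3.
    u_vp = ((v4 - v1) + a_top * u1 - a_bot * u4) / (a_top - a_bot)
    return np.arctan(LAMBDA_U / u_vp)      # alpha^C = atan(lambda_u / u_vp)
```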

The landmark position $\mathbf{l}^C = (x^C_l\; y^C_l\; z^C_l)^T$ can be calculated by using the camera mathematical model:

$$\begin{bmatrix} \lambda_u & 0 & 0 & 0 \\ 0 & \lambda_v & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \end{bmatrix} \begin{bmatrix} \mathbf{r}_1 & x^C_l \\ \mathbf{r}_2 & y^C_l \\ \mathbf{r}_3 & z^C_l \\ \mathbf{0} & 1 \end{bmatrix} \begin{pmatrix} x^L_e \\ y^L_e \\ z^L_e \\ 1 \end{pmatrix} = \begin{pmatrix} z^C_e u_e \\ z^C_e v_e \\ z^C_e \\ 1 \end{pmatrix}, \quad (19)$$

where the rows $\mathbf{r}_1, \mathbf{r}_2, \mathbf{r}_3$ form the rotation matrix ${}^{C}\mathbf{R}_{L}$ and $e = 1, 2, 3, 4$ stands for the appropriate corner index of the rectangular landmark. To simplify the formulas, it has been assumed that the origin of the image coordinate frame is at the optical centre of the image (compare (5) and (14) with $u_c = v_c = 0$). Having the value of $\alpha^C$ from (16), the elements of the rotation matrix ${}^{C}\mathbf{R}_{L}$ can be calculated. One can shift all expressions to one side of Eqn. (19) and denote by $du_e$ and $dv_e$ the obtained differences:

$$\begin{aligned} du_e &= u_e \left( \mathbf{r}_3 \left( x^L_e\; y^L_e\; z^L_e \right)^T + z^C_l \right) - \lambda_u \left( \mathbf{r}_1 \left( x^L_e\; y^L_e\; z^L_e \right)^T + x^C_l \right), \\ dv_e &= v_e \left( \mathbf{r}_3 \left( x^L_e\; y^L_e\; z^L_e \right)^T + z^C_l \right) - \lambda_v \left( \mathbf{r}_2 \left( x^L_e\; y^L_e\; z^L_e \right)^T + y^C_l \right). \end{aligned} \quad (20)$$

Having the values of $du_e$ and $dv_e$, one can determine the translation vector of the landmark $\mathbf{l}^C = (x^C_l\; y^C_l\; z^C_l)^T$ by minimizing the sum of the squares of $du_e$ and $dv_e$, denoted by $f_5$:

$$f_5(u_1, v_1, \ldots, u_4, v_4, x^C_l, y^C_l, z^C_l) = f_5(X, \Theta) = du_1^2 + dv_1^2 + \cdots + du_4^2 + dv_4^2, \quad (21)$$

where $X = (u_1\, v_1\, u_2\, v_2\, u_3\, v_3\, u_4\, v_4)$ and $\Theta = (x^C_l\; y^C_l\; z^C_l)$.
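Since $du_e$ and $dv_e$ in (20) are linear in the unknown translation once ${}^{C}\mathbf{R}_{L}$ is fixed by $\alpha^C$, minimizing $f_5$ in (21) reduces to a small linear least-squares problem. The sketch below assumes an illustrative 0.4 × 0.3 m landmark template; it is not the authors' solver.

```python
import numpy as np

LAMBDA_U, LAMBDA_V = 700.0, 700.0   # assumed focal lengths in pixels

def landmark_translation(corners, R):
    """Minimize f5 of (21). corners: 4x2 (u_e, v_e) relative to the optical
    centre; R: the 3x3 rotation C_R_L built from alpha^C."""
    # Corner coordinates in the landmark frame (assumed 0.4 x 0.3 m template).
    XL = np.array([[-0.2,  0.15, 0.0], [ 0.2,  0.15, 0.0],
                   [ 0.2, -0.15, 0.0], [-0.2, -0.15, 0.0]])
    A, b = [], []
    for (u, v), Xe in zip(corners, XL):
        r1Xe, r2Xe, r3Xe = R @ Xe
        # du_e = 0: -lambda_u * x_l + u * z_l = lambda_u * r1.Xe - u * r3.Xe
        A.append([-LAMBDA_U, 0.0, u]); b.append(LAMBDA_U * r1Xe - u * r3Xe)
        # dv_e = 0: -lambda_v * y_l + v * z_l = lambda_v * r2.Xe - v * r3.Xe
        A.append([0.0, -LAMBDA_V, v]); b.append(LAMBDA_V * r2Xe - v * r3Xe)
    lC, *_ = np.linalg.lstsq(np.array(A), np.array(b), rcond=None)
    return lC   # (x_l^C, y_l^C, z_l^C) minimizing the sum of squares
```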


$$f_7\!\left( \hat{\mathbf{c}}^G(k+1|k), \hat{\theta}^G(k+1|k), \hat{\mathbf{l}}^G(k+1|k), \hat{\beta}^G(k+1|k), \mathbf{l}^C(k+1), \alpha^C(k+1) \right) = f_7(k+1) = \begin{pmatrix} \begin{pmatrix} x^C_l(k+1) \\ y^C_l(k+1) \\ z^C_l(k+1) \end{pmatrix} - \mathbf{C}_G(k+1|k) \left( \begin{pmatrix} \hat{x}^G_l(k+1|k) \\ \hat{y}^G_l(k+1|k) \\ \hat{z}^G_l(k+1|k) \end{pmatrix} - \begin{pmatrix} \hat{x}^G_c(k+1|k) \\ \hat{y}^G_c(k+1|k) \\ 0 \end{pmatrix} \right) \\ \arcsin\!\left( \sin\!\left( \tfrac{1}{2} \left( \alpha^C(k+1) - \hat{\beta}^G(k+1|k) - \hat{\theta}^G(k+1|k) \right) \right) \right) \end{pmatrix}, \quad (17)$$

$$\mathbf{S}(k+1) = \begin{bmatrix} \dfrac{\partial f_7(k+1)}{\partial \hat{\mathbf{x}}} & \dfrac{\partial f_7(k+1)}{\partial \mathbf{l}^C} & \dfrac{\partial f_7(k+1)}{\partial \alpha^C} \end{bmatrix} \begin{bmatrix} \mathbf{P}(k+1|k) & 0 \\ 0 & \mathbf{P}_{(\mathbf{l}^C \alpha^C)}(k+1) \end{bmatrix} \begin{bmatrix} \dfrac{\partial f_7(k+1)}{\partial \hat{\mathbf{x}}} & \dfrac{\partial f_7(k+1)}{\partial \mathbf{l}^C} & \dfrac{\partial f_7(k+1)}{\partial \alpha^C} \end{bmatrix}^T \quad (18)$$

Finally, the pose vector of the landmark in the global coordinate frame is established with the function $f_6$:

$$\begin{pmatrix} \mathbf{l}^G \\ \beta^G \end{pmatrix} = \begin{pmatrix} x^G_l \\ y^G_l \\ z^G_l \\ \beta^G \end{pmatrix} = f_6\!\left( x^G_c, y^G_c, \theta^G, x^C_l, y^C_l, z^C_l, \alpha^C \right) = \begin{pmatrix} \begin{pmatrix} x^G_c \\ y^G_c \\ 0 \end{pmatrix} + {}^{G}\mathbf{R}_{C} \begin{pmatrix} x^C_l \\ y^C_l \\ z^C_l \end{pmatrix} \\ \theta^G + \alpha^C \end{pmatrix}. \quad (22)$$

5.2. Uncertainty of the evaluated landmark pose. The rotation angle $\alpha^C$ and the translation vector $\mathbf{l}^C = (x^C_l\; y^C_l\; z^C_l)^T$ of the landmark with respect to the camera coordinate frame are both calculated from the same uncertain data set $(u_1\, v_1\, u_2\, v_2\, u_3\, v_3\, u_4\, v_4)$, i.e., the image coordinates of the four landmark corners. Thus, there is a correlation between the angle and position errors. A complete covariance matrix can be established in a single step:

$$\mathbf{P}_{(\mathbf{l}^C \alpha^C)} = \begin{bmatrix} \left( \dfrac{\partial g(X, \Theta)}{\partial \Theta} \right)^{-1} \dfrac{\partial g(X, \Theta)}{\partial X} \\ J_{f_4} \end{bmatrix} \mathbf{P}_{n_l} \begin{bmatrix} \left( \dfrac{\partial g(X, \Theta)}{\partial X} \right)^T \left( \dfrac{\partial g(X, \Theta)}{\partial \Theta} \right)^{-T} & J_{f_4}^T \end{bmatrix}, \quad (23)$$

where

$$g(X, \Theta) = \left( \frac{\partial f_5(X, \Theta)}{\partial \Theta} \right)^T,$$

and the matrix $\mathbf{P}_{n_l}$ stands for the uncertainty measure of the four corners' image coordinates (it is an $8 \times 8$ matrix in the case of a single landmark). As the translation vector $\mathbf{l}^C$ is not evaluated explicitly but as a result of iterative computations, the translational uncertainty in (23) is calculated according to (Haralick, 2000).
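In code, (23) amounts to pushing the corner covariance through the sensitivity of the least-squares optimum (by the implicit function theorem, at the optimum $g(X, \Theta) = 0$, so $\partial\Theta/\partial X = -(\partial g/\partial\Theta)^{-1}\, \partial g/\partial X$) stacked with the Jacobian of the angle function $f_4$. A minimal sketch, assuming the four Jacobians are computed elsewhere (numerically or analytically):

```python
import numpy as np

def landmark_pose_covariance(dg_dTheta, dg_dX, J_f4, P_nl):
    """Equation (23). dg_dTheta: 3x3, dg_dX: 3x8, J_f4: 1x8 Jacobian of the
    rotation f4, P_nl: 8x8 covariance of the four corners' image coordinates.
    Returns the full 4x4 covariance of (l^C, alpha^C), cross terms included."""
    dTheta_dX = -np.linalg.solve(dg_dTheta, dg_dX)   # 3x8 translation sensitivity
    J = np.vstack([dTheta_dX, J_f4])                 # 4x8 stacked sensitivity
    return J @ P_nl @ J.T
```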

5.3. Landmark innovation vector and its uncertainty.

In the case of a landmark, it is wise to compare its predicted Cartesian coordinates and rotation angle with the corresponding measurement. The most frequently encountered EKF implementations calculate the innovation vector as a difference of the prediction and measurement results. For point-features put on the map, such a definition of the innovation vector is used (8). For landmarks, the innovation vector is calculated with (17) as the vector function $f_7$. Its covariance (18) is treated as the innovation covariance. Thus, once $j$ landmarks are visible in the image, the matrix $\mathbf{S}$ is $4j \times 4j$.

5.4. Filtering. The update of the state-vector covariance matrix $\mathbf{P}$ is performed according to (11). In the case of landmarks, the EKF weight matrix $\mathbf{W}$ is calculated in the following way:

$$\mathbf{W}(k+1) = \mathbf{P}(k+1|k) \left[ \frac{\partial f_7(k+1)}{\partial \hat{\mathbf{x}}} \right]^T \mathbf{S}^{-1}(k+1). \quad (24)$$

The update of the state-vector estimate is defined by (25):

$$\hat{\mathbf{x}}(k+1|k+1) = \hat{\mathbf{x}}(k+1|k) - \mathbf{W}(k+1)\, f_7(k+1). \quad (25)$$

Compared with (10), in this equation the sum has been replaced by the difference. This can be explained in the following way. The enhancement of the estimate $\hat{\mathbf{x}}$ based on the measurement $x$ can be expressed as

$$\hat{\mathbf{x}}_{new} = \hat{\mathbf{x}} - k \frac{\partial f(x, \hat{\mathbf{x}})}{\partial \hat{\mathbf{x}}} f(x, \hat{\mathbf{x}}), \quad (26)$$

where $f(x, \hat{\mathbf{x}})$ is an available monotonous comparison function. Such a function attains zero once $x$ and $\hat{\mathbf{x}}$ are equal. The factor $k$ is some positive coefficient determining the strength of the enhancement of the estimate $\hat{\mathbf{x}}$. The formula (26) is similar to (25). The definiteness of the weight matrix $\mathbf{W}$ depends on the definiteness of the gradient $\partial f_7(k+1)/\partial \hat{\mathbf{x}}$, since the matrices $\mathbf{P}$ and $\mathbf{S}$ are both positive definite.
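The landmark innovation (17) and the subtractive update (24)-(25) can be sketched as follows; C_G is the global-to-camera rotation used earlier, J7x stands for the Jacobian of f7 with respect to the state, and all names are illustrative placeholders, not the authors' code.

```python
import numpy as np

def landmark_innovation(cam_xy, theta, l_g_pred, beta_pred,
                        l_c_meas, alpha_meas, C_G):
    """Vector function f7 of (17): the position part in the camera frame plus
    an angle part wrapped by arcsin(sin(.../2)) to keep it continuous."""
    pos = l_c_meas - C_G @ (l_g_pred - np.array([cam_xy[0], cam_xy[1], 0.0]))
    ang = np.arcsin(np.sin(0.5 * (alpha_meas - beta_pred - theta)))
    return np.append(pos, ang)

def landmark_update(x_pred, P_pred, f7_val, J7x, S):
    """Update (24)-(25): note the minus sign replacing the plus of (10)."""
    W = P_pred @ J7x.T @ np.linalg.inv(S)            # weight matrix (24)
    x_upd = x_pred - W @ f7_val                      # subtractive update (25)
    P_upd = P_pred - W @ S @ W.T                     # covariance update (11)
    return x_upd, P_upd
```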


5.5. Adding a landmark to the map. In much the same way as in the case of point-features, adding a landmark to the map is performed after the update stage ((25), (11)). The function adding the landmark to the map is denoted by $f_8$. If the operation takes place at the $k$-th step, it is defined by the following formula:

$$\hat{\mathbf{x}}_{new}(k|k) = f_8\!\left( \hat{\mathbf{x}}(k|k), x^C_l(k), y^C_l(k), z^C_l(k), \alpha^C(k) \right) = \begin{pmatrix} \hat{\mathbf{x}}(k|k) \\ \mathbf{l}^G(k) \\ \beta^G(k) \end{pmatrix} = \begin{pmatrix} \hat{\mathbf{x}}(k|k) \\ f_6(\hat{\mathbf{c}}^G, \hat{\theta}^G, x^C_l, y^C_l, z^C_l, \alpha^C) \end{pmatrix}, \quad (27)$$

where $f_6$ is defined in (22). The new covariance matrix of the system is defined as in (28):

$$\mathbf{P}_{new}(k|k) = J_{f_8}(k) \begin{bmatrix} \mathbf{P}(k|k) & 0 \\ 0 & \mathbf{P}_{(\mathbf{l}^C \alpha^C)}(k) \end{bmatrix} J_{f_8}^T(k). \quad (28)$$

The complete data processing scheme for landmarks and point-features can easily be worked out based on the formulas presented so far.
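A sketch of the augmentation (27)-(28): the landmark's global pose is appended to the state and the covariance is grown with the block Jacobian of f8. Here J_x and J_meas, the Jacobians of f6 with respect to the state and to (l^C, alpha^C), are assumed to be computed elsewhere.

```python
import numpy as np

def add_landmark(x, P, lG_betaG, J_x, J_meas, P_lc_alpha):
    """Equations (27)-(28). lG_betaG: the 4-vector f6(...) in global
    coordinates; J_x: 4xN, J_meas: 4x4; P_lc_alpha: the 4x4 matrix of (23)."""
    x_new = np.concatenate([x, lG_betaG])                       # (27)
    N = x.size
    Jf8 = np.block([[np.eye(N), np.zeros((N, 4))],              # old state copied
                    [J_x,       J_meas]])                       # new landmark rows
    big = np.block([[P,                np.zeros((N, 4))],
                    [np.zeros((4, N)), P_lc_alpha]])
    return x_new, Jf8 @ big @ Jf8.T                             # (28)
```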

6. Simulation experiment

The goal of the computational experiments is to evaluate the impact that the presence of landmarks has on the convergence of the SLAM system operation. To this end, a simulator has been designed with the ability to generate runs of a number of variables characterising SLAM system performance. The simulator also generates a graphical output visualising the system state at any selected time instant.

6.1. Simulation tools. A typical indoor test environment has been used. The dimensions of the building interior are 6 × 25 m. The orthographic projections of the environment are presented in Figs. 4 and 7. The ground-truth camera pose and its estimate are visualised in those figures as pairs of line segments forming V shapes, delimiting their view-fields. The real trajectory is marked with a thick, solid line. The calculated camera trajectory is marked with a thin, dotted line. It is assumed that the navigation space is delimited by vertical walls on which point-features have been distributed randomly. They are marked in Figs. 4 and 7 with grey circles. The distribution density of the point-features equals one feature per 0.5 m of wall length on average. The height of the features is random in the range (0 ÷ 1.4) m. The camera moves horizontally at a height of 0.7 m.

With a single camera only, in an unknown environment, the initialization of the SLAM system is not possible. Thus, it is assumed that at the beginning of the mission some four point-features with precisely known coordinates are visible (they are marked as circles with grey filling).

Apart from point-features, some artificial landmarks were placed in the environment. In the experiment under consideration, a single landmark was fixed to the rightmost wall. In Fig. 7 it is marked as a thick black spot. It is assumed that the system is able to extract point-features from images acquired during the mission, as well as to recognise an encountered landmark. In Figs. 4 and 7, the main axes of the uncertainty ellipsoids of the objects' locations are represented with black line segments.

In Fig. 5, the working step numbers of the system are marked along the real trajectory of the camera. The trial trajectory is the following: the camera moves to the right along the wall up to a distance of roughly 20 m and then (without turning) it is withdrawn approximately to the starting position.

In Figs. 6 and 8, some variables of the SLAM system are visualised as functions of the navigation time. There are four groups of variables:

(a) the first one, representing the x-coordinates of the real position and real velocity vector of the camera and their estimates (which are visualized indirectly as trajectory envelopes ±3σ wide3),

(b) the second one, comprising the y-coordinates of the position and velocity vector and their uncertainty bands,

(c) the third one, consisting of the real rotation angle and angular velocity of the camera as well as of their estimated values returned by SLAM (also marked as a ±3σ band),

(d) the fourth one, showing some additional parameters characterising the operation of the system, such as the number of point-features recognised in the current image, the number of false negatives4, the time instants at which a new point-feature has been put on the map5, the number of point-features kept in the map, and the age of the earliest point-feature in the set of currently visible point-features. Moreover, in Fig. 8, the number of recognised landmarks at a given time instant and the number of landmarks already put on the map are shown.

3 It is worth noting that the position uncertainty in the experiment under study is relatively small. In the case of the y-coordinate, the uncertainty band is so narrow that it is invisible on the graph; it is covered by the thick line of the true y-coordinate.

4 The number of unrecognised point-features in Figs. 6 and 8 is shown as a negative value on the Y-axis.

5 The system is able to put a single point-feature on the map within a single iteration step. During on-line operation, 25 iterations per second are performed to keep up with the TV frame-rate standard.


Fig. 4. Orthographic projections of the system state at the 920-th (a) and the 2000-th (b) simulation step (without landmarks). The navigation space is delimited with vertical walls (broken lines). Point-features randomly distributed on walls are marked with grey circles. The ground truth camera pose and its estimate are visualised as pairs of line-segments forming V shapes, restricting their view-fields. The real and calculated camera trajectories are marked with a thick solid line and a dotted line. The main axes of the objects’ location uncertainty ellipsoids are marked with black line-segments.


Fig. 5. Real camera trajectory with the filtration step numbers marked.


Some of the values have been rescaled to make the graph more readable. The scaling factors are given in Figs. 6 and 8 in the comments accompanying the appropriate lines.

Having such a characteristic of SLAM behaviour, one can draw a number of conclusions. For example, while coming back and accounting for features that were put on the map earlier, the life-span of the oldest extracted feature can be a good measure of the system's ability to recognise the locations already visited before. A similar role can be played by the number of recognised/unrecognised point-features, as well as by the need for new point-feature initialisation.6 The advantage of the simulation method is also that it can be used to judge the ability of trajectory reconstruction by comparing ground-truth data with the estimated ones. This is frequently not possible with real systems if the real camera position data are not available. In particular, the obtained results demonstrate the discrepancies between the real trajectory of the camera and the uncertainty bounds around the SLAM-estimated trajectory.

The next two subsections present simulation results for two cases: one with only natural point-features accounted for, and one with the landmark introduced into the environment. The simulations are carried out according to the same scenario, i.e., the trial trajectory to pass is identical, the distribution of point-features is the same and the same noise disturbs the system in both cases.

6 The SLAM system attempts to add a new point-feature to the map whenever fewer than eight point-features from those already on the map have been recognised in the image.


[Figure 6 legend: the x-coordinate and its uncertainty; 10× the x- and y-coordinates of the velocity vector and the uncertainties of their estimates; the camera rotation angle, the angular velocity and the uncertainties of their estimates; new point-feature initializations; the number of recognized and unrecognized point-features; 0.01× the age of the oldest recognised point-feature; 0.1× the number of point-features in the map.]

Fig. 6. Runs of real and estimated variables of the SLAM system in the absence of landmarks: x-coordinates (a) and y-coordinates (b) of the camera position/velocity vector; rotation angle and angular velocity of the camera (c). Estimates are visualized indirectly as trajectory envelopes ±3σ wide. Additional parameters characterising the operation of the system (d).

6.2. Testing the SLAM system without landmarks.

In Fig. 4(a) the system state at the 920-th simulation step is displayed, namely, the distribution of the point-features put on the map so far and the camera pose. Two camera models are visible: the rightmost one represents the real camera and the second one (closer to the starting point) represents the estimated pose of the camera. Dotted lines marked outside the nominal field of view of the estimated camera represent the view-field enlargement caused by the ±3σ orientation uncertainty of the camera. At the 920-th simulation step, only a single point-feature appeared in the image, and thus the camera orientation uncertainty grows quickly. In turn, the position uncertainty ellipsoids for the camera and for the point-features are symbolically represented by their main axes spanning the ±3σ range. In the case of the camera, thick lines are visible at the apex of the V shape. Most frequently only a single axis is visible, as the second one is too short.

In Fig. 4(b) the system state at the 2000-th simulation step (the final step) is displayed. From Fig. 6 it can be noticed that the final estimate of the camera pose equals the real pose, and the earliest point-features put on the map have been recognized correctly. It is worth noting that, as a consequence of recognising the old point-features already present on the map, the system is able to correct the position estimates of all point-features put on the map. Thus the resulting map at this time step is better than the one available in the middle of the mission.

6.3. Testing the SLAM system in an environment with landmarks. In Fig. 7(a) the SLAM system state at the 920-th step is visualized; here the system is able to recognize the landmark and to put it on the map.


Fig. 7. Orthographic projections of the system state at the 920-th (a) and the 2000-th (b) simulation step (with a landmark). The navigation space is delimited with vertical walls (broken lines). A single landmark fixed to the rightmost wall is marked as a thick black spot. Point-features randomly distributed on walls are marked with grey circles. The ground truth camera pose and its estimate are visualised as pairs of line-segments forming V shapes, restricting their view-fields. The real and calculated camera trajectories are marked with a thick solid line and a dotted line. The main axes of the objects’ location uncertainty ellipsoids are marked with black line-segments.

The obtained map is much better than the one shown in Fig. 4(a).

The estimated camera position and the estimated point-feature coordinates are closer here to the ground-truth data than in the previous simulation experiment. This is due to the appearance of the landmark within the camera view-field, which allows a more efficient estimation of the camera velocity. Due to the visibility of the landmark, the navigation system implicitly receives information on the underestimation of the camera velocity (which was based on point-feature data only so far). It is interpreted in the system in such a way that the calculated position of the camera is more advanced than the value of the position estimate available before the recognition of the landmark. As a consequence, the camera position as well as the positions of all point-features already on the map are corrected. Such system operation is visible between the 800-th and 900-th simulation steps in Fig. 8. Especially the correction of the camera x-coordinate estimate is remarkable. Surprisingly enough, the recognition of the landmark results in a decrease in the camera position estimate uncertainty despite the growing distance of the camera from the starting point of the trajectory. In the same time period, when the landmark was absent (Fig. 6), an excessive rise in the camera orientation and y-coordinate uncertainties was observed.

The next remark is that in Fig. 6, between the 1600-th and 1800-th simulation steps, one can observe a number of false negatives, while in the case with landmarks (Fig. 8) during the same time period the system is able to recognise almost all the points that are in the field of view and were previously placed on the map. Such an improvement is a consequence of better trajectory reconstruction, especially for the x-coordinate of the camera.

In Fig. 7(b) the system state at the 2000-th simulation step is visualized together with the real and estimated camera motion trajectories. It is evident that in this case, with the system able to recognize the landmark and put it on the map, the system-estimated trajectory is closer to the real one. Also, the resulting map is of better quality than the one presented in Fig. 4(b); namely, some uncertainty ellipsoids have shorter main axes.

7. Conclusions

The reported experiments demonstrate the advantages of embedding landmark measurement into the filtration process at frame-rate frequency. By implementing landmark-based navigation in the visual SLAM system, one is able to get rid of the camera position estimate drift. Here we have aimed at treating planar patterns as a new class of environment objects and, as such, at including them in the map. Thus a SLAM state vector containing a representation of two categories of features is maintained, and within the same system measurements in the image space for the first and in 3D space for the second feature type are performed. Also a scheme of propagating the uncertainty of the image coordinates onto the landmark pose estimate has been presented.
