
Spatiotonal Adaptivity in Super-Resolution of Under-sampled Image Sequences

Dissertation

for the degree of doctor at Delft University of Technology, under the authority of the Rector Magnificus Prof. dr. ir. J.T. Fokkema, chairman of the Board for Doctorates,

to be defended in public on Monday 19 October 2006 at 15:00

by

Pham Quang Tuan


Composition of the doctoral committee:

Rector Magnificus, chairman
Prof. dr. ir. L.J. van Vliet, Delft University of Technology, promotor
Dr. K. Schutte, TNO Defense, Security and Safety
Prof. dr. ir. A. Gisolf, Delft University of Technology
Prof. dr. ir. J. Biemond, Delft University of Technology
Prof. dr. A.M. Vossepoel, Erasmus MC - University Medical Center Rotterdam and Delft University of Technology
Prof. dr. sc. techn. N. Petkov, University of Groningen
Prof. dr. ir. F.C.A. Groen, University of Amsterdam
Prof. dr. I.T. Young, Delft University of Technology, reserve member

This work was partly supported by the IOP Beeldverwerkings project of Senter, Agency of the Ministry of Economic Affairs of the Netherlands, project number IBV99007B.

Advanced School for Computing and Imaging

This work was carried out in graduate school ASCI. ASCI dissertation series number 127. The printing cost of this thesis was partially funded by i-Optics Nederland BV.

ISBN: 90-75691-14-9


Summary

Spatiotonal Adaptivity in Super-Resolution of Under-sampled Image Sequences

In this thesis we investigate the task of improving the resolution of under-sampled image sequences. The limited number of sensors in many commercial digital cameras produces aliasing artifacts in images. However, aliasing also preserves hidden high-frequency information, which can be used to reconstruct a high-resolution image from multiple low-resolution ones. A multi-frame super-resolution reconstruction algorithm usually involves three steps: image registration, image fusion, and image restoration, each of which is given its own chapter in this thesis.

In chapter 2, we analyze the registration of shifted image pairs using a gradient-based approach. The bias of the registration is derived and the result is used to construct a new registration algorithm with improved accuracy. This iterative registration algorithm is shown to be optimal under the influence of input Gaussian noise. A similar conclusion is found for the iterative registration of images under planar projective transformation.

A new algorithm for image fusion from irregularly sampled data is proposed in chapter 3. We extend the normalized convolution framework to perform adaptively in both the spatial and the tonal domain. The fusion kernel adapts itself to the local data and is robust against intensity outliers. This robust normalized convolution technique is shown to have applications to other problems as well.

Using the same spatiotonal adaptivity formulation, chapter 4 extends the bilateral filter to a structure-adaptive bilateral filter. Separable implementations of the new filter significantly speed up the execution time and improve its noise filtering capability at low signal-to-noise ratios. The efficient filter is especially useful for high-dimensional signals like medical images.

Chapter 5 applies tonal adaptivity to maximum-likelihood image restoration. The use of an intensity-based confidence value per pixel makes the algorithm robust against intensity outliers without the need for regularization. No difficult decision on the tradeoff between detail retention and outlier suppression is required. Moreover, the new restoration algorithm remains optimal in the presence of outliers.


Chapter 6 takes a completely different approach from the rest of the thesis by developing an example-based super-resolution algorithm for compressed images. Since aliasing information is often lost under heavy compression, the usual reconstruction-based algorithms do not yield satisfactory results. Our example-based algorithm works directly in the compressed domain and is able to produce favorable results given the right texture sources.


Samenvatting (Summary in Dutch)

Spatiotonal Adaptivity in Super-Resolution of Under-sampled Image Sequences

This thesis investigates how the resolution of under-sampled image sequences can be improved. The limited number of sensors in many commercial digital cameras causes aliasing artifacts in images. However, aliasing also harbors hidden high-frequency information, which can be used to construct a high-resolution image from a series of low-resolution images. A multi-frame super-resolution reconstruction algorithm usually consists of three steps: image registration, image fusion and image restoration. This thesis devotes a chapter to each step.

Chapter 2 analyzes the registration of shifted image pairs using a gradient-based method. The bias (systematic error) of the registration is derived and used to construct a new registration method with higher accuracy. This iterative registration method is shown to be optimal under the influence of Gaussian noise. The same conclusion holds for the iterative registration of images under a planar projective transformation.

Chapter 3 proposes a new algorithm for the fusion of irregularly sampled data. Normalized convolution is extended with adaptivity in the spatial and tonal domains. The fusion kernel adapts itself to the local data and is robust against intensity outliers. This robust normalized convolution technique is shown to be useful for other applications as well.

Using the same spatiotonal adaptive formulation, chapter 4 extends the bilateral filter to a structure-adaptive bilateral filter. A separable implementation of this new filter considerably speeds up the execution time and improves the noise filtering capability at low signal-to-noise ratios. This efficient filter is particularly useful for high-dimensional signals such as medical images.

Chapter 5 applies tonal adaptivity to maximum-likelihood image restoration. The use of an intensity-based confidence value per pixel makes the algorithm robust against intensity outliers without requiring regularization. No difficult tradeoff between detail retention and outlier suppression is needed. Moreover, the new restoration algorithm remains optimal in the presence of outliers.

Chapter 6 takes a completely different approach from the rest of the thesis by developing an example-based super-resolution algorithm for compressed images. Since aliasing information is often lost under heavy compression, the usual reconstruction-based algorithms do not give satisfactory results. Our example-based algorithm works directly in the compressed domain and is able to produce good results given the right texture sources.

Contents

1 Introduction
   1.1 Problem definition
   1.2 Literature review
      1.2.1 Image registration
      1.2.2 Super-resolution fusion
      1.2.3 Super-resolution restoration
      1.2.4 Super-resolution synthesis
   1.3 Image formation
   1.4 Thesis organization

2 Optimal gradient-based image registration
   2.1 Cramer-Rao bound for image registration
      2.1.1 Fisher information matrix for image registration
      2.1.2 Cramer-Rao bound for 2D shift estimation
      2.1.3 Cramer-Rao bound for 2D projective registration
   2.2 Bias of gradient-based shift estimators
      2.2.1 Shift estimation based on Taylor series expansion
      2.2.2 Bias of 1D gradient-based shift estimator
      2.2.3 Bias of 2D gradient-based shift estimator
   2.3 Optimal pairwise gradient-based registration
      2.3.1 Iterative gradient-based shift estimator
      2.3.2 Optimal registration and the aperture problem
      2.3.3 Optimal 2D projective registration
   2.4 Multi-frame registration
      2.4.1 Effect of aliasing on gradient-based shift estimation
      2.4.2 Consistent flow by bundle adjustment
      2.4.3 Registration on a high-resolution fusion frame
      2.4.4 Multi-frame super-resolution experiment
   2.5 Summary and future research

3 Robust and adaptive normalized convolution
   3.1 Normalized convolution using polynomial bases
      3.1.1 Weighted least-squares solution
   3.2 Robust normalized convolution
   3.3 Structure-adaptive normalized convolution
      3.3.1 Estimation of local image structure and scale
      3.3.2 Structure-adaptive applicability function
   3.4 Application I: Fusion of irregularly sampled data
      3.4.1 Super-fusion using robust normalized convolution
      3.4.2 Super-fusion followed by deconvolution
   3.5 Application II: JPEG blocking artifact removal
      3.5.1 Fast structure adaptive filtering
      3.5.2 Blocking artifact removal algorithm
      3.5.3 Results and analysis
   3.6 Conclusion and discussion

4 Separable bilateral filtering
   4.1 How bilateral filtering works
      4.1.1 Relation to other filtering techniques
   4.2 Separable filtering along sampling directions
      4.2.1 Effective kernel of an xy-separable bilateral filter
      4.2.2 Anti-aliasing filter construction
      4.2.3 Performance of xy-separable bilateral filtering
   4.3 Separable structure-adaptive bilateral filtering
      4.3.1 Effective kernel of a uv-separable bilateral filter
   4.4 Applications
      4.4.1 Preprocessing of medical images
      4.4.2 Preprocessing for image compression
   4.5 Conclusion and future research

5 Performance of super-resolution restoration
   5.1 Super-resolution restoration
      5.1.1 Maximum-likelihood super-resolution
      5.1.2 Robust maximum-likelihood super-resolution
      5.1.3 Robust SR with postprocessing noise reduction
   5.2 Limit of resolution enhancement by restoration
      5.2.1 Point spread function
      5.2.2 Noise and registration error
      5.2.3 A practical limit of super-resolution restoration
   5.3 Evaluations of super-resolution algorithms
      5.3.1 Objective performance measures for SR
      5.3.2 Evaluation results
   5.4 Conclusion

6 Example-based super-resolution in the DCT domain
   6.1 Influence of DCT compression on SR reconstruction
      6.1.1 Performance of registration on compressed images
      6.1.2 Spectrum reduction by DCT-based compression
   6.2 Super-resolution synthesis in the DCT domain
      6.2.1 Example-based approach to video SR
      6.2.2 Example-based SR in the DCT domain
   6.3 Results and discussion
      6.3.1 The importance of a right texture source
      6.3.2 Future research directions
   6.4 Conclusions and discussion

7 Conclusions and future research
   7.1 Conclusions
   7.2 Directions for future research

A Histogram-based noise estimation
   A.1 Robust variance estimation by a histogram fit
   A.2 Noise estimation using an interpolated frame
   A.3 Noise estimation from an aliased image sequence

B Model-based point spread function estimation
   B.1 Slanted edge method
      B.1.1 Adaptive Hough transform for line detection
      B.1.2 Robust model fit to an edge profile
   B.2 PSF from arbitrary edges under subpixel localization


Chapter 1

Introduction

Super-Resolution (SR) is a technique to increase the resolution of an image or a sequence of images beyond the resolving power of the imaging system. Contrary to common belief, pixel count (i.e. the number of pixels in an image) is not an appropriate measure of image resolution. Instead, the resolution of an image should be defined as the amount of fine detail that is visible. As a result, the industry practice of increasing the pixel count and reducing the pixel size does not always yield an improved resolution [25]. For this solution to succeed, dedicated hardware components such as powerful optics and complex image stabilization mechanisms are required. Furthermore, resolution improvement by sensor miniaturization is limited by the diffraction limit and the amount of incoming light.

Multi-frame SR, on the contrary, is a practical solution for both dedicated and ordinary imaging systems alike. The method uses a set of mutually unregistered images to construct a new image at a higher resolution (e.g., see figure 1.1). SR reconstruction is especially effective for aliased signals, which result from a weaker resolving power of the sensor array compared to the optics. By fusion of several shifted Low-Resolution (LR) images onto a finer grid, the sampling rate of the sensor can be increased in software and a resolution up to the diffraction limit of the optics can be achieved. However, there exists a trade-off between spatial resolution and temporal resolution. Although the diffraction limit can be surpassed by more advanced SR techniques, these often involve explicit knowledge of the point spread function and the reconstructed scenes.

Although it may sound like a pure signal processing solution, SR from an under-sampled image sequence actually occurs in nature. Studies of biological vision show that many living organisms rely on the movements of some parts of the eyes to increase the amount of information acquired. A jumping spider, for example, scans its long and narrow retina back and forth to capture the world with only 1600 photoreceptors in its two main eyes [107]. The human visual system also exhibits micro-trembling movements of the eye, even under perfect fixation, to prevent image fade-out in the retina [90]. Due to these micro-saccades, we do not see an aliased version of the world from the under-sampled peripheral retina [83]. Although the human visual system does not literally reconstruct an image of the world on a finer grid, these micro movements allow finer localization of edges [85], which is effectively the same as SR.

Figure 1.1: Two-times SR of an MJPEG compressed video by fusion of 100 LR frames. The hand-held camera (Canon Powershot S50) underwent a trembling motion, which yielded randomly shifted and rotated images of the scene. (a) One of 100 low-resolution frames (320 × 240); (b) two-times higher resolution fusion output; (c) zoom-in of (a); (d) zoom-in of (b).

Although the original idea may have come from nature, SR by software has been developed to handle more difficult situations than what the human eyes are designed for. For example, the human eyes can fixate on and super-resolve a regularly moving object by a temporal fusion along its smooth motion path. However, we tend to lose track of objects under erratic vibration [179]. A software approach, on the other hand, does not heavily rely on motion prediction to register a moving object. As a result, if the object is large and distinctive enough, it can be segmented and the erratic movement can be corrected.

This explains the mediocre results of an example-based SR technique [60] when a suitable training set is not available. Multi-frame SR restoration, on the other hand, can resolve details much more accurately than the learning-based approach if the registration is correct and the blur model is known.

The problem of SR was first attempted some twenty years ago [186]; since then, SR has been deployed in a number of real-world applications, ranging from government-controlled military, forensics and space exploration to the more civilian-oriented printing and entertainment business. Video evidence of the beatings of Rodney King and Reginald Denny prior to and during the 1992 Los Angeles riots [118] prompted an increased public acceptance of digitally restored evidence in the courtroom. Video restoration software from Lowry Digital [3] has managed to convert hundreds of favorite old movie titles from dirty and grainy analog media to the high-quality DVD format. Not stopping at off-line restoration, dynamic SR is being integrated into TVs and mobile phones by Philips and Sony with their PixelPlus [4] and Digital Reality Creation [2] technologies, respectively.

The remainder of this introduction chapter is organized as follows. Section 1.1 defines the tasks of SR and outlines our approach to the problem. Each component of the approach is then reviewed with the relevant literature in section 1.2. Since aliasing is an important necessity in achieving SR, the process of image formation and under-sampling is briefly described in section 1.3. Section 1.4 finally highlights the contributions of the thesis in the remaining chapters.

1.1 Problem definition

Although a natural scene has infinitely many levels of detail, a typical imaging device can only capture the scene at a limited range of resolution. To increase the resolution of the capturing device, either the hardware has to be improved or a software solution can be employed. Hardware advancements in stronger optics, shutter and aperture control, fine-grain film or miniaturized sensor arrays can undoubtedly lead to a substantial gain in image resolution. However, due to practical reasons, these hardware improvements may not be fully employed to push the resolution of camera systems to the diffraction limit of light. A software approach to SR is then the next logical step to improve the image resolution beyond the sampling bandwidth of the camera system.

Images from most consumer cameras are under-sampled because the optics usually resolves much finer details than what can be captured by the sensor array. These cameras produce aliased images, whose edges look jagged when zooming in. However, the presence of aliasing enables an inexpensive way to obtain SR from an image sequence. Provided that there are subpixel motions between the LR images, the sampling rate can be increased in software by fusing motion-corrected images onto a finer sampling grid. This image fusion operation truly increases the resolution (not just the pixel count, as happens in interpolation schemes) by a restoration of the aliased frequency spectrum. However, the increase in spatial resolution is exchanged for a lower temporal resolution, because many LR frames are required to produce a single High-Resolution (HR) image.

A typical solution for super-resolution from an image sequence involves three sub-tasks: registration, fusion and deblurring (see figure 1.2). First, the LR images are registered against a common reference to subpixel accuracy. An intermediate image at a higher resolution is then constructed from the scattered input samples. The HR fusion image is finally deblurred to amplify the frequency spectrum beyond the Nyquist frequency of the imaging sensor.


Figure 1.2: A three-step algorithm for super-resolution of an image sequence: image registration, image fusion and image deblurring.


1.2 Literature review

This section reviews the state of the art of super-resolution techniques in the literature. Four components are surveyed separately: image registration, super-resolution fusion, super-resolution restoration, and super-resolution synthesis.

1.2.1 Image registration

Image registration is the task of finding the geometric transformation between two or more views of the same scene. In its most general form, a unique correspondence is found between every point in one image and a point in a second image, where both points represent the same physical point in the scene. This type of registration, a.k.a. optic flow [86], is ill-posed and can only be solved approximately using regularization [87, 121, 17]. Furthermore, the resulting motion vectors do not necessarily describe the physical motion of either the camera or the scene, but an apparent motion in camera/image coordinates.

In many cases, the motion between different views of the same scene can be described by one of the parametric models presented in table 1.1. For example, motion of a distant scene captured by a slowly moving camera contains only a two-dimensional (2D) translation [46]. Satellite images or scanned documents [172] undergo a similarity transformation which is a combination of a 2D translation, rotation and scaling. The motion of a 3D planar surface or of a static scene captured by a pan-tilt-zoom and rotating camera is a 2D projective transformation [80].

Although the collection of algorithms on image registration is massive [28, 210], they can be classified into two main approaches: feature-based methods and area-based methods. While the former requires only a sparse set of point correspondences to fit the motion model, the latter uses the information from all pixels. Because of their regression nature, feature-based methods are not robust against false matches. They also require an over-determined number of point correspondences to achieve a good accuracy [80]. Area-based methods, on the other hand, can achieve a level of accuracy suitable for SR if the initial estimate is good enough. Due to the pros and cons of each approach, the motion parameters can be roughly estimated by a feature-based method before being refined by an area-based method [103]. In many real applications, lens distortion correction [171] and bundle adjustment [185] can be applied to further improve the accuracy and consistency of multi-frame alignment.

| 2D Model | Degrees of freedom | Coordinate transformation | Meaning |
|---|---|---|---|
| Translation | 2 | x′ = x + tx; y′ = y + ty | 2D translation |
| Euclidean | 3 | x′ = cosθ·x − sinθ·y + tx; y′ = sinθ·x + cosθ·y + ty | Translation + rotation |
| Similarity | 4 | x′ = s·cosθ·x − s·sinθ·y + tx; y′ = s·sinθ·x + s·cosθ·y + ty | Euclidean + isotropic scaling |
| Affine | 6 | x′ = a11·x + a12·y + tx; y′ = a21·x + a22·y + ty | Similarity + anisotropic scaling + skew |
| Projective | 8 | x′ = (m1·x + m2·y + m3)/D; y′ = (m4·x + m5·y + m6)/D, with D = m7·x + m8·y + 1 | Affine + keystone + chirping |
| Optic flow | — | x′ = x + vx(x, y); y′ = y + vy(x, y) | Free-form motion |

Table 1.1: The hierarchy of two-dimensional motion models.
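To make the projective model concrete, here is a small illustrative sketch (not code from the thesis; the function name and test values are invented) that applies the 8-parameter homography of table 1.1 to a set of 2D points:

```python
import numpy as np

def projective_transform(points, m):
    """Apply the 8-parameter 2D projective model of table 1.1.
    points: (N, 2) array of (x, y) coordinates; m: parameters [m1, ..., m8]."""
    x, y = points[:, 0], points[:, 1]
    D = m[6] * x + m[7] * y + 1.0                 # D = m7*x + m8*y + 1
    x_new = (m[0] * x + m[1] * y + m[2]) / D
    y_new = (m[3] * x + m[4] * y + m[5]) / D
    return np.stack([x_new, y_new], axis=1)

# A pure translation (tx, ty) = (0.5, -0.25) is the special case below
pts = np.array([[0.0, 0.0], [1.0, 2.0]])
print(projective_transform(pts, [1, 0, 0.5, 0, 1, -0.25, 0, 0]))
```

Setting m7 = m8 = 0 reduces the model to the affine case, which illustrates the hierarchy of table 1.1.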

Feature-based registration relies on interest points that can be detected with high probability and matched with confidence throughout a sequence. The Harris and Stephens corner detector [79] was the first widely used feature detector, until recently a family of scale- and affine-invariant descriptors emerged: Scale-Invariant Feature Transform (SIFT) keypoints [120], Harris- and Hessian-Affine corners [129], and Maximally Stable Extremal Regions (MSER) [127] (see [131, 130] for a comparison of these region detectors and descriptors). Point matches are then established between two different images using a nearest-neighbor search in the feature space. Since false matches are unavoidable due to scene occlusion or self-resemblance, a robust regression technique such as RANSAC [80] or the random quadric algorithm [38] is applied. Multiple independent motions can also be extracted from a dense correspondence map using a highly robust model-selection estimator [198].

Different from the feature-based methods, which work upon only a small number of candidate points, the area-based methods make use of the entire overlapping area of the two images. Many area-based methods estimate image motion by minimizing the L2-norm distance between the two motion-corrected images. A pioneering gradient-based estimator of this kind was extended to Euclidean motion by Keren et al. [99] and to 2D projective motion by Szeliski [180] and Mann and Picard [126]. These least-squares motion estimators are unbiased and optimal under Gaussian noise [153]. Area-based registration is not restricted to the intensity domain. Many other image representations have also been used, for example, a log-polar coordinate transform of the Fourier image [162] and Fourier phase information [55]. Criteria other than the L2-norm have also been explored: mutual information [196] and I-divergence [104]. An image representation other than intensity is necessary in multi-modal image registration, which appears frequently in applications such as medical imaging [133], surveillance, and remote sensing [57].

1.2.2 Super-resolution fusion

Super-resolution fusion is the task of merging the motion-corrected low-resolution samples into a high-resolution image [209]. Although the HR image is not yet deblurred, some degree of SR is already achieved if the LR images are aliased [156]. While many SR schemes combine fusion and restoration, decoupling these two tasks reduces computational complexity and allows flexibility in the choice of restoration algorithm. However, the separation of SR fusion and restoration is only possible under the assumptions of rigid motion, a common space-invariant blur and identical noise characteristics across all LR images [46]. Though seemingly limited, these conditions are frequently met in real applications such as remote sensing [57] or document scanning [172], where the depth of the scene is constant during capture.

The first attempt at SR reconstruction was made in the Fourier domain by Tsai and Huang [186], who assumed uncorrupted, shifted LR images. Kim et al. [100] extended this work to accept noisy and blurred inputs using a recursive least-squares optimization. However, most recent fusion techniques operate in the spatial domain so that motion models other than the translational model can also be handled. Ur and Gross [70, 187] first reported a spatial interpolation of scattered LR samples to form an HR image. Keren et al. [99] improved the robustness of the interpolation by eliminating extreme values before local sample averaging. Recent advances in SR fusion show trends towards robustness against both mis-registration and intensity outliers [50]. Fusion over a spatiotonal adaptive neighborhood [158] also permits robust SR reconstruction from very few LR samples.

Such global interpolation methods, however, come at the cost of a high computational complexity, i.e. the method is slow given a large number of input data points. As a result, local polynomial methods such as cubic splines [73] or multilevel B-splines [110] are often used in computer graphics and remote sensing.

1.2.3 Super-resolution restoration

The aim of super-resolution restoration is to recover the image spectrum beyond the spatial bandwidth of the imaging system. Although image fusion already achieves some SR by unwrapping the aliased frequencies of the LR images, the resulting spectrum power is significantly attenuated by optical blur, motion blur and sensor integration. Deconvolution is therefore necessary to amplify the high-frequency spectrum above the contrast level detectable by the human visual system.

Given the decoupling of reconstruction and restoration, SR restoration is simply a deconvolution of the HR fusion result. If equation (1.1) models the degradation of an object f by a space-invariant blur h and additive Gaussian noise n [106], deconvolution is the task of recovering f from the degraded image g:

$$g(\vec{x}) = (f * h)(\vec{x}) + n(\vec{x}) = \int_{-\infty}^{+\infty} f(\vec{\chi})\, h(\vec{x} - \vec{\chi})\, d\vec{\chi} + n(\vec{x}) \qquad (1.1)$$

where $*$ is the convolution operator.
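As an illustration of this forward model (assumed code, not from the thesis), the discrete counterpart of equation (1.1) can be simulated with a Gaussian PSF and additive Gaussian noise; the blur and noise levels below are arbitrary example values:

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def degrade(f, psf_sigma=1.5, noise_sigma=3.0, rng=None):
    """Discrete counterpart of equation (1.1): g = f * h + n, with a
    space-invariant Gaussian blur h and additive Gaussian noise n."""
    rng = np.random.default_rng() if rng is None else rng
    blurred = gaussian_filter(f.astype(float), sigma=psf_sigma)   # f * h
    return blurred + rng.normal(0.0, noise_sigma, f.shape)        # + n

# Example: degrade a synthetic step-edge image
f = np.zeros((64, 64))
f[:, 32:] = 255.0
g = degrade(f)
```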

Due to the duality of spatial convolution and frequency multiplication, inversion in the Fourier domain was proposed first [93]. Wiener extended the inverse filter to minimize the Mean Squared Error (MSE) between the restored image f̂ and the object f. Recently, Shi and Reichenbach reported a spatial-domain filter [175] that is very efficient while achieving the same restoration capability as the Fourier-Wiener filter.

Because observation noise can be characterized by a Gaussian distribution, least-squares deconvolution, which minimizes the simulation error $\|\hat{f} * h - g\|^2$, often achieves better results than the Wiener filter, which minimizes the reconstruction error $\|\hat{f} - f\|^2$. Least-squares deconvolution, however, requires regularization to produce a stable result in the presence of noise. Popular regularization norms that penalize high-frequency oscillation in the restored image are Tikhonov-Miller's quadratic norm [182], Rudin's Total Variation (TV) norm [168] and Farsiu's bilateral TV norm [50]. Other regularization methods that do not directly penalize high values of the gradient magnitude can also produce good results: restoration using the Lorentzian model for the gradient near edges [113], maximum entropy methods [61], and Fourier-wavelet methods [139].

Many multi-frame SR restoration algorithms are direct extensions of single-frame deconvolution. Iterative Back Projection (IBP) [92], for example, applies the least-squares deconvolution method to multiple LR images to produce a single HR output. Maximum A Posteriori (MAP) SR [77] is also a multi-frame restoration algorithm, using the Tikhonov-Miller regularization. Similarly, Zomet's robust SR [211] is a modified implementation of the IBP method, which uses a pixel-wise median of the projected error to improve the restoration robustness. Projection Onto Convex Sets (POCS) [178] is another technique for both deconvolution and multi-frame SR restoration [150].

1.2.4 Super-resolution synthesis

Super-resolution synthesis refers to a class of SR algorithms that use model-based priors to improve the resolution of input images. This model-based approach differs from the reconstruction-based approach in the final step, where high-frequency details are recovered from the reconstructed (but possibly blurry) HR image after fusion. Instead of deconvolution, the model-based approach imports plausible high-frequency textures from an image database into the HR image. The latter approach has gained significant interest in recent years because it promises to overcome the limit of reconstruction-based SR [13].

Motivated by multi-scale texture synthesis [82], Freeman [60] proposed an example-based SR approach using a set of high-quality reference images. Using the reference images, the relation between the high- and low-frequency bands of small image patches is studied and the missing high-frequency band of the blurred input is inferred. Similarly, in a framework called image analogies, Hertzmann [84] used sharp and blurred versions of a reference scene to deduce a sharp version of the blurred input using multi-scale patch-based matching. The two synthesis methods work especially well for natural scenes, which contain a lot of stochastic textures [62]. Template-based objects can also be synthesized using priors learnt from representative models, such as the SR of faces by Baker and Kanade [13].

A common characteristic of the previously described synthesis algorithms is the use of a single nearest neighbor for texture transfer. However, given a large database of training patches, multiple good matches are possible. Using a linear combination of the matching HR patches, the performance of SR synthesis can be increased [32, 119]. This observation is supported by Local Linear Embedding (LLE) [170] theory, which points out that corresponding LR and HR image patches form manifolds with similar local geometries in two distinct feature spaces. The linear relationship in the LR feature space can therefore be extrapolated to the HR feature space.

Example-based synthesis often fails, however, when the input image and the reference set contain different types of texture [39]. The problem lies in the fact that the input image patches are not well represented by those in the reference set. Although 90° rotation and mirroring produce more training patches [32], this solution increases the search space. A better way is to make the matching operator rotation-invariant by resampling the image patch along its gauge coordinates [72]. An affine matching operator may also be useful for large image patches under nonrigid motion.

1.3 Image formation

The most common way to capture a photographic image is to use a camera. The camera got its name from a phenomenon called camera obscura (Latin for dark room), in which light traveling through a pinhole forms an inverted image on a screen inside a dark room. Although the projected image is in the analog domain, it is often converted to a digital representation for further analysis.

Figure 1.3: Process of digital image formation: optical blur, sensor blur and sampling.

The whole digital imaging process is depicted for a one-dimensional signal in figure 1.3. The input signal f(x) undergoes bandwidth reduction by the combination of lenses and aperture. At the imaging plane, where the inverted image is projected, the signal is degraded again by light integration over a finite-size sensor element. The degraded signal is finally sampled at regularly spaced intervals, and the sampled intensities are quantized to form a digital image f[k]. Although sensor blur and sampling occur simultaneously in a sensor array, we depict them here as separate processes for clarity. Apart from the mentioned degradations by bandwidth reduction and quantization, the output image also suffers from noise and aliasing.

A point in the scene does not correspond to a point image on the focal plane because of diffraction. The diffraction-limited Point Spread Function (PSF) is an Airy pattern whose width is proportional to the wavelength of light and inversely proportional to the aperture size [27]. Sensor blur, on the other hand, is caused by light integration over the photosensitive area of a sensor element. To maximize quantum efficiency, the integration area is often designed to span the whole pixel (i.e. a 100% fill-factor). This gives rise to the square sensor PSF depicted in figure 1.3. As shall be seen in chapter 5, the effects of all blur sources can be represented by a single system PSF. This PSF can be estimated from an image, and it is often modeled by a Gaussian function (see appendix B).

Figure 1.4: Sampling (I-III) and reconstruction (III-V) of a continuous 1D signal. Rows I to V show spatial-frequency pairs of: input continuous signal, sampling function, sampled signal, sinc interpolator, and reconstructed continuous signal. (a) Exact recovery by Nyquist sampling (ωs = 2ωc); (b) aliased reconstruction due to under-sampling.

Although too much blur can reduce the resolution of a signal, bandwidth attenuation is essential for a proper sampling of the signal. Figure 1.4a illustrates how a continuous signal can be sampled in such a way that it facilitates exact reconstruction [147]. The first column of figure 1.4a shows signals and sampling filters in the spatial domain. The second column shows the corresponding Fourier transforms. Since the signal to be sampled fBL(x) is band-limited by the optics (see figure 1.3), its frequency response F(ω) is truncated outside a cut-off frequency ωc. The multiplication with a pulse train in the spatial domain corresponds to a convolution with a pulse train in the Fourier domain. The result of this convolution is a superposition of shifted copies of the band-limited spectrum at multiples of ωs. If the sampling rate is at or above the Nyquist rate (ωs ≥ 2ωc), the shifted spectrum copies do not overlap and the original spectrum can be recovered by an ideal low-pass filter (figure 1.4a IV). However, if the signal is under-sampled (i.e. ωs < 2ωc), the original spectrum can no longer be extracted from the superposition of overlapping spectra in figure 1.4b III. As can be seen in figure 1.4b IV-V, even with a perfect interpolator (a sinc function), the reconstructed signal fr(x) differs from the original signal fBL(x). The reconstructed spectrum Fr(ω) is said to be aliased because its values are distorted by a superposition of overlapping shifted copies of the original spectrum.

Because a continuous signal can be exactly reconstructed from its samples under Nyquist sampling, imaging systems rarely over-sample. In fact, a small degree of aliasing even enhances the perceptual sharpness of an image. Aliasing is often allowed if the total power of the folded spectrum is insignificant compared to the total signal power. While too much aliasing introduces artifacts into the sampled image, aliasing is desirable in SR reconstruction because the signal spectrum can be extended beyond the sensor's bandwidth by unraveling the overlapping copies of the spectrum.
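A small numerical experiment (illustrative only, not from the thesis) makes the spectrum folding concrete: a sinusoid above the Nyquist frequency is sampled onto exactly the same values as its low-frequency alias:

```python
import numpy as np

fs = 10.0                          # sampling rate (samples per unit)
t = np.arange(0, 1, 1 / fs)        # sample positions
f_high = 9.0                       # frequency above Nyquist (fs/2 = 5)
f_alias = fs - f_high              # folds back to frequency 1

s_high = np.cos(2 * np.pi * f_high * t)
s_alias = np.cos(2 * np.pi * f_alias * t)
print(np.allclose(s_high, s_alias))   # True: the samples are identical
```

Once sampled, the two frequencies are indistinguishable, which is exactly the ambiguity that multi-frame SR resolves by combining differently shifted sample grids.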

1.4 Thesis organization

There are five content chapters in this thesis, each containing novel contributions to one of four themes: image registration (chapter 2), image fusion (chapters 3 and 4), super-resolution restoration (chapter 5) and super-resolution synthesis (chapter 6). Although spatial and tonal adaptivity is a common theme throughout the chapters, each chapter can be read independently. The content of each chapter is briefly described as follows.

Chapter 2. Optimal gradient-based image registration. This chapter derives a performance limit for parametric image registration under Gaussian noise. Based on the bias analysis of a gradient-based shift estimator, it is shown that iterative gradient-based registration is optimal. The optimal registration is further extended to multiple images by bundle adjustment and a low-resolution to high-resolution registration strategy.

Chapter 3. Robust and adaptive normalized convolution. This chapter extends the normalized convolution framework with spatial and tonal adaptivity and with robustness against intensity outliers. The fusion capability of the robust and adaptive normalized convolution is demonstrated on the problems of SR fusion and JPEG blocking artifact removal.

Chapter 4. Separable bilateral filtering. This chapter presents two separable implementations of the well-known edge-preserving bilateral filter. The first implementation filters an image separably along the sampling axes, whereas the second decomposes the filter along local gauge coordinates. While the former offers an excellent speedup with good filtering quality, the latter is a trade-off between pure speed and excellent filtering under severe noise.

Chapter 5. Performance of super-resolution restoration. This chapter presents a robust SR restoration algorithm without the need for regularization. The restoration algorithm is fast because of its simple cost function and a detached noise reduction process. This chapter also uses the registration limit of chapter 2 to derive a limit of SR restoration. The performance of our new SR algorithm and of others is finally evaluated with an objective evaluation scheme.

Chapter 6. Example-based super-resolution in the DCT domain. This chapter points out the inefficacy of SR reconstruction on lossy compressed images. A new SR approach based on texture synthesis in the compressed domain is proposed. SR synthesis in the DCT domain is faster than a similar algorithm in the spatial domain. Blocking artifacts are also successfully suppressed by our new approach.

Chapter 7. Conclusions and future research. This final chapter provides the conclusions of this work and discusses some open issues for future research.


Chapter 2

Optimal gradient-based image registration¹

Image registration is a fundamental task in computer vision that plays a key role in many applications such as motion estimation [17], panoramic image construction [180], and super-resolution [92]. It is generally accepted that registration performs better on some types of images than others, and that noise degrades not only the precision but also the accuracy of registration [28]. However, until a rigorous Cramer-Rao analysis [98] was applied to image registration, these observations were not supported by a well-formulated theory. The Cramer-Rao bound (CRB) imposes a lower bound on the variance of any estimate of a deterministic parameter. The CRB analysis in this chapter is motivated by the analysis of shift estimation in [165] and affine registration in [114]. It is however extended to all parametric registration including, but not limited to, translation and 2D projective transformation. Common to all these parametric registration models is that the CRB is signal-dependent and proportional to the variance of the input noise.

This chapter proposes methods for estimating the optimal registration parameters, i.e. the ones that achieve the Cramer-Rao bound. Since most available registration estimators are biased, they are not optimal. However, through an iterative process, the biases of a gradient-based shift estimator [121] and a 2D projective estimator [180] can be removed. The iterative estimators are not only unbiased but also optimal, since their variances approach the CRB. Analysis of the convergence rate shows that the error of the iterative shift estimator reduces to less than one-hundredth of a pixel after only two iterations. Levenberg-Marquardt optimization of a 2D projective registration also achieves this level of accuracy within 10 to 15 iterations under a coarse-to-fine multi-scale strategy.

¹Sections 2.1 to 2.3 of this chapter have been published in: T.Q. Pham, M. Bezuijen, L.J. van Vliet, K. Schutte and C.L. Luengo Hendriks, Performance of optimal registration estimators, in Visual Information Processing XIV, SPIE vol. 5817, 2005 [153]. Materials on multi-frame registration in section 2.4 are new.


Because super-resolution usually involves a large number of input images, the improvement of registration over multiple frames is also of interest. Using a technique called bundle adjustment [185], consistent motion vectors are imposed on the input frames. This reduces the variation of pairwise registration and therefore improves the overall registration accuracy. Extra signal content hidden in the aliased input images can also be used to improve multi-frame alignment. Using an initial registration, a High-Resolution (HR) image is constructed. This HR image is then used as a reference frame in the registration of the low-resolution input images. This improves intensity and gradient estimation, which in turn improves the precision of gradient-based registration.

The remainder of this chapter is organized as follows: section 2.1 derives the Cramer-Rao bound for general parametric registration with special attention to two cases: translational and 2D projective registration. Section 2.2 presents a gradient-based shift estimator based on a Taylor series expansion of the local signal; the biases of this gradient-based estimator are then derived. Section 2.3 corrects the shift bias in an iterative manner and shows that the iterative gradient-based estimator is optimal. Finally, section 2.4 presents two techniques for improving the registration of (under-sampled) image sequences.

2.1 Cramer-Rao bound for image registration

The mean squared error of any estimate of a deterministic parameter has a lower bound known as the Cramer-Rao Bound (CRB) [98]. Specifically, if a parameter vector m = [m1 m2 ... mn]ᵀ is estimated from a given set of measurements, the CRB provides a lower bound on the error covariance matrix:

$$E[\varepsilon\varepsilon^T] \geq \left(I + \frac{\partial b}{\partial m}\right) F^{-1}(m) \left(I + \frac{\partial b}{\partial m}\right)^T + bb^T \qquad (2.1)$$

where ε = m̂ − m is the estimation error (the hat denotes an estimate of the variable underneath); b = E[m̂] − m is the bias of the estimator (E[.] is the expectation of the enclosed expression); I is the identity matrix; F(m) is the Fisher Information Matrix (FIM), which characterizes how well the unknown parameter vector m can be estimated from the observed data; and F⁻¹ is the inverse of F. The ≥ sign in (2.1) means that the difference between the left matrix and the right matrix is non-negative definite. As a result, the inequality holds for all diagonal terms.

If the estimator is unbiased (i.e. b = 0), the expected variance of the parameters can be found directly from the main diagonal entries of the inverse matrix F⁻¹:

$$E\left[(\hat{m}_i - m_i)^2\right] \geq \left[F^{-1}(m)\right]_{ii} \qquad (2.2)$$

The Fisher information matrix is derived from the maximum likelihood principle. Let Pr(r|m) be the probability density function of the observed noisy data r(m). The Fisher information matrix is a measure of the steepness of the likelihood function around its peak:

$$F(m) = E\left\{\left[\frac{\partial}{\partial m}\log \Pr(r|m)\right]\left[\frac{\partial}{\partial m}\log \Pr(r|m)\right]^T\right\} \qquad (2.3)$$

Since the peak of a steep likelihood function is less sensitive to noise than that of a smooth one, the FIM characterizes how precisely m can be estimated from the observed data.

2.1.1 Fisher information matrix for image registration

A direct image registration method searches for a parametric transformation between the coordinate systems of two images based on their intensity correlation. Assume that both images I₁* and I₂* are noise-corrupted versions of a noiseless scene I under two instances of zero-mean Gaussian noise with variance σn²:

$$I_1^*(x, y) = I_1(x, y) + n_1(x, y) = I(x, y) + n_1(x, y)$$
$$I_2^*(x, y) = I_2(x, y) + n_2(x, y) = I(x', y') + n_2(x, y)$$

where x′ = f(x, y, m) and y′ = g(x, y, m) are the coordinate transformations, and m = [m₁ m₂ ... mₙ]ᵀ is the unknown registration parameter (e.g., under translation x′ = x − vx, y′ = y − vy and m = [vx vy]ᵀ). Since the noise realizations n₁ and n₂ are normally distributed over the registration region S, the total probability of the unknown scene I given an estimate of m is:

$$\Pr(I|\hat{m}) = \prod_S \frac{1}{\sqrt{2\pi}\,\sigma_n} \exp\!\left(-\frac{(I_1^* - I)^2}{2\sigma_n^2}\right) \prod_S \frac{1}{\sqrt{2\pi}\,\sigma_n} \exp\!\left(-\frac{(I_2^* - I')^2}{2\sigma_n^2}\right) \qquad (2.4)$$

where the implicit coordinates for I₁*, I₂* and I are (x, y), except for I′ = I(x′, y′). The log-likelihood function therefore is:

$$\log \Pr(I|\hat{m}) = -\frac{1}{2\sigma_n^2} \sum_S \left\{(I_1^* - I)^2 + (I_2^* - I')^2\right\} + \mathrm{const} \qquad (2.5)$$

From (2.3), the Fisher information matrix for an n-parameter vector m is thus an n × n matrix F with its entries computed as:

$$F_{ij}(m) = \frac{1}{\sigma_n^2} \sum_S \frac{\partial I'}{\partial m_i} \frac{\partial I'}{\partial m_j} \qquad (2.6)$$

where the derivative of the noiseless image I′ = I(x′, y′) with respect to each unknown parameter mᵢ can be computed from its spatial derivatives and the registration model {x′, y′} = {f(x, y, m), g(x, y, m)}:

$$\frac{\partial I'}{\partial m_i} = \frac{\partial I'}{\partial x'} \frac{\partial x'}{\partial m_i} + \frac{\partial I'}{\partial y'} \frac{\partial y'}{\partial m_i} \qquad (2.7)$$

2.1.2 Cramer-Rao bound for 2D shift estimation

Using the general derivation of the Fisher information matrix in the previous subsection, the Cramer-Rao bound for any unbiased shift estimator can be derived. Two-dimensional shift estimation looks for a translational vector v = [vx vy]ᵀ between the coordinate systems of the two images: x′ = x − vx and y′ = y − vy. The Fisher information matrix can be computed from (2.6) and (2.7) [165]:

$$F(v) = \frac{1}{\sigma_n^2} \begin{bmatrix} \sum_S I_x^2 & \sum_S I_x I_y \\ \sum_S I_x I_y & \sum_S I_y^2 \end{bmatrix} = \frac{1}{\sigma_n^2}\, T \qquad (2.8)$$

where $I_x = \partial I'/\partial x' = \partial I/\partial x$ and $I_y = \partial I'/\partial y' = \partial I/\partial y$ are spatial derivatives of the uncorrupted image I. As can be seen in (2.8), the FIM for 2D shift estimation is proportional to the Gradient Structure Tensor (GST) T [79] integrated over the region S.

Substitution of (2.8) into (2.2) yields the Cramer-Rao bounds on the variances of the registration parameters:

$$\mathrm{var}(v_x) \geq [F^{-1}]_{11} = \frac{\sigma_n^2 \sum_S I_y^2}{\det(T)}, \qquad \mathrm{var}(v_y) \geq [F^{-1}]_{22} = \frac{\sigma_n^2 \sum_S I_x^2}{\det(T)} \qquad (2.9)$$

where $\det(T) = \sum_S I_x^2 \sum_S I_y^2 - \left(\sum_S I_x I_y\right)^2$ is the determinant of T. Note that the inequality in (2.9) is stricter than the one given in [165], which ignores the second term of det(T) to arrive at simpler variance bounds:

$$\mathrm{var}(v_x) \geq \sigma_n^2 \Big/ \sum_S I_x^2, \qquad \mathrm{var}(v_y) \geq \sigma_n^2 \Big/ \sum_S I_y^2 \qquad (2.10)$$

The simplified bounds clearly show that the shift variance is linearly proportional to the input noise variance σn² and inversely proportional to the total gradient energy in the shift direction. As a result, scenes with strong textures and little noise are likely to result in accurate shift estimation. However, the equality of the loose bound in (2.10) is hardly achievable, since (Σ IxIy)² only vanishes when the x- and y-gradients are uncorrelated over the region S.

Note that the CRB characterizes the shift variances based on an uncorrupted signal I, which is not available in practice. Fortunately, the total gradient energies of I can be approximated from those of the corrupted image I₁* and a noise instance n with normal distribution N(0, σn):

$$E\Big[\sum_S (I_x^*)^2\Big] = \sum_S I_x^2 + E\Big[\sum_S n_x^2\Big] = \sum_S I_x^2 + \mathrm{var}(n_x) \sum_S 1$$

$$E\Big[\sum_S (I_y^*)^2\Big] = \sum_S I_y^2 + E\Big[\sum_S n_y^2\Big] = \sum_S I_y^2 + \mathrm{var}(n_y) \sum_S 1$$

$$E\Big[\sum_S I_x^* I_y^*\Big] = \sum_S I_x I_y$$

where $I_x^* = \partial I_1^*/\partial x$, $I_y^* = \partial I_1^*/\partial y$, $n_x = \partial n/\partial x$, and $n_y = \partial n/\partial y$.

2.1.3 Cramer-Rao bound for 2D projective registration

The Cramer-Rao bound is not only applicable to shift estimation but also to more general motion models such as the 2D affine and projective transformations. A 2D projective transformation, for example, describes the motion of a static scene captured by a rotating (pan-tilt-zoom) camera or the motion of a moving planar surface. It is the most general planar motion model, which includes the translation, Euclidean, similarity, and affine transformations of table 1.1. Similar to the previous subsection, the Cramer-Rao bounds for the eight projective parameters are computed from an 8 × 8 Fisher information matrix.

Planar projective registration seeks an 8-parameter vector m = [m₁ m₂ ... m₈]ᵀ that transforms one coordinate system (x, y) into another (x′, y′):

$$\begin{bmatrix} u \\ v \\ D \end{bmatrix} = \begin{bmatrix} m_1 & m_2 & m_3 \\ m_4 & m_5 & m_6 \\ m_7 & m_8 & 1 \end{bmatrix} \begin{bmatrix} x \\ y \\ 1 \end{bmatrix}, \qquad x' = \frac{u}{D} = \frac{m_1 x + m_2 y + m_3}{m_7 x + m_8 y + 1}, \quad y' = \frac{v}{D} = \frac{m_4 x + m_5 y + m_6}{m_7 x + m_8 y + 1} \qquad (2.11)$$

Substituting

$$\begin{bmatrix} \partial x'/\partial m \\ \partial y'/\partial m \end{bmatrix} = \frac{1}{D} \begin{bmatrix} x & y & 1 & 0 & 0 & 0 & -x'x & -x'y \\ 0 & 0 & 0 & x & y & 1 & -y'x & -y'y \end{bmatrix}$$

into (2.7) yields:

$$\left[\frac{\partial I'}{\partial m}\right]^T = \frac{1}{D} \Big[\, x \tfrac{\partial I'}{\partial x'},\; y \tfrac{\partial I'}{\partial x'},\; \tfrac{\partial I'}{\partial x'},\; x \tfrac{\partial I'}{\partial y'},\; y \tfrac{\partial I'}{\partial y'},\; \tfrac{\partial I'}{\partial y'},\; -x\Big(x' \tfrac{\partial I'}{\partial x'} + y' \tfrac{\partial I'}{\partial y'}\Big),\; -y\Big(x' \tfrac{\partial I'}{\partial x'} + y' \tfrac{\partial I'}{\partial y'}\Big) \Big] \qquad (2.12)$$


Due to the complex 8 × 8 matrix inversion, the exact formulas for the Cramer-Rao bounds of 2D projective registration are not given here. They can be computed from the diagonal entries of the inverse Fisher information matrix F⁻¹(m). Similar to the shift estimation case, the lower variance bounds of the 2D projective parameters are proportional to the input noise variance and inversely proportional to the total gradient energy.

2.2 Bias of gradient-based shift estimators

It is interesting to note that most available shift estimators are biased. To support this claim, a scatter plot of shift estimates from different estimators is presented in figure 2.1. The estimation is performed on a pair of 256 × 256 8-bit shifted images. The images are corrupted by additive Gaussian noise with SNR = 15 dB after a synthetic sub-pixel translation of v = [−0.23 −0.25]ᵀ. Each of the 100 data points per estimator corresponds to an estimated displacement of the same image pair corrupted by a different noise realization. The estimators tested in the experiment are:

1. Normalized Cross Correlation (NCC)

The NCC method computes the normalized cross-correlation of the two images at 9 integer locations n₁, n₂ ∈ {−1, 0, 1} around the integer shift:

$$\mathrm{NCC}(n_1, n_2) = \frac{\sum_S \left[I_2(x - n_1, y - n_2) - \bar{I}_2\right] \left[I_1(x, y) - \bar{I}_1\right]}{\sqrt{\sum_S \left[I_2(x, y) - \bar{I}_2\right]^2 \sum_S \left[I_1(x, y) - \bar{I}_1\right]^2}} \qquad (2.14)$$

where $\bar{I}$ is the mean intensity of image I. A sub-pixel maximum is then located by a quadratic surface fit to the selected 3 × 3 NCC neighborhood. This sub-pixel offset is added to the integer shift for the final shift estimate.
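For illustration (an assumed implementation, not the thesis code), the quadratic-surface subpixel peak localization over a 3 × 3 correlation neighborhood can be sketched as:

```python
import numpy as np

def subpixel_peak(C):
    """Sub-pixel maximum of a 3x3 correlation patch C: fit a quadratic
    surface around the center pixel and solve for its stationary point."""
    dx = (C[1, 2] - C[1, 0]) / 2.0                    # first derivatives
    dy = (C[2, 1] - C[0, 1]) / 2.0
    dxx = C[1, 2] - 2.0 * C[1, 1] + C[1, 0]           # second derivatives
    dyy = C[2, 1] - 2.0 * C[1, 1] + C[0, 1]
    dxy = (C[2, 2] - C[2, 0] - C[0, 2] + C[0, 0]) / 4.0
    H = np.array([[dxx, dxy], [dxy, dyy]])            # local Hessian
    return -np.linalg.solve(H, np.array([dx, dy]))    # offset (dx, dy)

# Example: a correlation peak slightly off the center pixel
C = np.array([[0.2, 0.5, 0.3], [0.4, 1.0, 0.6], [0.3, 0.6, 0.4]])
print(subpixel_peak(C))
```

The same surface-fitting step applies to the MAD method below, with a minimum instead of a maximum.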

2. Mean Absolute Difference (MAD)

Similar to the NCC method, the MAD method locates a local minimum of the mean absolute difference between the two images by a quadratic surface fit over a 3 × 3 neighborhood of:

$$\mathrm{MAD}(n_1, n_2) = \frac{1}{N} \sum_S \left|I_2(x - n_1, y - n_2) - I_1(x, y)\right| \qquad (2.15)$$

3. Phase fitting in Fourier domain (phase)

The phase-based method uses the well-known Fourier shift property [147]: $I_2(x, y) = I_1(x + v_x, y + v_y) \;\Leftrightarrow\; F_2(\omega_x, \omega_y) = F_1(\omega_x, \omega_y)\, e^{i(v_x \omega_x + v_y \omega_y)}$


The shift [vx vy] is then found by a plane fit to the phase difference of the two frequency spectra. Note that due to noise and the phase jump at [−π, π], only the low-frequency part of the phase is used for the fitting [124].

4. Gradient-based method (Taylor)

This method uses a first-order Taylor series approximation of the sub-pixel-shifted signal to derive a set of linear equations that is solved for the estimated shift. More about the gradient-based method is described later in this section.

Figure 2.1: Accuracy and precision of different shift estimators under noise. One hundred estimates per estimator (NCC, MAD, Taylor, phase) scatter around the true shift [vx vy] = [−0.229, −0.248].

As can be seen from figure 2.1, all methods under review show shift variations because of the 15 dB noise. More importantly, all of them are biased, because the true shift (shown in the figure as a star) does not coincide with any of the cluster centers. In the remainder of this section, we focus on the bias of the Taylor shift estimator, because an understanding of the bias is useful for the bias correction in later subsections.

The Cramer-Rao bound presented in the previous section is achievable for unbiased registration estimators. Biased estimators have a higher variance bound (see equation 2.1). While most available shift estimators are biased, the biases are often hard to model. In this section, we present a bias analysis for one- and two-dimensional gradient-based shift estimators. The bias is derived entirely in the spatial domain, as opposed to a mix of spatial and frequency analysis [165].

2.2.1 Shift estimation based on Taylor series expansion

A Taylor series is a polynomial expansion of a function about a point. The expansion is an approximation, and it can be used to relate two slightly shifted signals:

$$s_2(x) = s_1(x + v_x) = s_1(x) + v_x \frac{\partial s_1}{\partial x} + \frac{v_x^2}{2!} \frac{\partial^2 s_1}{\partial x^2} + \frac{v_x^3}{3!} \frac{\partial^3 s_1}{\partial x^3} + \ldots \qquad (2.17)$$

where the 1D signal s₂ is a shifted version of s₁ by a small displacement vx (|vx| < 1). Because vx is close to zero, the high-order terms in (2.17) are negligible; s₂ can thus be approximated by a polynomial truncated after the first-order derivative. The approximation is made for N samples in a region S, which results in a set of linear equations. A least-squares estimation of vx then minimizes the following Mean Square Error (MSE):

$$\mathrm{MSE} = \frac{1}{N} \sum_S \left(s_2(x) - s_1(x) - v_x \frac{\partial s_1}{\partial x}\right)^2 \qquad (2.18)$$

This yields a least-squares solution for the displacement [121]:

$$v_x^{LS} = \frac{\sum_S \left(s_2(x) - s_1(x)\right) \frac{\partial s_1}{\partial x}}{\sum_S \left(\frac{\partial s_1}{\partial x}\right)^2} \qquad (2.19)$$
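As a minimal illustration (assumed code, not from the thesis), the one-shot least-squares estimate (2.19) can be written as:

```python
import numpy as np

def gradient_shift_1d(s1, s2):
    """Single-pass least-squares shift estimate of equation (2.19).
    Only valid for sub-pixel displacements (|vx| < 1)."""
    ds1 = np.gradient(s1)                          # spatial derivative of s1
    return np.sum((s2 - s1) * ds1) / np.sum(ds1 ** 2)

x = np.linspace(0, 4 * np.pi, 256)
s1 = np.sin(x)
s2 = np.sin(x + 0.2 * (x[1] - x[0]))               # s1 shifted by 0.2 samples
print(gradient_shift_1d(s1, s2))                   # close to 0.2, slightly biased
```

The small residual error of this estimate is exactly the systematic bias analyzed in the next subsection.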

As noted earlier, the truncated Taylor series approximation is only accurate when the second- and higher-order terms in (2.17) are small. Consequently, only displacements smaller than the derivative scale (we use Gaussian derivatives at scale σs = 1 pixel for discrete signals) can be estimated reliably using this gradient-based method. Signals with displacements larger than one pixel should first be aligned to integer accuracy using other methods such as cross-correlation. A multi-resolution Taylor shift estimation [10] that estimates and refines the displacements from coarse to fine scales could also be used to increase the range of estimable displacements.

2.2.2 Bias of 1D gradient-based shift estimator

One problem that remains with the Taylor expansion method is that the shift is estimated from a set of approximations, not exact equations. The solution (2.19) is also derived from noise-free signals only. As a result, there is a systematic bias that depends on the image content, the noise, and the displacement itself. In fact, the bias is the combined effect of multiple model errors, such as the truncation of the Taylor series and imperfect signals.

Bias due to truncation of Taylor series

The most noticeable bias term in the estimated displacement is due to the truncation error of the Taylor series in (2.17):

$$\varepsilon = s_2(x) - s_1(x) - v_x \frac{\partial s_1}{\partial x} = \frac{v_x^2}{2!} \frac{\partial^2 s_1}{\partial x^2} + \frac{v_x^3}{3!} \frac{\partial^3 s_1}{\partial x^3} + \ldots \qquad (2.20)$$

Putting $s_2(x) - s_1(x) = v_x \frac{\partial s_1}{\partial x} + \varepsilon$ into the least-squares estimate (2.19), the truncation bias can be computed as:

$$\mathrm{bias}_T = v_x^{LS} - v_x = \frac{\sum_S \varepsilon\, \frac{\partial s_1}{\partial x}}{\sum_S \left(\frac{\partial s_1}{\partial x}\right)^2} = \frac{1}{3!} v_x^3\, \frac{\sum_S \frac{\partial^3 s_1}{\partial x^3} \frac{\partial s_1}{\partial x}}{\sum_S \left(\frac{\partial s_1}{\partial x}\right)^2} + \frac{1}{5!} v_x^5\, \frac{\sum_S \frac{\partial^5 s_1}{\partial x^5} \frac{\partial s_1}{\partial x}}{\sum_S \left(\frac{\partial s_1}{\partial x}\right)^2} + \ldots \qquad (2.21)$$

where the even-order terms vanish due to Parseval's theorem [147]:

$$\sum_S \frac{\partial^{2n} s}{\partial x^{2n}} \frac{\partial s}{\partial x} \;\propto\; \int_{-\infty}^{+\infty} f g^*\, dx = \frac{1}{2\pi} \int_{-\pi}^{\pi} F G^*\, d\omega = \frac{j(-1)^n}{2\pi} \int_{-\pi}^{\pi} \omega^{2n+1} |S(\omega)|^2\, d\omega = 0$$

where $s(x) \leftrightarrow S(\omega)$, $f(x) = \frac{\partial s(x)}{\partial x} \leftrightarrow F(\omega) = j\omega S(\omega)$, and $g(x) = \frac{\partial^{2n} s(x)}{\partial x^{2n}} \leftrightarrow G(\omega) = (-1)^n \omega^{2n} S(\omega)$.

Bias due to intensity noise

The formula for the truncation bias is applicable to shift estimation between two noiseless images. In reality, noise is always present and is a source of a much more significant bias term. We assume additive Gaussian noise, uncorrelated with the original image and uncorrelated between realizations:

$$s_1^*(x) = s_1(x) + n_1(x), \qquad s_2^*(x) = s_1(x + v_x) + n_2(x)$$

Under Gaussian noise, shift estimation by minimizing the MSE in (2.18) still holds, and so does the solution (2.19), with the exception that the image gradient must be derived from the noiseless signal s₁. Since s₁ is unavailable, the noisy signal s₁* is used instead. Although noise does not affect the truncation bias, it introduces another bias to the estimated shift:

$$\mathrm{bias}_N = v_x^* - v_x = \frac{\sum_S \left(s_2^*(x) - s_1^*(x)\right) \frac{\partial s_1^*}{\partial x}}{\sum_S \left(\frac{\partial s_1^*}{\partial x}\right)^2} - v_x = \frac{\sum_S \left(s_2(x) - s_1(x)\right) \frac{\partial s_1}{\partial x}}{\sum_S \left(\frac{\partial s_1}{\partial x}\right)^2 + \sum_S \left(\frac{\partial n_1}{\partial x}\right)^2} - v_x = \frac{-N}{S + N}\, v_x \qquad (2.22)$$

where $v_x^*$ is the estimated shift computed from the noisy signals $s_i^*$, $S = \sum \left(\frac{\partial s_1}{\partial x}\right)^2$ and $N = \sum \left(\frac{\partial n_1}{\partial x}\right)^2$.


It is interesting to see that the Taylor shift estimator always underestimates the displacement in the presence of noise. The bias due to noise is proportional to the shift magnitude. As a result, it is a more severe bias source than the Taylor series truncation, which is proportional to the cubic power of the shift.

Combining (2.21) and (2.22) yields the total bias of the 1D shift estimator:

$$\mathrm{bias} = -v_x\, \frac{\sum_S \left(\frac{\partial n_1}{\partial x}\right)^2}{\sum_S \left(\frac{\partial s_1}{\partial x}\right)^2 + \sum_S \left(\frac{\partial n_1}{\partial x}\right)^2} + \frac{1}{3!} v_x^3\, \frac{\sum_S \frac{\partial^3 s_1}{\partial x^3} \frac{\partial s_1}{\partial x}}{\sum_S \left(\frac{\partial s_1}{\partial x}\right)^2} + \frac{1}{5!} v_x^5\, \frac{\sum_S \frac{\partial^5 s_1}{\partial x^5} \frac{\partial s_1}{\partial x}}{\sum_S \left(\frac{\partial s_1}{\partial x}\right)^2} + O(v_x^7) \qquad (2.23)$$

This bias can be expressed as a polynomial in the shift and can therefore be eliminated if the shift goes to zero.

2.2.3 Bias of 2D gradient-based shift estimator

Similar to the 1D case, a 2D gradient-based shift estimator is based on a two-dimensional local Taylor series expansion:

$$I_2 = I_1 + v_x I_x + v_y I_y + \varepsilon \qquad (2.24)$$

where image I₂ is a shifted version of image I₁ by a small displacement (vx, vy), Ix and Iy are derivatives of I₁ in the x and y dimensions respectively, and ε is the approximation error. Minimizing the truncation error Σε² over a region of support S yields a set of equations for vx and vy:

$$\begin{bmatrix} \sum_S I_x^2 & \sum_S I_x I_y \\ \sum_S I_x I_y & \sum_S I_y^2 \end{bmatrix} \begin{bmatrix} v_x \\ v_y \end{bmatrix} = \begin{bmatrix} \sum_S I_x I_t \\ \sum_S I_y I_t \end{bmatrix} \qquad (2.25)$$

whose solution is:

$$v_x = \Big(\sum_S I_x I_t \sum_S I_y^2 - \sum_S I_y I_t \sum_S I_x I_y\Big) \Big/ \det(T), \qquad v_y = \Big(\sum_S I_y I_t \sum_S I_x^2 - \sum_S I_x I_t \sum_S I_x I_y\Big) \Big/ \det(T) \qquad (2.26)$$

where $I_t = I_2 - I_1$ is an approximation to the temporal derivative and det(T) is the determinant of the Gradient Structure Tensor (GST) in (2.25). Note that the same GST matrix appears in the Fisher information matrix of 2D shift estimation (equation 2.8).
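For illustration (an assumed implementation, not the thesis code), the closed-form solution (2.26) amounts to solving the 2 × 2 structure-tensor system (2.25):

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def gradient_shift_2d(I1, I2, deriv_sigma=1.0):
    """Least-squares sub-pixel shift between two images via (2.25)-(2.26)."""
    I1 = I1.astype(float)
    Ix = gaussian_filter(I1, deriv_sigma, order=(0, 1))   # derivative along x
    Iy = gaussian_filter(I1, deriv_sigma, order=(1, 0))   # derivative along y
    It = I2.astype(float) - I1                            # temporal derivative
    T = np.array([[(Ix * Ix).sum(), (Ix * Iy).sum()],
                  [(Ix * Iy).sum(), (Iy * Iy).sum()]])    # structure tensor
    b = np.array([(Ix * It).sum(), (Iy * It).sum()])
    return np.linalg.solve(T, b)                          # [vx, vy]
```

Like its 1D counterpart, this single-pass estimate carries the truncation and noise biases derived next.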

The truncation bias of the 2D gradient-based shift estimator can be derived analytically by putting $I_t = v_x I_x + v_y I_y + \varepsilon$ into (2.26) (the formula for $\mathrm{bias}_{Ty}$ is similar to that for $\mathrm{bias}_{Tx}$):

$$\mathrm{bias}_{Tx} = \Big(\sum_S \varepsilon I_x \sum_S I_y^2 - \sum_S \varepsilon I_y \sum_S I_x I_y\Big) \Big/ \det(T) \qquad (2.27)$$

where the even-order terms in the expression of $\sum_S \varepsilon I_x$ vanish due to Parseval's theorem:

$$\sum_S \varepsilon I_x = \tfrac{1}{6} v_x^3 \sum_S I_{xxx} I_x + \tfrac{1}{2} v_x^2 v_y \sum_S I_{xxy} I_x + \tfrac{1}{2} v_x v_y^2 \sum_S I_{xyy} I_x + \tfrac{1}{6} v_y^3 \sum_S I_{yyy} I_x + \ldots$$

The bias due to noise for the 2D gradient-based shift estimator is more complicated than that of the 1D case:

$$\mathrm{bias}_{Nx} = v_x^* - v_x = \frac{\sum_S I_x I_t \Big(\sum_S I_y^2 + \sum_S n_y^2\Big) - \sum_S I_y I_t \sum_S I_x I_y}{\Big(\sum_S I_x^2 + \sum_S n_x^2\Big)\Big(\sum_S I_y^2 + \sum_S n_y^2\Big) - \Big(\sum_S I_x I_y\Big)^2} - v_x = \frac{\sum_S I_x I_t \sum_S n_y^2}{\det(T) + N} - v_x \frac{N}{\det(T) + N} \qquad (2.28)$$

where $v_x^*$ is the estimated displacement computed from (2.26) using the derivatives of the noisy image $I_1^* = I_1 + n_1$; $n_x$ and $n_y$ are derivatives of $n_1$ in the x and y dimensions respectively. N is the over-estimation of det(T) due to noise:

$$N = \sum_S n_x^2 \sum_S I_y^2 + \sum_S n_y^2 \sum_S I_x^2 + \sum_S n_x^2 \sum_S n_y^2 \qquad (2.29)$$

Combining (2.27) and (2.28), the total bias of the 2D gradient-based shift estimator is:

$$\mathrm{bias}_x = \frac{\sum_S I_x I_t \sum_S n_y^2}{\det(T) + N} - \frac{N}{\det(T) + N}\, v_x + \frac{a v_x^3 + b v_x^2 v_y + c v_x v_y^2 + d v_y^3 + \ldots}{\det(T)} \qquad (2.30)$$

Although the 2D bias is not as simple as the 1D bias, it also reduces to zero as $v_x$ and $I_t$ approach zero. This suggests a strategy for bias elimination by iteratively reducing $v_x$ to zero. This iterative gradient-based shift estimator is described in the next subsection.

2.3 Optimal pairwise gradient-based registration


2.3.1 Iterative gradient-based shift estimator

With the analytic solution of the bias from the previous subsection, the shift estimate could be corrected by a bias subtraction. However, the shift biases in (2.23) and (2.30) are derived from the unknown true shift and a clean reference signal. As a result, a direct bias correction algorithm [20] that uses the inaccurate shift and the noisy reference frame will not yield an exact shift estimate. Instead, we opt for an indirect but effective bias correction via an iterative procedure. For a 1D signal, equation (2.22) reveals that the relative bias bias/vx ≈ −N/(S+N) is negative with a magnitude smaller than one, which indicates an underestimation of the real displacement. The bias is therefore guaranteed to shrink at an exponential rate if shift estimation is performed iteratively on a motion-corrected image pair. Since the bias can be reduced to practically zero, the iterative shift estimation converges to an unbiased solution.

The algorithm for 1D iterative shift estimation is described as follows (note that integer displacements should be corrected first, since the gradient-based method only handles sub-pixel displacements):

1. Initialize a counter i = 0, a temporary signal s₂⁰ = s₂, and an accumulated displacement vx = 0.

2. Perform a gradient-based shift estimation on s₁ and s₂ⁱ, which results in an incremental displacement vxⁱ.

3. Accumulate the latest shift increment: vx = vx + vxⁱ.

4. Warp the second signal towards the first signal using the accumulated displacement: s₂ⁱ⁺¹(x) = s₂(x) ∗ δ(x − vx).

5. Exit the loop if the incremental displacement vxⁱ is within the desired accuracy or a maximum number of iterations is reached; otherwise increment the counter, i = i + 1, and go back to step 2.
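The loop above can be sketched compactly as follows (an illustrative implementation, not the thesis code; the Fourier-domain warp in step 4 assumes a periodic signal):

```python
import numpy as np

def shift_signal(s, v):
    """Shift a periodic 1D signal by v samples via the Fourier shift theorem."""
    freqs = np.fft.fftfreq(len(s))
    return np.real(np.fft.ifft(np.fft.fft(s) * np.exp(2j * np.pi * freqs * v)))

def iterative_shift_estimate(s1, s2, max_iter=10, tol=1e-4):
    """Iterative gradient-based 1D shift estimation following steps 1-5."""
    vx, s2_i = 0.0, s2                        # step 1
    ds1 = np.gradient(s1)                     # spatial derivative of reference
    for _ in range(max_iter):
        dv = np.sum((s2_i - s1) * ds1) / np.sum(ds1 ** 2)   # step 2, eq. (2.19)
        vx += dv                              # step 3
        s2_i = shift_signal(s2, -vx)          # step 4: warp s2 towards s1
        if abs(dv) < tol:                     # step 5
            break
    return vx

# Example: recover a synthetic sub-pixel shift of 0.3 samples
x = np.linspace(0, 4 * np.pi, 256, endpoint=False)
s1 = np.sin(x) + 0.5 * np.sin(3 * x)
s2 = shift_signal(s1, 0.3)
print(iterative_shift_estimate(s1, s2))       # approximately 0.3
```

Each pass shrinks the residual displacement, and with it the bias, which is why only a couple of iterations are needed in practice.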

The iterative algorithm converges so fast that the relative bias is reduced to less than 0.5 percent for practical Signal-to-Noise Ratios (SNR ≥ 10 dB) after only 2 iterations:

$$\mathrm{bias}(1^{st}\ \mathrm{iter}) \approx \frac{N}{S+N}\, v_x \leq \frac{1}{11} \times 0.5 \approx 4.5 \times 10^{-2}$$

$$\mathrm{bias}(2^{nd}\ \mathrm{iter}) \approx \frac{N}{S+N}\, \mathrm{bias}(1^{st}) \approx \Big(\frac{N}{S+N}\Big)^2 v_x \leq \Big(\frac{1}{11}\Big)^2 \times 0.5 \approx 4.1 \times 10^{-3}$$

The iterative shift estimation is also very efficient because the least-squares solution (2.19) requires only one spatial and one temporal derivative. The spatial
