
Subband Coding of Images

Dissertation

for obtaining the degree of doctor at Delft University of Technology, by authority of the Rector Magnificus, prof. drs. P.A. Schenck, to be defended in public before a committee appointed by the Board of Deans on Thursday 26 October 1989 at 16:00.

by

Peter Hans Westerink

born in 's-Gravenhage

electrical engineer


This dissertation has been approved by the promotors prof. dr. ir. D.E. Boekee and prof. dr. ir. J. Biemond.


Contents

Summary

1 Introduction
  1.1 The digital communication system
  1.2 Rate distortion theory
  1.3 Image coding techniques
  1.4 Scope of the thesis

2 The Subband Filtering Problem
  2.1 The two-channel filter bank
    2.1.1 Decimation and interpolation
    2.1.2 The input/output relation
    2.1.3 Quadrature Mirror Filters
  2.2 The extension to two-dimensional signals
  2.3 Tree structures for images
  2.4 Boundary values
    2.4.1 Spatial domain description
    2.4.2 Symmetric boundary values

3 Vector Quantization Across the Subbands
  3.1 Vector quantization
    3.1.1 Problem formulation
    3.1.2 Codebook design
    3.1.3 Advantages of VQ over SQ
  3.2 Vector quantization across the subbands
  3.3 Asymptotic coding gain
  3.4 Predictive vector quantization across the subbands
  3.5 Coding results
  3.6 Conclusions

4 Subband Encoding Strategies
  4.1 Coding method
  4.2 Subband quantizer design
    4.2.1 Histogram fitting
    4.2.2 Lloyd-Max quantizers
    4.2.3 Coding of the quantizer outputs
  4.3 The splitting scheme
  4.4 Bit allocation
    4.4.1 The bit allocation problem
    4.4.2 Algorithm derivation
    4.4.3 Comparison with a sub-optimal solution
  4.5 Color images
    4.5.1 Color domain
    4.5.2 Color error criterion

5 Subband Encoding Results
  5.1 Monochrome images
    5.1.1 Coding examples
    5.1.2 Comparison to other coding techniques
  5.2 Color images
  5.3 Conclusions

6 Scalar Quantization Error Analysis
  6.1 Introduction
  6.2 The gain-plus-additive-noise quantizer model
  6.3 Incorporation of the quantizer model into a 1D QMF scheme
  6.4 The extension to 2D
  6.5 Error analysis for monochrome images
  6.6 Error analysis for color images

7 Special Topics
  7.1 Progressive transmission
    7.1.1 Introduction
    7.1.2 Image display size
    7.1.3 Progressive transmission scheme
    7.1.4 Transmission order
    7.1.5 Examples
    7.1.6 Conclusions
  7.2 Subband coding of image sequences
    7.2.1 Introduction
    7.2.2 General coding scheme
    7.2.3 Intraframe subband coding
    7.2.4 Scheme without motion compensation
    7.2.5 Color image sequences
    7.2.6 Experimental results
    7.2.7 Conclusions
  7.3 Channel error protection
    7.3.1 Introduction
    7.3.2 Channel error distortion measure
    7.3.3 Codeword assignment
    7.3.4 Unequal error protection
    7.3.5 Experimental results
    7.3.6 Conclusions

Bibliography

A Color Pictures
B Test Image Description
C Filter Coefficients
D Quantization Tables

Samenvatting
Acknowledgements
Curriculum Vitae


Summary

The need for both data communication and data storage can be seen to be growing enormously. In digital image communication we have applications such as the videophone, teleconferencing, digital TV and the transmission of still pictures over, for example, telephone lines (e.g. newspaper pictures). Digital image storage systems are concerned with, for instance, digital video recording (e.g. CD video) or storage of medical images such as CT scans. These two application areas have as a common problem that images or image sequences need to be transmitted or stored as efficiently as possible. This means that both image transmission and image storage schemes attempt to minimize the amount of information necessary to adequately represent an image. This general problem is addressed by the field of image coding. In this thesis we are concerned with the digital encoding of images for data compression. In particular, the subband coding technique is discussed.

Subband coding was introduced by Crochiere et al. in 1976 for speech. Since then, considerable attention has been devoted to subband coding of speech and it has proven to be a powerful technique for the medium and low bandwidth source encoding of speech. The basic idea of subband coding is to split up the signal frequency band into subbands and to encode each band with a coder and bit rate accurately matched to the statistics of that particular band. For speech coding two advantages of taking this approach have been found to be important. Firstly, the quantization noise generated in a particular band is largely confined to that particular band in the reconstruction and is not allowed to spread to other bands. This prevents a low-level input in one frequency range from being masked by the quantization noise in another frequency range. Secondly, by dynamically assigning bits to each subband the noise spectrum can be shaped according

(9)

to the subjective noise perception of the human ear ("noise shaping"). The problem of splitting a signal into subbands and then reconstructing it again is usually looked upon as a pure signal processing problem. The exact signal characteristics and the nature of the application are not given any attention. Effectively this means that errors due to encoding, transmission and decoding are neglected. In that case the splitting and reconstruction stages become attached and the design is directed at perfect or nearly perfect reconstruction. For splitting a one-dimensional signal such as speech into two subbands the "Quadrature Mirror Filter" (QMF) technique was introduced by Croisier, Esteban and Galand. In this technique the signal is split using "Finite Impulse Response" (FIR) filters that are relatively easy to design and to implement. The reconstruction is nearly perfect in the absence of coding errors. An overview of this one-dimensional two-subband case is given. Following Vetterli, the extension to two dimensions is made, where a two-dimensional signal is split into four subbands. In order to be able to obtain more than four subbands, splitting tree structures are introduced. The actual implementation of the filter banks and the necessary boundary values are discussed. By using symmetrical boundary values it is shown that extra coding errors due to intensity jumps at the subband boundaries can be avoided.

The encoding of the subbands is usually carried out for each subband separately. However, it is possible to jointly encode the subbands by designing a "Vector Quantizer" (VQ) across the subbands. In this manner the subband encoding scheme becomes very elementary and of low complexity. There is no need for a bit allocation algorithm or a variance estimation. A theoretical analysis for asymptotically high bit rates indicates a considerable gain from using VQ instead of scalar quantizers. In practice, however, when the bit rates are typically low, the gain is much smaller. The actual performance of the low complexity subband coding scheme is shown to be comparable to that of other existing encoding techniques, such as the block DCT encoding of images.

A scheme that is more adaptive to changing image statistics is relatively easily designed by separately encoding the subbands. To design such a scheme, it is first necessary to investigate which existing splitting filter can be used. Also, several image frequency band division schemes are discussed. To design the subband encoders the subband statistics are measured. It is shown that the low-pass subband can be efficiently


encoded using a two-dimensional DPCM encoder. The remaining subbands are almost uncorrelated and it is therefore sufficient to encode these with a Lloyd-Max quantizer. From histogram measurements it is concluded that the probability density function (pdf) on which the quantizers are to be based is not Laplacian. In fact, the histograms are narrower and more peaked. Therefore, the "Generalized Gaussian" pdf is introduced, which has a shape parameter c determining the width of the pdf. For c = 2.00 the pdf becomes the Gaussian pdf, while c = 1.00 would imply a Laplacian pdf. Histogram fitting experiments demonstrate that for c = 0.50 the best fits to the subband histograms are obtained. For the prediction error of the low-pass subband the value of c is close to 0.75. After quantizer design the probabilities of the quantizer representation levels are taken into consideration. By applying variable length encoding the performance of the subband coder can be further improved.
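For reference, the generalized Gaussian family can be written down in a few lines. The sketch below (in Python, our choice here rather than anything used in the thesis) uses the common parameterization in terms of the standard deviation `sigma` and the shape parameter c:

```python
import math

def gen_gaussian_pdf(x, sigma, c):
    """Generalized Gaussian pdf with standard deviation sigma and shape
    parameter c: c = 2 gives the Gaussian pdf, c = 1 the Laplacian, and
    values around c = 0.5 the narrower, more peaked subband histograms."""
    eta = (1.0 / sigma) * math.sqrt(math.gamma(3.0 / c) / math.gamma(1.0 / c))
    return (c * eta / (2.0 * math.gamma(1.0 / c))) * math.exp(-(eta * abs(x)) ** c)
```

For c = 2 this reduces to the familiar Gaussian density, e.g. gen_gaussian_pdf(0, 1, 2) equals 1/sqrt(2*pi).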

By assuming that the total mean-squared error (MSE) between the original image and the reconstruction is equal to the sum of the separate MSE values for each subband, a bit allocation algorithm is derived that optimally distributes the available bits over the subbands. In contrast to existing algorithms the assumption of high bit rates is not necessary and the algorithm can be applied to quantizers with non-integer bit rates as well. The extension to color images shows that the adaptive bit allocation algorithm often assigns only one quarter of the available bits to the chrominance subbands. The remaining three-quarters of the bits are spent on the luminance subbands.
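The additivity of the subband MSE values is what makes bit allocation tractable. As an illustration only (not the thesis's algorithm, which avoids the high-rate assumption), a greedy allocation under the standard high-rate model D_i = v_i * 2^(-2*b_i) can be sketched as:

```python
def allocate_bits(variances, total_bits):
    """Greedily hand out bits one at a time, each time to the subband whose
    MSE drops the most, assuming the high-rate model D_i = v_i * 2**(-2*b_i)
    and total distortion equal to the sum of the per-subband MSE values."""
    bits = [0] * len(variances)
    for _ in range(total_bits):
        # MSE reduction obtained by giving subband i one extra bit:
        # v * 2**(-2b) - v * 2**(-2(b+1)) = 0.75 * v * 2**(-2b)
        gains = [0.75 * v * 2.0 ** (-2 * b) for v, b in zip(variances, bits)]
        bits[gains.index(max(gains))] += 1
    return bits
```

For variances 16 and 1 and a total of four bits this yields the allocation (3, 1), which matches the classical closed-form high-rate solution for this case.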

Results of subband coding and comparisons to other coding techniques show that subband coding is favorable both in an objective and in a subjective sense. For the MSE it is shown that by using subband coding slightly higher bit rate reductions can be achieved than when using transform coding. Further, the type of distortion that occurs with subband coding looks more natural than the blocking effect that is known to occur with transform coding and is therefore less annoying. Therefore, in a subjective sense subband coding is usually perceived as preferable to block transform coding techniques such as the block DCT encoding. Examples of subband coding of 512x512 color images show high quality images at a total of only 0.8 bits per pixel.

The subband filtering design and the actual encoding of the subbands are generally viewed as two separate problems. However, by introducing


a mathematical model for a Lloyd-Max quantizer it is possible to analyze the quantization errors and their impact on the filtering. By modeling the quantizer with the "gain-plus-additive-noise" model four different types of errors can be distinguished, which leads to more insight into the structure of image subband coders using Lloyd-Max quantizers. The first error, which is defined as the QMF design error, always occurs, even in the absence of coding. However, this type of error can be neglected. A "signal error" determines the sharpness of the reconstructed image. A "random error", uncorrelated with the original image, is responsible for the "muddy" appearance of low bit rate encoded images and is most visible in the vicinity of edges and in flat (low frequency) areas. The aliasing error due to coding is shown to be negligible in comparison to both the signal and the random error. Finally, it is concluded that a spatially adaptive encoding technique will decrease both the signal and random error effects and therefore will improve the subband coder performance.

Finally, some special topics in the area of subband coding are elaborated. Firstly, subband coding is shown to be very suitable for the progressive transmission of images. A subband coding scheme is designed and examples of progressively transmitted images are shown. Issues such as side information and image display size are discussed. Next, the extension to subband coding of image sequences is made. Applications range from videophone, teleconferencing and standard TV to high definition TV (HDTV). Also for these new areas subband coding seems to be a very promising technique. By designing a subband coder for low bit rate coding it will be demonstrated that a trade-off between spatial and temporal resolution is made automatically, as a function of the amount of motion within the sequence. This property is closely related to the characteristics of the human visual system. Finally, the impact of channel errors on the subband encoded data is investigated. A dynamic error protection scheme is devised, protecting the subbands and the bits within each subband according to their importance.


Chapter 1

Introduction

In this thesis we are concerned with the digital encoding of images for data compression. In particular, the subband coding technique will be discussed. In subband coding of images the image frequency band is split up into subbands after which each subband is encoded separately using a coder and bit rate accurately matched to the statistics of that particular band. The design of a subband coding scheme can initially be divided into two separate problems, the subband splitting (and reconstruction) and the encoding/decoding of the subbands. Here, we will deal with both problems, but the emphasis will be on the coding of the subbands. However, the interaction between the two subjects will be analyzed as well.

In this first chapter, the general concept of image coding is discussed and some of the most commonly used image coding techniques are summarized. First, in Section 1.1 the problem of image coding is introduced and a description of a general digital image communication system is given. Next, Section 1.2 will provide a brief description of rate distortion theory and, as distortion measures, the definition of the mean-squared error and of the signal-to-noise ratio will be given. In Section 1.3 some existing image coding techniques are briefly reviewed. Finally, the concept of subband coding will be introduced in Section 1.4.

1.1 The digital communication system

The need for both data communication and data storage can be seen to be growing enormously. In digital image communication we have applications such as videophone, teleconferencing, digital TV and the transmission of


still pictures over, for example, telephone lines (e.g. newspaper pictures). Digital image storage systems are concerned with, for instance, digital video recording (e.g. CD video) or storage of medical images such as CT scans. These two application areas have as a common problem that images or image sequences need to be transmitted or stored as efficiently as possible. This means that both image transmission and image storage schemes attempt to minimize the amount of information necessary in order to adequately represent an image. This general problem is addressed by the field of image coding.

A block diagram of a digital communication system is shown in Figure 1.1. Here, the source generates the information which is to be transmitted through a channel to the destination, which is sometimes also referred to as the user.

Figure 1.1: Block diagram of a digital communication system (source, source encoder, channel encoder, channel, channel decoder, source decoder, destination).

The channel can be regarded as a transmission path (e.g. a telephone line) as well as a storage medium (e.g. an optical disk). Typically, the source within this digital communication scheme is a sampled signal. In the case of image coding the source will be a sampled image, where the samples of the image are called "picture elements" or "pixels". It must be noted here that the pixels need not necessarily be amplitude-discrete. However, in most situations we see that the pixels have already


been digitized in typically 8 bits per pixel or even 12 bits per pixel for medical images such as X-rays.

In Figure 1.1 it is the task of the source encoder to reduce the amount of data of the source image. The inverse operation, the reconstruction of the image, is performed in the source decoder. Basically, the source encoding can be divided into two parts. Firstly, the removal of statistical redundancy from the image. This operation is also called data compression and is reversible and therefore lossless. Data compression can be achieved by using variable length encoding. The theory dealing with the topic of data compression is also called Information Theory [Blah87]. Secondly, in data reduction part of the information content is removed. Because of the loss of information this is an irreversible operation. In that case a certain amount of distortion is always introduced, but the amount of data can be reduced more than in the case of data compression. The source encoding stage, data compression in combination with data reduction, will often also be referred to simply as data compression, indicating a compression of the amount of data. A mathematical basis for this data compression problem is provided by "Rate Distortion Theory" [Berg71], which will be briefly reviewed in the next section.

In practice the channel in Figure 1.1 is subject to various types of noise, distortion and interference. Because of that the channel output may differ from the channel input. After data compression the channel encoder therefore again adds some redundancy to the data, in the form of parity checks. After receiving the data the channel decoder then exploits this redundancy to correct (or detect) errors and produce an estimate of the compressed data. This subject is also a part of the field of information theory [Blah87].

In general, source coding and channel coding are two different subjects and in practice these two parts of a communication system are designed separately. In this thesis we will mainly deal with the source encoding of images by means of subband coding. However, the design of a channel encoder where the source characteristics are taken into account will also briefly be investigated.


1.2 Rate distortion theory

The source encoder/decoder pair in Figure 1.1 must be designed for minimum distortion between the original and the reconstructed image at a given amount of data after compression. In the case that no distortion at all is tolerated the lower bound on the performance of any encoding scheme is given by the entropy H of the source. However, in most applications a certain degree of distortion is allowed. Then the optimal performance that is theoretically attainable by an image encoding scheme is given by the "rate-distortion function" R(D), which is dealt with in "Rate Distortion Theory" [Berg71]. More specifically, R(D) is the minimum amount of information needed in order to reproduce the source with a given distortion

D. A typical rate-distortion function is sketched in Figure 1.2.

Figure 1.2: A typical rate-distortion function R(D).

It can be

seen that for D = 0 we have R(0) = H, the entropy of the source. Note that sources that are not amplitude-discrete can never be reproduced with zero distortion, and in that case the rate-distortion curve tends to infinity as D approaches 0. The rate-distortion function R(D) provides a theoretical lower bound to the performance of any source encoding system.

Given the communication channel, its capacity C is defined as the maximum amount of information that can reliably be sent through the channel. For the design of a communication system it is then important to know


under which conditions it is possible to send a certain amount of informa­ tion through a communication channel. The requirements are given by the channel encoding theorem [Berg71,Boek88]:

A source can be reproduced at the destination with an average distortion D + ε, if for the capacity C of a communication channel it holds that:

C > R(D) + δ,

where both ε and δ are arbitrarily small. It is not possible to reproduce a source with average distortion D if C < R(D).

The consequence of this theorem is that for a given source and channel the minimum average distortion D that can be attained is derived from

C = R(D).

Although much work has been devoted to the development of rate distortion theory, practical results are known only for a very few special cases

[Berg71]. For instance, if the sample amplitude of a one-dimensional memoryless source is distributed according to a Gaussian probability density function, then for a mean-squared error criterion (which will be defined in the following) the rate-distortion function R(D) is given by:

R(D) = (1/2) log2(σ²/D)  for 0 < D ≤ σ²,
R(D) = 0                 for D > σ².        (1.1)
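Eq. 1.1 is straightforward to evaluate numerically; a small sketch (the function name is ours):

```python
import math

def rate_distortion_gaussian(D, var):
    """R(D) in bits per sample for a memoryless Gaussian source with
    variance var under the mean-squared error criterion (Eq. 1.1)."""
    if D <= 0.0:
        raise ValueError("R(D) grows without bound as D approaches 0")
    return 0.5 * math.log2(var / D) if D < var else 0.0
```

Note that each additional bit halves the attainable distortion: R(var/4) = 1 bit, R(var/16) = 2 bits.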

In cases where the rate distortion function is not explicitly known, often bounds can be derived, such as the Shannon lower bound [Berg71]. For discrete memoryless sources with arbitrary probability density functions it is also possible to compute the R(D) function numerically, by means of an algorithm by Blahut [Blah72]. However, for complex sources, such as images, it is usually not possible to calculate the rate distortion function and the performance of data compression schemes needs to be evaluated experimentally.

Finally, we will define a distortion measure that will be used throughout this thesis to both optimize and evaluate the designed data compression


schemes. The most widely used distortion measure D to determine the fidelity of a reconstructed M x N image is the mean-squared error (MSE), which is calculated by

MSE = (1/(MN)) Σ_{m=1}^{M} Σ_{n=1}^{N} [x(m,n) − x̂(m,n)]²,   (1.2)

where x(m,n) and x̂(m,n) represent the original and the reconstructed pixels, respectively. Related to the MSE for images, two versions of the "Signal-to-Noise Ratio" (SNR) exist. These are defined as

SNR = 10 log10(σx²/MSE)   (dB),   (1.3)

and

SNR255 = 10 log10(255²/MSE)   (dB),   (1.4)

where σx² is the variance of the original image and 255 is the "peak-to-peak" value of the original image data. In the case of Eq. 1.4 the image pixels are assumed to be uniformly quantized with 256 levels (8 bits per pixel), ranging from 0 to 255.
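Eqs. 1.2-1.4 translate directly into code; a minimal sketch (NumPy assumed, with x̂ written as `x_rec`):

```python
import numpy as np

def mse(x, x_rec):
    """Mean-squared error between original and reconstruction (Eq. 1.2)."""
    x = np.asarray(x, dtype=float)
    x_rec = np.asarray(x_rec, dtype=float)
    return float(np.mean((x - x_rec) ** 2))

def snr(x, x_rec):
    """SNR in dB relative to the variance of the original image (Eq. 1.3)."""
    return 10.0 * np.log10(np.var(np.asarray(x, dtype=float)) / mse(x, x_rec))

def snr255(x, x_rec):
    """Peak SNR in dB for 8-bit data with peak-to-peak value 255 (Eq. 1.4)."""
    return 10.0 * np.log10(255.0 ** 2 / mse(x, x_rec))
```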

In this thesis we will apply the MSE criterion to optimize the subband coding scheme. By doing so we profit from some advantages of the MSE when applied in subband coding. Firstly, the MSE is mathematically tractable and can be computed easily. Secondly, the overall MSE between input and result image will be (approximately) equal to the sum of the MSE values per subband. This will turn out to be a property that can very well be used to design a bit allocation algorithm which optimally distributes the available bits over the subbands. Applying other distortion measures per subband or to the image will generally not allow for such an easy bit allocation approach. Further, quantizers based on minimizing the MSE, also known as Lloyd-Max quantizers, are relatively easy to design. Finally, it will turn out that in comparison with a subjective judgement the MSE measure performs rather well for the type of errors in subband encoded images.

Other known distortion measures for images are often based on models of the human visual system (HVS) and can be very complex. For example, in [Mann74] a visual fidelity criterion is suggested based on the frequency response of the HVS. Other measures are based on spatial domain characteristics [Netr80]. In contrast to the MSE, which is a global measure, these types of measures often incorporate local distortion measurements.


1.3 Image coding techniques

The source coding/decoding algorithm of Figure 1.1 tries to reduce the image bit rate as much as possible with minimum distortion. For that purpose a large variety of algorithms is available. In this section a short review will be given on some existing image coding techniques. It must be noted here that the survey is by no means complete and the intention is merely to give an introduction into the field of image coding. A review on image coding where a number of techniques is discussed can be found in [Jain81a]. Other recent books on image coding are by Jayant and Noll [Jaya84] and by Netravali and Haskell [Netr88].

The most elementary image coding method is related to the quantization of each pixel into a finite number of representation (reconstruction) levels. Each level is represented by a codeword which is sent to the decoder, where the pixel value is reconstructed by inserting the representation level corresponding to the received codeword. The most well-known quantizer is the Lloyd-Max quantizer, where the representation levels are chosen to minimize the mean-squared error between quantizer input and output [Jain81a,Jaya84]. The bit rate of an image using plain quantization can be reduced to approximately 4 to 6 bits per pixel with an acceptable level of visual distortion. The process of sampling, quantization and assigning codewords to the representation levels is also called Pulse Code Modulation (PCM).
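A Lloyd-Max quantizer can be obtained by alternating the two necessary optimality conditions: decision thresholds midway between neighbouring representation levels, and each level equal to the centroid of its decision region. The sketch below runs that iteration on empirical training samples (a simplification of ours; the design from a fitted pdf is treated in Chapter 4):

```python
import numpy as np

def lloyd_max(samples, levels, iters=100):
    """Design a Lloyd-Max (minimum-MSE) scalar quantizer from training data
    by alternating the two optimality conditions."""
    x = np.sort(np.asarray(samples, dtype=float))
    reps = np.linspace(x[0], x[-1], levels)      # initial representation levels
    for _ in range(iters):
        thresholds = 0.5 * (reps[:-1] + reps[1:])
        region = np.searchsorted(thresholds, x)  # region index of every sample
        for j in range(levels):
            cell = x[region == j]
            if cell.size:
                reps[j] = cell.mean()            # centroid of the region
    return reps

def quantize(values, reps):
    """Map each value to its nearest representation level."""
    values = np.asarray(values, dtype=float)
    return reps[np.argmin(np.abs(values[..., None] - reps), axis=-1)]
```

For uniformly distributed training data on [0, 1] and two levels, the iteration converges to representation levels near 0.25 and 0.75, as expected.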

One type of statistical redundancy within an image consists of a statistical dependency between pixels. The simple quantization technique did not make use of this property and therefore could not remove this redundancy. A well-known technique that does exploit the dependencies between pixels is "Differential PCM" (DPCM). A scheme of a DPCM encoder/decoder pair is shown in Figure 1.3. The incoming pixel is first predicted in the "predictor", which usually consists of a linear prediction using pixels surrounding the pixel to be predicted. The prediction is subtracted from the actual pixel value and the error is quantized. To avoid error propagation in the decoder the quantizer is situated inside the loop and its outputs are used for the prediction of future input samples. Therefore, the prediction can only be based on pixels that have already been quantized and no future samples may be used for prediction. This important issue in prediction is called causality. For one-dimensional (1D) signals that are sampled in time the prediction of a present sample is based on past samples. In two-dimensional prediction causality is usually guaranteed by considering past, present and future samples in a line-by-line scanning of the image. The concept of causality and prediction for images is widely discussed in a recent book by Jain [Jain89]. Two-dimensional DPCM encoders for images may achieve 1 to 3 bits per pixel at a minimum.

Figure 1.3: Encoding (left) and decoding (right) scheme of DPCM.
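The closed-loop structure described above, with the quantizer inside the prediction loop, can be sketched in one dimension as follows (a uniform quantizer and a previous-sample predictor are our simplifying assumptions):

```python
import numpy as np

def dpcm_encode(x, step):
    """1-D DPCM: predict each sample by the previous *reconstructed* sample,
    quantize the prediction error with a uniform quantizer of the given step
    size, and keep the quantizer inside the loop to avoid error propagation."""
    x = np.asarray(x, dtype=float)
    q_errors = np.empty_like(x)
    pred = 0.0
    for n in range(x.size):
        e = x[n] - pred
        q = step * np.round(e / step)   # quantized prediction error
        q_errors[n] = q
        pred += q                       # same reconstruction the decoder forms
    return q_errors

def dpcm_decode(q_errors):
    """Reconstruct by accumulating the quantized prediction errors."""
    return np.cumsum(q_errors)
```

Because encoder and decoder run the same loop, the reconstruction error at each sample is bounded by half the quantizer step and does not accumulate.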

Using "pyramid structures" [Burt83] the achievable bit rate can be brought down to around 1.0 bits per pixel. In this technique a pyramid is built by first convolving the image with a weighting function. The resulting low-pass image is next used as a prediction by subtracting it from the original image. This prediction error image is then to be encoded. The next level of the pyramid is made by subsampling the low-pass image. Subsequent lower levels are constructed by repeating the process of filtering, subtracting and downsampling on the downsampled image. By encoding each prediction error image separately, a data reduction can be achieved at relatively low distortions. Because of the low-pass filtering and the downsampling involved, this technique resembles the subband coding method as will be discussed in this thesis. However, one of the major differences lies in the fact that the number of data samples after splitting an image into subbands equals the number of pixels, whereas in pyramid encoding more samples are generated that need to be encoded. Just as in subband coding, this scheme is also particularly suitable for the progressive transmission of images.
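A toy version of the pyramid construction makes the sample expansion explicit (a 2x2 box filter stands in here for the weighting function of [Burt83]):

```python
import numpy as np

def build_pyramid(img, depth):
    """Build a pyramid of prediction error images: at each level a low-pass
    image is subtracted from the current image, and the subsampled low-pass
    image is passed on to the next (coarser) level."""
    levels = []
    cur = np.asarray(img, dtype=float)
    for _ in range(depth):
        # 2x2 box low-pass combined with 2:1 downsampling in both directions
        low = 0.25 * (cur[0::2, 0::2] + cur[1::2, 0::2] +
                      cur[0::2, 1::2] + cur[1::2, 1::2])
        expanded = np.kron(low, np.ones((2, 2)))  # nearest-neighbour upsample
        levels.append(cur - expanded)             # prediction error image
        cur = low
    levels.append(cur)                            # coarsest low-pass image
    return levels

def reconstruct(levels):
    """Invert the pyramid: upsample the coarse image, add back the errors."""
    cur = levels[-1]
    for err in reversed(levels[:-1]):
        cur = np.kron(cur, np.ones((2, 2))) + err
    return cur
```

Note that the pyramid holds more samples than the image (for a 4x4 image and two levels: 16 + 4 + 1 = 21), which is precisely the difference with subband splitting mentioned above.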


Other image coding techniques are explicitly based on properties of the human visual system (HVS). Often they try to exploit sensitivities of the eye to certain structures [Grah67] or rely on transforms related to the operation of neurons [Heid85,Saki82]. A review is given by Kunt et al. of techniques that use contour-texture descriptions and of methods that use local operators and combine their outputs in a suitable way [Kunt85].

A direct extension from the quantization of 1D signals to multi-dimensional signal quantization is "Vector Quantization" (VQ), or "pattern matching quantization". A review of VQ, including examples of image coding, is given by Gray [Gray84]. In this technique an image is subdivided into blocks of, for instance, 4x4 pixels. Each block, which can also be considered as a 16-dimensional vector, is next compared with a limited set of representative blocks, also known as the codebook. The block is encoded/decoded with the best matching block. Depending on the robustness of the codebook, bit rates of 0.60 to 0.80 bits per pixel can be achieved. In Chapter 3 VQ will be discussed in more detail.
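The block matching step can be sketched as follows; the codebook itself is assumed given (its design is the subject of Section 3.1.2):

```python
import numpy as np

def vq_encode(img, codebook, bs=4):
    """Replace each bs x bs block by the index of the best-matching
    (minimum squared error) codebook vector."""
    h, w = img.shape
    indices = np.empty((h // bs, w // bs), dtype=int)
    for i in range(0, h, bs):
        for j in range(0, w, bs):
            block = img[i:i + bs, j:j + bs].reshape(-1)
            dists = np.sum((codebook - block) ** 2, axis=1)
            indices[i // bs, j // bs] = int(np.argmin(dists))
    return indices

def vq_decode(indices, codebook, bs=4):
    """Reconstruct by inserting the codebook block for each received index."""
    h, w = indices.shape[0] * bs, indices.shape[1] * bs
    out = np.empty((h, w))
    for i in range(indices.shape[0]):
        for j in range(indices.shape[1]):
            out[i * bs:(i + 1) * bs, j * bs:(j + 1) * bs] = \
                codebook[indices[i, j]].reshape(bs, bs)
    return out
```

With a 256-entry codebook and 4x4 blocks, each block costs 8 bits, i.e. 0.5 bits per pixel.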

Image encoding schemes that yield high compression ratios at a relatively low distortion can be obtained by applying block transform coding [Clar85]. In that technique the image is divided into blocks of, for instance, 8x8 pixels. Next, each block is transformed using a two-dimensional orthogonal transform that tries to decorrelate the block elements. The most frequently applied transform is the "Discrete Cosine Transform" (DCT) [Chen77,Chen84]. The transform coefficients are encoded, transmitted and decoded. For the reconstruction the inverse transform is applied. Using these techniques it is possible to compress the image to between approximately 0.5 and 1.0 bits per pixel. However, due to the division of the image into blocks, the distortion in the reconstruction will be most visible at the block boundaries, where it manifests itself as a "blocking effect". Note that the technique of vector quantization as described above will result in the same artefacts. This type of distortion is experienced by the human observer as unnatural and can therefore be quite annoying.
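A minimal block transform/inverse pair illustrates the idea (an orthonormal 8x8 DCT-II; the quantization and coding of the coefficients is omitted here):

```python
import numpy as np

def dct_matrix(n=8):
    """Orthonormal DCT-II basis matrix; rows are the basis vectors."""
    k = np.arange(n)
    C = np.cos(np.pi * (2 * k[None, :] + 1) * k[:, None] / (2 * n))
    C[0] *= 1.0 / np.sqrt(2.0)
    return C * np.sqrt(2.0 / n)

def block_dct2(img, n=8):
    """Apply the separable 2-D DCT to each n x n block of the image."""
    C = dct_matrix(n)
    out = np.empty_like(img, dtype=float)
    for i in range(0, img.shape[0], n):
        for j in range(0, img.shape[1], n):
            out[i:i + n, j:j + n] = C @ img[i:i + n, j:j + n] @ C.T
    return out

def block_idct2(coef, n=8):
    """Inverse transform; since C is orthogonal, C.T inverts C."""
    C = dct_matrix(n)
    out = np.empty_like(coef, dtype=float)
    for i in range(0, coef.shape[0], n):
        for j in range(0, coef.shape[1], n):
            out[i:i + n, j:j + n] = C.T @ coef[i:i + n, j:j + n] @ C
    return out
```

Because the transform is orthogonal, the round trip is exact; compression comes entirely from quantizing and entropy coding the (largely decorrelated) coefficients.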


1.4 Scope of the thesis

Subband coding was introduced by Crochiere et al. in 1976 [Croc76] for speech. Since then, considerable attention has been devoted to subband coding of speech and it has proven to be a powerful technique for the medium and low bandwidth source encoding of speech. The basic idea of subband coding is to split up the signal frequency band into subbands and to encode each band with a coder and bit rate accurately matched to the statistics of that particular band. For speech coding two advantages of taking this approach have been found to be important. Firstly, the quantization noise generated in a particular band is largely confined to that particular band in the reconstruction, and is not allowed to spread to other bands. This prevents a low-level input in one frequency range from being masked by the quantization noise in another frequency range. Secondly, by dynamically assigning bits to each subband the noise spectrum can be shaped according to the subjective noise perception of the human ear ("noise shaping").

The general subband coding scheme is as shown in Figure 1.4. Typically, the input signal is first split into subbands. Next the subbands are encoded, transmitted (or stored) and decoded. Finally, the signal is reconstructed again using the decoded subbands. In this case the channel in Figure 1.4 represents the communication channel including the channel encoder and decoder.


Figure 1.4: General subband coding scheme.

The problem of splitting a signal into subbands and then reconstructing it again is usually looked upon as a pure signal processing problem. The exact signal characteristics and the nature of the application are not given any attention. Effectively this means that errors due to encoding, transmission and decoding are neglected. In that case the splitting and reconstruction

(22)

blocks in Figure 1.4 become attached and the design is directed at perfect, or nearly perfect reconstruction. For splitting a one-dimensional signal into two subbands the "Quadrature Mirror Filter" (QMF) technique was intro­ duced by Croisier, Esteban and Galand [Croi76,Este77]. In this technique the signal is split using "Finite Impulse Response" (FIR) filters that are relatively easy to design and to implement. The reconstruction is nearly perfect in the absence of coding errors. The QMF method and other, perfect reconstruction techniques are reviewed by Vaidyanathan [Vaid87]. The splitting and reconstruction of one-dimensional signals is dealt with in Chapter 2 of this thesis. Besides that, the necessary extension to two dimensions is given, which was first published by Vetterli [Vett84]. Finally, the implementation of the filter banks and the image boundary values are discussed.

The encoding of the subbands is usually carried out per subband separately. However, it is possible to jointly encode the subbands by designing a "Vector Quantizer" (VQ) across the subbands. This will be done in Chapter 3. Separate encoding of the subbands will be investigated in Chapter 4. For that purpose the subband statistics will be measured and the quantizers designed. A bit allocation algorithm is next derived to optimally distribute the available bits over the subbands. Also the extension to color images will be made.

Results of subband coding and comparisons to other coding techniques are given in Chapter 5. It is shown that by using subband coding, slightly higher bit rate reductions can be achieved than with transform coding. Moreover, the type of distortion that occurs with subband coding looks more natural than the blocking effect and is therefore less annoying. In a subjective sense, subband coding is therefore usually perceived as preferable to block transform coding techniques such as block DCT encoding.

As mentioned above, the subband filtering design and the actual encoding of the subbands are generally viewed as two separate problems. However, by introducing a mathematical model for a Lloyd-Max quantizer it is possible to analyze the quantization errors and their impact on the filtering. In Chapter 6 it is shown how, by doing so, different types of errors can be distinguished, which leads to more insight into the structure of image subband coders using Lloyd-Max quantizers.

Finally, in Chapter 7 some special topics in the area of subband coding are elaborated. Firstly, subband coding is shown to be very suitable for the progressive transmission of images. A subband coding scheme will be given and examples of progressively transmitted images are shown. Next, the extension to subband coding of image sequences is made. Applications range from videophone, teleconferencing and standard digital TV to high definition TV (HDTV). Also for these new areas subband coding seems to be a very promising technique. By designing a subband coder for low bit rate coding it will be demonstrated that a trade-off between spatial and temporal resolution is made automatically, as a function of the amount of motion within the sequence. This property is closely related to the characteristics of the human visual system. Finally, the impact of channel errors on the subband encoded data is investigated. A dynamic error protection scheme will be devised, protecting the subbands and the bits within each subband according to their importance.


Chapter 2

The Subband Filtering Problem

In this chapter we will deal with the problem of splitting an image into subbands, and with the closely related reconstruction stage. First, in Section 2.1, the elementary two channel filter bank for one-dimensional (1D) signals will be described. Since it is a fundamental part of the filter bank structure, the subjects of decimation and interpolation will be briefly reviewed. The equations for splitting and reconstruction will then be given, and under the assumption that the coding errors are negligible the input/output relation of the filter bank will be derived. In the case of perfect splitting and reconstruction filters it can easily be shown that this relation yields perfect reconstruction. However, in practical cases, when for instance FIR filters are used for splitting and reconstruction, the reconstructed signal will suffer from aliasing and special measures have to be taken to remove these errors. A technique will be described that explicitly cancels out the aliasing errors. This method is known as the Quadrature Mirror Filter (QMF) technique and was first introduced by Croisier et al. in 1976 [Croi76]. The extension to two-dimensional signals that is required for images will be made for the QMF technique in Section 2.2. The multi-dimensional splitting and reconstruction problem is then reduced to the one-dimensional case by using separable filters. This solution was first published by Vetterli in 1984 [Vett84] and has been applied to many subband coding schemes for images [Wood86a,Wood86b,Ghar88,West88b]. Because the elementary two-dimensional splitting method yields only four subbands, in Section 2.3 splitting tree structures will be introduced. This method allows for many possible subdivisions of the image frequency band, while no special filter techniques need to be applied. Finally, in order to avoid boundary effects due to the finite extent of the image, in Section 2.4 suitable boundary values will be derived.

2.1 The two channel filter bank

In the case of a two channel filter bank, typically the input signal is split into a low-pass and a high-pass subband. The general one-dimensional two channel filter bank is shown in Figure 2.1.

[Figure: block diagram of the two channel filter bank: analysis filters H0(ω) and H1(ω), decimators (2:1), codecs, interpolators (1:2) and synthesis filters F0(ω) and F1(ω).]

Figure 2.1: The general two channel splitting and reconstruction scheme ("codec" stands for encoding, transmission and decoding).

The input signal x(m) is first band-pass filtered by the low-pass filter H0(ω) and the high-pass filter H1(ω), respectively. These filters are also called the analysis filters. As a result, two filtered versions of the signal, y0(m) and y1(m), are obtained. Due to the band-pass filtering these signals have a bandwidth that is half the bandwidth of the original input signal. Therefore the filtered signals are next subsampled by a factor 2, resulting in the subbands x0(m) and x1(m). Due to the downsampling, the number of samples per subband is now half the number of samples of the input signal. Note that the total number of samples has not increased after the splitting into subbands.

After encoding, transmission and decoding the signal needs to be reconstructed from the decoded subbands. First, the subbands x̂0(m) and x̂1(m) are upsampled again to the full bandwidth signals ŷ0(m) and ŷ1(m). Next, the upsampled subbands are once again band-pass filtered, using the reconstruction filters F0(ω) and F1(ω) (the synthesis filters). Finally, the filter results are added to obtain the reconstructed signal x̂(m). In the following we will describe the system in more detail, and make specific choices for the splitting and reconstruction filters.

2.1.1 Decimation and interpolation

Because the sampling rates within the entire process of splitting and reconstruction are different, the subband filter banks are often also referred to as multirate filter banks. From Figure 2.1 it can be seen that the downsampling and upsampling play an important role in the signal splitting and reconstruction. Therefore, before proceeding with a Fourier domain description of the splitting and reconstruction stages, we will first briefly review the downsampling and upsampling techniques. Details and proofs can be found, for instance, in [Croc83].

The process of downsampling or decimation by an integer factor 2 consists of taking every second sample from an input signal x(m), or:

    y(m) = x(2m).                                                    (2.1)

Note that the decimation procedure is linear, but not shift invariant [Vaid87]. The Fourier domain description for decimation is given by

    Y(ω) = ½ [ X(ω/2) + X(ω/2 + π) ],                                (2.2)

where X(ω) and Y(ω) are the Fourier transforms of the input signal x(m) and the decimated signal y(m), respectively. The decimation process in the frequency domain is shown schematically in Figure 2.2. It can be seen that compression in the time domain (decimation) results in expansion in the frequency domain. The overlapping areas in the bottom part of Figure 2.2 indicate the aliasing errors due to subsampling. This effect can also be deduced from Eq. 2.2, where the signals X(ω/2) and X(ω/2 + π) are added. Consequently, if the input signal has a bandwidth of only [−π/2, π/2] no aliasing errors will occur.
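The discrete counterpart of Eq. 2.2 is easy to check numerically: for a signal of length 2N, the N-point DFT of the decimated signal equals the average of the two halves of the 2N-point DFT of the input. A minimal sketch (NumPy, with an arbitrary test signal; illustrative only):

```python
import numpy as np

rng = np.random.default_rng(0)
N = 64
x = rng.standard_normal(2 * N)   # arbitrary input signal x(m) of length 2N

y = x[::2]                       # decimation by 2: y(m) = x(2m)

X = np.fft.fft(x)                # 2N-point spectrum of the input
Y = np.fft.fft(y)                # N-point spectrum of the decimated signal

# Discrete form of Eq. 2.2: Y(w) = (X(w/2) + X(w/2 + pi)) / 2.
# On the DFT grid, bin k of Y combines bins k and k + N of X.
assert np.allclose(Y, 0.5 * (X[:N] + X[N:]))
```

The identity is exact on the DFT grid, which is why the assertion holds to machine precision.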

Upsampling or interpolation of a signal with a factor 2 is done by inserting a zero between every sample of the input signal x(m):

    y(2m)     = x(m),
    y(2m + 1) = 0.                                                   (2.3)

[Figure: magnitude spectra |X(ω)| and |Y(ω)| showing spectral expansion and overlap after decimation.]

Figure 2.2: Decimation by a factor 2, with the input signal (top) and the decimated signal (bottom).

The Fourier domain equivalent is given by

    Y(ω) = X(2ω).                                                    (2.4)

The effect of the time domain expansion is a compression in the Fourier domain. This is illustrated in Figure 2.3. Since Y(ω) contains replicas (or images) of the original spectrum, the interpolator is said to cause an imaging effect. To remove the images the interpolation stage is usually followed by band-pass filtering with "interpolation filters". It can be seen from Figure 2.1 that in the subband reconstruction stage the imaging effect is canceled by suitable band-pass filtering.
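Eq. 2.4 likewise has an exact DFT counterpart: zero-insertion upsampling replicates the spectrum, so the 2N-point DFT of the upsampled signal is the original N-point DFT repeated twice (the "images"). A small sketch, again purely illustrative:

```python
import numpy as np

rng = np.random.default_rng(1)
N = 64
x = rng.standard_normal(N)       # input signal x(m)

y = np.zeros(2 * N)              # interpolation by 2 (Eq. 2.3):
y[::2] = x                       # y(2m) = x(m), y(2m+1) = 0

X = np.fft.fft(x)
Y = np.fft.fft(y)

# Discrete form of Eq. 2.4: Y(w) = X(2w), i.e. the spectrum of y
# consists of two copies (images) of the spectrum of x.
assert np.allclose(Y, np.tile(X, 2))
```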

2.1.2 The input/output relation

In Figure 2.1 the input signal x(m) is first filtered by a low-pass filter H0(ω) and a high-pass filter H1(ω), respectively. The resulting signals as described in the frequency domain are

    Y0(ω) = H0(ω) X(ω),
    Y1(ω) = H1(ω) X(ω),                                              (2.5)

[Figure: magnitude spectra |X(ω)| and |Y(ω)| showing spectral compression and imaging after interpolation.]

Figure 2.3: Interpolation by a factor 2, with the input signal (top) and the interpolated signal (bottom).

where X(ω) is the Fourier transform of the input signal x(m). Next, the filter outputs are decimated by a factor 2. Combining Eqs. 2.2 and 2.5 yields the subband signals

    X0(ω) = ½ [ H0(ω/2) X(ω/2) + H0(ω/2 + π) X(ω/2 + π) ],
    X1(ω) = ½ [ H1(ω/2) X(ω/2) + H1(ω/2 + π) X(ω/2 + π) ].           (2.6)

After encoding, transmission and decoding of the two subbands we obtain the decoded subband signals X̂0(ω) and X̂1(ω). These subbands are first interpolated with a factor 2. Following Eq. 2.4 we get:

    Ŷ0(ω) = X̂0(2ω),
    Ŷ1(ω) = X̂1(2ω).                                                  (2.7)

Since the interpolation stage introduces images of the original subband spectrum, it is followed by band-pass filtering. In order to put each (decoded) subband at its original position in the signal spectrum, the signal Ŷ0(ω) is low-pass filtered by F0(ω) and Ŷ1(ω) is high-pass filtered using F1(ω). Finally, the filter results are added to obtain the reconstructed signal:

    X̂(ω) = F0(ω) X̂0(2ω) + F1(ω) X̂1(2ω).                             (2.8)

By neglecting the coding errors it is now possible to combine the equations for splitting (Eq. 2.6) and reconstruction (Eq. 2.8) to obtain the input/output relation

    X̂(ω) = ½ F0(ω) [ H0(ω) X(ω) + H0(ω + π) X(ω + π) ] +
          + ½ F1(ω) [ H1(ω) X(ω) + H1(ω + π) X(ω + π) ].             (2.9)

This relation can be rewritten by grouping the parts consisting of the original signal X(ω) and the parts with the shifted signal X(ω + π), representing the aliasing error effects:

    X̂(ω) = ½ [ F0(ω) H0(ω) + F1(ω) H1(ω) ] X(ω) +
          + ½ [ F0(ω) H0(ω + π) + F1(ω) H1(ω + π) ] X(ω + π).        (2.10)

Clearly, for perfect reconstruction the second term on the right hand side of Eq. 2.10 is undesired and must be set to zero. Equally, one wants the first term on the right hand side to be equal to X(ω). In the case of perfect low-pass and high-pass filter pairs it can easily be seen that the aliasing term vanishes, due to the non-overlap of the filters F0(ω) and H0(ω + π), and of F1(ω) and H1(ω + π). Also because of the perfect filters, we then have F0(ω)H0(ω) + F1(ω)H1(ω) = 1, leaving perfect reconstruction (apart from the constant factor ½). However, in practice when, for instance, FIR filters are used, special measures have to be taken.

In Eq. 2.10 the second term, which represents the aliasing, can be made to disappear easily, by setting

    F0(ω) = 2 H1(ω + π),
    F1(ω) = −2 H0(ω + π).                                            (2.11)

For convenience, the factor 2 is already introduced here to remove the constant ½ in the first term on the right hand side of Eq. 2.10. Once the aliasing is canceled, the two channel filter bank becomes a linear and shift-invariant system which is described by the transfer function

    T(ω) = X̂(ω)/X(ω) = H0(ω) H1(ω + π) − H1(ω) H0(ω + π).            (2.12)

In the ideal case T(ω) is a pure delay, that is, T(ω) = e^{−jωD}. This means that the reconstructed signal is a delayed version of the input signal, or x̂(m) = x(m − D). However, in general T(ω) represents a distortion and is therefore called the distortion transfer function, where |T(ω)| is the amplitude distortion and arg{T(ω)} is the phase distortion.
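The alias cancellation of Eq. 2.11 can be illustrated numerically. The sketch below uses the 2-tap prototype h = (1/2, 1/2) with the mirror choice H1(ω) = H(ω + π) (introduced in the next subsection), for which Eq. 2.12 evaluates to T(ω) = e^{−jω}, a pure one-sample delay; circular convolution is used so that boundary effects play no role:

```python
import numpy as np

def cconv(x, h):
    """Circular convolution of x with the (zero-padded) filter h."""
    return np.real(np.fft.ifft(np.fft.fft(x) * np.fft.fft(h, len(x))))

rng = np.random.default_rng(2)
x = rng.standard_normal(32)

h0 = np.array([0.5, 0.5])         # prototype low-pass H(w), L = 2
h1 = np.array([0.5, -0.5])        # QMF high-pass: h1(k) = (-1)^k h(k)
f0 = 2 * h1 * np.array([1, -1])   # F0(w) = 2 H1(w+pi)   -> f0 = [1, 1]
f1 = -2 * h0 * np.array([1, -1])  # F1(w) = -2 H0(w+pi)  -> f1 = [-1, 1]

# analysis: filter, then decimate by 2
x0 = cconv(x, h0)[::2]
x1 = cconv(x, h1)[::2]

# synthesis: upsample by 2, filter, add
u0 = np.zeros(len(x)); u0[::2] = x0
u1 = np.zeros(len(x)); u1[::2] = x1
xr = cconv(u0, f0) + cconv(u1, f1)

# aliasing cancels and T(w) = e^{-jw}: a pure one-sample delay
assert np.allclose(xr, np.roll(x, 1))
```

The 2-tap prototype is chosen here only because it is the one short case with zero amplitude distortion; longer prototypes reconstruct only approximately, as discussed next.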

2.1.3 Quadrature Mirror Filters

Since their introduction in 1976 by Croisier, Esteban and Galand [Croi76], Quadrature Mirror Filter (QMF) banks have been widely used in both speech and image subband coding. The QMF approach starts off with the two channel filter bank as discussed in the previous section, by choosing a low-pass/high-pass filter pair according to

    H0(ω) = H(ω),
    H1(ω) = H(ω + π),                                                (2.13)

where H(ω) is a low-pass filter. The name quadrature mirror filter derives from the fact that the response of H1(ω) is the mirror image of the response of H0(ω) with respect to the frequency π/2 (which is a quarter of the sampling frequency). The transfer function then becomes

    T(ω) = H(ω)² − H(ω + π)².                                        (2.14)

It can be shown that either the phase distortion arg{T(ω)} or the amplitude distortion |T(ω)| can be eliminated, but not both. If we accept a small phase distortion, it is possible to realize |T(ω)| = 1 by using Infinite Impulse Response (IIR) filters. On the other hand, by applying Finite Impulse Response (FIR) filters, the phase distortion can be set to zero, leaving a small amplitude distortion. In practice it is easier to use and design FIR filters, and furthermore, both in images and in speech, phase errors are generally tolerated less well than amplitude distortion.

To remove the phase distortion, quadrature mirror filters are typically symmetrical, that is, linear phase FIR filters of order L − 1, having L filter taps. This filter choice results in

    h(l) = h(L − 1 − l),    l = 0, …, L − 1,                         (2.15)

and in the Fourier domain

    H(ω) = |H(ω)| e^{−jω(L−1)/2}.                                    (2.16)

Substitution of Eq. 2.16 into Eq. 2.14 yields

    T(ω) = [ |H(ω)|² − (−1)^{L−1} |H(ω + π)|² ] e^{−jω(L−1)}.        (2.17)

From Eq. 2.17 it can be seen that by choosing (L − 1) to be even we get T(ω) = 0 for ω = π/2, which is a serious amplitude distortion. Therefore (L − 1) must be odd, and finally the transfer function for the two-channel QMF scheme is obtained:

    T(ω) = [ |H(ω)|² + |H(ω + π)|² ] e^{−jω(L−1)}.                   (2.18)

Filter design of QMF's therefore amounts to the design of a low-pass filter H(ω) under the constraint that |H(ω)|² + |H(ω + π)|² = 1. However, this constraint can only be met exactly for L = 2 and for L → ∞. The case L = 2 would yield a very poor low-pass filter with a wide transition band and is therefore not useful for subband coding, where we would like to have filters that can separate the subbands as well as possible. Fortunately, in practice it is possible to realize filters with not too many taps (16, for example) that have both good low-pass filter properties and |H(ω)|² + |H(ω + π)|² ≈ 1. Examples of filter design and the actual filter coefficients can be found in [John80]. The filters as used in this thesis are taken from [John80] and are listed in Appendix C.
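How close a given prototype comes to the constraint can be measured on a dense frequency grid. The sketch below does this for the trivial 2-tap prototype (for which the constraint holds exactly) and for a hypothetical 16-tap windowed-sinc half-band low-pass, which is an arbitrary stand-in and not one of the Johnston designs of [John80]:

```python
import numpy as np

def qmf_ripple(h, n_freq=1024):
    """Max deviation of |H(w)|^2 + |H(w+pi)|^2 from 1 on a DFT grid."""
    H = np.fft.fft(h, n_freq)           # zero-padded frequency response
    P = np.abs(H) ** 2
    # a shift by pi corresponds to half the DFT length
    D = P + np.roll(P, n_freq // 2)
    return np.max(np.abs(D - 1.0))

# L = 2: the constraint is met exactly
h2 = np.array([0.5, 0.5])
assert qmf_ripple(h2) < 1e-12

# hypothetical 16-tap windowed-sinc half-band low-pass (illustration only)
L = 16
n = np.arange(L) - (L - 1) / 2
h16 = 0.5 * np.sinc(n / 2) * np.hamming(L)
print("16-tap amplitude distortion:", qmf_ripple(h16))
```

A designed QMF such as those in Appendix C would be evaluated in exactly the same way; the design goal is to make the printed ripple figure as small as possible while keeping a narrow transition band.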

Finally, it must be mentioned here that there are filter techniques that can meet the requirements of zero phase distortion and no amplitude distortion simultaneously. Although these filter banks are sometimes also referred to as QMF banks ([Vaid87]), the technique was first introduced by Smith and Barnwell in 1984 [Smit84,Smit86] under the name of Conjugate Quadrature Filters (CQF's). The advantage over QMF's is that in principle (that is, without coding errors) the CQF technique yields perfect reconstruction. In contrast to Eq. 2.13 the filters are chosen to be

    h1(k) = (−1)^k h0(L − 1 − k),    k = 0, …, L − 1.                (2.19)

It can be shown that by taking this approach the filters H0(ω) and H1(ω) can be designed for perfect reconstruction [Smit86]. However, in an experimental comparison between the QMF and the CQF techniques as applied to image subband coding it was shown that the performance of both coding schemes is nearly identical [Böck88]. In fact, the reconstruction error in the absence of coding is very small for the QMF technique and can be neglected, especially in the presence of subband encoding errors. Therefore, and also because the implementation and design of the QMF's is easier, it is not necessary to use CQF's for perfect reconstruction in the absence of coding errors.

2.2 The extension to two-dimensional signals

The extension to multi-dimensional signals was first published by Vetterli in 1984 [Vett84] for the QMF technique. The application to images has since then proven to be very suitable for the subband encoding of images. Just as in the case of one-dimensional signals, a fundamental splitting scheme can be considered for two-dimensional signals, where the input signal is split into four subbands. The 2D frequency band division is shown schematically in Figure 2.4.

[Figure: division of the 2D frequency plane into the four subbands 00, 01, 10 and 11.]

Figure 2.4: Two-dimensional frequency band division into four subbands.

Downsampling and upsampling in the two-dimensional case is a straightforward extension from 1D signals.


Considering a 2D input signal x(m, n), the 2D output signal y(m, n) for decimation and interpolation, respectively, is given by:

    Decimation:      y(m, n) = x(2m, 2n),                            (2.20)

    Interpolation:   y(2m, 2n)         = x(m, n),
                     y(2m, 2n + 1)     = 0,
                     y(2m + 1, 2n)     = 0,
                     y(2m + 1, 2n + 1) = 0.                          (2.21)

Using these definitions and extending the equations that were derived in the previous section, the two-dimensional input/output relation is given by:

    X̂(ω0, ω1) = ¼ Σ_{k=0}^{1} Σ_{ℓ=0}^{1} X(ω0 + kπ, ω1 + ℓπ) ×
                × Σ_{i=0}^{1} Σ_{j=0}^{1} H_{ij}(ω0 + kπ, ω1 + ℓπ) F_{ij}(ω0, ω1).    (2.22)

In this two-dimensional case we have three aliasing terms, namely for (k, ℓ) ≠ (0, 0). In contrast to the one-dimensional filter bank, it is not possible now to eliminate the aliasing errors by simply choosing certain reconstruction filters F_{ij}(ω0, ω1). However, as was shown by Vetterli [Vett84], one way of solving this problem is by taking separable filters, that is,

    H_{ij}(ω0, ω1) = H_i(ω0) H_j(ω1),
    F_{ij}(ω0, ω1) = F_i(ω0) F_j(ω1).                                (2.23)

By substituting these two relations into the 2D input/output relation of Eq. 2.22 it is easy to show that all three aliasing terms vanish when the 1D QMF's according to Eq. 2.13 are chosen. The 2D transfer function can then be written as:

    T(ω0, ω1) = [ |H(ω0)|² + |H(ω0 + π)|² ] e^{−jω0(L−1)} ×
                × [ |H(ω1)|² + |H(ω1 + π)|² ] e^{−jω1(L−1)}.         (2.24)

Comparing this with Eq. 2.18 it can be seen that the 2D transfer function is separable and therefore the 2D filter design problem is reduced to the one-dimensional case. Of course, in addition to the advantage of easier design, the complexity of the implementation of separable FIR filters is also much lower than that of non-separable filters.
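With separable filters the 2D bank is just the 1D bank applied along the rows and then along the columns, and the reconstruction property carries over per dimension. A sketch using the 2-tap prototype of Section 2.1 (for which the 1D bank reconstructs exactly) and circular convolution; the reconstruction is then the input delayed by one sample in each dimension:

```python
import numpy as np

h0 = np.array([0.5, 0.5]); h1 = np.array([0.5, -0.5])  # QMF pair (Eq. 2.13)
f0 = np.array([1.0, 1.0]); f1 = np.array([-1.0, 1.0])  # synthesis (Eq. 2.11)

def cconv(x, h):
    return np.real(np.fft.ifft(np.fft.fft(x) * np.fft.fft(h, len(x))))

def analyze(x):                      # 1D two channel split
    return cconv(x, h0)[::2], cconv(x, h1)[::2]

def synthesize(x0, x1, n):           # 1D two channel reconstruction
    u0 = np.zeros(n); u0[::2] = x0
    u1 = np.zeros(n); u1[::2] = x1
    return cconv(u0, f0) + cconv(u1, f1)

rng = np.random.default_rng(3)
img = rng.standard_normal((16, 16))

# split rows, then columns: four subbands of quarter size (Figure 2.4)
lo = np.array([analyze(r)[0] for r in img])
hi = np.array([analyze(r)[1] for r in img])
bands = {}
for name, half in (("0", lo), ("1", hi)):
    c0 = np.array([analyze(c)[0] for c in half.T]).T
    c1 = np.array([analyze(c)[1] for c in half.T]).T
    bands["0" + name], bands["1" + name] = c0, c1

# reconstruct columns, then rows
rec_lo = np.array([synthesize(c0, c1, 16) for c0, c1 in
                   zip(bands["00"].T, bands["10"].T)]).T
rec_hi = np.array([synthesize(c0, c1, 16) for c0, c1 in
                   zip(bands["01"].T, bands["11"].T)]).T
rec = np.array([synthesize(r0, r1, 16) for r0, r1 in zip(rec_lo, rec_hi)])

# perfect reconstruction up to a one-sample delay in each dimension
assert np.allclose(rec, np.roll(np.roll(img, 1, axis=0), 1, axis=1))
```

With a longer (Johnston-type) prototype the final assertion would hold only approximately, with an error governed by the amplitude distortion of Eq. 2.24.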

2.3 Tree structures for images

In image subband coding it is often not sufficient to divide the image frequency band into only four subbands, and further splitting is desired. One method to obtain more image subbands is to use a filter bank with more than two band-pass filters. Such techniques are discussed, for instance, in [Vaid87]. Another splitting technique makes use of a tree structure. This method has been successfully applied to image subband coding, see for instance [Wood86b,Ghar88,West88b]. A one-dimensional tree structure for splitting a 1D signal consists of building blocks containing the basic two channel QMF bank. The signal is first split into two subbands, and next the subbands can be split again into smaller subbands using the same basic two channel QMF bank. In this manner several different frequency band divisions are possible.

For 2D signals (images) there are two different tree structures that can be discerned, a separable and a non-separable tree. In the separable tree structure, first all the rows are split into subbands using a certain one-dimensional tree, after which all the columns are split into subbands, again using a one-dimensional tree. Of course, the order in which the rows and columns are split can be interchanged.

The non-separable splitting tree is constructed using building blocks that split an image into 4 subbands, as described in Section 2.2. Each building block consists of first splitting the signal rows and then splitting the signal columns into 2 subbands. In Figure 2.5 both types of splitting trees are illustrated by splitting the image into 9 subbands using the separable tree, and into 13 subbands using the non-separable tree.

[Figure: separable tree built from 1D splitting trees for all rows and all columns (top), and non-separable tree built from four-band building blocks (bottom).]

Figure 2.5: Tree structures to split an image into subbands: separable tree (top) and non-separable tree (bottom). ("2" means splitting into two one-dimensional subbands and "4" means splitting into four two-dimensional subbands). The image frequency band division at each stage is shown at the bottom of each tree.

Note that although the resulting frequency band divisions generally differ between the two splitting methods, some splitting schemes can be achieved using both tree structures. An example is the frequency band division with 16 equally sized subbands as used in [Wood86b] and in [West88b]. That particular splitting scheme will also be used in the next chapter. In Chapter 4 we will investigate which splitting method is the most appropriate when the subbands are encoded separately.
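A splitting tree is easy to sketch in code. The fragment below uses a 2-tap sum/difference operation purely as a stand-in for the QMF building block, and applies the elementary four-band split recursively to every subband, which yields the 16 equally sized subbands mentioned above while preserving the total number of samples:

```python
import numpy as np

def split4(img):
    """Elementary four-band split (stand-in for the QMF building block):
    2-tap sum/difference filters along rows, then along columns."""
    lo = 0.5 * (img[:, 0::2] + img[:, 1::2])   # row low-pass + decimate
    hi = 0.5 * (img[:, 0::2] - img[:, 1::2])   # row high-pass + decimate
    out = []
    for half in (lo, hi):
        out.append(0.5 * (half[0::2, :] + half[1::2, :]))  # column low
        out.append(0.5 * (half[0::2, :] - half[1::2, :]))  # column high
    return out                                  # 4 quarter-size subbands

rng = np.random.default_rng(4)
img = rng.standard_normal((32, 32))

level1 = split4(img)                               # 4 subbands of 16x16
level2 = [b for sb in level1 for b in split4(sb)]  # 16 subbands of 8x8

assert len(level2) == 16
assert all(b.shape == (8, 8) for b in level2)
# the splitting does not increase the number of samples
assert sum(b.size for b in level2) == img.size
```

Splitting only the low-low band again, instead of all four bands, would give the octave-like (non-uniform) divisions that the tree structure equally allows.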

2.4 Boundary values

Subband coding has been applied to both speech and images, and although the coding concept is similar for both types of signals, in general different filtering and coding techniques are employed. Apart from the differences in dimensionality and statistical properties, one of the major differences lies in the limited support of an image, while speech is virtually of infinite length. This fundamental difference has consequences for both the bit allocation and the filtering technique. The finite data length of the image can often be a problem at the boundaries in many image processing techniques, for example in image restoration [Wood85]. In subband coding of images we encounter a similar problem, for which up till now three different solutions have been used in the literature.

Woods and O'Neil [Wood86b] have implemented the FIR filters in the QMF banks by means of the Fast Fourier Transform (FFT), thus implicitly employing circular convolution. This filtering method is consistent with the theory, which implies that the overall system has an input/output transfer function that is constant over the entire signal. However, the major objection to using circular boundary values is that intensity jumps between the adjacent signal boundaries may occur. These discontinuities are in general not encoded accurately enough to prevent the reconstructed image from having (minor) visual boundary effects. Of course, in the absence of coding errors these boundary effects do not occur. One method to avoid the intensity jumps at the boundaries is to extend the data by means of either symmetrical or repeated boundary values. Karlsson and Vetterli have shown that repeated boundary values can best be used for subband coding [Karl89], leaving only very small errors at the boundaries. The third method is proposed by Smith and Eddins [Smit87], who symmetrically extend the entire signal to double dimensions prior to filtering. These boundary values do not introduce intensity jumps, while the overall transfer function is the same as with circular convolution.

The technique as proposed in [Smit87] is computationally of high complexity. In that approach the signal is first symmetrically extended to double dimensions. Next the extended signal is low-pass and high-pass filtered and the filter outputs are downsampled by a factor 2. To obtain the two subbands a window is applied to the decimator outputs, selecting only half of the samples. For the reconstruction the inverse procedure is followed. First the subbands are symmetrically extended again, upsampled, filtered and added to yield the reconstructed extended signal. By applying a window, finally the reconstructed signal is obtained. In this straightforward method the number of multiplications needed for the splitting into two subbands is 4 × M × L, where M is the number of input data samples and L is the number of taps of the QMF.

In this section we will independently derive a far more efficient implementation of the splitting and reconstruction procedures, while also using symmetrical boundary values. For that purpose, first the spatial domain filtering equations are needed. Following [Croc81] it is next shown how a reduction in computation can be achieved. Using the computationally reduced filter equations, the symmetrical boundary values will then be derived for both the splitting and the reconstruction stage. The number of multiplications now needed is only M × L/2, implying a reduction in computation of approximately a factor 8 with respect to the straightforward method of [Smit87]. Finally, an example will be given showing the favorable behavior of symmetrical boundary values compared to a periodic extension.

2.4.1 Spatial domain description

The splitting of an image into subbands involves convolutions with a low-pass and a high-pass filter, respectively. However, at the boundaries of the image the filtering procedure requires values that lie outside the image, and boundary values need to be supplied. The extension of an image with boundary values is illustrated in Figure 2.6, where L is the size of the filter and A is a constant determining the size of the boundaries. We will first derive the spatial domain description of the 1D filter problem for splitting the image into subbands (including the boundary). After that we will proceed with reducing the number of computations.

[Figure: image region extended with boundaries of width A (right and bottom) and L − 1 − A (left and top).]

Figure 2.6: Image with boundary.

Starting off with the QMF splitting into two subbands, we will consider a one-dimensional discrete signal having a finite number of samples, that is, we have the input signal x(m), for m = 0, …, M − 1. The splitting filters h0(k) and h1(k) are the low-pass and high-pass FIR filter, respectively, with L taps each. The filtering can generally be described in the spatial domain by

    y0(m) = Σ_{k=0}^{L−1} h0(k) x(m − k + A),
    y1(m) = Σ_{k=0}^{L−1} h1(k) x(m − k + A).                        (2.25)

The constant A represents the filter delay and, according to Figure 2.6, determines the right and lower boundary sizes of the image. The left and upper boundary sizes are then equal to L − 1 − A. Consistent with the QMF filter choice of Eq. 2.13 the filters are described with a prototype low-pass filter h(k) according to

    h0(k) = h(k),
    h1(k) = (−1)^k h(k),        (k = 0, …, L − 1)                    (2.26)

where L is even. After combining the downsampling as defined in Eq. 2.1 with Eqs. 2.25 and 2.26, we get the subband signals in terms of the filter h(k) and the input signal x(m):

    x0(m) = Σ_{k=0}^{L−1} h(k) x(2m − k + A),
                                                 (m = 0, …, M/2 − 1)    (2.27)
    x1(m) = Σ_{k=0}^{L−1} (−1)^k h(k) x(2m − k + A).

The number of multiplications in Eq. 2.27 is equal to 2 × M/2 × L. However, from Eq. 2.27 we observe that the input signal x(m) is filtered twice, using a filter that differs only in the sign of the odd indexed coefficients.


Therefore, the number of multiplications can be reduced by a factor 2, by first splitting the filters h(k) and (−1)^k h(k) into two parts, namely for k even and for k odd. In the sequel it is implied that m = 0, …, M/2 − 1. The subband signals can first be rewritten as

    x0(m) = Σ_{k=0}^{L/2−1} h(2k) x(2m − 2k + A) +
          + Σ_{k=0}^{L/2−1} h(2k + 1) x(2m − 2k − 1 + A),
                                                                     (2.28)
    x1(m) = Σ_{k=0}^{L/2−1} h(2k) x(2m − 2k + A) +
          − Σ_{k=0}^{L/2−1} h(2k + 1) x(2m − 2k − 1 + A).

If we next define a sum signal xs(m) and a difference signal xd(m) as

    xs(m) = x0(m) + x1(m),
    xd(m) = x0(m) − x1(m),                                           (2.29)

and combine this with Eq. 2.28, we get

    xs(m) = 2 Σ_{k=0}^{L/2−1} h(2k) x(2m − 2k + A),
    xd(m) = 2 Σ_{k=0}^{L/2−1} h(2k + 1) x(2m − 2k − 1 + A).          (2.30)

Hence, if we next define even ("e") and odd ("o") indexed signal values and filter coefficients as

    xe(m) = x(2m),
    xo(m) = x(2m + 1),                                               (2.31)

and

    he(k) = h(2k),
    ho(k) = h(2k + 1),                                               (2.32)

we can rewrite Eq. 2.30 for two different cases of A:

1. A is even:

    xs(m) = 2 Σ_{k=0}^{L/2−1} he(k) xe(m − k + A/2),
    xd(m) = 2 Σ_{k=0}^{L/2−1} ho(k) xo(m − k + A/2 − 1).             (2.33)

2. A is odd:

    xs(m) = 2 Σ_{k=0}^{L/2−1} he(k) xo(m − k + (A − 1)/2),
    xd(m) = 2 Σ_{k=0}^{L/2−1} ho(k) xe(m − k + (A − 1)/2).           (2.34)

In either case for A, we first convolve the even and odd indexed signal parts xe(m) and xo(m) to obtain the signals xs(m) and xd(m), after which the actual subband signals are finally obtained by

    x0(m) = ½ [ xs(m) + xd(m) ],
    x1(m) = ½ [ xs(m) − xd(m) ].                                     (2.35)

The boundary value problem can be seen to occur in either Eq. 2.33 or 2.34, where the actual filtering is performed. The value of A will be determined in the next section, where it will prove to depend on the filter length L. From either Eq. 2.33 or 2.34 we can deduce that we have now reduced the number of multiplications to 2 × M/2 × L/2. However, we still need boundary values to filter the beginning and the end of the sequences xe(m) and xo(m). These will be derived in the next section.
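The equivalence of the direct and the reduced computation can be verified directly: filtering the even and odd indexed polyphase components (Eq. 2.30) and recombining them with Eq. 2.35 must reproduce the subbands of Eq. 2.27 exactly. A sketch for an arbitrary 4-tap filter with A = 2, using circular indexing so that boundary values do not enter:

```python
import numpy as np

rng = np.random.default_rng(5)
M, L, A = 16, 4, 2                     # signal length, filter length, delay
x = rng.standard_normal(M)
h = rng.standard_normal(L)             # arbitrary prototype filter h(k)

# direct computation (Eq. 2.27), with circular indexing
x0 = np.array([sum(h[k] * x[(2*m - k + A) % M] for k in range(L))
               for m in range(M // 2)])
x1 = np.array([sum((-1)**k * h[k] * x[(2*m - k + A) % M] for k in range(L))
               for m in range(M // 2)])

# polyphase computation (Eq. 2.30): only half the multiplications
xs = np.array([2 * sum(h[2*k] * x[(2*m - 2*k + A) % M]
                       for k in range(L // 2)) for m in range(M // 2)])
xd = np.array([2 * sum(h[2*k + 1] * x[(2*m - 2*k - 1 + A) % M]
                       for k in range(L // 2)) for m in range(M // 2)])

# recombination (Eq. 2.35) recovers the subbands exactly
assert np.allclose(x0, 0.5 * (xs + xd))
assert np.allclose(x1, 0.5 * (xs - xd))
```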

For the reconstruction as shown in the right part of Figure 2.1, first the subband signals x̂0(m) and x̂1(m) are upsampled by a factor 2 according to Eq. 2.3. Next, the interpolated subbands ŷ0(m) and ŷ1(m) are filtered with the QMF's 2h(ℓ) and −2(−1)^ℓ h(ℓ), and added. For m = 0, …, M − 1 this is described in the spatial domain by

    x̂(m) = 2 Σ_{ℓ=0}^{L−1} h(ℓ) ŷ0(m − ℓ + B) +
         − 2 Σ_{ℓ=0}^{L−1} (−1)^ℓ h(ℓ) ŷ1(m − ℓ + B).                (2.36)

Here B is again a filter delay determining the left and right boundary sizes. By using the same definitions for even and odd indexed filter coefficients and data samples as in Eqs. 2.31 and 2.32, we can relatively easily derive the equations for the reconstruction equation. Here we can again distinguish two cases: 1. B is even: X(.(m) x0(m) 2. B is odd: xe{m) x0{m)

where xs(m) and x<*(m) are defined similar to xs(m) and Xd{m) as in Eq. 2.29.

2 . 4 . 2 S y m m e t r i c b o u n d a r y v a l u e s

The boundary value problem in the splitting stage occurs both in Eq. 2.34 and 2.33. For the convolution of the first and last samples of the sequences

xe(m) and x0{m) extra values are needed. Implementation by means of the

FFT would imply periodic extension of the signals and might introduce intensity jumps at the boundaries that are hard to encode. Therefore, it is preferred that the boundary values constitute a smooth transition between signal and boundary. A solution to this boundary value problem can directly be derived from the QMF symmetry property

h[k) = h[L-l-k), k = 0,...,L-l, (2.39) L/2-1 Bs

= 2 ]T h

e

{l)x

d

{m-l + -),

L

%°-> t (2-37)

= 2 £ h

0

(l)x

g

(m-l+-),

i / 2 - l

= 2 £ h

0

{l)x

t

{m-l +

B-l t-0 i / 2 - l B + \

= 2 £ h

e

{l)x

d

{m-£ + ——),

(2.38)

(42)

by observing that

h_o(k) = h(2k + 1)
       = h(L - 1 - (2k + 1))
       = h(2[L/2 - 1 - k])
       = h_e(L/2 - 1 - k), \quad k = 0, \ldots, L/2 - 1. \qquad (2.40)
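The relation in Eq. 2.40 is easy to check numerically: for any even-length symmetric filter, the odd-indexed polyphase component is the time reverse of the even-indexed one. A minimal sketch (the 8-tap coefficients are arbitrary example values, not an actual QMF design):

```python
import numpy as np

# An arbitrary even-length symmetric filter, h(k) = h(L-1-k) (Eq. 2.39).
h = np.array([1.0, 3.0, -2.0, 5.0, 5.0, -2.0, 3.0, 1.0])
assert np.allclose(h, h[::-1])

h_e = h[0::2]   # even-indexed coefficients he(k) = h(2k)
h_o = h[1::2]   # odd-indexed coefficients  ho(k) = h(2k+1)

# Eq. 2.40: ho(k) = he(L/2 - 1 - k), i.e. ho is he reversed.
assert np.allclose(h_o, h_e[::-1])
```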

Consequently, if we use for instance Eq. 2.33 for splitting, then the data samples xe(m) are filtered by he(k), whereas the data samples xo(m) are filtered by ho(k) = he(L/2 - 1 - k), which is the reversed sequence of the filter he(k). Therefore, if we reverse the sequence xo(m) and filter it with he(k), we obtain a reversed sequence of xd(m). In that case the even and odd indexed signals xe(m) and xo(m) are filtered using exactly the same filter.

We now make use of this filter property by creating a composed signal, consisting of the sequence xe(m) connected to the reversed sequence of xo(m). This composed signal is then filtered as a single signal by means of cyclic convolution with the filter he(k). Since the two signals xe(m) and xo(m) are both subsampled versions of the original signal, shifted by one sample with respect to each other, in general they will connect very well when one of them is reversed, thus showing no intensity jumps. The construction is demonstrated in Figure 2.7, where we have taken the 150th image line of the 256x256 image Lena. As can be seen from Figures 2.7(a) and 2.7(b), the two signals are nearly identical, and the composed signal as shown in Figure 2.7(c) has no intensity jumps at the connecting boundaries. Since the implementation of the filtering of the composed signal uses cyclic convolution, it can be performed in the Fourier domain as well. In this approach we then have (nearly) symmetrical boundary values, which are contained within the signal and are thus known to the receiver where the reconstruction is to be performed.
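The construction above can be sketched in a few lines. This is a loose illustration of the composed-signal idea only (the names are ours, and the exact alignment/delay bookkeeping with A from Eqs. 2.33-2.34 is omitted):

```python
import numpy as np

def filter_composed(x, h):
    """Build the composed signal [xe, reversed(xo)] from an input line x
    and filter it with the even polyphase component he(k) by cyclic
    convolution, implemented in the Fourier domain."""
    he = h[0::2]                               # even-indexed taps he(k)
    xe, xo = x[0::2], x[1::2]                  # even/odd indexed samples
    composed = np.concatenate([xe, xo[::-1]])  # smooth, symmetric join
    # Cyclic convolution via the FFT (pointwise product of spectra).
    H = np.fft.fft(he, n=len(composed))
    return np.real(np.fft.ifft(np.fft.fft(composed) * H))
```

Because both halves come from the same underlying line, the join points show no intensity jumps, so the periodic extension implied by the FFT is harmless.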

The reconstruction takes place in a similar fashion. Using Eq. 2.38 we first create the sum and difference signals x̂s(m) and x̂d(m) by respectively adding and subtracting the decoded subband signals x̂0(m) and x̂1(m). Next we reverse x̂d(m), connect it to x̂s(m), and employ a circular convolution with the filter ho(ℓ) on the composed signal. According to Eq. 2.38 the filter result will consist of the sequence xe(m) followed by the reversed sequence of xo(m).

Figure 2.7: Construction of the data prior to filtering: (a) even indexed samples of the signal, (b) odd indexed samples and (c) attaching of (a) to the reversed sequence of (b), thus yielding symmetric boundary values.


The choice for A and B (even or odd) depends on the filter length L. How to determine the value of A is demonstrated in Figure 2.8. For the case that L/2 is even, two filter positions are shown: one where the end of xe(m) is filtered, and one where, after shifting the filter to the next position, the beginning of the reversed sequence of xo(m) is filtered.


Figure 2.8: Situation for L/2 even, when going from (a) filtering xe(m) to (b) filtering the reversed xo(m).

In this figure, the right boundary size of the sequence xe(m) is equal to L/4. However, the left boundary size of the reversed sequence xo(m) is the right boundary size of xo(m), and is equal to L/4 - 1. Comparing this with Eqs. 2.33 and 2.34 we can conclude that if L/2 is even, then A is even; that is, A must have the same parity as L/2. Further, to meet Eq. 2.33 exactly it is necessary that A/2 = L/4, that is, A = L/2. For the value of B we can follow the same type of reasoning as for A in Figure 2.8. In the case of B, however, B must have a different parity than L/2. In either case, whether L/2 is even or odd, it holds that A + B = L - 1.
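The parity rules and the relation A + B = L - 1 can be checked for any even filter length; a small sketch covering both L/2 even and L/2 odd:

```python
# Delay choices for even filter lengths L (covers L/2 even and L/2 odd):
for L in (8, 10, 12, 16):
    A = L // 2          # A = L/2, so A has the same parity as L/2
    B = L - 1 - A       # hence A + B = L - 1
    assert A % 2 == (L // 2) % 2       # A: same parity as L/2
    assert B % 2 != (L // 2) % 2       # B: opposite parity
    assert A + B == L - 1
```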

Finally, to demonstrate the effect of choosing proper boundary values, Figure 2.9 shows subband 2 after splitting the same image line as in the example of Figure 2.7 into two subbands. The example is shown for periodic boundary values (dashed line) and for the symmetric boundary values as described above (solid line). It can clearly be seen from this figure that periodic boundary values create intensity jumps, while the symmetric boundary values do not.
