
Marcin Pazio, dr Krzysztof Cisowski, "Application of Colour Image Segmentation for Localisation and Extraction of Text from Images". Session: Education in electronics and telecommunications. Politechnika Gdańska.

www.pwt.et.put.poznan.pl

Marcin Pazio, Krzysztof Cisowski
Gdańsk University of Technology, Faculty of Electronics, Telecommunications and Informatics
Narutowicza 11/12, 80-952 Gdańsk, Poland
e-mail: mapa@pg.gda.pl, krci@eti.pg.gda.pl

Poznańskie Warsztaty Telekomunikacyjne, Poznań, 8-9 grudnia 2005

APPLICATION OF COLOUR IMAGE SEGMENTATION FOR LOCALIZATION AND EXTRACTION OF TEXT FROM IMAGES

Abstract – Text plays a very important role in our environment: bus numbers, advertisements and street names are only a few examples. On the other hand, OCR software usually fails to recognize text in images captured with a camera. Image segmentation, with subsequent contextual analysis of segment parameters, can derive information about the text contained within the image.

1. INTRODUCTION

This article describes a multistage image processing algorithm intended for use in a portable device supporting blind persons. Localizing the text in an image captured with an electronic camera should be the first step before processing the text with OCR (Optical Character Recognition) software. Moreover, the text should be not only localized but also extracted from the background, in order to improve OCR accuracy; in addition, the shape of the letters in the extracted text should not be distorted. Clearly, the whole process should not require any action from the operator other than capturing the image.

2. IMAGE SOURCE

Of the few sources of digital images, only one is of practical use: a digital CCD or CMOS camera. These devices can produce high-quality colour images. Unfortunately, both CCD and CMOS image converters are in fact monochromatic devices. Colours are extracted using an RGB filter placed in front of the sensor matrix (Fig. 1). For each pixel, the missing colour components are calculated from the values of the surrounding pixels. This reconstruction of missing colour data causes a severe drop in the resolution of the recorded colour data.

Fig. 1: Typical RGB filter structure for a CMOS/CCD sensor.

3. COLOUR SPACE

The source image obtained from the camera is coded using the RGB model. The captured image is translated to the uniform L*a*b* [1] colour space. Conversion from the RGB to the L*a*b* system is a two-stage process. First the image is transformed to the XYZ colour space (1), with R, G, B ∈ ⟨0..1⟩:

if C ≤ 0.04045 then C := C / 12.92 else C := ((C + 0.055) / 1.055)^2.4, for each C in {R, G, B}
R := R × 100, G := G × 100, B := B × 100

X := 0.4124·R + 0.3576·G + 0.1805·B
Y := 0.2126·R + 0.7152·G + 0.0722·B    (1)
Z := 0.0193·R + 0.1192·G + 0.9505·B

Next, the XYZ model is converted to L*a*b* by the nonlinear formula (2). The main advantage of the L*a*b* space is that it preserves the continuity of continuous RGB colour transitions and represents colour and brightness relations in a way similar to the human eye:

x := X / 95.047, y := Y / 100, z := Z / 108.883
if t ≤ 0.008856 then t := 7.787·t + 16/116 else t := t^(1/3), for each t in {x, y, z}
L := 116·y − 16
a := 500·(x − y)    (2)
b := 200·(y − z)
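The two-stage conversion above can be written out as runnable code. The following is a minimal sketch, assuming sRGB input components in [0, 1] and the D65 white point used in (2); the function name is ours:

```python
def rgb_to_lab(r, g, b):
    """Convert an sRGB triple (components in [0, 1]) to CIE L*a*b*."""
    def expand(c):
        # inverse sRGB gamma, then scaling by 100, as in the preamble to (1)
        return (c / 12.92 if c <= 0.04045 else ((c + 0.055) / 1.055) ** 2.4) * 100

    r, g, b = (expand(c) for c in (r, g, b))
    # linear RGB -> XYZ, matrix of equation (1)
    x = 0.4124 * r + 0.3576 * g + 0.1805 * b
    y = 0.2126 * r + 0.7152 * g + 0.0722 * b
    z = 0.0193 * r + 0.1192 * g + 0.9505 * b

    def f(t):
        # piecewise cube-root mapping of equation (2)
        return t ** (1 / 3) if t > 0.008856 else 7.787 * t + 16 / 116

    fx, fy, fz = f(x / 95.047), f(y / 100.0), f(z / 108.883)
    return 116 * fy - 16, 500 * (fx - fy), 200 * (fy - fz)
```

For white input (1, 1, 1) the result is close to L* = 100, a* = b* = 0, and black maps to L* = 0, as expected for a neutral axis.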

4. IMAGE SEGMENTATION

The goal of image segmentation is to split an image into homogeneous regions, called segments. Parameters of the segments (location, size, colour) will later allow sets of segments that may form text to be created. The segmentation algorithm is an iterative region-growing algorithm with automatic seed-pixel determination.

A. Searching for the seed pixels

Seed pixels are the pixels where region growing starts. The L*a*b* parameters of the seed pixels are determined in every iteration of the segmentation, using only pixels not yet included in any segment. First, the image colours are indexed and the pixels of each indexed colour are counted:

pixel: record of
  L: lightness
  a: a_component
  b: b_component
end;
Image: array [1..width, 1..height] of pixel;
PixelData: record of
  color: pixel
  counter: number of pixels
end;
IndexTable: array [1..MaxColors] of PixelData;

begin
  ActualIndex := 1;
  for x := 1 to width do
    for y := 1 to height do
      if Image[x, y] not included in any segment then
      begin
        z := 1;
        IndexedColor := false;
        while (z < ActualIndex) and not IndexedColor do
        begin
          if IndexTable[z].color = Image[x, y] then
          begin
            Inc(IndexTable[z].counter);  // colour already indexed
            IndexedColor := true;
          end;
          Inc(z);
        end;
        if not IndexedColor then
        begin
          IndexTable[ActualIndex].color := Image[x, y];  // colour not yet indexed
          IndexTable[ActualIndex].counter := 1;
          Inc(ActualIndex);
        end;
      end;
end;

Next, the algorithm seeks the index MaxIndex maximizing the value of IndexTable[MaxIndex].counter. The colour stored in the index table at that position, IndexTable[MaxIndex].color, is the colour of the seed pixel (or pixels).

B. Region Growing Algorithm

The criterion for including pixels in a segment is based on the distance, in the L*a*b* space, between the seed pixel and its 4-neighbourhood pixels [2]. The process starts at the seed pixels, i.e. the pixels of the colour obtained in the previous step, and iteratively adds pixels to the segment:

begin
  for x := 1 to width do
    for y := 1 to height do
      if Image[x, y] = seed_pixel_color then
      begin
        set new segment number p;
        GrowUpSegment(x, y);
      end;
end;

The recursive GrowUpSegment() procedure first checks whether the region-growth criterion K() is satisfied for the current pixel and then checks all 4-neighbourhood pixels:

procedure GrowUpSegment(x, y);
begin
  if Pixel(x, y) not in any segment and K(Image(x, y), seed_pixel_color) then
  begin
    include Pixel(x, y) in segment p;
    GrowUpSegment(x - 1, y);
    GrowUpSegment(x + 1, y);
    GrowUpSegment(x, y - 1);
    GrowUpSegment(x, y + 1);
  end;
end;

K(p1, p2) is the criterion of region growth (3):

K(p1, p2) := true,  if ||ab(p1) − ab(p2)|| < K
             false, if ||ab(p1) − ab(p2)|| ≥ K    (3)

where ||ab(p1) − ab(p2)||: Euclidean distance between the (a, b) components of pixels p1 and p2 in the L*a*b* space; K: constant value.

Finding the seed pixels followed by segment growth is executed iteratively until all pixels are included in segments. The L*a*b* parameters of a segment are calculated as a simple average of the corresponding colour and luminance values of all pixels included in the segment.

5. TEXT LOCALIZATION

Text localization is based on comparing the characteristics of the segments in the segmented image using several intuitive rules [3]. For written information, satisfying these rules is a condition of its readability. The rules use geometrical and colour parameters of the segments. The algorithm creates sets of segments that satisfy the following conditions:

1. Segments should have similar colour:

|L_Sk − L_Sl| + |a_Sk − a_Sl| + |b_Sk − b_Sl| < c4    (4)

where L_Sk, a_Sk, b_Sk and L_Sl, a_Sl, b_Sl: colour components of the tested segments; c4: constant value.

2. Segments should have similar height.
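The seed-colour search and the region-growing step above can be sketched together in runnable form. This is a minimal sketch that uses an explicit stack instead of recursion (to avoid recursion-depth limits on large segments); the data layout (a dict mapping (x, y) to quantized (L, a, b) tuples) and the value of K are our assumptions:

```python
from collections import Counter

def most_frequent_color(image, assigned):
    # seed colour: the most frequent colour among pixels not yet segmented
    counts = Counter(c for xy, c in image.items() if xy not in assigned)
    return counts.most_common(1)[0][0]

def grow_segment(image, width, height, seed_xy, seed_color, assigned, K=10.0):
    # criterion (3): a 4-neighbour joins the segment if the Euclidean
    # distance of its (a, b) components to the seed colour is below K
    def ab_dist(c1, c2):
        return ((c1[1] - c2[1]) ** 2 + (c1[2] - c2[2]) ** 2) ** 0.5

    segment, stack = set(), [seed_xy]
    while stack:
        x, y = stack.pop()
        if (x, y) in assigned or (x, y) in segment:
            continue
        if not (0 <= x < width and 0 <= y < height):
            continue
        if ab_dist(image[(x, y)], seed_color) >= K:
            continue
        segment.add((x, y))
        stack += [(x - 1, y), (x + 1, y), (x, y - 1), (x, y + 1)]
    return segment
```

A full segmentation iteration would alternate these two steps, adding each grown segment's pixels to `assigned`, until every pixel belongs to a segment.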

c1 < H_Sk / H_Sl < c2    (5)

where H_Sk, H_Sl: heights (in pixels) of the tested segments; c1, c2: constants.

3. Segments corresponding to letters should be aligned on a common baseline:

|Y_Sk − Y_Sl| / H_i < c3    (6)

where Y_Sk, Y_Sl: distances from the lowest pixel of the segment to the bottom of the image; H_i: image height; c3: constant value.

4. Gaps between segments in a set of segments forming text should not be too wide:

|X_Sk − X_Sl| / S_h < c4    (7)

where X_Sk, X_Sl: horizontal positions of the leftmost and the rightmost segment in the set; S_h: number of segments in the set; c4: constant value.

The above rules are used in the algorithm creating sets of segments that may be text:

for n := 0 to max do
begin
  N_n := {S_n};
  p := n + 1;
  repeat
    if there exists S_z ∈ N_n such that K_k(S_z, S_p) then
      N_n := N_n ∪ {S_p};
    p := p + 1;
  until p = max;
end;    (8)

where max: maximum segment number; N_n: set of segments forming text; S_n: n-th segment.

The text-localizing algorithm treats each image segment as a seed set and compares it with the rest of the image segments. K_k is the criterion for including a segment in the set; it is satisfied when all of the criteria (4)-(7) are true.

6. POST-PROCESSING

The described image segmentation algorithm uses global image data for segmentation. This approach gives short processing times, but the segments corresponding to letters are distorted. The main reason for these troubles is the way the digital image is created: as described in section 2, the colour data used for segment creation (3) are blurred. The segmented image therefore requires post-processing. The segments created by the segmentation procedure are defined well enough to be analyzed by the text-localizing algorithm, and the information about segments that may form text can be used for local enhancement of the segmentation results. Such an approach can speed up the process, as it allows only small portions of the image to be processed.
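As an illustration, rules (4)-(6) and the greedy grouping of (8) can be sketched as follows. The segment attributes, the constant values, and the omission of the gap rule (7) are our simplifications, not the paper's exact procedure:

```python
def similar(s1, s2, img_height, c1=0.5, c2=2.0, c3=0.1, c4=30.0):
    # rule (4): similar colour; rule (5): similar height;
    # rule (6): similar baseline position. Constants are illustrative.
    colour = (abs(s1['L'] - s2['L']) + abs(s1['a'] - s2['a'])
              + abs(s1['b'] - s2['b'])) < c4
    height = c1 < s1['height'] / s2['height'] < c2
    baseline = abs(s1['baseline_y'] - s2['baseline_y']) / img_height < c3
    return colour and height and baseline

def group_text_segments(segments, img_height):
    # algorithm (8): each segment seeds a set and absorbs every later
    # segment that is similar to some member already in the set
    groups = []
    for i, seed in enumerate(segments):
        group = [seed]
        for cand in segments[i + 1:]:
            if any(similar(member, cand, img_height) for member in group):
                group.append(cand)
        groups.append(group)
    return groups
```

Two letter-like segments of similar colour, height and baseline end up in one set, while a segment of very different colour stays alone.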
The post-processing procedure, executed for each set of segments selected as text, is as follows:

create_set_bounding_rectangle;
letter_color := average colour of the pixels creating the set elements;
background_color := average colour of the remaining pixels within the bounding rectangle;
for x := 0 to bounding_rectangle_width do
  for y := 0 to bounding_rectangle_height do
  begin
    distance_to_letter := LabDistance(bounding_rectangle_pixel(x, y), letter_color);
    distance_to_background := LabDistance(bounding_rectangle_pixel(x, y), background_color);
    if distance_to_letter > distance_to_background then
      output_image_pixel(x, y) := 0
    else
      output_image_pixel(x, y) := 1;
  end;

The function LabDistance(pixel1, pixel2) returns the distance between the colours of pixel1 and pixel2 in the L*a*b* space. The result of the post-processing procedure is the binary image output_image.

7. EXPERIMENTAL RESULTS

An image shot with a digital camera was used as the source image.

A. Source Image

The picture was recorded in TIFF format in order to avoid compression artifacts. The photograph was taken on the street, in average lighting conditions. The image is shown in Fig. 2.

B. Segmentation and text localization results

The image segmentation and text localization results are shown in Fig. 3. The sets of segments selected as text are marked with rectangular boxes. Segments containing fewer than 60 pixels were not analyzed, and only sets containing more than 3 segments are treated as text. Such limits speed up the process and reduce the number of false alarms.
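The nearest-colour decision used in the post-processing pseudocode can be sketched in runnable form. The row-of-tuples layout and the function name are our assumptions; as in the pseudocode, pixels closer to the letter colour map to 1 and pixels closer to the background colour map to 0:

```python
def binarise_region(pixels, letter_color, background_color):
    # label each pixel by whichever reference colour is nearer in L*a*b*:
    # distance_to_letter > distance_to_background yields 0, otherwise 1
    def lab_distance(c1, c2):
        return sum((u - v) ** 2 for u, v in zip(c1, c2)) ** 0.5
    return [
        [0 if lab_distance(p, letter_color) > lab_distance(p, background_color)
         else 1
         for p in row]
        for row in pixels
    ]
```

The comparison direction mirrors the pseudocode's test exactly, so ties fall to the letter class, matching its else branch.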

C. Post-processing

The results of post-processing (Fig. 4) show the shapes of the segments treated as letters of the text. For better illustration, the recognized text is overlaid on the source image. False alarms were rejected manually.

8. CONCLUSIONS

The described method is a way of achieving fast text localization. For the illustration shown, sized 598x506 pixels, producing the presented results takes about 7 seconds on a P4 2.8 GHz computer. Further image processing should recognize and eliminate the distortions of the image introduced by optical reproduction and perspective; some such methods are described in [4]. False recognitions can be quickly eliminated by OCR, as only small portions of the source image are checked by it. Although the method is limited to finding words at least 3 letters long, it can serve as the first step of more complex approaches that find single letters or digits [5].

REFERENCES

[1] Adobe Photoshop 4.0 Software Development Kit, Adobe Systems Inc., 1996.
[2] I. Pitas, "Digital Image Processing Algorithms", Prentice Hall, 1993.
[3] S.G. Wheeler, G.S. Wheeler, "Typografia Komputerowa", Exit, Warsaw.
[4] R. Cattoni, T. Coianiz, S. Messelodi, C.M. Modena, "Geometric Layout Analysis Techniques for Document Image Understanding: a Review", ITC-IRST Technical Report, Trento, Italy, 1998.
[5] J. Lebiedź, "Określenie Podobieństwa Kształtu Obiektu Do Litery Lub Cyfry", II KKTI, Gdańsk, 2004.

Fig. 2. Source image from a digital camera. Photography by Piotr Zabłocki.

Fig. 3. Segmented image with marked text boxes.

Fig. 4. Image with pasted post-processed text regions.
