WO2006090318A2 - System for enhancing a digital image - Google Patents

System for enhancing a digital image

Info

Publication number
WO2006090318A2
WO2006090318A2
Authority
WO
WIPO (PCT)
Prior art keywords
text
image
objects
processing
pixels
Prior art date
Application number
PCT/IB2006/050529
Other languages
French (fr)
Other versions
WO2006090318A3 (en)
Inventor
Ahmet Ekin
Jeroen Tegenbosch
Original Assignee
Koninklijke Philips Electronics N.V.
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Koninklijke Philips Electronics N.V. filed Critical Koninklijke Philips Electronics N.V.
Publication of WO2006090318A2 publication Critical patent/WO2006090318A2/en
Publication of WO2006090318A3 publication Critical patent/WO2006090318A3/en

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00Scenes; Scene-specific elements
    • G06V20/60Type of objects
    • G06V20/62Text, e.g. of license plates, overlay texts or captions on TV images

Abstract

The present invention concerns a system for enhancing a digital image which comprises a detection device (1), provided for detecting objects within the image, and a processing device (2), provided for processing the image by subsets upon detection of an object within the subsets. The detection device (1) mainly comprises a circuit (12) for the detection of text objects within said image, and the processing device (2) comprises a circuit (28) for the determination of the parameters of the text objects that are detected, said determination consisting in applying, for each subset, local operations to an area within the subset, said local operations varying upon the content of the subset containing the processed area and upon the membership of said processed area to a text object. A refinement circuit (3) performs final classical enhancement operations.

Description

SYSTEM FOR ENHANCING A DIGITAL IMAGE
TECHNICAL FIELD OF THE INVENTION
The present invention generally relates to the field of enhancement of a digital image and especially of text regions in a digital image, and more particularly to a system for enhancing a digital image comprising a detection device, provided for the detection of objects within the image, and a processing device, provided for processing the image by subsets upon detection of an object within said subsets.
BACKGROUND OF THE INVENTION
There are numerous ways to enhance the visual quality of digital images. In many cases, enhancement algorithms are local operations applied to the image data in subsets at a low level, i.e. at the pixel level or in local windows of small size. Such enhancement algorithms do not take into consideration high-level context, i.e. object information such as faces and text. The same processing operations are applied independently of the objects within the processed image. The main reason for preferring low-level to high-level operations in enhancement algorithms has been the difficulty of extracting high-level information and the corresponding increase in computational requirements.
Recently, robust object detection has become possible using only very little extra computation. Accordingly, there are methods of enhancing digital images that comprise detecting objects within the image and enhancing the image by global operations applied to the objects detected within the image. Such a method is for example described in document US 2003/0223622 A1, which describes a specific enhancement for portrait images. However, this method does not enhance the image by local operations, requires a user input for the final judgement of the visual quality and thus is not a completely automatic process. Furthermore, this method focuses on enhancement for faces and is not appropriate for other kinds of objects. Other methods, such as the one described in US 2004/0022440, enhance a digital image by detecting objects within the image and processing the image by region upon detection of an object within the subsets. More precisely, the method disclosed in this document divides the image into subsets, recognizes the type of the subsets and recognizes the type of object within a region as a function of the type of the subsets that form the region. A specific processing is then applied to the region according to the type of object identified. This enhancement is adapted for objects of large size, as object detection favours large objects at the expense of discarding small-sized objects. Thus, text objects, which can be of various sizes and can be smaller than neighbouring objects, are not enhanced.
As indicated above, the known methods comprise enhancing the images by subsets but are not appropriate in the case of text objects.
SUMMARY OF THE INVENTION
Therefore it is desirable to develop a new way of enhancing a digital image by local operations and especially an image containing text objects. To this end, the invention relates to a system such as defined in the introductory part of the description, and which is characterized in that :
- said detection device comprises a circuit for the detection of text objects within said image; and
- said processing device comprises a circuit for the determination of parameters of the text objects that are detected, said determination consisting in applying, for each subset, local operations to an area within said subset, said local operations varying upon the content of the subset containing said processed area and upon the membership of this processed area to a text object.
Other features of the system according to the invention are further recited in the dependent claims.
BRIEF DESCRIPTION OF THE FIGURES
The present invention will now be described, by way of example, with reference to the accompanying drawings in which :
- Figure 1 illustrates a preferred implementation of a system according to the invention ;
- Figure 2 is a schematic representation of a digital image when said system works ;
- Figure 3 is a graphic of a specific enhancement operation ;
- Figures 4A and 4B are symbolical representations of text areas when the system of figure 1 works.
DETAILED DESCRIPTION
Referring to Figure 1, a system for enhancing a digital image is illustrated. In this implementation, the digital image is a video-frame that is part of a sequence of frames and thus has a previous frame and a following frame. Each digital image is an array of pixels. The illustrated system comprises a detection device 1 and a processing device 2.
The detection device 1 is provided for performing a detection of objects within the image, for example by using the algorithm described in the document "Robust real-time object detection" by Paul Viola and Michael Jones, Second International workshop on statistical and computational theories of vision, Vancouver, Canada, 2001. This device 1 mainly comprises a circuit 12 for detecting text objects within the processed image.
Some existing text detection algorithms exploit the high contrast properties of overlay text regions. In a favourable text detection algorithm, the horizontal and vertical derivatives of the frame where text will be detected are computed first in order to enhance the high contrast regions. It is well known in the image and video processing literature that simple masks approximate the derivative of an image.
After the derivatives are computed for each of the colour channels (or the intensity and chrominance channels, depending on the selected colour space), the edge orientation feature is computed. The edge orientation feature was first proposed by Rainer Lienhart and Axel Wernicke in "Localizing and Segmenting Text in Images, Videos and Web Pages", IEEE Transactions on Circuits and Systems for Video Technology, Vol. 12, No. 4, pp. 256-268, April 2002.
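By way of illustration only, the following is a minimal sketch of such an edge orientation feature in the spirit of Lienhart and Wernicke; the choice of Sobel masks for the derivatives and of eight orientation bins are assumptions, as the text does not fix them:

    import numpy as np
    from scipy.ndimage import sobel

    def edge_orientation_feature(block, n_bins=8):
        # Edge-strength-weighted histogram of quantized edge orientations;
        # the bin count is an assumption of this sketch.
        b = block.astype(np.float32)
        dx, dy = sobel(b, axis=1), sobel(b, axis=0)  # horizontal/vertical derivatives
        magnitude = np.hypot(dx, dy)
        angle = np.arctan2(dy, dx) % np.pi           # orientation modulo 180 degrees
        bins = np.minimum((angle / np.pi * n_bins).astype(int), n_bins - 1)
        hist = np.bincount(bins.ravel(), weights=magnitude.ravel(), minlength=n_bins)
        return hist / (hist.sum() + 1e-9)            # normalized feature vector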
A statistical learning tool can be used to find an optimal text/non-text classifier. Support Vector Machines (SVMs) result in binary classifiers and have nice generalization capabilities. An SVM-based classifier trained with 1,000 text blocks and, at most, 3,000 non-text blocks for which edge orientation features are computed, has provided good results in our experiments. Because it is difficult to find representative hard-to-classify non-text examples, the popular bootstrapping approach introduced by K.K. Sung and T. Poggio in "Example-based learning for view-based human face detection", IEEE Trans. Pattern Analysis and Machine Intelligence, vol. 20, no. 1, pp. 39-51, Jan. 1998, can be followed. Bootstrap-based training is completed in several iterations and, in each iteration, the resulting classifier is tested on some images that do not contain text. False alarms over this data set represent difficult non-text examples that the current classifier cannot correctly classify. These non-text samples are added to the training set; hence, the non-text training dataset grows and the classifier is retrained with this enlarged dataset.
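A sketch of this bootstrap loop, assuming scikit-learn's SVC and a hypothetical extract_windows helper that yields one feature vector per window of a text-free image:

    import numpy as np
    from sklearn.svm import SVC

    def bootstrap_train(text_feats, nontext_feats, text_free_images,
                        extract_windows, n_iter=5):
        # Iteratively retrain the text/non-text SVM, adding false alarms on
        # text-free images as hard negatives (Sung and Poggio style).
        X_neg = list(nontext_feats)
        clf = None
        for _ in range(n_iter):
            X = np.vstack([text_feats, X_neg])
            y = np.r_[np.ones(len(text_feats)), np.zeros(len(X_neg))]
            clf = SVC(kernel="rbf").fit(X, y)
            for img in text_free_images:
                feats = extract_windows(img)         # hypothetical helper
                X_neg.extend(feats[clf.predict(feats) == 1])
        return clf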
When a classifier is being trained, an important issue to decide upon is the size of the image blocks that are fed to the classifier, because the height of the block determines the smallest detectable font size whereas the width of the block determines the smallest detectable text width. A block size of 12 x 12 is chosen for the training of the classifier because, in a typical frame with a height of 400 pixels, it is rare to find a font size smaller than 12.
Having computed the SVM-based classifier parameters in the training stage, text detection in a new frame is performed in two stages: 1) detection of text-candidate blocks using the SVM-based classifier, and 2) binarization of text to extract a pixel-accurate text mask for pixel-accurate enhancement. In the first stage, edge orientation features are extracted for every 12 x 12 window in the image and all pixels in the current window are classified as text or not by the SVM-based classifier. Because text can be larger than 12 pixels, font size independence is achieved by running the classifier with a 12 x 12 window size over multiple resolutions, and location independence is achieved by moving the window in the horizontal and vertical directions to evaluate the classifier over the whole image. This first stage extracts text-candidate blocks, also called regions of interest (ROI), that need to be further processed to extract the binary text mask.
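A minimal sketch of this multi-resolution scan, reusing the edge_orientation_feature sketch above; the step size, the number of scales, the decimation-based pyramid and the grayscale input are all assumptions:

    def detect_text_candidates(image, clf, block=12, step=4, n_scales=4):
        # Slide the 12x12 classifier over a crude decimation pyramid for font
        # size independence, and over x/y for location independence.
        rois = []
        for s in range(n_scales):
            scale = 2 ** s
            small = image[::scale, ::scale]
            h, w = small.shape
            for y in range(0, h - block + 1, step):
                for x in range(0, w - block + 1, step):
                    feat = edge_orientation_feature(small[y:y + block, x:x + block])
                    if clf.predict(feat[None, :])[0] == 1:
                        # map the hit back to full-resolution coordinates
                        rois.append((x * scale, y * scale, block * scale, block * scale))
        return rois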
The circuit 12 for the detection of a text in the image is responsible for extraction of text line boundaries, binarization of the text, and extraction of individual words. Initially, the coordinates of the horizontal text lines are computed. For that purpose, edge detection is performed in the region of interest (ROI) to find the high-frequency pixels, most of which are expected to be text. Because the ROI is mainly dominated by text, it is expected that the top of a text line will demonstrate an increase in the number of edges whereas the bottom of a text line will show a corresponding fall in the number of edges. Projections along the horizontal and/or vertical dimensions are effective descriptors to easily determine such locations. In contrast to the intensity projections that are used in many text segmentation algorithms, edge projections are robust to variations in the colour of the text.
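For illustration, a sketch of the row-wise edge projection idea, assuming a binary edge mask for the ROI; the 50% threshold and the rising/falling pairing rule are assumptions:

    import numpy as np

    def text_line_boundaries(edge_mask, rise=0.5):
        # Row-wise edge projection of a binary edge mask; a text line starts
        # where the profile jumps and ends where it falls back.
        profile = edge_mask.sum(axis=1).astype(np.float32)
        on = profile > rise * profile.max()
        edges = np.flatnonzero(np.diff(on.astype(int)))  # rise/fall positions
        # pair each rising transition (top of a line) with the next falling one
        return [(edges[i] + 1, edges[i + 1]) for i in range(0, len(edges) - 1, 2)]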
The circuit 12 may also perform a thresholding to extract the binary text mask. This step involves automatically computing a threshold value to find the binary and pixel-wise more accurate text mask. The pixels occurring just outside the text line boundaries are defined as background. The threshold value is set such that no pixel outside the detected text lines, which refers to background, is assigned as a text pixel. The circuit 12 may also determine a word boundary by performing a morphological closing operation on the identified text pixels and identifying further pixels as text pixels if the further pixels are located within the word boundary. A morphological closing operation and a connected-component labelling algorithm are applied to the resulting text mask to segment individual words. The closing operation joins separate characters in words while the connected-component labelling algorithm extracts connected regions (words in this case).
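A sketch of this binarization and word segmentation, under stated assumptions: bright text on a dark background (invert the comparison otherwise), a fixed closing structure, and SciPy morphology in place of whatever hardware circuit 12 uses. The threshold rule follows the text: it is pushed just past the brightest background sample so that no background pixel is labelled text.

    import numpy as np
    from scipy import ndimage

    def segment_words(gray, top, bottom, margin=2):
        # Binarise one detected text line, close gaps between characters, and
        # return one connected component (slice pair) per word.
        line = gray[top:bottom]
        background = np.r_[gray[max(0, top - margin):top].ravel(),
                           gray[bottom:bottom + margin].ravel()]
        threshold = background.max() + 1          # no background pixel passes it
        text_mask = line > threshold
        closed = ndimage.binary_closing(text_mask, structure=np.ones((3, 7)))
        labels, n_words = ndimage.label(closed)   # connected components = words
        return text_mask, ndimage.find_objects(labels)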
In the embodiment described, the circuit 12 is followed by a circuit 14 provided for performing a mask image creation, this mask image having the label of the object type for each pixel. For example, the value 0 is attributed to pixels that are not part of an object, the value 1 to a pixel that is part of a text object and the value 2 to a pixel that is part of a face object. Advantageously, the detecting device 1 also comprises a circuit 16 provided for estimating the colours in the image in order to detect the object more accurately. In the embodiment described, this is achieved by comparing the values of the colour parameters of each pixel to those of each of its neighbours in order to detect the edges of the object more accurately, the mask image being corrected accordingly. Advantageously, the device 1 also comprises a circuit 18 provided for determining the parameters of the text objects detected. For example, this circuit 18 makes it possible to determine whether a text is horizontal or slanted and also to classify the text by its size. For example, this computation of the size of the text line is achieved by taking the absolute difference between the lowest and the highest y-coordinate of the text line. In an alternative embodiment, the size is determined by finding the upper and lower baseline coordinates of the text line so that the outline effect due to upper elongated letters and lower elongated characters can be prevented. The height in this case can be assigned as the absolute difference between the lower and the upper baseline y-coordinates.
Furthermore, in an alternative embodiment, the circuit 18 may also include the computation of a parameter, called herein a distance map, for each object type. A distance map for an object can be created by computing the distance of each pixel from the closest object pixel in the image. In such a map, detected object pixels are assigned a value of zero. Without limiting the invention, such a distance map can be computed by running a Chamfer distance transform algorithm over the mask image. The enhancement of the image can be a function of the object type, the object properties, as well as the distance to the object.
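A sketch of the classic two-pass 3-4 Chamfer transform over the mask image; the 3/4 weights are the usual choice, not specified by the text:

    import numpy as np

    def chamfer_distance_map(mask, obj_label=1):
        # Two-pass 3-4 Chamfer transform: object pixels get 0, every other
        # pixel the approximate distance to the nearest object pixel.
        INF = 10 ** 6
        d = np.where(mask == obj_label, 0, INF)
        h, w = d.shape
        for y in range(h):                                   # forward pass
            for x in range(w):
                if x > 0:
                    d[y, x] = min(d[y, x], d[y, x - 1] + 3)
                if y > 0:
                    d[y, x] = min(d[y, x], d[y - 1, x] + 3)
                    if x > 0:
                        d[y, x] = min(d[y, x], d[y - 1, x - 1] + 4)
                    if x < w - 1:
                        d[y, x] = min(d[y, x], d[y - 1, x + 1] + 4)
        for y in range(h - 1, -1, -1):                       # backward pass
            for x in range(w - 1, -1, -1):
                if x < w - 1:
                    d[y, x] = min(d[y, x], d[y, x + 1] + 3)
                if y < h - 1:
                    d[y, x] = min(d[y, x], d[y + 1, x] + 3)
                    if x < w - 1:
                        d[y, x] = min(d[y, x], d[y + 1, x + 1] + 4)
                    if x > 0:
                        d[y, x] = min(d[y, x], d[y + 1, x - 1] + 4)
        return d / 3.0                                       # roughly pixel units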
Furthermore, in the text objects, maximum and minimum luminance values are computed and, if the difference is high and occurs for every character, evenly distributed across the whole text line, it is considered text shading. After detection of the text objects within the image, creation of the mask image, and determination of the text object parameters, enhancement of the digital image is achieved by scanning and processing the frame pixel by pixel.
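One hedged reading of this shading test, assuming per-character bounding boxes from the word segmentation step and an illustrative luminance-range threshold:

    def is_shaded(line_luma, char_boxes, range_threshold=96):
        # A text line is flagged as shaded when every character shows a large
        # max-min luminance spread; the threshold value is an assumption.
        spreads = [line_luma[box].max() - line_luma[box].min()
                   for box in char_boxes]   # char_boxes: slices per character
        return bool(spreads) and min(spreads) > range_threshold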
In the processing device 2 that follows the detection device 1, the processing of the digital image is achieved by local, or low-level, operations processing the image data in subsets or local windows, which can be of varying sizes such as, for example, 3 x 3, 5 x 5 or 13 x 13 pixels, as represented with reference to figure 2. One of these local windows, identified by the reference W in figure 2, is moved along the entire digital image. The centre pixel or elementary group of pixels of the window W is the current processed area, identified by the reference A. For every pixel, the output is defined as a function of the current pixel value and the pixel values of the neighbouring pixels within the local window, and as a function of the membership of the processed pixel to an object. This function corresponds to a transformation function of the pixel, i.e. a local operation. The processing device 2 comprises a circuit 22 provided for verifying the membership of a text object. This test is achieved by checking the value of the mask image for the corresponding pixel. The subsequent operations are dependent upon the type of object detected. If the test performed in the circuit 22 reveals that the pixel is not part of a text object, it is followed by an operation of setting default enhancement parameters, performed in a circuit 24. In the opposite case, if the test reveals that the pixel is part of a text object, an operation of setting specific enhancement parameters for the detected object is performed in a circuit 26. The circuits 24 and 26 are followed by a circuit 28 in which the current enhancement parameters are applied to achieve low-level processing operations. In the embodiment described, the default settings result in the application of Luminance Transient Improvement, referred to hereafter as LTI, and Colour Transient Improvement, referred to as CTI, followed by a thinning operation. LTI and CTI are nonlinear sharpening techniques which make edges steeper and create a sharpened transition at the beginning and end of each edge. As a result, the perceived sharpness is increased. LTI makes edges steeper at the expense of making thicker the line of which the steepened edge is part. A thinning operation compensates for the thickening effect of LTI; this combination of operations results in thin lines with steep edges. More precisely, the LTI algorithm makes the edge steeper and, in the process of doing so, modifies picture values on both sides of the edge centre, i.e. dark samples get darker and bright samples brighter.
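A minimal sketch of the mask-driven selection performed by circuits 22 to 28; the parameter dictionaries and the apply_ops placeholder standing in for the LTI/CTI/thinning chain are assumptions of this sketch:

    NOT_OBJECT, TEXT, FACE = 0, 1, 2     # mask labels used in the embodiment

    def enhance_pixel(image, mask, y, x, default_params, text_params, apply_ops):
        # Circuit 22: check the mask; circuits 24/26: pick the parameter set;
        # circuit 28: run the low-level chain on the local window.
        params = text_params if mask[y, x] == TEXT else default_params
        half = params["window"] // 2     # window sizes such as 3x3 .. 13x13
        window = image[max(0, y - half):y + half + 1,
                       max(0, x - half):x + half + 1]
        return apply_ops(window, params)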
In case a text object has been detected, the circuit 28 applies specific low-level processing operations using the corresponding specific enhancement parameters set in the circuit 26. Advantageously, the processing operations vary not only upon the type of object the processed pixel is part of, but also upon the parameters of this object, these parameters, such as size and orientation, being determined in the circuit 18. Accordingly, the processing varies upon the text size and/or the relative colour between text and background, as well as artificial text shading and text inclination.
Finally, a refinement circuit 3 is provided in which further classical enhancement operations are applied to the processed image. For example, object-based optimization of settings or any kind of known processing is applied.
Referring now to figures 3, 4A and 4B, the details of said specific processing operations will be described. In the embodiment described, the following cases are distinguished. If the text is slanted, it is not an artificial text but a scene text appearing in the image itself. In that case, default parameters are set in the circuit 26 and normal LTI and thinning operations are applied.
If the text is horizontal, then it is considered as an overlay and processed in a specific manner. For a text size smaller than a predetermined size, both LTI and thinning operations are inhibited by setting inhibition parameters in the circuit 26 to prevent these operations from being applied in the circuit 28.
When it is determined in the circuit 18 that artificial text shading exists, the parameters are set so that LTI and thinning are not applied for the shaded text.
For a text size larger than a predetermined size and without shading, default LTI and/or CTI parameters are set together with modified thinning parameters. More precisely, an offset is used in order to prevent the thickening effect of the LTI algorithm. In the embodiment described, this offset is applied to the second order derivative of the original signal. In the case of a black line a positive offset is applied, while in the case of a white line a negative offset is applied.
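Taken together, the cases above amount to a small decision rule. A sketch under stated assumptions — the field names of the text-object record, the minimum size and the offset magnitude are all illustrative:

    def text_enhancement_params(text, defaults, min_size=18, offset=8):
        # Map measured text-object parameters (circuit 18) to enhancement
        # settings (circuit 26), following the cases in the description.
        if text["slanted"]:                  # scene text: keep normal processing
            return dict(defaults)
        params = dict(defaults)              # horizontal text: overlay handling
        if text["height"] < min_size or text["shaded"]:
            params["lti"] = params["thinning"] = False   # inhibit both
        else:
            # black line -> positive offset, white line -> negative offset
            params["d2_offset"] = offset if text["black_line"] else -offset
        return params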
As represented in figure 3, a modified adaptive second order derivative is formed by pulling down the second order derivative of the edge for normal text, and by pulling this signal up for inverse text. Figure 3 represents the detail of this operation. The original edge is represented by curve 30, the signal of the edge after application of the LTI algorithm by curve 32, and the second order derivative of the original signal by curve 34. The offset signal resulting from the application of the method of the invention is represented by curve 36. At the median of the detected edge begin and edge end of the original signal, taking into account its luminance and/or chrominance, the modified second order derivative is assigned as the new pixel value. Furthermore, when a text object is detected, contrast enhancement is achieved by increasing brightness for normal text and lowering brightness for inverse text.
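A one-dimensional sketch of transient improvement with a polarity-dependent offset on the second order derivative; the gain, the offset magnitude and the exact way the offset enters the result are assumptions, not the patented curves of figure 3:

    import numpy as np

    def lti_with_offset(signal, gain=0.5, offset=8.0, black_line=True):
        # Steepen edges by subtracting the scaled second order derivative of
        # the 1-D luminance signal; a polarity-dependent offset on that
        # derivative counters the line-thickening effect of plain LTI.
        s = signal.astype(np.float32)
        d2 = np.zeros_like(s)
        d2[1:-1] = s[:-2] - 2.0 * s[1:-1] + s[2:]    # second order derivative
        d2 += offset if black_line else -offset      # pull derivative up/down
        return np.clip(s - gain * d2, 0.0, 255.0)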
The result of the application of the process of the invention is shown in figures 4A and 4B. Figure 4A represents a text object after a classical LTI operation and figure 4B represents the same text object after implementation of the invention. It appears that said implementation results in good quality text, increasing the visual quality of the image. An adjustment of the parameters of the image enhancement algorithms is provided by using both high-level content object features and low-level processing operations. The invention uses the default image processing parameters at locations where objects do not exist and specific processing parameters for regions of the image which are part of text objects. Furthermore, as a consequence of the specific parameters, the invention turns down the magnitude of the peaking and LTI parameters for the text objects and turns off the thinning of text objects by setting inhibition parameters for this algorithm. If the text colour is available, the invention increases the contrast between the characters and the background by increasing the brightness of the text. Other embodiments than those described are also possible by using different low-level operations. The invention can be achieved by other devices such as computers and the like or dedicated devices, such devices mainly comprising a unit adapted to detect objects within digital images, and especially text objects, and a unit adapted to process the image by subsets upon detection of objects within the subsets, this last unit being adapted to apply, for each subset, local operations to an area within the subset, said local operations varying upon the content of the subset containing the area and upon the membership of the processed area to a text object.
The invention can also be carried out by a computer program for a processing unit comprising a set of instructions, which, when loaded into said processing unit, causes the processing unit to carry out the method described above.
There are numerous ways of implementing functions by means of items of hardware or software, or both. In this respect, the drawings are very diagrammatic and represent only one possible embodiment of the invention. Thus, although a drawing shows different functions as different blocks, this by no means excludes that a single item of hardware or software carries out several functions. Nor does it exclude that an assembly of items of hardware or software or both carry out a function.
The remarks made hereinbefore demonstrate that the detailed description, with reference to the drawings, illustrates rather than limits the invention. There are numerous alternatives, which fall within the scope of the appended claims. Any reference sign in a claim should not be construed as limiting the claim. The word "comprising" does not exclude the presence of other elements or steps than those listed in a claim. The word "a" or "an" preceding an element or step does not exclude the presence of a plurality of such elements or steps.

Claims

1. A system for enhancing a digital image comprising a detection device, provided for the detection of objects within the image, and a processing device, provided for processing the image by subsets upon detection of an object within said subsets, wherein :
- said detection device comprises a circuit for the detection of text objects within said image ; and
- said processing device comprises a circuit for the determination of parameters of the text objects that are detected, said determination consisting in applying, for each subset, local operations to an area within said subset, said local operations varying upon the content of said subset containing said processed area and upon the membership of this processed area to a text object.
2. A system according to claim 1, wherein said parameters are selected in the group consisting of relative colours between text and background, artificial text shading, text inclination, text size, and the distance from the text pixels.
3. A system according to claim 1, wherein the text object detection circuit is provided for detecting horizontal text line boundaries by detecting the increase and the corresponding decrease in edge projections.
4. A system according to claim 3, wherein the text object detection circuit is further provided for determining a set of pixel values only occurring between the horizontal text line boundaries, by comparing with the pixels just outside the text line boundaries, and identifying pixels as text pixels if the pixels have value from said set of pixel values.
5. A system according to claim 4, wherein the text object detection circuit is further provided for determining a boundary by performing a morphological closing operation on the identified text pixels and identifying further pixels as text pixels if said further pixels are located within the boundary.
6. A system according to claim 5, wherein the text object detection circuit is further provided for creating a mask image of the processed image, where the values of the mask image vary upon the type of object detected.
7. A system according to claim 1, wherein the processing device comprises a circuit in which current enhancement parameters are applied to achieve local, also called low- level processing operations in the form of algorithms with parameters set as a function of the objects detected.
8. A system according to claim 7, wherein the low-level operations applied when processing text regions comprise setting a specific offset in the parameters of at least one algorithm.
9. A system according to claim 7, wherein the low-level operations applied when processing text regions comprise setting inhibition parameters to prevent the application of at least one algorithm.
10. A system according to any of claims 7 to 9, wherein the low-level operations applied when processing regions comprising objects that are not text objects comprise setting default parameters for at least one algorithm.
11. A system according to any of claims 7 to 9, wherein at least one algorithm applied in said circuit achieving low-level operations when processing the image is selected in the group consisting of Luminance Transient Improvement (LTI), Colour Transient Improvement (CTI), brightness modification, contrast modification and thinning operation.
12. A system according to any of claims 1 to 11, wherein said subset is a sliding window, the centred area of which is processed.
13. A method of enhancing a digital image, characterized in that it comprises:
- a detecting step, provided for detecting objects within said image and especially text objects; and
- a processing step provided for determining parameters of the text objects thus detected by applying, for each subset, local operations to an area within the subset, said local operations varying upon the content of said subset containing the processed area and upon the membership of the processed area to a text object.
14. Computer program for a processing unit comprising a set of instructions which, when loaded into said processing unit, causes the processing unit to carry out the steps of the method as claimed in claim 13.
PCT/IB2006/050529 2005-02-24 2006-02-17 System for enhancing a digital image WO2006090318A2 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
EP05300143 2005-02-24
EP05300143.4 2005-02-24

Publications (2)

Publication Number Publication Date
WO2006090318A2 true WO2006090318A2 (en) 2006-08-31
WO2006090318A3 WO2006090318A3 (en) 2006-11-09

Family

ID=36609464

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/IB2006/050529 WO2006090318A2 (en) 2005-02-24 2006-02-17 System for enhancing a digital image

Country Status (1)

Country Link
WO (1) WO2006090318A2 (en)

Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20020113801A1 (en) * 2000-11-29 2002-08-22 Maire Reavy System and method for improving the readability of text
US20040022440A1 (en) * 2002-07-30 2004-02-05 Fuji Photo Film Co., Ltd. Method and apparatus for image processing
US20040037473A1 (en) * 2002-08-20 2004-02-26 Ahmed Mohamed N. Systems and methods for content-based document image enhancement

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
QIXIANG YE ET AL: "A robust text detection algorithm in images and video frames" INFORMATION, COMMUNICATIONS AND SIGNAL PROCESSING, 2003 AND FOURTH PACIFIC RIM CONFERENCE ON MULTIMEDIA. PROCEEDINGS OF THE 2003 JOINT CONFERENCE OF THE FOURTH INTERNATIONAL CONFERENCE ON SINGAPORE 15-18 DEC. 2003, PISCATAWAY, NJ, USA,IEEE, vol. 2, 15 December 2003 (2003-12-15), pages 802-806, XP010702790 ISBN: 0-7803-8185-8 *
RAINER LIENHART ET AL: "Localizing and Segmenting Text in Images and Videos" IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS FOR VIDEO TECHNOLOGY, IEEE SERVICE CENTER, PISCATAWAY, NJ, US, vol. 12, no. 4, April 2002 (2002-04), XP011071822 ISSN: 1051-8215 cited in the application *

Also Published As

Publication number Publication date
WO2006090318A3 (en) 2006-11-09

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application
NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 06710936

Country of ref document: EP

Kind code of ref document: A2

WWW Wipo information: withdrawn in national office

Ref document number: 6710936

Country of ref document: EP