WO2006043097A1 - Processing of digital images - Google Patents

Processing of digital images

Info

Publication number
WO2006043097A1
WO2006043097A1 PCT/GB2005/004094 GB2005004094W
Authority
WO
WIPO (PCT)
Prior art keywords
image
segmentation data
data
images
segments
Prior art date
Application number
PCT/GB2005/004094
Other languages
French (fr)
Inventor
William Frederick George Gallafent
Original Assignee
Bourbay Limited
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Bourbay Limited filed Critical Bourbay Limited
Publication of WO2006043097A1 publication Critical patent/WO2006043097A1/en

Links

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/10Segmentation; Edge detection
    • G06T7/11Region-based segmentation
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/50Information retrieval; Database structures therefor; File system structures therefor of still image data
    • G06F16/58Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/20Special algorithmic details
    • G06T2207/20092Interactive image processing based on input by user

Definitions

  • the present invention relates to processing of digital images.
  • the present invention relates to generation of data which characterise digital images, for example for the purposes of sorting, searching, indexing, categorising or compression.
  • a digital image is usually displayed as an array of pixels, each pixel having specified visual characteristics, and comprises raw pixel data specifying the visual characteristics of each pixel.
  • Visual characteristics include characteristics such as colour, brightness, hue, saturation and texture.
  • this type of data structure is convenient for displaying the image, it is usually not the most efficient means in terms of quantity of data for storing an image. Furthermore, this type of data structure does not provide useful information regarding the visual content of the image for the purpose of sorting, searching, indexing or categorising digital images.
  • keywords can be assigned to an image record automatically when the image is found in a context (for example, a web page) which includes textual information which may be considered to relate to the image.
  • a context may be, for example, a web page.
  • words which occur near the image in its context are considered to be likely to be words which describe the content of the image, and as such are added as keywords to the image record, allowing the image to be returned as a result of a search for those keywords.
  • the latter are data compression algorithms which allow the exact reproduction of the original image data to result from a compression - decompression cycle.
  • the former generally decompose the image in to a mathematical approximation of its content in which "less important" information is discarded.
  • one algorithm describes an image as a set of small tiles, each of which is, in turn, described by a set of coefficients of cosine functions together with a set of colours. [1] In this way, less data is required to describe the image. Due to the nature of lossy algorithms, however, the reconstructed image is no longer identical to the original, only an approximation, the amount of detail lost being dependent on the amount of less important information which has been discarded.
  • One potential disadvantage of such methods is that in some cases, the information that is lost may be important for subsequent processing of the reconstructed image. In particular, certain processing requires segmentation of images, and the image may be reconstructed in such a way that the reconstructed image would not be properly segmented.
  • Figure 1 shows a first digital image which has been segmented;
  • Figure 2 shows a table containing data relating to the segments of the first image shown in Figure 1;
  • Figure 3 shows a table containing data relating to an object in the first image shown in Figure 1;
  • Figure 4 shows a second digital image which has been segmented;
  • Figure 5 shows a table containing data relating to the segments of the second image shown in Figure 4; and
  • Figure 6 shows a schematic diagram of the steps carried out for a method according to the present invention. Detailed Description
  • One embodiment comprises a processor, a display for displaying digital images and user interfaces, and one or more input devices such as a mouse and a keyboard to allow the user to input data and interact with the system.
  • the system also comprises a store or memory for storing data in one or more databases and for storing the necessary computer code that is executed by the processor when the system is used.
  • a digital image comprising pixel data is processed to generate a new set of data which describes the visual characteristics of the image and from which the original image may be reconstructed.
  • the new set of data is significantly smaller in size than the original pixel data, but certain information which may be useful for subsequent processing of the reconstructed image is not lost.
  • the new set of data generated in this step may be used to sort, search, index or categorise the image in a database.
  • the new set of data generated in this step is generated automatically.
  • the present invention provides a number of advantages over prior methods.
  • the data generated according to the present invention is small in size and relatively insignificant compared to the data size of the image itself.
  • the data is useful in the context of many different applications allowing indexing, searching or sorting algorithms to make decisions with a minimal amount of computation.
  • the information is well defined and specified, so that tools to extract it can be written independently of the tools written to create it.
  • the information allows assessments of "degree of similarity" to be made, so that it is possible, for example, to arrange images in order according to how similar they are to a chosen image.
  • the information allows for improvements in the known processes discussed above, by improving the efficiency and degree of automation with which images are described, by improving the quality of data associated with the image, and consequently by improving the function of tools which perform the searching, indexing and other tasks which use the data as their input.
  • a segmentation of the image is performed whereby the image is divided into image segments comprising contiguous regions of pixels.
  • Each image segment contains pixels having visual characteristics such as colour or texture which are considered similar in the context of the image data.
  • the segmentation is performed so that every pixel in the image is a member of a segment.
  • An image may be segmented for example using the Watershed algorithm described in our UK patent application number GB 0130210.8.
  • colour space is first segmented to define two or more colour classes, each representing a grouping of similar colours. Then, contiguous regions of the image containing pixels having colours contained in the same colour class define a particular image segment. An image segment comprising pixels having colours contained in a particular colour class may be said to be associated with that colour class. Since there may be several separate contiguous regions of the image containing colours belonging to a particular colour class, there may be several image segments associated with a given colour class.
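The colour-class approach above can be sketched in a few lines of Python. This is not the patent's multidimensional watershed algorithm: the colour classifier below is a deliberately crude stand-in (bucketing each channel as light or dark), and all function names are illustrative. It does, however, honour the contract described: every pixel ends up in exactly one segment, and each segment is a contiguous region whose pixels share a colour class.

```python
from collections import deque

def colour_class(pixel):
    """Toy colour classifier: bucket each RGB channel as low or high.
    A real system would segment colour space with a method such as the
    multidimensional watershed algorithm referenced in the text."""
    r, g, b = pixel
    return (r >= 128, g >= 128, b >= 128)

def segment_image(pixels):
    """Label contiguous regions of pixels sharing a colour class.

    pixels: 2-D list of (r, g, b) tuples.
    Returns (labels, classes): labels is a 2-D list of segment ids,
    classes maps each segment id to its colour class.
    """
    h, w = len(pixels), len(pixels[0])
    labels = [[None] * w for _ in range(h)]
    classes = {}
    next_id = 0
    for y in range(h):
        for x in range(w):
            if labels[y][x] is not None:
                continue
            cls = colour_class(pixels[y][x])
            classes[next_id] = cls
            labels[y][x] = next_id
            queue = deque([(y, x)])
            while queue:  # flood fill over 4-connected same-class neighbours
                cy, cx = queue.popleft()
                for ny, nx in ((cy - 1, cx), (cy + 1, cx),
                               (cy, cx - 1), (cy, cx + 1)):
                    if (0 <= ny < h and 0 <= nx < w
                            and labels[ny][nx] is None
                            and colour_class(pixels[ny][nx]) == cls):
                        labels[ny][nx] = next_id
                        queue.append((ny, nx))
            next_id += 1
    return labels, classes
```

On a two-row image whose top row is pale (sky-like) and bottom row dark, this yields two segments, one per colour class, with no pixel left unlabelled.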
  • the next step comprises calculating and storing salient data about the segments in the image which may be referred to as segmentation data or metadata.
  • segmentation data may represent information regarding the size, shape, position or other visual characteristics of each image segment, and information regarding relationships between different image segments, or statistical information relating to groups of the image segments.
  • segmentation data may include the size (such as the number of pixels), and characteristic colour (such as the average colour) for the largest N image segments (N being for example a small number of order 10).
  • the segmentation data may also include the median (or other average) size of the image segments, or the average size of those image segments associated with each colour class
  • the segmentation data may further include an identification of the colour class (B) that an image segment of a colour class (A) is most likely to be adjacent to, for each colour class or other similar derived data.
  • the segmentation data constitute a data set which characterises the whole image. If a sufficient set of segmentation data is generated, then the original image may be reconstructed using the set of segmentation data. Since such a set of segmentation data is smaller in size than the original pixel data, a compression of the data representing the image is achieved. Additionally, the segmentation data may be used to sort, search, index or categorise images so that, for example, a user could search for all images in a database having segmentation data matching particular criteria.
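The kinds of segmentation data listed above (segment sizes, characteristic colours, median size, most-likely-adjacent colour class) can be derived mechanically from a labelled image. The sketch below assumes a label grid produced by a prior segmentation step; the dictionary layout and field names are our own illustrative choices, not the patent's.

```python
from collections import Counter, defaultdict

def segmentation_metadata(pixels, labels, classes, n_largest=10):
    """Derive illustrative segmentation data from a labelled image:
    size and mean colour of the N largest segments, the median segment
    size, and for each colour class the class its segments most often
    border (the 'most likely neighbour' statistic described above)."""
    h, w = len(labels), len(labels[0])
    size = Counter()
    colour_sum = defaultdict(lambda: [0, 0, 0])
    adjacency = defaultdict(Counter)  # colour class -> bordering classes
    for y in range(h):
        for x in range(w):
            seg = labels[y][x]
            size[seg] += 1
            for i, c in enumerate(pixels[y][x]):
                colour_sum[seg][i] += c
            for ny, nx in ((y, x + 1), (y + 1, x)):  # right/down neighbours
                if ny < h and nx < w and labels[ny][nx] != seg:
                    a, b = classes[seg], classes[labels[ny][nx]]
                    adjacency[a][b] += 1
                    adjacency[b][a] += 1
    largest = [{"segment": seg,
                "size": n,
                "mean_colour": tuple(s // n for s in colour_sum[seg])}
               for seg, n in size.most_common(n_largest)]
    sizes = sorted(size.values())
    likely_neighbour = {cls: nb.most_common(1)[0][0]
                        for cls, nb in adjacency.items()}
    return {"largest": largest,
            "median_size": sizes[len(sizes) // 2],
            "likely_neighbour": likely_neighbour}
```

For a tiny two-segment image (sky over ground), this records each segment's mean colour and reports that sky segments most often border ground segments, and vice versa.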
  • One advantageous feature of the present invention is that the segmentation data can be calculated completely automatically, without human supervision, thereby increasing the efficiency of processing the digital images.
  • images may be searched, sorted, indexed or categorised on the basis of the segmentation data.
  • the segmentation data is not necessarily the most convenient type of data for providing information regarding the conceptual visual content of an image.
  • identifiers including keywords or other tokens may be assigned to an image representing the visual content of an image.
  • a keyword may be assigned to an image for example based on the visual content of the entire image, or based on the presence of an object within the image. For example, the keyword 'landscape' may be assigned to an image of a landscape and the keyword 'tree' may be assigned to the image by virtue of the presence of a tree in the image.
  • keywords may be assigned to images automatically on the basis that similar images have already had those keywords assigned to them.
  • the similarity between images may be determined using the automatically generated segmentation data.
  • Keywords may be associated with images by defining a mapping between sets of values or ranges of values of one or more parameters in the segmentation data and one or more keywords or tokens.
  • the mapping may be a two way mapping defined so that given a set of values or ranges of values, the associated keyword may be determined, and vice versa. In this case, when a keyword is mapped onto a set of values or ranges of values, the set of values or ranges of values may also be said to be mapped onto the keyword, and vice versa.
  • the mapping may be one to many or many to one.
  • the segmentation data parameters defining 'shape' and 'colour', when having the respective values representing 'circle' and 'yellow', may be mapped onto the keyword 'sun'.
  • the mapping may be more sophisticated than this simple example so that, for example, the mapping is dependent on the values of other parameters in the segmentation data. For example, the previous example of mapping may be made only if the colour parameters of an image segment surrounding the circular yellow image segment have values representing 'blue' (indicating sky). This distinguishes images containing the sun from other images containing other yellow circular objects.
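A minimal rule-based version of such a mapping might look as follows. The rule format (a dict of parameter names to exact values or (lo, hi) ranges) is an illustrative assumption of ours, and the 'sun' rule shows how a condition on the surrounding segment's colour can be folded in as just another parameter, distinguishing the sun from other yellow circular objects.

```python
def matches(value, spec):
    """A value matches a spec that is either an exact value or a
    (lo, hi) numeric range."""
    if isinstance(spec, tuple) and len(spec) == 2 and not isinstance(value, tuple):
        lo, hi = spec
        return lo <= value <= hi
    return value == spec

def assign_keywords(segments, rules):
    """Map segmentation-data parameter values onto keywords.

    segments: list of dicts of segmentation-data parameters, one per
              image segment (parameter names here are hypothetical).
    rules: list of (conditions, keyword) pairs; conditions is a dict
           of parameter -> exact value or (lo, hi) range.
    Returns the set of keywords whose conditions are all satisfied by
    at least one segment.
    """
    keywords = set()
    for seg in segments:
        for conditions, keyword in rules:
            if all(k in seg and matches(seg[k], v)
                   for k, v in conditions.items()):
                keywords.add(keyword)
    return keywords
```

A yellow circle surrounded by blue maps onto 'sun'; the same circle surrounded by grey does not, exactly as in the example above.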
  • This process of associating keywords with images relies on the mapping between the segmentation data and keywords.
  • segmentation data representing image segments having certain regular geometrical configurations and/or textures may be mapped onto the keyword 'building'.
  • mapping may be predefined, for example defined by data generated previously and stored in a mapping database.
  • mapping may be at least partly defined by a user manually assigning keywords to images.
  • a user views an image and assigns one or more keywords to the image accordingly describing the visual content of the image.
  • the particular set of segmentation data derived from the image may then be mapped onto the keywords input by the user.
  • the mapping between the segmentation data and keywords will not in general represent a reliable mapping between the keywords and the visual content of the image.
  • as more images are processed, the mapping improves so that subsequent images may be automatically assigned keywords, with human interaction required only to refine the automated process and correct erroneously assigned keywords.
  • the frequency at which a user is required to correct errors reduces, thereby reducing human interaction.
  • the above process for assigning keywords to entire images by defining a mapping between those keywords and characteristic sets of values of segmentation data may be applied to specific objects within an image.
  • An object within an image is formed from a group of image segments, being a subset of all the image segments of the image. For example, the group of image segments representing a tree in an image form an object.
  • a mapping between keywords and particular sets of values of segmentation data which correspond to a particular object may be defined. Once such a mapping has been defined any image can be analysed to determine whether it contains a group of image segments whose corresponding set of segmentation data maps onto a particular keyword. If so, then the image may be considered to contain the object associated with that keyword.
  • mapping may be predefined as described above. However, as with the assignment of keywords to entire images, the mapping may be defined by a process of user interaction. For example, a user may view an image and identify a particular object within the image. This may be achieved by the user manually selecting those image segments representing the identified object. The user also inputs a keyword to be associated with the selected object. Then, the segmentation data values corresponding to the image segments forming the selected object are mapped onto the inputted keyword.
  • This mapping may be stored in an object database. As before, if only one object is used to define the mapping, the mapping may be unreliable. However, as the user manually assigns the same keyword to the same type of object appearing in many different images, the mapping will become more reliable. The mapping defined in the object database may then be used to search for the existence of defined objects in other images.
  • a specific keyword may be mapped onto several different sets of values or ranges of values of segmentation data parameters. This allows account to be taken of the fact that an object, such as a car for example, may look different when viewed from different angles.
  • keywords may be automatically assigned to images by searching for occurrences of defined objects within the images.
  • the keyword associated with that object may be assigned to the image.
  • the automatic assignment of keywords to entire images using the method described above may be more effective for keywords which describe the visual content of images in more general terms, such as 'landscape' or 'buildings'.
  • the automatic assignment of keywords to individual objects may be more effective for keywords which describe more specific aspects of the visual content of images.
  • a certain degree of leeway may be provided for so that, for example, all objects which are within a particular range of similarity may be found within images.
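One simple way to implement that leeway is to accept a candidate group of segments whose numeric parameters all lie within a fractional tolerance of a stored object signature. The signature fields and the 20% default tolerance below are illustrative assumptions, not values from the patent.

```python
def within_leeway(candidate, reference, tolerance=0.2):
    """True if every parameter of the reference object signature is
    matched by the candidate to within a fractional tolerance, giving
    the 'range of similarity' leeway described above."""
    for key, ref in reference.items():
        val = candidate.get(key)
        if val is None:
            return False
        if ref == 0:
            if val != 0:
                return False
        elif abs(val - ref) / abs(ref) > tolerance:
            return False
    return True

def find_objects(groups, known_objects, tolerance=0.2):
    """Return the keywords of all known objects matched, within the
    tolerance, by some candidate group of image segments.

    groups: list of dicts of segmentation data for candidate groups.
    known_objects: dict mapping keyword -> object signature dict.
    """
    found = set()
    for group in groups:
        for keyword, signature in known_objects.items():
            if within_leeway(group, signature, tolerance):
                found.add(keyword)
    return found
```

A candidate with five green and two brown segments matches a 'tree' signature with the same counts; a candidate with only one green segment falls outside the tolerance and is rejected.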
  • the segmentation data and keyword data generated by the method described above form, in whole or in part, a rich, searchable layer of information pertaining to the image.
  • Analytical characteristics of the image may be used to perform this searching, for example allowing queries of the type "find me images like this one" or "find me images with objects like this object".
  • the tokens or keywords associated with images may be used to find other images with which these tokens have previously been associated.
  • the tokens or keywords associated with objects may be used to find images containing objects with which these tokens have been associated.
  • This data layer provides a means to categorise images completely flexibly according to content, building a database of images to which new images may be added with a high probability that they will be automatically assigned appropriate tokens, based upon their similarity to other, already tokenised, images previously in the set, and their containing objects resembling those previously defined.
  • new images are introduced to the data set, and (by means of human intervention) more categories may be added, and images and objects may be (manually or automatically) assigned certain tokens.
  • Tokens which have been mis-assigned by the automatic process may be removed through human intervention so that the quality of the automatic token assignment process improves. This reassignment/removal of mis-assigned tokens not only improves the quality of subsequent automatic assignments, but also simply improves the quality of the stored data, and thus the efficacy of any searching/indexing etc. algorithm reliant thereon.
  • the two main classes of searching are image/object comparison and keyword/token matching.
  • the former occurs when an image is processed by the system to generate information about the segmentation of the image and objects it contains. Similar images or images containing similar objects, or which have similar characteristics, may be found by issuing queries such as "Find me images like this image", “Find me images containing the same objects as this image” or "Find me images containing more than one object like this object”.
  • the latter class of searching does not require a sample image, and allows the user to search for images using keywords/tokens by issuing queries such as "Find me images containing a tree", "Find me images without any trees" or "Find me images which are mostly sky".
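The keyword/token class of query reduces to set operations over stored tokens. A sketch, assuming each image record carries a set of previously assigned tokens: positive queries require tokens to be present, and negative queries ("without any trees") exclude them.

```python
def query(images, require=(), exclude=()):
    """Keyword/token search over a database of tokenised images.

    images: dict mapping image name -> set of assigned tokens.
    Returns image names carrying every `require` token and none of
    the `exclude` tokens.
    """
    return [name for name, tokens in images.items()
            if set(require) <= tokens and not (set(exclude) & tokens)]
```

For example, with one image tokenised {"tree", "landscape"} and another {"building", "sky"}, requiring "tree" returns only the first, and excluding "tree" returns only the second.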
  • the invention renders automatic a significant proportion of human input needed to describe an image at the time when it enters the database.
  • the similarities will allow for the new image to be categorised automatically.
  • Human intervention can be used to refine or improve this categorisation, or to assist in the event that the database is too sparsely populated, and does not contain any images to which the new image is considered sufficiently similar; this intervention improves the likelihood that subsequently introduced images will be correctly automatically categorised.
  • Figure 1 shows a simple segmentation of a picture 1 of a single tree 3 on a plain natural landscape 5.
  • the tree itself is divided into several segments 7, some of which are green, and some brown.
  • the remaining segments 9, 11 in the image 15 are one large pale blue segment for the sky 11, and three for the terrain 9a, 9b, 9c, one of which happens to be green 9b, the others brown.
  • Figure 2 shows a segmentation table 21 containing segmentation data relating to the segmentation of the image shown in Figure 1. This set of segmentation data or characteristics is stored in the database with a reference to this image. In this case, a user intervenes and adds
  • objects in this image may be identified in order that a database of sets of characteristics with associated keywords and tokens may be constructed. In order to achieve this, a user indicates groups of segments in the image, identifying each group as an object. In this case, the user identifies all the segments in the tree.
  • the resulting segmentation data are stored in an object table 31 shown in figure 3.
  • the object table shown in Figure 3 is then stored in an object database, along with the word "tree". Similar tables relating to other objects may also be created and stored in the object database.
  • the object database may also contain links to the image database pointing to all the images known to contain this object.
  • FIG.4 shows a further image 41 which contains several trees and bushes 43 in a landscape and which also contains a large building 45 with mirrored windows.
  • the segmentation data produced by the segmentation of the image shown in Figure 4 are stored in a second segmentation table 51 shown in figure 5.
  • the segmentation data for the image shown in Figure 4 differs significantly from that of the image shown in Figure 1.
  • the trees and bushes are much smaller than the tree in the first image, and, because of the presence of the building, the relative proportions of dark (trunk) coloured segments in the image are not characteristic of an image containing only trees. It is therefore likely that image 1 will not match image 2 under any reasonable similarity criteria (such as the relative proportions of each segment type in the image, or the most likely neighbour segment type for each segment type), since their segmentation data differ significantly.
  • objects in the image may be identified. This may be carried out as for the first image, whereby a user identifies groups of segments for each object.
  • the new image can be tested against known objects to determine if any similar objects are present in this image.
  • contiguous groups containing segments which match these segmentation data may be discovered.
  • a connected group of five green segments and two brown segments, which have similar characteristics (and therefore similar segmentation data) to those in the known object, may be found, as indicated "A" in figure 4.
  • a further four segments (two green, two brown) which separately also match this object may also be found, indicated "B" in figure 4.
  • new images can be queried against the database to determine the presence or absence of objects corresponding to certain key words.
  • This has applications in the field of surveillance in which images from surveillance cameras may be queried to determine for example the presence of a car in the image. This could be used to alert its owner to the fact that it is no longer there (for a negative match), or to alert the owner of the area that it is there (for a positive match), which might raise an alarm if it should not be.
  • segmentation data representing the image segmentation as a description of the image also allows the reproduction of an approximation of the image from the information alone.
  • the segmentation may be based on, for example, position, colour, or texture measures. Since the segmentation data does not represent the image exactly, the segmentation data represents a lossily compressed version of the image.
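Reconstruction from such data is then straightforward in principle: each segment is painted with its stored characteristic colour. The sketch below assumes the label grid and per-segment mean colours were retained as part of the segmentation data; note that, unlike block-transform codecs, the segment boundaries survive the round trip exactly, which is what makes re-segmentation of the decompressed image reliable.

```python
def reconstruct(labels, mean_colour):
    """Lossy decompression: rebuild an approximation of the image by
    filling each segment with its stored characteristic (mean) colour.

    labels: 2-D list of segment ids (part of the segmentation data).
    mean_colour: dict mapping segment id -> (r, g, b) tuple.
    """
    return [[mean_colour[seg] for seg in row] for row in labels]
```

Fine texture within each segment is lost, but every pixel of a segment comes back with an identical colour, so a segmentation of the reconstructed image recovers the original segment boundaries.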
  • the information retained by this process is far more suitable than that stored by conventional methods such as JPEG, since it allows the compressed image to be restored such that a perfect segmentation may be achieved. This is important for any subsequent automatic processing of the image, being the basic starting point for higher-level analysis.
  • the higher level derived information such as details of dimensions, colours and topology of the largest segments of the image, or at a higher level the automatically derived keyword associations for a given signature in these terms
  • a further exemplary method of processing images consists in the following steps:
  • 0 Generate from the raw image data further channels of pixel information, such that each pixel in the image gains parameters other than those already present (describing the colour of that pixel). These extra channels of data might represent variability (how broad a range of pixel colour values are present in the locality of this pixel).
  • 1 Using a method such as the multidimensional watershed algorithm described in UK patent application number GB 0130210.8, or the method of lower thinning, or other topological methods, arrive at a classification (grouping) of the colours (or colours + extra data channels as generated in 0) in the image such that each group of colours contains colours which are considered similar in the context of the image data, and every colour present in the image is a member of a group.
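The two numbered steps above can be illustrated as follows. The variability measure (greyscale spread in a small window) and the coarse quantisation used for grouping are stand-ins of our own choosing; the multidimensional watershed and lower-thinning methods named in the text are far more sophisticated, but the contract is the same: step 0 adds an extra data channel per pixel, and step 1 places every colour present in the image in exactly one group.

```python
def variability_channel(pixels, radius=1):
    """Step 0 sketch: a per-pixel variability channel, here the spread
    (max - min) of greyscale values in the (2*radius+1)^2 window
    around each pixel."""
    h, w = len(pixels), len(pixels[0])
    grey = [[sum(p) // 3 for p in row] for row in pixels]
    out = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            vals = [grey[ny][nx]
                    for ny in range(max(0, y - radius), min(h, y + radius + 1))
                    for nx in range(max(0, x - radius), min(w, x + radius + 1))]
            out[y][x] = max(vals) - min(vals)
    return out

def group_colours(colours, bucket=64):
    """Step 1 stand-in: group colours by coarse quantisation so that
    every colour in the image belongs to exactly one group of similar
    colours (the watershed/lower-thinning classification's contract)."""
    groups = {}
    for c in colours:
        key = tuple(v // bucket for v in c)
        groups.setdefault(key, set()).add(c)
    return list(groups.values())
```

On a flat image the variability channel is zero everywhere; across a sharp edge it peaks, giving the classifier extra information beyond raw colour.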

Abstract

In an embodiment of the invention a digital image is segmented to define one or more image segments, each image segment being a part of the image having similar visual characteristics. Next, segmentation data comprising a set of values or ranges of values of parameters representing properties of the image segments are generated. The segmentation data may be used to reconstruct the image, thereby providing a means to compress the image. The segmentation data may also be used for the purposes of searching, sorting, indexing or categorising the image. The segmentation data is analysed and compared with a mapping between one or more sets of segmentation data and one or more keywords. The keywords mapped onto the analysed segmentation data are then associated with the image. The presence of objects within the image may be determined by analysing the segmentation data of a subset of the image segments and comparing the analysed segmentation data with a mapping between one or more sets of segmentation data and one or more objects.

Description

Processing of Digital Images
Field of the Invention
The present invention relates to processing of digital images. In particular, the present invention relates to generation of data which characterise digital images, for example for the purposes of sorting, searching, indexing, categorising or compression.
Background of the Invention A digital image is usually displayed as an array of pixels, each pixel having specified visual characteristics, and comprises raw pixel data specifying the visual characteristics of each pixel. Visual characteristics include characteristics such as colour, brightness, hue, saturation and texture. Although this type of data structure is convenient for displaying the image, it is usually not the most efficient means in terms of quantity of data for storing an image. Furthermore, this type of data structure does not provide useful information regarding the visual content of the image for the purpose of sorting, searching, indexing or categorising digital images.
Accordingly, techniques have been developed which allow digital images to be represented by data structures requiring smaller quantities of data than a pixel-by-pixel representation thereby allowing image files to be compressed. Techniques have also been developed which associate additional data with an image describing general properties of the image or the conceptual visual content of the image. This allows images to be sorted, searched, indexed or categorised on the basis of the additional data.
Current methods for sorting, searching, indexing or categorising digital images tend to consist in the use of a conventional database or indexing system, whereby a record for each image is initially created which contains the image and a small selection of automatically generated metrics of that image (such as dimensions (width and height), and storage format (e.g. JFIF, TIFF or PNG)). This allows rudimentary searching, on the basis of these parameters, to take place. Subsequently, additional data, such as keywords which describe the visual content of the image, may be added by a human operator who chooses them on the basis of a visual examination of the image. After this has been done, searching for keywords in the database will produce a set of images which have been assigned those keywords.
It is also possible for keywords to be assigned to an image record automatically when the image is found in a context (for example, a web page) which includes textual information which may be considered to relate to the image. In this case, words which occur near the image in its context are considered to be likely to be words which describe the content of the image, and as such are added as keywords to the image record, allowing the image to be returned as a result of a search for those keywords.
One problem with the manual methods described above is that the amount of human input required to describe an image adequately is large. In this case, the amount of time required to process a large number of images will be prohibitive. In addition, human error will mean that images are often assigned incorrect keywords, or fail to have enough keywords assigned that searching is successful. One problem with the automated method described above is that the text which occurs near to an image may not always provide a reliable indication of the content of the image. Consequently, keywords may be erroneously assigned to images.
There exist numerous techniques for compressing digital images. Current image compression algorithms fall into two categories: lossy and lossless.
The latter are data compression algorithms which allow the exact reproduction of the original image data to result from a compression - decompression cycle.
The former generally decompose the image into a mathematical approximation of its content in which "less important" information is discarded. For example, one algorithm describes an image as a set of small tiles, each of which is, in turn, described by a set of coefficients of cosine functions together with a set of colours. [1] In this way, less data is required to describe the image. Due to the nature of lossy algorithms, however, the reconstructed image is no longer identical to the original, only an approximation, the amount of detail lost being dependent on the amount of less important information which has been
discarded. One potential disadvantage of such methods is that in some cases, the information that is lost may be important for subsequent processing of the reconstructed image. In particular, certain processing requires segmentation of images, and the image may be reconstructed in such a way that the reconstructed image would not be properly segmented.
We have appreciated the need for a method of processing digital images to generate data which describes the visual characteristics and visual content of the images which is useful for the purposes of sorting, searching, indexing or categorising. We have further appreciated the need to provide such a method requiring minimal human interaction whilst retaining accuracy. We have further appreciated the need for such a method in which the data describing the image retains information required for subsequent processing of the image.
Summary of the Invention
The invention is defined in the independent claims to which reference may now be directed. Preferred features are set out in the dependent claims.
Brief Description of the Figures
The invention will now be described in greater detail with reference to the Figures in which:
Figure 1 shows a first digital image which has been segmented;
Figure 2 shows a table containing data relating to the segments of the first image shown in Figure 1 ;
Figure 3 shows a table containing data relating to an object in the first image shown in Figure 1 ;
Figure 4 shows a second digital image which has been segmented;
Figure 5 shows a table containing data relating to the segments of the second image shown in Figure 4; and
Figure 6 shows a schematic diagram of the steps carried out for a method according to the present invention.

Detailed Description
The methods described below may be implemented on any suitable computer system. One embodiment comprises a processor, a display for displaying digital images and user interfaces, and one or more input devices such as a mouse and a keyboard to allow the user to input data and interact with the system. The system also comprises a store or memory for storing data in one or more databases and for storing the necessary computer code that is executed by the processor when the system is used.
In an exemplary method according to the present invention a digital image comprising pixel data is processed to generate a new set of data which describes the visual characteristics of the image and from which the original image may be reconstructed. Advantageously, the new set of data is significantly smaller in size than the original pixel data, but certain information which may be useful for subsequent processing of the reconstructed image is not lost. The new set of data generated in this step may be used to sort, search, index or categorise the image in a database. Advantageously, the new set of data generated in this step is generated automatically.
The present invention provides a number of advantages over prior methods. The data generated according to the present invention is small in size and relatively insignificant compared to the data size of the image itself. The data is useful in the context of many different applications allowing indexing, searching or sorting algorithms to make decisions with a minimal amount of computation. The information is well defined and specified, so that tools to extract it can be written independently of the tools written to create it. The information allows assessments of "degree of similarity" to be made, so that it is possible, for example, to arrange images in order according to how similar they are to a chosen image.
In general, the information allows for improvements in the known processes discussed above, by improving the efficiency and degree of automation with which images are described, by improving the quality of data associated with the image, and consequently by improving the function of tools which perform the searching, indexing and other tasks which use the data as their input.
In a first step a segmentation of the image is performed whereby the image is divided into image segments comprising contiguous regions of pixels. Each image segment contains pixels having visual characteristics such as colour or texture which are considered similar in the context of the image data. The segmentation is performed so that every pixel in the image is a member of a segment. An image may be segmented for example using the watershed algorithm described in our UK patent application number GB 0130210.8 and our International patent application number PCT/GB2005/000798, or the method of lower thinning, or any other suitable method.
In some segmentation methods, colour space is first segmented to define two or more colour classes, each representing a grouping of similar colours. Then, contiguous regions of the image containing pixels having colours contained in the same colour class define a particular image segment. An image segment comprising pixels having colours contained in a particular colour class may be said to be associated with that colour class. Since there may be several separate contiguous regions of the image containing colours belonging to a particular colour class, there may be several image segments associated with a given colour class.
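The colour-class segmentation described here uses the watershed method of GB 0130210.8, which is not reproduced in this document. Purely as an illustrative stand-in (all names below are hypothetical), the sketch quantises RGB colours into coarse classes and then flood-fills contiguous same-class regions, so that every pixel belongs to exactly one segment and one colour class may yield several segments:

```python
from collections import deque

def colour_class(pixel, levels=4):
    """Coarse stand-in for the colour-space classification step:
    quantise each RGB channel into `levels` bins."""
    step = 256 // levels
    r, g, b = pixel
    return (r // step, g // step, b // step)

def segment_image(image, levels=4):
    """Label contiguous regions whose pixels share a colour class.
    `image` is a list of rows of (r, g, b) tuples. Returns a grid of
    segment ids; every pixel belongs to exactly one segment."""
    h, w = len(image), len(image[0])
    classes = [[colour_class(image[y][x], levels) for x in range(w)] for y in range(h)]
    segments = [[0] * w for _ in range(h)]
    next_id = 0
    for y in range(h):
        for x in range(w):
            if segments[y][x]:
                continue
            next_id += 1
            queue = deque([(y, x)])
            segments[y][x] = next_id
            while queue:  # flood-fill one contiguous same-class region
                cy, cx = queue.popleft()
                for ny, nx in ((cy - 1, cx), (cy + 1, cx), (cy, cx - 1), (cy, cx + 1)):
                    if (0 <= ny < h and 0 <= nx < w and not segments[ny][nx]
                            and classes[ny][nx] == classes[cy][cx]):
                        segments[ny][nx] = next_id
                        queue.append((ny, nx))
    return segments
```

A real implementation would substitute the watershed (or lower thinning) classification for the naive quantisation in `colour_class`.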
The next step comprises calculating and storing salient data about the segments in the image, which may be referred to as segmentation data or metadata. For example, segmentation data may represent information regarding the size, shape, position or other visual characteristics of each image segment, information regarding relationships between different image segments, or statistical information relating to groups of the image segments. For example, segmentation data may include the size (such as the number of pixels) and characteristic colour (such as the average colour) for the largest N image segments (N being for example a small number of order 10). The segmentation data may also include the median (or other average) size of the image segments, or the average size of those image segments associated with each colour class. The segmentation data may further include, for each colour class, an identification of the colour class (B) that an image segment of a colour class (A) is most likely to be adjacent to, or other similar derived data. Many other parameters specifying the characteristics of the image segments and their relationships with one another will readily occur to the skilled person.
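The per-segment statistics listed here can be accumulated in one pass over the pixel and segment grids. The sketch below (an illustrative format, not the patent's) records the size and average colour of the largest N segments plus the median segment size; adjacency statistics would be gathered in a similar pass over neighbouring pixel pairs:

```python
from statistics import median
from collections import Counter, defaultdict

def segmentation_metadata(image, segments, top_n=10):
    """Derive the kind of per-segment statistics the text describes.
    `image` is a grid of (r, g, b) tuples; `segments` is a grid of
    segment ids of the same shape."""
    sizes = Counter()
    colour_sums = defaultdict(lambda: [0, 0, 0])
    for row_img, row_seg in zip(image, segments):
        for (r, g, b), sid in zip(row_img, row_seg):
            sizes[sid] += 1
            acc = colour_sums[sid]
            acc[0] += r; acc[1] += g; acc[2] += b
    largest = [
        {"segment": sid,
         "size": n,
         "avg_colour": tuple(c // n for c in colour_sums[sid])}
        for sid, n in sizes.most_common(top_n)
    ]
    return {"largest": largest, "median_size": median(sizes.values())}
```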
In general, the segmentation data constitute a data set which characterises the whole image. If a sufficient set of segmentation data is generated, then the original image may be reconstructed using the set of segmentation data. Since such a set of segmentation data is smaller in size than the original pixel data, a compression of the data representing the image is achieved. Additionally, the segmentation data may be used to sort, search, index or categorise images so that, for example, a user could search for all images in a database having segmentation data matching particular criteria.
One advantageous feature of the present invention is that the segmentation data can be calculated completely automatically, without human supervision, thereby increasing the efficiency of processing the digital images.
Using the method described above, images may be searched, sorted, indexed or categorised on the basis of the segmentation data. However, the segmentation data is not necessarily the most convenient type of data for providing information regarding the conceptual visual content of an image. According to a preferred feature of the present invention, identifiers including keywords or other tokens may be assigned to an image, representing the visual content of the image. A keyword may be assigned to an image for example based on the visual content of the entire image, or based on the presence of an object within the image. For example, the keyword 'landscape' may be assigned to an image of a landscape and the keyword 'tree' may be assigned to the image by virtue of the presence of a tree in the image.
According to the method described below, as a database of images is built up, a manual assigning of keywords to images is performed. However, as more images are added to the database, keywords may be assigned to images automatically on the basis that similar images have already had those keywords assigned to them. The similarity between images may be determined using the automatically generated segmentation data.
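A sketch of this similarity-driven propagation, assuming each image's segmentation data has been flattened into a fixed-length feature vector (the feature layout, threshold and similarity measure are illustrative assumptions, not taken from the patent):

```python
def suggest_keywords(new_features, labelled_images, threshold=0.9):
    """Inherit keywords from already-labelled images that are
    sufficiently similar to the new image. `labelled_images` maps
    image name -> (feature vector, set of keywords)."""
    def similarity(a, b):
        # Simple inverse-distance similarity on equal-length vectors.
        dist = sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5
        return 1.0 / (1.0 + dist)

    suggested = set()
    for features, keywords in labelled_images.values():
        if similarity(new_features, features) >= threshold:
            suggested |= set(keywords)
    return suggested
```

In line with the text, a human would then confirm or correct the suggestions, and the corrected labels join the database for future comparisons.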
Keywords may be associated with images by defining a mapping between sets of values or ranges of values of one or more parameters in the segmentation data and one or more keywords or tokens. The mapping may be a two way mapping defined so that given a set of values or ranges of values, the associated keyword may be determined, and vice versa. In this case, when a keyword is mapped onto a set of values or ranges of values, the set of values or ranges of values may also be said to be mapped onto the keyword, and vice versa. The mapping may be one to many or many to one.
As a simple example, the segmentation data parameters defining 'shape' and 'colour', when having the respective values representing 'circle' and 'yellow', may be mapped onto the keyword 'sun'. In this way, any image whose segmentation data contains those parameters having those values may be assigned the keyword 'sun'. The mapping may be more sophisticated than this simple example so that, for example, the mapping is dependent on the values of other parameters in the segmentation data. For example, the previous example of mapping may be made only if the colour parameters of an image segment surrounding the circular yellow image segment have values representing 'blue' (indicating sky). This distinguishes images containing the sun from other images containing other yellow circular objects. This process of associating keywords with images relies on the mapping between the segmentation data and keywords. In a further example, segmentation data representing image segments having certain regular geometrical configurations and/or textures may be mapped onto the keyword 'building'.
Such a mapping may be predefined, for example defined by data generated previously and stored in a mapping database. In one embodiment, the mapping may be at least partly defined by a user manually assigning keywords to images.
In this case, a user views an image and assigns one or more keywords to the image accordingly describing the visual content of the image. The particular set of segmentation data derived from the image may then be mapped onto the keywords input by the user. When only a single image has been assigned keywords, the mapping between the segmentation data and keywords will not in general represent a reliable mapping between the keywords and the visual content of the image. However, when a sufficient number of images have been assigned keywords by the user, there will occur correlations between the specific user inputted keywords and particular sets of values of parameters in the segmentation data. These correlations can then be used to refine the mapping. Accordingly, as more images are added to the database, the mapping between the keywords and sets of values of the segmentation data will become more reliable in the sense that the keywords truly represent the visual content of images having segmentation data values which map to those keywords.
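The 'sun' example suggests a rule-based form for the mapping. In the sketch below each rule pairs a keyword with required parameter values plus a context predicate standing in for the "surrounded by blue" condition; all parameter names are invented for illustration and are not the patent's format:

```python
def assign_keywords(segment_params, rules):
    """Apply value->keyword mapping rules to a list of per-segment
    parameter dicts. Each rule is (keyword, required values, extra
    predicate on the segment)."""
    keywords = set()
    for keyword, required, predicate in rules:
        for seg in segment_params:
            if all(seg.get(k) == v for k, v in required.items()) and predicate(seg):
                keywords.add(keyword)
    return keywords

# The 'sun' example: a yellow circle, but only when surrounded by sky.
rules = [
    ("sun", {"shape": "circle", "colour": "yellow"},
     lambda seg: seg.get("surround_colour") == "blue"),
]
```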
In this way, as more images are added to the database of images, the mapping improves so that subsequently images may be automatically assigned keywords with human interaction required only to refine the automated process and correct erroneously assigned keywords. As the number of images increases, and the mapping becomes more reliable, the frequency at which a user is required to correct errors reduces, thereby reducing human interaction.
The above process for assigning keywords to entire images by defining a mapping between those keywords and characteristic sets of values of segmentation data may be applied to specific objects within an image. An object within an image is formed from a group of image segments, being a subset of all the image segments of the image. For example, the group of image segments representing a tree in an image form an object. A mapping between keywords and particular sets of values of segmentation data which correspond to a particular object may be defined. Once such a mapping has been defined any image can be analysed to determine whether it contains a group of image segments whose corresponding set of segmentation data maps onto a particular keyword. If so, then the image may be considered to contain the object associated with that keyword.
Such a mapping may be predefined as described above. However, as with the assignment of keywords to entire images, the mapping may be defined by a process of user interaction. For example, a user may view an image and identify a particular object within the image. This may be achieved by the user manually selecting those image segments representing the identified object. The user also inputs a keyword to be associated with the selected object. Then, the segmentation data values corresponding to the image segments forming the selected object are mapped onto the inputted keyword. This mapping may be stored in an object database. As before, if only one object is used to define the mapping, the mapping may be unreliable. However, as the user manually assigns the same keyword with the same type of object appearing in many different images, the mapping will become more reliable. The mapping defined in the object database may then be used to search for the existence of defined objects - in other images.
In some cases, a specific keyword may be mapped onto several different sets of values or ranges of values of segmentation data parameters. This allows account to be taken of the fact that an object, such as a car for example, may look different when viewed from different angles.
In this way, keywords may be automatically assigned to images by searching for occurrences of defined objects within the images. When a defined object is found within an image, the keyword associated with that object may be assigned to the image.
The automatic assignment of keywords to entire images using the method described above may be more effective for keywords which describe the visual content of images in more general terms, such as 'landscape' or 'buildings'. The automatic assignment of keywords to individual objects may be more effective for keywords which describe more specific aspects of the visual content of images. A certain degree of leeway may be provided for so that, for example, all objects which are within a particular range of similarity may be found within images.
The segmentation data and keyword data generated by the method described above form, in whole or in part, a rich, searchable layer of information pertaining to the image. Analytical characteristics of the image (data about image segments and objects) may be used to perform this searching, for example allowing queries of the type "find me images like this one" or "find me images with objects like this object". The tokens or keywords associated with images may be used to find other images with which these tokens have previously been associated. The tokens or keywords associated with objects may be used to find images containing objects with which these tokens have been associated.
This data layer provides a means to categorise images completely flexibly according to content, building a database of images to which new images may be added with a high probability that they will be automatically assigned appropriate tokens, based upon their similarity to other, already tokenised, images previously in the set, and upon their containing objects resembling those previously defined. As more images are introduced to the data set, more categories may be added (by means of human intervention), and images and objects may be (manually or automatically) assigned certain tokens. Tokens which have been mis-assigned by the automatic process may be removed through human intervention so that the quality of the automatic token assignment process improves. This reassignment/removal of mis-assigned tokens not only improves the quality of subsequent automatic assignments, but also simply improves the quality of the stored data, and thus the efficacy of any searching/indexing etc. algorithm reliant thereon.
The two main classes of searching are image/object comparison and keyword/token matching. The former occurs when an image is processed by the system to generate information about the segmentation of the image and objects it contains. Similar images, or images containing similar objects or having similar characteristics, may be found by issuing queries such as "Find me images like this image", "Find me images containing the same objects as this image" or "Find me images containing more than one object like this object". The latter class of searching does not require a sample image, and allows the user to search for images using keywords/tokens by issuing queries such as "Find me images containing a tree", "Find me images without any trees" or "Find me images which are mostly sky".
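The keyword/token class of query reduces to set operations over the tokens stored per image. A minimal sketch, assuming tokens are kept as Python sets keyed by image name (the function and parameter names are illustrative):

```python
def search_by_keywords(image_db, require=(), exclude=()):
    """Answer queries like "find me images containing a tree"
    (require) or "without any trees" (exclude). `image_db` maps
    image name -> set of assigned tokens."""
    return [name for name, tokens in image_db.items()
            if set(require) <= tokens and not (set(exclude) & tokens)]
```

The image/object comparison class of query would instead rank stored segmentation data by a similarity measure, as discussed for keyword propagation above.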
The invention renders automatic a significant proportion of the human input needed to describe an image at the time when it enters the database. When the database is already populated with other similar images, the similarities will allow the new image to be categorised automatically. Human intervention can be used to refine or improve this categorisation, or to assist in the event that the database is too sparsely populated and does not contain any images to which the new image is considered sufficiently similar; this intervention improves the likelihood that subsequently introduced images will be correctly automatically categorised.
By way of example, consider the problem of generating a database containing a variety of pictures. Pictures imported into the database have their salient objects identified, to facilitate subsequent searching according to key words, or for images containing similar objects to those identified in a new image not already in the database.
Figure 1 shows a simple segmentation of a picture 1 of a single tree 3 on a plain natural landscape 5. The tree itself is divided into several segments 7, some of which are green, and some brown. The remaining segments 9, 11 in the image are one large pale blue segment for the sky 11, and three for the terrain 9a, 9b, 9c, one of which happens to be green 9b, the others brown. This constitutes a coarse segmentation of the image, not sufficient to resolve fine detail, but able to differentiate and identify key areas of a similar nature.
From this segmentation can be derived several basic measures of the nature of segments, as described above. Figure 2 shows a segmentation table 21 containing segmentation data relating to the segmentation of the image shown in Figure 1. This set of segmentation data or characteristics is stored in the database with a reference to this image. In this case, a user intervenes and adds the key words "landscape" and "tree" to this image's entry in the image database. Subsequent searches can thus be made in the database for these key words, or for characteristics of the segmentation matching certain criteria (for example, search for images which are at least 20% segments with hue between 100 degrees and 130 degrees, or make a more refined search, including other restrictions on the nature of the segmentation).
If subsequent searching on the basis of objects is required, objects in this image may be identified in order that a database of sets of characteristics with associated keywords and tokens may be constructed. In order to achieve this, a user indicates groups of segments in the image, identifying each group as an object. In this case, the user identifies all the segments in the tree. The resulting segmentation data are stored in an object table 31 shown in Figure 3. The object table shown in Figure 3 is then stored in an object database, along with the word "tree". Similar tables relating to other objects may also be created and stored in the object database. To provide rapid retrieval of images containing an identified object, links to the image database pointing to all the images known to contain this object are also stored.
The behaviour of the system as subsequent images are imported will now be described. Figure 4 shows a further image 41 which contains several trees and bushes 43 in a landscape and which also contains a large building 45 with mirrored windows.
The segmentation data produced by the segmentation of the image shown in Figure 4 are stored in a second segmentation table 51 shown in Figure 5. The segmentation data for the image shown in Figure 4 differ significantly from those of the image shown in Figure 1. In particular, the trees and bushes are much smaller than the tree in the first image, and, because of the presence of the building, the relative proportions of dark (trunk) coloured segments in the image are not characteristic of an image containing only trees. It is therefore likely that the first image will not match the second under any reasonable similarity criteria, such as the relative proportion of the image occupied by each segment type, or the most likely neighbour segment type for each segment type, since their segmentation data differ significantly.
Next, objects in the image may be identified. This may be carried out as for the first image, whereby a user identifies groups of segments for each object. When the object database is populated, the new image can be tested against known objects to determine if any similar objects are present in this image. By examining in turn the segments in the image shown in Figure 4, contiguous groups containing segments which match these segmentation data may be discovered. In this example, a connected group of five green segments and two brown segments, which have similar characteristics and therefore similar segmentation data to those in the known object, may be found, as indicated "A" in Figure 4. A further four segments (two green, two brown) which separately also match this object may also be found, "B" in Figure 4. Because other segments in the image, such as the large green ground segment, or the building wall, do not match these criteria (wrong most-likely neighbour segments, or wrong topology: the building is brown but has lots of holes where the windows are), they will not match this object. In this way, objects may automatically be identified in an image. The entry for this image in the image database will include the "tree" keyword, and the "tree" entry in the object database will be updated to include this new image in its image list.
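A minimal sketch of this object test: given each segment's colour class and an adjacency map, grow connected groups of signature-coloured segments and accept a group when its class counts cover the stored signature. The matching rule here is a deliberate simplification of the neighbour/topology criteria in the text, and all names are illustrative:

```python
from collections import deque, Counter

def find_objects(segment_classes, adjacency, signature):
    """`segment_classes` maps segment id -> colour class, `adjacency`
    maps segment id -> set of neighbouring segment ids, and `signature`
    is a Counter of colour classes (e.g. a "tree" is mostly green with
    some brown). Returns connected groups matching the signature."""
    seen, matches = set(), []
    for start in segment_classes:
        if start in seen:
            continue
        group, queue = [], deque([start])
        seen.add(start)
        while queue:
            sid = queue.popleft()
            # Only expand through segments whose class occurs in the
            # signature; sky or wall segments break the group apart.
            if segment_classes[sid] not in signature:
                continue
            group.append(sid)
            for nb in adjacency[sid]:
                if nb not in seen:
                    seen.add(nb)
                    queue.append(nb)
        counts = Counter(segment_classes[s] for s in group)
        if all(counts[c] >= n for c, n in signature.items()):
            matches.append(sorted(group))
    return matches
```

With a "tree" signature of green and brown segments, this finds the separate groups "A" and "B" while rejecting the sky and building segments that interrupt them.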
One example process is described diagrammatically in Figure 6.
When the database is populated, new images can be queried against the database to determine the presence or absence of objects corresponding to certain key words. This has applications in the field of surveillance, in which images from surveillance cameras may be queried to determine, for example, the presence of a car in the image. This could be used to alert its owner to the fact that it is no longer there (for a negative match), or to alert the owner of the area that it is there (for a positive match), which might raise an alarm if it should not be.
Using segmentation data representing the image segmentation as a description of the image also allows the reproduction of an approximation of the image from the information alone. The segmentation may be based on, for example, position, colour or texture measure. Since the segmentation data do not represent images exactly, the segmentation data represent a lossily compressed version of the image. The information retained by this process is far more suitable than that stored by conventional methods such as JPEG, since it allows the compressed image to be restored such that a perfect segmentation may be achieved. This is important for any subsequent automatic processing of the image, being the basic starting point for higher-level analysis. Current lossy approaches have a tendency to keep information which allows the image to be reconstructed in a way which provides the human viewer with an acceptable approximation to the original image, but discards information which allows for a high quality segmentation to be performed. Since the method of the invention retains exactly the information desired, viz the segmentation itself, it is keeping exactly the required information to allow successful performance of subsequent processing.
The higher level derived information, such as details of dimensions, colours and topology of the largest segments of the image, or at a higher level the automatically derived keyword associations for a given signature in these terms
(e.g. a high lightness circular segment might have been associated with a keyword 'sun'), allow for very rapid searching for high level conceptual content; no computation is required at search time to derive this information, since it has been constructed previously and stored with the image.
Current lossy compression methods provide a generic image-wide representation of data, failing to react to the fact that one area of an image may require different treatment from another area. The important information for reconstruction (for further automatic processing and for acceptable viewing quality) of a large smoothly graded area is very different from that required for a fine-grained multicoloured surface texture. The segmentation data derived using the invention avoids this problem. Large flat-coloured or smoothly shaded areas for example may be stored as large single image segments, while highly detailed areas are stored as many small segments. Thus more information is automatically stored where it is needed, and less where it is not. Where the invention is applied to the application of lossy compression aiming to reproduce an acceptable viewable version of the compressed image, additional parameters may be stored for each image segment to improve visual quality. For example, a small set of functions such as Bezier splines or Fourier components and associated coefficients to describe the variation of lightness across the segment, horizontally and vertically, may be stored for each image segment.
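Reconstruction from the segmentation data alone can be sketched as painting each segment with its stored characteristic colour; by construction, re-segmenting the result recovers the same segments, which is the property the text emphasises over JPEG-style codecs. The per-segment shading terms mentioned (spline or Fourier coefficients) would refine the flat fill; the function below is a minimal illustrative version:

```python
def reconstruct(segments, segment_colours):
    """Reproduce an approximation of the image from segmentation data
    alone: `segments` is a grid of segment ids, `segment_colours`
    maps segment id -> stored characteristic (r, g, b) colour."""
    return [[segment_colours[sid] for sid in row] for row in segments]
```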
A further exemplary method of processing images consists in the following steps:
0 (optional): Generate from the raw image data further channels of pixel information, such that each pixel in the image gains parameters other than those already present (describing the colour of that pixel). These extra channels of data might represent variability (how broad a range of pixel colour values is present in the locality of this pixel).

1: Using a method such as the multidimensional watershed algorithm described in UK patent application number GB 0130210.8, or the method of lower thinning, or other topological methods, arrive at a classification (grouping) of the colours (or colours plus extra data channels as generated in step 0) in the image such that each group of colours contains colours which are considered similar in the context of the image data, and every colour present in the image is a member of a group.
2: Divide the image up into segments, such that each segment contains pixels of colours of only one colour group (also described in UK patent application number GB 0130210.8).
3 (optional): Calculate and store salient data about the colour classification and image segmentation, which might include: position, size (number of pixels), and characteristic colour for the largest N segments (N a small number of order 10); median (or other average) size of segments for each colour class, or in general throughout the image (this constitutes a form of textural information); the colour class (B) that a segment of a colour class (A) is most likely to be adjacent to, for each colour class; or other similar derived data.
4 (optional): Using the high level information generated in (3), together with a previously generated mapping between values (or ranges of values) of certain parameters within that information set, associate a set of key words, or other tokens, with the image, each being chosen because it has been associated previously (by human intervention or automatically as a part of this process) with another image.
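The optional step 0 above can be sketched as follows. This toy version measures variability as the max-minus-min range over a square neighbourhood of a single-channel (greyscale) grid, which is one plausible reading of "how broad a range of pixel colour values are present in the locality"; the function name and window shape are assumptions:

```python
def variability_channel(grey, radius=1):
    """Give each pixel an extra channel measuring local variability:
    max minus min over a (2*radius+1)^2 neighbourhood, clipped at
    the image borders. `grey` is a grid of intensity values."""
    h, w = len(grey), len(grey[0])
    out = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            vals = [grey[ny][nx]
                    for ny in range(max(0, y - radius), min(h, y + radius + 1))
                    for nx in range(max(0, x - radius), min(w, x + radius + 1))]
            out[y][x] = max(vals) - min(vals)
    return out
```

Flat regions score 0 while textured or edge regions score highly, so feeding this channel into step 1 lets the classification separate pixels by texture as well as colour.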

Claims

1. A method of processing a digital image comprising the steps of:
segmenting the image to define one or more image segments, each image segment being part of an image having similar visual characteristics; and generating segmentation data comprising a set of values or ranges of values of parameters representing properties of the image segments.
2. The method of claim 1 in which the properties of the image segments includes the relationship of the image segments to each other.
3. The method of claim 1 or 2 comprising the further step of reconstructing the image using the segmentation data.
4. The method of claim 1, 2 or 3 comprising the further step of associating one or more identifiers with the image, the identifiers representing the visual content of the image.
5. The method of claim 4 in which the identifiers include keywords.
6. The method of claim 4 or 5 comprising the further steps of assigning an identifier to the image automatically by analysing the segmentation data of the image; and comparing the analysed segmentation data with a mapping between one or more sets of segmentation data and one or more identifiers.
7. The method of claim 6 comprising the further step of storing the mapping between one or more sets of segmentation data and one or more identifiers in a database.
8. The method of claim 7 comprising the further steps of manually assigning an identifier to an image; and storing a mapping between the segmentation data of the image and the identifier.
9. The method of any preceding claim comprising the further steps of identifying that an object is present in an image by analysing the segmentation data of a subset of the image segments; and comparing the analysed segmentation data with a mapping between one or more sets of segmentation data and one or more objects.
10. The method of claim 9 comprising the further step of assigning an identifier to an image if the image contains an object associated with the identifier.
11. The method of claim 7 comprising the further steps of allowing a user to select one or more image segments in the image; and storing a mapping between the segmentation data of the selected image segments and an identifier in a database.
12. The method of any of claims 6 to 11 comprising the further step of refining the mapping based on correlations between a plurality of sets of segmentation data which are associated with an identifier or object.
13. The method of any preceding claim comprising the further step of searching, sorting, indexing or categorising a set of images according to the segmentation data and/or the identifiers associated with each image.
14. A system for searching, sorting, indexing, categorising or compressing a digital image arranged to undertake the method of any preceding claim.
PCT/GB2005/004094 2004-10-21 2005-10-21 Processing of digital images WO2006043097A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
GBGB0423455.5A GB0423455D0 (en) 2004-10-21 2004-10-21 Metadata for images
GB0423455.5 2004-10-21

Publications (1)

Publication Number Publication Date
WO2006043097A1 true WO2006043097A1 (en) 2006-04-27

Family

ID=33484997

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/GB2005/004094 WO2006043097A1 (en) 2004-10-21 2005-10-21 Processing of digital images

Country Status (2)

Country Link
GB (1) GB0423455D0 (en)
WO (1) WO2006043097A1 (en)

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5987459A (en) * 1996-03-15 1999-11-16 Regents Of The University Of Minnesota Image and document management system for content-based retrieval
EP0990998A2 (en) * 1998-09-30 2000-04-05 Canon Kabushiki Kaisha Information search apparatus and method
US20020026449A1 (en) * 2000-08-29 2002-02-28 Sudimage Method of content driven browsing in multimedia databases
US20040002964A1 (en) * 1998-09-30 2004-01-01 Canon Kabushiki Kaisha Information search apparatus and method, and computer readable memory

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
DJERABA C ET AL: "Retrieve of images by content", INTELLIGENCE AND SYSTEMS, 1996., IEEE INTERNATIONAL JOINT SYMPOSIA ON ROCKVILLE, MD, USA 4-5 NOV. 1996, LOS ALAMITOS, CA, USA,IEEE COMPUT. SOC, US, 4 November 1996 (1996-11-04), pages 261 - 267, XP010205807, ISBN: 0-8186-7728-7 *

Also Published As

Publication number Publication date
GB0423455D0 (en) 2004-11-24

Similar Documents

Publication Publication Date Title
Chen et al. A region-based fuzzy feature matching approach to content-based image retrieval
Wan et al. A new approach to image retrieval with hierarchical color clustering
US6594386B1 (en) Method for computerized indexing and retrieval of digital images based on spatial color distribution
JP4139615B2 (en) Event clustering of images using foreground / background segmentation
Gong Intelligent image databases: towards advanced image retrieval
Pickering et al. Evaluation of key frame-based retrieval techniques for video
CN106503223B (en) online house source searching method and device combining position and keyword information
JP2002319024A (en) Image retrieval method based on combination of color and material feeling
Mishra et al. A semi automatic plant identification based on digital leaf and flower images
CN110019891B (en) Image storage method, image retrieval method and device
Erkut et al. HSV color histogram based image retrieval with background elimination
Theoharatos et al. A generic scheme for color image retrieval based on the multivariate Wald-Wolfowitz test
US20100054596A1 (en) Image segmentation
Climer et al. Image database indexing using JPEG coefficients
Kam et al. Content based image retrieval through object extraction and querying
CN106874421A (en) Image search method based on self adaptation rectangular window
Smith et al. Multi-stage classification of images from features and related text
CN110110120B (en) Image retrieval method and device based on deep learning
JP3661287B2 (en) Image registration apparatus and method
Khotanzad et al. Color image retrieval using multispectral random field texture model and color content features
JP2004192555A (en) Information management method, device and program
JP2001319232A (en) Device and method for retrieving similar image
WO2006043097A1 (en) Processing of digital images
CN109753579B (en) Picture labeling recommendation method based on frequent item set
KR100824829B1 (en) Image retrieval using median filtering in rgb color image feature information extraction

Legal Events

Date Code Title Description
AK Designated states

Kind code of ref document: A1

Designated state(s): AE AG AL AM AT AU AZ BA BB BG BR BW BY BZ CA CH CN CO CR CU CZ DE DK DM DZ EC EE EG ES FI GB GD GE GH GM HR HU ID IL IN IS JP KE KG KM KP KR KZ LC LK LR LS LT LU LV LY MA MD MG MK MN MW MX MZ NA NG NI NO NZ OM PG PH PL PT RO RU SC SD SE SG SK SL SM SY TJ TM TN TR TT TZ UA UG US UZ VC VN YU ZA ZM ZW

AL Designated countries for regional patents

Kind code of ref document: A1

Designated state(s): BW GH GM KE LS MW MZ NA SD SL SZ TZ UG ZM ZW AM AZ BY KG KZ MD RU TJ TM AT BE BG CH CY CZ DE DK EE ES FI FR GB GR HU IE IS IT LT LU LV MC NL PL PT RO SE SI SK TR BF BJ CF CG CI CM GA GN GQ GW ML MR NE SN TD TG

121 Ep: The EPO has been informed by WIPO that EP was designated in this application
NENP Non-entry into the national phase

Ref country code: DE

122 Ep: PCT application non-entry in European phase

Ref document number: 05796705

Country of ref document: EP

Kind code of ref document: A1