|Publication number||WO2013104938 A2|
|Publication date||18 Jul 2013|
|Filing date||9 Jan 2013|
|Priority date||11 Jan 2012|
|Also published as||WO2013104938A3|
|Application number||PCT/HU2013/000006|
|Applicant||77 Elektronika Muszeripari Kft.|
NEURAL NETWORK AND A METHOD FOR TRAINING THEREOF
The invention relates to a method for training a neural network and to a neural network trained with the method.
There are numerous prior art methods for training neural networks. It is known that a training method has to enable the neural network to identify the underlying rules over the course of a large number of training steps and to provide results in accordance with these rules.
Neural networks are widely used, for example, in image recognition methods for the recognition and categorisation of image elements representing objects, and preferably also for the automatic determination of the number of elements belonging to each category. Such methods and apparatuses can be applied especially advantageously in medical and diagnostic devices, e.g. in the automatic analysis of body fluids such as urine or blood. In these methods, a large number of training images are fed to a neural network dimensioned in accordance with the size of the images to be recognised and the information therein, and the weights linking the various neurons of the neural network are varied as a function of the correctness of the results provided by the neural network. One of the best known methods for tuning the weights between neurons is so-called back propagation. This training is carried out in a known manner, and detailed descriptions can be found in the literature. The training process can be summarised in that the neural network is 'awarded' if it provides a correct result and 'punished' if it outputs an incorrect one. Methods for training neural networks are disclosed, e.g., in the patent documents US 5,052,043, US 5,796,410, US 5,903,884, US 6,876,966 B1, US 7,130,776 B2, US 7,286,687 B2, US 7,418,128 B2, US 7,480,409 B2 and US 2009/0196493 A1.

In the known solutions, training of the neural network is performed with training images of the same size as the images to be analysed. However, in the case of large images, the training procedure is extremely time consuming, and an extremely large number of weight values have to be determined during the procedure. The known methods are less suitable for image recognition purposes in which the elements appearing in the images must be categorised on the basis of visual information detectable in the images. This is because, in the case of training with full-size images, the neural network primarily learns not the particular categories but the images in which elements of various categories may be present simultaneously. As a result of all these factors, with the known solutions the training process for categorisation-based image recognition is less efficient and less controllable.
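For illustration only, the following sketch shows the general 'award/punish' weight-update idea behind back propagation for a single weight layer; the network shape, the sigmoid activation and the learning rate are illustrative assumptions and are not taken from the cited documents.

```python
# Minimal sketch of the 'award/punish' weight tuning behind back propagation,
# shown for a single weight layer.  Shapes, activation and learning rate are
# illustrative, not taken from the cited documents.
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def train_step(w, x, target, lr=0.1):
    y = sigmoid(x @ w)                      # forward pass
    error = y - target                      # wrong output -> larger correction
    grad = x.T @ (error * y * (1.0 - y))    # gradient of the squared error
    return w - lr * grad                    # 'punish' by moving w downhill

rng = np.random.default_rng(0)
w = rng.normal(size=(4, 1))                 # weights of one small layer
x = rng.normal(size=(8, 4))                 # 8 toy training samples
target = (x[:, :1] > 0).astype(float)       # toy desired outputs
for _ in range(200):
    w = train_step(w, x, target)
```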
A further disadvantage of the known solutions is that they do not comprise steps by which wrong recognitions resulting from uncategorised elements could be managed.
During experiments it has been recognised that if a so-called convolution neural network is applied for categorisation-based image recognition, it is sufficient to perform the training process only on an appropriate sub-network (i.e. a part of the complete neural network), using training images which are smaller than the images to be analysed and which each represent an element of at most one category. According to the invention, 'convolution neural network' means a special type of neural network in which, except for the input layer, each neuron of each layer is connected to a corresponding neuron block of the previous layer, which neuron block extends over the complete thickness of that layer (in the direction z). In addition, within each sub-layer of a layer (a level consisting of neurons), every neuron is linked with the same set of weights to the corresponding neurons of the neuron block of the previous layer. The neuron blocks linked to adjacent neurons may, in a given case, overlap.

DESCRIPTION OF THE INVENTION
The object of the invention is to provide a training method for neural networks, especially for applications where the elements appearing in the images must be categorised, which method is free of the disadvantages of the prior art solutions to the greatest possible extent. A further object of the invention is to provide a controllable and manageable training method for neural networks adapted to categorisation tasks. A further object of the invention is to provide a training method which results in faster convergence and faster adjustment of the weights between the neurons than prior art techniques. An object of the invention is furthermore to provide neural networks generated with the training method mentioned above.
The method according to the invention is defined by the features of claim 1, and the neural network according to the invention is described by the features of claim 9. Preferred embodiments of the invention are defined in the dependent claims.
BRIEF DESCRIPTION OF THE DRAWINGS
Preferred embodiments of the invention are described below by way of example with reference to the following drawings, where
Fig. 1 is a schematic view of generating a probability map with a neural network according to the invention,
Fig. 2 is a schematic view of the structure of an exemplary convolution neural network according to the invention,
Fig. 3 is a schematic view of the output layer of an exemplary sub-network according to the invention, and
Fig. 4 is a schematic view of the structure of an exemplary sub-network according to the invention.
MODES FOR CARRYING OUT THE INVENTION
The neural network and training method according to the invention will hereinafter be described primarily for the purpose of body fluid analysis, e.g. urine analysis. Such an analysis is an extremely useful practical example of an application where automatic categorisation of elements appearing in photos of samples is required.
According to the invention, an element shown in an image means a visual appearance of any object which can be recognised and categorised. In the photos of a urine sample, objects/elements to be categorised can be by way of example the following:
- bacterium (BAC),
- squamous epithelial cell (EPI),
- non-squamous epithelial cell (NEC),
- red blood cell (RBC),
- white blood cell (WBC), and
- background (BKGND).

Of course, in addition to those listed above, further elements/objects to be categorised may be found in the photos of urine samples. Including the background, typically 10 to 30 element categories can be set up altogether.
As shown in Fig. 1, probability maps 11₁-11ₙ belonging to predetermined element categories are generated on the basis of the visual information detectable in the image 10. The various probability maps 11 show the presence probability distribution of an element of the given category.
The probability maps 11₁-11ₙ may also be generated with the same resolution as that of the image 10. However, in the course of elaborating the invention it has been recognised that, for example in the case of the high resolution images available in medical diagnostics, it would be extremely time consuming to analyse full resolution probability maps 11₁-11ₙ. It has been found that it is sufficient to generate probability maps 11₁-11ₙ of lower resolution than that of the original image 10, in such a way that several pixels of the image 10 are associated with each probability value of the probability maps 11₁-11ₙ. In a preferred realisation, a raster point of the probability map is assigned to 4x4 pixels of the image 10. This probability value represents the presence probability of an element of the given category regarding the given 4x4 pixels of the image 10. In the course of elaborating the invention it has been proven that such a reduction of the resolution does not deteriorate the accuracy of categorisation, because the combined probability values also appropriately represent the presence probability in the given detail of the image, in view of the fact that the typical particle or object size is larger than 4x4 pixels. The probability maps 11₁-11ₙ can be presented as probability images, each pixel of which carries visual information according to the magnitude of the probability value, but they can also be considered as matrices, each value of which corresponds to the probability value present in the given position.
In an exemplary preferred embodiment, images 10 of 1280x960 resolution and probability maps 11₁-11ₙ of 320x240 resolution are applied. For generating the probability maps 11₁-11ₙ, a convolution neural network trained according to the invention is used. The neural network analyses the visual information appearing in the image and, on the basis of examining this visual information, determines the probability values in each position for the various categories.
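As a purely illustrative aid (not part of the claimed method), the following sketch shows how a 1280x960 image 10 relates to 320x240 probability maps when each raster point is assigned to 4x4 pixels; the final argmax step and the use of the six example categories listed above are assumptions made for this sketch.

```python
# Illustrative relation between the image 10 (1280x960) and the probability
# maps (320x240) when one raster point covers 4x4 pixels.  The argmax step and
# the six example categories are assumptions.
import numpy as np

IMG_W, IMG_H, BLOCK = 1280, 960, 4
MAP_W, MAP_H = IMG_W // BLOCK, IMG_H // BLOCK        # 320 x 240

def raster_point(px, py):
    """Probability-map position covering the image pixel (px, py)."""
    return px // BLOCK, py // BLOCK

n_categories = 6                                     # e.g. BAC ... BKGND above
maps = np.random.rand(n_categories, MAP_H, MAP_W)    # stand-in probability maps
most_probable = maps.argmax(axis=0)                  # per-raster-point category
```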
The parameters of an exemplary convolution neural network, applicable for generating probability maps according to Fig. 1 and presented in Fig. 2, are given in Table 1 below.
Table 1

On the basis of the visual information detectable in the image 10, the neural network is adapted for generating probability maps 11 associated with each of the element categories in the image 10, which probability maps 11 comprise the presence probability values of an element of the given category. The neural network is a convolution neural network, which comprises an input layer 20 detecting inputs from the image 10, at least one intermediate layer 21, and an output layer 23 comprising a number of sub-layers corresponding to the number of categories and providing the presence probability values.
It can be seen that the area of the input layer 20 is slightly larger than the resolution of the image 10. The reason for this is that, in order to reduce the edge effect, a mirroring is preferably applied at the edge of the image 10, whereby the detrimental edge effects are minimised within the examination range. The exemplary convolution neural network comprises a single intermediate layer 21. It can be seen that, in the direction from the input layer 20 towards the output layer 23, the thickness of the layers increases while their area gradually decreases, and hence the output layer 23 reaches the size of the probability maps 11 in the directions x-y and the number of probability maps 11₁-11ₙ in the direction z.
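The mirroring at the image edge could, for example, be realised as sketched below; the two-pixel margin and the use of NumPy's 'reflect' padding mode are assumptions for illustration, not values prescribed by the patent.

```python
# One possible realisation of the edge mirroring: reflect the border pixels so
# that the input layer can be slightly larger than the image.  The margin of
# two pixels and the 'reflect' mode are assumptions.
import numpy as np

def mirror_pad(image, margin):
    """Extend a (H, W) or (H, W, C) image by mirroring its borders."""
    pad = [(margin, margin), (margin, margin)] + [(0, 0)] * (image.ndim - 2)
    return np.pad(image, pad, mode="reflect")

padded = mirror_pad(np.zeros((960, 1280)), margin=2)   # 964 x 1284
```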
The resolution of the input layer 20 of the neural network is matched with the resolution of the image 10 to be analysed. The input can be the original grey-scale or colour image 10, but preferably, in addition to the original image 10, transformed images generated therefrom may also be inputted to the neural network. By way of example, the input layer 20 consists of three sub-layers; they receive on the one hand the pixels of the image 10 to be analysed, and on the other hand the pixels of two transformed images generated therefrom by means of transformation. The image to be analysed, or image variants of the same field of vision, may also be recorded by various technological methods. Such methods can be bright field, dark field or phase contrast microscopy, polarised and non-polarised photography techniques, images taken in several focal planes, colour or black-and-white images, RGB colour images, images taken with illumination from several angles, images generated by detection, or holographic microscopy.

Neurons 24 of the intermediate layer 21 following the input layer 20 'see' a 6x6x3 (x, y, z) size neuron block of the input layer 20, i.e. they are linked to the neurons 24 of this neuron block via weights 25. The intermediate layer 21 following the input layer 20 consists of eight sub-layers. In each sub-layer, the adjacent neurons 24 are linked with identical sets of weights to the neuron blocks of the input layer 20 seen by each neuron 24. In the various sub-layers of the intermediate layer 21, the identically positioned neurons 24 see the same neuron block in the input layer 20, but their sets of weights are different. In this embodiment, regarding the intermediate layer 21, the lateral overlap of the neuron blocks in the input layer 20 seen by the neurons 24 is four neurons, i.e. the neuron blocks seen by neurons 24 located side by side overlap by four neurons in the directions x and y. The output layer 23 already has 15 sub-layers, and its lateral (x-y) extension is smaller than that of the intermediate layer 21. The neurons 24 of the output layer 23 see neuron blocks of 6x6x8 size in the intermediate layer 21, with a lateral overlap of four neurons. In Table 1, the number of weights describing the complete network is presented for each layer in the last column.
As shown in Table 1, the dimensions of the consecutive layers in the directions x and y can be calculated from the width of the field of vision of the neurons 24 and from the overlap. If, for example, the width of the field of vision of a neuron is 6 and the overlap is 4, then the width x of the next layer is:
x(i+1) = (x(i) - 6) / (6 - 4) + 1.
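A brief sketch of this connectivity and of the width recursion above, assuming a field of vision of 6 and an overlap of 4 (i.e. a step of 2); the helper names and the NumPy realisation are illustrative, not the patent's implementation.

```python
# Sketch of the convolution connectivity and the width recursion: each neuron
# of a sub-layer applies the same weight set to a 6x6xZ block of the previous
# layer, adjacent blocks overlapping by 4 neurons (step of 2).
import numpy as np

def next_width(x, field_of_vision=6, overlap=4):
    return (x - field_of_vision) // (field_of_vision - overlap) + 1

def conv_sublayer(prev_layer, weights, overlap=4):
    """prev_layer: (X, Y, Z) activations; weights: one shared (w, w, Z) set."""
    w = weights.shape[0]
    step = w - overlap
    out = np.empty((next_width(prev_layer.shape[0], w, overlap),
                    next_width(prev_layer.shape[1], w, overlap)))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            block = prev_layer[i*step:i*step + w, j*step:j*step + w, :]
            out[i, j] = np.sum(block * weights)     # same weights everywhere
    return out
```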
The training method according to the invention is partly based on the characteristic that convolution neural networks have the same sets of weights in the directions x and y within each sub-layer. Therefore, a more efficient training can be achieved if these sets of weights are determined by training a sub-network adjusted to the size of the training images, using images smaller than the image to be analysed, which training images are each characterised by an element associated with at most one category. According to the invention this is understood to mean that in the training image essentially only one element associated with one category, or - according to the description below - a so-called uncategorised element, appears. In this way it is not the images to be analysed but the categories that are trained to the convolution neural network.
The sub-network comprises the neurons sensing the inputs from the training image, and the neurons of the at least one intermediate layer and of the output layer linked thereto directly or indirectly by weights. Once the training is completed, the trained sub-network can simply be extended in the directions x and y by multiplying the neurons and the sets of weights. Of course, the resizing is to be carried out in such a way that the number of sub-layers and the sets of weights remain unchanged.
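The following small illustrative check shows why the extension adds no new weight values: the number of weights of a layer depends only on the field of vision, the depth of the previous layer and the number of sub-layers, never on the lateral area of the layer; the numbers follow the exemplary intermediate layer 21 described above.

```python
# Illustrative check: the weight count per layer is independent of the layer's
# x-y area, so the weights trained on the small sub-network carry over to the
# full-size network unchanged.
def weights_per_layer(field_of_vision, prev_depth, n_sublayers):
    return field_of_vision * field_of_vision * prev_depth * n_sublayers

# 6x6x3 field of vision, 8 sub-layers: the same 864 weights describe both the
# small sub-network layer and the extended full-size layer.
print(weights_per_layer(6, 3, 8))
```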
The parameters of the sub-network used in the training are to be selected so that they are adjusted to the inputs (number of sub-layers of the input layer), to the outputs (number of sub-layers of the output layer) and to the network structure intended to be accomplished (number of intermediate layers and number of sub-layers thereof). In addition, it is advisable to adjust the parameters of the sub-network input layer 20', intermediate layer 21' and output layer 23' to the size of the training images. It is also advisable for the area (dimensions in the directions x and y) of the output layer 23' of the sub-network to be relatively small; preferably, an output layer 23' with an area of 1x1, 3x3, 5x5, etc. neurons (dimensions x-y) is applied. An output layer 23' with an area larger than one neuron is advisable, because the recognition process then becomes indifferent to lateral displacements in the training images, e.g. in the present case to displacements of four pixels. This is because, for such an output layer 23', the expectation is that the maximum of the output sub-layer associated with the given category (i.e. the highest of its presence probability values) should be larger than the maxima of the other sub-layers. Therefore, such an output layer 23' enables a more flexible training that is indifferent to lateral image displacements. A schematic view of an output layer is shown in Fig. 3; in the figure, the first sub-layer of the output layer 23' belongs e.g. to the category of red blood cells and the second sub-layer to the category of white blood cells.

In an especially preferred embodiment of the invention, training images characterised by an uncategorised element are also used in the training, where the weights are adjusted so that each sub-layer of the output layer 23' of the sub-network provides a maximum presence probability that is as low as possible. In this way the mistuning of the training process by uncategorised elements can be avoided efficiently, and false network responses to uncategorised and unknown elements are also eliminated.
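The expectation on the output layer 23' described above could, for example, be checked as sketched below; the threshold for uncategorised elements is an assumed value, and the function is only a sketch rather than the patent's exact training rule.

```python
# Sketch of the output-layer expectation: for a training image of category k,
# the maximum of sub-layer k should exceed the maxima of all other sub-layers;
# for an uncategorised element every sub-layer maximum should stay low.
# The threshold is an assumption.
import numpy as np

def response_is_correct(output, category=None, threshold=0.1):
    """output: (n_sublayers, y, x) presence probabilities of the output layer."""
    maxima = output.reshape(output.shape[0], -1).max(axis=1)
    if category is None:                      # uncategorised training image
        return bool((maxima < threshold).all())
    return bool(maxima.argmax() == category)
```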
Fig. 4 shows a schematic design of the sub-network to be trained, which sub-network consists of the neurons 24 linked directly or indirectly to the neurons 24 of the output layer 23' according to Fig. 3. The parameters of the sub-network are given in Table 2 below.
As shown in Table 2, the number of layers in the sub-network and in the neural network to be generated for the analysis is identical, the field of vision of the neurons in each sub-layer and the size of the overlaps are identical, and of course the number of weights associated with the various layers (z × x' × y' × z') is also the same. On the basis of the field of vision of the neurons and the overlaps, the extensions of the sub-network layers in the directions x and y are obtained.
For training the sub-network according to Fig. 4, an image database comprising categorised images is compiled. In the course of this, images are produced with a size matching the sub-network input layer in the directions x and y. Preferably, the lateral size of a categorised image is slightly more than √2 times the x-y extension of the sub-network input layer 20'. In this way it can be avoided that rotational transformations of the categorised image result in empty sections in the detected area, and therefore a trouble-free application of a rotational transformation at any angle is made possible.
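The categorised-image size could, for example, be derived as below; the 5 % extra margin and the 52-neuron input-layer width are hypothetical values used only for illustration.

```python
# Possible derivation of the categorised-image size: slightly more than
# sqrt(2) times the input layer's lateral extent, so that a rotation at any
# angle still covers the detected area.  Margin and input width are assumed.
import math

def categorised_image_size(input_layer_width, margin=1.05):
    return math.ceil(input_layer_width * math.sqrt(2) * margin)

print(categorised_image_size(52))   # e.g. 78 x 78 pixel categorised images
```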
These categorised images are preferably cropped from sample images to be analysed in such a way that in their centre an element of at most one category is found, either by retouching or by keeping a specified distance from other elements (an uncategorised element can therefore also be in the centre of the image in a given case). Accordingly, the training images characterised by an element associated with a single category are either retouched training images which comprise only the element of the given category in the centre, or natural training images which comprise the element of the given category in the centre and do not have an element associated with a different category in their predetermined environment. These categorised images are preferably produced by examining them individually by an expert, who tags the area covered by the relevant element with a label corresponding to the given category. The twenty typical categories applied for the urine analysis example mentioned above are represented by output layers 23, 23' comprising twenty sub-layers, as shown in the figures.
Producing images categorised as described above is very time consuming, and in practical applications it is not possible to create more than a few tens of thousands of categorised images. However, for training the sub-network, far more training images are required, according to experience as many as 10 to 100 million. This problem is handled according to the invention by generating the training images by random transformation from the categorised images. This transformation is carried out in real time, between the selection and the feeding in of the categorised image. The transformation can comprise reflection, rotation, expansion, shrinking and/or noise addition at one or more spatial frequencies, or any combination thereof. It is advisable to set the transformation parameters empirically. It has been proven by experiments that it is also advisable to determine the transformation parameters in a category-specific way.
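One possible real-time transformation step of this kind is sketched below; all parameter ranges (rotation, scaling, noise amplitude) are assumptions, and in line with the text they would preferably be set empirically and per category.

```python
# Sketch of a real-time random transformation of a categorised image:
# reflection, rotation, scaling and noise applied just before feeding in.
# Parameter ranges are assumed, not taken from the patent.
import numpy as np
from scipy import ndimage

def random_training_image(categorised_img, out_size, rng):
    img = categorised_img.astype(float)
    if rng.random() < 0.5:
        img = img[:, ::-1]                                    # reflection
    img = ndimage.rotate(img, rng.uniform(0.0, 360.0), reshape=False)
    img = ndimage.zoom(img, rng.uniform(0.9, 1.1))            # expand / shrink
    img = img + rng.normal(0.0, 2.0, img.shape)               # noise addition
    top = (img.shape[0] - out_size) // 2                      # central crop of
    left = (img.shape[1] - out_size) // 2                     # the trained size
    return img[top:top + out_size, left:left + out_size]
```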
When preparing the image database according to the invention, special attention must be paid to its consistency, i.e. that it should not comprise contradictions. Contradictions may make it difficult to train the neural network and may render the process unstable.
The training method according to the invention is preferably also characterised by applying roughly the same number of training images for each category. It is a characteristic of categorisation-based image recognition that particular categories appear extremely frequently and other categories, in a given case, extremely rarely in the images to be analysed. However, the training may not follow this appearance frequency, because if a rarely appearing element/category is rarely shown to the neural network, the network will neglect and forget the given category. Therefore, it is important that the categories appear in the training with roughly the same weight.
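A minimal sketch of such equal-weight sampling follows: the source of each training image is drawn by first picking a category uniformly and then one of its categorised images, regardless of how rare the category is in real samples; the dictionary-of-lists data layout is an assumption.

```python
# Equal-weight category sampling sketch: pick a category uniformly first, then
# one of its categorised images.  Data layout is an assumption.
import random

def pick_training_source(images_by_category, rng=random):
    category = rng.choice(sorted(images_by_category))   # uniform over categories
    return category, rng.choice(images_by_category[category])
```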
The invention has been described above for the purpose of urine analysis regarding images 10 of a urine sample, but of course this does not restrict the applicability of the invention to this technical field. The element recognition and categorisation according to the invention may preferably also be used in the further applications mentioned in the introduction, wherever the need for image recognition and categorisation arises.
The invention is not limited to the preferred embodiments described in detail above; further variants and modifications are possible within the scope of protection determined by the claims. For example, the invention is suitable not only for processing two-dimensional images, but may also be used for the analysis of images generated by a three-dimensional imaging process, in which case the probability maps are preferably also three-dimensional maps.
|Cited Patent||Filing date||Publication date||Applicant||Title|
|US5052043||7 May 1990||24 Sep 1991||Eastman Kodak Company||Neural network with back propagation controlled through an output confidence measure|
|US5796410||12 Jun 1990||18 Aug 1998||Lucent Technologies Inc.||Generation and use of defective images in image analysis|
|US5903884||8 Aug 1995||11 May 1999||Apple Computer, Inc.||Method for training a statistical classifier with reduced tendency for overfitting|
|US6876966||16 Oct 2000||5 Apr 2005||Microsoft Corporation||Pattern recognition training method and apparatus using inserted noise followed by noise reduction|
|US7130776||25 Mar 2002||31 Oct 2006||Lockheed Martin Corporation||Method and computer program product for producing a pattern recognition training set|
|US7286687||24 Mar 2005||23 Oct 2007||Siemens Ag||Method for generating learning and/or sample probes|
|US7418128||31 Jul 2003||26 Aug 2008||Microsoft Corporation||Elastic distortions for automatic generation of labeled data|
|US7480409||10 Aug 2005||20 Jan 2009||Fujitsu Limited||Degraded character image generation method and apparatus|
|US20090196493||15 Feb 2008||6 Aug 2009||Bernard Widrow||Cognitive Memory And Auto-Associative Neural Network Based Search Engine For Computer And Network Located Images And Photographs|
|Citing Patent||Filing date||Publication date||Applicant||Title|
|WO2017051943A1 *||24 Sep 2015||30 Mar 2017||Vuno Korea Co., Ltd. (주식회사 뷰노코리아)||Method and apparatus for generating image, and image analysis method|
|Cooperative Classification||G06K9/00127, G06K9/4628|
|4 Sep 2013||121||Ep: the epo has been informed by wipo that ep was designated in this application|
Ref document number: 13707027
Country of ref document: EP
Kind code of ref document: A2
|30 Jan 2014||DPE1||Request for preliminary examination filed after expiration of 19th month from priority date (pct application filed from 20040101)|
|11 Feb 2015||122||Ep: pct application non-entry in european phase|
Ref document number: 13707027
Country of ref document: EP
Kind code of ref document: A2