CN104598881A

CN104598881A - Feature compression and feature selection based skew scene character recognition method

Info

Publication number: CN104598881A
Application number: CN201510014950.4A
Authority: CN
Inventors: 张永铮; 周宇; 王一鹏
Original assignee: Institute of Information Engineering of CAS
Current assignee: Institute of Information Engineering of CAS
Priority date: 2015-01-12
Filing date: 2015-01-12
Publication date: 2015-05-06
Anticipated expiration: 2035-01-12
Also published as: CN104598881B

Abstract

The invention relates to a skewed scene character recognition method based on feature compression and feature selection. The method comprises: extracting CHOG features at the pixels of a character region; determining the character-level number of clusters according to the degree of difference among the CHOG features; clustering the CHOG features to obtain compressed character-level features; merging the compressed features and clustering them again to generate an initial visual feature dictionary; building visual feature histogram descriptors; training linear support vector machines, ranking the importance of the features in the histogram descriptors, and selecting the most important features to form a final dictionary; recomputing the histogram descriptors of the samples; and training a multi-class radial basis function support vector machine as the final character classifier, which is used to recognize skewed scene characters and obtain a recognition result. The method overcomes the failure of feature point detection methods while ensuring a high recognition accuracy and recall rate.

Description

Skewed scene character recognition method based on feature compression and feature selection

Technical field

The invention belongs to the technical field of computer vision and of text extraction and recognition, and specifically relates to a skewed scene character recognition method based on feature compression and feature selection.

Background technology

In recent years, with the growing number of mobile devices with built-in cameras, the number of pictures taken in natural scenes has grown explosively. Many valuable applications, such as text-based picture search, intelligent driving assistance, reading assistance for the visually impaired, and scene understanding, depend on methods for obtaining text information from pictures. Text extraction and recognition in natural scenes, as the key problem in processing this new data source, has therefore become a popular topic in computer vision research. After a text detection algorithm has extracted the text regions from a scene picture, an algorithm for scene text recognition is needed. Scene text is hard to recognize because of blur, uneven illumination, low resolution, and similar factors. Moreover, because most scene photos are taken with handheld devices, the text in them is often skewed. For these reasons, traditional skew correction methods do not work on scene picture text. Therefore, although traditional optical character recognition (OCR) systems are very mature, a dedicated recognition system still needs to be developed for scene text.

After a text detection algorithm has detected the regions that contain text, high-quality text shape information can be obtained with certain rectification methods. These methods rectify the detected text region by analysing the text shape and assuming that the text lies on a horizontal text line, and then perform recognition. However, because the text in scene pictures is subject to the interference described above, its shape often cannot be effectively extracted and rectified. Research shows that traditional binarization methods, edge detection methods, and maximally stable extremal region methods all fail to isolate a binarized mask that a traditional OCR system can recognize (Mishra, A., Alahari, K., Jawahar, C.: Top-down and bottom-up cues for scene text recognition. In: CVPR. (2012)). In addition, the algorithms currently developed for scene text mainly address the recognition of non-skewed text, so the recognition of skewed scene text needs to be studied.

Existing skewed text recognition algorithms rely on dense feature extraction. Because the text regions in scene pictures are small and the picture quality is low, feature point detection methods often fail, so features must be extracted densely over the picture. The existing skewed character recognition method uses the 128-dimensional Scale Invariant Feature Transform (SIFT) descriptor as the feature descriptor of a single character region, extracting a SIFT feature every two pixels on the normalized image. All features extracted from all training samples are collected into one feature set, whose size is then reduced by clustering to produce a visual feature dictionary. Each feature in a training sample is then matched to its closest word in the dictionary, and the final Bag-of-Words (BoW) histogram descriptor is generated. When a new sample is tested, features are extracted and quantized in the same way. Because dense feature extraction is used to represent a single character, the computational complexity grows multiplicatively as the number of words in the dictionary increases.

Summary of the invention

The object of the invention is to design and implement a skewed scene character recognition method based on feature compression and feature selection. As in the existing method, a rotation-invariant feature similar to SIFT is used as the low-level feature describing the character picture. The original dense features are then compressed by two rounds of clustering: character-level clustering and visual feature dictionary clustering. In this way the method retains the strong inter-character discrimination obtained from dense feature extraction, overcomes the failure of feature point detection methods, and remains fast and efficient. Finally, the compressed features are ranked, and features that contribute little to distinguishing between characters are filtered out. The classifier finally trained is not only fast but also achieves high recognition accuracy and recall.

To achieve the above object, the invention adopts the following technical solution:

A skewed scene character recognition method based on feature compression and feature selection, comprising the steps of:

1) extracting a CHOG (Circular-Fourier Histogram of Oriented Gradient) feature at each pixel of the character region;

2) determining the character-level number of clusters according to the degree of difference among the CHOG features extracted at different pixels;

3) after the number of clusters is determined, clustering the CHOG features to obtain compressed character-level features;

4) merging the compressed features of all training samples and clustering them again to generate an initial visual feature dictionary;

5) using the initial visual feature dictionary to build visual feature histogram descriptors;

6) training linear support vector machines, ranking the importance of the features in the character histogram descriptors by means of the linear support vector machines, and selecting a number of the most important features as the final dictionary;

7) recomputing the histogram descriptors of the samples with the final dictionary, then training a multi-class radial basis function support vector machine as the final character classifier;

8) using the final character classifier to recognize skewed scene characters and obtain a recognition result.

Further, step 2) uses the Elbow method to determine the number of clusters.

Further, step 3) and step 4) use the K-Means method for clustering.

Compared with the prior art, the invention has the following beneficial effects:

1) unlike SIFT features, the extraction of the CHOG features adopted by the invention involves no discrete interpolation, and the feature vector is automatically aligned and rotated according to the picture information to achieve rotation invariance, so accuracy is higher;

2) the invention uses multi-scale CHOG features to describe the character region at different levels of detail, which captures multi-scale information, shortens the average length of the feature descriptor, and increases processing speed;

3) through character-level feature compression, the computational complexity of building the histogram is reduced to about 1% to 5% of that of using the original dense features directly, which greatly improves processing efficiency;

4) the invention prunes the dictionary with a feature selection method, which further reduces the computational complexity at test time and improves recognition accuracy; the complexity of the final model and of testing on new samples is less than 10% of that of existing methods, which greatly increases recognition speed.
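
The saving described in effect 3) can be illustrated with a rough estimate. The quantities below are assumptions chosen for illustration and do not come from the patent: a character region yields N dense features of dimension D, the dictionary holds V words, and character-level clustering leaves k compressed features. A brute-force nearest-word search then costs on the order of N·V·D operations before compression and k·V·D after it.

```latex
\frac{\text{cost after compression}}{\text{cost before compression}}
  = \frac{k \, V \, D}{N \, V \, D} = \frac{k}{N},
\qquad
\text{e.g. } N = 1024,\ k = 20 \ \Rightarrow\ \frac{k}{N} \approx 2\%.
```

With these hypothetical values the ratio falls inside the 1% to 5% range stated above. The estimate ignores the cost of the clustering itself.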

Brief description of the drawings

Fig. 1 is the overall flow chart of the method of the invention.

Fig. 2 is the flow chart of CHOG feature extraction and learning in the method of the invention.

Fig. 3 shows the recognition accuracy of the method of the invention with dictionaries of different sizes.

Detailed description

To make the above objects, features, and advantages of the invention clearer, the invention is further described below with reference to specific embodiments and the drawings.

The invention extracts a Circular-Fourier Histogram of Oriented Gradient (CHOG) feature at each pixel of the character region (Skibbe, H., Reisert, M.: Circular fourier-hog features for rotation invariant object detection in biomedical images. In: ISBI. (2012)). CHOG offers fast dense feature extraction and is rotation invariant. To solve the problem of recognizing skewed text, the invention uses CHOG as the low-level feature describing a single character. Fig. 1 is the overall flow chart of the method. The specific steps are as follows:

1) First, a CHOG feature is extracted at each pixel of the character region.

2) Then, the Elbow method (an existing method) is used to determine the character-level number of clusters according to the degree of difference among the CHOG features extracted at different pixels.

3) After the number of clusters has been determined with the Elbow method, K-Means clustering is used to obtain the compressed character-level features.

4) After the compressed features of all training samples have been collected, K-Means clustering is applied again to generate a visual feature dictionary of the compressed features.

5) Then, by finding the nearest neighbour in this dictionary of each compressed feature, a BoW histogram is computed as the final descriptor of a single character region.

6) Next, a series of linear support vector machines are trained to rank the importance of these features. Because the final feature descriptor is a histogram, the weight that a linear support vector machine assigns to a feature directly reflects the importance of that feature. Once these initial linear support vector machines have been trained, their evaluations of the importance of the features in the dictionary are combined, and the K most important features are selected as the final dictionary. When new samples are tested, using the reduced dictionary further improves both the accuracy and the speed of recognition.

The flow of CHOG feature extraction and learning in the invention is shown in Fig. 2. After the CHOG feature at each pixel is extracted, CHOG is represented in a Fourier basis and the CHOG feature is rotated according to the image gradient to obtain rotation invariance. To obtain enough information for meaningful classification, window functions of three different sizes (large, medium, and small scales) are used to describe local text features of different sizes. These multi-scale windows capture details of the character region at different levels and provide the classifier with sufficient information.
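
The idea of representing an orientation histogram in a Fourier basis and rotating it to a reference direction can be sketched as follows. This is a simplified illustration, not the CHOG implementation of Skibbe and Reisert: the function names, the hard binning, the number of bins, and the use of the first-order coefficient's phase as the reference direction are all assumptions made for the example.

```python
import cmath
import math

def orientation_histogram(gradients, bins=8):
    """Magnitude-weighted histogram of gradient orientations over [0, 2*pi)."""
    hist = [0.0] * bins
    for gx, gy in gradients:
        angle = math.atan2(gy, gx) % (2 * math.pi)
        hist[int(angle / (2 * math.pi) * bins) % bins] += math.hypot(gx, gy)
    return hist

def fourier_coefficients(hist, orders=4):
    """Circular Fourier coefficients c_m = sum_k h[k] * exp(-i*m*2*pi*k/bins)."""
    bins = len(hist)
    return [sum(h * cmath.exp(-1j * m * 2 * math.pi * k / bins)
                for k, h in enumerate(hist))
            for m in range(orders)]

def rotation_invariant(coeffs):
    """Rotate to a reference direction: c_m -> c_m * exp(-i*m*phase(c_1)).

    A rotation of the patch multiplies c_m by exp(-i*m*theta); dividing out
    the phase of c_1 (assumed non-zero, orders >= 2) cancels that factor.
    """
    ref = cmath.phase(coeffs[1])
    return [c * cmath.exp(-1j * m * ref) for m, c in enumerate(coeffs)]

# Usage: hypothetical gradients of a patch, and the same patch rotated by 90 degrees.
grads = [(3.0, 1.0), (1.0, 2.0), (-1.0, -4.0)]
rotated = [(-gy, gx) for gx, gy in grads]
a = rotation_invariant(fourier_coefficients(orientation_histogram(grads)))
b = rotation_invariant(fourier_coefficients(orientation_histogram(rotated)))
print(max(abs(x - y) for x, y in zip(a, b)))  # largest difference between the two descriptors
```

In this sketch the invariance is exact only for rotations by a whole number of bins, because the histogram uses hard binning; a continuous Fourier representation avoids that limitation.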

The Elbow method automatically determines a suitable number of clusters according to the degree of difference among the features extracted in the previous step. Once the number of clusters is determined, the K-Means method is used to cluster the original dense features in order to compress them.
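
A minimal sketch of this compression step is given below. The patent names the Elbow method and K-Means but does not specify the elbow criterion or the initialisation; the largest second difference of the within-cluster sum of squares and the deterministic farthest-point initialisation used here are assumptions, as are the function names.

```python
import math

def kmeans(points, k, iters=50):
    """Lloyd's K-Means with deterministic farthest-point initialisation.

    Returns (centers, inertia); inertia is the within-cluster sum of squares.
    """
    centers = [tuple(points[0])]
    while len(centers) < k:
        centers.append(tuple(max(
            points, key=lambda p: min(math.dist(p, c) for c in centers))))
    for _ in range(iters):
        groups = [[] for _ in centers]
        for p in points:
            nearest = min(range(len(centers)),
                          key=lambda j: math.dist(p, centers[j]))
            groups[nearest].append(p)
        # Keep the old center if a cluster happens to become empty.
        new_centers = [
            tuple(sum(col) / len(g) for col in zip(*g)) if g else c
            for g, c in zip(groups, centers)
        ]
        if new_centers == centers:
            break
        centers = new_centers
    inertia = sum(min(math.dist(p, c) for c in centers) ** 2 for p in points)
    return centers, inertia

def elbow_k(points, k_max):
    """Choose k at the sharpest bend of the inertia curve (requires k_max >= 3)."""
    inertias = [kmeans(points, k)[1] for k in range(1, k_max + 1)]
    bends = [inertias[i - 1] - 2 * inertias[i] + inertias[i + 1]
             for i in range(1, len(inertias) - 1)]
    return bends.index(max(bends)) + 2  # bends[0] belongs to k = 2

def compress(points, k_max=6):
    """Replace the dense features of one character by their cluster centers."""
    return kmeans(points, elbow_k(points, k_max))[0]
```

Here `compress` would be applied to the dense features of one character region, and the returned centers play the role of the compressed character-level features.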

The compressed features extracted from all training samples are merged and clustered again to obtain an initial visual feature dictionary.
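
Once a dictionary exists, a character region is described by letting each of its compressed features vote for the nearest dictionary word. The sketch below shows that quantization under stated assumptions: Euclidean distance, brute-force search, and L1 normalization of the histogram are choices made for the example and are not specified in the patent.

```python
import math

def bow_histogram(features, dictionary):
    """Histogram of nearest-word votes, L1-normalised.

    features:   compressed feature vectors of one character region
    dictionary: visual words (cluster centers) of the same dimension
    """
    hist = [0.0] * len(dictionary)
    for f in features:
        nearest = min(range(len(dictionary)),
                      key=lambda j: math.dist(f, dictionary[j]))
        hist[nearest] += 1.0
    total = sum(hist)
    # Normalise so regions with different feature counts stay comparable.
    return [h / total for h in hist] if total else hist

# Usage with a hypothetical two-word dictionary.
words = [(0.0, 0.0), (10.0, 10.0)]
region = [(1.0, 0.0), (0.0, 1.0), (9.0, 9.0), (1.0, 1.0)]
print(bow_histogram(region, words))
```

Because only the compressed features are quantized, the number of nearest-neighbour searches per character equals the number of clusters chosen for that character rather than the number of pixels.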

The initial dictionary obtained in the previous step is used to build the visual feature histogram descriptors, and a one-versus-all strategy is used to train a two-class linear support vector machine (with positive and negative classes) for each character. The linear support vector machines are used to rank the importance of the features in the character histogram descriptors. The K most important features are kept and the remaining features are discarded, so that the number of words in the dictionary is minimized. In a specific implementation, the value of K can be determined empirically.
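
The ranking step can be sketched as follows. The patent does not state how the weights of the individual classifiers are combined or how the linear support vector machines are trained; summing the absolute weights over all one-versus-all classifiers and training with a simple deterministic sub-gradient method are assumptions of this example, and the function names and hyperparameters are hypothetical.

```python
def train_linear_svm(X, y, lam=0.01, epochs=200):
    """Sub-gradient descent on the L2-regularised hinge loss.

    X: list of histogram descriptors, y: labels in {+1, -1}.
    Returns (weights, bias).
    """
    w = [0.0] * len(X[0])
    b = 0.0
    t = 0
    for _ in range(epochs):
        for x, label in zip(X, y):
            t += 1
            eta = 1.0 / (lam * t)  # decaying step size
            margin = label * (sum(wi * xi for wi, xi in zip(w, x)) + b)
            w = [wi * (1 - eta * lam) for wi in w]  # regularisation shrink
            if margin < 1:  # sample violates the margin
                w = [wi + eta * label * xi for wi, xi in zip(w, x)]
                b += eta * label
    return w, b

def rank_features(X, labels, k):
    """Train one classifier per character class (one-versus-all), sum the
    absolute weights per dictionary word, and return the k top indices."""
    importance = [0.0] * len(X[0])
    for cls in sorted(set(labels)):
        y = [1 if l == cls else -1 for l in labels]
        w, _ = train_linear_svm(X, y)
        importance = [imp + abs(wi) for imp, wi in zip(importance, w)]
    order = sorted(range(len(importance)),
                   key=lambda j: importance[j], reverse=True)
    return order[:k]

# Usage: four hypothetical descriptors over a four-word dictionary.
X = [[3.0, 0.0, 0.0, 0.0], [0.0, 0.0, 2.0, 0.0],
     [4.0, 0.0, 0.0, 0.0], [0.0, 0.0, 5.0, 0.0]]
labels = ["A", "B", "A", "B"]
print(rank_features(X, labels, 2))
```

The returned indices identify the dictionary words to keep; the histogram descriptors are then recomputed over those words only.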

The reduced dictionary is used to recompute the histogram descriptors of the samples, and a multi-class radial basis function support vector machine is then trained and used as the final character classifier.
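
For reference, the standard radial basis function kernel and a resulting decision rule can be written as below. The symbols are introduced here for illustration: h is the histogram descriptor of a test character, h_{c,i} are the support vectors of the classifier for class c with labels y_{c,i} in {-1, +1} and coefficients \alpha_{c,i}, \gamma > 0 is the kernel width, and b_c is a bias. The patent does not state how the multi-class decision is formed; the one-versus-rest arg max shown is one common choice and is an assumption.

```latex
K(h, h') = \exp\!\left(-\gamma \,\lVert h - h' \rVert^{2}\right),
\qquad
f_c(h) = \sum_i \alpha_{c,i} \, y_{c,i} \, K(h_{c,i}, h) + b_c,
\qquad
\hat{c} = \arg\max_c f_c(h).
```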

The recognition accuracy of the invention with dictionaries of different sizes (that is, with different numbers of selected features) is shown in Fig. 3. The data sets evaluated are ICDAR-Char, SVT-Char, ICDAR-Word, SVT-Word, SVT-Perspective-Word, and MSRA-TD-500-Word, of which SVT-Perspective-Word and MSRA-TD-500-Word are data sets designed for skewed scene text recognition.

Table 1 compares the recognition accuracy of the invention with that of other algorithms on non-skewed text data sets, and Table 2 gives the same comparison on skewed text data sets. The experimental data show that the invention attains the highest accuracy in skewed scene text recognition and performance close to that of other state-of-the-art methods in non-skewed scene text recognition. Moreover, the computational complexity of the invention is only 13% of that of the second-ranked method in skewed scene text recognition.

Table 1. Comparison of recognition accuracy on non-skewed text data sets

Table 2. Comparison of recognition accuracy on skewed text data sets

The other algorithms compared with the invention in Tables 1 and 2 are described in the following documents:

1. Mishra, A., Alahari, K., Jawahar, C.: Top-down and bottom-up cues for scene text recognition. In: CVPR. (2012)

2. Phan, T.Q., Shivakumara, P., Tian, S., Tan, C.L.: Recognizing text with perspective distortion in natural scenes. In: ICCV. (2013)

3. Wang, K., Babenko, B., Belongie, S.: End-to-end scene text recognition. In: ICCV. (2011)

4. Mishra, A., Alahari, K., Jawahar, C.: Scene text recognition using higher order language priors. In: BMVC. (2012)

5. Wang, T., Wu, D.J., Coates, A., Ng, A.Y.: End-to-end text recognition with convolutional neural networks. In: ICPR. (2012)

6. ABBYY FineReader Professional 9.0: http://www.abbyy.com/. (2008)

The above embodiments are intended only to illustrate the technical solution of the invention, not to limit it. Those of ordinary skill in the art may modify the technical solution of the invention or replace it with equivalents without departing from the spirit and scope of the invention, and the scope of protection of the invention shall be as defined in the claims.

Claims

1. A skewed scene character recognition method based on feature compression and feature selection, comprising the steps of:

1) extracting a CHOG feature at each pixel of the character region;

5) using the initial visual feature dictionary to build visual feature histogram descriptors;

6) training linear support vector machines, ranking the importance of the features in the character histogram descriptors by means of the linear support vector machines, and selecting the K most important features as the final dictionary;

2. The method of claim 1, characterized in that step 2) uses the Elbow method to determine the number of clusters.

3. The method of claim 1 or 2, characterized in that step 3) and step 4) use the K-Means method for clustering.

4. The method of claim 1, characterized in that in step 1), after the CHOG feature at each pixel is extracted, CHOG is represented in a Fourier basis and the CHOG feature is rotated according to the image gradient to obtain rotation invariance.

5. The method of claim 4, characterized in that multi-scale window functions are used to capture details of the character region at different levels, so as to provide the classifier with sufficient information.

6. The method of claim 5, characterized in that the multi-scale window functions are window functions of three different sizes.

7. The method of claim 1, characterized in that step 6) uses a one-versus-all strategy to train a two-class linear support vector machine for each character, corresponding to positive and negative classes.