CN104598881A - Feature compression and feature selection based skew scene character recognition method - Google Patents

Feature compression and feature selection based skew scene character recognition method

Info

Publication number
CN104598881A
Authority
CN
China
Prior art keywords
feature
character
chog
scene
features
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201510014950.4A
Other languages
Chinese (zh)
Other versions
CN104598881B (en)
Inventor
张永铮
周宇
王一鹏
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Institute of Information Engineering of CAS
Original Assignee
Institute of Information Engineering of CAS
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
2015-01-12
Publication date
2015-05-06
Application filed by Institute of Information Engineering of CAS
Priority to CN201510014950.4A
Publication of CN104598881A
Application granted
Publication of CN104598881B
Expired - Fee Related
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/20 Image preprocessing
    • G06V10/26 Segmentation of patterns in the image field; Cutting or merging of image elements to establish the pattern region, e.g. clustering-based techniques; Detection of occlusion
    • G06V10/273 Segmentation of patterns in the image field; Cutting or merging of image elements to establish the pattern region, e.g. clustering-based techniques; Detection of occlusion removing elements interfering with the pattern to be recognised
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/23 Clustering techniques
    • G06F18/232 Non-hierarchical techniques
    • G06F18/2321 Non-hierarchical techniques using statistics or function optimisation, e.g. modelling of probability density functions
    • G06F18/23213 Non-hierarchical techniques using statistics or function optimisation, e.g. modelling of probability density functions with fixed number of clusters, e.g. K-means clustering
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques
    • G06F18/241 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06F18/2411 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on the proximity to a decision surface, e.g. support vector machines
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/40 Extraction of image or video features
    • G06V10/44 Local feature extraction by analysis of parts of the pattern, e.g. by detecting edges, contours, loops, corners, strokes or intersections; Connectivity analysis, e.g. of connected components

Abstract

The invention relates to a skew scene character recognition method based on feature compression and feature selection. The method comprises the steps of: extracting CHOG features at the pixels of a character region; determining the character-level number of clusters according to the degree of difference among the CHOG features; clustering the CHOG features to obtain compressed character-level features; merging the compressed features and clustering them again to generate an initial visual feature dictionary; building visual feature histogram descriptors; training linear support vector machines, ranking the importance of the features in the histogram descriptors and selecting several of the most important features to form a final dictionary; recomputing the histogram descriptors of the samples; and training a multi-class radial basis function support vector machine as the final character classifier, which is used to recognize skewed scene characters and obtain the recognition result. The method maintains high recognition accuracy and recall while avoiding the failure of feature point detection methods.

Description

Skew scene character recognition method based on feature compression and feature selection
Technical field
The invention belongs to the technical field of computer vision and text extraction and recognition, and specifically relates to a skew scene character recognition method based on feature compression and feature selection.
Background art
In recent years, as mobile devices with built-in cameras have multiplied, the number of pictures taken in natural scenes has grown explosively. Many valuable applications, such as text-based picture search, intelligent driving assistance, reading aids for visually impaired people, and scene understanding, depend on methods for obtaining text information from pictures. Text extraction and recognition in natural scenes, as the key problem in processing this new data source, has therefore become a popular topic in computer vision research. After a text detection algorithm has extracted the text regions from a scene picture, an algorithm for scene text recognition is needed. Scene text is hard to recognize because of blur, uneven illumination, low resolution and similar factors. Moreover, because most scene photos are taken with handheld devices, the text in them is often skewed. For these reasons, traditional skew correction methods do not work well on scene text. Therefore, although traditional optical character recognition (OCR) systems are mature, recognition systems designed specifically for scene text are still needed.
After a text detection algorithm has detected a region containing text, some correction methods can obtain high-quality character shape information. These methods analyze the character shapes, assume that the text lies on a horizontal text line, correct the detected text region accordingly, and then recognize it. However, because of the interference described above, the shapes of characters in scene pictures often cannot be extracted effectively. Research shows that traditional binarization methods, edge detection methods and maximally stable extremal region methods all fail to produce a binarization mask that a traditional OCR system can recognize (Mishra, A., Alahari, K., Jawahar, C.: Top-down and bottom-up cues for scene text recognition. In: CVPR. (2012)). In addition, the scene text recognition algorithms developed so far mainly address recognition without skew, so the recognition of skewed scene text still needs to be studied.
Existing skewed-text recognition algorithms rely on dense feature extraction. Because the text regions in scene pictures are small and the picture quality is low, feature point detection methods often fail, so features have to be extracted densely over the picture. The existing skewed character recognition method uses the 128-dimensional Scale Invariant Feature Transform (SIFT) descriptor as the feature descriptor of a single character region, and extracts SIFT features every two pixels on the normalized image. All features extracted from all training samples are gathered into one feature set, which is then reduced by clustering to produce a visual feature dictionary. The dictionary word closest to each feature in a training sample is then found, and the final Bag-of-Words (BoW) histogram descriptor is generated. When a new sample is tested, features are extracted and quantized in the same way. Because dense feature extraction is used to represent a single character, the computational complexity grows multiplicatively as the number of words in the dictionary increases.
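The quantization step described above can be sketched as follows. This is a minimal illustration in Python with NumPy, not the implementation of the prior-art method: the function name `bow_histogram` and the toy two-dimensional features and dictionary are assumptions made for the example. It also makes the cost visible: every feature is compared with every dictionary word, so the work grows with the product of the two counts.

```python
import numpy as np

def bow_histogram(features, dictionary):
    """Count how many features have each dictionary word as nearest neighbour."""
    # Squared Euclidean distances, shape (n_features, n_words).
    diff = features[:, None, :] - dictionary[None, :, :]
    dist2 = (diff ** 2).sum(axis=2)
    nearest = dist2.argmin(axis=1)
    return np.bincount(nearest, minlength=len(dictionary))

# Toy data (hypothetical): 4 dense features and a 3-word dictionary.
features = np.array([[0.0, 0.1], [0.9, 1.0], [1.1, 0.9], [5.0, 5.0]])
dictionary = np.array([[0.0, 0.0], [1.0, 1.0], [5.0, 5.0]])

hist = bow_histogram(features, dictionary)
n_distance_evals = len(features) * len(dictionary)
print(hist, n_distance_evals)  # [1 2 1] 12
```

With thousands of dense features per character and a large dictionary, `n_distance_evals` is the quantity that the compression and selection steps of the invention aim to shrink.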
Summary of the invention
The object of the invention is to design and implement a skew scene character recognition method based on feature compression and feature selection. The method likewise uses a rotation-invariant feature similar to SIFT as the low-level feature describing a character image, and then compresses the original dense features with two rounds of clustering: character-level clustering and visual feature dictionary clustering. In this way it obtains the strong inter-character discrimination that dense feature extraction provides and avoids the failure of feature point detection methods, while remaining fast and efficient. Finally, the compressed features are rated, and features that contribute little to distinguishing characters are filtered out. The classifier finally trained is not only fast but also achieves high recognition accuracy and recall.
To achieve the above object, the invention adopts the following technical solution:
A skew scene character recognition method based on feature compression and feature selection, comprising the steps of:
1) extracting a CHOG (Circular-Fourier Histogram of Oriented Gradient) feature at each pixel of the text region;
2) determining the character-level number of clusters according to the degree of difference among the CHOG features extracted at different pixels;
3) after the number of clusters has been determined, clustering the CHOG features to obtain compressed character-level features;
4) merging the compressed features of all training samples and clustering them again to generate an initial visual feature dictionary;
5) using the initial visual feature dictionary to build visual feature histogram descriptors;
6) training linear support vector machines, ranking the importance of the features in the character histogram descriptors by means of the linear support vector machines, and selecting several of the most important features as the final dictionary;
7) recomputing the histogram descriptors of the samples using the final dictionary, and then training a multi-class radial basis function support vector machine as the final character classifier;
8) recognizing skewed scene text using the final character classifier to obtain the recognition result.
Further, step 2) uses the Elbow method to determine the number of clusters.
Further, steps 3) and 4) use the K-Means method for clustering.
Compared with the prior art, the invention has the following beneficial effects:
1) Unlike SIFT features, extraction of the CHOG features used by the invention involves no discrete interpolation, and the feature vectors are aligned and rotated automatically according to the picture information to achieve rotation invariance, so accuracy is higher;
2) The invention uses multi-scale CHOG features to describe the text region at different levels of detail, which captures multi-scale information, shortens the average length of the feature descriptor, and increases processing speed;
3) Through character-level feature compression, the computational complexity of building the histogram is reduced to about 1% to 5% of that of using the original dense features directly, which greatly improves processing efficiency;
4) The invention uses a feature selection method to prune the dictionary, which further reduces the computational complexity at test time and improves recognition accuracy; the complexity of the final model and of testing on new samples is less than 10% of that of existing methods, which greatly increases recognition speed.
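As a rough illustration of where a figure in the 1% to 5% range can come from, the arithmetic below counts nearest-word comparisons with and without character-level compression. All numbers (a 32 x 32 character image, 20 clusters per character, a 1000-word dictionary) are hypothetical values chosen for the example and do not appear in the patent.

```python
# Hypothetical sizes, chosen only to illustrate the arithmetic.
n_dense_features = 32 * 32     # one feature per pixel of a 32 x 32 character
n_compressed_features = 20     # cluster centres kept after character-level clustering
n_words = 1000                 # size of the visual feature dictionary

# Building a histogram needs one distance per (feature, word) pair.
dense_cost = n_dense_features * n_words
compressed_cost = n_compressed_features * n_words

ratio = compressed_cost / dense_cost
print(f"{ratio:.1%}")  # 2.0%
```

The ratio depends only on how many cluster centres replace the dense features, which is why the number of clusters chosen per character governs the saving.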
Description of the drawings
Fig. 1 is the overall flow chart of the method of the invention.
Fig. 2 is the flow chart of CHOG feature extraction and learning in the method of the invention.
Fig. 3 shows the recognition accuracy of the method of the invention with dictionaries of different sizes.
Detailed description of the embodiments
To make the above objects, features and advantages of the invention clearer, the invention is further described below through specific embodiments and the drawings.
The invention extracts a Circular-Fourier Histogram of Oriented Gradient (CHOG) feature at each pixel of the character region (Skibbe, H., Reisert, M.: Circular Fourier-HOG features for rotation invariant object detection in biomedical images. In: ISBI. (2012)). CHOG offers fast dense feature extraction and rotation invariance. To solve the problem of recognizing skewed text, the invention uses CHOG as the low-level feature for describing a single character. Fig. 1 is the overall flow chart of the method. The specific steps are as follows:
1) First, a CHOG feature is extracted at each pixel of the text region.
2) Then the Elbow method (an existing method) is used to determine the character-level number of clusters from the degree of difference among the CHOG features extracted at different pixels.
3) Once the Elbow method has determined the number of clusters, K-Means clustering is used to obtain the compressed character-level features.
4) After the compressed features of all training samples have been collected, K-Means clustering is applied again to generate a visual feature dictionary of the compressed features.
5) Then a BoW histogram is computed by finding, for each compressed feature, its nearest neighbor in this dictionary; the histogram serves as the final descriptor of a single character region.
6) Afterwards, a series of linear support vector machines (SVMs) is trained to rate the importance of these features. Because the final feature descriptor is a histogram, the weight that a linear SVM assigns to a feature directly reflects the importance of that feature. Once these initially trained linear SVMs are available, their assessments of the importance of the dictionary features are combined, and the K most important features are selected as the final dictionary. When new samples are tested, using the reduced dictionary further improves recognition accuracy and speed.
The flow of CHOG feature extraction and learning in the invention is shown in Fig. 2. After the CHOG feature at each pixel has been extracted, it is represented in a Fourier basis and rotated according to the image gradient to obtain rotation invariance. To obtain enough information for meaningful classification, window functions of three different sizes (large, medium and small scales) are used to describe local character features of different sizes. These multi-scale windows capture different levels of detail in the text region and give the classifier sufficient information.
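The reason a Fourier representation of gradient orientations yields rotation invariance can be shown with a short numerical sketch. This is a simplified illustration of the underlying idea, not the CHOG descriptor itself: the function names, the number of Fourier orders, the random test data, and the choice of aligning to the phase of the first-order coefficient are all assumptions made for the example.

```python
import numpy as np

def circular_fourier_coeffs(angles, magnitudes, max_order=4):
    """c_m = sum_k w_k * exp(-i * m * theta_k) for m = 0..max_order."""
    orders = np.arange(max_order + 1)
    phases = np.exp(-1j * orders[:, None] * angles[None, :])
    return (magnitudes[None, :] * phases).sum(axis=1)

def align_to_dominant(coeffs):
    """Cancel the phase of the order-1 coefficient in every order (assumed alignment rule)."""
    phi = np.angle(coeffs[1])
    orders = np.arange(len(coeffs))
    return coeffs * np.exp(-1j * orders * phi)

rng = np.random.default_rng(0)
angles = rng.uniform(0.0, 2.0 * np.pi, size=50)   # gradient orientations in a window
magnitudes = rng.uniform(0.0, 1.0, size=50)       # gradient magnitudes (weights)
alpha = 0.7                                       # rotation of the whole patch, in radians

c = circular_fourier_coeffs(angles, magnitudes)
c_rot = circular_fourier_coeffs(angles + alpha, magnitudes)

# Rotating the patch only multiplies order m by exp(-i * m * alpha) ...
orders = np.arange(len(c))
print(np.allclose(c_rot, np.exp(-1j * orders * alpha) * c))          # True
# ... so magnitudes, and coefficients aligned to a reference phase, do not change.
print(np.allclose(np.abs(c_rot), np.abs(c)))                          # True
print(np.allclose(align_to_dominant(c_rot), align_to_dominant(c)))    # True
```

Because a rotation becomes a pure phase shift in this basis, no resampling of orientation bins is needed, which is consistent with the absence of discrete interpolation noted for CHOG.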
The Elbow method automatically determines a suitable number of clusters from the degree of difference among the features extracted in the previous step. Once the number of clusters has been determined, the K-Means method is used to cluster the original dense features and thereby compress them.
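A minimal sketch of this character-level compression is given below. The patent names the Elbow method and K-Means but does not state the exact elbow criterion or the initialization, so two assumptions are made here: the elbow is taken where the inertia curve has its largest second difference, and K-Means uses a deterministic farthest-first initialization. The two-dimensional toy features stand in for CHOG features.

```python
import numpy as np

def kmeans(x, k, n_iter=50):
    """Lloyd's K-Means with farthest-first initialisation (an assumed choice)."""
    centers = [x[0]]
    for _ in range(1, k):
        d2 = np.min([((x - c) ** 2).sum(axis=1) for c in centers], axis=0)
        centers.append(x[d2.argmax()])
    centers = np.array(centers, dtype=float)
    for _ in range(n_iter):
        d2 = ((x[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = d2.argmin(axis=1)
        for j in range(k):
            if np.any(labels == j):            # keep the old centre if a cluster empties
                centers[j] = x[labels == j].mean(axis=0)
    d2 = ((x[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    labels = d2.argmin(axis=1)
    inertia = d2[np.arange(len(x)), labels].sum()
    return centers, labels, inertia

def elbow_k(x, k_max=8):
    """Return the k at which the inertia curve bends most (largest second difference)."""
    inertia = np.array([kmeans(x, k)[2] for k in range(1, k_max + 1)])
    second_diff = inertia[:-2] - 2.0 * inertia[1:-1] + inertia[2:]
    return int(second_diff.argmax()) + 2       # index 0 corresponds to k = 2

# Toy "dense features" of one character: three well-separated groups.
rng = np.random.default_rng(0)
group_means = np.array([[0.0, 0.0], [10.0, 0.0], [5.0, 8.66]])
dense = np.vstack([m + 0.3 * rng.standard_normal((30, 2)) for m in group_means])

k = elbow_k(dense)
compressed, _, _ = kmeans(dense, k)            # cluster centres = compressed features
print(k, compressed.shape)
```

Here 90 dense features are replaced by `k` cluster centres; in the method, this is done separately for each character sample.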
The compressed features extracted from all training samples are merged and clustered again to obtain an initial visual feature dictionary.
The initial dictionary obtained in the previous step is used to build visual feature histogram descriptors, and a one-versus-all strategy is used to train a two-class linear SVM for each character (corresponding to the positive and negative classes). The linear SVMs are used to rank the importance of the features in the character histogram descriptors. The K most important features are chosen and the remaining features are discarded, so as to minimize the number of words in the dictionary. In a specific implementation, the value of K can be determined empirically.
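The selection step can be sketched as follows. This is an illustration under stated assumptions rather than the patented procedure: the linear SVM is fitted by plain subgradient descent on the hinge loss, histogram columns are standardized before training, and the per-class weights are combined by summing their absolute values. The patent only says that the SVM assessments are combined; the combination rule, the hyperparameters and the toy histograms are hypothetical.

```python
import numpy as np

def train_linear_svm(x, y, lam=0.01, lr=0.1, n_epochs=300):
    """Binary linear SVM (hinge loss + L2 penalty), full-batch subgradient descent."""
    w = np.zeros(x.shape[1])
    b = 0.0
    for _ in range(n_epochs):
        viol = y * (x @ w + b) < 1             # samples inside the margin
        grad_w = lam * w - (y[viol][:, None] * x[viol]).sum(axis=0) / len(x)
        grad_b = -y[viol].sum() / len(x)
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b

def select_words(hist, labels, k):
    """One-versus-all linear SVMs; keep the k words with the largest summed |weight|."""
    x = (hist - hist.mean(axis=0)) / (hist.std(axis=0) + 1e-12)
    importance = np.zeros(hist.shape[1])
    for c in np.unique(labels):
        y = np.where(labels == c, 1.0, -1.0)   # class c against all others
        w, _ = train_linear_svm(x, y)
        importance += np.abs(w)
    return np.sort(np.argsort(importance)[::-1][:k])

# Toy histograms: words 0-2 each fire for one class; words 3-5 vary but carry
# no class information (the same counts are repeated for every class).
rng = np.random.default_rng(0)
n_per_class, n_classes = 20, 3
labels = np.repeat(np.arange(n_classes), n_per_class)
informative = 10.0 * np.eye(n_classes)[labels]
uninformative = np.tile(rng.integers(0, 10, size=(n_per_class, 3)), (n_classes, 1))
hist = np.hstack([informative, uninformative.astype(float)])

selected = select_words(hist, labels, k=3)
print(selected)  # [0 1 2]
```

The indices in `selected` identify the dictionary words to keep; the discarded words no longer take part in nearest-neighbour search at test time.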
The reduced dictionary is used to recompute the histogram descriptors of the samples, and a multi-class radial basis function (RBF) support vector machine is then trained and used as the final character classifier.
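The sketch below illustrates why the histograms are recomputed rather than simply truncated: features that used to fall on a discarded word are reassigned to the nearest remaining word. It also shows the RBF kernel on which the final classifier is built. The dictionary, the selected indices, the features and the value of `gamma` are hypothetical, and training of the SVM itself is not shown.

```python
import numpy as np

def bow_histogram(features, dictionary):
    """Histogram of nearest dictionary words."""
    d2 = ((features[:, None, :] - dictionary[None, :, :]) ** 2).sum(axis=2)
    return np.bincount(d2.argmin(axis=1), minlength=len(dictionary))

def rbf_kernel(a, b, gamma):
    """K[i, j] = exp(-gamma * ||a_i - b_j||^2)."""
    d2 = ((a[:, None, :] - b[None, :, :]) ** 2).sum(axis=2)
    return np.exp(-gamma * d2)

dictionary = np.array([[0.0, 0.0], [1.0, 1.0], [5.0, 5.0], [6.0, 6.0]])
selected = np.array([0, 2])                    # hypothetical result of feature selection
final_dictionary = dictionary[selected]

features = np.array([[0.1, 0.0], [0.9, 1.1], [5.2, 5.0], [6.1, 5.9]])
full_hist = bow_histogram(features, dictionary)
truncated = full_hist[selected]                          # drops two of the four features
recomputed = bow_histogram(features, final_dictionary)   # reassigns them instead
print(full_hist, truncated, recomputed)  # [1 1 1 1] [1 1] [2 2]

# Kernel matrix between two (hypothetical) sample histograms.
hists = np.array([recomputed, [4, 0]], dtype=float)
K = rbf_kernel(hists, hists, gamma=0.1)
```

A multi-class RBF SVM would be trained on kernel values such as `K`; any standard SVM solver can be used for that step.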
The recognition accuracy of the invention with dictionaries of different sizes (that is, with different numbers of selected features) is shown in Fig. 3. The data sets used are ICDAR-Char, SVT-Char, ICDAR-Word, SVT-Word, SVT-Perspective-Word and MSRA-TD-500-Word, of which SVT-Perspective-Word and MSRA-TD-500-Word are data sets designed for skewed scene text recognition.
Table 1 compares the recognition accuracy of the invention with that of other algorithms on data sets without skewed text, and Table 2 makes the same comparison on skewed-text data sets. The experimental data show that the invention has the highest accuracy. In scene text recognition without skew the invention performs close to other state-of-the-art methods, and in skewed scene text recognition it achieves the best recognition performance. Moreover, its computational complexity is only 13% of that of the second-ranked method in skewed scene text recognition.
Table 1. Comparison of recognition accuracy on data sets without skewed text
Table 2. Comparison of recognition accuracy on skewed-text data sets
The other algorithms compared with the invention in Tables 1 and 2 are described in the following documents:
1. Mishra, A., Alahari, K., Jawahar, C.: Top-down and bottom-up cues for scene text recognition. In: CVPR. (2012)
2. Phan, T.Q., Shivakumara, P., Tian, S., Tan, C.L.: Recognizing text with perspective distortion in natural scenes. In: ICCV. (2013)
3. Wang, K., Babenko, B., Belongie, S.: End-to-end scene text recognition. In: ICCV. (2011)
4. Mishra, A., Alahari, K., Jawahar, C.: Scene text recognition using higher order language priors. In: BMVC. (2012)
5. Wang, T., Wu, D.J., Coates, A., Ng, A.Y.: End-to-end text recognition with convolutional neural networks. In: ICPR. (2012)
6. ABBYY FineReader Professional 9.0: http://www.abbyy.com/. (2008)
The above embodiments only illustrate the technical solution of the invention and do not limit it. A person of ordinary skill in the art may modify the technical solution of the invention or replace it with equivalents without departing from the spirit and scope of the invention; the scope of protection of the invention is defined by the claims.

Claims (7)

1. A skew scene character recognition method based on feature compression and feature selection, comprising the steps of:
1) extracting a CHOG feature at each pixel of the text region;
2) determining the character-level number of clusters according to the degree of difference among the CHOG features extracted at different pixels;
3) after the number of clusters has been determined, clustering the CHOG features to obtain compressed character-level features;
4) merging the compressed features of all training samples and clustering them again to generate an initial visual feature dictionary;
5) using the initial visual feature dictionary to build visual feature histogram descriptors;
6) training linear support vector machines, ranking the importance of the features in the character histogram descriptors by means of the linear support vector machines, and selecting the K most important features as the final dictionary;
7) recomputing the histogram descriptors of the samples using the final dictionary, and then training a multi-class radial basis function support vector machine as the final character classifier;
8) recognizing skewed scene text using the final character classifier to obtain the recognition result.
2. The method of claim 1, characterized in that step 2) uses the Elbow method to determine the number of clusters.
3. The method of claim 1 or 2, characterized in that steps 3) and 4) use the K-Means method for clustering.
4. The method of claim 1, characterized in that in step 1), after the CHOG feature at each pixel has been extracted, the CHOG feature is represented in a Fourier basis and rotated according to the image gradient to obtain rotation invariance.
5. The method of claim 4, characterized in that multi-scale window functions are used to capture different levels of detail in the text region, so as to give the classifier sufficient information.
6. The method of claim 5, characterized in that the multi-scale window functions are window functions of three different sizes.
7. The method of claim 1, characterized in that step 6) uses a one-versus-all strategy to train a two-class linear support vector machine for each character, corresponding to the positive and negative classes.
CN201510014950.4A 2015-01-12 2015-01-12 Skew scene character recognition method based on feature compression and feature selection Expired - Fee Related CN104598881B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201510014950.4A CN104598881B (en) 2015-01-12 2015-01-12 Skew scene character recognition method based on feature compression and feature selection

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201510014950.4A CN104598881B (en) 2015-01-12 2015-01-12 Skew scene character recognition method based on feature compression and feature selection

Publications (2)

Publication Number Publication Date
CN104598881A (en) 2015-05-06
CN104598881B (en) 2017-09-29

Family

ID=53124655

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201510014950.4A Expired - Fee Related CN104598881B (en) 2015-01-12 2015-01-12 Skew scene character recognition method based on feature compression and feature selection

Country Status (1)

Country Link
CN (1) CN104598881B (en)

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101064008A (en) * 2006-04-29 2007-10-31 北大方正集团有限公司 Method for recognizing print form italic character
CN101329734A (en) * 2008-07-31 2008-12-24 重庆大学 License plate character recognition method based on K-L transform and LS-SVM
US20100008581A1 (en) * 2008-07-08 2010-01-14 Xerox Corporation Word detection method and system
CN102663446A (en) * 2012-04-24 2012-09-12 南方医科大学 Building method of bag-of-word model of medical focus image
CN103942550A (en) * 2014-05-04 2014-07-23 厦门大学 Scene text recognition method based on sparse coding characteristics

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
丁锴 等: "基于规范割的空间金字塔图像分类算法", 《北京航空航天大学学报》 *

Cited By (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2019041609A1 (en) * 2017-08-29 2019-03-07 Shenzhen United Imaging Healthcare Co., Ltd. System and method for amplitude reduction in rf pulse design
US10634749B2 (en) 2017-08-29 2020-04-28 Shanghai United Imaging Healthcare Co., Ltd. System and method for amplitude reduction in RF pulse design
CN110161035A (en) * 2019-04-26 2019-08-23 浙江大学 Body structure surface crack detection method based on characteristics of image and bayesian data fusion
CN110161035B (en) * 2019-04-26 2020-04-10 浙江大学 Structural surface crack detection method based on image feature and Bayesian data fusion
US10783406B1 (en) 2019-04-26 2020-09-22 Zhejiang University Method for detecting structural surface cracks based on image features and bayesian data fusion
CN110399798A (en) * 2019-06-25 2019-11-01 朱跃飞 A kind of discrete picture file information extracting system and method based on deep learning
CN110399798B (en) * 2019-06-25 2021-07-20 朱跃飞 Discrete picture file information extraction system and method based on deep learning
CN113254468A (en) * 2021-04-20 2021-08-13 西安交通大学 Fault query and reasoning method for certain type of equipment
CN113254468B (en) * 2021-04-20 2023-03-31 西安交通大学 Equipment fault query and reasoning method
CN115410207A (en) * 2021-05-28 2022-11-29 国家计算机网络与信息安全管理中心天津分中心 Detection method and device for vertical texts
CN115410207B (en) * 2021-05-28 2023-08-29 国家计算机网络与信息安全管理中心天津分中心 Detection method and device for vertical text

Also Published As

Publication number Publication date
CN104598881B (en) 2017-09-29

Similar Documents

Publication Publication Date Title
Li et al. Scene text detection via stroke width
CN101976258B (en) Video semantic extraction method by combining object segmentation and feature weighing
Zhou et al. Principal visual word discovery for automatic license plate detection
Zamberletti et al. Text localization based on fast feature pyramids and multi-resolution maximally stable extremal regions
JP5775225B2 (en) Text detection using multi-layer connected components with histograms
US20190180094A1 (en) Document image marking generation for a training set
US20140193029A1 (en) Text Detection in Images of Graphical User Interfaces
CN104598881A (en) Feature compression and feature selection based skew scene character recognition method
Ye et al. Scene text detection via integrated discrimination of component appearance and consensus
Tabassum et al. Text detection using MSER and stroke width transform
Forczmański et al. Stamps detection and classification using simple features ensemble
Wu et al. Contour restoration of text components for recognition in video/scene images
Sanketi et al. Localizing blurry and low-resolution text in natural images
Bai et al. A fast stroke-based method for text detection in video
EP3380990B1 (en) Efficient unconstrained stroke detector
Giri Text information extraction and analysis from images using digital image processing techniques
Chang Intelligent text detection and extraction from natural scene images
CN103605993A (en) Image-to-video face identification method based on distinguish analysis oriented to scenes
Tsai et al. Mobile visual search using image and text features
Yu et al. Chinese text detection and recognition in natural scene using hog and SVM
Nor et al. Image segmentation and text extraction: application to the extraction of textual information in scene images
Sushma et al. Text detection in color images
Gaikwad et al. Video scene segmentation to separate script
Patel et al. Text segmentation from images
He et al. Chinese character recognition in natural scenes

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
CF01 Termination of patent right due to non-payment of annual fee

Granted publication date: 20170929