EP1668568A2 - Method and system for compressing handwritten character templates - Google Patents

Method and system for compressing handwritten character templates

Info

Publication number
EP1668568A2
EP1668568A2 EP04784422A EP04784422A EP1668568A2 EP 1668568 A2 EP1668568 A2 EP 1668568A2 EP 04784422 A EP04784422 A EP 04784422A EP 04784422 A EP04784422 A EP 04784422A EP 1668568 A2 EP1668568 A2 EP 1668568A2
Authority
EP
European Patent Office
Prior art keywords
character
feature vectors
model
uncompressed
templates
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Withdrawn
Application number
EP04784422A
Other languages
German (de)
French (fr)
Other versions
EP1668568A4 (en)
Inventor
Li-Xin Zhen
Feng-Jun Guo
Jian-Cheng Huang
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Motorola Mobility LLC
Original Assignee
Motorola Inc
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Motorola Inc
Publication of EP1668568A2
Publication of EP1668568A4
Legal status: Withdrawn

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00 Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V30/00 Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
    • G06V30/10 Character recognition
    • G06V30/22 Character recognition characterised by the type of writing
    • G06V30/226 Character recognition characterised by the type of writing of cursive writing
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/23 Clustering techniques
    • G06F18/232 Non-hierarchical techniques
    • G06F18/2321 Non-hierarchical techniques using statistics or function optimisation, e.g. modelling of probability density functions
    • G06F18/23213 Non-hierarchical techniques using statistics or function optimisation, e.g. modelling of probability density functions with fixed number of clusters, e.g. K-means clustering
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00 Arrangements for program control, e.g. control units
    • G06F9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/762 Arrangements for image or video recognition or understanding using pattern recognition or machine learning using clustering, e.g. of similar faces in social networks
    • G06V10/763 Non-hierarchical techniques, e.g. based on statistics of modelling distributions
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V30/00 Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
    • G06V30/10 Character recognition
    • G06V30/19 Recognition using electronic means
    • G06V30/191 Design or setup of recognition systems or techniques; Extraction of features in feature space; Clustering techniques; Blind source separation
    • G06V30/1914 Determining representative reference patterns, e.g. averaging or distorting patterns; Generating dictionaries, e.g. user dictionaries

Definitions

  • the invention relates generally to electronic recognition of handwritten text.
  • the invention is particularly useful for, but not necessarily limited to, compressing handwritten character templates for use in portable electronic devices having limited memory.
  • Computer recognition of printed text has presented vexing technical difficulties for several decades. Many optical character recognition (OCR) techniques have only recently been refined to enable scanned text documents to be processed with a high level of accuracy. However such OCR techniques generally are able to interpret only text that is printed by a typewriter or an electronic printer. Computer recognition of handwritten characters remains a very difficult technical challenge.
  • Existing methods for recognizing handwritten characters often include high-resolution templates that capture a time dimension related to the act of writing text in addition to the two physical dimensions of the written text.
  • Such templates are usually created using electronic tablets that record text strokes when a pen or stylus contacts the tablet.
  • Other techniques include the use of electronic pens that record the movement of the pen tip as characters are written.
  • Features of the input handwritten characters are then matched with templates of model characters using various pattern recognition techniques.
  • the templates of model characters comprise the statistical average values of feature vectors from numerous input samples of each character.
  • the memory required to store the above described multi-dimensional, digitised handwriting data is considerable.
  • the model template dictionaries to which the input templates are compared become very large. Nevertheless a consumer demand for handwritten character recognition applications continues to grow. In particular there is a demand for handwritten character recognition capabilities on small handheld devices such as personal digital assistants (PDAs). Such devices have very limited memory and processing capacities.
  • Existing techniques for compressing the size of the above described handwritten data include converting the handwritten characters into multidimensional feature vectors. Storing such feature vectors generally requires much less memory than storing the corresponding handwritten pixel data. However such feature vectors still often require more memory than is available on handheld devices.
  • a method of compressing handwritten character templates comprising the steps of: generating a codebook comprising vectors defining the centers of clusters of uncompressed model character feature vectors provided from model character templates; and comparing said uncompressed model character feature vectors with said codebook to provide compressed templates of model characters.
  • the method may preferably include calculating distances between said clusters of uncompressed model character feature vectors and uncompressed input character feature vectors, and providing top-n candidate characters by calculating the distances between said uncompressed input character feature vectors and said model character templates, where n is based on the value of a threshold.
  • the step of providing uncompressed input character feature vectors may include the step of extracting features from a normalized input character template.
  • the codebook may include no more than 256 clusters.
  • the uncompressed input character feature vectors may be 8-dimensional vectors.
  • the top-n candidate characters may be provided by calculating the distances between said uncompressed input character feature vectors and said model character templates using a sorting method to provide said candidate characters.
  • the candidate characters may be provided in a step of a character recognition method.
  • the present invention comprises a system for compressing handwritten character templates including: a codebook generator module for generating a codebook, wherein said codebook comprises vectors defining the centers of clusters of uncompressed model character feature vectors provided from model character templates; and a template compression module operatively connected to said codebook generator module for comparing said uncompressed model character feature vectors with said codebook to provide compressed templates of model characters.
  • the system includes a template matching module operatively connected to said template compression module for providing candidate characters by comparing the distances between uncompressed input character feature vectors and said model character templates.
  • the system may also include a distance lookup table generator.
  • FIG. 1 is a schematic block diagram of the software elements of a system according to the present invention
  • FIG. 2 is a general flow diagram illustrating a method of compressing handwritten character templates according to the present invention
  • FIG. 3 is a schematic block diagram of a three-dimensional template of a handwritten character
  • FIG. 4 is a flow diagram summarizing a method for providing feature vectors
  • FIG. 5 is a flow diagram illustrating a method for generating a codebook
  • FIG. 6 is a flow diagram illustrating a method for generating compressed templates of model characters
  • FIG. 7 is a flow diagram illustrating a method for providing candidate characters
  • FIG. 8 is a schematic diagram illustrating a compressed model character template.
  • With reference to FIG. 1, there is illustrated a schematic block diagram of the software elements of a system 100 used to compress templates of handwritten characters according to a preferred embodiment of the present invention.
  • With reference to FIG. 2, there is a general flow diagram illustrating a method 200 of compressing handwritten characters using the software elements of the system 100.
  • the software elements include a codebook generator module 105 that creates uncompressed model character feature vectors 110 from model character templates and then sorts the feature vectors 110 into clusters 115.
  • the model character templates comprise statistical average values of feature vectors from numerous input samples of each character.
  • the uncompressed model character feature vectors 110 include feature vectors for all characters in a character set. For some languages, such as the Chinese language, there may be more than ten thousand characters in a character set.
  • the vector centers of the clusters 115 are then indexed to form a codebook 125. (Step 210.)
  • Next there is a template compression module 120.
  • Codes from the codebook 125 are compared with the uncompressed model character feature vectors 110 to create compressed templates of model characters 135. (Step 215.)
  • a template matching module 140 comprises a lookup table generator 155 that compares the codes in the codebook 125 with the uncompressed input character feature vectors 130 to calculate a distance lookup table 145. (Step 220.)
  • a template matching generator 160 then calculates the distances between the compressed templates of model characters 135 and the uncompressed input character feature vectors 130 and sorts the results. (Step 225.)
  • Finally, the system 100 provides the top n candidate characters 150, where n is a constant. (Step 230.)
  • With reference to FIG. 3, there is illustrated a schematic diagram of an input to the present invention comprising a digitised, high-resolution, three-dimensional info-image 300 of a handwritten character 305.
  • the info-image 300 includes the two dimensional physical features of the character 305 as well as a time dimension that provides direction information concerning the handwritten strokes. The stroke direction of the vertical and horizontal features of the character 305 are shown by the arrows 310.
  • Methods for creating a three-dimensional image such as the info-image 300 are well known in the art, so the steps of such a method are merely summarized herein.
  • FIG. 3 also illustrates how the info-image 300 of the digitised character 305 is converted to uncompressed input character feature vectors 130. Pixels of the character 305 are first fit to a grid 315 and normalized so that the size of the character 305 is the same as the size of model characters used to create the uncompressed model character feature vectors 110. Each element of the grid 315 is then subdivided into an even finer mesh 320, which is then analysed to extract a feature vector 130.
  • In the embodiment shown in FIG. 3, each mesh 320 comprises 7x7 elements.
  • One example of a feature vector 130 is an 8-dimension directional vector. Each dimension of the vector 130 corresponds to a stroke direction as illustrated by the eight arrows 325 shown in FIG. 3 that are created by dividing a circle into 45-degree increments. Those skilled in the art will recognize that more or fewer dimensions for the feature vectors 130 may also be used according to the present invention.
  • Each element of the mesh 320 that contains pixels from a handwritten stroke is then assigned to one of the eight directions based on a best fit of the actual stroke direction inside that element.
  • Referring to FIG. 4, there is a flow diagram summarizing a method 400 for providing the feature vectors 130 described above.
  • At step 405 an info-image 300 of an input character 305 is received.
  • At step 410, the info-image 300 of the input character 305 is normalized to a grid 315.
  • At step 415 each element of the grid 315 is further divided into a mesh 320.
  • At step 420, directional features of each mesh 320 are extracted.
  • Next, at step 425 values are assigned to each dimension of an uncompressed input character feature vector 130.
  • the model character templates comprise statistical average values of feature vectors from numerous input samples of each character. The features of each input sample are extracted according to a process similar to the above method 400. Referring to FIG. 5, there is a flow diagram illustrating a method 500 for generating the codebook 125 in the codebook generator module 105 using the uncompressed model character feature vectors 110 of all model characters in a character set. At step 505 the uncompressed model character feature vectors 110 are provided as described above.
  • the uncompressed model character feature vectors 110 construct a template of each model character.
  • the uncompressed model character feature vectors 110 are clustered according to a clustering process such as are well known in the art. For example a cluster method such as the K-Means method or a GLA algorithm using vector quantization may be employed.
  • the 8-dimensional uncompressed model character feature vectors 110 are grouped into 256 different clusters 115. The clustering may be based, for example, on the Euclidean distances between the vectors 110.
  • the method 500 then provides the codebook 125 to the template compression module 120.
  • the codebook 125 consists of the vector centers of the 256 clusters 115, each cluster 115 being assigned a code number between 0 and 255. Depending on the number of model characters that are analysed, hundreds of thousands of uncompressed model character feature vectors 110 may be condensed into just the 256 clusters 115. According to other embodiments of the present invention, portable electronic devices employing the present invention that may have very minimal memory resources could use even fewer clusters 115 to define the codebook 125. Referring to FIG. 6, there is a flow diagram illustrating a method 600 for generating the compressed templates of model characters 135 using the template compression module 120. At step 605, each uncompressed model character feature vector 110 is compared with the clusters 115 in the codebook 125.
  • each uncompressed model character feature vector 110 is then replaced with the single code number of the cluster 115 that best matches that feature vector 110.
  • the compressed templates of model characters 135 are then provided at step 615 to the template matching unit 140.
  • the above thus illustrates how the original uncompressed model character feature vectors 110 are converted into the compressed templates of model characters 135.
  • for example, for an alphabet or library of k digitised characters, a memory module storing the corresponding uncompressed model character feature vectors 110 would need to include k*m 8-dimensional vectors, where m is the total number of meshes 320 used to define each character 305.
  • each 8-byte uncompressed model character feature vector 110 is reduced to a single compressed template of a model character 135 requiring only one byte of memory. The necessary memory is thus reduced to 1/8 the size originally required.
  • Referring to FIG. 7, a flow diagram is shown illustrating a method 700 for providing final candidate characters 150 for use in an associated character recognition algorithm.
  • At step 220, the distance between each of the clusters 115 in the codebook 125 and each uncompressed input character feature vector 130 is calculated.
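  • the vector distances may be calculated using the following formula: $\mathrm{Dis}(A, B) = \sum_{i=1}^{8} |a_i - b_i|$ (Eqn. 1), where $a_i$ and $b_i$ are the $i$th dimensions of the 8-dimensional vectors A and B.
  • the distance between A and B thus equals the summation of the absolute value of the difference between each corresponding dimension.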
  • the distances are then provided in a lookup table 145.
  • the size of the distance lookup table 145 is m × 256, where m is again the total number of meshes 320 used to define each input character 305. Or, in other words, m is the total number of uncompressed input character feature vectors 130 used to define each input character 305.
  • FIG. 8 which illustrates a compressed model character template 805.
  • the template shown in FIG. 8 comprises 49 meshes and thus 49 code indices from the codebook 125.
  • the values of the meshes are assigned as {5, 10, ..., 15}.
  • the index of the first mesh 810 is 5, so the distance between the first mesh feature of the first mesh 810 and the first feature in the first mesh 320 of a compressed input character template is $\mathrm{Dis}_{1,5}$.
  • the value of $\mathrm{Dis}_{1,5}$ can be found in the lookup table 145.
  • the distance is determined between each feature in the other meshes of the compressed model character template 805 and the corresponding features in a compressed input character template. After the distances of all 49 meshes are determined, the total distance (DIS) is calculated between the compressed input character template and the compressed model character template 805 according to the following equation: $\mathrm{DIS} = \mathrm{Dis}_{1,5} + \mathrm{Dis}_{2,10} + \cdots + \mathrm{Dis}_{49,15}$ (Eqn. 2)
  • the system 100 calculates the distances between the compressed templates of model characters 135 and the uncompressed input character feature vectors 130. These distances are then sorted at step 720. Finally at step 230 the system 100 provides the top n candidate characters 150.
  • the present invention is therefore an improved method of compressing handwritten character templates for use in an electronic character recognition process.
  • the use of a codebook 125 enables multi-dimensional uncompressed model character feature vectors 110 to be compressed to a single codebook index that may preferably be only a single byte.
  • the invention may preferably use a simple lookup table technique to match input character feature vectors 130 with model character templates by comparing the distances between individual features in both the input characters and the model templates.
  • When used in conjunction with a character recognition system, the present invention thus enables highly accurate character recognition while reducing both memory and processing overhead.
  • the reduced memory and processing overhead makes the invention particularly suitable for use with portable electronic devices such as mobile phones and personal digital assistants (PDAs).
  • the detailed description provides a preferred exemplary embodiment only, and is not intended to limit the scope, applicability, or configuration of the present invention. Rather, the detailed description of the preferred exemplary embodiment provides those skilled in the art with an enabling description for implementing the preferred exemplary embodiment of the invention. It should be understood that various changes can be made in the function and arrangement of elements and steps without departing from the spirit and scope of the invention as set forth in the appended claims.
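
The lookup-table matching described above for FIG. 7 and FIG. 8 can be sketched as follows. This is an illustrative sketch only, not the patented implementation: the function names (`l1_distance`, `build_distance_table`, `template_distance`, `top_candidates`) and the dictionary layout of the model templates are assumptions made for the example. The table has one row per input mesh and one column per codebook entry (m × 256 for a full codebook), so scoring a compressed model template costs one lookup per mesh followed by a sum, in the manner of Eqn. 2.

```python
def l1_distance(a, b):
    """Sum of absolute per-dimension differences between two vectors."""
    return sum(abs(x - y) for x, y in zip(a, b))


def build_distance_table(input_vectors, codebook):
    """table[j][c] = distance between the input's j-th mesh vector and code c."""
    return [[l1_distance(v, centre) for centre in codebook] for v in input_vectors]


def template_distance(table, compressed_template):
    """Total distance: one table lookup per mesh, then a sum."""
    return sum(table[j][code] for j, code in enumerate(compressed_template))


def top_candidates(input_vectors, codebook, model_templates, n):
    """Return the n character labels whose compressed templates are closest.

    `model_templates` (hypothetical layout) maps a character label to its
    sequence of code numbers, one code per mesh.
    """
    table = build_distance_table(input_vectors, codebook)
    ranked = sorted(
        model_templates,
        key=lambda ch: template_distance(table, model_templates[ch]),
    )
    return ranked[:n]


# Toy example: a 3-entry codebook and 2 meshes per character.
codebook = [[0] * 8, [5] * 8, [9] * 8]
input_vectors = [[0] * 8, [9] * 8]
models = {"A": bytes([0, 2]), "B": bytes([1, 1]), "C": bytes([2, 0])}
best = top_candidates(input_vectors, codebook, models, n=2)
```

Because the table is built once per input character, the per-model cost does not depend on the dimensionality of the feature vectors, which is where the processing saving in this sketch comes from.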

Abstract

A method and system for compressing handwritten character templates. The system includes a codebook generator module (105) for generating a codebook (125). The codebook (125) includes vectors defining the centers of clusters (115) of uncompressed model character feature vectors (110) provided from model character templates. A template compression module (120) is connected to the codebook generator module (105) for comparing the uncompressed model character feature vectors (110) with the codebook (125) to provide compressed templates of model characters (135). Optionally, a template matching module (140) is connected to the template compression module (120) for providing candidate characters (150) by comparing the distances between uncompressed input character feature vectors (130) and the model character templates.

Description

METHOD AND SYSTEM FOR COMPRESSING HANDWRITTEN CHARACTER TEMPLATES
FIELD OF THE INVENTION
The invention relates generally to electronic recognition of handwritten text. The invention is particularly useful for, but not necessarily limited to, compressing handwritten character templates for use in portable electronic devices having limited memory.
BACKGROUND OF THE INVENTION
Computer recognition of printed text has presented vexing technical difficulties for several decades. Many optical character recognition (OCR) techniques have only recently been refined to enable scanned text documents to be processed with a high level of accuracy. However such OCR techniques generally are able to interpret only text that is printed by a typewriter or an electronic printer. Computer recognition of handwritten characters remains a very difficult technical challenge. Existing methods for recognizing handwritten characters often include high-resolution templates that capture a time dimension related to the act of writing text in addition to the two physical dimensions of the written text.
Such templates are usually created using electronic tablets that record text strokes when a pen or stylus contacts the tablet. Other techniques include the use of electronic pens that record the movement of the pen tip as characters are written. Features of the input handwritten characters are then matched with templates of model characters using various pattern recognition techniques.
The templates of model characters comprise the statistical average values of feature vectors from numerous input samples of each character. The memory required to store the above described multi-dimensional, digitised handwriting data is considerable. The model template dictionaries to which the input templates are compared become very large. Nevertheless a consumer demand for handwritten character recognition applications continues to grow. In particular there is a demand for handwritten character recognition capabilities on small handheld devices such as personal digital assistants (PDAs). Such devices have very limited memory and processing capacities.
Existing techniques for compressing the size of the above described handwritten data include converting the handwritten characters into multidimensional feature vectors. Storing such feature vectors generally requires much less memory than storing the corresponding handwritten pixel data. However such feature vectors still often require more memory than is available on handheld devices. Particularly for larger character sets, such as a Chinese language character set that may include more than ten thousand characters, the size of the associated templates may be very large. Therefore improved methods for compressing handwritten character templates are needed.
SUMMARY OF THE INVENTION
According to one aspect of the present invention there is provided a method of compressing handwritten character templates comprising the steps of: generating a codebook comprising vectors defining the centers of clusters of uncompressed model character feature vectors provided from model character templates; and comparing said uncompressed model character feature vectors with said codebook to provide compressed templates of model characters.
The method may preferably include calculating distances between said clusters of uncompressed model character feature vectors and uncompressed input character feature vectors, and providing top-n candidate characters by calculating the distances between said uncompressed input character feature vectors and said model character templates, where n is based on the value of a threshold. The step of providing uncompressed input character feature vectors may include the step of extracting features from a normalized input character template. Preferably the codebook may include no more than 256 clusters. The uncompressed input character feature vectors may be 8-dimensional vectors. The top-n candidate characters may be provided by calculating the distances between said uncompressed input character feature vectors and said model character templates using a sorting method to provide said candidate characters. The candidate characters may be provided in a step of a character recognition method.
According to another aspect, the present invention comprises a system for compressing handwritten character templates including: a codebook generator module for generating a codebook, wherein said codebook comprises vectors defining the centers of clusters of uncompressed model character feature vectors provided from model character templates; and a template compression module operatively connected to said codebook generator module for comparing said uncompressed model character feature vectors with said codebook to provide compressed templates of model characters. Preferably the system includes a template matching module operatively connected to said template compression module for providing candidate characters by comparing the distances between uncompressed input character feature vectors and said model character templates. The system may also include a distance lookup table generator.
In this specification, including the claims, the terms "comprises," "comprising" or similar terms are intended to mean a non-exclusive inclusion, such that a method or apparatus that comprises a list of elements does not include those elements solely, but may well include other elements not listed.
BRIEF DESCRIPTION OF THE DRAWINGS
In order that the invention may be readily understood and put into practical effect, reference will now be made to a preferred embodiment as illustrated with reference to the accompanying drawings, wherein like reference numbers refer to like elements, in which:
FIG. 1 is a schematic block diagram of the software elements of a system according to the present invention;
FIG. 2 is a general flow diagram illustrating a method of compressing handwritten character templates according to the present invention;
FIG. 3 is a schematic block diagram of a three-dimensional template of a handwritten character;
FIG. 4 is a flow diagram summarizing a method for providing feature vectors;
FIG. 5 is a flow diagram illustrating a method for generating a codebook;
FIG. 6 is a flow diagram illustrating a method for generating compressed templates of model characters;
FIG. 7 is a flow diagram illustrating a method for providing candidate characters; and
FIG. 8 is a schematic diagram illustrating a compressed model character template.
DETAILED DESCRIPTION OF THE PREFERRED EMBODIMENT OF THE INVENTION
With reference to FIG. 1, there is illustrated a schematic block diagram of the software elements of a system 100 used to compress templates of handwritten characters according to a preferred embodiment of the present invention. With reference to FIG. 2, there is a general flow diagram illustrating a method 200 of compressing handwritten characters using the software elements of the system 100. The software elements include a codebook generator module 105 that creates uncompressed model character feature vectors 110 from model character templates and then sorts the feature vectors 110 into clusters 115. The model character templates comprise statistical average values of feature vectors from numerous input samples of each character. The uncompressed model character feature vectors 110 include feature vectors for all characters in a character set. For some languages, such as the Chinese language, there may be more than ten thousand characters in a character set. The vector centers of the clusters 115 are then indexed to form a codebook 125. (Step 210.) Next there is a template compression module 120. Codes from the codebook 125 are compared with the uncompressed model character feature vectors 110 to create compressed templates of model characters 135. (Step 215.) A template matching module 140 comprises a lookup table generator 155 that compares the codes in the codebook 125 with the uncompressed input character feature vectors 130 to calculate a distance lookup table 145. (Step 220.) A template matching generator 160 then calculates the distances between the compressed templates of model characters 135 and the uncompressed input character feature vectors 130 and sorts the results. (Step 225.) Finally, the system 100 provides the top n candidate characters 150, where n is a constant. (Step 230.)
With reference to FIG. 3, there is illustrated a schematic diagram of an input to the present invention comprising a digitised, high-resolution, three-dimensional info-image 300 of a handwritten character 305. The info-image 300 includes the two dimensional physical features of the character 305 as well as a time dimension that provides direction information concerning the handwritten strokes. The stroke direction of the vertical and horizontal features of the character 305 are shown by the arrows 310. Methods for creating a three-dimensional image such as the info-image 300 are well known in the art, so the steps of such a method are merely summarized herein. FIG. 3 also illustrates how the info-image 300 of the digitised character 305 is converted to uncompressed input character feature vectors 130. Pixels of the character 305 are first fit to a grid 315 and normalized so that the size of the character 305 is the same as the size of model characters used to create the uncompressed model character feature vectors 110. Each element of the grid
315 is then subdivided into an even finer mesh 320, which is then analysed to extract a feature vector 130. In the embodiment shown in FIG. 3, each mesh 320 comprises 7x7 elements. One example of a feature vector 130 is an 8-dimension directional vector. Each dimension of the vector 130 corresponds to a stroke direction as illustrated by the eight arrows 325 shown in FIG. 3 that are created by dividing a circle into 45-degree increments. Those skilled in the art will recognize that more or fewer dimensions for the feature vectors 130 may also be used according to the present invention. Each element of the mesh 320 that contains pixels from a handwritten stroke is then assigned to one of the eight directions based on a best fit of the actual stroke direction inside that element. The directional dimensions of the elements of the mesh 320 are then summed to create the uncompressed input character feature vectors 130. An 8-dimension directional feature vector is defined as $(v_1, v_2, \ldots, v_8)$, where the value of $v_i$ is the count of the $i$th directional dimension in the mesh 320, where $1 \le i \le 8$. Thus the feature vector describing the character 305 shown in FIG. 3 would be assigned the following values: v/ 3.
Referring to FIG. 4, there is a flow diagram summarizing a method 400 for providing the feature vectors 130 described above. At step 405 an info-image 300 of an input character 305 is received. At step 410, the info-image 300 of the input character 305 is normalized to a grid 315. At step 415 each element of the grid 315 is further divided into a mesh 320. At step 420, directional features of each mesh 320 are extracted. Next, at step 425 values are assigned to each dimension of an uncompressed input character feature vector 130.
According to the present invention, the model character templates comprise statistical average values of feature vectors from numerous input samples of each character.
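
The directional feature extraction summarized above can be sketched as follows. This is a minimal illustration under stated assumptions, not the method as implemented by the inventors: the representation of a mesh as a 2-D list of `(dx, dy)` stroke displacements (or `None` for empty elements), the counter-clockwise bin numbering starting at the positive x axis, and the names `direction_bin` and `mesh_feature_vector` are all hypothetical choices made for the example.

```python
import math

NUM_DIRECTIONS = 8  # one bin per 45-degree sector, matching the eight arrows 325


def direction_bin(dx, dy):
    """Map a stroke displacement (dx, dy) to the nearest of 8 directions (0..7).

    Assumed numbering: bin 0 points along +x, bins increase counter-clockwise.
    """
    angle = math.degrees(math.atan2(dy, dx)) % 360.0
    # Round to the nearest multiple of 45 degrees; 360 wraps back to bin 0.
    return int(round(angle / 45.0)) % NUM_DIRECTIONS


def mesh_feature_vector(mesh):
    """Count how many stroke-bearing elements of one mesh fall in each direction.

    `mesh` is a 2-D list; each element is either None (no stroke pixels)
    or a (dx, dy) tuple giving the local stroke direction in that element.
    """
    counts = [0] * NUM_DIRECTIONS
    for row in mesh:
        for element in row:
            if element is not None:
                counts[direction_bin(*element)] += 1
    return counts


# A 7x7 mesh crossed by one horizontal stroke, plus one upward-pointing element.
mesh = [[None] * 7 for _ in range(7)]
mesh[3] = [(1, 0)] * 7
mesh[0][0] = (0, 1)
vector = mesh_feature_vector(mesh)
```

With a 7x7 mesh each count is at most 49, so every dimension fits in one byte, consistent with the 8-byte vectors discussed later in the description.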
The features of each input sample are extracted according to a process similar to the above method 400.

Referring to FIG. 5, there is a flow diagram illustrating a method 500 for generating the codebook 125 in the codebook generator module 105 using the uncompressed model character feature vectors 110 of all model characters in a character set. At step 505 the uncompressed model character feature vectors 110 are provided as described above. The uncompressed model character feature vectors 110 make up a template of each model character. Next, at step 510 the uncompressed model character feature vectors 110 are clustered according to a clustering process such as those well known in the art. For example, a clustering method such as the K-means method or the generalized Lloyd algorithm (GLA) used in vector quantization may be employed. According to a preferred embodiment of the present invention, the 8-dimensional uncompressed model character feature vectors 110 are grouped into 256 different clusters 115. The clustering may be based, for example, on the Euclidean distances between the vectors 110. At step 515 the method 500 then provides the codebook 125 to the template compression module 120. The codebook 125 consists of the vector centers of the 256 clusters 115, each cluster 115 being assigned a code number between 0 and 255. Depending on the number of model characters that are analysed, hundreds of thousands of uncompressed model character feature vectors 110 may be condensed into just the 256 clusters 115. According to other embodiments of the present invention, portable electronic devices with very limited memory resources could use even fewer clusters 115 to define the codebook 125.

Referring to FIG. 6, there is a flow diagram illustrating a method 600 for generating the compressed templates of model characters 135 using the template compression module 120. 
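The clustering of method 500 can be sketched in plain Python as a basic K-means loop over Euclidean distances. This is only an illustration under stated assumptions: the function names, the fixed iteration count, and the seeding of the centers with evenly spaced input vectors are choices made here and are not specified in the description.

```python
def squared_euclidean(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest(centers, vec):
    """Index of the center closest to vec (ties go to the lowest index)."""
    return min(range(len(centers)), key=lambda c: squared_euclidean(centers[c], vec))


def build_codebook(vectors, num_clusters=256, iterations=20):
    """Basic K-means; returns the cluster centers, indexed by code number."""
    if len(vectors) < num_clusters:
        raise ValueError("need at least as many vectors as clusters")
    # Assumption: seed the centers with evenly spaced input vectors.
    stride = len(vectors) // num_clusters
    centers = [list(vectors[i * stride]) for i in range(num_clusters)]
    for _ in range(iterations):
        groups = [[] for _ in centers]
        for vec in vectors:
            groups[nearest(centers, vec)].append(vec)
        for c, members in enumerate(groups):
            if members:  # an empty cluster keeps its previous center
                dims = len(members[0])
                centers[c] = [sum(m[d] for m in members) / len(members)
                              for d in range(dims)]
    return centers


# Toy data: four 8-dimensional vectors grouped into two clusters.
toy_vectors = [[0] * 8, [1] * 8, [10] * 8, [11] * 8]
toy_codebook = build_codebook(toy_vectors, num_clusters=2)
```

The position of each center in the returned list serves as its code number, so a codebook of at most 256 centers lets every code fit in one byte.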
At step 605, each uncompressed model character feature vector 110 is compared with the clusters 115 in the codebook 125. Next, at step 610, each uncompressed model character feature vector 110 is replaced with the single code number of the cluster 115 that best matches that feature vector 110. The compressed templates of model characters 135 are then provided at step 615 to the template matching unit 140. The above thus illustrates how the original uncompressed model character feature vectors 110 are converted into the compressed templates of model characters 135.

For example, consider an alphabet or library of k digitised characters used in connection with the above described embodiment of the present invention. A memory module storing the corresponding uncompressed model character feature vectors 110 would need to include k × m 8-dimensional vectors, where m is the total number of meshes 320 used to define each character 305. However, using the above process of the present invention, each 8-byte uncompressed model character feature vector 110 is reduced to a single code number in a compressed template of a model character 135, requiring only one byte of memory. The necessary memory is thus reduced to 1/8 the size originally required.

Referring to FIG. 7, a flow diagram is shown illustrating a method 700 for providing final candidate characters 150 for use in an associated character recognition algorithm. At step 220, the distance between each of the clusters 115 in the codebook 125 and each uncompressed input character feature vector 130 is calculated. As is well known in the art, the vector distances may be calculated using the following formula:

$$\mathrm{Dis}(A, B) = \sum_{i=1}^{8} |a_i - b_i| \qquad \text{Eqn. 1}$$

where $A = \{a_1, a_2, a_3, a_4, a_5, a_6, a_7, a_8\}$ and $B = \{b_1, b_2, b_3, b_4, b_5, b_6, b_7, b_8\}$. The distance between A and B thus equals the summation of the absolute value of the difference between each corresponding dimension. At step 710 the distances are then provided in a lookup table 145. The size of the distance lookup table 145 is m × 256, where m is again the total number of meshes 320 used to define each input character 305. Or, in other words, m is the total number of uncompressed input character feature vectors 130 used to define each input character 305.

The remainder of the flow diagram provided in FIG. 7 will now be described by referencing the schematic diagram of FIG. 8, which illustrates a compressed model character template 805. The template shown in FIG. 8 comprises 49 meshes and thus 49 code indices from the codebook 125. The values of the meshes are assigned as {5, 10, ..., 15}. The index of the first mesh 810 is 5, so the distance between the feature of the first mesh 810 and the first feature in the first mesh 320 of a compressed input character template is $\mathrm{Dis}_{1,5}$. The value of $\mathrm{Dis}_{1,5}$ can be found in the lookup table 145.
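The code assignment of steps 605 to 610 and the construction of the distance lookup table 145 can be sketched as follows. The function names and data layout are hypothetical, and using the absolute-difference distance of Eqn. 1 to pick the best-matching cluster is an assumption, since the description does not say which metric decides the best match.

```python
def l1_distance(a, b):
    """Eqn. 1: sum of absolute differences between corresponding dimensions."""
    return sum(abs(x - y) for x, y in zip(a, b))


def compress_template(model_vectors, codebook):
    """Replace each model feature vector with the code number of its closest center.

    Returns one byte per mesh, so the codebook may hold at most 256 centers.
    """
    return bytes(
        min(range(len(codebook)), key=lambda c: l1_distance(codebook[c], vec))
        for vec in model_vectors
    )


def build_lookup_table(input_vectors, codebook):
    """table[j][c] is the distance between input mesh vector j and codebook entry c."""
    return [[l1_distance(vec, center) for center in codebook]
            for vec in input_vectors]


# Toy codebook with two centers instead of 256.
codebook = [[0] * 8, [10] * 8]
template = compress_template([[1] * 8, [9] * 8, [0] * 8], codebook)
table = build_lookup_table([[2] * 8], codebook)
```

In this toy case three 8-dimensional vectors shrink to three bytes, which mirrors the 8-to-1 reduction described above, and the table has one row per input mesh and one column per code number.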
Similarly, the distance is determined between each feature in the other meshes of the compressed model character template 805 and the corresponding features in a compressed input character template. After the distances of all 49 meshes are determined, the total distance (DIS) is calculated between the compressed input character template and the compressed model character template 805 according to the following equation:

$$\mathrm{DIS} = \mathrm{Dis}_{1,5} + \mathrm{Dis}_{2,10} + \cdots + \mathrm{Dis}_{49,15} \qquad \text{Eqn. 2}$$
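The table lookup behind Eqn. 2, followed by a sort that picks the closest templates as method 700 goes on to do, might look like the sketch below. The function names, the dictionary of labelled templates, and the tiny three-code table are illustrative assumptions rather than details from the specification.

```python
def total_distance(lookup_table, compressed_template):
    """Eqn. 2: add up lookup_table[j][code] over every mesh j of the template."""
    return sum(lookup_table[j][code]
               for j, code in enumerate(compressed_template))


def top_candidates(lookup_table, compressed_templates, n):
    """Return the n labels whose templates have the smallest total distance."""
    scored = sorted(
        (total_distance(lookup_table, template), label)
        for label, template in compressed_templates.items()
    )
    return [label for _, label in scored[:n]]


# Toy setup: 2 meshes per character and a 3-entry codebook.
lookup_table = [[0, 5, 9],
                [4, 0, 7]]
templates = {
    "A": bytes([0, 1]),  # 0 + 0 = 0
    "B": bytes([2, 2]),  # 9 + 7 = 16
    "C": bytes([1, 0]),  # 5 + 4 = 9
}
best_two = top_candidates(lookup_table, templates, 2)
```

Because every per-mesh distance is read from the table, matching one model template costs only one lookup and one addition per mesh, with no vector arithmetic at recognition time.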
Referring again to FIG. 7, at step 225 the system 100 calculates the distances between the compressed templates of model characters 135 and the uncompressed input character feature vectors 130. These distances are then sorted at step 720. Finally, at step 230 the system 100 provides the top n candidate characters 150.

The present invention is therefore an improved method of compressing handwritten character templates for use in an electronic character recognition process. The use of a codebook 125 enables multi-dimensional uncompressed model character feature vectors 110 to be compressed to a single codebook index that may preferably be only a single byte. Further, the invention may preferably use a simple lookup table technique to match input character feature vectors 130 with model character templates by comparing the distances between individual features in both the input characters and the model templates. When used in conjunction with a character recognition system, the present invention thus enables highly accurate character recognition while reducing both memory and processing overhead. The reduced memory and processing overhead makes the invention particularly suitable for use with portable electronic devices such as mobile phones and personal digital assistants (PDAs).

The detailed description provides a preferred exemplary embodiment only, and is not intended to limit the scope, applicability, or configuration of the present invention. Rather, the detailed description of the preferred exemplary embodiment provides those skilled in the art with an enabling description for implementing the preferred exemplary embodiment of the invention. It should be understood that various changes can be made in the function and arrangement of elements and steps without departing from the spirit and scope of the invention as set forth in the appended claims.

Claims

WE CLAIM:
1. A method of compressing handwritten character templates comprising the steps of: generating a codebook comprising vectors defining the centers of clusters of uncompressed model character feature vectors provided from model character templates; comparing said uncompressed model character feature vectors with said codebook; and providing compressed templates of model characters.
2. The method of claim 1, further comprising the steps of: calculating distances between said clusters of uncompressed model character feature vectors and uncompressed input character feature vectors; and providing top-n candidate characters by calculating the distances between said uncompressed input character feature vectors and said model character templates, where n is a constant.
3. The method of claim 1, wherein said step of providing uncompressed input character feature vectors comprises the step of extracting features from a normalized input character template.
4. The method of claim 1, wherein said codebook comprises no more than 256 clusters.
5. The method of claim 1, wherein said uncompressed input character feature vectors are 8-dimensional vectors.
6. The method of claim 1, wherein said step of providing top-n candidate characters by calculating the distances between said uncompressed input character feature vectors and said model character templates uses a sorting method to provide said candidate characters.
7. The method of claim 6, wherein said candidate characters are provided in a step of a character recognition method.
8. A system for compressing handwritten character templates comprising: a codebook generator module for generating a codebook, wherein said codebook comprises vectors defining the centers of clusters of uncompressed model character feature vectors provided from model character templates; and a template compression module operatively connected to said codebook generator module for comparing said uncompressed model character feature vectors with said codebook to provide compressed templates of model characters.
9. The system of claim 8, further comprising a template matching module operatively connected to said template compression module for providing candidate characters by comparing the distances between uncompressed input character feature vectors and said model character templates.
10. The system of claim 9 wherein said template matching module further comprises a distance lookup table generator.
11. The system of claim 8, wherein said codebook comprises no more than 256 clusters.
12. The system of claim 8, wherein said uncompressed input character feature vectors are 8-dimensional vectors.
13. The system of claim 9, wherein said template matching module uses a sorting method to provide said candidate characters.
14. The system of claim 13, wherein said candidate characters are provided in a character recognition system.
EP04784422A 2003-09-29 2004-09-17 Method and system for compressing handwritten character templates Withdrawn EP1668568A4 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CNB031255736A CN1303563C (en) 2003-09-29 2003-09-29 Method and system for compressing hand-written character template
PCT/US2004/030554 WO2005034026A2 (en) 2003-09-29 2004-09-17 Method and system for compressing handwritten character templates

Publications (2)

Publication Number Publication Date
EP1668568A2 (en) 2006-06-14
EP1668568A4 EP1668568A4 (en) 2012-02-08

Family

ID=34398358

Family Applications (1)

Application Number Title Priority Date Filing Date
EP04784422A Withdrawn EP1668568A4 (en) 2003-09-29 2004-09-17 Method and system for compressing handwritten character templates

Country Status (5)

Country Link
EP (1) EP1668568A4 (en)
KR (1) KR20060056408A (en)
CN (1) CN1303563C (en)
RU (1) RU2006114780A (en)
WO (1) WO2005034026A2 (en)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2003085638A1 (en) * 2002-03-27 2003-10-16 Nokia Corporation Pattern recognition
US8077994B2 (en) 2008-06-06 2011-12-13 Microsoft Corporation Compression of MQDF classifier using flexible sub-vector grouping

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5187751A (en) * 1990-07-24 1993-02-16 Sharp Kabushiki Kaisha Clustering system for optical character reader
US5812697A (en) * 1994-06-10 1998-09-22 Nippon Steel Corporation Method and apparatus for recognizing hand-written characters using a weighting dictionary
US6345119B1 (en) * 1996-02-19 2002-02-05 Fujitsu Limited Handwritten character recognition apparatus and method using a clustering algorithm

Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US6104833A (en) * 1996-01-09 2000-08-15 Fujitsu Limited Pattern recognizing apparatus and method
CN1181662C (en) * 2000-07-21 2004-12-22 中兴通讯股份有限公司 Hand written character input method for mobile telephone and mobile phone with said function
CN1140864C (en) * 2001-01-02 2004-03-03 无敌科技(西安)有限公司 Hand writing input method for hand held data processor

Non-Patent Citations (11)

* Cited by examiner, † Cited by third party
Title
"Proceedings IEEE Symposium on FPGAs for Custom Computing Machines", APPLICATIONS OF COMPUTER VISION, 1996. WACV '96., PROCEEDINGS 3RD IEEE WORKSHOP ON SARASOTA, FL, USA 2-4 DEC. 1996, LOS ALAMITOS, CA, USA,IEEE COMPUT. SOC, US, 2 December 1996 (1996-12-02), XP010206445, ISBN: 978-0-8186-7620-8 *
GRANDIDIER F: "UN NOUVEL ALGORITHME DE SELECTION DE CARACTERISTIQUES - APPLICATION A LA LECTURE AUTOMATIQUE DE L'ECRITURE MANUSCRITE", THÈSE PRÉSENTÉE À L'ÉCOLE DE TECHNOLOGIE SUPÉRIEURE COMMEEXIGENCE PARTIELLE À L'OBTENTION DU DOCTORAT EN GÉNIE PH.D, XX, XX, 24 January 2003 (2003-01-24), pages A/B,I-XVIII,01, XP001187010, *
KIMURA F ET AL: "HANDWRITTEN NUMERICAL RECOGNITION BASED ON MULTIPLE ALGORITHMS", PATTERN RECOGNITION, ELSEVIER, GB, vol. 24, no. 10, 1 January 1991 (1991-01-01), pages 969-983, XP000243559, ISSN: 0031-3203, DOI: 10.1016/0031-3203(91)90094-L *
KIMURA F ET AL: "Improvement of handwritten Japanese character recognition using weighted direction code histogram", PATTERN RECOGNITION, ELSEVIER, GB, vol. 30, no. 8, 1 August 1997 (1997-08-01) , pages 1329-1337, XP004074593, ISSN: 0031-3203, DOI: 10.1016/S0031-3203(96)00153-7 *
LIU C-L ET AL: "Handwritten digit recognition: benchmarking of state-of-the-art techniques", PATTERN RECOGNITION, ELSEVIER, GB, vol. 36, no. 10, 22 May 2003 (2003-05-22), pages 2271-2285, XP004439122, ISSN: 0031-3203, DOI: 10.1016/S0031-3203(03)00085-2 *
RONG ZHANG ET AL: "A large scale clustering scheme for kernel K-Means", PATTERN RECOGNITION, 2002. PROCEEDINGS. 16TH INTERNATIONAL CONFERENCE ON QUEBEC CITY, QUE., CANADA 11-15 AUG. 2002, LOS ALAMITOS, CA, USA,IEEE COMPUT. SOC, US, vol. 4, 11 August 2002 (2002-08-11), pages 289-292, XP010613524, DOI: 10.1109/ICPR.2002.1047453 ISBN: 978-0-7695-1695-0 *
S Tsuruoka ET AL: "Handwritten `Kanji' and `Hiragana' character recognition using weighted direction index histogram method (abstract)", , 1 January 1988 (1988-01-01), XP55015661, Retrieved from the Internet: URL:INSPEC [retrieved on 2012-01-03] *
See also references of WO2005034026A2 *
SINGHAL V ET AL: "Script-based classification of hand-written text documents in a multilingual environment", RESEARCH ISSUES IN DATA ENGINEERING: MULTI-LINGUAL INFORMATION MANAGEM ENT, 2003. RIDE-MLIM 2003. PROCEEDINGS. 13TH INTERNATIONAL WORKSHOP ON MARCH 10-11, 2003, PISCATAWAY, NJ, USA,IEEE, 10 March 2003 (2003-03-10), pages 47-54, XP010671689, DOI: 10.1109/RIDE.2003.1249845 ISBN: 978-0-7803-7868-1 *
YOSHINOBU HOTTA ET AL: "Handwritten Numeral Recognition Using Personal Handwriting Characteristics Based On Clustering Method", APPLICATIONS OF COMPUTER VISION, 1996. WACV '96., PROCEEDINGS 3RD IEEE WORKSHOP ON SARASOTA, FL, USA 2-4 DEC. 1996, LOS ALAMITOS, CA, USA,IEEE COMPUT. SOC, US, 2 December 1996 (1996-12-02), pages 284-289, XP032304723, DOI: 10.1109/ACV.1996.572078 ISBN: 978-0-8186-7620-8 *
YOSHIYUKI YAMASHITA ET AL: "Classification of handprinted Kanji characters by the structured segment matching method", PATTERN RECOGNITION LETTERS, vol. 1, no. 5-6, 1 July 1983 (1983-07-01), pages 475-479, XP55015660, ISSN: 0167-8655, DOI: 10.1016/0167-8655(83)90089-2 *

Also Published As

Publication number Publication date
CN1303563C (en) 2007-03-07
KR20060056408A (en) 2006-05-24
CN1604124A (en) 2005-04-06
WO2005034026A2 (en) 2005-04-14
EP1668568A4 (en) 2012-02-08
RU2006114780A (en) 2007-11-10
WO2005034026A3 (en) 2005-06-02

Legal Events

Date Code Title Description
PUAI Public reference made under article 153(3) epc to a published international application that has entered the european phase

Free format text: ORIGINAL CODE: 0009012

17P Request for examination filed

Effective date: 20060323

AK Designated contracting states

Kind code of ref document: A2

Designated state(s): DE FR GB IT

DAX Request for extension of the european patent (deleted)
RBV Designated contracting states (corrected)

Designated state(s): DE FR GB IT

RIN1 Information on inventor provided before grant (corrected)

Inventor name: GUO, FENG-JUN,1916 ROOM,

Inventor name: ZHEN, LI-XIN

Inventor name: HUANG, JIAN-CHENG

RAP1 Party data changed (applicant data changed or rights of an application transferred)

Owner name: MOTOROLA MOBILITY, INC.

A4 Supplementary search report drawn up and despatched

Effective date: 20120112

RIC1 Information provided on ipc code assigned before grant

Ipc: G06K 9/46 20060101ALI20120105BHEP

Ipc: G06K 9/00 20060101ALI20120105BHEP

Ipc: G06K 9/62 20060101AFI20120105BHEP

17Q First examination report despatched

Effective date: 20120515

RAP1 Party data changed (applicant data changed or rights of an application transferred)

Owner name: MOTOROLA MOBILITY LLC

STAA Information on the status of an ep patent application or granted ep patent

Free format text: STATUS: THE APPLICATION IS DEEMED TO BE WITHDRAWN

18D Application deemed to be withdrawn

Effective date: 20160401

P01 Opt-out of the competence of the unified patent court (upc) registered

Effective date: 20230520