WO1991018366A1 - A method of detecting skew in form images - Google Patents

Publication number
WO1991018366A1
Authority
WO
WIPO (PCT)
Prior art keywords
polygon
vector
polygons
determining
value
Prior art date
Application number
PCT/US1991/003102
Other languages
French (fr)
Inventor
Yongchun Lee
Original Assignee
Eastman Kodak Company
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Eastman Kodak Company
Publication of WO1991018366A1

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V30/00 Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
    • G06V30/10 Character recognition
    • G06V30/14 Image acquisition
    • G06V30/146 Aligning or centring of the image pick-up or image-field
    • G06V30/1475 Inclination or skew detection or correction of characters or of image to be recognised
    • G06V30/1478 Inclination or skew detection or correction of characters or of image to be recognised of characters or characters lines
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V30/00 Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
    • G06V30/10 Character recognition

Abstract

Document contour vectorization and the use of a modified Hough transform are used in combination to detect the skew angle of a digitized form image so that image skew may then be corrected in the preprocessing of form images prior to document analysis and classification.

Description

A METHOD OF DETECTING SKEW IN FORM IMAGES
Technical Field of the Invention
The present invention relates to image processing techniques in general, and more particularly, to the automatic detection of skew in form images.
Background of the Invention
Digital coding of graphic information is commonly called for in a wide variety of contexts, from facsimile data transmission to computerized photograph analysis and pattern recognition, to computer-aided-design applications. The first step in such digitizing is to scan the document in a controlled fashion, measuring the graphic value of the image at each point. Currently available scanning devices are capable of substantially simultaneously delivering a binary output signal for each of n lines of resolution cells, each cell being approximately 0.01 mm square. Thus a one meter long scan line of an engineering drawing, for example, would contain $10^5$ such resolution cells; a single square centimeter would contain $10^6$ resolution cells.
Where, as indicated above, the digitized information is in the form of raster output data from 0.01 mm resolution cells, a typical 80 character alphabetic line might then be coded as approximately 200 information signals for each 20 cm long scan line, a reduction of 99 percent compared to the $2 \times 10^4$ bits of raw raster output data. When it is considered that a sheet of A4 paper contains $6 \times 10^8$ such resolution cells, it can be seen that such a coding is still very cumbersome, requiring over a million information signals to code a single page of bi-tonal writing, scan line by scan line. This inefficiency is addressed in the prior art by a number of techniques which look for broader patterns by correlating the run length compressed data across a second dimension, typically by comparing contiguous adjacent scan line data and coding the difference.
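The cell counts quoted in this passage follow directly from the 0.01 mm cell size. The short check below is only an illustration of that arithmetic; the helper name `cells_along` and the A4 dimensions of 210 mm by 297 mm are assumptions introduced here, not part of the original text.

```python
# Illustrative arithmetic check (names are hypothetical, not from the patent).
CELL_UM = 10  # one 0.01 mm resolution cell, expressed in micrometres


def cells_along(length_mm):
    """Number of 0.01 mm cells along a length given in whole millimetres."""
    return (length_mm * 1000) // CELL_UM


cells_per_metre_line = cells_along(1000)        # one meter long scan line
cells_per_square_cm = cells_along(10) ** 2      # 1 cm x 1 cm area
bits_per_20cm_line = cells_along(200)           # raw raster bits in a 20 cm line
a4_cells = cells_along(210) * cells_along(297)  # assumed A4 size: 210 mm x 297 mm

# 200 coded signals instead of the raw bits of one 20 cm scan line
reduction = 1 - 200 / bits_per_20cm_line
```

Working in integer micrometres avoids floating-point rounding in the cell counts; the A4 figure comes out at roughly 6.2 x 10^8 cells, in line with the order of magnitude stated in the text.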
Electronic document deskewing is an essential preprocessing capability necessary to enable further document processing of a digitized paper-based form. The present invention provides a fast and accurate method for detecting skew angle of the form image.
Correcting a skewed image requires two processing steps: first, establishing the amount of skew, and second, deskewing the image by applying a skew correction. The skew of a document is usually defined by the orientation of the side boundaries of the document and/or the straight lines contained in the image. A form document is dominated by straight lines, which are the basic elements for constructing forms and tables, and the orientation of these straight lines indicates the skew of the form. Skew of the form image can be inferred when a majority of straight lines deviate from either the horizontal or the vertical direction.
To detect the orientation of straight boundaries in the past, a Hough transform was used. A direct application of this transform to a bitmap document for line detection has a major disadvantage: it requires extensive computation to make a histogram in parametric space (ρ,θ) due to the large number of pixels. This can be prohibitive for practical applications.
Disclosure of the Invention
The present invention combines document contour vectorization with the use of a modified Hough transform for the fast detection of a skew angle of a digitized form image. Skew correction is performed by a matrix multiplication.
Brief Description of the Drawings
Figures 1A-1C show a pixel neighborhood and the pixel search order;
Figure 2A illustrates graphically the linear approximation of a polygon which results in a number of short almost coincident vectors that are replaced by a single vector in their place;
Figure 2B illustrates how the deviation from a substitute vector is diminished;
Figure 3 is a flow diagram for the detection of skew angle;
Figure 4A is a graphical example showing three points on a form line in Cartesian space;
Figure 4B is a graphical example of sinusoidal curves in parametric space corresponding to three points on the line in Figure 4A;
Figures 5A-C illustrate graphically the use of histograms in peak detection; and
Figures 6A-D illustrate the transformation and skew correction for the document form.
Modes of Carrying Out the Invention
The present invention uses a polygon-based method which overcomes many of the limitations associated with the bitmap techniques mentioned earlier. Contour vectorization converts a digitized document into a collection of polygons. This conversion requires contour pixel tracing followed by piecewise linear approximation of the contour pixels.
The contour pixel tracing operation starts by scanning pixels in a horizontal direction. Whenever a transition from 0 to 1 is detected, it traces the contour in a rotational order until it returns to the starting point, which completes the contour. The rotational order in which the search is performed is illustrated in Figure 1. The pixel $p_1$ is a transition which is detected by scanning the document page from left to right. The three by three grid shown in Fig. 1C is centered on $p_1$, the first transition, and its cells are examined in the specified search order 1-8 of Fig. 1C until the next transition is located. In this instance, $p_2$ was located in cell number 3 of the three by three grid. The starting cell for each search is chosen using the rule of adding a value of four to the previous direction using modulo 8 arithmetic, and adding one to the result. $p_2$ in Fig. 1B is now the central pixel of the three by three grid, which is searched in the same order, thus locating the next transition in cell 3. Since the previous direction is 3, the search from the newly located pixel starts in direction (3 + 4) mod 8 + 1 = 8. The process is repeated until a closed contour is completed. After completion of a contour tracing, scanning resumes to find the next transition and then traces the next contour. This process is repeated until the last contour has been completed.
Piecewise Linear Approximation
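The vectors merged by the approximation come from the traced contour, so it may help to see the tracing rule described earlier in concrete form. The following is a minimal sketch, not the patented implementation: the 0 to 7 direction numbering (clockwise from east), the names `OFFSETS`, `find_start` and `trace_contour`, and the stopping test are all assumptions made for illustration, because the search order of Fig. 1C is not reproduced in this text.

```python
# Hypothetical direction numbering: 0..7, clockwise starting from east,
# in image coordinates where the row index grows downwards.
OFFSETS = [(0, 1), (1, 1), (1, 0), (1, -1), (0, -1), (-1, -1), (-1, 0), (-1, 1)]


def find_start(img):
    """Raster-scan for the first 0 -> 1 transition; returns (row, col) or None."""
    for r, row in enumerate(img):
        for c, value in enumerate(row):
            if value == 1:
                return (r, c)
    return None


def trace_contour(img, start):
    """Follow the outer contour of the object containing `start`.

    The neighbourhood search at each pixel restarts at
    (previous direction + 4) mod 8, plus one, as in the rule above.
    """
    rows, cols = len(img), len(img[0])

    def is_fg(r, c):
        return 0 <= r < rows and 0 <= c < cols and img[r][c] == 1

    contour = [start]
    cur, d = start, 0          # pretend we arrived at `start` moving east
    first_move = None
    while True:
        for k in range(8):
            nd = (d + 4 + 1 + k) % 8
            nr, nc = cur[0] + OFFSETS[nd][0], cur[1] + OFFSETS[nd][1]
            if is_fg(nr, nc):
                break
        else:
            return contour     # isolated pixel: no neighbour found
        # Closed when we are back at the start and about to repeat the first move.
        if cur == start and first_move is not None and nd == first_move:
            contour.pop()      # drop the duplicated start pixel
            return contour
        if first_move is None:
            first_move = nd
        cur, d = (nr, nc), nd
        contour.append(cur)
```

Stopping only when the start pixel is reached and the first move is about to repeat is a design choice of this sketch; it avoids ending early on one-pixel-wide strokes that pass through the start pixel twice.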
The piecewise linear approximation process converts a sequence of contour pixels into a sequence of vectors by merging colinear pixels. The sequence of vectors forming a closed boundary becomes a polygon, as shown in Figure 2A. The piecewise linear approximation process is modified somewhat by imposing an inner product constraint which allows for the detection of sharp corners during iteration. The contour pixels are scanned sequentially and the consecutive pixels which lie on the same straight line are merged into vectors. This forms a polygon which is composed of a sequence of short vectors, as shown in Figure 2A. The vertices of a polygon are denoted by $(v_1, v_2, v_3, \ldots, v_i, \ldots, v_n)$. The normalized inner product of any two consecutive vectors (e.g. $\overrightarrow{v_{i-1}v_i}$ and $\overrightarrow{v_i v_{i+1}}$) is calculated as
$$I_i = \frac{\overrightarrow{v_{i-1}v_i} \cdot \overrightarrow{v_i v_{i+1}}}{\lVert \overrightarrow{v_{i-1}v_i} \rVert \, \lVert \overrightarrow{v_i v_{i+1}} \rVert}$$
and
$$-1 \le I_i \le 1$$
Consider any three consecutive points $v_{i-1}$, $v_i$, $v_{i+1}$: the segments $v_{i-1}v_i$, $v_i v_{i+1}$ and $v_{i-1}v_{i+1}$ form the sides of a triangle $\Delta v_{i-1} v_i v_{i+1}$. The line segment $v_{i-1}v_{i+1}$ is the base of $\Delta v_{i-1} v_i v_{i+1}$. The height of $\Delta v_{i-1} v_i v_{i+1}$ serves as the deviation for approximating the series of $v_{i-1}v_i$ and $v_i v_{i+1}$ by $v_{i-1}v_{i+1}$. If the deviation is smaller than a predetermined threshold (ε) and $I_i$ is greater than a predetermined negative value, the approximator described above is applied. Otherwise, the point $v_i$ is kept and the next two consecutive segments are exposed for linear approximation. In Fig. 2B the vectors $v_{i-1}v_i$ and $v_i v_{i+1}$ are shown; if the value of d, which is the deviation from a replacement vector $v_{i-1}v_{i+1}$, is below a given value, the replacement will be made. However, in the event d is above a predetermined value, the original vectors will be preserved. The value of d is given by:
$$d = \frac{\left| x_i (y_{i+1} - y_{i-1}) - y_i (x_{i+1} - x_{i-1}) + (y_{i-1} x_{i+1} - x_{i-1} y_{i+1}) \right|}{\sqrt{(x_{i+1} - x_{i-1})^2 + (y_{i+1} - y_{i-1})^2}}$$
Accordingly, the sharp corner preservation is accomplished by evaluating the normalized inner product of each pair of consecutive vectors during iteration and skipping over the segment merging operation when the value is smaller than a negative threshold. The negative threshold value is selected because the inner product of edge segments at a sharp corner must be a negative value. Thus, the smaller the normalized value, the sharper the corner it indicates. For this particular application, the threshold is set at (-0.5) in radians. By incorporating this constraint in a piecewise linear approximation, it has been found that the process preserves sharp turning acute corners while smoothing out noisy short segments. It should be noted that this capability is particularly critical when linear approximation is applied to line-like objects. It should also be noted that during operation of the piecewise linear approximation algorithm, the smoothing threshold is started at one and incremented up to the predetermined value as the iteration proceeds, in order to minimize the distortion introduced by linear approximation. When completed, the contour vectorization process converts a bitmap image into a collection of simple polygons. The polygon representation allows the extraction of the straight line orientation to be conducted in a vector domain requiring less data.
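To make the merging rule concrete, the following is a minimal sketch under stated assumptions rather than the method as claimed. The function names, the closed-polygon vertex list, and the default `eps_max` of 2.0 are hypothetical; the value -0.5 is compared directly against the normalized inner product, and the smoothing threshold grows from one in steps of one, as one reading of the text.

```python
import math


def deviation(p_prev, p, p_next):
    """Height of the triangle (p_prev, p, p_next) over the base p_prev -> p_next."""
    (x0, y0), (x1, y1), (x2, y2) = p_prev, p, p_next
    base = math.hypot(x2 - x0, y2 - y0)
    if base == 0:
        return math.hypot(x1 - x0, y1 - y0)
    return abs(x1 * (y2 - y0) - y1 * (x2 - x0) + (y0 * x2 - x0 * y2)) / base


def normalized_inner_product(p_prev, p, p_next):
    """Cosine of the turn between vectors p_prev -> p and p -> p_next."""
    ax, ay = p[0] - p_prev[0], p[1] - p_prev[1]
    bx, by = p_next[0] - p[0], p_next[1] - p[1]
    na, nb = math.hypot(ax, ay), math.hypot(bx, by)
    if na == 0 or nb == 0:
        return 1.0
    return (ax * bx + ay * by) / (na * nb)


def approximate_polygon(vertices, eps_max=2.0, corner_threshold=-0.5):
    """Drop vertices whose removal deviates less than eps and is not a sharp corner."""
    pts = list(vertices)
    eps = 1.0                                   # smoothing threshold starts at one
    while True:
        changed = True
        while changed and len(pts) > 3:
            changed = False
            i = 0
            while i < len(pts) and len(pts) > 3:
                prev, cur, nxt = pts[i - 1], pts[i], pts[(i + 1) % len(pts)]
                if (deviation(prev, cur, nxt) < eps
                        and normalized_inner_product(prev, cur, nxt) > corner_threshold):
                    del pts[i]                  # merge the two segments into one
                    changed = True
                else:
                    i += 1                      # keep the vertex, move on
        if eps >= eps_max:
            return pts
        eps = min(eps + 1.0, eps_max)           # increment towards the final value
```

For example, a rectangle given with extra mid-edge vertices collapses to its four corners, while a short hook such as (0, 0), (10, 0), (9.5, 0.5) keeps its tip because its normalized inner product is below -0.5 even though the deviation is small.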
After the application of contour vectorization, a form image will in general produce a number of contour polygons of widely varying sizes. A collection of closed polygons is obtained which represent object contour components (either inner or outer). The larger polygons represent the larger graphic outlines (contours) of the image. The graphic outlines can be boundaries of frames or tables in the image. By applying a size filter to the collected polygons, the larger polygons which represent graphic boundaries are extracted for use in line angle detection. In a form image, these large graphical contour components are composed of straight boundary lines. Therefore, the skew angle detection of a form turns into the detection of the orientation of straight lines. The polygon vectors associated with these graphic boundaries are input to a modified Hough transform for detection of straight lines. The Hough transform is a technique for mapping image points into a parametric domain where image structure can be easily recognized, and it is commonly used for straight line detection. When this line to point transformation is applied to image points, it can be used to detect image points that lie along a given straight line. The modified version developed for this application uses the center coordinates (x,y) of each vector in the extracted large graphical contour components. Furthermore, the angle and length of each vector are computed from the coordinates of the vector end points as prior information. Because the center coordinates (x,y) of the vectors are transformed together with the vector angles into the parametric (ρ,θ) space, the speed of building the histogram is significantly improved over application of the standard Hough transform to a bitmap document.
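The size filter that selects the large graphical contours can be sketched as below. This is an illustration only: the text does not say how width and height are compared with the threshold, so the rule used here (the larger of the two must reach `size_threshold`) and the function names are assumptions.

```python
def bounding_size(polygon):
    """Width and height of the axis-aligned rectangle enclosing the polygon."""
    xs = [x for x, _ in polygon]
    ys = [y for _, y in polygon]
    return max(xs) - min(xs), max(ys) - min(ys)


def split_by_size(polygons, size_threshold):
    """Return (small, large) polygon lists.

    Assumed rule: a polygon is 'large' when its longer bounding side
    is at least size_threshold.
    """
    small, large = [], []
    for polygon in polygons:
        width, height = bounding_size(polygon)
        if max(width, height) >= size_threshold:
            large.append(polygon)
        else:
            small.append(polygon)
    return small, large
```

Only the `large` list would then be passed on to line detection; character-sized contours end up in `small` and are ignored for skew estimation.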
The Hough transform for a point (x,y) in Cartesian space is given by:
$$\rho = x \cos\theta + y \sin\theta$$
where ρ is the perpendicular distance of a line from the origin, as shown in Figure 4A, and θ is the inclination in degrees of that line. Thus, any line in image space is described by a point in the parametric space (ρ,θ). Similarly, a point in Cartesian space (x,y) corresponds to a curve in parametric space. The parametric domain curves that correspond to colinear points in the image space intersect at a common (ρ,θ) point. A set of N image points that can be connected by a straight line will produce a count of magnitude N in the Hough transform domain at the position (ρ,θ), where (ρ,θ) describes the connecting line.
In practice, the Hough transform of a point $(x_i, y_i)$ is performed by computing ρ from the above equation for each of the n sampled values of θ, with ρ quantized into m intervals of width Δρ. In this way, a quantized sinusoidal curve is obtained, and each cell along the quantized curve is incremented by an equal amount. This procedure is repeated for all points. Colinear points in the image show up as peaks in the parametric (ρ,θ) space.
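For reference, the standard point-to-curve accumulation just described might look like the following sketch. The function name, the signed ρ axis running from -`rho_max` to +`rho_max`, and the restriction of θ to 0° to 180° are assumptions chosen for the illustration.

```python
import math


def hough_accumulate(points, rho_max, d_rho=1.0, d_theta_deg=1.0):
    """Standard Hough transform: every point votes once per sampled theta.

    Returns acc[k][j], where k indexes theta = k * d_theta_deg (0..180 degrees)
    and j indexes rho = j * d_rho - rho_max.
    """
    n_theta = int(round(180.0 / d_theta_deg))
    n_rho = int(math.ceil(2 * rho_max / d_rho)) + 1
    acc = [[0] * n_rho for _ in range(n_theta)]
    for x, y in points:
        for k in range(n_theta):
            theta = math.radians(k * d_theta_deg)
            rho = x * math.cos(theta) + y * math.sin(theta)
            j = int(round((rho + rho_max) / d_rho))
            if 0 <= j < n_rho:
                acc[k][j] += 1      # each cell on the curve gets an equal increment
    return acc
```

Three points on the horizontal line y = 2 all vote for the cell at θ = 90°, ρ = 2, so that cell reaches a count of three. Note the inner loop: the cost is proportional to the number of points times the number of θ samples, which is the expense the modified transform is meant to avoid.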
The modified Hough transform used to extract straight lines takes the center points $(x_i, y_i)$ of the vectors in the extracted graphic contour polygons as the points to be transformed. This detects sets of vectors that lie along a straight line, and it is the orientation of these detected straight lines that is used to define a form document's skew angle.
Modified Hough Transform
1. Read a previously extracted large polygon.
2. Calculate the maximum ρ, which is defined by the equation $\rho_{max} = \sqrt{(W/2)^2 + (H/2)^2}$, where W = width of the encompassing rectangle and H = height of the encompassing rectangle. Next, set the origin of Cartesian space at the center point of the encompassing rectangle of the polygon.
3. Quantize the value ρ into m intervals of width Δρ, and sample the value of θ every Δθ in the range of 0° to 360°.
4. Calculate the center point coordinates $(x_i, y_i)$, the vector length and the angle $(\theta_i)$ of the vector.
5. Using the center coordinates $(x_i, y_i)$ and the angle $(\theta_i)$ of the vector, $\rho_i$ can be computed according to the Hough transform equation. Note that this is a point-to-point mapping rather than the point-to-curve mapping described previously.
6. Add the value of the vector length at the coordinates $(\rho_i, \theta_i)$ in the histogram.
7. Repeat steps (4) - (6) until the last vector of the polygon has been processed.
8. Perform peak detection in the transformed domain, as described under Detection of Skew Angle below; the peak angle $\theta_p$ is defined as the skew angle of the polygon.
9. Repeat steps (1) - (8) until the last polygon of the large group has been processed.
10. The document form skew angle is the average of the skew angles collected from all of the large polygons.
There are three advantages in using the algorithm set forth above:
(I) There is a tremendous reduction in data points resulting from the vectorization process. This results in a substantial reduction in the computation loop in computing the Hough transform.
(II) With the use of vector angular information, the transformation of data points from Cartesian space (x,y) to parametric space (ρ,θ) is a one-to-one mapping instead of the one-to-multiple mapping of the standard equation. This greatly reduces the number of processing steps in computing the accumulator array.
(III) Use of a weighted accumulator can greatly enhance the peaks which correspond to more reliable long vectors and de-emphasize the noisy short vectors. This significantly improves the detectability of the peaks in parametric space (ρ,θ).
Detection of Skew Angle
If a form document is skewed in a particular orientation, the value of θ is determined when all possible values of ρ are scanned and yield clusters of high peaks in the accumulation array (histogram, Fig. 5A). The values of θ associated with the clusters of peaks indicate the potential orientation of dominant contour lines, which is defined as the deskew angle of the form.
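As a rough sketch of how the accumulation array for one polygon might be built by the modified Hough transform (steps 1 to 7 of the procedure), consider the code below. Several points are assumptions, since the text leaves them open: θ is taken as the direction of the vector plus 90°, so that horizontal edges fall near 90° and 270°; ρ is signed and quantized over -ρmax to +ρmax; and the scan over ρ is reduced to taking the maximum over ρ for each θ. The names `modified_hough` and `theta_profile` are hypothetical.

```python
import math


def modified_hough(polygon, d_rho=1.0, d_theta_deg=1.0):
    """Length-weighted accumulator: one vote per polygon vector (point-to-point)."""
    xs = [p[0] for p in polygon]
    ys = [p[1] for p in polygon]
    cx, cy = (min(xs) + max(xs)) / 2, (min(ys) + max(ys)) / 2   # rectangle center
    rho_max = math.hypot((max(xs) - min(xs)) / 2, (max(ys) - min(ys)) / 2)
    n_theta = int(round(360.0 / d_theta_deg))
    n_rho = int(math.ceil(2 * rho_max / d_rho)) + 1
    acc = [[0.0] * n_rho for _ in range(n_theta)]
    for i in range(len(polygon)):
        (x0, y0), (x1, y1) = polygon[i], polygon[(i + 1) % len(polygon)]
        length = math.hypot(x1 - x0, y1 - y0)
        if length == 0:
            continue
        # Center of the vector, relative to the rectangle center (the origin).
        mx, my = (x0 + x1) / 2 - cx, (y0 + y1) / 2 - cy
        # ASSUMPTION: theta is the vector direction plus 90 degrees (its normal).
        theta = (math.degrees(math.atan2(y1 - y0, x1 - x0)) + 90.0) % 360.0
        t = math.radians(theta)
        rho = mx * math.cos(t) + my * math.sin(t)
        k = int(round(theta / d_theta_deg)) % n_theta
        j = min(n_rho - 1, max(0, int(round((rho + rho_max) / d_rho))))
        acc[k][j] += length                     # weight the vote by vector length
    return acc, rho_max


def theta_profile(acc):
    """Collapse the accumulator over rho: strongest cell for each theta bin."""
    return [max(row) for row in acc]
```

For an unskewed 10 by 4 rectangle the profile has its largest values at 90° and 270° (the two long edges, traversed in opposite directions) and smaller values at 0° and 180°; rotating the rectangle shifts all four peaks by the rotation angle.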
The detection of valid peaks is described by an example as shown in Figures 5A-C. The example assumes that three large graphical polygons are extracted. The modified Hough transform yields three accumulator arrays corresponding to the three polygons 1, 2, and 3, respectively. By scanning all possible values of ρ, the majority of peaks are found at $\rho_1$, $\rho_2$ and $\rho_3$, corresponding to the polygons 1, 2, and 3, respectively, as shown in Figures 5A-C. The values of θ associated with the cluster of peaks indicate the orientations of major contour lines. To validate the true peak which reflects the skew angle of a form, the peak is required to meet two criteria: first, the peak value must exceed a global threshold; second, when 180° is added to the θ value associated with the peak, a similar cluster of local peaks should be found. The first requirement ignores short segments and keeps longer line segments for peak detection; the longer the segments, the more reliable the data will be. The second requirement avoids the false line detection which may result from slanted lines. A pair of anti-parallel vectors confirms that they are contour lines of a skewed rectangular-like box. Imposing both of these restrictions on the peaks shown in the example, only the peaks in polygon 1 meet both criteria. Polygons 2 and 3 fail because their peak values are smaller than the threshold or because the corresponding pair is not found. In order to obtain a better estimate of the actual skew angle, the first few highest qualified peaks are collected and the mean of the θ angles of the collected peaks is taken as the skew angle of the form. Note that the working range for θ in peak detection is confined to 60° ≤ θ ≤ 120° and 240° ≤ θ ≤ 300°. The working range of θ depends on the expected maximum skew angle of a form to be detected. The range of θ defined above assumes a maximum of 30° for the skew angle of the form to be detected.
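The two validation criteria and the working range can be illustrated with a small sketch. It operates on a hypothetical per-θ profile (one value per equal-width bin covering 0° to 360°), and it makes choices the text does not spell out: the counterpart test simply requires the bin at θ + 180° to exceed the same threshold, peaks are ranked by the combined height of the pair, and the result is reported as the deviation of the mean peak angle from 90°.

```python
def estimate_skew(profile, threshold, max_skew_deg=30.0, top_n=3):
    """Return the skew estimate in degrees, or None if no peak qualifies.

    profile: per-theta peak heights, len(profile) even, bins spanning 0..360 degrees.
    Only theta in [90 - max_skew, 90 + max_skew] is scanned; the partner bin at
    theta + 180 degrees covers the second working range (240..300 by default).
    """
    n = len(profile)
    step = 360.0 / n
    qualified = []
    for k, value in enumerate(profile):
        theta = k * step
        if not (90.0 - max_skew_deg <= theta <= 90.0 + max_skew_deg):
            continue
        partner = profile[(k + n // 2) % n]
        # Criterion 1: above the global threshold.
        # Criterion 2 (assumed form): an anti-parallel peak exists at theta + 180.
        if value > threshold and partner > threshold:
            qualified.append((value + partner, theta))
    if not qualified:
        return None
    qualified.sort(reverse=True)                # highest qualified peaks first
    best = qualified[:top_n]
    mean_theta = sum(theta for _, theta in best) / len(best)
    return mean_theta - 90.0                    # ASSUMED convention: offset from 90
```

A strong peak with no partner at θ + 180°, such as one produced by a single slanted stroke, is rejected even if it is the highest value in the profile, which mirrors the second criterion above.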
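Document Skew Correction

After the skew angle has been determined, the document skew correction on a vectorized document uses a geometrical transformation that both translates and rotates the vertices of each polygon by the matrix operation shown in the following equation:

\[
\begin{bmatrix} X_o \\ Y_o \\ 1 \end{bmatrix}
=
\begin{bmatrix}
\cos\Theta_s & -\sin\Theta_s & t_1 \\
\sin\Theta_s & \cos\Theta_s & t_2 \\
0 & 0 & 1
\end{bmatrix}
\begin{bmatrix} X_d \\ Y_d \\ 1 \end{bmatrix}
\]

wherein $t_1 = X_c(1-\cos\Theta_s) + Y_c\sin\Theta_s$ and $t_2 = Y_c(1-\cos\Theta_s) - X_c\sin\Theta_s$.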
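A short Python sketch of this matrix operation is given below. It assumes the standard counter-clockwise rotation matrix, which is the form consistent with the translation terms t1 and t2 quoted above, and it treats the angle passed in as whatever signed angle undoes the skew. The function name `deskew_vertices` and the tuple-based vertex layout are hypothetical.

```python
import math


def deskew_vertices(vertices, theta_deg, center):
    """Rotate polygon vertices about the document center.

    vertices:  list of (x, y) tuples before the transformation.
    theta_deg: rotation angle in degrees (sign chosen by the caller).
    center:    (xc, yc) coordinates of the document center point.
    """
    xc, yc = center
    c = math.cos(math.radians(theta_deg))
    s = math.sin(math.radians(theta_deg))
    # Translation terms that fold the translate-rotate-translate
    # sequence into a single homogeneous matrix.
    t1 = xc * (1.0 - c) + yc * s
    t2 = yc * (1.0 - c) - xc * s
    return [(c * x - s * y + t1, s * x + c * y + t2) for x, y in vertices]
```

Because t1 and t2 absorb both translations, the center point maps to itself and every other vertex moves along a circle around it; applying the function to each polygon's vertex list corrects the whole vectorized document without touching the bitmap.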
The computational sequence is illustrated in Figures 6A-D, in which the vertices of each polygon are rotated about the center point P of the document. The transformation process comprises a three-step sequence.

The first step translates the vertices of each polygon so that the center point P is at the origin, as shown in Figure 6B. The second step rotates the vertices of the polygons by the angle Θs; the result of this rotation is shown in Figure 6C. Figure 6D shows the result of the third step, which translates the vertices so that the point at the origin returns to the center of the document. (Xd, Yd) and (Xo, Yo) are the coordinates of a polygon vertex before and after the transformation, respectively. The determined skew angle is Θs, and (Xc, Yc) are the coordinates of the center point P.

Flow Chart
A paper document is scanned and digitized in step 10 to convert the document into a digital image. In step 12, a thresholding operation is applied to each pixel of the digitized image. This produces a binary (bitmap) image. In step 14, an object contour following operation is used to extract the edge pixels of objects (i.e. the outlines of objects) in the bitmap image. A linear approximation operation is applied next, in step 16, to merge collinear contour pixels into straight segments. This results in a collection of polygons, each of which represents either an inner or an outer contour of an object. In step 18, polygon bounding is performed, which calculates the bounding size (width and height) of each polygon by subtracting the extreme coordinates of that polygon in both the horizontal and vertical directions. In step 20, a predetermined size threshold value is applied to the collection of polygons. This results in two sets of polygons: those collected in step 22 are small polygons, and those collected in step 24 are associated with large graphic contours. Next, a modified Hough transform is applied to the polygon vectors in step 26. This transform maps the center coordinates of the polygon vectors into a two-dimensional accumulator array in a parametric domain, in which straight lines are easier to detect. In step 28, the accumulator array is scanned to locate the highest peaks in the parametric array. In step 30, the final step, the angular value of the highest peaks is read, and that angular value is defined as the skew angle of the form document.

Advantages and Industrial Applicability
The present invention is useful in computer based systems that provide automated analysis and interpretation of paper-based documents. The present invention uses the geometrical spatial relationship of the contours and a modified Hough transform for the fast detection of any skew angle of a digitized form image and the skew correction is then performed. The present invention has advantages over previously applied bitmap techniques in accuracy, robustness, efficiency of data structure, storage and document analysis. Accordingly, the present invention is more appropriate in determining the skew angle of forms in the preprocessing steps of document analysis and classification.

Claims

WHAT IS CLAIMED IS:
1. A method for reliably determining the skew angle of a digitized form document image characterized by the steps of:
thresholding the digitized image to produce a binary image;
use contour vectorization to convert the binary image into a collection of closed polygons formed by a series of vectors;
calculate the width and height of an encompassing rectangle for each polygon;
establish a threshold using either the height or width of the rectangle to separate all the polygons into two categories, with a first category containing small polygons having either heights or widths smaller than a predetermined value and the second category of larger polygons wherein either the height or width is greater than a predetermined second value;
transforming the vector data for each polygon in the second category of polygons into the parametric domain employing a modified Hough transform resulting in a histogram of (p, Θ) for each polygon;
detecting peaks in the histogram in the parametric domain using a predetermined third threshold;
determining the angular value Θ of the peak values p above said third threshold;
determining if any angular values Θ have additional corresponding peaks at Θ + 180° and if so determined, average all values of Θ so determined and the average value of Θ will be defined as the skew angle of the form document.
2. The method of determining skew angle as set forth in Claim 1 wherein said contour vectorization is comprised of the steps of contour pixel tracing and piecewise linear approximation.
3. The method of determining skew angle as set forth in Claim 2 wherein the center coordinates of the rectangle that encompasses the polygon are used as the origin of the Cartesian coordinates.
4. The method of determining skew angle as set forth in Claim 3 wherein said modified Hough transform includes the steps of:
a) determining the center coordinates of each vector in a given polygon from the second category of large polygons;
b) calculating the length and angle (Θ) of each vector;
c) using the center coordinates and the angle of each vector in each polygon to determine the value of p in parametric space, using p = x cos Θ + y sin Θ and plot the histogram of (p, Θ) using vector length for each vector in said polygon;
d) include on said plot of the histogram all vector lengths for all vectors in a polygon in a cumulative fashion;
e) detecting the peaks that exist at Θ in the parametric domain and determine if corresponding peaks exist at Θ + 180°; and
f) repeat the steps a-e until all the large size polygons in the second category have been processed.
5. The method of determining skew angle as set forth in Claim 4 wherein p is the length of the line from the origin normal to the vector in the Cartesian coordinate system.
6. A polygon based method for determining the skew angle of a digitized form document image comprising the steps of:
a) thresholding the digitized image to produce a binary image;
b) use contour vectorization to convert the binary image into a collection of closed polygons formed by a series of vectors;
c) calculate the width and height for an encompassing rectangle that encompasses the polygon;
d) establish a threshold to separate all the polygons into two categories with the first category including smaller polygons that have either the height or width of an encompassing rectangle less than a predetermined value and the second category containing larger polygons having either the height or width of an encompassing rectangle larger than a predetermined value;
e) determining the center coordinate of each vector in each polygon from the second category of larger polygons;
f) determine the length and angle of each vector;
g) using said center coordinates and said angle of each vector in each polygon to determine the value of p in parametric space using p = X cos Θ + Y sin Θ and plot the histogram of (p, Θ) using the vector length for each vector in said polygon;
h) include on the plot of the histogram all vector lengths for all vectors in each polygon in a cumulative fashion;
i) detect the peaks that exist in the parametric domain and determine a corresponding value for Θ;
j) repeat the steps e-i until all the larger size polygons having heights or widths of an encompassing rectangle greater than said predetermined value have been processed;
k) determine the angular values of the peaks above a predetermined threshold; and
l) determine if any of the specifically determined angular values Θ have corresponding peaks at Θ + 180° and if so determined define Θ as the skew angle of the form document.
PCT/US1991/003102 1990-05-21 1991-05-09 A method of detecting skew in form images WO1991018366A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
US07/526,426 US5054098A (en) 1990-05-21 1990-05-21 Method of detecting the skew angle of a printed business form
US526,426 1990-05-21

Publications (1)

Publication Number Publication Date
WO1991018366A1 (en) 1991-11-28

Family

Family ID: 24097289

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/US1991/003102 WO1991018366A1 (en) 1990-05-21 1991-05-09 A method of detecting skew in form images

Country Status (4)

Country Link
US (1) US5054098A (en)
EP (1) EP0482188A1 (en)
JP (1) JPH05500285A (en)
WO (1) WO1991018366A1 (en)

Also Published As

Publication number Publication date
US5054098A (en) 1991-10-01
EP0482188A1 (en) 1992-04-29
JPH05500285A (en) 1993-01-21

Similar Documents

Publication Publication Date Title
US5054098A (en) Method of detecting the skew angle of a printed business form
US6873732B2 (en) Method and apparatus for resolving perspective distortion in a document image and for calculating line sums in images
Janssen et al. Adaptive vectorization of line drawing images
Liang et al. Flattening curved documents in images
Gatos et al. Segmentation based recovery of arbitrarily warped document images
Cao et al. Rectifying the bound document image captured by the camera: A model based approach
Kumar et al. Modified approach of hough transform for skew detection and correction in documented images
JPH11504738A (en) Rotate run-length coded images
JP4859061B2 (en) Image correction method, correction program, and image distortion correction apparatus
Chin et al. Skew detection in handwritten scripts
Sarfraz et al. Skew estimation and correction of text using bounding box
Kapoor et al. Skew angle detectionof a cursive handwritten Devanagari script character image.
CN116343215A (en) Inclination correction method and system for document image
Safari et al. Document registration using projective geometry
Salarna et al. Moment invariants and quantization effects
Makkar et al. A brief tour to various skew detection and correction techniques
Schneider et al. Robust document warping with interpolated vector fields
Salagar et al. Application of RLSA for skew detection and correction in Kannada text images
EP0485051A2 (en) Detecting skew in digitised images
Zhang et al. Research on deskew algorithm of scanned image
Amin et al. Fast algorithm for skew detection
JPH06203202A (en) Image processor
Rodríguez-Piñeiro et al. A new method for perspective correction of document images
EP0702320A1 (en) Skew detection
Lai et al. Effective edge-corner detection method for defected images

Legal Events

Code Title Description
AK Designated states; Kind code of ref document: A1; Designated state(s): JP
AL Designated countries for regional patents; Kind code of ref document: A1; Designated state(s): AT BE CH DE DK ES FR GB GR IT LU NL SE
WWE Wipo information: entry into national phase; Ref document number: 1991911202; Country of ref document: EP
WWP Wipo information: published in national office; Ref document number: 1991911202; Country of ref document: EP
WWW Wipo information: withdrawn in national office; Ref document number: 1991911202; Country of ref document: EP