US 20040042667A1

(19) United States

(12) Patent Application Publication (10) Pub. No.: US 2004/0042667 A1

Lee et al. (43) Pub. Date: Mar. 4, 2004

(54) EXTRACTING INFORMATION FROM SYMBOLICALLY COMPRESSED DOCUMENT IMAGES

(76) Inventors: Dar-Shyang Lee, Union City, CA (US);

Jonathan J. Hull, San Carlos, CA (US)

Correspondence Address:

BLAKELY, SOKOLOFF, TAYLOR & ZAFMAN
LLP

Seventh Floor

12400 Wilshire Boulevard

Los Angeles, CA 90025 (US)

(21) Appl. No.: 10/676,881

(22) Filed: Sep. 30, 2003

Related U.S. Application Data

(62) Division of application No. 09/289,772, filed on Apr. 8, 1999, now Pat. No. 6,658,151.

Publication Classification

(51) Int. Cl.7 G06K 9/68

(52) U.S. Cl. 382/230; 382/218; 382/243

(57) ABSTRACT

A method and apparatus for extracting information from symbolically compressed document images. A deciphering module generates first and second text strings by deciphering respective sequences of template identifiers in first and second symbolically compressed document images. A conditional n-gram module receives the first and second text strings from the deciphering module and extracts n-gram terms therefrom based on a predicate condition. A comparison module generates a measure of similarity between the first and second symbolically compressed document images based on the n-gram terms extracted by the conditional n-gram module.
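
The last two stages named in the abstract (conditional n-gram extraction and comparison) can be illustrated with a minimal sketch. It assumes the text strings have already been deciphered from the template identifiers. The predicate shown (keep only characters that begin a word), the cosine measure, and all function names are illustrative assumptions, not details stated in the abstract.

```python
from collections import Counter
from math import sqrt
from typing import Callable


def follows_space(text: str, i: int) -> bool:
    # Hypothetical predicate: the character is not a space and either
    # starts the string or comes immediately after a space.
    return text[i] != " " and (i == 0 or text[i - 1] == " ")


def conditional_ngrams(
    text: str, n: int, predicate: Callable[[str, int], bool]
) -> Counter:
    # Keep only the characters whose position satisfies the predicate,
    # then count n-grams over the filtered character sequence.
    kept = [ch for i, ch in enumerate(text) if predicate(text, i)]
    return Counter("".join(kept[i:i + n]) for i in range(len(kept) - n + 1))


def cosine_similarity(a: Counter, b: Counter) -> float:
    # Assumed similarity measure: cosine of the two n-gram count vectors.
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def document_similarity(first: str, second: str, n: int = 2) -> float:
    # Compare two deciphered text strings by their conditional n-gram terms.
    return cosine_similarity(
        conditional_ngrams(first, n, follows_space),
        conditional_ngrams(second, n, follows_space),
    )
```

Calling `document_similarity` on two deciphered strings returns a value between 0 and 1. A different predicate or similarity measure can be substituted without changing the overall structure.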
