CA2077274A1 - Method and apparatus for summarizing a document without document image decoding - Google Patents

Method and apparatus for summarizing a document without document image decoding

Info

Publication number
CA2077274A1
CA2077274A1 CA2077274A CA2077274A CA2077274A1 CA 2077274 A1 CA2077274 A1 CA 2077274A1 CA 2077274 A CA2077274 A CA 2077274A CA 2077274 A CA2077274 A CA 2077274A CA 2077274 A1 CA2077274 A1 CA 2077274A1
Authority
CA
Canada
Prior art keywords
document
document image
summarizing
image decoding
image
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CA2077274A
Other languages
French (fr)
Other versions
CA2077274C (en)
Inventor
M. Margaret Withgott
Steven C. Bagley
Dan S. Bloomberg
Per-Kristian Halvorsen
Daniel P. Huttenlocher
Todd A. Cass
Ronald M. Kaplan
Ramana B. Rao
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Xerox Corp
Original Assignee
Xerox Corp
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Xerox Corp
Publication of CA2077274A1
Application granted
Publication of CA2077274C
Anticipated expiration
Expired - Fee Related

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V30/00 Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
    • G06V30/10 Character recognition
    • G06V30/26 Techniques for post-processing, e.g. correcting the recognition result
    • G06V30/262 Techniques for post-processing, e.g. correcting the recognition result using context analysis, e.g. lexical, syntactic or semantic context
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V30/00 Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
    • G06V30/10 Character recognition

Abstract

A method and apparatus for excerpting and summarizing an undecoded document image, without first converting the document image to optical character codes such as ASCII text, identifies significant words, phrases and graphics in the document image using automatic or interactive morphological image recognition techniques. Document summaries or indices are then produced from the identified significant portions of the document image. The disclosed method is particularly well suited to improving reading machines for the blind.
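
The general idea in the abstract, finding word-sized units with morphological image operations and comparing them by shape rather than by decoded characters, can be illustrated with a minimal sketch. This is not the patented method: the function names (`to_binary`, `dilate_horizontal`, `connected_boxes`, `word_shape_counts`), the `gap_radius` parameter, and the use of bounding-box size as a shape signature are all assumptions made for illustration. The sketch dilates a binary image horizontally so that the characters of a word merge into one blob, finds the bounding box of each blob, and counts how often each box size recurs.

```python
from collections import Counter


def to_binary(rows):
    """Convert rows of '#'/'.' characters into a 0/1 pixel grid (toy input format)."""
    return [[1 if ch == "#" else 0 for ch in row] for row in rows]


def dilate_horizontal(image, radius):
    """Binary dilation with a horizontal structuring element of width 2*radius + 1."""
    height, width = len(image), len(image[0])
    out = [[0] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            lo, hi = max(0, x - radius), min(width, x + radius + 1)
            if any(image[y][lo:hi]):
                out[y][x] = 1
    return out


def connected_boxes(image):
    """Return (left, top, right, bottom) boxes of 4-connected foreground blobs."""
    height, width = len(image), len(image[0])
    seen = [[False] * width for _ in range(height)]
    boxes = []
    for y0 in range(height):
        for x0 in range(width):
            if not image[y0][x0] or seen[y0][x0]:
                continue
            # Flood fill from this unvisited foreground pixel.
            stack = [(y0, x0)]
            seen[y0][x0] = True
            left, top, right, bottom = x0, y0, x0, y0
            while stack:
                y, x = stack.pop()
                left, right = min(left, x), max(right, x)
                top, bottom = min(top, y), max(bottom, y)
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    if (0 <= ny < height and 0 <= nx < width
                            and image[ny][nx] and not seen[ny][nx]):
                        seen[ny][nx] = True
                        stack.append((ny, nx))
            boxes.append((left, top, right, bottom))
    return boxes


def word_shape_counts(image, gap_radius=1):
    """Count word blobs by a crude shape signature: (width, height) of the merged box."""
    merged = dilate_horizontal(image, gap_radius)
    return Counter((r - l + 1, b - t + 1) for l, t, r, b in connected_boxes(merged))


# Toy page: three "words"; the first and third have the same shape.
page = to_binary([
    "..##.##...###.##....##.##..",
    "..##.##...###.##....##.##..",
])
counts = word_shape_counts(page, gap_radius=1)
print(counts.most_common(1))  # most frequent word shape and its count
```

In this toy setting a recurring shape stands in for a repeated word, which hints at how frequency information could be gathered without decoding any characters. A practical system would need a far more discriminating shape comparison than box size alone.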
CA002077274A 1991-11-19 1992-09-01 Method and apparatus for summarizing a document without document image decoding Expired - Fee Related CA2077274C (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
US79454391A 1991-11-19 1991-11-19
US794,543 1991-11-19

Publications (2)

Publication Number Publication Date
CA2077274A1 (en) 1993-05-20
CA2077274C (en) 1997-07-15

Family

ID=25162943

Family Applications (1)

Application Number Title Priority Date Filing Date
CA002077274A Expired - Fee Related CA2077274C (en) 1991-11-19 1992-09-01 Method and apparatus for summarizing a document without document image decoding

Country Status (5)

Country Link
US (1) US5491760A (en)
EP (1) EP0544432B1 (en)
JP (1) JP3292388B2 (en)
CA (1) CA2077274C (en)
DE (1) DE69229537T2 (en)

Families Citing this family (67)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5590317A (en) * 1992-05-27 1996-12-31 Hitachi, Ltd. Document information compression and retrieval system and document information registration and retrieval method
US5701500A (en) * 1992-06-02 1997-12-23 Fuji Xerox Co., Ltd. Document processor
DE69329218T2 (en) * 1992-06-19 2001-04-05 United Parcel Service Inc Method and device for input classification with a neural network
US5850490A (en) * 1993-12-22 1998-12-15 Xerox Corporation Analyzing an image of a document using alternative positionings of a class of segments
DE69519323T2 (en) * 1994-04-15 2001-04-12 Canon Kk System for page segmentation and character recognition
EP0702322B1 (en) * 1994-09-12 2002-02-13 Adobe Systems Inc. Method and apparatus for identifying words described in a portable electronic document
CA2154952A1 (en) * 1994-09-12 1996-03-13 Robert M. Ayers Method and apparatus for identifying words described in a page description language file
IL113204A (en) * 1995-03-30 1999-03-12 Advanced Recognition Tech Pattern recognition system
US5689716A (en) * 1995-04-14 1997-11-18 Xerox Corporation Automatic method of generating thematic summaries
US5918240A (en) * 1995-06-28 1999-06-29 Xerox Corporation Automatic method of extracting summarization using feature probabilities
US5778397A (en) * 1995-06-28 1998-07-07 Xerox Corporation Automatic method of generating feature probabilities for automatic extracting summarization
US6078915A (en) * 1995-11-22 2000-06-20 Fujitsu Limited Information processing system
US5892842A (en) * 1995-12-14 1999-04-06 Xerox Corporation Automatic method of identifying sentence boundaries in a document image
US5848191A (en) * 1995-12-14 1998-12-08 Xerox Corporation Automatic method of generating thematic summaries from a document image without performing character recognition
US5850476A (en) * 1995-12-14 1998-12-15 Xerox Corporation Automatic method of identifying drop words in a document image without performing character recognition
US7051024B2 (en) * 1999-04-08 2006-05-23 Microsoft Corporation Document summarizer for word processors
JP3530308B2 (en) 1996-05-27 2004-05-24 富士通株式会社 Broadcast program transmission device and terminal device connected thereto
JPH09322089A (en) 1996-05-27 1997-12-12 Fujitsu Ltd Broadcasting program transmitter, information transmitter, device provided with document preparation function and terminal equipment
JP3875310B2 (en) * 1996-05-27 2007-01-31 富士通株式会社 Broadcast program information transmitter
US5956468A (en) * 1996-07-12 1999-09-21 Seiko Epson Corporation Document segmentation system
GB9808712D0 (en) 1997-11-05 1998-06-24 British Aerospace Automatic target recognition apparatus and process
US6562077B2 (en) 1997-11-14 2003-05-13 Xerox Corporation Sorting image segments into clusters based on a distance measurement
US6665841B1 (en) 1997-11-14 2003-12-16 Xerox Corporation Transmission of subsets of layout objects at different resolutions
US5999664A (en) * 1997-11-14 1999-12-07 Xerox Corporation System for searching a corpus of document images by user specified document layout components
US6533822B2 (en) * 1998-01-30 2003-03-18 Xerox Corporation Creating summaries along with indicators, and automatically positioned tabs
JPH11306197A (en) * 1998-04-24 1999-11-05 Canon Inc Processor and method for image processing, and computer-readable memory
US6317708B1 (en) 1999-01-07 2001-11-13 Justsystem Corporation Method for producing summaries of text document
US6337924B1 (en) * 1999-02-26 2002-01-08 Hewlett-Packard Company System and method for accurately recognizing text font in a document processing system
US7475334B1 (en) 2000-01-19 2009-01-06 Alcatel-Lucent Usa Inc. Method and system for abstracting electronic documents
EP1128278B1 (en) * 2000-02-23 2003-09-17 SER Solutions, Inc Method and apparatus for processing electronic documents
US6581057B1 (en) 2000-05-09 2003-06-17 Justsystem Corporation Method and apparatus for rapidly producing document summaries and document browsing aids
US6941513B2 (en) 2000-06-15 2005-09-06 Cognisphere, Inc. System and method for text structuring and text generation
US7302637B1 (en) 2000-07-24 2007-11-27 Research In Motion Limited System and method for abbreviating information sent to a viewing device
US7386790B2 (en) * 2000-09-12 2008-06-10 Canon Kabushiki Kaisha Image processing apparatus, server apparatus, image processing method and memory medium
US7221810B2 (en) * 2000-11-13 2007-05-22 Anoto Group Ab Method and device for recording of information
WO2002099739A1 (en) * 2001-06-05 2002-12-12 Matrox Electronic Systems Ltd. Model-based recognition of objects using a calibrated image system
US6708894B2 (en) 2001-06-26 2004-03-23 Xerox Corporation Method for invisible embedded data using yellow glyphs
US20040034832A1 (en) * 2001-10-19 2004-02-19 Xerox Corporation Method and apparatus for foward annotating documents
US7712028B2 (en) * 2001-10-19 2010-05-04 Xerox Corporation Using annotations for summarizing a document image and itemizing the summary based on similar annotations
JP2003196270A (en) * 2001-12-27 2003-07-11 Sharp Corp Document information processing method, document information processor, communication system, computer program and recording medium
US7139004B2 (en) * 2002-01-25 2006-11-21 Xerox Corporation Method and apparatus to convert bitmapped images for use in a structured text/graphics editor
US7136082B2 (en) * 2002-01-25 2006-11-14 Xerox Corporation Method and apparatus to convert digital ink images for use in a structured text/graphics editor
US7590932B2 (en) 2002-03-16 2009-09-15 Siemens Medical Solutions Usa, Inc. Electronic healthcare management form creation
US7734627B1 (en) * 2003-06-17 2010-06-08 Google Inc. Document similarity detection
WO2005043415A1 (en) * 2003-10-29 2005-05-12 Trainum Michael W System and method for managing documents
US20080311551A1 (en) * 2005-08-23 2008-12-18 Mazer Corporation, The Testing Scoring System and Method
US7454063B1 (en) * 2005-09-22 2008-11-18 The United States Of America As Represented By The Director National Security Agency Method of optical character recognition using feature recognition and baseline estimation
JP2007304864A (en) * 2006-05-11 2007-11-22 Fuji Xerox Co Ltd Character recognition processing system and program
US7706613B2 (en) * 2007-08-23 2010-04-27 Kaspersky Lab, Zao System and method for identifying text-based SPAM in rasterized images
US7711192B1 (en) * 2007-08-23 2010-05-04 Kaspersky Lab, Zao System and method for identifying text-based SPAM in images using grey-scale transformation
JP5132416B2 (en) * 2008-05-08 2013-01-30 キヤノン株式会社 Image processing apparatus and control method thereof
US8233722B2 (en) * 2008-06-27 2012-07-31 Palo Alto Research Center Incorporated Method and system for finding a document image in a document collection using localized two-dimensional visual fingerprints
US8233716B2 (en) * 2008-06-27 2012-07-31 Palo Alto Research Center Incorporated System and method for finding stable keypoints in a picture image using localized scale space properties
US8144947B2 (en) * 2008-06-27 2012-03-27 Palo Alto Research Center Incorporated System and method for finding a picture image in an image collection using localized two-dimensional visual fingerprints
EP2449531B1 (en) * 2009-07-02 2017-12-20 Hewlett-Packard Development Company, L.P. Skew detection
US8548193B2 (en) * 2009-09-03 2013-10-01 Palo Alto Research Center Incorporated Method and apparatus for navigating an electronic magnifier over a target document
US9003531B2 (en) * 2009-10-01 2015-04-07 Kaspersky Lab Zao Comprehensive password management arrangment facilitating security
US8086039B2 (en) * 2010-02-05 2011-12-27 Palo Alto Research Center Incorporated Fine-grained visual document fingerprinting for accurate document comparison and retrieval
US9514103B2 (en) * 2010-02-05 2016-12-06 Palo Alto Research Center Incorporated Effective system and method for visual document comparison using localized two-dimensional visual fingerprints
EP2383970B1 (en) 2010-04-30 2013-07-10 beyo GmbH Camera based method for text input and keyword detection
EP2593920A4 (en) 2010-07-12 2016-05-04 Google Inc System and method of determining building numbers
US8750624B2 (en) 2010-10-19 2014-06-10 Doron Kletter Detection of duplicate document content using two-dimensional visual fingerprinting
US8554021B2 (en) 2010-10-19 2013-10-08 Palo Alto Research Center Incorporated Finding similar content in a mixed collection of presentation and rich document content using two-dimensional visual fingerprints
US9058352B2 (en) 2011-09-22 2015-06-16 Cerner Innovation, Inc. System for dynamically and quickly generating a report and request for quotation
JP5884560B2 (en) * 2012-03-05 2016-03-15 オムロン株式会社 Image processing method for character recognition, and character recognition device and program using this method
EP2637128B1 (en) 2012-03-06 2018-01-17 beyo GmbH Multimodal text input by a keyboard/camera text input module replacing a conventional keyboard text input module on a mobile device
US11176364B2 (en) * 2019-03-19 2021-11-16 Hyland Software, Inc. Computing system for extraction of textual elements from a document

Family Cites Families (23)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US3659354A (en) * 1970-10-21 1972-05-02 Mitre Corp Braille display device
FR2453451B1 (en) * 1979-04-04 1985-11-08 Lopez Krahe Jaime READING MACHINE FOR THE BLIND
US4685135A (en) * 1981-03-05 1987-08-04 Texas Instruments Incorporated Text-to-speech synthesis system
JPS57199066A (en) * 1981-06-02 1982-12-06 Toshiyuki Sakai File forming system for cutting of newspaper and magazine
JPS5998283A (en) * 1982-11-27 1984-06-06 Hitachi Ltd Pattern segmenting and recognizing system
JPS59135576A (en) * 1983-01-21 1984-08-03 Nippon Telegr & Teleph Corp <Ntt> Registering and retrieving device of document information
JPS60114967A (en) * 1983-11-28 1985-06-21 Hitachi Ltd Picture file device
JPH07120355B2 (en) * 1986-09-26 1995-12-20 株式会社日立製作所 Image information memory retrieval method
US4972349A (en) * 1986-12-04 1990-11-20 Kleinberger Paul J Information retrieval system and method
JPS63223964A (en) * 1987-03-13 1988-09-19 Canon Inc Retrieving device
US4752772A (en) * 1987-03-30 1988-06-21 Digital Equipment Corporation Key-embedded Braille display system
US4994987A (en) * 1987-11-20 1991-02-19 Minnesota Mining And Manufacturing Company Image access system providing easier access to images
JPH01150973A (en) * 1987-12-08 1989-06-13 Fuji Photo Film Co Ltd Method and device for recording and retrieving picture information
JP2783558B2 (en) * 1988-09-30 1998-08-06 株式会社東芝 Summary generation method and summary generation device
JPH0371380A (en) * 1989-08-11 1991-03-27 Seiko Epson Corp Character recognizing device
JPH03218569A (en) * 1989-11-28 1991-09-26 Oki Electric Ind Co Ltd Index extraction device
US5048109A (en) * 1989-12-08 1991-09-10 Xerox Corporation Detection of highlighted regions
US5202933A (en) * 1989-12-08 1993-04-13 Xerox Corporation Segmentation of text and graphics
US5131049A (en) * 1989-12-08 1992-07-14 Xerox Corporation Identification, characterization, and segmentation of halftone or stippled regions of binary images by growing a seed to a clipping mask
US5181255A (en) * 1990-12-13 1993-01-19 Xerox Corporation Segmentation of handwriting and machine printed text
US5216725A (en) * 1990-10-31 1993-06-01 Environmental Research Institute Of Michigan Apparatus and method for separating handwritten characters by line and word
US5384863A (en) * 1991-11-19 1995-01-24 Xerox Corporation Methods and apparatus for automatic modification of semantically significant portions of a document without document image decoding
CA2077604C (en) * 1991-11-19 1999-07-06 Todd A. Cass Method and apparatus for determining the frequency of words in a document without document image decoding

Also Published As

Publication number Publication date
DE69229537D1 (en) 1999-08-12
EP0544432B1 (en) 1999-07-07
JPH05242142A (en) 1993-09-21
EP0544432A3 (en) 1993-12-22
EP0544432A2 (en) 1993-06-02
CA2077274C (en) 1997-07-15
DE69229537T2 (en) 1999-11-25
US5491760A (en) 1996-02-13
JP3292388B2 (en) 2002-06-17

Similar Documents

Publication Publication Date Title
CA2077274A1 (en) Method and apparatus for summarizing a document without document image decoding
CA2077565A1 (en) Methods and apparatus for automatic modification of semantically significant portions of a document without document image decoding
CA2077604A1 (en) Method and apparatus for determining the frequency of words in a document without document image decoding
CA2057243A1 (en) Segmentation of handwriting and machine printed text
EP0434429A3 (en) Image processing apparatus
CA2080966A1 (en) Method and apparatus for converting bitmap image documents to editable coded data using a standard notation to record document recognition ambiguities
AU5424498A (en) Automatic language identification system for multilingual optical character recognition
AU1474588A (en) Printed optical code reader and format
CA2033411A1 (en) Document revising system for use with document reading and translating system
AU3152897A (en) High speed image acquisition system and method
EP0302663A3 (en) Low cost speech recognition system and method
JPH07121664A (en) Automatic decision apparatus of european language
JPS56129981A (en) Optical character reader
ZA917532B (en) Character recognition methods including locating and extracting predetermined data from a document
EP1063606A3 (en) Automatic recognition of characters on a structured background by combining backgroundmodels and characters
EP0114515A3 (en) Method and apparatus for colour recognition
EP0239061A3 (en) Optical character reader apparatus and optical character reading method
EP0392460A3 (en) Relief image scanner
EP0350930A3 (en) Color image processing apparatus
WO1991008553A3 (en) Character normalization using an elliptical sampling window for optical character recognition
JPS59206985A (en) Mechanical translating system
CN204856534U (en) System of looking that helps is read to low eyesight based on OCR and TTS
EP0392781A3 (en) Image processing apparatus
JPS59158477A (en) Optical character reader
ANTONACOPOULOS Automatic reading of Braille documents

Legal Events

Date Code Title Description
EEER Examination request
MKLA Lapsed