CN101297318B - Data organization and access for mixed media document system - Google Patents

Data organization and access for mixed media document system

Publication number
CN101297318B
Authority
CN
China
Prior art keywords
document
mmr
fragment
word
page
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Expired - Fee Related
Application number
CN200680039477.4A
Other languages
Chinese (zh)
Other versions
CN101297318A (en)
Inventor
乔纳森·J·赫尔
李达祥
库尔特·皮索尔
彼得·E·哈特
杰米·格雷厄姆
伯纳·埃罗尔
丹尼尔·G·V·奥尔斯特
陆霄晔
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Ricoh Co Ltd
Original Assignee
Ricoh Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Priority claimed from US11/461,164 (US9405751B2)
Priority claimed from US11/461,147 (US9171202B2)
Application filed by Ricoh Co Ltd
Priority claimed from PCT/JP2006/316812 (WO2007023993A1)
Publication of CN101297318A
Application granted
Publication of CN101297318B
Legal status: Expired - Fee Related
Anticipated expiration

Abstract

A Mixed Media Reality (MMR) system and associated techniques are disclosed. The MMR system provides mechanisms for forming a mixed media document that includes media of at least two types (e.g., printed paper as a first medium and digital content and/or web link as a second medium). In one particular embodiment, the MMR system includes a content-based retrieval database configured with an index table to represent two-dimensional geometric relationships between objects extracted from a printed document in a way that allows look-up using a text-based index. A ranked set of document, page and location hypotheses can be computed given data from the index table. The techniques effectively transform features detected in an image patch into textual terms (or other searchable features) that represent both the features themselves and the geometric relationship between them. A storage facility can be used to store additional characteristics about each document image patch.

Description

Data Organization and Access for Mixed Media Document System
Technical Field
The present invention relates to techniques for producing a mixed media document that is formed from at least two media types and, more specifically, to a Mixed Media Reality (MMR) system that uses printed media in combination with electronic media to produce mixed media documents.
Background Art
Document printing and copying technologies have been used for many years in many contexts. For example, printers and copiers are used in private and commercial offices, in home environments with personal computers, and in document printing and publishing service environments. However, printing and copying technologies have not previously been thought of as a means to bridge the gap between static printed media (i.e., paper documents) and the interactive "virtual world" that includes digital communication, networking, information provision, advertising, entertainment, and electronic commerce.
Printed media has been the primary source of communicating information, such as news and advertising, for centuries. Over the past several years, the advent and growing popularity of personal computers and personal electronic devices, such as personal digital assistant (PDA) devices and cellular telephones (e.g., camera phones), has expanded the concept of printed media by making it available in an electronically readable and searchable form and by introducing interactive multimedia capabilities, which are unparalleled by traditional printed media.
Unfortunately, there is a gap between the electronically accessible, multimedia-based virtual world and the physical world of printed media. For example, although almost everyone in the developed world has daily access to printed media and to electronic information, users of printed media and of personal electronic devices do not have the tools and technology needed to form a link between the two (i.e., to facilitate a mixed media document).
Moreover, conventional printed media offers particular advantageous attributes, such as tactile feel, no power requirements, and permanency of organization and storage, that are not provided by virtual or digital media. Likewise, conventional digital media offers its own advantageous attributes, such as portability (e.g., carried in the storage of a cell phone or laptop) and ease of transmission (e.g., by email).
For these reasons, a need exists for techniques that enable exploitation of the benefits associated with both printed and virtual media.
Summary of the Invention
At least one aspect of one or more embodiments of the present invention provides a computer-implemented method for organizing and accessing information in a mixed media document system. The method includes generating an electronic representation of a paper document, identifying features on the paper document that capture its two-dimensional appearance, identifying the locations of those features, and indexing the features by their respective locations, thereby generating an index table. The method may include the preliminary step of receiving the paper document. The method includes storing one or more characteristics associated with at least one of the features. In one such case, the one or more characteristics include one or more actions, and the actions include at least one of retrieving textual information, retrieving graphical information, executing a process, executing a command, placing an order, retrieving video, retrieving sound, storing information, creating a new document, printing a document, and displaying a document. In another particular case, identifying features on the paper document that capture its two-dimensional appearance includes identifying horizontally arranged objects and vertically arranged objects, whether or not they are adjacent. In another particular case, identifying such features includes identifying horizontal and vertical word pairs (whole or partial word pairs). In another particular case, generating the electronic representation of the paper document is carried out during a scanning or printing process. In another particular case, identifying such features includes regrouping sequences of text into lines by detecting the amount of vertical overlap between two consecutive sequences.
The method may include receiving one or more query terms that capture two-dimensional relationships between objects in a target document and, based on data from the index table, computing at least one mixed media document and location hypothesis in response to the query terms. In one such case, receiving the one or more query terms is preceded by receiving the target document, creating an image of at least one patch of the target document, and generating one or more image-based query terms. In one such case, generating the image-based query terms includes generating horizontal and vertical word pairs extracted from the image. In another particular case, computing the at least one mixed media document and location hypothesis includes locating the stored page that most likely matches the patch of the target document, and computing the location within that page that is most likely the center of the patch. In one such case, each word pair is associated with an inverse document frequency, and locating the stored page that most likely matches the at least one patch of the target document includes, for each word pair, adding its inverse document frequency to an accumulator for each document page on which the word pair occurs, as indexed by the word pair. In response to a maximum value in the accumulators exceeding a threshold, the method continues with outputting the corresponding document page as the match for the patch.
In one such case, computing the location within the page that is most likely the center of the patch includes adding a weight to every cell in a zone around each word pair (for example, the weight for each cell can be determined as the product of the word pair's inverse document frequency and the normalized geometric distance between the cell and the center of the zone), and searching the array of cell accumulators for the cell whose accumulator has the maximum value. In response to that maximum exceeding a threshold, the method further includes reporting the coordinates of the cell as the location of the patch. In another particular case, computing the at least one mixed media document and location hypothesis includes looking up each of the one or more query terms in the index table, retrieving the one or more locations associated with each query term, and, for each identified location, identifying one or more candidate regions that include the location. In one such case, computing the at least one mixed media document and location hypothesis includes identifying the one of the candidate regions that is most consistent with all of the one or more query terms. In response to determining that this candidate region satisfies predetermined matching criteria, the method continues with confirming the region as a match to the target document.
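The indexing step described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the patented implementation: the `Word` record, the `H:`/`V:` term prefixes, and the rule that two words form a vertical pair when their horizontal extents overlap on consecutive lines are all hypothetical choices made for the example.

```python
from collections import defaultdict
from typing import NamedTuple


class Word(NamedTuple):
    """Hypothetical word record: text plus a simplified bounding box."""
    text: str
    x0: float  # left edge
    x1: float  # right edge
    y: float   # vertical position of the text line


def extract_pairs(lines):
    """Return (term, (x, y)) tuples for horizontal and vertical word pairs.

    `lines` is a list of text lines, each a list of Word in reading order.
    """
    pairs = []
    for i, line in enumerate(lines):
        # Horizontal pairs: neighbouring words on the same line.
        for a, b in zip(line, line[1:]):
            pairs.append((f"H:{a.text}-{b.text}", (a.x0, a.y)))
        # Vertical pairs: words on the next line whose horizontal extent
        # overlaps the current word (an assumed adjacency rule).
        if i + 1 < len(lines):
            for a in line:
                for b in lines[i + 1]:
                    if min(a.x1, b.x1) - max(a.x0, b.x0) > 0:
                        pairs.append((f"V:{a.text}-{b.text}", (a.x0, a.y)))
    return pairs


def build_index(pages):
    """Map each textual term to the (doc, page, x, y) places where it occurs.

    `pages` maps (doc_id, page_no) to that page's list of lines.
    """
    index = defaultdict(list)
    for (doc_id, page_no), lines in pages.items():
        for term, (x, y) in extract_pairs(lines):
            index[term].append((doc_id, page_no, x, y))
    return index
```

Because each pair is turned into a plain string, the resulting table can be searched with any text-based index while still encoding whether the two words were side by side or stacked.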
Another aspect of one or more embodiments of the present invention provides a machine-readable medium (e.g., one or more compact disks, diskettes, servers, memory sticks, or hard drives, ROMs, RAMs, or any type of media suitable for storing electronic instructions) encoded with instructions that, when executed by one or more processors, cause the processors to carry out a process for organizing and accessing information in a mixed media document system. This process can be, for example, similar to or a variation of the method described here.
Another aspect of one or more embodiments of the present invention provides a method for accessing information in a mixed media document system. The method includes receiving one or more query terms that capture two-dimensional relationships between objects in a target document and, based on data from an index table that indexes the document features and feature locations of mixed media documents, computing at least one mixed media document and location hypothesis in response to the query terms. In one particular case, receiving the one or more query terms is preceded by receiving the target document, creating an image of at least one patch of the target document, and generating the one or more query terms based on the image. In one such case, generating the query terms based on the image includes generating horizontal and vertical word pairs extracted from the image. In another such case, computing the at least one mixed media document and location hypothesis includes locating the stored page that most likely matches the patch of the target document, and computing the location within that page that is most likely the center of the patch. In another such case, each word pair is associated with an inverse document frequency, and locating the stored page that most likely matches the at least one patch of the target document includes, for each word pair, adding its inverse document frequency to an accumulator for each document page on which the word pair occurs, as indexed by the word pair. In response to a maximum value in the accumulators exceeding a threshold, the method continues with outputting the corresponding document page as the match for the patch.
In another such case, computing the location within the page that is most likely the center of the patch includes adding a weight to every cell in a zone around each word pair (for example, the weight for each cell can be determined as the product of the word pair's inverse document frequency and the normalized geometric distance between the cell and the center of the zone), and searching the array of cell accumulators for the cell whose accumulator has the maximum value. In response to that maximum exceeding a threshold, the method includes reporting the coordinates of the cell as the location of the patch. In another particular case, computing the at least one mixed media document and location hypothesis includes looking up each of the one or more query terms in the index table, retrieving the one or more locations associated with each query term, and, for each identified location, identifying one or more candidate regions that include the location. Computing the at least one mixed media document and location hypothesis may further include identifying the one of the candidate regions that is most consistent with all of the one or more query terms and, in response to determining that this candidate region satisfies predetermined matching criteria, confirming the region as a match to the target document.
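The two accumulation steps described above (page voting, then location voting) can be sketched as follows. This is an illustrative sketch only. The index layout (term mapped to a list of `(doc, page, x, y)` tuples), the logarithmic form of the inverse document frequency, the cell size, the zone radius, and the thresholds are all assumptions. In particular, the distance factor is implemented here as a closeness weight that falls from 1 at the zone center to 0 at its corner, which is one possible reading of "normalized geometric distance" rather than a confirmed detail.

```python
import math
from collections import defaultdict


def idf(term, index, n_pages):
    """Assumed IDF: log(total pages / pages containing the term)."""
    pages = {(d, p) for d, p, _, _ in index.get(term, [])}
    return math.log(n_pages / len(pages)) if pages else 0.0


def best_page(query_terms, index, n_pages, threshold):
    """Add each term's IDF to the accumulator of every page it occurs on."""
    accum = defaultdict(float)
    for term in query_terms:
        w = idf(term, index, n_pages)
        for page in {(d, p) for d, p, _, _ in index.get(term, [])}:
            accum[page] += w
    if not accum:
        return None
    page, score = max(accum.items(), key=lambda kv: kv[1])
    return page if score > threshold else None


def best_location(query_terms, index, n_pages, page, threshold,
                  cell=10, radius=2):
    """Vote for grid cells in a zone around each matching word pair."""
    accum = defaultdict(float)
    max_dist = math.hypot(radius, radius)
    for term in query_terms:
        w = idf(term, index, n_pages)
        for d, p, x, y in index.get(term, []):
            if (d, p) != page:
                continue
            cx, cy = int(x // cell), int(y // cell)
            for dx in range(-radius, radius + 1):
                for dy in range(-radius, radius + 1):
                    closeness = 1.0 - math.hypot(dx, dy) / max_dist
                    accum[(cx + dx, cy + dy)] += w * closeness
    if not accum:
        return None
    cell_xy, score = max(accum.items(), key=lambda kv: kv[1])
    return cell_xy if score > threshold else None
```

Returning `None` when the best accumulator value does not exceed the threshold mirrors the text's condition that a page or location is reported only when the maximum surpasses a threshold.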
Another embodiment of the present invention provides a machine-readable medium (e.g., one or more compact disks, diskettes, servers, memory sticks, or hard drives, ROMs, RAMs, or any type of media suitable for storing electronic instructions) encoded with instructions that, when executed by one or more processors, cause the processors to carry out a process for accessing information in a mixed media document system. This process can be, for example, similar to or a variation of the method described here.
At least one further aspect of one or more embodiments of the present invention provides a machine-readable medium encoded with instructions (e.g., one or more compact disks, diskettes, servers, memory sticks, or hard drives, ROMs, RAMs, or any type of media suitable for storing electronic instructions) that, when executed by one or more processors, cause the processors to carry out a process for accessing information in a mixed media document system. This process can be, for example, similar to or a variation of the method described here.
The features and advantages described here are not all-inclusive and, in particular, many additional features and advantages will be apparent to one of ordinary skill in the art in view of the figures and description. Moreover, it should be noted that the language used in the specification has been principally selected for readability and instructional purposes, and not to limit the scope of the inventive subject matter.
Description of the Drawings
Figure 1A illustrates a functional block diagram of a Mixed Media Reality (MMR) system configured in accordance with an embodiment of the present invention;
Figure 1B illustrates a functional block diagram of an MMR system configured in accordance with another embodiment of the present invention;
Figures 2A, 2B, 2C, and 2D illustrate capture devices in accordance with embodiments of the present invention;
Figure 2E illustrates a functional block diagram of a capture device configured in accordance with an embodiment of the present invention;
Figure 3 illustrates a functional block diagram of an MMR computer configured in accordance with an embodiment of the present invention;
Figure 4 illustrates a set of software components included in an MMR software suite configured in accordance with an embodiment of the present invention;
Figure 5 illustrates a diagram representing an embodiment of an MMR document configured in accordance with an embodiment of the present invention;
Figure 6 illustrates a document fingerprint matching method in accordance with an embodiment of the present invention;
Figure 7 illustrates a document fingerprint matching system configured in accordance with an embodiment of the present invention;
Figure 8 illustrates a flow process for text/non-text discrimination in accordance with an embodiment of the present invention;
Figure 9 illustrates an example of text/non-text discrimination in accordance with an embodiment of the present invention;
Figure 10 illustrates a flow process for estimating the point size of text in an image patch in accordance with an embodiment of the present invention;
Figure 11 illustrates a document fingerprint matching technique in accordance with another embodiment of the present invention;
Figure 12 illustrates a document fingerprint matching technique in accordance with another embodiment of the present invention;
Figure 13 illustrates an example of interactive image analysis in accordance with an embodiment of the present invention;
Figure 14 illustrates a document fingerprint matching technique in accordance with another embodiment of the present invention;
Figure 15 illustrates an example of word bounding box detection in accordance with an embodiment of the present invention;
Figure 16 illustrates a feature extraction technique in accordance with an embodiment of the present invention;
Figure 17 illustrates a feature extraction technique in accordance with another embodiment of the present invention;
Figure 18 illustrates a feature extraction technique in accordance with another embodiment of the present invention;
Figure 19 illustrates a feature extraction technique in accordance with another embodiment of the present invention;
Figure 20 illustrates a document fingerprint matching technique in accordance with another embodiment of the present invention;
Figure 21 illustrates multi-classifier feature extraction for document fingerprint matching in accordance with an embodiment of the present invention;
Figures 22 and 23 illustrate an example of a document fingerprint matching technique in accordance with an embodiment of the present invention;
Figure 24 illustrates a document fingerprint matching technique in accordance with another embodiment of the present invention;
Figure 25 illustrates a flow process for database-driven feedback in accordance with an embodiment of the present invention;
Figure 26 illustrates a document fingerprint matching technique in accordance with another embodiment of the present invention;
Figure 27 illustrates a flow process for database-driven classification in accordance with an embodiment of the present invention;
Figure 28 illustrates a document fingerprint matching technique in accordance with another embodiment of the present invention;
Figure 29 illustrates a flow process for database-driven multiple classification in accordance with an embodiment of the present invention;
Figure 30 illustrates a document fingerprint matching technique in accordance with another embodiment of the present invention;
Figure 31 illustrates a document fingerprint matching technique in accordance with another embodiment of the present invention;
Figure 32 illustrates a document fingerprint matching technique in accordance with another embodiment of the present invention;
Figure 33 illustrates a flow process for multi-tier recognition in accordance with an embodiment of the present invention;
Figure 34A illustrates a functional block diagram of an MMR database system configured in accordance with an embodiment of the present invention;
Figure 34B illustrates an example of MMR feature extraction for an OCR-based technique in accordance with an embodiment of the present invention;
Figure 34C illustrates an example index table organization in accordance with an embodiment of the present invention;
Figure 35 illustrates a method for generating an MMR index table in accordance with an embodiment of the present invention;
Figure 36 illustrates a method for computing a ranked set of document, page, and location hypotheses for a target document in accordance with an embodiment of the present invention;
Figure 37A illustrates a functional block diagram of MMR components configured in accordance with another embodiment of the present invention;
Figure 37B illustrates a set of software components included in MMR printing software in accordance with an embodiment of the present invention;
Figure 38 illustrates a flowchart of a method of embedding a hotspot in a document in accordance with an embodiment of the present invention;
Figure 39A illustrates an example of an HTML file in accordance with an embodiment of the present invention;
Figure 39B illustrates an example of a marked-up version of the HTML file of Figure 39A;
Figure 40A illustrates an example of the HTML file of Figure 39A displayed in a browser in accordance with an embodiment of the present invention;
Figure 40B illustrates an example of a printed version of the HTML file of Figure 40A in accordance with an embodiment of the present invention;
Figure 41 illustrates a symbolic hotspot description in accordance with an embodiment of the present invention;
Figures 42A and 42B illustrate an example page_desc.xml file for the HTML file of Figure 39A in accordance with an embodiment of the present invention;
Figure 43 illustrates a hotspot.xml file corresponding to Figures 41, 42A, and 42B in accordance with an embodiment of the present invention;
Figure 44 illustrates a flowchart of the process used by a forwarding DLL in accordance with an embodiment of the present invention;
Figure 45 illustrates a flowchart of a method of transforming characters corresponding to a hotspot in a document in accordance with an embodiment of the present invention;
Figure 46 illustrates an example of an electronic version of a document in accordance with an embodiment of the present invention;
Figure 47 illustrates an example of a printed modified document in accordance with an embodiment of the present invention;
Figure 48 illustrates a flowchart of a method of shared document annotation in accordance with an embodiment of the present invention;
Figure 49A illustrates a sample source web page in a browser in accordance with an embodiment of the present invention;
Figure 49B illustrates a sample modified web page in a browser in accordance with an embodiment of the present invention;
Figure 49C illustrates a sample printed web page in accordance with an embodiment of the present invention;
Figure 50A illustrates a flowchart of a method of adding a hotspot to an imaged document in accordance with an embodiment of the present invention;
Figure 50B illustrates a flowchart of a method of defining a hotspot for addition to an imaged document in accordance with an embodiment of the present invention;
Figure 51A illustrates an example of a user interface displaying a portion of a scanned newspaper page in accordance with an embodiment;
Figure 51B illustrates a user interface for defining the data or interaction to associate with a selected hotspot;
Figure 51C illustrates the user interface of Figure 51B including an assign box in accordance with an embodiment of the present invention;
Figure 51D illustrates a user interface for displaying hotspots within a document in accordance with an embodiment of the present invention;
Figure 52 illustrates a flowchart of a method of using an MMR document and the MMR system in accordance with an embodiment of the present invention;
Figure 53 illustrates a block diagram of an exemplary set of business entities associated with the MMR system in accordance with an embodiment of the present invention;
Figure 54 illustrates a flowchart of a method, which is a generalized business method facilitated by use of the MMR system, in accordance with an embodiment of the present invention.
Detailed Description
A Mixed Media Reality (MMR) system and associated methods are described. The MMR system provides mechanisms for forming a mixed media document that includes media of at least two types, such as printed paper as a first medium and a digital photograph, digital movie, digital audio file, digital text file, or web link as a second medium. The MMR system and/or techniques can be further used to facilitate various business models that take advantage of the combination of a portable electronic device (e.g., a PDA or camera phone) and a paper document to provide mixed media documents.
In one particular embodiment, the MMR system includes a content-based retrieval database that represents two-dimensional geometric relationships between objects extracted from a printed document in a way that allows look-up using a text-based index. Evidence accumulation techniques combine the frequency of occurrence of a feature with the likelihood of its location in a two-dimensional zone. In one such embodiment, an MMR database system includes an index table that receives a description computed by an MMR feature extraction algorithm. The index table identifies the documents, the pages, and the x-y locations within those pages where each feature occurs. Given the data from the index table, an evidence accumulation algorithm computes a ranked set of document, page, and location hypotheses. As desired, a relational database (or other suitable storage facility) can be used to store additional characteristics about each document, page, and location.
The MMR database system may also include other components, such as an MMR processor, a capture device, a communication mechanism, and a memory including MMR software. The MMR processor may also be coupled to a storage or source of media types, an input device, and an output device. In one such configuration, the MMR software includes routines executable by the MMR processor for accessing MMR documents with additional digital content, creating or modifying MMR documents, and using a document to perform other operations such as business transactions, data queries, and reporting.
MMR System Overview
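As one way to picture the relational storage mentioned above, the following sketch keeps additional characteristics in an in-memory SQLite table keyed by document, page, and a rectangular region. The table name, column names, and example values are hypothetical and chosen only for illustration; the source does not specify a schema.

```python
import sqlite3


def open_store():
    """Create an in-memory store for per-location characteristics."""
    db = sqlite3.connect(":memory:")
    db.execute(
        """CREATE TABLE characteristic (
               doc_id TEXT, page INTEGER,
               x0 REAL, y0 REAL, x1 REAL, y1 REAL,
               action TEXT, payload TEXT)"""
    )
    return db


def add_characteristic(db, doc_id, page, box, action, payload):
    """Attach an action and its payload to a rectangular region of a page."""
    db.execute(
        "INSERT INTO characteristic VALUES (?, ?, ?, ?, ?, ?, ?, ?)",
        (doc_id, page, *box, action, payload),
    )


def characteristics_at(db, doc_id, page, x, y):
    """Return (action, payload) rows whose region contains the point (x, y)."""
    rows = db.execute(
        """SELECT action, payload FROM characteristic
           WHERE doc_id = ? AND page = ?
             AND x0 <= ? AND ? <= x1 AND y0 <= ? AND ? <= y1""",
        (doc_id, page, x, x, y, y),
    )
    return rows.fetchall()
```

With a layout like this, a location hypothesis produced by the evidence accumulation step could be used directly as the lookup key for whatever characteristics were stored for that part of the page.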
Referring now to Figure 1A, a Mixed Media Reality (MMR) system 100a in accordance with an embodiment of the present invention is shown. The MMR system 100a comprises an MMR processor 102; a communication mechanism 104; a capture device 106 having a portable input device 168 and a portable output device 170; a memory 108 including MMR software; a base media storage 160; an MMR media storage 162; an output device 164; and an input device 166. The MMR system 100a creates a mixed media environment by providing a way to use information from an existing printed document (a first media type) as an index to a second media type, such as audio, video, text, updated information, and services.
The capture device 106 is able to generate a representation of a printed document (e.g., an image, drawing, or other such representation) and send the representation to the MMR processor 102. The MMR system 100a then matches the representation to an MMR document and other second media types. The MMR system 100a is also responsible for taking an action in response to input and recognition of a representation. The actions taken by the MMR system 100a can be of any type including, for example, retrieving information, placing an order, retrieving a video or sound, storing information, creating a new document, printing a document, or displaying a document or image. By use of the content-based retrieval database technology described here, the MMR system 100a provides mechanisms that render printed text into a dynamic medium that provides the user with an entry point to digital content or services of interest or value.
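The step of taking an action once a representation has been recognized can be pictured as a simple dispatch table. The sketch below is purely illustrative: the action names echo the examples listed in the text, but the handler functions are stand-in stubs and the `perform` helper is a hypothetical name, not part of any described interface.

```python
def retrieve_information(payload):
    # Stub: a real system would fetch the linked content.
    return f"retrieved {payload}"


def place_order(payload):
    # Stub: a real system would contact an ordering service.
    return f"ordered {payload}"


def display_document(payload):
    # Stub: a real system would render the document on an output device.
    return f"displayed {payload}"


# Hypothetical mapping from action names to handlers.
ACTIONS = {
    "retrieve_information": retrieve_information,
    "place_order": place_order,
    "display_document": display_document,
}


def perform(action, payload):
    """Run the handler registered for a recognized patch's action."""
    handler = ACTIONS.get(action)
    if handler is None:
        raise ValueError(f"unknown action: {action}")
    return handler(payload)
```

Keeping the mapping in a table means new action types can be registered without changing the matching logic that precedes it.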
MMR processor 102 process data signal, and can comprise various counting system structures, comprise the architecture of the combination of complex instruction set computer (CISC) (CISC) architecture, Reduced Instruction Set Computer (RISC) architecture or realization instruction set.In a particular embodiment, MMR processor 102 comprises ALU, microprocessor, general purpose computing machine or some out of Memory equipment that are equipped with for carrying out operation of the present invention.In another embodiment, MMR processor 102 comprises the general purpose computing machine with patterned user interface, this graphical user interface can by, for example, to produce in the program that Java was write on the operating system based on WINDOWS or UNIX operating system, moved.Although single processor only is shown in Figure 1A, can comprises a plurality of processors.Processor is connected to MMR storer 108, and carries out the instruction that is stored in the there.
Communication mechanism 104 is for any device or the system that acquisition equipment 106 are connected to MMR processor 102.For example, (for example can use network, WAN and/or LAN), wired link (for example, USB, RS232 or Ethernet), wireless link (for example, infrared ray, bluetooth or 802.11), the link of mobile device communication linkage (for example, GPRS or GSM), public switch telephone network (PSTN) or these any combination realize communication mechanism 104.Here can use many communication architectures and agreement.Acquisition equipment 106 comprises the equipment as transceiver, joining with communication mechanism 104, and is any device that can digitally catch by input media 168 image or data.Acquisition equipment 106 can optionally comprise output unit 170, and alternately is portable.For example, acquisition equipment 106 are camera cell phones, PDA device, digital camera, barcode reader, radio-frequency (RF) identification (RFID) reader of standard, such as the such computer peripheral of the web camera of standard or such as the such built-in of the video card of PC.With reference to figure 2A-2D, several examples of acquisition equipment 106a-d are described respectively in more detail.In addition, acquisition equipment 106 can comprise so that content-based retrieval can carry out and acquisition equipment 106 is connected to the software application of the infrastructure of the 100a/100b of MMR system.Can find the greater functionality details of acquisition equipment 106 with reference to figure 2E.According to this open invention, the acquisition equipment 106 of many tradition and customization, with and separately function and architecture will be clearly.
Storer 108 storages may be by instruction and/or the data of processor 102 execution.This instruction and/or data can comprise be used to the code that is executed in this described any and/or all technology.Storer 108 can be dynamic RAM (DRAM) device, static RAM (SRAM) device or any other suitable storage arrangement.With reference to figure 4, hereinafter storer 108 will be described in further detail.In a particular embodiment, storer 108 comprise MMR software suite, operating system and other application program (as, word-processing application, email application, financial applications and Web-browser application).
Basic medium storage 160 is for its original form storage second medium type, and MMR medium storage 162 is as described in this, with the information that creates the MMR environment for store M MR document, database and other.Although illustrate respectively, in another embodiment, basic medium storage 160 and MMR medium storage 162 can be the parts of the same memory device, or integrated.Data-carrier store 160,162 is further stored data or the instruction about MMR processor 102, and comprise one or more devices, it comprises, for example, hard disk drive, floppy disk, CD-ROM device, DVD-ROM device, DVD-RAM device, DVD-RW device, flash memory device or any other suitable mass storage device.
Output unit 164 is operably connected to MMR processor 102 and represents any device equipped to output data, such as data for display, sound or the presentation of content. For example, output unit 164 can be any one of several types, such as a printer, a display device and/or a loudspeaker. Exemplary display output units 164 include a cathode-ray tube (CRT), a liquid crystal display (LCD) or any other similarly equipped display device, screen or monitor. In one embodiment, output unit 164 is equipped with a touch screen, in which a touch-sensitive, transparent panel covers the screen of output unit 164.
Input media 166 is operably connected to MMR processor 102 and can be any one of several types, such as a keyboard and cursor control, a scanner, a multi-function printer, a still camera or video camera, a keypad, a touch screen, a detector, an RFID tag reader, a switch, or any mechanism that allows a user to interact with system 100a. In one embodiment, input media 166 is a keyboard and cursor control. The cursor control can include, for example, a mouse, a trackball, a stylus, a pen, a touch screen and/or trackpad, cursor direction keys, or another mechanism that causes cursor movement. In another embodiment, input media 166 is a microphone together with an audio add-in/expansion card designed for use in a general-purpose computer system, an analog-to-digital converter and a digital signal processor, to facilitate voice recognition and/or audio processing.
Fig. 1B illustrates a functional block diagram of MMR system 100b configured in accordance with another embodiment of the invention. In this embodiment, MMR system 100b comprises MMR computing machine 112 (operated by MMR user 110), network medium server 114, and printer 116, which produces printed document 118. MMR system 100b further comprises office's entrance 120, ISP's server 122, electronic console 124 electrically connected to set-top box 126, and document scanner 127. Communication links among MMR computing machine 112, network medium server 114, printer 116, office's entrance 120, ISP's server 122, set-top box 126 and document scanner 127 are provided by network 128, which can be a LAN (for example, an office or home network), a WAN (for example, the Internet or a corporate network), a LAN/WAN combination, or any other data path over which multiple computing devices can communicate.
MMR system 100b further comprises acquisition equipment 106, which can communicate wirelessly, through cellular infrastructure 132, Wireless Fidelity (Wi-Fi) technology 134, Bluetooth technology 136 and/or infrared (IR) technology 138, with one or more of MMR computing machine 112, network medium server 114, user's printer 116, office's entrance 120, ISP's server 122, electronic console 124, set-top box 126 and document scanner 127. Alternatively, or in addition, acquisition equipment 106 can communicate in a wired manner, through cable technology 140, with MMR computing machine 112, network medium server 114, user's printer 116, office's entrance 120, ISP's server 122, electronic console 124, set-top box 126 and document scanner 127. Although Wi-Fi technology 134, Bluetooth technology 136, IR technology 138 and cable technology 140 are shown as separate elements in Fig. 1B, such technologies can also be integrated into the processing environments (e.g., MMR computing machine 112, network medium server 114, acquisition equipment 106, etc.). In addition, MMR system 100b further comprises geographic position mechanism 142, which communicates wirelessly or by wire with ISP's server 122 or network 128. Geographic position mechanism 142 can also be integrated into acquisition equipment 106.
MMR user 110 is any individual who is using MMR system 100b. MMR computing machine 112 is any desktop computer, laptop computer, networked computer or other such processing environment. User's printer 116 is any home, office or commercial printer that can produce printed document 118, which is a paper document formed of one or more printed pages.
Network medium server 114 is a networked computer that holds information and/or application programs to be accessed by users of MMR system 100b through network 128. In a particular embodiment, network medium server 114 is a centralized computer on which media files are stored, such as text source files, web pages, audio and/or video files, image files (for example, still photos) and the like. Network medium server 114 is, for example, the Comcast video-on-demand server of Comcast Corporation, the Ricoh Document Mall of Ricoh Innovations, Inc., or the Google image and/or video servers of Google Inc. Generally speaking, network medium server 114 provides access to any data that may be attached to, integrated with, or otherwise associated with printed document 118 via acquisition equipment 106.
Office's entrance 120 is an optional mechanism for capturing events that occur in the environment of MMR user 110, for example events that occur in the office of MMR user 110. Office's entrance 120 is, for example, a computer that is separate from MMR computing machine 112. In this case, office's entrance 120 is connected directly to MMR computing machine 112 or is connected to MMR computing machine 112 through network 128. Alternatively, office's entrance 120 is built into MMR computing machine 112. For example, office's entrance 120 is constructed from a conventional personal computer (PC) and then augmented with the appropriate hardware to support any associated acquisition equipment 106. Office's entrance 120 can include capture devices, for example a video camera and an audio recorder. Alternatively, office's entrance 120 can capture and store data from MMR computing machine 112. For example, office's entrance 120 can receive and monitor functions and events that occur on MMR computing machine 112. As a result, office's entrance 120 can record all audio and video in the physical environment of MMR user 110 and record all events that occur on MMR computing machine 112. In a particular embodiment, office's entrance 120 captures events from MMR computing machine 112, such as video screen captures while a document is being edited. In doing so, office's entrance 120 captures the websites that were browsed and the other documents that were consulted while a given document was being created. This information can later be made available to MMR user 110 through his/her MMR computing machine 112 or acquisition equipment 106. In addition, office's entrance 120 can serve as a multimedia server for clips that users add to their documents. Furthermore, office's entrance 120 can capture other office events, for example conversations (e.g., by telephone or in the office) that occur while a paper document is on the desktop, discussions on the telephone, and small meetings in the office. Using the same content-based retrieval techniques developed for acquisition equipment 106, a video camera (not shown) on office's entrance 120 can identify paper documents on the physical desktop of MMR user 110.
ISP's server 122 is any commercial server that holds information or application programs that MMR user 110 of MMR system 100b can access through network 128. In particular, ISP's server 122 is representative of any service provider associated with MMR system 100b. ISP's server 122 is, for example, but not limited to, the commercial server of a cable TV provider, such as Comcast Corporation; a cellular telephone service provider, such as Verizon Wireless; an Internet service provider, such as Adelphia Communications; an online music service provider, such as Sony Corporation; and the like. Electronic console 124 is any display device, for example, but not limited to, a standard analog or digital television (TV), a flat-screen TV, a flat-panel display or a projection system. As is well known, set-top box 126 is a receiver device that processes an incoming signal from a satellite dish, antenna, cable, network or telephone line. One exemplary manufacturer of set-top boxes is Advanced Digital Broadcast. Set-top box 126 is electrically connected to the video input of electronic console 124.
Document scanner 127 is a commercially available document scanner device, for example the KV-S2026C full-color scanner from Panasonic. Document scanner 127 is used to convert existing printed documents into MMR-ready documents. Cellular infrastructure 132 is representative of a plurality of cell towers and other cellular network interconnections. In particular, cellular infrastructure 132 provides two-way voice and data communication to handheld, portable and vehicle-mounted phones via a wireless modem incorporated into a device, for example into acquisition equipment 106.
Wi-Fi technology 134, Bluetooth technology 136 and IR technology 138 are representative of technologies that facilitate wireless communication between electronic devices. As is well known, Wi-Fi technology 134 is the technology associated with wireless local area network (WLAN) products based on the 802.11 standards. As is well known, Bluetooth technology 136 is a telecommunications industry specification that describes how cellular phones, computers and PDAs are interconnected by means of short-range wireless connections. IR technology 138 allows electronic devices to communicate through short-range wireless signals. For example, IR technology 138 is the line-of-sight wireless communication medium used by TV remote controls, laptop computers, PDAs and other devices. IR technology 138 operates in the part of the spectrum from microwaves up to just below visible light. In addition, in one or more other embodiments, wireless communication can be supported using the IEEE 802.15 (UWB) and/or 802.16 (WiMAX) standards.
Cable technology 140 is any wired communication mechanism, for example a standard Ethernet connection or a universal serial bus (USB) connection. Using cellular infrastructure 132, Wi-Fi technology 134, Bluetooth technology 136, IR technology 138 and/or cable technology 140, acquisition equipment 106 can communicate bidirectionally with any or all of the electronic devices of MMR system 100b.
Geographic position mechanism 142 is any mechanism suitable for determining geographic location. For example, as is well known, geographic position mechanism 142 is a set of GPS satellites that provide position data to GPS receiver devices on Earth. In the exemplary embodiment shown in Fig. 1B, the GPS satellites supply position data to users of MMR system 100b through ISP's server 122, which is connected to network 128 and operates in combination with a GPS receiver (not shown). Alternatively, geographic position mechanism 142 is a set of cell towers (e.g., a subset of cellular infrastructure 132) that provides a triangulation mechanism, a cell tower identification (ID) mechanism and/or enhanced 911 service as a means of determining geographic location. Alternatively, geographic position mechanism 142 is provided by signal strength measurements from Wi-Fi access points or Bluetooth devices at known locations.
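The last alternative above, estimating location from signal strength measurements taken against transmitters at known locations, can be illustrated with a minimal sketch. The text does not say how the measurements are combined; the weighted-centroid rule below is a swapped-in, commonly used simplification, and the names `Beacon` and `estimate_position` are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Beacon:
    """A Wi-Fi access point or Bluetooth device at a known location (hypothetical record)."""
    lat: float
    lon: float
    rssi_dbm: float  # measured signal strength, e.g. -40 (strong) to -90 (weak)


def estimate_position(beacons):
    """Weighted centroid (an assumed method): stronger signals pull the estimate closer."""
    if not beacons:
        raise ValueError("at least one beacon is required")
    # Convert dBm to linear power so that every weight is positive.
    weights = [10 ** (b.rssi_dbm / 10.0) for b in beacons]
    total = sum(weights)
    lat = sum(w * b.lat for w, b in zip(weights, beacons)) / total
    lon = sum(w * b.lon for w, b in zip(weights, beacons)) / total
    return lat, lon
```

With two equally strong beacons the estimate falls midway between them; as one signal grows stronger, the estimate moves toward that beacon.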
In operation, acquisition equipment 106 serves as a client that is in the possession of MMR user 110. It hosts software applications that enable content-based retrieval operations and that connect acquisition equipment 106 to the infrastructure of MMR system 100b through cellular infrastructure 132, Wi-Fi technology 134, Bluetooth technology 136, IR technology 138 and/or cable technology 140. In addition, MMR computing machine 112 hosts software applications that perform operations such as, but not limited to, print capture operations, event capture operations (e.g., saving the edit history of a document), server operations (e.g., saving data and events on MMR computing machine 112 for later serving to others), or printer management operations (e.g., printer 116 can be set up to queue the data needed for MMR, such as document layout and multimedia clips). Network medium server 114 provides access to the data attached to a printed document, such as printed document 118 printed by MMR computing machine 112 belonging to MMR user 110. In doing so, a second medium, such as video or audio, is associated with a first medium, such as a paper document. More details of the software applications and/or mechanisms for forming the association of the second medium to the first medium are described below with reference to Figs. 2E, 3, 4 and 5.
Acquisition equipment
Figs. 2A, 2B, 2C and 2D illustrate exemplary acquisition equipment 106 in accordance with embodiments of the invention. More specifically, Fig. 2A shows acquisition equipment 106a, which is a camera cell phone. Fig. 2B shows acquisition equipment 106b, which is a PDA device. Fig. 2C shows acquisition equipment 106c, which is a computer peripheral device. One example of a computer peripheral device is any standard web camera. Fig. 2D shows acquisition equipment 106d, which is built into a computing device (e.g., MMR computing machine 112). For example, acquisition equipment 106d is a computer graphics card. Exemplary details of acquisition equipment 106 can be found with reference to Fig. 2E.
In the case of acquisition equipment 106a and 106b, acquisition equipment 106 can be in the possession of MMR user 110, and its physical location can be tracked by geographic position mechanism 142 or by the ID number of each cell tower in cellular infrastructure 132.
Referring now to Fig. 2E, a functional block diagram of one embodiment of acquisition equipment 106 in accordance with the invention is shown. Acquisition equipment 106 comprises processor 210, display 212, keypad 214, memory storage 216, wireless communication link 218, wire communication link 220, MMR software suite 222, acquisition equipment user interface (UI) 224, document fingerprint matching module 226, third party software module 228, and at least one of a variety of catch mechanisms 230. Exemplary catch mechanisms 230 include, but are not limited to, video camera 232, digital camera 234, voice recorder 236, electronic highlighter 238, laser instrument 240, GPS device 242 and RFID reader 244. Processor 210 is a central processing unit (CPU), such as, but not limited to, the Pentium microprocessor manufactured by Intel Corporation. Display 212 is any standard video display mechanism, such as those used in handheld electronic devices. More specifically, display 212 is, for example, any digital display, such as a liquid crystal display (LCD) or an organic light-emitting diode (OLED) display. Keypad 214 is any standard alphanumeric entry mechanism, such as the keypads used in standard computing devices and in handheld electronic devices such as cellular phones. Memory storage 216 is any volatile or non-volatile memory device, for example a hard disk drive or a random access memory (RAM) device, as is well known.
Wireless communication link 218 is a wireless data communication mechanism that provides direct point-to-point communication or wireless communication through an access point (not shown) and a LAN (e.g., IEEE 802.11 Wi-Fi or Bluetooth technology), as is well known. Wire communication link 220 is a wired data communication mechanism that provides direct communication, for example, through a standard Ethernet and/or USB connection.
MMR software suite 222 is the overall management software that performs MMR operations, such as merging one type of medium with a second type. More details of MMR software suite 222 can be found with reference to Fig. 4.
Acquisition equipment user interface (UI) 224 is the user interface for operating acquisition equipment 106. Through acquisition equipment UI 224, various menus are presented to MMR user 110 for the selection of functions. More specifically, the menus of acquisition equipment UI 224 allow MMR user 110 to manage tasks such as, but not limited to, interacting with paper documents, reading data from existing documents, writing data into existing documents, viewing and interacting with the augmented reality associated with those documents, and viewing and interacting with the augmented reality associated with documents displayed on his/her MMR computing machine 112.
Document fingerprint matching module 226 is a software module for extracting features from a text image captured by at least one catch mechanism 230 of acquisition equipment 106. Document fingerprint matching module 226 can also perform pattern matching between the captured image and a database of documents. At the most basic level, and according to one embodiment, document fingerprint matching module 226 determines the position of an image fragment within a larger page image, where that page image is selected from a very large collection of documents. Document fingerprint matching module 226 comprises routines or programs that receive captured data, extract a representation of the image from the captured data, perform fragment recognition and motion analysis, perform decision combination, and output a list of x-y positions within the pages where the input images are located. For example, document fingerprint matching module 226 can be an algorithm that combines horizontal and vertical features extracted from an image of a fragment of text, in order to identify the document and the section within the document from which the fragment was extracted. Once the features are extracted, a printed document index (not shown), located for example on MMR computing machine 112 or network medium server 114, is queried in order to identify the symbolic document. Under the control of acquisition equipment UI 224, document fingerprint matching module 226 has access to the printed document index. The printed document index is described in further detail with reference to MMR computing machine 112 of Fig. 3. Note that in an alternative embodiment, document fingerprint matching module 226 may be part of MMR computing machine 112 rather than located within acquisition equipment 106. In such an embodiment, acquisition equipment 106 sends the raw captured data to MMR computing machine 112 for image extraction, pattern matching, and document and location recognition. In another embodiment, document fingerprint matching module 226 performs only feature extraction and sends the extracted features to MMR computing machine 112 for pattern matching and recognition.
Third party software module 228 is representative of any third-party software module used to enhance any operation that may occur on acquisition equipment 106. Exemplary third-party software includes security software, image sensing software, image processing software and MMR database software.
As mentioned above, acquisition equipment 106 can include any number of catch mechanisms 230, examples of which will now be described.
Video camera 232 is a digital video recording device, such as those found in standard digital cameras or in some cellular phones.
Digital camera 234 is any standard digital camera device capable of capturing digital images.
Voice recorder 236 is any standard audio recording device (a microphone and associated hardware) capable of capturing audio signals and outputting them in digital form.
Electronic highlighter 238 is an electronic highlighter that provides the ability to scan, store and transfer printed text, barcodes and small images to a PC, laptop computer or PDA device. For example, electronic highlighter 238 is the Quicklink Pen handheld scanner from WizCom Technologies, which allows information to be stored on the pen or transferred directly to a computer application through a serial port, infrared communication or a USB adapter.
As is well known, laser instrument 240 is a light source that produces coherent, near-monochromatic light through stimulated emission. For example, laser instrument 240 is a standard laser diode, which is a semiconductor device that emits coherent light when forward biased. Associated with and included in laser instrument 240 is a detector that measures the amount of light reflected by the image at which laser instrument 240 is directed.
GPS device 242 is any portable GPS receiver device that supplies position data, such as digital latitude and longitude data. Examples of portable GPS device 242 are the NV-U70 portable satellite navigation system from Sony Corporation and the Magellan-brand RoadMate series, Meridian series and eXplorist series GPS devices from Thales North America, Inc. As is well known, GPS device 242 provides a way to determine the position of acquisition equipment 106 in real time, in part, by means of triangulation with respect to a plurality of geographic position mechanisms 142.
RFID reader 244 is a commercially available RFID tag reader system, such as the TI RFID system manufactured by Texas Instruments. An RFID tag is a wireless device for identifying unique items by means of radio waves. As is well known, an RFID tag consists of a microchip that is attached to an antenna and on which a unique digital identification number is stored.
In a particular embodiment, acquisition equipment 106 comprises processor 210, display 212, keypad 214, memory storage 216, wireless communication link 218, wire communication link 220, MMR software suite 222, acquisition equipment UI 224, document fingerprint matching module 226, third party software module 228, and at least one catch mechanism 230. In this configuration, acquisition equipment 106 is a full-function device. Alternatively, acquisition equipment 106 can have less functionality and can thus include a limited set of functional components. For example, MMR software suite 222 and document fingerprint matching module 226 can be located remotely, for example on MMR computing machine 112 or network medium server 114 of MMR system 100b, and accessed by acquisition equipment 106 through wireless communication link 218 or wire communication link 220.
The MMR computing machine
Referring now to Fig. 3, MMR computing machine 112 configured in accordance with an embodiment of the invention is shown. As can be seen, MMR computing machine 112 is connected to network medium server 114, which contains one or more multimedia (MM) files 336; to user's printer 116, which produces printed document 118; to document scanner 127; and to acquisition equipment 106, which includes acquisition equipment UI 224 and a first instance of document fingerprint matching module 226. The communication links between these components can be direct links or can pass through a network. In addition, document scanner 127 includes a second instance of the document fingerprint matching module, 226'.
MMR computing machine 112 of this exemplary embodiment comprises one or more source files 310, first source document (SD) browser 312, second SD browser 314, printer driver 316, printed document (PD) trapping module 318, document event database 320 storing PD index 322, event capturing module 324, document analysis device module 326, multimedia (MM) montage browser/editor module 328, printer driver for MM 330, document-to-video paper (DVP) print system 332, and video paper document 334.
Source file 310 is representative of any source file that is an electronic representation of a document (or a portion thereof). Exemplary source files 310 include hypertext markup language (HTML) files, Microsoft Word files, Microsoft PowerPoint files, simple text files, portable document format (PDF) files and the like, which are stored on the hard disk drive (or other suitable storage) of MMR computing machine 112.
First SD browser 312 and second SD browser 314 are either standalone PC applications or plug-ins for existing PC applications that provide access to the data that has been associated with source file 310. First and second SD browsers 312, 314 can be used to retrieve an original HTML file or MM clip for display on MMR computing machine 112.
As is well known, printer driver 316 is printer driver software that controls the communication link between applications and the page-description language or printer control language used by any particular printer. In particular, whenever a document such as printed document 118 is printed, printer driver 316 feeds printer 116 data that carries the correct control commands, such as those supplied by Ricoh Corporation for its printing devices. In one embodiment, printer driver 316 differs from conventional print drivers in that it automatically captures a representation of the x-y coordinates, font and point size of every character on every printed page. In other words, it captures information about the content of every document printed and feeds that data back to PD trapping module 318.
PD trapping module 318 is a software application that captures the printed representation of documents, so that the layout of characters and graphics on the printed pages can be retrieved. In addition, by using PD trapping module 318, the printed representation of a document is captured automatically, in real time, at the time of printing. More specifically, PD trapping module 318 is a software routine that captures the two-dimensional arrangement of text on the printed page and sends this information to PD index 322. In one embodiment, PD trapping module 318 operates by trapping the Windows text layout commands for every character on the printed page. The text layout commands indicate to the operating system (OS) the x-y position of every character on the printed page, as well as its font, point size and so on. In essence, PD trapping module 318 eavesdrops on the print data that is sent to printer 116. In the example shown, PD trapping module 318 is connected to the output of first SD browser 312 in order to capture data. Alternatively, the functions of PD trapping module 318 can be implemented directly within printer driver 316. In light of this disclosure, various configurations will be apparent.
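The kind of per-character layout record described above (x-y position, font and point size) can be pictured with a short sketch. This is an illustration only, not the trapping of Windows text layout commands itself: the `Glyph` record and the `group_into_words` helper are hypothetical, and the sketch assumes that characters on the same text line share an identical y value.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Glyph:
    """One captured character with its page position (hypothetical record)."""
    char: str
    x: float
    y: float
    font: str
    point_size: float


def _finish(glyphs):
    # A word is reported at the position of its first character.
    return ("".join(g.char for g in glyphs), glyphs[0].x, glyphs[0].y)


def group_into_words(glyphs):
    """Group glyphs into (word, x, y) tuples, reading top to bottom, left to right."""
    words, current = [], []
    for g in sorted(glyphs, key=lambda g: (g.y, g.x)):
        new_line = bool(current) and g.y != current[0].y
        if g.char.isspace() or new_line:
            if current:
                words.append(_finish(current))
                current = []
            if g.char.isspace():
                continue
        current.append(g)
    if current:
        words.append(_finish(current))
    return words
```

Word-level records of this kind are a convenient input for an index over the two-dimensional arrangement of text, since each word keeps the coordinates at which it was printed.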
According to one embodiment of the invention, document event database 320 is any standard database modified to store the relationships between printed documents and events. (Document event database 320 is further described below as the MMR database, with reference to Fig. 34A.) For example, document event database 320 stores bidirectional links from source files 310 (e.g., Word, HTML, PDF files) to the events that are associated with printed document 118. Exemplary events include the capture of multimedia clips on acquisition equipment 106 immediately after a Word document is printed, the addition of multimedia to a document using the client application of acquisition equipment 106, or annotations of multimedia clips. In addition, other events that are associated with source files 310 and that can be stored in document event database 320 include logging when a given source file 310 is opened, closed or removed; logging when a given source file 310 is in the active application on the desktop of MMR computing machine 112; logging the times and destinations of document "copy" and "move" operations; and logging the edit history of a given source file 310. Such events are captured by event capturing module 324 and stored in document event database 320. Document event database 320 is connected to receive the outputs of source files 310, event capturing module 324, PD trapping module 318 and scanner 127, and is also connected to acquisition equipment 106 to receive queries and data and to provide output.
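The bidirectional links described above, from a source file to its events and from an event back to its source file, can be sketched as a pair of in-memory maps. This is a minimal illustration under stated assumptions rather than the database design of the disclosure: the class `DocumentEventStore`, the `Event` fields and the event kind strings are all hypothetical.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    """A logged event (hypothetical fields)."""
    kind: str         # e.g. "opened", "printed", "clip_added"
    timestamp: float  # seconds since the epoch
    detail: str = ""


class DocumentEventStore:
    """Keeps links in both directions between source files and events."""

    def __init__(self):
        self._events_by_source = defaultdict(list)
        self._source_by_event = {}

    def record(self, source_path, event):
        # Store the forward link (file -> events) and the reverse link (event -> file).
        self._events_by_source[source_path].append(event)
        self._source_by_event[event] = source_path

    def events_for(self, source_path):
        """Return the events of one source file, oldest first."""
        events = self._events_by_source.get(source_path, [])
        return sorted(events, key=lambda e: e.timestamp)

    def source_of(self, event):
        """Return the source file an event belongs to, or None if unknown."""
        return self._source_by_event.get(event)
```

In this simplified form, two identical events recorded for different files would share one reverse link; a real database would give each event its own key.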
Document event database 320 also stores PD index 322. PD index 322 is a software application that maps features extracted from images of printed documents onto their symbolic forms (e.g., scanned image to Word). In one embodiment, PD trapping module 318 provides PD index 322 with the x-y position of every character on the printed page, as well as its font, point size and so on. PD index 322 is constructed when a given document is printed. However, all of the print data is captured and saved in PD index 322 in a manner that allows it to be queried at a later time. For example, if printed document 118 contains the word "garden" positioned physically on the page one line above the word "rose", PD index 322 supports such a query (i.e., the word "garden" above the word "rose"). PD index 322 contains a record of which documents, which pages and which positions within those pages contain the word "garden" above the word "rose". PD index 322 is thus organized to support feature-based or text-based queries. The contents of PD index 322, which are electronic representations of printed documents, are generated by using PD trapping module 318 during a print operation and/or by using document fingerprint matching module 226' of document scanner 127 during a scan operation. Other architecture and functionality of database 320 and PD index 322 are described below with reference to Figs. 34A-C, 35 and 36.
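The "garden" above "rose" query can be illustrated with a small sketch that turns a vertical word relationship into a plain text term, so that a text-based index can answer a geometric question. This is an assumption-laden illustration, not the index layout of the disclosure: the term format `upper|above|lower`, the input structure and the function names are hypothetical, and "above" is approximated as horizontal overlap between words on adjacent text lines.

```python
from collections import defaultdict


def build_index(pages):
    """Index word pairs where one word sits on the line directly above another.

    pages maps (doc_id, page_no) to a list of text lines, top to bottom;
    each line is a list of (word, x_start, x_end) tuples (assumed input format).
    """
    index = defaultdict(list)
    for (doc_id, page_no), lines in pages.items():
        for line_no, (upper, lower) in enumerate(zip(lines, lines[1:])):
            for word, x0, x1 in upper:
                for below, bx0, bx1 in lower:
                    if x0 < bx1 and bx0 < x1:  # horizontal extents overlap
                        term = f"{word.lower()}|above|{below.lower()}"
                        index[term].append((doc_id, page_no, line_no, x0))
    return index


def find_above(index, upper_word, lower_word):
    """Return (doc_id, page_no, line_no, x) hits for upper_word above lower_word."""
    return index.get(f"{upper_word.lower()}|above|{lower_word.lower()}", [])
```

Each hit names the document, the page and the position on the page, which matches the three levels of record that the paragraph above attributes to PD index 322.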
Event capturing module 324 is a software application that captures, on MMR computing machine 112, events that are associated with a given printed document 118 and/or source file 310. These events are captured during the life cycle of a given source file 310 and stored in document event database 320. In one specific example, event capturing module 324 is used to capture events that relate to an HTML file that is active in a browser of MMR computing machine 112, such as first SD browser 312. These events may include the time at which the HTML file was displayed on MMR computing machine 112, or the file names of other documents that were open while the HTML file was displayed or printed. This event information is useful, for example, if MMR user 110 wants to know (at a later time) which documents he/she was viewing or working on when the HTML file was displayed or printed. Exemplary events captured by event capturing module 324 include the document edit history; video from office meetings that occurred near the time when a given source file 310 was on the desktop (e.g., as captured by office's entrance 120); and telephone calls that occurred while a given source file 310 was open (e.g., as captured by office's entrance 120).
Exemplary functions of event capturing module 324 include: 1) tracking: tracking active files and applications; 2) keystroke capture: capturing keystrokes and associating them with the active application; 3) frame buffer capture and indexing: indexing each frame buffer image with the optical character recognition (OCR) result of the frame buffer data, so that a section of a printed document can be matched to the time at which it was displayed on the screen. Alternatively, text can be captured with a graphical display interface (GDI) shadow DLL that traps the text drawing commands issued by the PC operating system for the PC desktop. MMR user 110 can point acquisition equipment 106 at a document and determine when it was active on the desktop of MMR computing machine 112; and 4) reading history capture: the data of the frame buffer capture and indexing operation is linked with an analysis of the times at which documents were active on the desktop of his/her MMR computing machine 112, in order to track how long, and which parts of, a particular document were visible to MMR user 110. In doing so, correlation with other events, such as keystrokes or mouse movements, may be used to infer whether MMR user 110 was reading the document. The combination of document event database 320, PD index 322 and event capturing module 324 is implemented locally on MMR computing machine 112 or, alternatively, is implemented as a shared database. A local implementation requires less security than a shared implementation.
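The reading history capture function described above amounts to accumulating, per document section, the time during which that section was visible. A minimal sketch follows; it assumes that an upstream step (for example, the frame buffer indexing) already yields timestamped samples of what is on screen, and the sample format and the name `visible_seconds` are hypothetical.

```python
def visible_seconds(samples):
    """Total the time each (doc_id, section) pair was visible.

    samples is a list of (timestamp, doc_id, section) tuples sorted by time.
    Each sample describes what was on screen from its timestamp until the next
    sample; a doc_id of None means that no tracked document was visible.
    The final sample only closes the preceding interval.
    """
    totals = {}
    for (t0, doc, section), (t1, _, _) in zip(samples, samples[1:]):
        if doc is not None:
            key = (doc, section)
            totals[key] = totals.get(key, 0.0) + (t1 - t0)
    return totals
```

Visibility alone does not show that the user was reading; as the text notes, correlating these intervals with keystrokes or mouse movements would be a further step, which this sketch leaves out.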
Document analysis device module 326 is a software application that analyzes the source file 310 associated with each printed document 118 to locate useful objects in it, such as uniform resource locators (URLs), addresses, titles, authors, times, or phrases that denote a location, e.g., Hallidie Building. In doing so, it determines the positions of those objects in the printed version of source file 310. A receiving device can then use the output of document analysis device module 326 to augment the representation of document 118 with additional information and to improve the accuracy of pattern matching. In addition, in the case of a URL, for example, the receiving device can also take an action using the location, such as retrieving the web page associated with the URL. Document analysis device module 326 is coupled to receive source file 310 and provides its output to document fingerprint matching module 226. Although shown only as coupled to the document fingerprint matching module 226 of the acquisition equipment, the output of document analysis device module 326 can be coupled to all, or any number of, document fingerprint matching modules 226, wherever they are located. The output of document analysis device module 326 can also be stored in document event database 320 for later use.
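As an illustration of locating useful objects and their positions, the following sketch finds URLs in page text and records the page, line and column of each hit. It is an assumption-laden simplification: the regular expression, the function name `find_urls` and the use of plain text positions (rather than printed coordinates) are hypothetical.

```python
import re

# Deliberately simple URL pattern; real source files would need a more careful one.
URL_PATTERN = re.compile(r"https?://[^\s)>\]]+")

def find_urls(pages):
    """Return (url, page_number, line_number, column) for every URL found.

    pages: list of page texts, one string per printed page.
    """
    hits = []
    for page_no, text in enumerate(pages, start=1):
        for line_no, line in enumerate(text.splitlines(), start=1):
            for match in URL_PATTERN.finditer(line):
                url = match.group().rstrip(".,;")  # drop trailing punctuation
                hits.append((url, page_no, line_no, match.start()))
    return hits

pages = [
    "Intro\nSee http://example.com/a for details.",
    "Visit https://example.org.",
]
hits = find_urls(pages)
```

The same pattern of "match, then record where the match sits on the page" would apply to the other object types listed above, such as titles, authors or location phrases.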
MM montage browser/editor module 328 is a software application that provides authoring functions. MM montage browser/editor module 328 is a standalone software application or, alternatively, a plug-in that runs on a document browser (represented by the dashed line to the second SD browser 314). MM montage browser/editor module 328 displays multimedia files to the user and is connected to the network medium server to receive multimedia files 336. In addition, when MMR user 110 is authoring a document (e.g., attaching multimedia clips to a paper document), MM montage browser/editor module 328 is the support tool for that function. MM montage browser/editor module 328 is the application that displays metadata, such as information parsed from documents that were printed near the time when the multimedia was captured.
MM printer driver 330 provides the ability to author MMR documents. For example, MMR user 110 can highlight text in a UI generated by MM printer driver 330 and add to that text an action, such as retrieving multimedia data or executing some other process on network 128 or on MMR computing machine 112. The combination of MM printer driver 330 and DVP print system 332 provides an alternative output format that uses bar codes. This format does not necessarily require a content-based retrieval technique. MM printer driver 330 is a printer driver that supports video paper technology, i.e., video paper 334. MM printer driver 330 creates a paper representation that includes bar codes as the means of accessing multimedia. By contrast, printer driver 316 creates a paper representation that includes MMR technology as the means of accessing multimedia. The authoring technology embodied in the combination of MM montage browser/editor module 328 and SD browser 314 can create the same output format as SD browser 312, which makes it possible to create MMR documents that are ready for content-based retrieval. DVP print system 332 links any data in document event database 320 that is associated with a document to its printed representation, using either explicit or implicit bar codes. An implicit bar code is a pattern of text features that is used like a bar code.
Video paper 334 is a technology for presenting audio-visual information on a printable medium, such as paper. In video paper, bar codes serve as indices to digital content that is stored in, or accessible from, a computer. A user scans a bar code, and the system outputs a video clip or other multimedia content related to the text. Systems for printing audio or video paper exist, and these systems essentially provide a paper-based interface to multimedia information.
MM files 336 of network medium server 114 are representative of any collection of a variety of file types and file formats. For example, MM files 336 may be text source files, web pages, audio files, video files, audio/video files, and image files (e.g., digital photographs).
As described with reference to Fig. 1B, document scanner 127 is used to convert existing printed documents into MMR-ready documents. Continuing with reference to Fig. 3, document scanner 127 MMR-enables an existing document by applying the feature extraction operation of document fingerprint matching module 226' to each page of the scanned document. PD index 322 is then augmented with the results of the scanning and feature extraction operations, and the electronic representation of the scanned document is thereby stored in document event database 320. The information in PD index 322 can then be used to author MMR documents. Continuing with reference to Fig. 3, note that the software functions of MMR computing machine 112 are not confined to MMR computing machine 112. Alternatively, the software functions shown in Fig. 3 can be distributed in any user-defined configuration among MMR computing machine 112, network medium server 114, ISP's server 122 and acquisition equipment 106 of MMR system 100b. For example, source file 310, SD browser 312, SD browser 314, printer driver 316, PD trapping module 318, document event database 320, PD index 322, event capturing module 324, document analysis device module 326, MM montage browser/editor module 328, MM printer driver 330 and DVP print system 332 can reside entirely within acquisition equipment 106, thereby providing enhanced functionality for acquisition equipment 106.
The MMR software suite
Fig. 4 illustrates a set of software parts included in MMR software suite 222 according to an embodiment of the invention. It should be understood that all or some of MMR software suite 222 can be included in MMR computing machine 112, acquisition equipment 106, network medium server 114 and other servers. In addition, other embodiments of MMR software suite 222 may have any number of the illustrated parts, from one to all of them. The MMR software suite 222 of this example includes: multimedia note software 410, which includes text content-based searching part 412, image content-based searching part 414 and secret writing change parts 416; paper reading history log 418; online reading history log 420; collaborative document consulting parts 422; real-time informing parts 424; multimedia retrieval parts 426; desktop video reminder feature 428; webpage reminder feature 430; physics history log 432; complete form consulting parts 434; time transfer unit 436; position informing parts 438; PC creation parts 440; document production parts 442; acquisition equipment creation parts 444; unconscious upload component 446; documentation release searching part 448; PC document metadata parts 450; acquisition equipment UI parts 452; and specific area parts 454.
According to a specific embodiment, multimedia note software 410, in combination with the organization of document event database 320, forms the basic technology of MMR system 100b. More specifically, multimedia note software 410 manages multimedia notes on paper documents. For example, MMR user 110 points acquisition equipment 106 at any section of a paper document and then uses at least one of the catch mechanisms 230 of acquisition equipment 106 to add a note to that section. In a specific example, a lawyer dictates notes about a section of a contract (creating an audio file). The multimedia data (the audio file) is automatically attached to the original electronic version of the document. Subsequent printouts of the document optionally include an indication that those notes exist.
Text content-based searching part 412 is a software application that retrieves content-based information from text. For example, by using text content-based searching part 412, content is retrieved from a text fragment, the original document and the section within the document are identified, or other information linked to that fragment is identified. Text content-based searching part 412 can use OCR-based techniques. Alternatively, non-OCR-based techniques perform content-based retrieval from text using, for example, the two-dimensional arrangement of word lengths in a text fragment. One example of text content-based searching part 412 is an algorithm that combines horizontal and vertical features extracted from an image of a text fragment to identify the document, and the section within the document, from which the fragment was extracted. The horizontal and vertical features can be used serially, in parallel, or simultaneously. Such a non-OCR-based feature set provides a high-speed implementation and robustness in the presence of noise.
Image content-based searching part 414 is a software application that retrieves content-based information from images. Image content-based searching part 414 compares captured data with the images in database 320 to produce a list of possible image matches and associated confidence levels. In addition, each image match can have associated data or actions that are performed in response to user input. In one example, image content-based searching part 414 retrieves content based on, for example, a raster image (e.g., a map) by converting the image into a vector representation that is used to query an image database for images with the same arrangement of features. Alternative embodiments use the color content of an image or the geometric arrangement of objects in the image to look up matching images in a database.
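The idea of retrieving a document section from the two-dimensional arrangement of word lengths can be sketched as follows. This is only an illustration under strong assumptions: word lengths are taken from text strings rather than measured in an image, the vertical feature pairs words by their index within consecutive lines rather than by geometric overlap, and the names `features`, `build_index` and `rank` are hypothetical.

```python
from collections import Counter, defaultdict

def features(lines, n=3):
    """Turn lines of text into horizontal and vertical word-length features."""
    grid = [[len(word) for word in line.split()] for line in lines]
    feats = []
    for row in grid:                              # horizontal: n consecutive lengths
        for i in range(len(row) - n + 1):
            feats.append(("H",) + tuple(row[i:i + n]))
    for upper, lower in zip(grid, grid[1:]):      # vertical: stacked word pairs
        for a, b in zip(upper, lower):
            feats.append(("V", a, b))
    return feats

def build_index(sections):
    """Map each feature to the set of (document, section) keys that contain it."""
    index = defaultdict(set)
    for key, lines in sections.items():
        for feat in features(lines):
            index[feat].add(key)
    return index

def rank(index, patch_lines):
    """Vote for sections that share features with the captured patch."""
    votes = Counter()
    for feat in features(patch_lines):
        for key in index.get(feat, ()):
            votes[key] += 1
    return votes.most_common()

sections = {
    ("doc1", "s1"): ["the quick brown fox jumps", "over the lazy dog today"],
    ("doc2", "s1"): ["mixed media reality links paper", "to digital content and actions"],
}
index = build_index(sections)
ranking = rank(index, ["quick brown fox", "the lazy dog"])
```

Because each feature is stored as a plain searchable term, the lookup behaves like a text index query, and the vote counts give a ranked list of document and section hypotheses.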
Secret writing change parts 416 are a software application that performs steganographic modifications before printing. To make MMR applications operate better, digital information is added to text and images before they are printed. In an alternative embodiment, secret writing change parts 416 generate and store an MMR document that includes: 1) the original base content, such as text, audio or video information; and 2) additional content in any form, such as text, audio, video, Java applets, hypertext links, etc. Steganographic modifications can include embedding a watermark in a color or grayscale image, printing a dot pattern on the document background, or subtly changing the outlines of printed characters to encode digital information.
Paper reading history log 418 is the reading history log of paper documents. Paper reading history log 418 resides, for example, in document event database 320. It is based on a document-recognition-from-video technology developed by Ricoh Innovations, which is used to produce a history of the documents read by MMR user 110. Paper reading history log 418 is useful, for example, for reminding MMR user 110 of documents he/she has read and/or of any associated events.
Online reading history log 420 is the reading history log of online documents. It is based on an analysis of operating system events and resides, for example, in document event database 320. Online reading history log 420 is a record of the online documents that MMR user 110 has read and of which parts of those documents were read. Entries of online reading history log 420 can be printed on any subsequent printout in a number of ways, for example by providing notes at the bottom of each page, or by highlighting text in different colors according to the amount of time spent reading each passage. In addition, multimedia note software 410 can index this data into PD index 322. Optionally, online reading history log 420 can be assisted by an MMR computing machine 112 that is equipped with a device, such as a face detection system, that monitors MMR computing machine 112.
Collaborative document consulting parts 422 are a software application that allows more than one reader of different copies of the same paper document to consult the notes applied by other readers, by pointing his/her acquisition equipment 106 at any section of the document. For example, the notes can be displayed on acquisition equipment 106 as an overlay on a thumbnail of the document. Collaborative document consulting parts 422 can be implemented with, or can cooperate with, any type of existing collaboration software.
Real-time informing parts 424 are a software application that performs real-time notification of documents that are being read. For example, while MMR user 110 reads a document, his/her reading trace is posted on a blog or an online bulletin board. As a result, other people who are interested in the same topic can access and discuss the document.
Multimedia retrieval parts 426 are a software application that retrieves multimedia from an arbitrary paper document. For example, by pointing acquisition equipment 106 at a document, MMR user 110 can retrieve all the conversations that took place while that paper document was on the desk of MMR user 110. This assumes that the office of MMR user 110 contains an office's entrance 120 (or another suitable mechanism) that captures multimedia data.
Desktop video reminder feature 428 is a software application that reminds MMR user 110 of events that occurred on MMR computing machine 112. For example, by pointing acquisition equipment 106 at a section of a paper document, MMR user 110 can see a video clip that shows the changes on the desktop of MMR computing machine 112 that occurred while that section was visible. In addition, desktop video reminder feature 428 can be used to retrieve other multimedia recorded by MMR computing machine 112, such as audio that was present in the vicinity of MMR computing machine 112.
Webpage reminder feature 430 is a software application that reminds MMR user 110 of web pages viewed on his/her MMR computing machine 112. For example, by panning the camera of acquisition equipment 106 over a paper document, MMR user 110 can see a trace of the web pages that were viewed while the corresponding section of the document was displayed on the desktop of MMR computing machine 112. The web pages can be shown in a browser, such as SD browser 312 or 314, or on display 212 of acquisition equipment 106. Alternatively, the web pages are presented as raw URLs on display 212 of acquisition equipment 106 or on MMR computing machine 112.
Physics history log 432 resides, for example, in document event database 320. Physics history log 432 is the physical history log of paper documents. For example, MMR user 110 points his/her acquisition equipment 106 at a paper document and, by using the information stored in physics history log 432, can determine which other documents were adjacent to the document of interest at some time in the past. This operation can be facilitated by, for example, an RFID-like tracking system. In this case, acquisition equipment 106 includes RFID reader 244.
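A query of this kind against a physical history log can be sketched as below. The log layout is an assumption: each entry is taken to be one RFID scan that records a timestamp and the set of document identifiers detected together; the function name `adjacent_documents` and the sample data are hypothetical.

```python
from collections import defaultdict

def adjacent_documents(scans, target):
    """Return {other_document: [timestamps]} for scans that also saw `target`.

    scans: list of (timestamp, set_of_document_ids), one entry per RFID scan.
    """
    seen_with = defaultdict(list)
    for timestamp, doc_ids in scans:
        if target in doc_ids:
            for other in doc_ids - {target}:
                seen_with[other].append(timestamp)
    return dict(seen_with)

# Hypothetical scan log: documents detected together are treated as adjacent.
scans = [
    (100, {"contract", "invoice"}),
    (200, {"contract", "memo", "invoice"}),
    (300, {"memo"}),
]
neighbors = adjacent_documents(scans, "contract")
```

The timestamps let the user ask not only which documents were nearby but also when, which matches the "at some time in the past" wording of the description.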
Complete form consulting parts 434 are a software application that retrieves previously acquired information that was used to complete a form. For example, MMR user 110 points his/her acquisition equipment 106 at a blank form (e.g., a medical claim form printed from a website) and is provided with a history of previously entered information. The form is then filled in automatically by complete form consulting parts 434 with this previously entered information.
Time transfer unit 436 is a software application that retrieves source files for past and future versions of a document, and retrieves and displays a list of events associated with those versions. This operation compensates for the fact that the printed document in hand may have been generated from a version of the document that was created months after the most significant external events (e.g., discussions or meetings) associated with it.
Position informing parts 438 are a software application that manages location-aware paper documents. The management of location-aware paper documents is facilitated by, for example, an RFID-like tracking system. For example, acquisition equipment 106 captures a trace of the geographic location of MMR user 110 throughout the day and scans the RFID tags attached to documents or to folders that contain documents. The RFID scanning operation is performed by RFID reader 244 of acquisition equipment 106, which detects any RFID tags within its range. The geographic location of MMR user 110 can be tracked by the identifiers of the cell towers in cellular infrastructure 132 or, alternatively, via GPS device 242 of acquisition equipment 106 in combination with geographic position mechanism 142. Alternatively, document identification can be accomplished with an "always-on video" or with video camera 232 of acquisition equipment 106. The location data provides "geo-referenced" documents, which enables a map-based interface that shows where documents were located throughout the day. One application would be a lawyer who carries files on visits to remote clients. In an alternative embodiment, document 118 has a sensing mechanism attached to it that can sense when the document is moved and can perform some rudimentary face detection operation. The sensing function is provided by a set of gyroscopes or similar devices attached to the paper document. Based on the location information, MMR system 100b indicates when to "call" the owner's cellular phone to tell him/her that the document is moving. The cellular phone can add that document to its virtual briefcase. In addition, there is the concept of an "invisible" bar code, which is a machine-readable marking that is visible to video camera 232 or digital camera 234 of acquisition equipment 106 but is invisible or very faint to humans. Various inks and steganographic or printed-image watermarking techniques that can be decoded on acquisition equipment 106 can be considered for determining position.
PC creation parts 440 are a software application that performs authoring operations on a PC, such as MMR computing machine 112. PC creation parts 440 are provided as plug-ins to existing authoring applications, such as Microsoft Word, PowerPoint and web page authoring packages. PC creation parts 440 allow MMR user 110 to prepare paper documents that have links to events from his/her MMR computing machine 112 or to events in his/her environment; allow paper documents with links to be generated automatically, for example a printed document 118 that is automatically linked to the Word file from which it was generated; or allow MMR user 110 to retrieve a Word file and give it to someone else. A paper document that has links is referred to herein as an MMR document. More details of MMR documents are described with reference to Fig. 5.
Document production parts 442 are a software application that performs authoring operations on existing documents. Document production parts 442 can be implemented, for example, either as a personal edition or as an enterprise edition. In the personal edition, MMR user 110 scans documents and adds them to an MMR document database (e.g., document event database 320). In the enterprise edition, a publisher (or a third party) creates MMR documents from the original electronic source (or electronic galley proofs). This capability can be embedded in high-end publishing packages (e.g., Adobe Reader) and linked with a backend service provided by another entity.
Acquisition equipment creation parts 444 are a software application that performs authoring operations directly on acquisition equipment 106. Using acquisition equipment creation parts 444, MMR user 110 extracts key phrases from the paper document in his/her hand and stores them together with additional content captured on the fly, to create a temporary MMR document. In addition, by using acquisition equipment creation parts 444, MMR user 110 can return to his/her MMR computing machine 112, download the temporary MMR document that he/she created into an existing document application, such as PowerPoint, and then edit it into a final version of an MMR document or into another type of document for another application. In doing so, images and text can be inserted automatically into the pages of an existing document, such as the pages of a PowerPoint document.
Unconscious upload component 446 is a software application that uploads printed documents to acquisition equipment 106 unconsciously (automatically, without user intervention). Because acquisition equipment 106 is in the possession of MMR user 110 most of the time, including when MMR user 110 is at his/her MMR computing machine 112, printer driver 316, in addition to sending a document to printer 116, can also push the same document to storage device 216 of acquisition equipment 106 via wireless communication link 218 of acquisition equipment 106, in conjunction with Wi-Fi technology 134 or Bluetooth technology 136, or via a wired connection if acquisition equipment 106 is connected to or docked with MMR computing machine 112. In this way, MMR user 110 never forgets to pick up a document after printing it, because it is automatically uploaded to acquisition equipment 106.
Documentation release searching part 448 is a software application that retrieves past and future versions of a given source file 310. For example, MMR user 110 points acquisition equipment 106 at a printed document, and documentation release searching part 448 then locates the current source file 310 (e.g., a Word file) and other past and future versions of source file 310. In a specific embodiment, this operation uses Windows file tracking software that keeps track of the locations to which source file 310 is copied and moved. Other such file tracking software can be used here as well. For example, Google Desktop Search or the Microsoft Windows Search Companion can find the current version of a file with queries composed of words chosen from source file 310.
PC document metadata parts 450 are a software application that retrieves the metadata of a document. For example, MMR user 110 points acquisition equipment 106 at a printed document, and PC document metadata parts 450 determine who printed the document, when it was printed, where it was printed, and the file path of the given source file 310 at the time of printing.
Acquisition equipment UI parts 452 are a software application that manages the operation of the UI of acquisition equipment 106, which allows MMR user 110 to interact with paper documents. The combination of acquisition equipment UI parts 452 and acquisition equipment UI 224 allows MMR user 110 to read data from existing documents and write data to existing documents, to view and interact with the augmented reality associated with those documents (i.e., via acquisition equipment 106, MMR user 110 can see what happened when a document was created or edited), and to view and interact with the augmented reality associated with documents displayed on his/her acquisition equipment 106.
Specific area parts 454 are a software application that manages domain-specific functions. For example, in a music application, specific area parts 454 are a software application that matches music detected via, for example, phonographic recorder 236 of acquisition equipment 106 with a title, an artist or a composer. In this way, items of interest, such as sheet music or music CDs related to the detected music, can be presented to MMR user 110. Similarly, specific area parts 454 can be adapted to operate in a similar manner for video content, video games and any entertainment information. Specific area parts 454 can also be adapted to electronic versions of any mass media content.
Continuing with reference to Figs. 3 and 4, note that the software parts of MMR software suite 222 can reside fully or partially on one or more of MMR computing machine 112, network medium server 114, ISP's server 122 and acquisition equipment 106 of MMR system 100b. In other words, the operations of MMR system 100b, such as any operation performed by MMR software suite 222, can be distributed in any user-defined configuration among MMR computing machine 112, network medium server 114, ISP's server 122 and acquisition equipment 106 (or other such processing environments included in system 100b). In light of this disclosure, it will be apparent that some of the software parts of MMR software suite 222 can be combined to perform the basic functions of MMR system 100a/100b. For example, the basic functions of an embodiment of MMR system 100a/100b include:
Creating or augmenting an MMR document that includes a first medium part and a second medium part;
Using the first medium part (e.g., a paper document) of an MMR document to access information in the second medium part;
Using the first medium part (e.g., a paper document) of an MMR document to trigger or start a process in the electronic domain;
Using the first medium part (e.g., a paper document) of an MMR document to create or augment the second medium part;
Using the second medium part of an MMR document to create or augment the first medium part;
Using the second medium part of an MMR document to trigger or start a process in the electronic domain or a process related to the first medium part.
The MMR document
Fig. 5 illustrates a diagram of MMR document 500 according to an embodiment of the invention. More specifically, Fig. 5 illustrates an MMR document 500 that includes a representation 502 of a part of printed document 118, an action or second medium 504, an index or focus 506, and an electronic representation 508 of the whole document 118. Although MMR document 500 is typically stored in document event database 320, it can also be stored in the acquisition equipment or in any other device connected to network 128. In one embodiment, multiple MMR documents can correspond to one printed document. In another embodiment, the structure shown in Fig. 5 is replicated to create multiple focuses 506 in a single printed document. In a specific embodiment, MMR document 500 includes representation 502 and focus 506 with a page and a position within the page; second medium 504 and electronic representation 508 are optional and are therefore drawn with dashed lines. Note that second medium 504 and electronic representation 508 can be added later, after the MMR document has been created, if so desired. This basic embodiment can be used to locate a document or a specific position in the document that corresponds to the representation.
Representation 502 of a part of printed document 118 can be in any form (image, vector, pixels, text, code, etc.) that is suitable for pattern matching and that identifies at least one position in the document. Representation 502 preferably identifies a position in the printed document uniquely. In one embodiment, representation 502 is a text fingerprint, as shown in Fig. 5. Text fingerprint 502 is captured automatically via PD trapping module 318 during printing and stored in PD index 322. Alternatively, text fingerprint 502 is captured automatically via document fingerprint matching module 226' of document scanner 127 during a scan operation and stored in PD index 322. Representation 502 can alternatively be the whole document, a text fragment, a word if it is a unique instance in the document, a portion of an image, a unique attribute, or any other representation of a matchable part of the document. Action or second medium 504 is preferably a digital file or a data structure of any type. In the most basic embodiment, second medium 504 can be text to be presented or one or more commands to be executed. More typically, second medium 504 is a text file, audio file or video file related to the part of the document identified by representation 502. Second medium 504 may be a note, or a data structure or file that includes multiple different media types or multiple files of the same type. For example, second medium 504 can be text, a command, an image, a PDF file, a video file, an audio file, an application file (e.g., a spreadsheet or word processing document), etc. Index or focus 506 is the link between representation 502 and action or second medium 504. Focus 506 associates representation 502 with second medium 504. In one embodiment, index or focus 506 includes position information, such as x and y coordinates within the document. Focus 506 may be a point, a region, or even the whole document. In one embodiment, the focus is a data structure with a pointer to representation 502, a pointer to second medium 504, and a position within the document. It should be understood that MMR document 500 may have multiple focuses 506, in which case the data structure creates links between multiple representations, multiple second medium files, and multiple positions within printed document 118.
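The focus data structure described above can be pictured with a minimal sketch. The class names `Focus` and `MMRDocument`, the field names and the rectangular region are assumptions made for illustration; the description only requires a pointer to the representation, a pointer to the second medium, and a position within the document.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple

@dataclass
class Focus:
    representation: str                        # e.g. a text fingerprint (502)
    page: int
    region: Tuple[float, float, float, float]  # x, y, width, height on the page
    second_medium: Optional[str] = None        # file, URL or command (504); optional

@dataclass
class MMRDocument:
    document_id: str
    focuses: List[Focus] = field(default_factory=list)
    electronic_representation: Optional[str] = None  # whole document (508); optional

    def focuses_at(self, page, x, y):
        """Return the focuses on `page` whose region contains the point (x, y)."""
        found = []
        for f in self.focuses:
            fx, fy, width, height = f.region
            if f.page == page and fx <= x <= fx + width and fy <= y <= fy + height:
                found.append(f)
        return found

doc = MMRDocument("report.doc")
doc.focuses.append(Focus("4-3-7|5-2-6", page=1, region=(10, 20, 100, 15),
                         second_medium="clip.mp4"))
doc.focuses.append(Focus("6-2-2|3-8-4", page=2, region=(10, 40, 80, 15)))
hit = doc.focuses_at(1, 50, 25)
```

Leaving `second_medium` and `electronic_representation` optional mirrors the basic embodiment, in which both can be added after the MMR document has been created.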
In an alternative embodiment, MMR document 500 includes an electronic representation 508 of the whole document 118. This electronic representation can be used to determine the position of focus 506 and can also be used by a user interface to display the document on acquisition equipment 106 or MMR computing machine 112.
An exemplary use of MMR document 500 is as follows. A captured text fragment is identified via document fingerprint matching module 226 of acquisition equipment 106 by analyzing the text fingerprint or representation 502. For example, MMR user 110 points video camera 232 or digital camera 234 of his/her acquisition equipment 106 at printed document 118 and captures an image. Document fingerprint matching module 226 then analyzes the captured image to determine whether an associated entry exists in PD index 322. If a match is found, the existence of focus 506 is highlighted for MMR user 110 on display 212 of his/her acquisition equipment 106. For example, a word or phrase is highlighted, as shown in Fig. 5. Each focus 506 in printed document 118 serves as a link to other user-defined or predetermined data, such as one of the MM files 336 residing on network medium server 114. Access to the text fingerprints or representations 502 stored in PD index 322 allows electronic data to be added to any MMR document 500 or to any focus 506 within a document. As described with reference to Fig. 4, a paper document that includes at least one focus 506 (e.g., a link) is referred to as an MMR document 500.
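The lookup step in this use case can be sketched as follows, assuming that the captured image has already been reduced to a fingerprint string and that the PD index is a simple mapping from fingerprint to document and page. The function name `lookup`, the dictionary keys and the sample fingerprint are hypothetical.

```python
def lookup(pd_index, focuses, fingerprint):
    """Return the matched location and the focuses to highlight on the display."""
    location = pd_index.get(fingerprint)   # None when the index has no entry
    if location is None:
        return None, []
    doc_id, page = location
    matches = [f for f in focuses if f["doc_id"] == doc_id and f["page"] == page]
    return location, matches

# Hypothetical index entry and focuses for one printed document.
pd_index = {"4-3-7|5-2-6": ("report.doc", 2)}
focuses = [
    {"doc_id": "report.doc", "page": 2, "phrase": "Hallidie Building", "media": "clip.mp4"},
    {"doc_id": "report.doc", "page": 5, "phrase": "Summary", "media": "notes.txt"},
]
location, to_highlight = lookup(pd_index, focuses, "4-3-7|5-2-6")
```

An exact dictionary lookup is used here only for brevity; a real matcher would tolerate noise in the captured fingerprint and return ranked hypotheses.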
Continuing with reference to Figs. 1B, 2A through 2D, 3, 4 and 5, an exemplary operation of MMR system 100b is as follows. MMR user 110 or any other entity, such as a publishing house, opens a given source file 310 and initiates a print operation to produce a paper document, such as printed document 118. During the print operation, certain actions are performed automatically, such as: (1) capturing the print format automatically at the time of printing via PD trapping module 318 and passing it to acquisition equipment 106. The electronic representation 508 of the document is captured automatically at the time of printing by using PD trapping module 318, which is located, for example, at the output of SD browser 312. For example, MMR user 110 prints content from SD browser 312, and the content is filtered through PD trapping module 318. As discussed previously, the two-dimensional arrangement of the text on a page can be determined when the document is rendered for printing; (2) capturing the given source file 310 automatically at the time of printing via PD trapping module 318; (3) analyzing the print format and/or source file 310 via document analysis device module 326 in order to locate "named entities" or other interesting information that can populate the multimedia note interface on acquisition equipment 106. The named entities are, for example, "anchors" for adding multimedia later, i.e., automatically generated focuses 506. Document analysis device module 326 receives as input the source file 310 associated with a given printed document 118. Document analysis device module 326 is the application that identifies representations 502 for use with focuses 506 in document 118, such as titles, authors, times or locations, and thereby prompts for the information to be received on acquisition equipment 106; (4) indexing the print format and/or source file 310 automatically for content-based retrieval, i.e., building PD index 322; (5) making entries in document event database 320 for the document and for events associated with source file 310, such as the edit history and current location; and (6) performing an interactive session within printer driver 316 that allows MMR user 110 to add focuses 506 to the document before it is printed, and thereby form an MMR document 500. The associated data is stored on MMR computing machine 112 or uploaded to network medium server 114.
Exemplary alternate embodiments
MMR system 100 (100a or 100b) is not limited to the configurations shown in Figs. 1A-1B, 2A-2D and 3-5. The MMR software may be distributed in whole or in part between capture device 106 and MMR computer 112, and significantly fewer than all of the modules described above with reference to Figs. 3 and 4 are required. Multiple configurations are possible, including the following:
A first alternate embodiment of MMR system 100 includes capture device 106 and capture device software. The capture device software is capture device UI 224 and document fingerprint matching module 226 (e.g., as shown in Fig. 3). The capture device software is executed on capture device 106 or, alternatively, on an external server accessible to capture device 106, such as networked media server 114 or service provider server 122. In this embodiment, a networked service is available that supplies the data linked to the publications. A hierarchical recognition scheme may be used, in which a publication is identified first and the page and section within the publication are identified afterwards.
A second alternate embodiment of MMR system 100 includes capture device 106, capture device software and document use software. The second alternate embodiment includes software, such as that shown and described with reference to Fig. 4, that captures and indexes printed documents and links basic document events, such as the edit history of a document. This allows MMR user 110 to point his/her capture device 106 at any printed document and determine the name and location of the source file 310 that generated the document, as well as the time and place of printing.
A third alternate embodiment of MMR system 100 includes capture device 106, capture device software, document use software and event capture module 324. Event capture module 324 is added to MMR computer 112 and captures events that are associated with documents, such as the times when they were visible on the desktop of MMR computer 112 (determined by monitoring the GDI character generator), the URLs that were accessed while the documents were open, or the characters typed on the keyboard while the documents were open.
A fourth alternate embodiment of MMR system 100 includes capture device 106, capture device software and printer 116. In this fourth alternate embodiment, printer 116 is equipped with a Bluetooth transceiver or similar communication link that communicates with the capture device 106 of any MMR user 110 in close proximity. Whenever an MMR user 110 picks up a document from printer 116, printer 116 pushes the MMR data (document layout and multimedia clips) to that user's capture device 106. Printer 116 includes a keypad by which a user logs in and enters a code in order to obtain the multimedia data associated with a specific document. The document may include a printed representation of the code in its footer, which may be inserted by printer driver 316.
A fifth alternate embodiment of MMR system 100 includes capture device 106, capture device software and office portal 120. The office portal device is preferably a personalized version of office portal 120. Office portal 120 captures events in the office, such as conversations, conference/telephone calls and meetings. Office portal 120 identifies and tracks specific paper documents on the physical desktop. Office portal 120 additionally executes the document identification software (that is, document fingerprint matching module 226) and hosts document event database 320. This fifth alternate embodiment serves to offload computing workload from MMR computer 112 and provides a convenient way to package MMR system 100b as a consumer device (for example, MMR system 100b is sold as a hardware and software product running on a Mac Mini computer from Apple Computer).
A sixth alternate embodiment of MMR system 100 includes capture device 106, capture device software and networked media server 114. In this embodiment, the multimedia data resides on networked media server 114, such as a Comcast video-on-demand server. When MMR user 110 scans a patch of document text with his/her capture device 106, the resulting lookup command is transmitted either to the set-top box 126 associated with the cable TV service of MMR user 110 (wirelessly, over the Internet, or by calling set-top box 126 on the phone) or to the Comcast server. In both cases, the multimedia is streamed from the Comcast server to set-top box 126. System 100 knows where to send the data because MMR user 110 has previously registered his/her phone. Thus, capture device 106 can be used for access to and control of set-top box 126.
A seventh alternate embodiment of MMR system 100 includes capture device 106, capture device software, networked media server 114 and a location service. In this embodiment, the location-aware service discriminates between multiple destinations for the output from the Comcast system (or other suitable communication system). This function is performed either by automatically identifying the cellular tower ID or by providing a keypad interface that lets MMR user 110 choose the location where the data is to be displayed. Thus, while visiting another location, the user can access programming and other cable TV features provided by their cable operator, as long as that other location has cable access.
Document fingerprint matching ("image-based patch recognition")
As described earlier, document fingerprint matching involves uniquely identifying a portion, or "patch", of an MMR document. Referring to Fig. 6, document fingerprint matching module/system 610 receives a captured image 612. Document fingerprint matching system 610 then queries a collection of pages in document database 3400 (further described below with reference to, for example, Fig. 34A) and returns a list of the pages, and of the documents containing them, within which the captured image 612 is contained. Each result is an x-y location at which the captured input image 612 occurs. Those skilled in the art will note that database 3400 can be external to document fingerprint matching module 610 (e.g., as shown in Fig. 6) but can also be internal to it (e.g., as shown in Figs. 7, 11, 12, 14, 20, 24, 26, 28 and 30-32, where document fingerprint matching module 610 includes database 3400).
Fig. 7 shows a block diagram of document fingerprint matching system 610 according to an embodiment of the invention. Capture device 106 captures an image. The captured image is sent to quality assessment module 712, which effectively makes a preliminary judgment about the content of the captured image based on the needs and capabilities of downstream processing. For example, if the captured image is of such quality that it cannot be processed downstream in document fingerprint matching system 610, quality assessment module 712 causes capture device 106 to recapture the image at a higher resolution. Quality assessment module 712 may also detect many other relevant characteristics of the captured image, such as the sharpness of the text it contains, which indicates whether the captured image is "in focus". Quality assessment module 712 may further determine whether the captured image contains something that could be part of a document. For example, an image patch that contains a non-document image (e.g., a desk or an outdoor scene) indicates that the user is moving the view of capture device 106 to a new document.
Further, in one or more embodiments, quality assessment module 712 may perform text/non-text discrimination so as to pass through only images that are likely to contain recognizable text. Fig. 8 shows a flow process for text/non-text discrimination according to one or more embodiments. A number of columns of pixels are extracted from an input image patch at step 810. Typically, the input image is gray-scale, and each value in a column is an integer from 0 to 255 (for 8-bit pixels). At step 812, the local peaks in each column are detected. This can be done with the commonly understood "sliding window" method, in which a window of fixed length (e.g., N pixels) is slid over the column M pixels at a time, where M<N. At each step, the presence of a peak is determined by looking for a significant difference in gray-level values (e.g., greater than 40). If a peak is located at one position of the window, the detection of other peaks is suppressed whenever the sliding window overlaps that position. The gaps between successive peaks may also be detected at step 812. Step 812 is applied to a number C of columns in the image patch, and the gap values are accumulated in a histogram at step 814.
At step 816, the gap histogram is compared with other histograms derived from training data with known classifications, which are stored in database 818, and a decision about the category of the patch (either text or non-text) is output together with a measure of the confidence in that decision. The histogram classification at step 816 takes into account the typical appearance of a histogram derived from an image of text, namely that it contains two tight peaks, one centered on the distance between lines, with possibly one or two other, much smaller peaks at integer multiples higher in the histogram, away from those peaks. The classification may characterize the shape of the histogram with a measure of statistical variance, or it may compare the histogram one by one with stored prototypes using a distance measure such as a Hamming or Euclidean distance.
Referring now also to Fig. 9, an example of text/non-text discrimination is shown. An input image 910 is processed to sample a number of columns, a subset of which is indicated with dotted lines. The gray-level histogram for a typical column 912 is shown at 914. The Y values are gray levels in 910 and the X values are rows in 910. The gaps detected between peaks in the histogram are shown at 916. The histogram of gap values from all sampled columns is shown at 918. This example illustrates the shape of a histogram derived from a patch that contains text.
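By way of illustration only, the column sampling, peak detection and gap histogram of steps 810 to 816 might be sketched in Python as follows. The function names, the window parameters, the choice of the darkest pixel in a window as the "peak", and the prototype histograms are all assumptions made for this sketch and are not taken from the described embodiment.

```python
from collections import Counter


def dark_peaks(column, window=8, step=4, min_diff=40):
    """Rows of dark 'peaks' in one pixel column (steps 810-812, simplified).

    A window holds a peak when its gray-level range exceeds min_diff.
    Windows that still overlap the last accepted peak are skipped.
    """
    if not column:
        return []
    peaks = []
    last_start = max(0, len(column) - window)
    for start in range(0, last_start + 1, step):
        if peaks and start <= peaks[-1]:
            continue  # window overlaps the previous peak: suppress
        chunk = column[start:start + window]
        if max(chunk) - min(chunk) > min_diff:
            peaks.append(start + chunk.index(min(chunk)))
    return peaks


def gap_histogram(columns, **kwargs):
    """Accumulate gaps between successive peaks over all columns (step 814)."""
    hist = Counter()
    for column in columns:
        peaks = dark_peaks(column, **kwargs)
        hist.update(b - a for a, b in zip(peaks, peaks[1:]))
    return hist


def classify(hist, prototypes):
    """Nearest prototype by Euclidean distance (one option for step 816).

    prototypes maps a label to a normalized histogram {gap: fraction}.
    Returns (label, distance); a smaller distance means higher confidence.
    """
    total = sum(hist.values()) or 1
    norm = {gap: count / total for gap, count in hist.items()}

    def distance(proto):
        keys = set(norm) | set(proto)
        return sum((norm.get(k, 0) - proto.get(k, 0)) ** 2 for k in keys) ** 0.5

    label = min(prototypes, key=lambda name: distance(prototypes[name]))
    return label, distance(prototypes[label])


# Synthetic column: 8 light rows then 4 dark rows, repeated (line pitch 12).
column = ([255] * 8 + [0] * 4) * 5
hist = gap_histogram([column, column])
label, dist = classify(hist, {"text": {12: 1.0}, "non-text": {4: 0.5, 30: 0.5}})
```

In this synthetic case every gap equals the line pitch, so the histogram has a single tight peak and the patch is assigned to the hypothetical "text" prototype.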
A flow process for estimating the point size of text in an image patch is shown in Fig. 10. This flow process takes advantage of the fact that the blur in an image is inversely proportional to the distance of the capture device from the page. By estimating the amount of blur, the distance can be estimated, and that distance can be used to scale the size of objects in the image to known "normalized" heights. This behavior can be used to estimate the point size of text in a new image.
In a training stage 1010, an image of a patch of text in a known font and point size (referred to as a "calibration" image) is obtained at step 1012 with an image capture device at a known distance. The height of the text characters in that image, expressed as a number of pixels, is measured at step 1014. This may be done, for example, manually with an image annotation tool such as Microsoft Photo Editor. The blur in the calibration image is estimated at step 1016. This may be done, for example, with known measurements of the spectral cutoff of the two-dimensional fast Fourier transform. The blur may also be expressed in units as a number of pixels 1020.
When a "new" image is presented at step 1024, as in an MMR recognition system at run time, the image is processed at step 1026 to locate text with commonly understood methods of line segmentation and character segmentation that produce a bounding box around each character. The heights of those boxes may be expressed in pixels. At step 1028, the blur of the new image is estimated in a manner similar to step 1016. These measurements are combined at step 1030 to produce a first estimate 1032 of the point size of each character (or, equivalently, of each line). This may be done by computing the following expression: (calibration image blur size / new image blur size) * (new image text height / calibration image text height) * (calibration image font size in points). This scales the point size of the text in the calibration image to produce an estimated point size of the text in the input image patch. The same scaling function may be applied to the height of the bounding box of every character. This produces a decision for every character in the patch. For example, if the patch contains 50 characters, this procedure produces 50 votes for the point size of the font in the patch. A single estimate of the point size may then be derived from the median of the votes.
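The scaling expression of step 1030 and the median vote can be written out as a short Python sketch. The function and parameter names are illustrative assumptions; the blur values are taken to be in the same pixel units for both images, as in the description above.

```python
from statistics import median


def estimate_point_size(calib_blur, calib_height_px, calib_points,
                        new_blur, char_heights_px):
    """Median of per-character point-size votes (steps 1030-1032).

    vote = (calib_blur / new_blur)
           * (char_height / calib_height_px)
           * calib_points
    """
    if new_blur <= 0 or calib_height_px <= 0:
        raise ValueError("blur and calibration height must be positive")
    scale = calib_blur / new_blur
    votes = [scale * (h / calib_height_px) * calib_points
             for h in char_heights_px]
    return median(votes)


# Calibration: 12-point text, 20 px tall, blur of 2 px.
# New image: blur of 4 px, three character boxes of 18, 20 and 22 px.
size = estimate_point_size(2.0, 20, 12, 4.0, [18, 20, 22])
```

Using the median rather than the mean keeps a few badly segmented characters from shifting the estimate.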
Further, referring back more specifically to Fig. 7, in one or more embodiments the feedback from quality assessment module 712 to capture device 106 may be directed to a user interface (UI) of capture device 106. For example, the feedback may include an indication in the form of a sound or vibration that the captured image contains something that looks like text but is blurry and that the user should steady capture device 106. The feedback may also include commands that change parameters of the optics of capture device 106 to improve the quality of the captured image. For example, the focus, F-stop and/or exposure time may be adjusted to improve the quality of the captured image.
Further, the feedback from quality assessment module 712 to capture device 106 may be specialized according to the needs of the particular feature extraction algorithm being used. As described further below, feature extraction converts an image into a symbolic representation. In a recognition system that computes the lengths of words, it may be desirable for the optics of capture device 106 to blur the captured image. Those skilled in the art will note that such an adjustment may produce an image that, although perhaps not recognizable by a human or by an optical character recognition (OCR) process, is well suited to the feature extraction technique. Quality assessment module 712 may implement this by feeding back instructions to capture device 106 that cause it to defocus its lens and thereby produce blurry images.
The feedback process is modified by control structure 714. In general, control structure 714 receives data and symbolic information from the other components of document fingerprint matching system 610. Control structure 714 decides the order of execution of the various steps in document fingerprint matching system 610 and can optimize the computational load. Control structure 714 identifies the x-y position of received image patches. More specifically, control structure 714 receives information about the needs of the feature extraction process, the results of quality assessment module 712 and the parameters of capture device 106, and can change them as appropriate. This can be done dynamically on a frame-by-frame basis. In a system configuration that uses multiple feature extraction methods, one method might require blurry images of large patches of text, while another might need high-resolution, sharply focused images of paper grain. In such a case, control structure 714 may send commands to quality assessment module 712 instructing it to produce the appropriate image quality when text is in view. Quality assessment module 712 would interact with capture device 106 to produce the correct images (e.g., N blurry images of a large patch followed by M sharply focused, high-resolution images of paper grain). Control structure 714 would track the progress of those images through the processing pipeline to ensure that the corresponding feature extraction and classification are applied.
Image processing module 716 modifies the quality of the input images based on the needs of the recognition system. Examples of the types of image modification include sharpening, deskewing and binarization. Such algorithms include many tunable parameters, such as mask sizes, expected rotations and thresholds.
As shown in Fig. 7, document fingerprint matching system 610 uses feedback from feature extraction and classification modules 718, 720 (described below) to dynamically modify the parameters of image processing module 716. This is feasible because the user will typically point capture device 106 at the same location in a document for several seconds continuously. Given that capture device 106 processes, for example, 30 frames per second, the results of processing the first few frames in any sequence can affect how the frames captured later are processed.
Feature extraction module 718 converts a captured image into a symbolic representation. In one example, feature extraction module 718 locates words and computes their bounding boxes. In another example, feature extraction module 718 locates connected components and calculates descriptors for their shapes. Further, in one or more embodiments, document fingerprint matching system 610 shares metadata about the results of feature extraction with control structure 714 and uses that metadata to adjust the parameters of other system components. Those skilled in the art will note that this may significantly reduce computational requirements and improve accuracy by inhibiting the recognition of poor-quality data. For example, a feature extraction module 718 that identifies word bounding boxes could tell control structure 714 the number of lines and "words" it found. If the number of words is too high (indicating, for example, that the input image is fragmented), control structure 714 could instruct quality assessment module 712 to produce blurrier images. Quality assessment module 712 would then send the appropriate signal to capture device 106. Alternatively, control structure 714 could instruct image processing module 716 to apply a smoothing filter.
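As a minimal sketch of the feedback decision just described, the following Python function maps the line and word counts reported by a feature extractor to one of the two corrective actions. The action names, the words-per-line limit and the `can_refocus` flag are hypothetical and serve only to make the control flow concrete.

```python
def choose_feedback(num_lines, num_words, max_words_per_line=25,
                    can_refocus=True):
    """Pick a corrective action from feature-extraction metadata.

    Too many 'words' per line suggests a fragmented image, so either
    ask the capture device for a blurrier frame or smooth in software.
    """
    if num_lines <= 0:
        return "none"
    if num_words / num_lines > max_words_per_line:
        if can_refocus:
            return "request_blurrier_capture"
        return "apply_smoothing_filter"
    return "none"


action = choose_feedback(num_lines=4, num_words=200)
```

Because the decision depends only on per-frame metadata, it can be re-evaluated on every frame of a sequence.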
Classification module 720 converts a feature description from feature extraction module 718 into an identification of one or more pages within a document and of the x-y positions within those pages at which the input image patch occurs. The identification depends on feedback from database 3400, as described below. Further, in one or more embodiments, a confidence value may be associated with each decision. Document fingerprint matching system 610 may use such decisions to determine the parameters of other components in the system. For example, control structure 714 may determine that, if the confidences of the top two decisions are close to each other, the parameters of the image processing algorithms should be changed. This could result in increasing the range of sizes for a median filter and carrying its results downstream to the remaining components.
Further, as shown in Fig. 7, there may be feedback between classification module 720 and database 3400. Those skilled in the art will recall that database 3400 can also be external to module 610, as shown in Fig. 6. A decision about the identity of a patch can be used to query database 3400 for other patches that have a similar appearance. This compares the pristine image data of the patch stored in database 3400 with other images in database 3400, rather than comparing the input image patch with database 3400. This may provide an additional level of confirmation for the decision of classification module 720 and may allow some preprocessing of matching data.
The database comparison could also be performed on the symbolic representation of the patch rather than only on the image data. For example, the best decision might indicate that the image patch contains double-spaced 12-point Arial text. The database comparison could locate patches in other documents with a similar font, spacing and word layout using only textual metadata rather than image comparisons.
Database 3400 may support several types of content-based queries. Classification module 720 can pass a feature arrangement to database 3400 and receive a list of documents and the x-y locations at which that arrangement occurs. For example, the features might be trigrams (described below) of word lengths, arranged either horizontally or vertically. Database 3400 could be organized to return a list of results in response to either type of query. Classification module 720 or control structure 714 could combine those rankings to produce a single sorted list of decisions.
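The text does not specify how the rankings are combined. One simple possibility, shown here purely as an assumption, is to sum reciprocal ranks across the result lists so that a location returned by both the horizontal and the vertical query rises to the top. The (document, x, y) tuples are hypothetical.

```python
from collections import defaultdict


def combine_rankings(*rankings):
    """Merge ranked result lists into one list, best first.

    Each ranking is a list of hashable hits, best first. A hit scores
    1/rank in every list that contains it; scores are summed.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, hit in enumerate(ranking, start=1):
            scores[hit] += 1.0 / rank
    # Sort by descending score; break ties by the hit itself.
    return sorted(scores, key=lambda hit: (-scores[hit], hit))


horizontal = [(15, 100, 200), (20, 40, 60)]   # (document, x, y)
vertical = [(20, 40, 60), (7, 10, 10)]
merged = combine_rankings(horizontal, vertical)
```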
Further, there may be feedback between database 3400, classification module 720 and control structure 714. In addition to storing information sufficient to identify a location from a feature vector, database 3400 may store related information, including a pristine image of the document and a symbolic representation of its graphical components. This allows control structure 714 to modify the behavior of other system components on the fly. For example, if there are two plausible decisions for a given image patch, database 3400 could indicate that they can be disambiguated by zooming out and inspecting the area to the right for the presence of an image. Control structure 714 could send the appropriate message to capture device 106 instructing it to zoom out. Feature extraction module 718 and classification module 720 could then inspect the right side of the captured image for an image printed on the document.
Further, it is noted that database 3400 stores detailed information about the data surrounding an image patch, given that the patch is correctly located in a document. This may be used to trigger further hardware and software image analysis steps that are not anticipated in the prior art. In one case, that detailed information is provided by a print capture system that saves a detailed symbolic description of a document. In one or more other embodiments, similar information may be obtained by scanning a document.
Still referring to Fig. 7, position tracking module 724 receives information about the identity of an image patch from control structure 714. Position tracking module 724 uses that information to retrieve from database 3400 a copy of the entire document page or a data structure describing the document. The initial position is the anchor at which the position tracking process begins. Position tracking module 724 receives image data from capture device 106 when quality assessment module 712 decides that the captured image is suitable for tracking. Position tracking module 724 also has information about the time that has elapsed since the last frame was successfully recognized. Position tracking module 724 applies an optical flow technique that allows it to estimate the distance over the document that capture device 106 has moved between successive frames. Given the sampling rate of capture device 106, its target can be estimated even if the data it sees is not recognizable. The estimated position of capture device 106 may be confirmed by comparing its image data with the corresponding image data derived from the database document. A simple example computes the cross-correlation of the captured image with the expected image in database 3400.
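The cross-correlation check mentioned above can be illustrated with a small, dependency-free Python sketch. It uses the normalized (zero-mean) form of cross-correlation on two equally sized gray-level images given as nested lists; the acceptance threshold of 0.8 and the function names are assumptions made for this illustration.

```python
from math import sqrt


def normalized_cross_correlation(a, b):
    """Zero-mean normalized cross-correlation of two same-size images.

    Returns a value in [-1, 1]; 1 means identical up to gain and offset.
    """
    flat_a = [v for row in a for v in row]
    flat_b = [v for row in b for v in row]
    if not flat_a or len(flat_a) != len(flat_b):
        raise ValueError("images must be non-empty and the same size")
    mean_a = sum(flat_a) / len(flat_a)
    mean_b = sum(flat_b) / len(flat_b)
    num = sum((x - mean_a) * (y - mean_b) for x, y in zip(flat_a, flat_b))
    den = sqrt(sum((x - mean_a) ** 2 for x in flat_a)
               * sum((y - mean_b) ** 2 for y in flat_b))
    return num / den if den else 0.0


def position_confirmed(captured, expected, threshold=0.8):
    """Accept the tracked position when the two images correlate strongly."""
    return normalized_cross_correlation(captured, expected) >= threshold


expected = [[0, 255], [255, 0]]
same = normalized_cross_correlation(expected, expected)
inverted = normalized_cross_correlation(expected, [[255, 0], [0, 255]])
```

In practice the expected image would be the region of the stored page at the position predicted by the tracker, rescaled to the size of the captured frame.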
Thus, position tracking module 724 provides for the interactive use of database images to guide the progress of the position tracking algorithm. This allows electronic interactions to be attached to non-text objects such as graphics and images. Further, in one or more other embodiments, such attachment may be implemented without the image comparison/confirmation step described above. In other words, by estimating the instantaneous motion of capture device 106 over the page, the electronic link that should be in view can be estimated independently of the captured image.
Fig. 11 shows a document fingerprint matching technique according to an embodiment of the invention. The "feed-forward" technique shown in Fig. 11 processes each patch independently. It extracts features from an image patch that are used to locate one or more pages and the x-y locations on those pages at which the patch occurs. For example, in one or more embodiments, feature extraction for document fingerprint matching may depend on the horizontal and vertical grouping of features (e.g., words, characters, blocks) of a captured image. These groups of extracted features may then be used to look up the documents (and the patches within those documents) that contain the extracted features. OCR functionality may be used to identify horizontal word pairs in a captured image. Each identified horizontal word pair is then used to form a search query to database 3400 to determine all documents that contain the identified horizontal word pair and the x-y locations of the word pair in those documents. For example, for the horizontal word pair "the, cat", database 3400 may return (15, x, y) and (20, x, y), indicating that the horizontal word pair "the, cat" occurs in documents 15 and 20 at the indicated x-y locations. Similarly, for each vertically adjacent word pair, database 3400 is queried for all documents containing instances of the word pair and the x-y locations of the word pair in those documents. For example, for the vertically adjacent word pair "in, hat", database 3400 may return (15, x, y) and (7, x, y), indicating that the vertically adjacent word pair "in, hat" occurs in documents 15 and 7 at the indicated x-y locations. Then, using the document and location information returned by database 3400, a determination can be made as to which document exhibits the most location overlap among the various horizontal word pairs and vertically adjacent word pairs extracted from the captured image. This may result in identifying the document that contains the captured image, in response to which the presence of a hotspot and linked media may be determined.
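The lookup and overlap determination can be sketched in Python as follows. The in-memory dictionary standing in for database 3400, the "h"/"v" relation tags, the coordinates, and the use of a coarse grid cell to decide that two hits "overlap" are all assumptions of this sketch, not details of the described system.

```python
from collections import Counter

# Hypothetical index: (relation, word1, word2) -> [(document, x, y), ...]
INDEX = {
    ("h", "the", "cat"): [(15, 120, 300), (20, 40, 80)],
    ("v", "in", "hat"): [(15, 130, 320), (7, 500, 90)],
}


def best_location(pairs, index, cell=100):
    """Vote for (document, grid cell) over all word-pair hits.

    Hits that fall in the same cell of the same document are treated
    as overlapping. Returns ((document, cell_x, cell_y), votes) or None.
    """
    votes = Counter()
    for key in pairs:
        for doc, x, y in index.get(key, []):
            votes[(doc, x // cell, y // cell)] += 1
    if not votes:
        return None
    return votes.most_common(1)[0]


query = [("h", "the", "cat"), ("v", "in", "hat")]
winner = best_location(query, INDEX)
```

With the sample data, both word pairs land in the same region of document 15, which therefore receives two votes, while documents 20 and 7 receive one each.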
Fig. 12 shows another document fingerprint matching technique according to an embodiment of the invention. The "interactive image analysis" technique shown in Fig. 12 involves interaction between image processing and feature extraction that may occur before an image patch is recognized. For example, image processing module 716 may first estimate the blur in an input image. Feature extraction module 718 then calculates the distance from the page and the point size of the image text. Image processing module 716 may then perform a template matching step on the image using characteristics of fonts of that point size. Subsequently, feature extraction module 718 may extract character or word features from the result. Further, those skilled in the art will recognize that the fonts, point sizes and features may be constrained by the fonts in the documents of database 3400.
An example of the interactive image analysis described above with reference to Fig. 12 is shown in Fig. 13. An input image patch is processed at step 1310 to estimate the font and point size of the text in the image patch as well as its distance from the camera. Those skilled in the art will note that font estimation (that is, identification of candidates for the font of the text in the patch) may be performed with known techniques. Point size and distance estimation may be performed, for example, using the flow process described with reference to Fig. 10. Other techniques may also be used, such as known distance-from-focus methods that could readily be adapted to the capture device.
Still referring to Fig. 13, a line segmentation algorithm is applied at step 1312 to construct a bounding box around the lines of text in the patch. The height of each line image is normalized to a fixed size at step 1314 using known techniques such as proportional scaling. The identity of the font detected in the image, together with its point size, is passed 1324 to a collection of font prototypes 1322, where they are used to retrieve image prototypes for the characters in each named font.
Font database 1322 may be constructed from the font collection on the user's system that is used by the operating system and other software applications to print documents (e.g., TrueType, OpenType or raster fonts in Microsoft Windows). In one or more other embodiments, the font collection may be generated from pristine images of documents in database 3400. The XML files of database 3400 provide x-y bounding box coordinates that may be used to extract prototype images of characters from the pristine images. The XML file identifies exactly the name of the font and the point size of each character.
The character prototypes in the selected fonts are size-normalized at step 1320 based on a function of the parameters that were used at step 1314. Image classification at step 1316 may compare the size-normalized characters output at step 1320 with the output of step 1314 to produce a decision at each x-y location in the image patch. Known methods of image template matching may be used to produce output such as (c_i, x_i, y_i, w_i, h_i) for every character i, i=1...n, detected in the image patch, where c_i is the identity of the character, (x_i, y_i) is the upper left corner of its bounding box, and w_i and h_i are its width and height.
At step 1318, the geometric-relation-constrained database lookup can be performed as described above, but may be specialized in this case for pairs of characters instead of pairs of words. In such cases, "a-b" may indicate that characters a and b are horizontally adjacent; "a+b" may indicate that they are vertically adjacent; "a/b" may indicate that a is southwest of b; and "a\b" may indicate that a is southeast of b. The geometric relations may be derived from the x_i, y_i values of each pair of characters. MMR database 3400 may be organized so that it returns a list of document pages that contain character pairs instead of word pairs. The output at step 1326 is a list of candidates that match the input image, expressed as n-tuples (document_i, page_i, x_i, y_i, action_i, score_i) ranked by score.
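To make the character-pair relations concrete, the following Python sketch derives a relation key from two template-matching outputs of the form (c, x, y, w, h). The tolerance and reach parameters, and the rule that image y coordinates grow downward, are assumptions; the text only states that the relations are derived from the x-y values.

```python
def relation(a, b, tol=0.5, reach=2.0):
    """Return a lookup key such as 'a-b' for two character boxes, or None.

    Each box is (char, x, y, w, h) with (x, y) the upper left corner.
    'a-b': b is right of a on the same line; 'a+b': b is below a;
    'a/b': a is southwest of b; 'a\\b': a is southeast of b.
    """
    ca, xa, ya, wa, ha = a
    cb, xb, yb, wb, hb = b
    dx, dy = xb - xa, yb - ya  # image y grows downward
    if abs(dx) > reach * max(wa, wb) or abs(dy) > reach * max(ha, hb):
        return None  # too far apart to count as neighbours
    same_row = abs(dy) <= tol * max(ha, hb)
    same_col = abs(dx) <= tol * max(wa, wb)
    if same_row and dx > 0:
        return ca + "-" + cb
    if same_col and dy > 0:
        return ca + "+" + cb
    if not same_row and not same_col and dy < 0:
        # a lies below b: left of b is southwest, right of b is southeast
        return ca + ("/" if dx > 0 else "\\") + cb
    return None


key = relation(("a", 0, 0, 10, 12), ("b", 11, 0, 10, 12))
```

Each key produced this way could then serve as a query term in the same manner as the word pairs of the feed-forward technique.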
Fig. 14 shows another document fingerprint matching technique according to an embodiment of the invention. The "generate and test" technique shown in Fig. 14 processes each patch independently. It extracts features from an image patch that are used to locate a number of page images that could contain the given image patch. Further, in one or more embodiments, an additional extraction-classification step may be performed to rank the pages by the likelihood that they contain the image patch.
Still referring to the "generate and test" technique described above with reference to Fig. 14, features of a captured image may be extracted, and the document patches in database 3400 that contain the largest number of these extracted features may be identified. The first X document patches ("candidates") with the most matching features are then processed further. In this processing, the relative locations of the features in a matching document patch candidate are compared with the relative locations of the features in the query image. A score is computed based on this comparison. The highest score, corresponding to the best-matching document patch P, is then identified. If the highest score is greater than an adaptive threshold, document patch P is taken as the match for the query image. The threshold adapts to many parameters, including, for example, the number of features extracted. Database 3400 records where document patch P comes from, and the query image is therefore determined to come from the same location.
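A minimal Python sketch of the generate-and-test flow follows. Features are represented as named points, the geometric score counts feature pairs whose displacement agrees within a tolerance, and the threshold is a fixed fraction of the possible pairs in the query; all of these choices, and the sample data, are assumptions for illustration only.

```python
from itertools import combinations


def generate_and_test(query, patches, top_x=3, tol=5, frac=0.5):
    """Return the id of the best matching patch, or None.

    query and each patch map a feature name to its (x, y) location.
    Generate: keep the top_x patches sharing the most features.
    Test: count shared feature pairs with consistent relative offsets.
    """
    shared = {pid: set(query) & set(feats) for pid, feats in patches.items()}
    candidates = sorted(shared, key=lambda pid: -len(shared[pid]))[:top_x]
    best_id, best_score = None, -1
    for pid in candidates:
        feats = patches[pid]
        score = 0
        for f, g in combinations(sorted(shared[pid]), 2):
            q_dx, q_dy = query[g][0] - query[f][0], query[g][1] - query[f][1]
            p_dx, p_dy = feats[g][0] - feats[f][0], feats[g][1] - feats[f][1]
            if abs(q_dx - p_dx) <= tol and abs(q_dy - p_dy) <= tol:
                score += 1
        if score > best_score:
            best_id, best_score = pid, score
    n = len(query)
    threshold = frac * n * (n - 1) / 2  # adapts to the number of features
    return best_id if best_score > threshold else None


query = {"a": (0, 0), "b": (10, 0), "c": (0, 10)}
patches = {
    "P1": {"a": (100, 100), "b": (110, 100), "c": (100, 110)},
    "P2": {"a": (5, 5), "b": (50, 5), "c": (5, 90)},
    "P3": {"z": (0, 0)},
}
match = generate_and_test(query, patches)
```

Here P1 and P2 share the same three features with the query, but only P1 preserves their relative layout, so only P1 passes the test stage.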
Figure 15 illustrates an example of a word bounding box detection algorithm. The input image fragment 1510 is shown after image processing that corrects for rotation. Commonly known as a skew correction algorithm, this class of technique rotates a text image so that it aligns with the horizontal axis. The next step in the bounding box detection algorithm is the computation of the horizontal projection profile 1512. A threshold for line detection is selected 1516 by a known adaptive thresholding or sliding window algorithm, such that the regions "above threshold" correspond to lines of text. The regions within each line are extracted and processed in a similar fashion 1514 and 1518 to locate the above-threshold regions that indicate words within the line. An example of the bounding boxes detected in one line of text is shown at 1520.
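The projection-profile steps described for Figure 15 can be sketched as follows. This is a minimal illustration only, assuming a deskewed binary image given as rows of 0/1 pixels and fixed thresholds in place of the adaptive thresholding or sliding window selection mentioned above; the function names are hypothetical and do not come from the patent. A practical detector would also need a gap threshold to tell inter-character gaps from inter-word gaps.

```python
def horizontal_projection_profile(image):
    """Count ink pixels (value 1) in each row of a binary image."""
    return [sum(row) for row in image]


def runs_above_threshold(profile, threshold):
    """Return (start, end) index pairs, end exclusive, where profile > threshold."""
    runs, start = [], None
    for i, value in enumerate(profile):
        if value > threshold and start is None:
            start = i
        elif value <= threshold and start is not None:
            runs.append((start, i))
            start = None
    if start is not None:
        runs.append((start, len(profile)))
    return runs


def word_boxes(image, row_threshold=0, col_threshold=0):
    """Find text lines from row sums, then words from column sums in each line.

    Fixed thresholds are an assumption made for this sketch.
    """
    boxes = []
    row_profile = horizontal_projection_profile(image)
    for top, bottom in runs_above_threshold(row_profile, row_threshold):
        line = image[top:bottom]
        column_profile = [sum(col) for col in zip(*line)]
        for left, right in runs_above_threshold(column_profile, col_threshold):
            boxes.append((left, top, right - left, bottom - top))  # (x, y, w, h)
    return boxes


image = [
    [0, 0, 0, 0, 0, 0, 0, 0],
    [1, 1, 0, 0, 1, 1, 1, 0],
    [1, 1, 0, 0, 1, 0, 1, 0],
    [0, 0, 0, 0, 0, 0, 0, 0],
    [0, 1, 1, 1, 0, 0, 1, 1],
    [0, 0, 0, 0, 0, 0, 0, 0],
]
print(word_boxes(image))
```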
Various features can be extracted for comparison with the document fragment candidates. For example, scale-invariant feature transform (SIFT) features, corner features, salient points, ascenders and descenders, word boundaries, and spaces can be extracted for matching. One of the features that can be extracted reliably from document images is the word boundary. Once word boundaries have been extracted, they can be formed into groups as shown in Figure 16. In Figure 16, for example, vertical groups are formed such that a word boundary has overlapping word boundaries both above and below it, and the total number of overlapping word boundaries is at least 3 (note that in one or more other embodiments the minimum number of overlapping word boundaries may differ). For example, a first feature point (the second word box in the second line, length 6) has two word boundaries above it (lengths 5 and 7) and one word boundary below it (length 5). A second feature point (the fourth word box in the third line, length 5) has two word boundaries above it (lengths 4 and 5) and two word boundaries below it (lengths 8 and 7). Thus, as shown in Figure 16, each indicated feature is represented by the length of the middle word boundary, followed by the lengths of the word boundaries above it, and then the lengths of the word boundaries below it. Note also that the length of a word box can be based on any metric, so some word boxes may have alternative lengths. In such cases, features containing all or some of these alternatives can be extracted.
In addition, in one or more embodiments, features can be extracted in which spaces are represented by 0s and word regions are represented by 1s. An example is shown in Figure 17. The blocks on the right represent the word/space regions corresponding to the document fragment on the left.
The extracted features can be compared using various distance measures, including, for example, norms and Hamming distance. Alternatively, in one or more embodiments, a hash table can be used to identify document fragments that have the same features as the query image. Once such fragments have been identified, the angles from each feature point to the other feature points can be computed as shown in Figure 18. Alternatively, the angles between groups of feature points can be computed. 1802 shows the angles 1803, 1804 and 1805 computed from a triple of feature points. The computed angles can then be compared with the angles from each feature point to the other feature points in the query image. If any angles of matching points are similar, the similarity score can be increased. Alternatively, if groups of angles are used, and the groups of angles between similar groups of feature points in the two images are numerically similar, the similarity score is increased. Once the score between the query image and each retrieved document fragment has been computed, the document fragment with the highest score is selected and compared with an adaptive threshold to determine whether the match satisfies some predetermined criteria. If the criteria are satisfied, a matching document fragment is indicated as found.
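As an illustration of the angle comparison described for Figure 18, the sketch below computes the three interior angles of a triple of feature points and tests whether two triples are numerically similar. The function names and the 5-degree tolerance are assumptions made for this example, and the points of a triple are assumed to be distinct. The example also shows why angles are a convenient feature: they are unchanged when the fragment is captured at a different scale.

```python
import math


def triangle_angles(p1, p2, p3):
    """Interior angles in degrees at p1, p2 and p3 of the triangle they form."""
    def angle_at(a, b, c):
        v1 = (b[0] - a[0], b[1] - a[1])
        v2 = (c[0] - a[0], c[1] - a[1])
        dot = v1[0] * v2[0] + v1[1] * v2[1]
        cos_value = dot / (math.hypot(*v1) * math.hypot(*v2))
        # Clamp to guard against rounding slightly outside [-1, 1].
        return math.degrees(math.acos(max(-1.0, min(1.0, cos_value))))

    return (angle_at(p1, p2, p3), angle_at(p2, p1, p3), angle_at(p3, p1, p2))


def angles_similar(angles_a, angles_b, tolerance=5.0):
    """True if corresponding angles differ by at most `tolerance` degrees."""
    return all(abs(a - b) <= tolerance for a, b in zip(angles_a, angles_b))


database_triple = triangle_angles((0, 0), (4, 0), (0, 3))
query_triple = triangle_angles((0, 0), (8, 0), (0, 6))  # same layout at twice the scale
print(angles_similar(database_triple, query_triple))
```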
In addition, in one or more embodiments, the extracted features can be based on word length. Each word is divided into estimated letters based on the word height and width. As the lines above and below a given word are scanned, a binary value is assigned to each estimated letter according to the space information in the lines above and below it. The binary code is then represented as an integer. For example, Figure 19 shows an arrangement of word boxes, each representing a word detected in the captured image. Word 1910 is divided into estimated letters. The feature is described by (i) the length of word 1910, (ii) the text arrangement of the line above word 1910, and (iii) the text arrangement of the line below word 1910. The length of word 1910 is measured in estimated letters. The text arrangement information is extracted from a binary coding of the space information above or below each estimated letter. In word 1910, only the last estimated letter is above a space; the second and third estimated letters are below a space. Accordingly, the feature of word 1910 is coded as (6, 100111, 111110), where 0 denotes a space and 1 denotes no space. Rewritten in integer form, word 1910 is coded as (6, 39, 62).
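The coding of word 1910 can be reproduced with a few lines of code. The sketch below is illustrative only; the function name is hypothetical, and the per-letter space flags are assumed to have already been measured from the lines above and below the word.

```python
def encode_word_feature(length, above, below):
    """Encode a word as (length, above_code, below_code).

    `above` and `below` hold one flag per estimated letter:
    0 means a space lies above/below that letter, 1 means no space.
    """
    if len(above) != length or len(below) != length:
        raise ValueError("one flag per estimated letter is required")

    def to_int(bits):
        # Read the flags as a binary number, first letter as the most significant bit.
        return int("".join(str(bit) for bit in bits), 2)

    return (length, to_int(above), to_int(below))


# Word 1910: the second and third letters are below a space,
# and only the last letter is above a space.
print(encode_word_feature(6, [1, 0, 0, 1, 1, 1], [1, 1, 1, 1, 1, 0]))  # (6, 39, 62)
```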
Figure 20 illustrates another document fingerprint matching technique according to an embodiment of the invention. The "multiple classifiers" technique shown in Figure 20 exploits the complementary information of different feature descriptions by classifying them independently and combining the results. An example of this paradigm applied to text fragment matching is to extract the lengths of horizontally and vertically adjacent word pairs and to compute a separate ranking of the fragments in the database for each. More specifically, in one or more embodiments, the positions of features are determined by "classifiers" attendant to the classification module 720. The captured image is fingerprinted using a combination of classifiers that determine its horizontal and vertical features. This follows from the observation that an image of text contains two independent sources of information about its identity: in addition to the horizontal sequence of words, the vertical layout of the words can also be used to identify the document from which the image was extracted. For example, as shown in Figure 21, the captured image 2110 is classified by a horizontal classifier 2112 and a vertical classifier 2114. Each of the classifiers 2112, 2114 takes information from the database 3400 in addition to the captured image, and in turn outputs a ranking of the document pages to which its classification may apply. In other words, the multi-classifier technique shown in Figure 21 classifies the captured image independently using horizontal and vertical features. The ranked lists of document pages are then combined according to a combination algorithm 2118 (examples are described further below), which in turn outputs a ranked list of document pages based on both the horizontal and vertical features of the captured image 2110. In particular, in one or more embodiments, the separate rankings from the horizontal classifier 2112 and the vertical classifier 2114 are combined using information about how the detected features co-occur in the database 3400.
Referring now also to Figure 22, an example is shown of how vertical layout is combined with horizontal layout for feature extraction. In (a), a captured image 2200 with word segmentation is shown. From the captured image 2200, horizontal and vertical "n-grams" are determined. An "n-gram" is a sequence of n numbers, each describing a quantity of some characteristic. For example, a horizontal trigram specifies the number of characters in each word of a horizontal sequence of three words. For the captured image 2200, (b) shows the horizontal trigrams: 5-8-7 (the number of characters in each of the horizontally sequenced words "upper", "division" and "courses" in the first line of the captured image 2200); 7-3-5 (for the words "Project", "has" and "begun" in the second line); 3-5-3 (for the words "has", "begun" and "The" in the second line); 3-3-6 (for the words "461", "and" and "permit" in the third line); and 3-6-8 (for the words "and", "permit" and "projects" in the third line). A vertical trigram specifies the number of characters in each word of a vertical sequence of words above and below a given word. For the captured image 2200, (c) shows the vertical trigrams: 5-7-3 (the number of characters in each of the vertically sequenced words "upper", "Project" and "461"); 8-7-3 (for "division", "Project" and "461"); 8-3-3 (for "division", "has" and "and"); 8-3-6 (for "division", "has" and "permit"); 8-5-6 (for "division", "begun" and "permit"); 8-5-8 (for "division", "begun" and "projects"); 7-5-6 (for "courses", "begun" and "permit"); 7-5-8 (for "courses", "begun" and "projects"); 7-3-8 (for "courses", "The" and "projects"); 7-3-7 (for "Project", "461" and "student"); and 3-3-7 (for "has", "and" and "student").
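The horizontal trigrams listed for Figure 22 can be reproduced with a short sketch. It assumes, for illustration only, that the text of each line is available as a string; in the described system the character counts would instead be estimated from word bounding boxes in the captured image. The function name is hypothetical. Vertical trigrams are not computed here because they depend on the horizontal overlap of word boxes, which the text does not give as coordinates.

```python
def horizontal_ngrams(lines, n=3):
    """Return word-length n-grams for each run of n consecutive words in a line."""
    grams = []
    for line in lines:
        lengths = [len(word) for word in line.split()]
        for i in range(len(lengths) - n + 1):
            grams.append("-".join(str(length) for length in lengths[i:i + n]))
    return grams


lines = [
    "upper division courses",
    "Project has begun The",
    "461 and permit projects",
]
print(horizontal_ngrams(lines))
# ['5-8-7', '7-3-5', '3-5-3', '3-3-6', '3-6-8']
```

Passing n=4 or n=5 gives the longer n-grams that other embodiments may use.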
Based on the horizontal and vertical trigrams determined from the captured image 2200 shown in Figure 22, document lists (d) and (e) are generated, indicating the documents that contain each of the horizontal and vertical trigrams. For example, in (d), the horizontal trigram 7-3-5 occurs in documents 15, 22 and 134. Likewise, in (e), the vertical trigram 7-5-6 occurs in documents 15 and 17. Using the document lists (d) and (e), ranked lists of all the referenced documents are shown in (f) and (g), respectively. For example, in (f), document 15 is referenced by all five horizontal trigrams in (d), whereas document 9 is referenced by only one horizontal trigram in (d). Similarly, in (g), document 15 is referenced by all eleven vertical trigrams in (e), whereas document 18 is referenced by only one vertical trigram in (e). Referring now also to Figure 23, a technique is shown for combining the horizontal and vertical trigram information described with reference to Figure 22. Using information about the known physical positions of the trigrams on the original printed pages, the technique combines the lists of votes from the horizontal and vertical feature extraction. For each document common to the top M choices output by the horizontal and vertical classifiers, the position of each horizontal trigram that voted for the document is compared with the position of each vertical trigram that voted for that document. A document receives a number of votes equal to the number of horizontal trigrams that overlap any vertical trigram, where an "overlap" occurs when the bounding boxes of two trigrams overlap. In addition, the x-y positions of the centers of the overlaps are computed with a suitably modified version of the evidence accumulation algorithm described below with reference to 3406 of Figure 34A. For example, as shown in Figure 23, the lists in (a) and (b) (respectively (f) and (g) in Figure 22) are intersected to determine the list of pages (c) that are referenced by both horizontal and vertical trigrams. Using the intersected list (c), the lists (d) and (e) (which show only the intersected documents referenced by the identified trigrams), and the printed document database 3400, the overlaps within documents are determined. For example, document 6 is referenced by the horizontal trigram 3-5-3 and the vertical trigram 8-3-6, and in the captured image 2200 those two trigrams overlap on the word "has"; document 6 therefore receives one vote for this overlap. As shown in (f), for the particular captured image 2200, document 15 receives the largest number of votes and is thus identified as the document containing the captured image 2200. (x1, y1) is identified as the position of the input image within document 15. In summary, in the document fingerprint matching technique described above with reference to Figures 22 and 23, the horizontal classifier uses features derived from the horizontal arrangement of the words of the text, the vertical classifier uses features derived from the vertical arrangement of those words, and the results are combined based on the overlap of those features in the original document. Such feature extraction provides a mechanism for uniquely identifying documents because, while the horizontal aspect of the feature extraction is constrained by grammar and language, the vertical aspect is not subject to such constraints.
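The vote-combination rule of Figure 23 can be sketched as follows. This is a simplified illustration under stated assumptions: each classifier's output is represented as a dictionary from document identifier to the bounding boxes (x, y, width, height) of the trigrams that voted for that document on the original page, the restriction to the top M choices is omitted, and all names and coordinates are hypothetical.

```python
def boxes_overlap(a, b):
    """True if two (x, y, width, height) boxes share any area."""
    ax, ay, aw, ah = a
    bx, by, bw, bh = b
    return ax < bx + bw and bx < ax + aw and ay < by + bh and by < ay + ah


def combine_votes(horizontal_hits, vertical_hits):
    """Rank documents referenced by both classifiers.

    A document receives one vote for each horizontal trigram whose box
    overlaps the box of any vertical trigram that voted for the same document.
    """
    votes = {}
    for doc in horizontal_hits.keys() & vertical_hits.keys():
        votes[doc] = sum(
            1
            for h_box in horizontal_hits[doc]
            if any(boxes_overlap(h_box, v_box) for v_box in vertical_hits[doc])
        )
    return sorted(votes.items(), key=lambda item: (-item[1], item[0]))


# Hypothetical trigram positions; document 9 and document 17 are each
# referenced by only one classifier and therefore drop out at the intersection.
horizontal_hits = {15: [(0, 10, 100, 10), (30, 20, 90, 10)], 6: [(40, 10, 60, 10)], 9: [(0, 0, 50, 10)]}
vertical_hits = {15: [(20, 0, 30, 30)], 6: [(200, 0, 30, 30)], 17: [(0, 0, 10, 30)]}
print(combine_votes(horizontal_hits, vertical_hits))
```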
In addition, although the description with reference to Figures 22 and 23 is specific to the use of trigrams, any n-gram can be used for either or both of the horizontal and vertical feature extraction/classification. For example, in one or more embodiments, vertical and horizontal n-grams with n=4 can be used for multi-classifier feature extraction. In one or more other embodiments, the horizontal classifier can extract features based on n-grams with n=3, while the vertical classifier extracts features based on n-grams with n=5.
In addition, in one or more embodiments, classification can be based on adjacency relationships that are not strictly horizontal or vertical. For example, NW, SW, NE and SE adjacency relationships can be used for extraction/classification.
Figure 24 illustrates another document fingerprint matching technique according to an embodiment of the invention. The "database-driven feedback" technique shown in Figure 24 takes into account that the accuracy of a document image matching system can be improved by using images of the documents that could match the input to determine a subsequent step of image analysis, in which sub-images from the original documents are matched against the input image. The technique includes a transformation that replicates the noise present in the input image. This can be followed by a template matching analysis.
Figure 25 illustrates a flow process for database-driven feedback according to an embodiment of the invention. In steps 2510 and 2512, the input image fragment is first preprocessed and recognized as described above (for example, using word OCR and word-pair lookup, character OCR and character-pair lookup, or word bounding box configuration) to produce a number of candidates for the identification of the image fragment 2522. Each candidate in this list can contain the items (doci, pagei, xi, yi), where doci is the identifier of a document, pagei is a page within that document, and (xi, yi) are the x-y coordinates of the center of the image fragment within that page.
The original fragment retrieval algorithm at step 2514 normalizes the size of the entire input image fragment to a fixed size, optionally using knowledge of the distance from the page, to ensure that it is converted to a known spatial resolution, for example 100 dpi. The font size estimation algorithm described above can be adapted to this task. Similarly, known depth-from-focus or distance-from-focus techniques can be used. Size normalization can also scale the image fragment proportionally based on the heights of its word bounding boxes. The original fragment retrieval algorithm queries the MMR database 3400 with the identifier of each document and page it receives, together with the center of the bounding box of the fragment that the MMR database is to generate. The extent of the generated fragment depends on the size of the normalized input fragment. In this way, fragments of the same spatial resolution and dimensions can be obtained. For example, when normalized to 100 dpi, the input fragment may extend 50 pixels on each side of its center. In this case, the MMR database is instructed to generate a 100 dpi original fragment that is 100 pixels high and wide, centered at the specified x-y value.
Each original image fragment returned from the MMR database 2524 can be associated with the items (doci, pagei, xi, yi, widthi, heighti, actioni), where (doci, pagei, xi, yi) are as described above, widthi and heighti are the width and height of the original fragment in pixels, and actioni is an optional action associated with the corresponding region in the database entry for doci. The original fragment retrieval algorithm outputs 2518 this list of image fragments and data, together with the size-normalized input fragment it constructed.
In addition, in one or more embodiments, a fragment matching algorithm 2516 compares the size-normalized input fragment with each original fragment and assigns a score 2520 that measures how well they match each other. Those skilled in the art will recognize that, because of the mechanisms used to ensure that the fragments are of comparable size, a simple cross-correlation or Hamming distance is sufficient in many cases. This process may also include introducing noise into the original fragment that mimics the image noise detected in the input. The comparison may also be arbitrarily complex and may include a comparison of any feature set, including the OCR results of the two fragments and a ranking based on the number of characters, character pairs or word pairs, where the pairs are constrained by geometric relations as before. In that case, the number of geometric pairs shared between the input fragment and the original fragment can be estimated and used as the ranking metric.
In addition, the output 2520 can be in the form of n-tuples (doci, pagei, xi, yi, actioni, scorei), where the score is provided by the fragment matching algorithm and measures how well the input fragment matches the corresponding region of doci, pagei.
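A minimal sketch of the fragment matching step is shown below, using a Hamming-style comparison of equal-size binary fragments and producing n-tuples of the form described above. The function names, the toy 2x2 fragments and the action strings are assumptions made for illustration; the score here is the fraction of agreeing pixels, which is one simple way to turn a Hamming distance into a similarity.

```python
def hamming_score(input_fragment, original_fragment):
    """Fraction of pixels that agree between two equal-size binary fragments."""
    if len(input_fragment) != len(original_fragment) or any(
        len(a) != len(b) for a, b in zip(input_fragment, original_fragment)
    ):
        raise ValueError("fragments must have the same dimensions")
    total = agree = 0
    for row_a, row_b in zip(input_fragment, original_fragment):
        for p, q in zip(row_a, row_b):
            total += 1
            agree += p == q
    return agree / total


def rank_candidates(input_fragment, candidates):
    """Score each (doc, page, x, y, action, fragment) and sort best first."""
    scored = [
        (doc, page, x, y, action, hamming_score(input_fragment, fragment))
        for doc, page, x, y, action, fragment in candidates
    ]
    return sorted(scored, key=lambda entry: -entry[5])


query = [[1, 0], [0, 1]]
candidates = [
    (22, 3, 500, 700, None, [[1, 1], [0, 1]]),
    (15, 1, 120, 40, "open_url", [[1, 0], [0, 1]]),
]
print(rank_candidates(query, candidates))
```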
Figure 26 illustrates another document fingerprint matching technique according to an embodiment of the invention. The "database-driven classifier" technique shown in Figure 26 uses an initial classification to generate a set of hypotheses that may contain the input image. Those hypotheses are looked up in the database 3400, and a feature extraction plus classification strategy is designed automatically for them. An example is identifying an input fragment as containing either a Times font or an Arial font. In this case, the control structure 714 invokes a feature extractor and classifier specialized for serif/sans-serif discrimination.
Figure 27 illustrates a flow process for database-driven classification according to an embodiment of the invention. Following a first feature extraction 2710, the input image fragment is classified 2712 by any one or more of the recognition methods described above to produce a ranking of documents, pages, and x-y positions within those pages. Each candidate in this list can contain, for example, the items (doci, pagei, xi, yi), where doci is the identifier of a document, pagei is a page within that document, and (xi, yi) are the x-y coordinates of the center of the image fragment within that page. The original fragment retrieval algorithm 2714 described with reference to Figure 25 can be used to generate a fragment image for each candidate.
Still referring to Figure 27, a second feature extraction is applied to the original fragments 2716. This may differ from the first feature extraction and may include, for example, one or more of a font detection algorithm, a character recognition technique, bounding boxes, and SIFT features. The features detected in each original fragment are input to an automatic classifier design method 2720, which includes, for example, a neural network, a support vector machine and/or a nearest neighbor classifier designed to classify an unknown sample as one of the original fragments. The same second feature extraction can be applied 2718 to the input image fragment, and the features it detects are input to this newly designed classifier, which may be specialized for the original fragments.
The output 2714 may be in the form of n-tuples (doci, pagei, xi, yi, actioni, scorei), where the score is provided by the classification technique 2722 that was designed automatically by 2720. Those skilled in the art will appreciate that the score measures how well the input fragment matches the corresponding region of doci, pagei.
Figure 28 illustrates another document fingerprint matching technique according to an embodiment of the invention. The "database-driven multiple classifier" technique shown in Figure 28 reduces the chance of an unrecoverable error early in the recognition process by carrying multiple candidates throughout the decision process. Several initial classifications are performed. Each produces a different ranking of the input fragment, which can be discriminated by different feature extraction and classification. For example, one of those sets might be generated by horizontal n-grams and uniquely recognized by discriminating serif from sans-serif. Another might be generated by vertical n-grams and uniquely recognized by accurate calculation of line separation.
Figure 29 illustrates a flow process for database-driven multiple classification according to an embodiment of the invention. The flow is similar to that shown in Figure 27, but it uses multiple different feature extraction algorithms 2910 and 2912 to produce independent rankings of the input image fragment with classifiers 2914 and 2916. Examples of features and classification techniques include the horizontal and vertical word-length n-grams described above. Each classifier can produce a ranked list of fragment identifications that contains at least the items (doci, pagei, xi, yi, scorei) for each candidate, where doci is the identifier of a document, pagei is a page within that document, (xi, yi) are the x-y coordinates of the center of the image fragment within that page, and scorei measures how well the input fragment matches the corresponding position in the database document.
The original fragment retrieval algorithm described above with reference to Figure 25 can be used to produce a set of original image fragments corresponding to the entries in the lists of fragment identifications output by 2914 and 2916. As before, third and fourth feature extractions 2918 and 2920 can be applied to the original fragments, and classifiers can be designed automatically and applied as described above for Figure 27.
Still referring to Figure 29, the rankings produced by those classifiers are combined to produce a single ranking 2924 with entries (doci, pagei, xi, yi, actioni, scorei), for i=1... number of candidates, where the values in each entry are as described above. The ranking combination 2922 can be performed by, for example, the known Borda count measure, which assigns an item a score based on its common position in the two rankings. This can be combined with the scores assigned by the individual classifiers to produce a composite score. Those skilled in the art will note that other methods of ranking combination can also be used.
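The Borda count combination mentioned above can be sketched as follows. This is one common formulation, offered as an assumption rather than as the exact measure used in the described system: with N distinct candidates overall, a candidate in position p (counting from 0) of a ranking earns N - p points from that ranking, and a candidate missing from a ranking earns nothing from it. The function name and the candidate keys are hypothetical.

```python
def borda_combine(rankings):
    """Combine ranked lists (best first) into one list of (candidate, points)."""
    candidates = set().union(*rankings)
    n = len(candidates)
    points = {candidate: 0 for candidate in candidates}
    for ranking in rankings:
        for position, candidate in enumerate(ranking):
            points[candidate] += n - position
    # Highest total first; ties broken by candidate key for a stable result.
    return sorted(points.items(), key=lambda item: (-item[1], item[0]))


# Candidates are (document, page) pairs from two independent classifiers.
ranking_a = [("doc15", 1), ("doc6", 2), ("doc9", 1)]
ranking_b = [("doc15", 1), ("doc17", 3), ("doc6", 2)]
print(borda_combine([ranking_a, ranking_b]))
```

The per-classifier scores could be blended with these points to form the composite score the text describes; how to weight them is left open here.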
Figure 30 illustrates another document fingerprint matching technique according to an embodiment of the invention. The "video sequence image accumulation" technique shown in Figure 30 constructs an image by integrating data from nearby or adjacent frames. One example involves "super-resolution". It registers n temporally adjacent frames and uses knowledge of the point spread function of the lens to perform what is essentially sub-pixel edge enhancement. The effect is to increase the spatial resolution of the image. In addition, in one or more embodiments, the super-resolution method can be specialized to emphasize text-specific features such as holes, corners and dots. A further extension would use the characteristics of the candidate image fragments, as determined from the database 3400, to specialize the super-resolution integration function.
Figure 31 illustrates another document fingerprint matching technique according to an embodiment of the invention. The "video sequence feature accumulation" technique shown in Figure 31 accumulates features over a number of temporally adjacent frames before making a decision. This takes advantage of the high sampling rate of the capture device (for example, 30 frames per second) and the user's intention, which keeps the capture device pointed at the same point on the document for at least several seconds. Feature extraction is performed independently on each frame, and the results are combined to produce a single unified feature map. The combination process includes an implicit registration step. The need for this technique is readily apparent on inspection of video clips of text fragments: auto-focus and contrast adjustment in a typical capture device can produce markedly different results in adjacent video frames.
Figure 32 illustrates another document fingerprint matching technique according to an embodiment of the invention. The "video sequence decision combination" technique shown in Figure 32 combines decisions from a number of temporally adjacent frames. This takes advantage of the high sampling rate of a typical capture device and the user's intention, which keeps the capture device pointed at the same point on the document for at least several seconds. Each frame is processed independently and produces its own ranked list of decisions. Those decisions are combined to produce a single unified ranking for the set of input images. This technique includes an implicit registration method that controls the decision combination process. In one or more embodiments, one or more of the various document fingerprint matching techniques described above with reference to Figures 6 to 32 can be used in combination with one or more known matching techniques; such a combination is referred to herein as "multi-tier (or multi-factor) recognition". In general, in multi-tier recognition, a first matching technique is used to locate, in a document database, a set of pages meeting specific criteria, and a second matching technique is then used to uniquely identify a fragment from among the pages in that set.
Figure 33 illustrates an example of a flow process for multi-tier recognition according to an embodiment of the invention. First, in step 3310, the capture device 106 is used to capture/scan a "selection" feature on a document of interest. The selection feature can be any feature whose capture effectively results in the selection of a set of documents in a document database. For example, the selection feature can be a numeric-only bar code (for example, a universal product code (UPC)), an alphanumeric bar code (for example, code 39, code 93, code 128), or a two-dimensional bar code (for example, a QR code, PDF417, DataMatrix, MaxiCode). The selection feature can also be, for example, a graphic, an image, a trademark, a logo, a particular color or combination of colors, a keyword, or a phrase. In addition, in one or more embodiments, the selection feature can be limited to features suitable for recognition by the capture device 106.
In step 3312, once the selection feature has been captured in step 3310, a set of documents and/or document pages in the document database is selected based on an association with the captured selection feature. For example, if the captured selection feature is a company logo, all documents in the database indexed as containing that logo are selected. In another example, the database can contain a library of trademarks against which the captured selection image is compared. When there is a "hit" in the library, all documents associated with the hit trademark are selected for subsequent matching as described below. In addition, in one or more embodiments, the selection of documents/pages in step 3312 can depend on both the captured selection feature and the position of that selection feature on the scanned document. For example, information associated with the captured selection feature can specify whether the selection image is located in the upper right corner of the document as opposed to the lower left corner.
In addition, those skilled in the art will note that the determination that a particular captured image contains an image of a selection feature can be made by the capture device 106 or by some other component that receives raw image data from the capture device 106. For example, the database itself can determine that a particular captured image sent from the capture device 106 contains a selection feature, in response to which the database selects a set of documents associated with the captured selection feature.
In step 3314, after a particular set of documents has been selected in step 3312, the capture device 106 continues to scan and thereby capture images of the document of interest. The captured images of the document are then matched against the documents selected in step 3312 using one or more of the various document fingerprint matching techniques described with reference to Figures 6 to 32. For example, after a set of documents indexed as containing a shoe graphic as the selection feature has been selected in step 3312, based on the capture of a shoe graphic image on the document of interest in step 3310, subsequently captured images of the document of interest can be matched against the selected set of documents using the multiple classifiers technique described previously.
Thus, using an implementation of the multi-tier recognition flow process described with reference to Figure 33, fragment recognition times can be reduced by initially reducing the number of pages/documents against which subsequently captured images are matched. Further, a user can take advantage of such improved recognition times by first scanning the document at a location where there is an image, a bar code, a graphic, or another type of selection feature. By doing so, the user can quickly reduce the number of documents against which subsequently captured images are matched.
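The two tiers of Figure 33 can be pictured with the following sketch. It is illustrative only: the selection index, the feature identifier "logo:shoe", the document numbers, and the second-tier matcher (a simple count of shared trigram terms) are all assumptions standing in for whichever selection feature and fingerprint matching technique an embodiment actually uses.

```python
def select_documents(selection_index, feature_id):
    """First tier: documents indexed as containing the captured selection feature."""
    return set(selection_index.get(feature_id, ()))


def match_within(selected, document_features, query_features):
    """Second tier: best match among the selected documents only.

    The score is the number of feature terms shared with the query.
    """
    best_doc, best_score = None, 0
    for doc in sorted(selected):
        score = len(document_features[doc] & query_features)
        if score > best_score:
            best_doc, best_score = doc, score
    return best_doc, best_score


selection_index = {"logo:shoe": [3, 8]}
document_features = {
    3: {"5-8-7", "7-3-5"},
    8: {"3-3-6"},
    15: {"5-8-7", "7-3-5", "3-5-3"},  # never examined: not associated with the logo
}
query_features = {"5-8-7", "7-3-5", "3-5-3"}

selected = select_documents(selection_index, "logo:shoe")
print(match_within(selected, document_features, query_features))  # (3, 2)
```

Only two of the three documents are compared against the query, which is the source of the reduced recognition time described above.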
MMR Database System
Figure 34A illustrates a functional block diagram of an MMR database system 3400 configured in accordance with one embodiment of the present invention. The system 3400 is configured for content-based retrieval, in which two-dimensional geometric relationships between objects are represented in a way that enables look-up in a text-based index (or any other searchable index). The system 3400 employs evidence accumulation to improve look-up efficiency, for example by combining the frequency of occurrence of a feature with the likelihood of its location in a two-dimensional zone. In one particular embodiment, the database system 3400 is a detailed implementation of the document event database 320 (including the PD index 322), the contents of which include electronic representations of printed documents generated by the capture module 318 and/or the document fingerprint matching module 226 discussed above with reference to Figure 3. Other applications and configurations of the system 3400 will be apparent in light of this disclosure.
As can be seen, the database system 3400 includes an MMR index table module 3404 that receives a description computed by the MMR feature extraction module 3402, an evidence accumulation module 3406, and a relational database 3408 (or any other suitable storage facility). The index table module 3404 interrogates an index table that identifies the documents, the pages, and the x-y locations within those pages where each feature occurs. The index table can be generated, for example, by the MMR index table module 3404 or by some other dedicated module. The evidence accumulation module 3406 is programmed or otherwise configured to compute a ranked set of document, page, and location hypotheses 3410 given the data from the index table module 3404. The relational database 3408 can be used to store additional characteristics 3412 about each patch. These include, but are not limited to, 504 and 508 in Figure 5. By using the two-dimensional arrangement of text within a patch in deriving a signature or fingerprint (i.e., a unique search term) for the patch, the uniqueness of even a very small fragment of text is significantly increased. Other embodiments can similarly use any two-dimensional arrangement of objects/features within a patch in deriving a signature or fingerprint for the patch, and embodiments of the invention are not intended to be limited to two-dimensional arrangements of text for uniquely identifying patches. Other components and functions of the database system 3400 illustrated in Figure 34A include a feedback-directed feature search module 3418, a document rendering application module 3414, and a sub-image extraction module 3416. These components interact with the other components of the system 3400 to provide feedback-directed feature search and dynamic pristine image generation. In addition, the system 3400 includes an action processor 3413 that receives actions. An action determines what the database system 3400 does and the output it provides. Each of these other components will be explained in turn.
An example of an MMR feature extraction module 3402 that uses this two-dimensional arrangement of text within a patch is shown in Figure 34B. In one such embodiment, the MMR feature extraction module 3402 is programmed or otherwise configured to employ an OCR-based technique to extract features (text or other target features) from an image patch. In this particular embodiment, the feature extraction module 3402 extracts the x-y locations of the words in an image of a patch of text and represents those locations as the set of horizontally and vertically adjacent word pairs the patch contains. The image patch is effectively converted into word pairs that are joined by a "-" if the words are horizontally adjacent (e.g., the-cat, in-the, the-hat, and is-back) and by a "+" if they overlap vertically (e.g., the+in, cat+the, in+is, and the+back). The x-y locations can be, for example, pixel counts in the x and y directions measured from some fixed point in the document image (such as the upper left corner or the center of the document). Note that the horizontally adjacent pairs in this example may occur frequently in many other text passages, whereas the vertically overlapping pairs are likely to be rarer. Other geometric relationships between image features could be encoded similarly, such as SW-NE adjacency with a "/" between words, NW-SE adjacency with a "\", and so on. Likewise, "features" could be generalized to word bounding boxes (or other feature bounding boxes) that are encoded with arbitrary but consistent strings. For example, a bounding box that is four times as long as it is high, with a ragged upper contour but a smooth lower contour, could be represented by the string "4rusl". In addition, geometric relationships could be generalized to arbitrary angles and distances between features. For example, two words that both have the "4rusl" description and are NW-SE adjacent but separated by two word heights could be represented as "4rusl\\4rusl". Numerous encoding schemes will be apparent in light of this disclosure. Furthermore, note that numbers, Boolean values, geometric shapes, and other such document features could be used instead of word pairs to identify a patch.
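The word-pair encoding described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the patent's implementation: the `Word` tuple, the `word_pair_terms` function, the same-line test, the `max_gap` threshold, and the pixel coordinates in the usage note are all hypothetical, and the word boxes are assumed to come from an OCR step that is not shown.

```python
from typing import NamedTuple


class Word(NamedTuple):
    text: str
    x: float  # left edge of the bounding box
    y: float  # top edge of the bounding box
    w: float  # box width
    h: float  # box height


def word_pair_terms(words, max_gap=20.0):
    """Encode a patch as 'a-b' (horizontal neighbours) and 'a+b' (vertical overlap) terms."""
    terms = set()
    for a in words:
        for b in words:
            if a is b:
                continue
            # Assumed rule: same text line if the tops differ by less than half a word height.
            same_line = abs(a.y - b.y) < 0.5 * min(a.h, b.h)
            gap = b.x - (a.x + a.w)
            if same_line and 0 <= gap <= max_gap:
                terms.add(f"{a.text}-{b.text}")
            # Assumed rule: b is on the next line down and its x-extent overlaps a's.
            below = 0 < b.y - a.y <= 2.0 * a.h
            overlap = min(a.x + a.w, b.x + b.w) - max(a.x, b.x)
            if below and not same_line and overlap > 0:
                terms.add(f"{a.text}+{b.text}")
    return sorted(terms)


# Hypothetical layout of the three text lines "the cat" / "in the hat" / "is back".
patch = [
    Word("the", 0, 0, 30, 10), Word("cat", 40, 0, 30, 10),
    Word("in", 0, 20, 20, 10), Word("the", 30, 20, 30, 10), Word("hat", 70, 20, 30, 10),
    Word("is", 0, 40, 20, 10), Word("back", 30, 40, 40, 10),
]
terms = word_pair_terms(patch)
```

With this assumed layout, `terms` contains the four horizontal pairs (the-cat, in-the, the-hat, is-back) and the four vertical pairs (the+in, cat+the, in+is, the+back) named in the text.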
Figure 34C illustrates an example index table organization in accordance with one embodiment of the present invention. As can be seen, the MMR index table includes an inverted term index table 3422 and a document index table 3424. Each unique term or feature (e.g., key 3421) points to a location in the term index table 3422 that holds a functional value of the feature (e.g., key x), which in turn points to a list of records 3423 (e.g., Rec#1, Rec#2, etc.). Each record identifies a candidate region on a page within a document, as will be discussed in turn. In one example, the key and the functional value of the key (key x) are the same. In another example, a hash function is applied to the key, and the output of that function is key x.
Given a list of query terms, every record indexed by each key is examined, and the region most consistent with all of the query terms is identified. If that region has a sufficiently high matching score (e.g., relative to a predefined matching threshold), the hypothesis is confirmed. Otherwise, matching is declared to have failed and no region is returned. In this example embodiment, as previously described, the keys are word pairs separated by either a "-" or a "+" (e.g., "the-cat" or "cat+the"). Incorporating the geometric relationship into the key itself allows conventional text search technology to be used for a two-dimensional geometric query.
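The look-up just described can be sketched as a small voting scheme. This is an illustrative sketch under assumptions: the functions `build_index` and `best_region`, the grid-cell size used to group nearby records into one candidate region, and the threshold of 0.5 are all hypothetical choices, and the source does not specify how consistency between query terms is scored.

```python
from collections import Counter, defaultdict


def build_index(entries):
    """entries: iterable of (term, doc_id, page, x, y). Returns term -> list of records."""
    index = defaultdict(list)
    for term, doc_id, page, x, y in entries:
        index[term].append((doc_id, page, x, y))
    return index


def best_region(index, query_terms, cell=64, threshold=0.5):
    """Return the (doc_id, page, cell_x, cell_y) with the most votes, or None on failure."""
    votes = Counter()
    for term in query_terms:
        seen = set()
        for doc_id, page, x, y in index.get(term, ()):
            key = (doc_id, page, x // cell, y // cell)
            if key not in seen:  # each query term votes at most once per region
                seen.add(key)
                votes[key] += 1
    if not votes:
        return None
    region, count = votes.most_common(1)[0]
    # Assumed score: fraction of query terms that support the winning region.
    return region if count / len(query_terms) >= threshold else None


index = build_index([
    ("the-cat", "A", 1, 10, 10),
    ("cat+the", "A", 1, 20, 15),
    ("the-cat", "B", 3, 200, 300),
])
hit = best_region(index, ["the-cat", "cat+the", "in+is"])
miss = best_region(index, ["the-cat", "x-y", "y-z", "z+x"])
```

Here `hit` identifies page 1 of document "A" because two of the three query terms agree on the same cell, while `miss` is `None` because no region reaches the assumed threshold.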
Thus, the index table organization transforms the features detected in an image patch into textual terms that represent both the features themselves and the geometric relationships between them. This allows conventional text indexing and search methods to be used. For example, the vertically adjacent terms "cat" and "the" are represented by the symbol "cat+the", which can be referred to as a "query term", as will be apparent in light of this disclosure. The use of conventional text search data structures and methods makes it easier to graft the MMR techniques described herein onto Internet text search systems (e.g., Google, Yahoo, Microsoft, etc.).
In the inverted term index table 3422 of this example embodiment, each record identifies a candidate region on a page within a document using six parameters: the document identification (DocID), the page number (PG), the x/y offsets (X and Y, respectively), and the width and height of the rectangular zone (W and H, respectively). The DocID is a unique string generated from the timestamp (or other metadata) at the time a document is printed, but it can be any string, for example one combining a device ID and a person ID. In any case, documents are identified by unique DocIDs and have records stored in the document index table. The page number is the pagination corresponding to the paper output and starts at 1. A rectangular region is parameterized by the X-Y coordinates of its upper left corner and by the width and height of its bounding box in a normalized coordinate system. Numerous in-document location/coordinate schemes will be apparent in light of this disclosure, and the present invention is not intended to be limited to any particular one.
An example record structure configured in accordance with one embodiment of the present invention uses a 24-bit DocID and an 8-bit page number, allowing up to 16 million documents and 4 billion pages. One unsigned byte for each of the X and Y offsets of the bounding box provides a spatial resolution of 30 dpi horizontally and 23 dpi vertically (assuming an 8.5" × 11" page, although other page sizes and/or spatial resolutions can be used). Similar treatment of the width and height of the bounding box (e.g., one unsigned byte for each of W and H) allows representation of a region as small as a period or the dot on an "i", or as large as a full page (e.g., 8.5" × 11" or other). Therefore, eight bytes per record (3 bytes for DocID, 1 byte for PG, 1 byte for X, 1 byte for Y, 1 byte for W, and 1 byte for H, for a total of 8 bytes) can accommodate a large number of regions.
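The arithmetic behind these figures is that 2^24 is about 16.7 million DocIDs, 256 page values per document give roughly 4.3 billion pages, and 256 steps across 8.5" and 11" give about 30.1 and 23.3 steps per inch. The sketch below packs one such 8-byte record. The function names, the big-endian byte order, and the truncating quantization are assumptions made for illustration; the source specifies only the field widths.

```python
import struct

PAGE_W_IN, PAGE_H_IN = 8.5, 11.0  # assumed page size, as in the text


def _quantize(value_in, extent_in):
    """Map a length in inches to one unsigned byte (256 steps across the page)."""
    return min(255, int(value_in / extent_in * 256))


def pack_record(doc_id, page, x_in, y_in, w_in, h_in):
    """Pack DocID (3 bytes), PG, X, Y, W, H (1 byte each) into 8 bytes."""
    if not 0 <= doc_id < 1 << 24:
        raise ValueError("DocID must fit in 24 bits")
    if not 0 <= page < 256:
        raise ValueError("page number must fit in 8 bits")
    fields = struct.pack(
        "5B", page,
        _quantize(x_in, PAGE_W_IN), _quantize(y_in, PAGE_H_IN),
        _quantize(w_in, PAGE_W_IN), _quantize(h_in, PAGE_H_IN),
    )
    return doc_id.to_bytes(3, "big") + fields


def unpack_record(record):
    """Return (doc_id, page, x, y, w, h) with the offsets still in normalized units."""
    doc_id = int.from_bytes(record[:3], "big")
    page, x, y, w, h = struct.unpack("5B", record[3:])
    return doc_id, page, x, y, w, h


record = pack_record(123456, 2, 4.25, 5.5, 1.0, 0.5)
```

In this example the region at the center of the page is stored with X and Y both equal to 128, and the whole record occupies exactly eight bytes.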
The document index table 3424 contains the relevant information about each document. In one particular embodiment, this information includes the document-related fields in the XML file, including print resolution, print date, paper size, shadow file name, page image location, etc. Because print coordinates are converted to a normalized coordinate system when a document is indexed, computing search hypotheses does not involve this table. Thus, the document index table 3424 is consulted only for matched candidate regions. However, this design decision does imply some loss of information in the index, because the normalized coordinates are usually at a lower resolution than the print resolution. Alternative embodiments may use the document index table 3424 (or a higher resolution for the normalized coordinates) when computing search hypotheses, if so desired.
Thus, the index table module 3404 operates to effectively provide an image index that enables content-based retrieval of the objects (e.g., document pages) in which a given image query occurs and of the x-y locations within those objects. The combination of such an image index and the relational database 3408 allows the location of objects that match an image patch and of the characteristics of the patch (e.g., the "actions" attached to the patch, or bar codes that can be scanned to cause retrieval of other content related to the patch). The relational database 3408 also provides a means for "reverse links" from a patch to the features in the index table for other patches in the document. Reverse links provide a way to find the features a recognition algorithm would expect to see as it moves from one part of a document image to another, which can significantly improve the performance of the front-end image analysis algorithms in an MMR system as discussed herein.
Feedback-Directed Feature Search
The x-y coordinates of the image patch (e.g., the x-y coordinates of the center of the image patch), together with the identification of the document and page, can also be input to the feedback-directed feature search module 3418. The feedback-directed feature search module 3418 searches the term index table 3422 for records 3423 that occur within a given distance of the center of the image patch. This search can be facilitated, for example, by storing the records 3423 for each DocID-PG combination in contiguous blocks of memory, sorted in order of X or Y value. A look-up is performed by a binary search for a given value (X or Y, depending on how the data was sorted when stored), followed by a serial search from that location for all records with the given X and Y values. Typically, this would include the x-y coordinates in an M-inch ring around the outside of a patch that measures W inches wide and H inches high in the given document and page. Records that occur in this ring are located, and their keys or features 3421 are found by tracing back pointers. The list of features in the ring and their x-y locations is reported as shown at 3417 in Figure 34A. The values of W, H, and M shown at 3415 can be set dynamically by the recognition system based on the size of the input image, so that the features 3417 lie outside the input image patch.
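The binary-search-then-scan look-up can be sketched with Python's standard `bisect` module. This is a simplified sketch under assumptions: the function `features_in_ring` and the in-memory record layout `(x, y, term)` are hypothetical, the records for one DocID-PG block are assumed to be already sorted by x, and each record carries its term directly instead of being resolved through back pointers.

```python
from bisect import bisect_left, bisect_right


def features_in_ring(records, cx, cy, w, h, m):
    """Return (term, x, y) for records in the m-wide ring around a w-by-h patch at (cx, cy).

    records: list of (x, y, term) for one DocID-PG block, sorted by x.
    """
    xs = [r[0] for r in records]
    # Binary search bounds the x-range; the slice is then scanned serially.
    lo = bisect_left(xs, cx - w / 2 - m)
    hi = bisect_right(xs, cx + w / 2 + m)
    hits = []
    for x, y, term in records[lo:hi]:
        in_outer = abs(y - cy) <= h / 2 + m  # x is already bounded by the bisect
        in_patch = abs(x - cx) <= w / 2 and abs(y - cy) <= h / 2
        if in_outer and not in_patch:
            hits.append((term, x, y))
    return hits


block = [
    (1.0, 3.0, "far-left"),
    (2.6, 2.2, "blue-xylophone"),
    (3.0, 3.0, "the-cat"),
    (3.2, 3.8, "in+is"),
    (6.0, 3.0, "far-right"),
]
ring = features_in_ring(block, cx=3.0, cy=3.0, w=1.0, h=1.0, m=0.5)
```

In this example only "blue-xylophone" and "in+is" fall in the ring: "the-cat" lies inside the patch itself, and the other two records are outside the search window.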
This capability of the image database system 3400 is useful, for example, for disambiguating multiple hypotheses. If the database system 3400 reports that more than one document could match the input image patch, the features in the rings around the patches allow the recognition system (e.g., the fingerprint matching module 226 or another suitable recognition system) to decide which document best matches the document the user is holding, by directing the user to move the image capture device slightly in the direction that would disambiguate the decision. For example (assuming OCR-based features are used, although the concept extends to any geometrically indexed feature set), an image patch in document A might lie directly below the word pair "blue-xylophone", while the image patch in document B might lie directly below the word pair "blue-thunderbird". The database system 3400 would report the expected locations of these features, and the recognition system could instruct the user (e.g., through a user interface) to move the camera up by the amount indicated by the difference between the y coordinates of the features and the top of the patch. The recognition system could then compute the features in that difference area and use the features from documents A and B to determine which matches best. For example, the recognition system could post-process the OCR results from the difference area with a "dictionary" of features consisting of (xylophone, thunderbird). The word that best matches the OCR results corresponds to the document that best matches the input image. Examples of post-processing algorithms include commonly known spelling correction techniques (such as those used by word processors and email applications).
As this example illustrates, the design of the database system 3400 allows the recognition system to disambiguate multiple candidates efficiently, by matching feature descriptions in a way that avoids the need for further database accesses. An alternative solution would be to process each image independently.
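The dictionary post-processing step in the xylophone/thunderbird example can be sketched with the standard library's `difflib.SequenceMatcher`. This is only one possible similarity measure, chosen here for illustration; the function name `pick_document` and the sample OCR strings are hypothetical, and the source does not prescribe a particular spelling-correction algorithm.

```python
from difflib import SequenceMatcher


def pick_document(ocr_text, expected_words):
    """expected_words maps a document id to the word expected in the difference area.

    Returns the document whose expected word is most similar to the OCR output.
    """
    def similarity(word):
        return SequenceMatcher(None, ocr_text.lower(), word.lower()).ratio()

    return max(expected_words, key=lambda doc: similarity(expected_words[doc]))


dictionary = {"A": "xylophone", "B": "thunderbird"}
choice_1 = pick_document("xy1ophone", dictionary)    # OCR misread "l" as "1"
choice_2 = pick_document("thunderbjrd", dictionary)  # OCR misread "i" as "j"
```

Because the dictionary holds only the words reported for the candidate documents, a noisy OCR result is still enough to choose between them without another database access.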
Dynamic Pristine Image Generation
The x-y coordinates of the location of the image patch (e.g., the x-y coordinates of the center of the image patch), together with the identification of the document and page, can also be input to the relational database 3408, where they can be used to retrieve the stored electronic original of that document and page. That document can then be rendered as a bitmap image by the document rendering application module 3414. In addition, the sub-image extraction module 3416 uses an additional "box size" value provided by module 3414 to extract the portion of the bitmap around the center. This bitmap is a "pristine" representation of the expected appearance of the image patch, and it contains an exact representation of all the features that should be present in the input image. The pristine patch can then be returned as a patch characteristic 3412. This solution overcomes the excessive storage required by prior techniques that store image bitmaps, by instead storing a compact non-image representation that can subsequently be converted to bitmap data on demand.
Such a storage scheme is advantageous because it enables a hypothesize-and-test recognition strategy, in which a feature representation extracted from an image is used to retrieve a set of candidates that is then disambiguated by detailed feature analysis. Often it is not possible to predict which features will best disambiguate an arbitrary set of candidates, and it is desirable to determine this from the original images of those candidates. For example, an image of the word pair "the cat" could be located in two database documents, one of which was originally printed in a Times Roman font and the other in a Helvetica font. Simply determining which of these fonts the input image contains would identify the correctly matching database document. Comparing the pristine patches of those documents with the input image patch, using a template-matching comparison metric such as the Euclidean distance, would identify the correct candidate.
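A minimal sketch of the template-matching comparison follows, assuming that the input patch and the pristine patches have already been rendered at the same size and aligned, which the source does not detail. The function names and the tiny 2×2 grayscale "patches" are hypothetical stand-ins for real bitmaps.

```python
import math


def euclidean(patch_a, patch_b):
    """Euclidean distance between two equally sized grayscale patches (lists of rows)."""
    flat_a = [p for row in patch_a for p in row]
    flat_b = [p for row in patch_b for p in row]
    return math.dist(flat_a, flat_b)  # raises ValueError if the sizes differ


def best_candidate(input_patch, pristine_patches):
    """Return the name of the pristine patch closest to the input patch."""
    return min(pristine_patches, key=lambda name: euclidean(input_patch, pristine_patches[name]))


input_patch = [[0, 255], [255, 0]]
pristine = {
    "times_roman_doc": [[0, 250], [250, 10]],
    "helvetica_doc": [[255, 0], [0, 255]],
}
winner = best_candidate(input_patch, pristine)
```

The candidate with the smallest distance is taken as the match; a real system would likely also apply a maximum acceptable distance, which is omitted here.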
One example is a relational database 3408 that stores Microsoft Word ".doc" files. (A similar approach works for other document formats such as PostScript, PCL, PDF, or Microsoft's XML Paper Specification (XPS), or for other such formats that can be converted to a bitmap by a rendering application such as Ghostscript or, in the case of XPS, Microsoft's Internet Explorer with the WinFX components installed.) Given the identification of a document, page, x-y location, and box dimensions, together with a system parameter indicating that the preferred resolution is 600 dots per inch (dpi), the Word application can be invoked to generate a bitmap image. For an 8.5" × 11" page, this provides a bitmap with 6600 rows and 5100 columns. The additional parameters x=3", y=3", height=1", and width=1" indicate that the database should return a patch 600 pixels high and wide, centered at a point 1800 pixels in x and y from the upper left corner of the page.
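The pixel arithmetic in this example can be checked with a short sketch. The function `patch_box` and its return layout are hypothetical; only the numbers (600 dpi, an 8.5" × 11" page, a 1" × 1" box centered at x=3", y=3") come from the text.

```python
def patch_box(page_w_in, page_h_in, x_in, y_in, w_in, h_in, dpi):
    """Convert a patch request in inches into bitmap dimensions and a pixel crop box."""
    rows, cols = round(page_h_in * dpi), round(page_w_in * dpi)
    cx, cy = round(x_in * dpi), round(y_in * dpi)
    pw, ph = round(w_in * dpi), round(h_in * dpi)
    left, top = cx - pw // 2, cy - ph // 2
    return {
        "rows": rows,
        "cols": cols,
        "center": (cx, cy),
        "box": (left, top, left + pw, top + ph),  # (left, top, right, bottom)
    }


request = patch_box(8.5, 11, x_in=3, y_in=3, w_in=1, h_in=1, dpi=600)
```

For these inputs the page bitmap is 6600 rows by 5100 columns, the patch center is at pixel (1800, 1800), and the crop box spans pixels 1500 to 2100 in both directions.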
Multiple Databases
When multiple database systems 3400 are used, each of which may contain a different document collection, pristine patches can be used to determine whether two databases have returned the same document, or which database has returned the candidate that better matches the input.
If two databases return the same document, possibly with different identifiers 3410 (i.e., it is not apparent that the original documents are the same, because they were entered separately into different databases) and characteristics 3412, the pristine patches will be almost exactly the same. This can be determined by comparing the pristine patches with one another, for example using a Hamming distance that counts the number of pixels that differ. The Hamming distance will be zero if the original documents are identical pixel for pixel. It will be slightly greater than zero if the patches differ slightly, as might be caused by minor font differences. Such differences can cause a "halo" effect around the edges of characters when the image difference in the Hamming operator is computed. Font differences of this kind can be caused by different versions of the original rendering application, different versions of the operating system on the server that runs the database, different printer drivers, or different font collections.
The pristine patch comparison algorithm can be run on patches from more than one x-y location in the two documents. They should all be the same, but a sampling procedure like this provides redundancy that can overcome rendering differences between database systems. For example, one font might look radically different when rendered on the two systems, while another font might be exactly the same.
If two or more databases return different documents as their best match for the input image, the pristine patches can be compared with the input image using a pixel-based comparison metric such as the Hamming distance to determine which is correct.
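Both uses of the Hamming distance described above can be sketched on small binary bitmaps. The function names and the 3×3 example patches are hypothetical, and the sketch assumes the patches are already binarized and of identical size.

```python
def hamming(bitmap_a, bitmap_b):
    """Count the pixels that differ between two equally sized bitmaps (lists of rows)."""
    if len(bitmap_a) != len(bitmap_b) or any(
        len(ra) != len(rb) for ra, rb in zip(bitmap_a, bitmap_b)
    ):
        raise ValueError("bitmaps must have the same dimensions")
    return sum(pa != pb for ra, rb in zip(bitmap_a, bitmap_b) for pa, pb in zip(ra, rb))


def closest_database(input_patch, pristine_by_db):
    """Return the database whose pristine patch differs least from the input patch."""
    return min(pristine_by_db, key=lambda db: hamming(input_patch, pristine_by_db[db]))


patch_d1 = [[0, 1, 0], [1, 1, 1], [0, 1, 0]]
patch_d2 = [[0, 1, 0], [1, 1, 1], [0, 1, 1]]  # one-pixel "halo" difference from patch_d1
patch_d3 = [[1, 0, 1], [0, 0, 0], [1, 0, 1]]

same_document_distance = hamming(patch_d1, patch_d2)
best_db = closest_database(patch_d1, {"D2": patch_d2, "D3": patch_d3})
```

A small non-zero distance such as the one between `patch_d1` and `patch_d2` would suggest the same document rendered slightly differently, while the large distance to `patch_d3` would suggest a different document. Any cut-off separating the two cases is a tuning choice that the source does not specify.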
An alternative strategy for comparing the results from more than one database is to compare the contents of accumulator arrays that measure the geometric distribution of features in the documents reported by each database. It is desirable that this accumulator be provided directly by the database, to avoid the need for a separate look-up of the original feature set. This accumulator should also be independent of the contents of the database system 3400. In the embodiment shown in Figure 34A, an activity array 3420 is exported. Two activity arrays can be compared by measuring the internal distribution of their values.
In more detail, if two or more databases return the same document, possibly with different identifiers 3410 (i.e., it is not apparent that the original documents are the same, because they were entered separately into different databases) and characteristics 3412, the activity arrays 3420 from each database will be almost exactly the same. This can be determined by comparing the arrays with one another, for example using a Hamming distance that counts the number of array elements that differ. The Hamming distance will be zero if the original documents are exactly the same.
If two or more databases return different documents as their best match for the input features, their activity arrays 3420 can be compared to determine which document "best" matches the input image. An activity array that correctly matches an image patch will contain a cluster of high values approximately centered on the location where the patch occurs. An activity array that does not correctly match the image patch will contain randomly distributed values. There are many well-known strategies for measuring the dispersion or randomness of an image, such as entropy. Such algorithms can be applied to an activity array 3420 to obtain a measure that indicates the presence of a cluster. For example, the entropy of an activity array 3420 that contains a cluster corresponding to an image patch will differ significantly from the entropy of an activity array 3420 whose values are randomly distributed.
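One way to apply entropy to an activity array is sketched below. It assumes, as an illustrative choice not stated in the source, that the array is normalized so its values sum to one and is then treated as a probability distribution; under that assumption a tight cluster tends to give a lower Shannon entropy than values scattered across the array. The function name and the 4×4 example arrays are hypothetical.

```python
import math


def entropy(activity):
    """Shannon entropy (in bits) of an activity array normalized to sum to one."""
    values = [v for row in activity for v in row if v > 0]
    total = sum(values)
    if total == 0:
        return 0.0
    return -sum((v / total) * math.log2(v / total) for v in values)


# Hypothetical arrays: votes concentrated around one location vs. spread evenly.
clustered = [
    [0, 0, 0, 0],
    [0, 9, 8, 0],
    [0, 7, 9, 0],
    [0, 0, 0, 0],
]
scattered = [[1, 1, 1, 1] for _ in range(4)]

clustered_bits = entropy(clustered)
scattered_bits = entropy(scattered)
```

In this example the clustered array yields roughly 2 bits and the evenly scattered array yields 4 bits, so the lower value would point to the database whose match contains a cluster.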
Further, note that an individual client 106 might at any time have access to multiple databases 3400 whose contents do not necessarily conflict with one another. For example, a corporation might have both publicly accessible patches and patches private to the corporation, each of which refers to a single document. In such cases, a client device 106 would maintain a list of databases D1, D2, D3, ..., which are consulted in turn, and would combine the resulting activity arrays 3420 and identifiers 3410 into a unified display for the user. A given client device 106 might display the patches available from all of the databases, or allow the user to choose a subset of the databases (e.g., only D1, D3, and D7) and show only the patches from those databases. Databases might be added to the list by subscribing to a service, or be made available wirelessly when the client device 106 is in a certain location, or because the database is one of several that have been loaded onto the client device 106, or because a certain user has been authenticated as currently using the device, or even because the device is operating in a certain mode. For example, some databases might become available because a particular client device has its audio speaker turned on or off, or because a peripheral device such as a video projector is currently attached to the client.
Actions
With further reference to Figure 34A, the MMR database 3400 receives an action together with a set of features from the MMR feature extraction module 3402. An action specifies a command and its parameters. In such an embodiment, the command and its parameters determine the patch characteristics 3412 that are returned. Actions are received in a format, for example http, that can easily be translated into text.
The action processor 3413 receives the identification of a document, a page, and an x-y location within the page, as determined by the evidence accumulation module 3406. It also receives a command and its parameters. The action processor 3413 is programmed or otherwise configured to transform the command into instructions that either retrieve or store data, using the relational database 3408, at a location that corresponds to the given document, page, and x-y location.
In one such embodiment, the commands include: RETRIEVE, INSERT_TO <DATA>, RETRIEVE_TEXT <RADIUS>, TRANSFER <AMOUNT>, PURCHASE, PRISTINE_PATCH <RADIUS [DOCID PAGEID X Y DPI]>, and ACCESS_DATABASE <DBID>. Each will now be discussed in turn.
RETRIEVE: retrieve the data linked to the x-y location in the given document page. The action processor 3413 transforms the RETRIEVE command into a relational database query that retrieves data that might be stored near this x-y location. This can require issuing more than one database query in order to search the area surrounding the x-y location. The retrieved data is output as patch characteristics 3412. An example application of the RETRIEVE command is a multimedia browsing application that retrieves video clips or dynamic information objects (e.g., electronic addresses from which current information can be retrieved). The retrieved data can include menus that specify subsequent steps to be performed on the MMR device. It can also be static data that can be displayed on a phone (or other display device), such as JPEG images or video clips. Parameters that determine the area searched for patch characteristics can be supplied to the RETRIEVE command.
INSERT_TO <DATA>: insert <DATA> at the x-y location specified by the image patch. The action processor 3413 transforms the INSERT_TO command into an instruction for the relational database that adds the data at the specified x-y location. An acknowledgment of the successful completion of the INSERT_TO command is returned as patch characteristics 3412. An example application of the INSERT_TO command is a software application on the MMR device that allows a user to attach data to an arbitrary x-y location in a passage of text. The data can be static multimedia data such as JPEG images, video clips, or audio files, but it can also be arbitrary electronic data, such as menus that specify actions associated with the given location.
RETRIEVE_TEXT <RADIUS>: retrieve the text within <RADIUS> of the x-y location determined by the image patch. <RADIUS> can be specified, for example, as a number of pixels in image space, or as a number of characters or words around the x-y location determined by the evidence accumulation module 3406. <RADIUS> can also refer to parsed text objects. In this particular embodiment, the action processor 3413 transforms the RETRIEVE_TEXT command into a relational database query that retrieves the appropriate text. If <RADIUS> specifies parsed text objects, the action processor returns only parsed text objects. If no parsed text object is located near the specified x-y location, the action processor returns a null indication. In an alternative embodiment, the action processor calls the feedback-directed feature search module to retrieve the text that occurs within a radius of the given x-y location. The text string is returned as patch characteristics 3412. Optional data associated with each word in the text string includes its x-y bounding box in the original document. An example application of the RETRIEVE_TEXT command is choosing text phrases from a printed document for inclusion in another document. This could be used, for example, to compose a presentation file (e.g., in PowerPoint format) on the MMR system.
TRANSFER <AMOUNT>: retrieve the entire document and some of the data linked to it, in a form that could be loaded into another database. <AMOUNT> specifies the amount and type of data that is retrieved. If <AMOUNT> is ALL, the action processor 3413 issues a command to the database 3408 that retrieves all of the data associated with the document. Examples of such a command include DUMP or Unix TAR. If <AMOUNT> is SOURCE, the original source file of the document is retrieved. For example, this could retrieve the Word file for a printed document. If <AMOUNT> is BITMAP, a JPEG-compressed version (or other commonly used format) of the bitmap of the printed document is retrieved. If <AMOUNT> is PDF, the PDF representation of the document is retrieved. The retrieved data is output as patch characteristics 3412, in a format known to the calling application by virtue of the command name. An example application of the TRANSFER command is a "document grabber" that allows a user to transfer the PDF representation of a document to an MMR device by imaging a small area of text.
PURCHASE: retrieve a product specification linked to an x-y location in a document. The action processor 3413 first performs a series of one or more RETRIEVE commands to obtain the product specifications near the given x-y location. A product specification includes, for example, a vendor name, an identification of the product (e.g., a stock number), and an electronic address for the vendor. Product specifications are retrieved in preference to other data types that might be located nearby. For example, if a JPEG is stored at the x-y location determined by the image patch, the next closest product specification is retrieved instead. The retrieved product specification is output as patch characteristics 3412. An example application of the PURCHASE command is associated with advertising in a printed document. A software application on the MMR device receives the product specification associated with the advertisement and adds the user's personal identifying information (e.g., name, shipping address, credit card number, etc.) before sending it to the specified vendor at the specified electronic address.
PRISTINE_PATCH <RADIUS [DOCID PAGEID X Y DPI]>: retrieve an electronic representation of the specified document and extract an image patch centered at x-y with radius RADIUS. RADIUS can specify a circular radius, but it can also specify a rectangular patch (e.g., 2 inches high by 3 inches wide). It can also specify the entire document page. The (DocID, PG, x, y) information can be supplied explicitly as part of the action, or it can be derived from an image of a text patch. The action processor 3413 retrieves an original representation of the document from the relational database 3408. That representation can be a bitmap, but it can also be a renderable electronic document. The original representation is passed to the document rendering application 3414, where it is converted to a bitmap (at the resolution, in dots per inch, given by the parameter DPI), and is then provided to the sub-image extraction 3416, where the desired patch is extracted. The patch image is returned as patch characteristics 3412.
ACCESS_DATABASE <DBID>: add the database 3400 to the database list of the client 106. The client can now consult this database 3400 in addition to any existing databases currently in the list. DBID specifies either a file or a remote network reference to the specified database.
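The command set above can be sketched as a small text parser. The textual form used here (a command name followed by space-separated parameters) is purely an assumption for illustration, since the source says only that actions arrive in a format that can easily be translated into text. The function name `parse_action`, the table of minimum parameter counts, and the sample strings are hypothetical.

```python
# Minimum number of parameters per command, following the command list in the text.
# RETRIEVE and PURCHASE take no required parameter; PRISTINE_PATCH needs at least RADIUS.
COMMANDS = {
    "RETRIEVE": 0,
    "INSERT_TO": 1,
    "RETRIEVE_TEXT": 1,
    "TRANSFER": 1,
    "PURCHASE": 0,
    "PRISTINE_PATCH": 1,
    "ACCESS_DATABASE": 1,
}
TRANSFER_AMOUNTS = {"ALL", "SOURCE", "BITMAP", "PDF"}


def parse_action(text):
    """Split an action string into (command, parameters), validating against COMMANDS."""
    name, _, rest = text.strip().partition(" ")
    if name not in COMMANDS:
        raise ValueError(f"unknown command: {name!r}")
    params = rest.split()
    if len(params) < COMMANDS[name]:
        raise ValueError(f"{name} requires at least {COMMANDS[name]} parameter(s)")
    if name == "TRANSFER" and params[0] not in TRANSFER_AMOUNTS:
        raise ValueError(f"unsupported TRANSFER amount: {params[0]!r}")
    return name, params


transfer = parse_action("TRANSFER PDF")
pristine = parse_action("PRISTINE_PATCH 2 doc17 1 3 3 600")
```

A dispatcher built on this parser would then translate each command into the corresponding retrieve or store instruction for the relational database, which is outside the scope of this sketch.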
Index Table Generation Method
Figure 35 illustrates a method 3500 for generating an MMR index table in accordance with an embodiment of the present invention. The method can be carried out, for example, by the database system 3400 of Figure 34A. In one such embodiment, the MMR index table is generated from a scanned or printed document, for example by the MMR index table module 3404 (or some other dedicated module). The generating module can be implemented in software, hardware (e.g., gate-level logic), firmware (e.g., a microcontroller configured with embedded routines for carrying out the method), or some combination thereof, as can the other modules described herein.
The method includes receiving 3510 a paper document. The paper document can be any document, such as a memo with any number of pages (e.g., a work-related or personal letter), a product label (e.g., canned goods, medicine, a boxed electronic device), a product specification (e.g., snow blower, computer system, manufacturing system), a product brochure or advertising material (e.g., automobile, boat, vacation resort), service description material (e.g., Internet service provider, cleaning service), one or more pages of a book, magazine, or other such publication, pages printed from a website, handwritten notes, notes captured and printed from a whiteboard, or pages printed from any processing system (e.g., desktop or portable computer, camera, smartphone, remote terminal).
The method continues with generating 3512 an electronic representation of the paper document, the representation including the x-y locations of features shown in the document. The target features can be, for example, individual words, letters, and/or characters in the document. For example, if the original document is scanned, it is first OCR'd, and the words (or other target features) and their x-y locations are extracted (e.g., by operation of the document fingerprint matching module 226' of scanner 127). If the original document is printed, the indexing process receives a precise representation, in XML format, of the font, point size, and x-y bounding box of every character (or other target feature) (e.g., by operation of the print driver 316 of printer 116). In this case, index table generation begins at step 3514, because the electronic document is received with precisely identified x-y feature locations (e.g., from print driver 316). Formats other than XML will be apparent in light of this disclosure. Electronic documents such as Microsoft Word, Adobe Acrobat, and PostScript files can be entered into the database by "printing" them to a print driver whose output is directed to a file, so that paper need not be produced. This triggers generation of the XML file structure shown below. In all cases, the XML and the original document format (Word, Acrobat, PostScript, etc.) are assigned an identifier (doc i for the i-th document added to the database) and stored in the relational database 3408 in a way that allows them to be retrieved later by that identifier, and also by other "metadata" characteristics of the document, including the time it was captured, the date it was printed, the application that triggered the print, the name of the output file, and so on.
An example of the XML file structure is shown here:
$docID.xml:
<?xml version="1.0"?>
<doclayout ID="00001234">
  <setup>
    <url>file url/path or null if not known</url>
    <date>file printed date</date>
    <app>application that triggered print</app>
    <text>$docID.txt</text>
    <prfile>name of output file</prfile>
    <dpi>dpi of page for x, y coordinates, e.g. 600</dpi>
    <width>in inch, like 8.5</width>
    <height>in inch, e.g. 11.0</height>
    <imagescale>0.1 is 1/10th scale of dpi</imagescale>
  </setup>
  <page no="1">
    <image>$docID_1.jpeg</image>
    <sequence box="x y w h">
      <text>this string of text</text>
      <font>any font info</font>
      <word box="x y w h">
        <text>word text</text>
        <char box="x y w h">a</char>
        <char box="x y w h">b</char>
        <char>1 entry per char, in sequence</char>
      </word>
    </sequence>
  </page>
</doclayout>
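As an illustration of how such a layout file might be consumed by an indexing process, the following sketch reads the word bounding boxes from a small sample with Python's standard `xml.etree.ElementTree`. The sample values, the assumption that each `box` holds four integers in dpi units, and the function name `read_words` are all illustrative and are not specified by the patent.

```python
import xml.etree.ElementTree as ET

# A tiny hypothetical instance of the doclayout structure.
SAMPLE = """<?xml version="1.0"?>
<doclayout ID="00001234">
  <setup><dpi>600</dpi><width>8.5</width><height>11.0</height></setup>
  <page no="1">
    <sequence box="600 600 1500 120">
      <text>mixed media</text>
      <word box="600 600 700 120"><text>mixed</text></word>
      <word box="1400 600 700 120"><text>media</text></word>
    </sequence>
  </page>
</doclayout>"""


def read_words(xml_text):
    """Return (dpi, [(page_no, word, (x, y, w, h)), ...]) from a doclayout file."""
    root = ET.fromstring(xml_text)
    dpi = int(root.findtext("setup/dpi"))
    words = []
    for page in root.iter("page"):
        page_no = int(page.get("no"))
        for word in page.iter("word"):
            # Assumption: "x y w h" are whitespace-separated integers.
            box = tuple(int(v) for v in word.get("box").split())
            words.append((page_no, word.findtext("text"), box))
    return dpi, words
```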
In one specific embodiment, a word can contain any character from a-z, A-Z, and 0-9, and any of "@%$#"; everything else is a delimiter. The original description of the .xml file can be created by print capture software used by the indexing process (e.g., software executing on a server, such as the database 320 server). The actual format is constantly evolving and contains more elements as the system acquires new documents.
The original sequences of text received by the print driver (e.g., print driver 316) are preserved, and a logical word structure is imposed based on punctuation marks, except for "_@%$#". Using the XML file as input, the index table module 3404 respects page boundaries and first tries to group sequences into logical lines by checking the amount of vertical overlap between two consecutive sequences. In one particular embodiment, the heuristic used is that a line break has occurred if two sequences overlap by less than half of their average height. Such a heuristic works well for typical text documents (e.g., Microsoft Word documents). For HTML pages with complex layout, additional geometric analysis may be needed. However, it is not necessary to extract a perfect semantic document structure, as long as the indexing terms generated are consistent with those generated by the querying process.
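The line-grouping heuristic can be sketched as follows. This is a minimal illustration, assuming boxes are (x, y, w, h) tuples with y increasing downward and that sequences arrive in reading order; the function names are hypothetical.

```python
def vertical_overlap(a, b):
    """Overlap in y between two boxes given as (x, y, w, h)."""
    top = max(a[1], b[1])
    bottom = min(a[1] + a[3], b[1] + b[3])
    return max(0, bottom - top)


def group_into_lines(boxes):
    """Group consecutive sequence boxes into logical lines.

    A line break is assumed when two consecutive boxes overlap vertically
    by less than half of their average height.
    """
    lines = []
    for box in boxes:
        if lines:
            prev = lines[-1][-1]
            avg_h = (prev[3] + box[3]) / 2
            if vertical_overlap(prev, box) >= avg_h / 2:
                lines[-1].append(box)
                continue
        lines.append([box])
    return lines
```

Comparing each sequence only with its immediate predecessor keeps the sketch simple; a layout with columns or floating elements would need the additional geometric analysis the text mentions.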
Based on the structure of the electronic representation of the paper document, the method continues with indexing 3514 the location of every target feature on every page of the paper document. In one particular embodiment, this step includes indexing the location of every pair of horizontally and vertically adjacent words on every page of the paper document. As previously explained, horizontally adjacent words are pairs of neighboring words within a line. Vertically adjacent words are words in neighboring lines that align vertically. Other multi-dimensional aspects of a page can be exploited in a similar way.
The method further includes storing 3516 patch characteristics associated with each target feature. In one particular embodiment, the patch characteristics include actions attached to the patch and are stored in a relational database. As previously explained, combining such an image index with a storage facility allows objects that match an image patch, and the characteristics of that patch, to be located. The characteristics can be any data related to the patch, such as metadata. They can also include, for example, actions that carry out a particular function, links that can be selected to provide access to other content related to the patch, and/or bar codes that can be scanned or otherwise processed to cause retrieval of other content related to the patch. A more precise definition is given here for search term generation, where only a fragment of the line structure is observed. For horizontally adjacent pairs, a query term is formed by concatenating the words with a "-" separator. Vertical pairs are concatenated using "+". If desired, the words can be used in their original form to preserve capitalization (this creates more unique terms, but it also produces a larger index and raises additional query issues, such as case sensitivity). The indexing scheme allows the same search strategy to be applied to horizontal word-pairs, vertical word-pairs, or a combination of both. In every case, the discriminating power of a term is accounted for by its inverse document frequency.
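The term-generation rules can be sketched as below. The word character set and the "-" and "+" separators come from the text; treating two words as vertically aligned when their horizontal extents overlap is an assumption made for this illustration, as are the function names.

```python
import re

# A word is any run of a-z, A-Z, 0-9 or "@%$#"; everything else is a delimiter.
WORD_RE = re.compile(r"[A-Za-z0-9@%$#]+")


def tokenize(text):
    """Split a text sequence into words."""
    return WORD_RE.findall(text)


def horizontal_pairs(line):
    """Join each pair of neighboring words in a line with '-'."""
    return [f"{a}-{b}" for a, b in zip(line, line[1:])]


def vertical_pairs(upper, lower):
    """Join vertically aligned words of two neighboring lines with '+'.

    upper/lower: lists of (word, x_left, x_right).
    Assumption: "aligned" means the horizontal extents overlap.
    """
    terms = []
    for word_u, ul, ur in upper:
        for word_l, ll, lr in lower:
            if min(ur, lr) > max(ul, ll):
                terms.append(f"{word_u}+{word_l}")
    return terms
```

Case is preserved here; lowercasing the words before joining them would give the smaller, case-insensitive index that the text mentions as the alternative.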
Evidence Accumulation Method
Figure 36 illustrates a method 3600 for computing a ranked set of document, page, and location hypotheses for a target document according to an embodiment of the invention. The method can be carried out, for example, by the database system 3400 of Figure 34A. In one such embodiment, the evidence accumulation module 3406 computes the hypotheses using data from the index table module 3404, as previously discussed. The method begins with receiving 3610 a target document image, such as an image patch of a larger document image or an entire document image. The method continues with generating 3612 one or more query terms that capture two-dimensional relationships between objects in the target document image. In one particular embodiment, the query terms are generated by a feature extraction process that produces horizontal and vertical word-pairs, as previously discussed with reference to Figure 34B. However, as will be apparent in light of this disclosure, any number of the feature extraction processes described herein can be used to generate query terms that capture two-dimensional relationships between objects in the target image. For example, the same feature extraction techniques used to build the index in method 3500 can be used to generate the query terms, such as those discussed with reference to step 3512 (generating an electronic representation of a paper document). Note also that the two-dimensional aspect of the query terms can apply to each query term individually (e.g., a single query term that represents both horizontal and vertical objects in the target document) or to a set of search terms (e.g., a first query term that is a horizontal word-pair and a second query term that is a vertical word-pair). The method continues with looking up 3614 each query term in the term index table 3422 to retrieve a list of locations associated with that query term. For each location, the method continues with generating 3616 a number of regions that contain the location. After all queries are processed, the method further includes identifying 3618 the region that is most consistent with all of the query terms. In one such embodiment, the score of each candidate region is incremented by a weight (e.g., based on how consistent the region is with all of the query terms). The method continues with determining 3620 whether the identified region satisfies a predefined matching criterion (e.g., based on a predefined matching threshold). If so, the method continues with confirming 3622 the region as a match to the target document image (e.g., the page that most likely contains the region can be accessed or otherwise used). Otherwise, the method continues with rejecting 3624 the region.
Word-pairs are stored in the term index table 3422 with locations in a "normalized" coordinate space. This provides uniformity across different printer and scanner resolutions. In one particular embodiment, an 85 × 110 coordinate space is used for 8.5" × 11" pages. In that case, each word-pair is identified by its location in this 85 × 110 space.
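An 85 × 110 space for an 8.5" × 11" page corresponds to 10 cells per inch. The following sketch shows one way device coordinates might be mapped into that space; the zero-based cell indexing, the clamping at the page edge, and the function name `to_cell` are assumptions made for illustration.

```python
def to_cell(x_px, y_px, dpi, grid_w=85, grid_h=110, cells_per_inch=10):
    """Map integer device pixel coordinates to a (col, row) cell.

    Assumption: cells are indexed from 0, so the last column is grid_w - 1.
    """
    col = min(grid_w - 1, x_px * cells_per_inch // dpi)
    row = min(grid_h - 1, y_px * cells_per_inch // dpi)
    return col, row
```

The same physical point lands in the same cell whether it was captured at 600 dpi or 300 dpi, which is the uniformity the normalized space is meant to provide.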
To improve the efficiency of the search, a two-step process can be used. The first step locates the page that most likely contains the input image patch. The second step calculates the x-y location within that page that is most likely the center of the patch. Such an approach introduces the possibility that the true best match is missed in the first step. However, with a sparse indexing space, that possibility is remote. Thus, depending on the size of the index and the desired performance, this efficiency-improving technique can be employed.
In one such embodiment, the following algorithm is used to find the page that most likely contains the word-pairs detected in the input image patch.
For each given word-pair wp
    idf = 1/log(2 + num_docs(wp))
    For each (doc, page) at which wp occurred
        Accum[doc, page] += idf;
    end /* For each (doc, page) */
end /* For each wp */
(maxdoc, maxpage) = max(Accum[doc, page]);
if (Accum[maxdoc, maxpage] > thresh_page)
    return (maxdoc, maxpage);
This technique adds the inverse document frequency (idf) of each word-pair to an accumulator indexed by the documents and pages on which the word-pair appears. num_docs(wp) returns the number of documents that contain the word-pair wp. The accumulator is implemented by the evidence accumulation module 3406. If the maximum value in the accumulator exceeds a threshold, the corresponding page is output as the best match to the patch. The algorithm thus identifies the page that best matches the word-pairs in the query. Alternatively, the Accum array can be sorted and the top N pages reported as the "N best" pages that match the input document.
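A runnable rendering of this page-level accumulation is sketched here. The in-memory `index` dictionary stands in for the term index table, and counting each word-pair at most once per page is an assumption; neither detail is specified by the pseudocode.

```python
import math
from collections import defaultdict


def best_page(word_pairs, index, thresh_page):
    """Return the (doc, page) with the highest accumulated idf, or None.

    index: dict mapping a word-pair term to a list of (doc, page) occurrences.
    """
    accum = defaultdict(float)
    for wp in word_pairs:
        postings = index.get(wp, [])
        num_docs = len({doc for doc, _ in postings})
        idf = 1.0 / math.log(2 + num_docs)
        # Assumption: a term contributes once per (doc, page).
        for doc_page in set(postings):
            accum[doc_page] += idf
    if not accum:
        return None
    winner = max(accum, key=accum.get)
    return winner if accum[winner] > thresh_page else None
```

Sorting `accum` by value instead of taking the maximum would give the "N best" variant.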
According to one embodiment of the invention, the following evidence accumulation algorithm accumulates evidence for the location of the input image patch within a single page.
For each given word-pair wp
    idf = 1/log(2 + num_docs(wp))
    For each (x, y) at which wp occurred
        (minx, maxx, miny, maxy) = extent(x, y);
        maxdist = maxdist(minx, maxx, miny, maxy);
        For i = miny to maxy do
            For j = minx to maxx do
                norm_dist = Norm_geometric_dist(i, j, x, y, maxdist)
                Activity[i, j] += norm_dist;
                weight = idf * norm_dist;
                Accum2[i, j] += weight;
            end /* for j */
        end /* for i */
    end /* For each (x, y) */
end /* For each wp */
This algorithm locates the cell in the 85 × 110 space that is most likely the center of the input image patch. In the embodiment shown here, it does so by adding a weight to the cells in a fixed area around each word-pair (called a zone). The extent function is given an x, y pair and returns the minimum and maximum values of a surrounding fixed-size region (1.5" high and 2" wide are typical). The extent function takes care of boundary conditions and ensures that the values it returns do not fall outside the accumulator (i.e., less than zero, or greater than 85 in x or 110 in y). The maxdist function finds the maximum Euclidean distance between two points in the bounding box described by the coordinates (minx, maxx, miny, maxy). For each cell in the zone, a weight is calculated as the product of the inverse document frequency of the word-pair and the normalized geometric distance between the cell and the center of the zone. This weights cells close to the center more heavily than cells farther away. After every word-pair has been processed by the algorithm, the Accum2 array is searched for the cell with the maximum value. If that value exceeds a threshold, the coordinates of that cell are reported as the location of the image patch. The Activity array stores the accumulated norm_dist values. Because these values are not scaled by idf, they do not take into account the number of documents in the database that contain a particular word-pair. They do, however, provide a two-dimensional image representation of the x-y locations that best match a given set of word-pairs. In addition, the entries in the Activity array are independent of the documents stored in the database. This data structure, which is normally used internally, can be exported 3420.
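The location-level accumulation can be sketched in runnable form as follows. The zone size in cells (20 × 15 for 2" × 1.5" at 10 cells per inch), the zero-based grid, and the input format `pair_hits` are assumptions for illustration. The sketch pairs the column index with x and the row index with y when computing the distance, and it omits the Activity array for brevity.

```python
import math

GRID_W, GRID_H = 85, 110   # normalized page, assumed 10 cells per inch
ZONE_W, ZONE_H = 20, 15    # about 2" wide by 1.5" high, in cells


def extent(x, y):
    """Bounds of the zone around (x, y), clipped to the accumulator."""
    minx = max(0, x - ZONE_W // 2)
    maxx = min(GRID_W - 1, x + ZONE_W // 2)
    miny = max(0, y - ZONE_H // 2)
    maxy = min(GRID_H - 1, y + ZONE_H // 2)
    return minx, maxx, miny, maxy


def max_dist(minx, maxx, miny, maxy):
    """Largest Euclidean distance between two points of the box (its diagonal)."""
    return math.hypot(maxx - minx, maxy - miny)


def locate(pair_hits, thresh):
    """Return the (x, y) cell with the most evidence, or None.

    pair_hits: list of (num_docs, [(x, y), ...]), one entry per query word-pair.
    """
    accum2 = [[0.0] * GRID_W for _ in range(GRID_H)]
    for num_docs, locations in pair_hits:
        idf = 1.0 / math.log(2 + num_docs)
        for x, y in locations:
            minx, maxx, miny, maxy = extent(x, y)
            md = max_dist(minx, maxx, miny, maxy)
            for row in range(miny, maxy + 1):
                for col in range(minx, maxx + 1):
                    d = math.hypot(col - x, row - y)
                    accum2[row][col] += idf * (md - d)
    value, col, row = max(
        (accum2[r][c], c, r) for r in range(GRID_H) for c in range(GRID_W)
    )
    return (col, row) if value > thresh else None
```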
According to one embodiment of the invention, the normalized geometric distance is calculated as shown here.
Norm_geometric_dist(i, j, x, y, maxdist)
begin
    d = sqrt((i-x)^2 + (j-y)^2);
    return (maxdist - d);
end
The Euclidean distance between the cell and the word-pair's location at the center of the zone is calculated, and the difference between the maximum possible distance and that distance is returned.
After every word-pair has been processed by the evidence accumulation algorithm, the Accum2 array is searched for the cell with the maximum value. If that value exceeds a predefined threshold, the coordinates of the cell are reported as the location of the center of the image patch.
MMR Printing Architecture
Figure 37A illustrates a functional block diagram of MMR components according to an embodiment of the invention. The primary MMR components include a computer 3705 with an associated printer 116 and/or a shared document annotation (SDA) server 3755.
The computer 3705 is any standard desktop, laptop, or networked computer, as is known in the art. In one embodiment, the computer is the MMR computer 112 described with reference to Figure 1B. The user printer 116 is any standard home, office, or commercial printer, as described herein. The user printer 116 produces printed document 116, which is a paper document made up of one or more printed pages.
The SDA server 3755 is a standard networked or centralized computer that holds information, applications, and/or a variety of files associated with a method of shared annotation. For example, shared annotations associated with web pages or other documents are stored on the SDA server 3755. In this example, the annotations are data or interactions used in MMR, as described herein. According to one embodiment, the SDA server 3755 is accessible through a network connection. In one embodiment, the SDA server 3755 is the networked media server 114 described with reference to Figure 1B.
The computer 3705 further comprises a variety of components, some or all of which are optional according to various embodiments. In one embodiment, the computer 3705 comprises source files 3710, a browser 3715, a plug-in 3720, a symbolic hotspot description 3725, modified files 3730, a capture module 3735, page_desc.xml 3740, hotspot.xml 3745, a data store 3750, the SDA server 3755, and MMR printing software 3760.
Source files 3710 are representative of any source files that are an electronic representation of a document. Exemplary source files 3710 include hypertext markup language (HTML) files, [product name shown as an image] files, [product name shown as image S2006800394774D00652] files, simple text files, portable document format (PDF) files, and the like. As described herein, documents received at the browser 3715 originate from the source files 3710 in many instances. In one embodiment, the source files 3710 are equivalent to the source files 310 described with reference to Figure 3.
The browser 3715 is an application that provides access to data that has been associated with the source files 3710. For example, the browser 3715 can be used to retrieve web pages and/or documents from the source files 3710. In one embodiment, the browser 3715 is an SD browser 312, 314 as described with reference to Figure 3. In one embodiment, the browser 3715 is an Internet browser such as Internet Explorer.
The plug-in 3720 is a software application that provides an authoring function. The plug-in 3720 is a standalone software application or, alternatively, a plug-in running on the browser 3715. In one embodiment, the plug-in 3720 is a computer program that interacts with an application such as the browser 3715 to provide the specific functionality described herein. According to various embodiments, the plug-in 3720 performs various transformations and other modifications on documents or web pages displayed in the browser 3715. For example, the plug-in 3720 surrounds hotspot designations with individually distinguishable fiducial marks to create hotspots and returns "marked-up" versions of HTML files to the browser 3715, applies a transformation rule to a portion of a document displayed in the browser 3715, and retrieves and/or receives shared annotations for documents displayed in the browser 3715. In addition, the plug-in 3720 can perform other functions, such as creating modified documents and creating symbolic hotspot descriptions 3725 as described herein. In conjunction with the capture module 3735, the plug-in 3720 facilitates the methods described with reference to Figures 38, 44, 45, 48, and 50A-B.
The symbolic hotspot description 3725 is a file that identifies a hotspot within a document. The symbolic hotspot description 3725 identifies the hotspot number and content. In this example, the symbolic hotspot description 3725 is stored in the data store 3750. An example of a symbolic hotspot description is shown in greater detail in Figure 41.
Modified files 3730 are documents and web pages created as a result of the modifications and transformations that the plug-in 3720 applies to the source files 3710. For example, the marked-up HTML file mentioned above is one example of a modified file 3730. As will be apparent in light of this disclosure, in certain instances the modified files 3730 are returned to the browser 3715 for display to the user.
The capture module 3735 is a software application that performs feature extraction and/or coordinate capture on the printed representation of a document, so that the layout of characters and graphics on the printed pages can be retrieved. The layout, that is, the two-dimensional arrangement of text on the printed page, can be captured automatically at the time of printing. For example, the capture module 3735 executes all of the text and drawing print commands and, in addition, intercepts and records the x-y coordinates and other characteristics of every character and/or image in the printed representation. According to one embodiment, the capture module 3735 is a Printcapture DLL as described herein, a forwarding dynamic link library (DLL) that allows the functionality of an existing DLL to be added to or modified. The functionality of the capture module 3735 is described in more detail with reference to Figure 44.
Those skilled in the art will recognize that the capture module 3735 is coupled to the output of the browser 3715 in order to capture data. Alternatively, the functions of the capture module 3735 can be implemented directly within a printer driver. In one embodiment, the capture module 3735 is equivalent to the PD capture module 318 described with reference to Figure 3.
Page_desc.xml 3740 is an extensible markup language ("XML") file to which text-related output is written for the text-related function calls processed by the capture module 3735. Page_desc.xml 3740 includes coordinate information for all printed text in the document, word by word and character by character, as well as hotspot information, the printer port name, the browser name, the date and time of printing, and dots per inch (dpi) and resolution (res) information. Page_desc.xml 3740 is stored, for example, in the data store 3750. The data store 3750 is equivalent to the MMR database 3400 described with reference to Figure 34A. Figures 42A-B illustrate in greater detail an example of page_desc.xml 3740 for an HTML file.
Hotspot.xml 3745 is an XML file that is created when a document is printed (e.g., by operation of the print driver 316, as previously discussed). Hotspot.xml is the result of merging the symbolic hotspot description 3725 with page_desc.xml 3740. Hotspot.xml includes hotspot identifier information such as the hotspot number, coordinate information, dimension information, and the content of the hotspot. An example of a hotspot.xml file is illustrated in Figure 43.
The data store 3750 is any database known in the art for storing files, modified for use with the methods described herein. For example, according to one embodiment, the data store 3750 stores the source files 3710, the symbolic hotspot description 3725, page_desc.xml 3740, rendered page layouts, shared annotations, imaged documents, hotspot definitions, and feature representations. In one embodiment, the data store 3750 is equivalent to the document event database 320 described with reference to Figure 3 and to the database system 3400 described with reference to Figure 34A.
The MMR printing software 3760 is the software that facilitates the MMR printing operations described herein, for example as performed by the components of the computer 3705 described above. The MMR printing software 3760 is described in greater detail below with reference to Figure 37B.
Figure 37B illustrates a set of software components included in the MMR printing software 3760 according to an embodiment of the invention. It should be understood that all or some of the MMR printing software 3760 can be included in the computers 112, 905, the capture device 106, the networked media server 114, and other servers as described herein. Although the MMR printing software 3760 is now described as including these different components, those skilled in the art will recognize that it can have any number of them, from one to all. The MMR printing software 3760 includes a conversion module 3765, an embed module 3768, a parse module 3770, a transformation module 3775, a feature extraction module 3778, an annotation module 3780, a hotspot module 3785, a render/display module 3790, and a storage module 3795.
The conversion module 3765 enables conversion of a source document into an imaged document from which a feature representation can be extracted, and is one means for doing so.
The embed module 3768 enables embedding of marks corresponding to the designation of a hotspot in an electronic document, and is one means for doing so. In one particular embodiment, the embedded marks indicate a beginning point and an ending point for the hotspot. Alternatively, a predefined area around an embedded mark can be used to identify a hotspot in an electronic document. Various such marking schemes can be used.
The parse module 3770 enables parsing of an electronic document (that has been sent to the printer) for a mark indicating the beginning point of a hotspot, and is one means for doing so.
The transformation module 3775 enables application of a transformation rule to a portion of an electronic document, and is one means for doing so. In one particular embodiment, the portion is the stream of characters between a mark indicating the beginning point of a hotspot and a mark indicating the ending point of the hotspot.
The feature extraction module 3778 enables extraction of features and capture of coordinates corresponding to the printed representation of a document and a hotspot, and is one means for doing so. Coordinate capture includes tapping print commands using a forwarding dynamic link library and parsing the printed representation for the subset of coordinates corresponding to a hotspot or to transformed characters. According to one embodiment, the feature extraction module 3778 enables the functionality of the capture module 3735.
The annotation module 3780 enables receiving shared annotations and the accompanying designations of the portions of a document associated with them, and is one means for doing so. Receiving shared annotations includes receiving annotations from end users and from an SDA server.
The hotspot module 3785 enables association of one or more clips with one or more hotspots, and is one means for doing so. The hotspot module 3785 also enables formulation of a hotspot definition by first designating the location of a hotspot within a document and then defining a clip to associate with the hotspot.
The render/display module 3790 enables a document, or a printed representation of a document, to be rendered or displayed, and is one means for doing so. The storage module 3795 enables storage of various files, including a page layout, an imaged document, a hotspot definition, and a feature representation, and is one means for doing so.
The software portions 3765-3795 need not be discrete software modules. The software configuration shown is meant only as an example; other configurations are contemplated by, and fall within the scope of, the present invention, as will be apparent in light of this disclosure.
Embedding a Hotspot in a Document
Figure 38 illustrates a flowchart of a method of embedding a hotspot in a document according to an embodiment of the invention.
According to the method, marks corresponding to the designation of a hotspot within a document are embedded 3810 in the document. In one embodiment, a document that includes a hotspot designation location is received for display in a browser; for example, the document is received at the browser 3715 from the source files 3710. A hotspot includes some text or other document objects, such as graphics or photos, as well as electronic data. The electronic data can include multimedia such as audio or video, or it can be a set of steps to be performed on a capture device when the hotspot is accessed. For example, if the document is a hypertext markup language (HTML) file, the browser 3715 can be Internet Explorer, and the designations can be uniform resource locators (URLs) within the HTML file. Figure 39A illustrates an example of such an HTML file 3910 with a URL 3920. Figure 40A illustrates the text of the HTML file 3910 of Figure 39A as displayed in a browser 4010, for example Internet Explorer.
To embed 3810 the marks, the plug-in 3720 of the browser 3715 surrounds each hotspot designation location with individually distinguishable fiducial marks to create the hotspot. In one embodiment, the plug-in 3720 modifies the document displayed in the browser 3715 (continuing the example above, the HTML displayed in Internet Explorer) and inserts marks, or tags, that bracket the hotspot designation location (e.g., the URL). The marks are imperceptible to an end user viewing the document, either in the browser 3715 or in a printed version of the document, but they can be detected in the print commands. In this example, a new font, referred to herein as MMR Courier New, is used to add the beginning and ending fiducial marks. In the MMR Courier New font, the typical glyph or dot pattern representation of the characters "b" and "e" and of the digits is replaced by an empty space.
Referring again to the exemplary HTML page shown in Figures 39A and 40A, the plug-in 3720 embeds 3810 the fiducial mark "b0" at the beginning of the URL ("here") and the fiducial mark "e0" at the end of the URL, to indicate the hotspot with identifier "0". Because the b, e, and digit characters are all rendered as spaces, the user sees little or no change in the appearance of the document. In addition, as shown in Figure 41, the plug-in 3720 creates a symbolic hotspot description 3725 indicating these marks. The symbolic hotspot description 3725 identifies the hotspot number as zero 4120, which corresponds to the 0 in the "b0" and "e0" fiducial marks. In this example, the symbolic hotspot description 3725 is stored, for example, in the data store 3750.
As shown in Figure 39B, the plug-in 3720 returns a "marked-up" version of the HTML 3950 to the browser 3715. The marked-up HTML 3950 surrounds the fiducial marks with span tags 3960 that change the font to 1-point MMR Courier New. Because the b, e, and digit characters are rendered as spaces, the user sees little or no change in the appearance of the document. The marked-up HTML 3950 is an example of a modified file 3730. For simplicity, this example uses a single-page model; multi-page models use the same parameters. For example, if a hotspot spans a page boundary, it has fiducial marks corresponding to each page location, and the hotspot identifier is the same for each.
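The mark-up step can be illustrated with a small sketch that brackets the text of each link with "b<n>" and "e<n>" marks wrapped in span tags. The exact markup of Figure 39B is not reproduced in this text, so the span attributes, the regular-expression approach, and the dictionary standing in for the symbolic hotspot description are all assumptions for illustration.

```python
import re

# Hypothetical span markup; the real markup is shown in Figure 39B.
SPAN = '<span style="font-family:\'MMR Courier New\';font-size:1pt">{}</span>'
ANCHOR_RE = re.compile(r"(<a\b[^>]*>)(.*?)(</a>)", re.IGNORECASE | re.DOTALL)
HREF_RE = re.compile(r'href="([^"]*)"', re.IGNORECASE)


def mark_up(html):
    """Bracket each link's text with fiducial marks.

    Returns (marked_html, hotspots); each hotspot is a dict with a numeric id,
    the link text, and the URL, standing in for a symbolic hotspot description.
    """
    hotspots = []

    def repl(m):
        n = len(hotspots)
        href = HREF_RE.search(m.group(1))
        hotspots.append({"id": n, "text": m.group(2),
                         "url": href.group(1) if href else None})
        return (m.group(1) + SPAN.format(f"b{n}") + m.group(2)
                + SPAN.format(f"e{n}") + m.group(3))

    return ANCHOR_RE.sub(repl, html), hotspots
```

A regular expression is adequate only for simple, well-formed links such as the one in this example; a production plug-in would more plausibly work on the browser's document object model.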
Next, in response to a print command, the coordinates corresponding to the printed representation and the hotspot are captured 3820. In one embodiment, the capture module 3735 "taps" the text and drawing commands within the print command. The capture module 3735 executes all of the text and drawing commands and, in addition, intercepts and records the x-y coordinates and other characteristics of every character and/or image in the printed representation. In this example, the capture module 3735 references the device context (DC) of the printed representation, which is a handle to the structure that defines the attributes of the text and/or images to be output, depending on the output format (i.e., printer, window, file format, memory buffer, etc.). In the process of capturing 3820 the coordinates of the printed representation, the hotspots are easily identified using the fiducial marks embedded in the HTML. For example, when a begin mark is encountered, the x-y locations of all characters are recorded until the end mark is found.
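The begin-mark and end-mark logic can be sketched over a simplified stream of captured text runs. The tuple format of the runs and the function name `hotspot_boxes` are hypothetical; the actual capture works on print commands issued against a device context.

```python
def hotspot_boxes(glyph_runs, mark_font="MMR Courier New"):
    """Collect the boxes of all text printed between b<n> and e<n> marks.

    glyph_runs: iterable of (text, font, (x, y, w, h)) in print order.
    Returns a dict mapping hotspot id to the list of recorded boxes.
    """
    open_id, boxes = None, {}
    for text, font, box in glyph_runs:
        is_mark = (font == mark_font and text[:1] in ("b", "e")
                   and text[1:].isdigit())
        if is_mark:
            # A begin mark opens a hotspot; an end mark closes it.
            open_id = int(text[1:]) if text[0] == "b" else None
            if open_id is not None:
                boxes.setdefault(open_id, [])
        elif open_id is not None:
            boxes[open_id].append(box)
    return boxes
```

Because boxes are appended per identifier, a hotspot that spans a page boundary and carries the same identifier on each page accumulates all of its boxes under one key.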
According to one embodiment, the capture module 3735 is a forwarding DLL, referred to herein as the "Printcapture DLL", which allows the functionality of an existing DLL to be added to or modified. To the client, a forwarding DLL appears exactly like the original DLL; however, additional code (a "tap") is added to some or all of the functions before the call is forwarded to the target (original) DLL. In this example, the Printcapture DLL is a forwarding DLL for the Windows Graphics Device Interface (Windows GDI) DLL gdi32.dll. Gdi32.dll has more than 600 exported functions, all of which need to be forwarded. The Printcapture DLL, referred to herein as gdi32_mmr.dll, allows the client to capture printouts from any Windows application that uses the DLL gdi32.dll for drawing, and it needs to execute only on the local computer, even when printing to a remote server. According to one embodiment, gdi32_mmr.dll is renamed gdi32.dll and copied into C:\Windows\system32, causing it to monitor printing from nearly every Windows application. According to another embodiment, gdi32_mmr.dll is named gdi32.dll and copied into the home directory of the application whose printing is to be monitored, for example C:\Program Files\Internet Explorer for monitoring Internet Explorer on Windows XP. In this example, only that application (e.g., Internet Explorer) automatically calls the functions in the Printcapture DLL.
Figure 44 illustrates a flowchart of the process used by a forwarding DLL according to one embodiment of the present invention. The print capture DLL gdi32_mmr.dll first receives 4405 a function call directed to gdi32.dll. In one embodiment, gdi32_mmr.dll receives all function calls directed to gdi32.dll. Approximately 200 of the roughly 600 gdi32.dll function calls are monitored, namely those for functions that affect the appearance of a printed page in some way. Thus, the print capture DLL next determines 4410 whether the received call is a monitored function call. If the received call is not a monitored function call, the call bypasses steps 4415 through 4435 and is forwarded 4440 to gdi32.dll.
If it is a monitored function call, the method next determines 4415 whether the function call specifies a "new" printer device context (DC), i.e., a printer DC that has not been received previously. This is determined by checking the printer DC against an internal DC table. As noted above, a DC encapsulates a target for drawing (which may be a printer, a memory buffer, etc.), as well as drawing settings such as font, color, etc. All drawing operations (e.g., LineTo(), DrawText(), etc.) are performed on a DC. If the printer DC is not new, a memory buffer corresponding to the printer DC already exists, and step 4420 is skipped. If the printer DC is new, a memory buffer DC corresponding to the new printer DC is created 4420. This memory buffer DC mirrors the appearance of the printed page and, in this example, is equivalent to the printed representation referenced above. Thus, when a printer DC is added to the internal DC table, a memory buffer DC (and memory buffer) of the same dimensions is created and associated with the printer DC in the internal DC table.
gdi32_mmr.dll next determines 4425 whether the call is a text-related function call. Approximately 12 of the 200 monitored gdi32.dll calls are text-related. If the call is not text-related, step 4430 is skipped. If the function call is text-related, the text-related output is written 4430 to an XML file, referred to herein as page_desc.xml 3740, as shown in Figure 37A. page_desc.xml 3740 is stored in, for example, data store 3750.
Figures 42A and 42B illustrate an exemplary page_desc.xml 3740 for the example HTML file 3910 discussed with reference to Figures 39A and 40A. page_desc.xml 3740 includes coordinate information for all printed text, word by word 4210 (e.g., Get), by x, y, width, and height, and character by character 4220 (e.g., G). Coordinates are given in dots, which are the printer equivalent of pixels, relative to the upper-left corner of the page, unless otherwise noted. page_desc.xml 3740 also includes the hotspot information, such as begin mark 4230 and end mark 4240, in the form of a "sequence." A hotspot that spans a page boundary (e.g., from page N to page N+1) appears on both pages (N and N+1); the hotspot identifier is the same in both cases. In addition, page_desc.xml 3740 includes other important information, such as the printer port name 4250, which can have a significant effect on the .xml and .jpeg files produced; the name 4260 of browser 3715 (or the application); the date and time of printing 4270; the dots per inch (dpi) and resolution (res) for the page 4280; and the printable area 4290.
Referring again to Figure 44, following the determination that the call is not text-related, or following the writing 4430 of the text-related output to page_desc.xml 3740, gdi32_mmr.dll executes 4435 the function call on the memory buffer for the DC. This step 4435 ensures that output sent to the printer is also output to a memory buffer on the local computer. Then, when the page is incremented, the contents of the memory buffer are compressed and written out in JPEG and PNG format. The function call is then forwarded 4440 to gdi32.dll, which executes it as it normally would.
Referring again to Figure 38, a page layout comprising the printed representation, including the hotspot, is rendered 3830. In one embodiment, rendering 3830 includes printing the document. Figure 40B illustrates an example of a printed version 4011 of the HTML file 3910 of Figures 39A and 40A. Note that the reference marks are not visibly perceptible to the end user. The rendered layout is saved to, for example, data store 3750.
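The forwarding flow of Figure 44 (steps 4410 through 4440) can be sketched in a few lines. This is a simplified model under stated assumptions, not the DLL itself: the class name is hypothetical, the two name sets are small stand-ins for the roughly 200 monitored and 12 text-related calls (which specific functions belong to them is assumed here), a plain list stands in for each memory buffer DC and for page_desc.xml, and a Python callable stands in for the original gdi32.dll.

```python
# Stand-ins for the monitored and text-related call sets (membership is assumed).
MONITORED = {"LineTo", "TextOut", "Rectangle"}
TEXT_RELATED = {"TextOut"}


class PrintCaptureForwarder:
    """Toy model of a forwarding layer placed in front of a target library."""

    def __init__(self, target):
        self.target = target   # callable standing in for the original DLL
        self.dc_table = {}     # printer DC -> calls mirrored to its memory buffer
        self.page_desc = []    # stands in for page_desc.xml

    def call(self, name, printer_dc, *args):
        if name in MONITORED:                                # step 4410
            if printer_dc not in self.dc_table:              # step 4415: new printer DC?
                self.dc_table[printer_dc] = []               # step 4420: create buffer
            if name in TEXT_RELATED:                         # step 4425
                self.page_desc.append((name, args))          # step 4430
            self.dc_table[printer_dc].append((name, args))   # step 4435: mirror the call
        return self.target(name, printer_dc, *args)          # step 4440: always forward


forwarded = []


def original(name, dc, *args):
    """Stand-in for the original library; records what reached it."""
    forwarded.append(name)
    return 0


fwd = PrintCaptureForwarder(original)
fwd.call("TextOut", "dc1", 10, 20, "Get")
fwd.call("GetStockObject", "dc1", 0)   # not monitored: forwarded untouched
fwd.call("LineTo", "dc1", 5, 5)
```

Every call reaches the target regardless of whether it is monitored, which is what lets the forwarding layer remain invisible to the client.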
According to one embodiment, the print capture DLL merges the data in the symbolic hotspot description 3725 and page_desc.xml 3740, e.g., as shown in Figures 42A-B, into hotspot.xml 3745, as shown in Figure 43. In this example, hotspot.xml 3745 is created when the document is printed. The example in Figure 43 shows that hotspot 0 appears at x=1303, y=350 and is 190 pixels wide and 71 pixels high. The content of the hotspot is also shown, i.e., http://www.ricoh.com.
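As a rough illustration of the merged record, the sketch below serializes the hotspot values quoted above into XML. The element and attribute names are assumptions made for this example; the actual layout of hotspot.xml is shown only in Figure 43 and is not reproduced here.

```python
import xml.etree.ElementTree as ET


def build_hotspot_xml(hotspots):
    """Serialize hotspot records into an XML string (hypothetical schema)."""
    root = ET.Element("hotspots")
    for hs in hotspots:
        node = ET.SubElement(root, "hotspot", id=str(hs["id"]))
        for key in ("x", "y", "width", "height"):
            node.set(key, str(hs[key]))          # bounding box from the page description
        ET.SubElement(node, "content").text = hs["content"]  # from the hotspot description
    return ET.tostring(root, encoding="unicode")


xml_text = build_hotspot_xml([
    {"id": 0, "x": 1303, "y": 350, "width": 190, "height": 71,
     "content": "http://www.ricoh.com"},
])
```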
According to an alternative embodiment of the capture 3820, a filter in a Microsoft XPS (XML Paper Specification) print driver, commonly known as an "XPSDrv filter," receives the text drawing commands and creates the page_desc.xml file as described above.
Visibly Perceptible Hotspots
Figure 45 illustrates a flowchart of a method of transforming characters corresponding to a hotspot in a document according to one embodiment of the present invention. The method modifies the printed document in a way that indicates the hotspot both to the end user and to the MMR recognition software.
Initially, an electronic document to be printed is received 4510 as a character stream. For example, the document may be received 4510 at a printer driver or at a software module capable of filtering the character stream. In one embodiment, the document is received 4510 at browser 3715 from source file 3710. Figure 46 illustrates an example of an electronic version of a document 4610 according to one embodiment of the present invention. Document 4610 in this example has two hotspots, one associated with "listing hereinafter" and one associated with "possible prior art." According to one embodiment, the hotspots are not visibly perceptible to the end user. The hotspots may be established by the coordinate capture method described with reference to Figure 38, or according to any of the other methods described herein.
The document is parsed 4520 for a begin mark, which indicates the beginning of a hotspot. The begin mark may be a reference mark as described previously, or any other individually recognizable mark that identifies a hotspot. Once a begin mark is found, a transformation rule is applied 4530 to a portion of the document, i.e., the characters following the begin mark, until an end mark is found. According to one embodiment, the transformation rule causes a visible change to the portion of the document corresponding to the hotspot, for example by changing the character font or color. In this example, the original font, e.g., Times New Roman, may be transformed into a different known font, e.g., OCR-A. In another example, the text is rendered in a different font color, e.g., blue #F86A. According to one embodiment, the process of transforming the font is similar to the process described above. For example, if document 4610 is an HTML file, the font is substituted in the HTML file when a reference mark is encountered in the document.
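A minimal sketch of applying such a transformation rule to an HTML character stream follows. The comment-style begin and end marks and the inline-style font substitution are assumptions chosen for illustration; the source does not specify the mark syntax or how the font change is encoded.

```python
import re

BEGIN_MARK = "<!--HS_BEGIN-->"  # hypothetical begin mark
END_MARK = "<!--HS_END-->"      # hypothetical end mark

_HOTSPOT = re.compile(
    re.escape(BEGIN_MARK) + r"(.*?)" + re.escape(END_MARK), re.DOTALL
)


def apply_transformation_rule(html, font="OCR-A"):
    """Wrap the text between each begin/end mark pair in a span using `font`."""
    def restyle(match):
        inner = match.group(1)
        return (
            f'{BEGIN_MARK}<span style="font-family: {font}">'
            f"{inner}</span>{END_MARK}"
        )
    return _HOTSPOT.sub(restyle, html)


source = "<p>Items <!--HS_BEGIN-->listed below<!--HS_END--> are examples.</p>"
transformed = apply_transformation_rule(source)
```

Text outside the marks is left untouched, so only the hotspot portion changes appearance when the document is rendered.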
According to one embodiment, the transformation step is performed by plug-in 3720 of browser 3715, which outputs modified document 3730. Figure 47 illustrates an example of a printed modified document 4710 according to one embodiment of the present invention. As illustrated, hotspots 4720 and 4730 are visually distinguishable from the remaining text. In particular, hotspot 4720 is visually distinguishable based on its different font, and hotspot 4730 is visually distinguishable based on its different color and underlining.
Next, the document with the transformed portion is rendered 4540 into a page layout comprising the electronic document and the location of the hotspot within the electronic document. In one embodiment, rendering the document is printing the document. In one embodiment, rendering includes performing feature extraction on the document with the transformed portion, according to any of the methods for doing so described herein. In one embodiment, feature extraction includes, in response to a print command, capturing page coordinates corresponding to the electronic document. The electronic document is then parsed for the subset of the coordinates corresponding to the transformed characters. According to one embodiment, capture module 3735 of Figure 37A performs the feature extraction and/or coordinate capture.
The MMR recognition software preprocesses every image using the same transformation rule. It first looks for text that obeys the rule, e.g., text that is in OCR-A or in blue #F86A, and then applies its standard recognition algorithm.
This aspect of the present invention is advantageous because it substantially reduces the computational load of the MMR recognition software: a very simple image preprocessing routine eliminates a large amount of computational overhead. In addition, it improves the accuracy of feature extraction by eliminating a large number of alternative solutions that might otherwise apply when selecting, e.g., a bounding box over a portion of the document, as discussed with reference to Figures 51A-D. Finally, the visible modification of the text indicates to the end user which text (or other document objects) is part of a hotspot.
Shared Document Annotation
Figure 48 illustrates a flowchart of a method of shared document annotation according to one embodiment of the present invention. The method enables users to annotate documents in a shared environment. In the embodiment described below, the shared environment is a web page being viewed by various users; however, according to other embodiments, the shared environment can be any environment in which resources are shared, such as a workgroup.
According to the method, a source document is displayed 4810 in a browser, e.g., browser 3715. In one embodiment, the source document is received from source file 3710; in another embodiment, the source document is a web page received over a network, e.g., an Internet connection. Using the web page example, Figure 49A illustrates a sample source web page 4910 in a browser according to one embodiment of the present invention. In this example, web page 4910 is an HTML file for a game related to a popular children's book character, the Jerry Butter Game.
Upon display 4810 of the source document, a shared annotation associated with the source document, and an indication of the portion of the source document associated with the shared annotation, are received 4820. For clarity of description, a single annotation is used in this example; however, multiple annotations are possible. In this example, the annotation is data or an interaction used in MMR as discussed herein. According to one embodiment, the annotation is stored at a shared document annotation server (SDA server), e.g., 3755 as shown in Figure 37A, and is received by retrieval from that server. In one embodiment, SDA server 3755 is accessible via a network connection. In this example, a plug-in for retrieval of shared annotations, e.g., plug-in 3720 as shown in Figure 37A, facilitates this capability. According to another embodiment, the annotation and the indication are received from a user. A user may create a shared annotation for a document that has no annotations, or may add to or modify existing shared annotations of a document. For example, the user may highlight a portion of the source document to indicate that it is to be associated with a shared annotation, which the user also provides via any of the various methods described herein.
Next, a modified document is displayed 4830 in the browser. The modified document includes a hotspot corresponding to the portion of the source document indicated in step 4820. The hotspot specifies the location of the shared annotation. According to one embodiment, the modified document is part of modified file 3730, which is created by plug-in 3720 and returned to browser 3715. Figure 49B illustrates a sample modified web page 4920 in a browser according to one embodiment of the present invention. Web page 4920 shows an indication of hotspot 4930 and the associated annotation 4940, which is a video clip in this example. Indication 4930 may be visually distinguished from the remaining text of web page 4920, e.g., by highlighting. According to one embodiment, annotation 4940 is displayed when indication 4930 is clicked or moused over.
In response to a print command, text coordinates corresponding to a printed representation of the modified document and the hotspot are captured 4840. The details of the coordinate capture are according to any of the methods for that purpose described herein.
Then, a page layout of the printed representation, including the hotspot, is rendered 4850. According to one embodiment, rendering 4850 is printing the document. Figure 49C illustrates a sample printed web page 4950 according to one embodiment of the present invention. The printed web page layout 4950 includes hotspot 4930 as indicated; however, the line breaks in printed layout 4950 differ from those in web page 4920. In this example, the boundary of hotspot 4930 is not visible on printed layout 4950.
In an optional final step, the shared annotation is stored locally, e.g., in data store 3750, and is indexed using its association with hotspot 4930 in printed document 4950. The printed representation may also be saved locally. In one embodiment, the act of printing triggers the download and creation of the local copy.
Hotspots for Imaged Documents
Figure 50A illustrates a flowchart of a method of adding a hotspot to an imaged document according to one embodiment of the present invention. The method allows hotspots to be added to a paper document after it is scanned, or to a symbolic electronic document after it is rendered for printing.
Initially, a source document is converted 5010 to an imaged document. According to one embodiment, the source document is received at browser 3715 from source file 3710. The conversion 5010 is by any method that produces a document on which feature extraction can be performed to produce a feature representation. According to one embodiment, a paper document is scanned to become an imaged document. According to another embodiment, a renderable page proof of an electronic document is rendered using an appropriate application. For example, if the renderable page proof is in PostScript format, Ghostscript is used. Figure 51A illustrates an example of a user interface 5105 showing a portion of a newspaper page 5110 that has been scanned according to one embodiment. Main window 5115 shows an enlarged portion of newspaper page 5110, and thumbnail 5120 shows which portion of the page is being displayed.
Next, feature extraction is applied 5020 to the imaged document to create a feature representation. Any of the various feature extraction methods described herein may be used for this purpose. According to one embodiment, the feature extraction is performed by capture module 3735, described with reference to Figure 37A. One or more hotspots 5125 are then added 5030 to the imaged document. According to various embodiments, a hotspot may be predefined or may need to be defined. If the hotspot is already defined, the definition includes the page number, the coordinate location of the bounding box of the hotspot on the page, and the electronic data or interaction attached to the hotspot. In one embodiment, the hotspot definition takes the form of a hotspot.xml file, as illustrated in Figure 43.
If the hotspot is not yet defined, the end user may define it. Figure 50B illustrates a flowchart of a method of defining a hotspot for addition to an imaged document according to one embodiment of the present invention. Initially, a candidate hotspot is selected 5032. For example, in Figure 51A, the end user has selected a portion of the document as a hotspot using bounding box 5125. Next, in optional step 5034, it is determined whether the hotspot is unique with respect to a given database. For example, there should be enough text in the surrounding n" × n" patch to uniquely identify the hotspot. An example of a typical value of n is 2. If the hotspot is not sufficiently unique for the database, then in one embodiment options are presented to the end user as to how to handle the ambiguity. For example, a user interface may provide alternatives such as selecting a larger area, or accepting the ambiguity but adding a description of it to the database. Other embodiments may use other methods of defining a hotspot.
Once the hotspot location is selected 5032, data or an interaction is defined 5036 and attached to the hotspot. Figure 51B illustrates a user interface for defining the data or interaction to associate with a selected hotspot. For example, once the user has selected bounding box 5125, an edit box 5130 is displayed. Using the associated buttons, the user may cancel 5135 the operation, simply save 5140 bounding box 5125, or assign 5145 data or an interaction to the hotspot. If the user selects to assign data or an interaction to the hotspot, an assign box 5150 is displayed, as shown in Figure 51C. Assign box 5150 allows the end user to assign images 5155, various other media 5160, and web links 5165 to the hotspot, which is identified by an ID number 5170. The user may then select to save 5175 the hotspot definition. Although a single hotspot has been described for simplicity, multiple hotspots are possible. Figure 51D illustrates a user interface for displaying hotspots 5125 within a document. In one embodiment, bounding boxes of different colors correspond to different data and interaction types.
In an optional step, the imaged document, the hotspot definition, and the feature representation are stored 5040 together, e.g., in data store 3750.
Figure 52 illustrates a method 5200 of using MMR document 500 and MMR system 100b according to one embodiment of the present invention.
Method 5200 begins by acquiring 5210 a first document or a representation of the first document. Exemplary methods of acquiring the first document include the following: (1) acquiring the first document by automatically capturing, via PD capture module 318, the text layout of a printed document within the operating system of MMR computer 112; (2) acquiring the first document by automatically capturing the text layout of a printed document within printer driver 316 of MMR computer 112; (3) acquiring the first document by scanning a paper document via a standard document scanner device 127 that is connected to, for example, MMR computer 112; and (4) acquiring the first document by automatically or manually transferring, uploading, or downloading a file that is a representation of the printed document to MMR computer 112. Although the acquiring step has been described as acquiring most or all of the printed document, it should be understood that acquiring step 5210 may be performed for only the smallest portion of a printed document. Furthermore, although the method is described in terms of acquiring a single document, this step may be performed to acquire a number of documents and create a library of first documents.
Once acquiring step 5210 has been performed, method 5200 performs 5212 an indexing operation on the first document. The indexing operation allows identification of the corresponding electronic representation of the document and of the associated second media types for input that matches the acquired first document or portions thereof. In one embodiment of this step, a document indexing operation that generates PD index 322 is performed by PD capture module 318. Exemplary indexing operations include the following: (1) indexing the x-y locations of characters of a printed document; (2) indexing the x-y locations of words of a printed document; (3) indexing the x-y locations of an image or a portion of an image in a printed document; (4) performing an OCR imaging operation and indexing the x-y locations of characters and/or words accordingly; (5) performing feature extraction from the image of the rendered page and indexing the x-y locations of the features; and (6) simulating feature extraction on the symbolic version of a page and indexing the x-y locations of the features. Indexing operation 5212 may include any one or any group of the above indexing operations, depending on the application of the present invention.
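One of the listed indexing operations, indexing the x-y locations of the words of a printed document, can be sketched as a simple inverted index. The function name, the document identifiers, and the tuple layout are hypothetical, and the structure shown is only an illustration, not the format of PD index 322.

```python
from collections import defaultdict


def index_words(index, doc_id, page, words):
    """Add (word, x, y) entries for one page to an inverted index.

    The index maps each lower-cased word to every (doc_id, page, x, y)
    location at which that word was printed.
    """
    for text, x, y in words:
        index[text.lower()].append((doc_id, page, x, y))


pd_index = defaultdict(list)
index_words(pd_index, "doc-1", 1, [("Get", 100, 200), ("started", 140, 200)])
index_words(pd_index, "doc-2", 3, [("get", 50, 80)])

locations = pd_index["get"]
```

Looking up a word returns every document, page, and position at which it occurs, which is the kind of information a later matching step needs in order to propose document, page, and location hypotheses.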
Method 5200 also acquires 5214 a second document. In this step 5214, the second document acquired may be an entire document or only a portion (patch) of the second document. Exemplary methods of acquiring the second document include the following: (1) scanning a patch of text by means of one or more capture mechanisms 230 of capture device 106; (2) scanning a patch of text by means of one or more capture mechanisms 230 of capture device 106 and subsequently preprocessing the image to determine the likelihood that the intended feature description will be extracted correctly. For example, if the index is based on OCR, the system might determine whether the image contains lines of text and whether the image is sharp enough for a successful OCR operation. If this determination fails, another patch of text is scanned; (3) scanning a machine-readable identifier (e.g., an International Standard Book Number (ISBN) or Universal Product Code (UPC) code) that identifies the document being scanned; (4) inputting data that identifies a desired document or set of documents (e.g., 2003 editions of Sports Illustrated magazine) and subsequently scanning a patch of text by use of item (1) or (2) of this method step; (5) receiving an email with the second document attached; (6) receiving the second document by file transfer; (7) scanning a portion of an image with one or more capture mechanisms 230 of capture device 106; and (8) inputting the second document with input device 166.
Once steps 5210 and 5214 have been performed, the method performs 5216 document or pattern matching between the first document and the second document. In one embodiment, this is done by performing document fingerprint matching of the second document to the first document. The document fingerprint matching operation is performed on the second media document by querying PD index 322. An example of document fingerprint matching is extracting features from the image captured in step 5214, composing descriptors from those features, and looking up the documents and patches that contain a portion of those descriptors. It should be understood that this pattern matching step may be performed a plurality of times, once for each document, where the database stores numerous documents, to determine whether any document in the library or database matches the second document. Alternatively, indexing step 5212 adds the document acquired in step 5210 to an index that represents a collection of documents, and the pattern matching step is performed once.
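The fingerprint matching example above (extract features, compose descriptors, look up pages that contain a portion of them) can be sketched as follows. The descriptor used here, pairs of horizontally adjacent words, is a deliberately simple stand-in chosen for this example and is not the descriptor of the described system; the function names and the `min_fraction` threshold are likewise assumptions.

```python
from collections import Counter, defaultdict


def descriptors(words):
    """Compose descriptors from features; here, adjacent word pairs."""
    return [f"{a.lower()}|{b.lower()}" for a, b in zip(words, words[1:])]


def build_index(pages):
    """Map each descriptor to the set of (doc_id, page) keys that contain it."""
    index = defaultdict(set)
    for page_key, words in pages.items():
        for d in descriptors(words):
            index[d].add(page_key)
    return index


def match(index, patch_words, min_fraction=0.5):
    """Rank pages by the fraction of the patch's descriptors they contain."""
    query = set(descriptors(patch_words))
    if not query:
        return []
    votes = Counter()
    for d in query:
        for page_key in index.get(d, ()):
            votes[page_key] += 1
    return [
        (page_key, count / len(query))
        for page_key, count in votes.most_common()
        if count / len(query) >= min_fraction
    ]


pages = {
    ("doc-1", 1): "mixed media reality links paper to digital content".split(),
    ("doc-2", 1): "paper documents are scanned and indexed".split(),
}
index = build_index(pages)
ranked = match(index, "reality links paper to".split())
```

Because all documents share one index, the lookup runs once per captured patch rather than once per stored document, which corresponds to the alternative described at the end of the paragraph above.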
Finally, method 5200 performs 5218 an action based on the result of step 5216 and, optionally, on user input. In one embodiment, method 5200 looks up a predetermined action that is associated with the given document patch and that is stored, for example, in second media 504 associated with hotspot 506 found as matching in step 5216. Examples of predetermined actions include: (1) retrieving information from document event database 320, the Internet, or elsewhere; (2) writing information to a location, verified by MMR system 100b, that is ready to receive the system's output; (3) looking up information; (4) displaying information on a client device, such as capture device 106, and conducting an interactive dialog with a user; (5) queuing the action and data determined in method step 5216 for later execution (the user's participation may be optional); and (6) immediately executing the action and data determined in method step 5216. Example results of this method step include the retrieval of information, a modified document, the execution of some other action (e.g., the purchase of stock or of a product), or the input of a command sent to a cable TV box, such as set-top box 126, that is linked to a cable TV server (e.g., service provider server 122), which returns video to the cable TV box. Once step 5218 has been performed, method 5200 is complete and ends.
Figure 53 illustrates a block diagram of an exemplary set of business entities 5300 associated with MMR system 100b according to one embodiment of the present invention. The set of business entities 5300 includes MMR service provider 5310, MMR consumer 5312, multimedia company 5314, printer user 5316, cell phone service provider 5318, hardware manufacturer 5320, hardware retailer 5322, financial institution 5324, credit card processor 5326, document publisher 5328, document printer 5330, fulfillment house 5332, cable TV provider 5334, service provider 5336, software provider 5338, advertising company 5340, and business network 5370.
MMR service provider 5310 is the owner and/or administrator of MMR system 100 as described with reference to Figures 1A through 5 and 52. MMR consumer 5312 is representative of any MMR user 110, as previously described with reference to Figure 1B.
Multimedia company 5314 is any provider of digital multimedia products, such as Blockbuster Inc. (Dallas, TX), which provides digital movies and video games, and Sony Corporation of America (New York, NY), which provides digital music, movies, and TV shows.
Printer user 5316 is any individual or entity that uses any printer of any kind to produce a printed paper document. For example, MMR consumer 5312 may be printer user 5316 or document printer 5330.
Cell phone service provider 5318 is any cell phone service provider, such as Verizon Wireless (Bedminster, NJ), Cingular Wireless (Atlanta, GA), T-Mobile USA (Bellevue, WA), and Sprint Nextel (Reston, VA).
Hardware manufacturer 5320 is any manufacturer of hardware devices, such as a manufacturer of printers, cellular phones, or PDAs. Exemplary hardware manufacturers include Hewlett-Packard (Houston, TX), Motorola, Inc. (Schaumburg, IL), and Sony Corporation of America (New York, NY). Hardware retailer 5322 is any retailer of hardware devices, such as a retailer of printers, cellular phones, or PDAs. Exemplary hardware retailers include RadioShack Corporation (Fort Worth, TX), Circuit City Stores, Inc. (Richmond, VA), Wal-Mart (Bentonville, AR), and Best Buy Co. (Richfield, MN), but are not limited thereto.
Financial institution 5324 is any financial institution, such as any bank or credit union, that handles bank accounts and the transfer of funds to and from other banks or financial institutions. Credit card processor 5326 is any credit card institution that manages the credit card authentication and approval process for a purchase transaction. Exemplary credit card processors include ... Inc. (Eden Prairie, MN) and CCNow Inc. (Eden Prairie, MN), but are not limited thereto.
Document publisher 5328 is any document publishing company, such as Gregath Publishing Company (Wyandotte, OK), Prentice Hall (Upper Saddle River, NJ), and Pelican Publishing Company (Gretna, LA), but is not limited thereto. Document printer 5330 is any document printing company, such as PSPrint LLC (Oakland, CA), PrintLizard, Inc. (Buffalo, NY), and Mimeo, Inc. (New York, NY), but is not limited thereto. In another example, document publisher 5328 and/or document printer 5330 is any entity that produces and distributes newspapers or magazines.
Fulfillment house 5332 is, as is well known, any third-party logistics warehouse that specializes in the fulfillment of orders. Exemplary fulfillment houses include Corporate Disk Company (McHenry, IL), OrderMotion, Inc. (New York, NY), and Shipwire.com (Los Angeles, CA), but are not limited thereto.
Cable TV provider 5334 is any cable TV service provider, such as Comcast Corporation (Philadelphia, PA) and Adelphia Communications (Greenwood Village, CO), but is not limited thereto. Service provider 5336 is representative of any entity that provides a service of any kind.
Software provider 5338 is any software supplier, such as Art & Logic, Inc. (Pasadena, CA), Jigsaw Data Corp. (San Mateo, CA), DataMirror Corporation (New York, NY), and DataBankIMX, LCC (Beltsville, MD), but is not limited thereto.
Advertising company 5340 is any advertising company or agency, such as D and B Marketing (Elhurst, IL), BlackSheep Marketing (Boston, MA), and GothamDirect, Inc. (New York, NY), but is not limited thereto.
Business network 5370 is representative of any mechanism by which a business relationship is established and/or facilitated.
Figure 54 illustrates a method 5400, which is a generalized business method facilitated by use of MMR system 100b, according to one embodiment of the present invention. Method 5400 includes the steps of establishing a relationship between at least two entities, determining possible business transactions, executing at least one business transaction, and delivering the product or service for the transaction. First, a relationship is established 5410 between at least two business entities 5300. For example, business entities 5300 may be arranged within four broad categories, such as (1) MMR creators, (2) MMR distributors, (3) MMR users, and (4) others, within which some business entities fall into more than one category. According to this example, business entities 5300 are categorized as follows:
● MMR creators - MMR service provider 5310, multimedia company 5314, document publisher 5328, document printer 5330, software provider 5338, and advertising company 5340;
● MMR distributors - MMR service provider 5310, multimedia company 5314, cell phone service provider 5318, hardware manufacturer 5320, hardware retailer 5322, document publisher 5328, document printer 5330, fulfillment house 5332, cable TV provider 5334, service provider 5336, and advertising company 5340;
● MMR users - MMR consumer 5312, printer user 5316, and document printer 5330; and
● Others - financial institution 5324 and credit card processor 5326.
For example, in this method step, a business relationship is established between MMR service provider 5310, which is an MMR creator, and MMR consumer 5312, which is an MMR user, and cell phone service provider 5318 and hardware retailer 5322, which are MMR distributors. Furthermore, hardware manufacturer 5320 has a business relationship with hardware retailer 5322, both of which are MMR distributors.
Next, method 5400 determines 5412 the possible business transactions between the parties whose relationships were established in step 5410. In particular, a variety of transactions may occur between any two or more business entities 5300. Exemplary transactions include: purchasing information; purchasing physical merchandise; purchasing services; purchasing bandwidth; purchasing electronic storage; purchasing advertisements; purchasing advertisement statistics; shipping merchandise; selling information; selling physical merchandise; selling services; selling bandwidth; selling electronic storage; selling advertisements; selling advertisement statistics; renting/leasing; and collecting opinions/ratings/votes.
Once method 5400 has determined the possible business transactions between the parties, MMR system 100 is used to reach 5414 agreement on at least one business transaction. In particular, a variety of actions may occur between any two or more business entities 5300 as a result of a transaction. Exemplary actions include: purchasing information; receiving an order; clicking through for more information; creating advertising space; providing local/remote access; sponsoring; shipping; creating business relationships; storing private information; passing information on to others; adding content; and blogging.
In case method 5400 has been reached the agreement of business transaction, just use MMR system 100 to pay product or the service of 5416 these transaction, for example, to MMR consumer 5312.Especially, as the result of the business transaction of in method step 5414, reaching, between any two or more commercial entities 5300, can exchange plurality of kinds of contents.Exemplary content comprises: text; Web page interlinkage; Software; Still photo; Video; Audio frequency; With above any combination.In addition, for the facility transaction, between any two or more commercial entities 5300, can utilize multiple delivery mechanisms.Exemplary delivery mechanisms comprises: paper; Personal computer; Network computer; Acquisition equipment 106; The individual video device; Personal audio set; With above any combination.
In addition to the invention as claimed and described in the above embodiments, at least one aspect of one or more embodiments of the present invention provides a computer-implemented method for providing mixed media documents. The method includes receiving (at an index table) electronic descriptions of features extracted from a paper document. The index table associates the paper document and the locations of the features within that document with a mixed media document that combines printed and digital media. Based on data from the index table, the method continues with receiving query terms that capture two-dimensional relationships between objects in a target document, and computing at least one mixed media document and location hypothesis potentially responsive to the query terms. In one particular case, the method further includes storing additional characteristics associated with the target document. In one such case, the additional characteristics include one or more actions, and the actions include retrieving text information, retrieving graphic information, carrying out a procedure, executing a command, placing an order, retrieving video, retrieving sound, storing information, creating a new document, printing a document, and/or displaying a document. In another particular case, receiving (at the index table) electronic descriptions of features extracted from a paper document includes receiving electronic descriptions of features extracted from a plurality of paper documents. In another particular case, computing at least one mixed media document and location hypothesis includes computing, based on data from the index table, a ranked set of mixed media document, page, and location hypotheses. In another embodiment, receiving query terms that capture two-dimensional relationships between objects in the target document includes receiving a set of horizontally and vertically adjacent word pairs extracted from the target document. In another particular case, the index table includes an inverted term index table, in which each unique term points to a list of records, and each record identifies a candidate region on a page within a mixed media document. In one such case, computing at least one mixed media document and location hypothesis includes examining each record indexed by a key corresponding to a query term, and identifying the region that is most consistent with all of the query terms. If the identified region has a match that satisfies a scoring match criterion, the method may further include confirming the corresponding mixed media document and location hypothesis. In one particular case, the target document is a paper document or an image of a fragment of that paper document. At least one other aspect of one or more embodiments of the present invention provides a machine-readable medium encoded with instructions (e.g., one or more compact discs, floppy disks, servers, memory sticks, or hard drives, ROM, RAM, or any type of medium suitable for storing electronic instructions) that, when executed by one or more processors, cause the processors to carry out a process for providing mixed media documents. The process can be, for example, similar to the method described here or a variation of it.
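The inverted term index described in this paragraph can be illustrated with a minimal Python sketch. It is not the patented implementation: the `Word` record, the `("H", ...)` and `("V", ...)` term keys, and the adjacency rules (next word on the same line; any horizontally overlapping word on the line below) are assumptions made for illustration only.

```python
from collections import defaultdict
from typing import NamedTuple


class Word(NamedTuple):
    """Hypothetical word record: text plus simple page coordinates."""
    text: str
    line: int    # text line number on the page (0 = top line)
    x0: float    # left edge of the word
    x1: float    # right edge of the word


def word_pair_terms(words):
    """Yield (term, x, line) for horizontally and vertically adjacent word pairs."""
    lines = defaultdict(list)
    for w in words:
        lines[w.line].append(w)
    for row in lines.values():
        row.sort(key=lambda w: w.x0)
    for n in sorted(lines):
        row = lines[n]
        for i, w in enumerate(row):
            # Horizontal pair: the next word on the same line.
            if i + 1 < len(row):
                yield ("H", w.text, row[i + 1].text), w.x0, n
            # Vertical pairs: words on the next line that overlap horizontally.
            for b in lines.get(n + 1, []):
                if b.x0 < w.x1 and w.x0 < b.x1:
                    yield ("V", w.text, b.text), w.x0, n


def build_index(documents):
    """Map each unique term to a list of (doc_id, page_no, x, line) records.

    documents: {doc_id: {page_no: [Word, ...]}}
    """
    index = defaultdict(list)
    for doc_id, pages in documents.items():
        for page_no, words in pages.items():
            for term, x, line in word_pair_terms(words):
                index[term].append((doc_id, page_no, x, line))
    return dict(index)
```

Because each term encodes both the two words and their spatial relation, an ordinary text-style lookup on the term retrieves every stored location where that two-dimensional arrangement occurs.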
At least one other aspect of one or more embodiments of the present invention provides a database system for providing mixed media documents. The system includes an index table for receiving electronic descriptions of features extracted from a paper document, and for associating the paper document and the locations of the features within that document with a mixed media document that combines printed and digital media. The system further includes an accumulator module that, based on data from the index table, receives query terms that capture two-dimensional relationships between objects in a target document, and computes at least one mixed media document and location hypothesis potentially responsive to the query terms. In one particular case, the system includes a storage facility (e.g., a relational database) that stores additional characteristics associated with the target document. In one such case, the additional characteristics include one or more actions, and the actions include retrieving text information, retrieving graphic information, carrying out a procedure, executing a command, placing an order, retrieving video, retrieving sound, storing information, creating a new document, printing a document, and/or displaying a document. In another particular case, the index table can receive electronic descriptions of features extracted from a plurality of paper documents. In another particular case, the paper document includes a plurality of pages, and the index table is further configured to identify the mixed media document, the page, and the x-y location of the feature within that page. In another particular case, computing at least one mixed media document and location hypothesis, as carried out by the accumulator module, includes computing, based on data from the index table, a ranked set of mixed media document, page, and location hypotheses. In another particular case, receiving query terms that capture two-dimensional relationships between objects in the target document, as carried out by the accumulator module, includes receiving a set of horizontally and vertically adjacent word pairs extracted from the target document. In another particular case, the index table includes an inverted term index table, in which each unique term points to a list of records, and each record identifies a candidate region on a page within a mixed media document. In one such case, computing at least one mixed media document and location hypothesis, as carried out by the accumulator module, includes examining each record indexed by a key corresponding to a query term, and identifying the region that is most consistent with all of the query terms. If the identified region has a match that satisfies a scoring match criterion, the accumulator module can confirm the corresponding mixed media document and location hypothesis. In another such case, the index table further includes a document index table that contains relevant information for each mixed media document, including at least one of print resolution, print date, paper size, shadow file name, and page image location. In another particular case, the descriptions received by the index table are computed by a feature extraction module that associates the extracted features with the intra-document location data of those features. In one particular case, the target document is a paper document or an image of a fragment of that paper document. System functionality can be implemented by a number of means, for example software (e.g., executable instructions encoded on one or more computer-readable media), hardware (e.g., gate-level logic or one or more ASICs), firmware (e.g., one or more microcontrollers with I/O capability and embedded routines for carrying out the functionality described here), or some combination thereof. For example, the database system can be implemented in the mixed media reality (MMR) system described here, at one or more servers, or on a computer system, or on a portable device, or some combination thereof.
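The behaviour attributed to the accumulator module above (examine each record indexed by a query term, find the region most consistent with all query terms, and confirm it if a match criterion is met) can be sketched as a simple voting scheme. The sketch below is an illustration under stated assumptions, not the patented implementation: the function name `rank_hypotheses`, the record layout `(doc_id, page, x, y)`, the fixed-size square regions of `cell` units, and the `min_votes` criterion are all hypothetical.

```python
from collections import Counter


def rank_hypotheses(index, query_terms, cell=50, min_votes=2):
    """Vote for candidate regions and return (ranked, confirmed).

    index:       {term: [(doc_id, page, x, y), ...]} (assumed record layout)
    query_terms: terms extracted from the target document image
    cell:        side length of a candidate region (assumption)
    min_votes:   number of distinct query terms a region needs to be confirmed
    """
    votes = Counter()
    # dict.fromkeys removes duplicate terms while keeping their order.
    for term in dict.fromkeys(query_terms):
        seen = set()
        for doc_id, page, x, y in index.get(term, []):
            region = (doc_id, page, int(x // cell), int(y // cell))
            if region not in seen:      # one vote per term per region
                seen.add(region)
                votes[region] += 1
    ranked = votes.most_common()        # highest vote count first
    confirmed = [region for region, count in ranked if count >= min_votes]
    return ranked, confirmed
```

The `ranked` list corresponds to a ranked set of document, page, and location hypotheses; `confirmed` keeps only the regions that satisfy the assumed match criterion.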
In one particular embodiment, the MMR system includes a content-based retrieval database configured with an index table to represent two-dimensional geometric relationships between objects extracted from a printed document in a way that allows look-up using a text-based index. Given data from the index table, a ranked set of document, page, and location hypotheses can be computed. The techniques effectively transform features detected in an image fragment into textual terms (or other searchable features) that represent both the features themselves and the geometric relationship between them. A storage facility can be used to store additional characteristics about each document image fragment.
The algorithms presented here are not inherently related to any particular computer or other apparatus. Various general-purpose and/or special-purpose systems may be programmed or otherwise configured in accordance with embodiments of the present invention. As will be apparent in light of this disclosure, many such systems can be implemented with numerous programming languages and/or structures. Moreover, embodiments of the present invention can operate on, or work in conjunction with, an information system or network. For example, the invention can operate on a stand-alone multifunction printer or a networked printer, with functionality varying depending on the configuration. The present invention is capable of operating with any information system, from those with minimal functionality to those providing all of the functionality disclosed here.
The foregoing description of the embodiments of the present invention has been presented for the purposes of illustration and description. It is not intended to be exhaustive or to limit the present invention to the precise form disclosed. Many modifications and variations are possible in light of the above teaching. It is intended that the scope of the present invention be limited not by this detailed description, but rather by the claims of this application. As will be understood by those familiar with the art, the present invention may be embodied in other specific forms without departing from its spirit or essential characteristics. Likewise, the particular naming and division of the modules, routines, features, attributes, methodologies, and other aspects are not mandatory or significant, and the mechanisms that implement the present invention or its features may have different names, divisions, and/or formats. Furthermore, as will be apparent to one of ordinary skill in the relevant art, the modules, routines, features, attributes, methodologies, and other aspects of the present invention can be implemented as software, hardware, firmware, or any combination of the three. Also, wherever a component of the present invention, an example of which is a module, is implemented as software, the component can be implemented as a standalone program, as part of a larger program, as a plurality of separate programs, as a statically or dynamically linked library, as a kernel loadable module, as a device driver, and/or in every and any other way known now or in the future to those of ordinary skill in the art of computer programming. Additionally, the present invention is in no way limited to implementation in any specific programming language, or for any specific operating system or environment. Accordingly, the disclosure of the present invention is intended to be illustrative, but not limiting, of the scope of the present invention, which is set forth in the following claims.
The present invention is based on U.S. priority applications S.N. 60/710,767, filed on August 23, 2005; S.N. 60/792,912, filed on April 17, 2006; S.N. 60/807,654, filed on July 18, 2006; S.N. 11/461,147, filed on July 31, 2006; and S.N. 11/461,164, filed on July 31, 2006, the entire contents of which are hereby incorporated by reference.

Claims (14)

1. A computer-implemented method for generating an index table, comprising:
generating an electronic representation of a paper document, the electronic representation including objects extracted from electronic data of the paper document and coordinates of the objects; and
generating the index table by indexing, based on the objects included in the electronic data of the paper document and the coordinates of the objects, the positions of pairs of objects adjacent to one another within a page of the electronic data of the paper document.
2. The method of claim 1, further comprising a preliminary step of receiving the paper document.
3. The method of claim 1, wherein the objects are words, and the pairs of words adjacent to one another within the page of the electronic data of the paper document are arranged in the horizontal direction and the vertical direction.
4. The method of claim 1, further comprising:
a step of receiving one or more query terms; and
a step of computing, based on the index table, a position for the query terms.
5. The method of claim 4, wherein, before the one or more query terms are received:
the target document is received;
an image of at least one fragment of the target document is created; and
the one or more query terms are generated based on the image.
6. The method of claim 5, wherein generating the one or more query terms based on the image includes generating horizontal and vertical word pairs extracted from the image.
7. The method of claim 6, wherein computing at least one mixed media document and location hypothesis includes:
locating the stored page that most likely matches the fragment of the target document; and
computing the position in the page that is most likely the center of the fragment.
8. The method of claim 7, wherein each word pair is associated with an inverse document frequency, and locating the stored page that most likely matches the at least one fragment of the target document includes:
adding the inverse document frequency for each word pair to an accumulator for the document pages on which that word pair appears, as indexed by the word pair; and
in response to a maximum value in the accumulator exceeding a threshold, outputting the corresponding document page as a match to the fragment.
9. The method of claim 8, wherein computing the position in the page that is most likely the center of the fragment includes:
adding a weight to each cell in a zone surrounding each word pair, wherein the weight for each cell is determined by the product of the inverse document frequency of the word pair and the normalized geometric distance between the cell and the center of the zone;
searching the accumulator to find the cell whose corresponding accumulator array entry has the maximum value in the accumulator; and
in response to the maximum value exceeding a threshold, reporting the coordinates of the cell as the position of the fragment.
10. The method of claim 4, wherein the step of computing, based on the index table, a position for the query terms includes:
looking up each of the one or more query terms in the index table to retrieve one or more positions associated with each query term; and
for each identified position, identifying one or more candidate regions that contain the position.
11. The method of claim 10, wherein computing at least one mixed media document and location hypothesis includes:
identifying the one of the one or more candidate regions that is most consistent with all of the one or more query terms; and
in response to determining that that one of the one or more candidate regions satisfies a predetermined match criterion, confirming the region as a match to the target document.
12. A computer-implemented apparatus for generating an index table, comprising:
means for generating an electronic representation of a paper document, the electronic representation including objects extracted from electronic data of the paper document and coordinates of the objects; and
means for generating the index table by indexing, based on the objects included in the electronic data of the paper document and the coordinates of the objects, the positions of pairs of objects adjacent to one another within a page of the electronic data of the paper document.
13. The apparatus of claim 12, further comprising:
means for storing one or more characteristics associated with at least one of the features.
14. The apparatus of claim 12, further comprising:
means for receiving one or more query terms; and
means for computing, based on the index table, a position for the query terms.
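The inverse-document-frequency accumulation recited in claims 8 and 9 above can be illustrated with a short Python sketch. It is a sketch under stated assumptions, not the claimed implementation: the function names, the record layout `(doc_id, page, cell_x, cell_y)`, the use of `log(N / n)` as the inverse document frequency, the square zone of `radius` cells, and the choice to normalize distance so that the zone center receives the full weight and the zone corners receive zero are all hypothetical.

```python
import math
from collections import defaultdict


def inverse_document_frequency(index, term, num_pages):
    """log(N / n), where n is the number of distinct pages containing the term."""
    pages = {(doc, page) for doc, page, _, _ in index.get(term, [])}
    return math.log(num_pages / len(pages)) if pages else 0.0


def match_page(index, query_terms, num_pages, threshold):
    """Add each word pair's IDF to a per-page accumulator; return the best page."""
    acc = defaultdict(float)
    for term in dict.fromkeys(query_terms):          # ignore repeated terms
        idf = inverse_document_frequency(index, term, num_pages)
        for key in {(doc, page) for doc, page, _, _ in index.get(term, [])}:
            acc[key] += idf
    if not acc:
        return None
    best = max(acc, key=acc.get)
    return best if acc[best] > threshold else None


def locate_fragment(index, query_terms, page_key, num_pages, radius=2, threshold=0.0):
    """Spread IDF-weighted votes over the cells around each word pair on one page."""
    grid = defaultdict(float)
    d_max = math.hypot(radius, radius) or 1.0        # distance to a zone corner
    for term in dict.fromkeys(query_terms):
        idf = inverse_document_frequency(index, term, num_pages)
        for doc, page, cx, cy in index.get(term, []):
            if (doc, page) != page_key:
                continue
            for dx in range(-radius, radius + 1):
                for dy in range(-radius, radius + 1):
                    # Assumed normalization: 1 at the zone center, 0 at its corners.
                    closeness = 1.0 - math.hypot(dx, dy) / d_max
                    grid[(cx + dx, cy + dy)] += idf * closeness
    if not grid:
        return None
    best = max(grid, key=grid.get)
    return best if grid[best] > threshold else None
```

Rare word pairs carry a larger inverse document frequency, so under these assumptions they pull both the page decision and the estimated fragment center more strongly than common pairs do.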
CN200680039477.4A 2005-08-23 2006-08-22 Data organization and access for mixed media document system Expired - Fee Related CN101297318B (en)

Applications Claiming Priority (11)

Application Number Priority Date Filing Date Title
US71076705P 2005-08-23 2005-08-23
US60/710,767 2005-08-23
US79291206P 2006-04-17 2006-04-17
US60/792,912 2006-04-17
US80765406P 2006-07-18 2006-07-18
US60/807,654 2006-07-18
US11/461,147 2006-07-31
US11/461,164 2006-07-31
US11/461,164 US9405751B2 (en) 2005-08-23 2006-07-31 Database for mixed media document system
US11/461,147 US9171202B2 (en) 2005-08-23 2006-07-31 Data organization and access for mixed media document system
PCT/JP2006/316812 WO2007023993A1 (en) 2005-08-23 2006-08-22 Data organization and access for mixed media document system

Publications (2)

Publication Number Publication Date
CN101297318A CN101297318A (en) 2008-10-29
CN101297318B true CN101297318B (en) 2013-01-23

Family

ID=40035652

Family Applications (4)

Application Number Title Priority Date Filing Date
CN200680039532.XA Expired - Fee Related CN101297319B (en) 2005-08-23 2006-08-22 Embedding hot spots in electronic documents
CN2006800393767A Active CN101292258B (en) 2005-08-23 2006-08-22 System and methods for creation and use of a mixed media environment
CN200680039477.4A Expired - Fee Related CN101297318B (en) 2005-08-23 2006-08-22 Data organization and access for mixed media document system
CN2006800393983A Expired - Fee Related CN101292259B (en) 2005-08-23 2006-08-22 Method and system for image matching in a mixed media environment

Country Status (1)

Country Link
CN (4) CN101297319B (en)

Families Citing this family (26)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2010108159A2 (en) * 2009-03-20 2010-09-23 Exbiblio B.V. Associating rendered advertisements with digital content
EP2275916A3 (en) * 2009-06-29 2013-01-23 Kabushiki Kaisha Toshiba Print job managing apparatus, print job managing system, and print job managing method
US9245043B2 (en) * 2009-12-23 2016-01-26 Fuji Xerox Co., Ltd. Embedded media markers and systems and methods for generating and using them
US8332424B2 (en) * 2011-05-13 2012-12-11 Google Inc. Method and apparatus for enabling virtual tags
WO2013115788A1 (en) * 2012-01-31 2013-08-08 Hewlett-Packard Development Company, L.P. Print sample feature set
WO2014024197A1 (en) * 2012-08-09 2014-02-13 Winkapp Ltd. A method and system for linking printed objects with electronic content
US9374517B2 (en) 2012-10-12 2016-06-21 Ebay Inc. Guided photography and video on a mobile device
TWI496016B (en) * 2013-01-02 2015-08-11 104 Corp Method and system for managing hibrid database
JP5998952B2 (en) * 2013-01-25 2016-09-28 富士ゼロックス株式会社 Sign image placement support apparatus and program
JP5967036B2 (en) * 2013-08-22 2016-08-10 富士ゼロックス株式会社 Image search system, information processing apparatus, and program
CN104699707A (en) * 2013-12-06 2015-06-10 深圳先进技术研究院 Data clustering method and device
US10043070B2 (en) * 2016-01-29 2018-08-07 Microsoft Technology Licensing, Llc Image-based quality control
US11599833B2 (en) * 2016-08-03 2023-03-07 Ford Global Technologies, Llc Vehicle ride sharing system and method using smart modules
US10558817B2 (en) * 2017-01-30 2020-02-11 Foley & Lardner LLP Establishing a link between identifiers without disclosing specific identifying information
CN110020108B (en) * 2017-09-12 2023-04-28 腾讯科技(深圳)有限公司 Network resource recommendation method, device, computer equipment and storage medium
CN108446737B (en) * 2018-03-21 2022-07-05 百度在线网络技术(北京)有限公司 Method and device for identifying objects
CN110888993A (en) * 2018-08-20 2020-03-17 珠海金山办公软件有限公司 Composite document retrieval method and device and electronic equipment
CN109034267B (en) * 2018-08-20 2019-07-12 南京乐象网络科技有限公司 Piece caudal flexure intelligent selection device
CN111291167B (en) * 2018-12-07 2023-05-05 宁波方太厨具有限公司 Automatic product paper specification checking method based on image recognition
CN111339387B (en) * 2018-12-18 2023-06-09 阿里巴巴集团控股有限公司 Click feedback acquisition method and device based on information template and electronic equipment
US10846553B2 (en) * 2019-03-20 2020-11-24 Sap Se Recognizing typewritten and handwritten characters using end-to-end deep learning
CN110210470B (en) * 2019-06-05 2023-06-23 复旦大学 Commodity information image recognition system
CN110909726B (en) * 2019-11-15 2022-04-05 杨宏伟 Written document interaction system and method based on image recognition
CN111275043B (en) * 2020-01-22 2021-08-20 西北师范大学 Paper numbered musical notation electronization play device based on PCNN handles
CN112597345B (en) * 2020-10-30 2023-05-12 深圳市检验检疫科学研究院 Automatic acquisition and matching method for laboratory data
CN114511058B (en) * 2022-01-27 2023-06-02 国网江苏省电力有限公司泰州供电分公司 Load element construction method and device for electric power user portrait

Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US6537324B1 (en) * 1997-02-17 2003-03-25 Ricoh Company, Ltd. Generating and storing a link correlation table in hypertext documents at the time of storage

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US6411953B1 (en) * 1999-01-25 2002-06-25 Lucent Technologies Inc. Retrieval and matching of color patterns based on a predetermined vocabulary and grammar
US7475061B2 (en) * 2004-01-15 2009-01-06 Microsoft Corporation Image-based document indexing and retrieval

Also Published As

Publication number Publication date
CN101297319A (en) 2008-10-29
CN101292258B (en) 2012-11-21
CN101297318A (en) 2008-10-29
CN101297319B (en) 2013-02-27
CN101292259A (en) 2008-10-22
CN101292258A (en) 2008-10-22
CN101292259B (en) 2012-07-11

Similar Documents

Publication Publication Date Title
CN101297318B (en) Data organization and access for mixed media document system
US7639387B2 (en) Authoring tools using a mixed media environment
US7672543B2 (en) Triggering applications based on a captured text in a mixed media environment
US7991778B2 (en) Triggering actions with captured input in a mixed media environment
US8005831B2 (en) System and methods for creation and use of a mixed media environment with geographic location information
US7551780B2 (en) System and method for using individualized mixed document
US7920759B2 (en) Triggering applications for distributed action execution and use of mixed media recognition as a control input
KR100980748B1 (en) System and methods for creation and use of a mixed media environment
US8156427B2 (en) User interface for mixed media reality
US7812986B2 (en) System and methods for use of voice mail and email in a mixed media environment
US7669148B2 (en) System and methods for portable device for mixed media system
US7917554B2 (en) Visibly-perceptible hot spots in documents
US7702673B2 (en) System and methods for creation and use of a mixed media environment
US8838591B2 (en) Embedding hot spots in electronic documents
US8332401B2 (en) Method and system for position-based image matching in a mixed media environment
US7885955B2 (en) Shared document annotation
US8195659B2 (en) Integration and use of mixed media documents
KR100979457B1 (en) Method and system for image matching in a mixed media environment
JP4897795B2 (en) Processing apparatus, index table creation method, and computer program
KR100960640B1 (en) Method, system and computer readable recording medium for embedding a hotspot in a document

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C14 Grant of patent or utility model
GR01 Patent grant
CF01 Termination of patent right due to non-payment of annual fee

Granted publication date: 20130123