CA2975694A1 - Systems and methods for data indexing and processing - Google Patents

Systems and methods for data indexing and processing

Info

Publication number
CA2975694A1
CA2975694A1 CA2975694A CA2975694A CA2975694A1 CA 2975694 A1 CA2975694 A1 CA 2975694A1 CA 2975694 A CA2975694 A CA 2975694A CA 2975694 A CA2975694 A CA 2975694A CA 2975694 A1 CA2975694 A1 CA 2975694A1
Authority
CA
Canada
Prior art keywords
document
characters
reference database
file
strings
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CA2975694A
Other languages
French (fr)
Other versions
CA2975694C (en)
Inventor
Joseph Matthew Morvant
Michael John Ebaugh
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Indxit Systems Inc
Original Assignee
Indxit Systems Inc
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Indxit Systems Inc
Priority to CA3074633A
Publication of CA2975694A1
Application granted
Publication of CA2975694C
Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/31 Indexing; Data structures therefor; Storage structures
    • G06F16/313 Selection or weighting of terms for indexing
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/22 Indexing; Data structures therefor; Storage structures
    • G06F16/2228 Indexing structures
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/31 Indexing; Data structures therefor; Storage structures
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33 Querying
    • G06F16/3331 Query processing
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q10/00 Administration; Management
    • G06Q10/10 Office automation; Time management
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q30/00 Commerce
    • G06Q30/04 Billing or invoicing
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V30/00 Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
    • G06V30/40 Document-oriented image-based pattern recognition
    • G06V30/41 Analysis of document content
    • G06V30/414 Extracting the geometrical structure, e.g. layout tree; Block segmentation, e.g. bounding boxes for graphics or text
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V30/00 Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
    • G06V30/40 Document-oriented image-based pattern recognition
    • G06V30/41 Analysis of document content
    • G06V30/418 Document matching, e.g. of document images
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02A TECHNOLOGIES FOR ADAPTATION TO CLIMATE CHANGE
    • Y02A90/00 Technologies having an indirect contribution to adaptation to climate change
    • Y02A90/10 Information and communication technologies [ICT] supporting adaptation to climate change, e.g. for weather forecasting or climate simulation

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Business, Economics & Management (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Strategic Management (AREA)
  • Software Systems (AREA)
  • Entrepreneurship & Innovation (AREA)
  • General Business, Economics & Management (AREA)
  • Marketing (AREA)
  • Economics (AREA)
  • Human Resources & Organizations (AREA)
  • Development Economics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Finance (AREA)
  • Accounting & Taxation (AREA)
  • Operations Research (AREA)
  • Quality & Reliability (AREA)
  • Tourism & Hospitality (AREA)
  • Artificial Intelligence (AREA)
  • Multimedia (AREA)
  • Computational Linguistics (AREA)
  • Computer Graphics (AREA)
  • Geometry (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
  • Document Processing Apparatus (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

Systems and methods are disclosed that allow for indexing, processing, or both of information from physical media or electronic media, which may be received from a plurality of sources. In embodiments, a document file may be matched using pattern matching methods and may include comparisons with a comparison reference database to improve or accelerate the indexing process. In embodiments, information may be presented to a user as potential matches thereby improving manual indexing processes. In embodiments, one or more additional actions may occur as part of the processing, including without limitation, associating additional data with a document file, making observations from the document file, notifying individuals, creating composite messages, and billing events. In an embodiment, data from a document file may be associated with a key word, key phrase, or word frequency value that enables adaptive learning so that unindexed data may be automatically indexed based on user interaction history.

Description

SYSTEMS AND METHODS FOR DATA INDEXING AND PROCESSING
BACKGROUND
FIELD OF THE INVENTION
[0001] The present invention relates generally to the field of data processing, and more particularly to systems and methods for data processing and data indexing.
BACKGROUND OF THE INVENTION
[0002] Many individuals and business entities have to process documents and electronic files. For example, medical facilities are continually inundated with lab reports, medical transcriptions, test results, insurance forms, and the like. Stores and other businesses must maintain information related to products, inventory, customers, vendors, employees, and so forth.
[0003] Traditionally, much of the processing of this information, whether contained in physical media, such as paper, or electronic files stored on electronic media, such as magnetic disks, optical disks, flash memory, network servers, storage devices, and the like, is done manually. That is, the information contained on physical or electronic media is manually reviewed and manually indexed or processed.
[0004] The amount of time required to review the data contained in physical or electronic media and to catalogue the information contained therein often consumes a large portion of the time. Increasing the accuracy of cataloguing of these records and documents generally results in increasing the time spent reviewing and processing.
[0005] Some prior attempts to increase accuracy while decreasing the time involved in processing data used automated systems. One such system involves entry of information, at least in part, by using barcodes, predefined fields, or optical mark indicia imprinted or placed on a paper-based form. The barcodes or marks are scanned to enter information into a database. However, such systems are not without problems. These methods are heavily dependent on the direct activities of the professional staff or organization providing services. To be effective, these methods require consistent and accurate usage by the staff or organization. In some instances, barcode, predefined field, or optical mark systems still require manual, labor-intensive processes. Furthermore, barcode or optical mark systems often do not work across different entities, as they require consistent adoption of uniform procedures and infrastructure by all the entities. That is, the markings of one entity are often not useful to another entity.
[0006] Currently, no systems or methods allow for the automated input and processing of information from various documents received from a plurality of sources. Accordingly, systems and methods are needed that allow indicia contained within data, which may be originally embodied in physical or electronic media, to be identified and processed without extensive professional staff assistance.
[0006.1] According to a broad aspect of the present invention, there is provided a method for associating a document file with a record in a reference database, the method comprising: receiving the document file, the document file comprising unstructured data related to a record in the reference database; organizing data extracted from the unstructured data in the document file into an array of strings; obtaining a first set of strings by filtering at least a portion of the array of strings using at least one of: string position, position of a portion of a string, string value, value of a portion of a string, string format, format of a portion of a string, a property of one or more characters within a string, and string length; comparing the first set of strings from the array of strings against a comparison reference database comprising a plurality of records from the database, wherein a record comprises at least one data field element; dynamically generating a match pattern by selecting, from results of comparing the first set of strings from the array of strings against the comparison reference database, a set of matches to one or more data field elements within a record from the plurality of records in the comparison reference database to form the match pattern; determining a number of occurrences of the match pattern within records from the plurality of records in the comparison reference database; and responsive to the number of occurrences of the match pattern within records from the plurality of records in the comparison reference database being below a threshold number, associating the document file with the record corresponding with the set of matches from which the match pattern was formed.
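The association method above can be illustrated with a minimal sketch. Everything named here is an assumption made for illustration and is not taken from the specification: the length-based filter, the exact (case-insensitive) comparison of strings to field values, the representation of a match pattern as a tuple of (field, value) pairs, and the threshold of 2, which requires the pattern to occur in only one record.

```python
from typing import Dict, List, Optional, Tuple

Record = Dict[str, str]
MatchPattern = Tuple[Tuple[str, str], ...]


def filter_strings(strings: List[str], min_length: int = 3) -> List[str]:
    """Keep strings that pass a length filter (string length is one listed criterion)."""
    return [s for s in strings if len(s) >= min_length]


def build_match_pattern(strings: List[str], record: Record) -> MatchPattern:
    """Collect the (field, value) pairs of a record whose values appear among the strings."""
    wanted = {s.lower() for s in strings}
    return tuple(sorted((f, v) for f, v in record.items() if v.lower() in wanted))


def count_occurrences(pattern: MatchPattern, records: List[Record]) -> int:
    """Count the records that contain every (field, value) pair of the pattern."""
    return sum(all(r.get(f) == v for f, v in pattern) for r in records)


def associate(strings: List[str], records: List[Record], threshold: int = 2) -> Optional[Record]:
    """Return the first record whose match pattern occurs fewer than `threshold` times."""
    candidates = filter_strings(strings)
    for record in records:
        pattern = build_match_pattern(candidates, record)
        if pattern and count_occurrences(pattern, records) < threshold:
            return record
    return None


# Hypothetical comparison reference database and extracted strings.
records = [
    {"first": "John", "last": "Smith", "dob": "01/02/1960"},
    {"first": "John", "last": "Smith", "dob": "03/04/1975"},
    {"first": "Mary", "last": "Jones", "dob": "05/06/1980"},
]
strings = ["Patient", "John", "Smith", "DOB", "03/04/1975", "of"]
match = associate(strings, records)
```

In this sketch the pattern built from the first record (first and last name only) occurs in two records and is rejected, while the pattern that also includes the date of birth occurs once, so the document is associated with the second record.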
[0006.2] According to a further broad aspect of the present invention, there is provided a system for associating a document file with a record in a reference database, the system comprising: one or more processors communicatively coupled to at least one computer-readable medium storing one or more sequences of instructions, wherein execution of the one or more sequences of instructions by one or more processors causes the one or more processors to associate a document file by performing the steps comprising: receiving the document file, the document file comprising unstructured data related to a record in the reference database; organizing data extracted from the unstructured data in the document file into an array of strings; obtaining a first set of strings by filtering at least a portion of the array of strings using at least one of: string position, position of a portion of a string, string value, value of a portion of a string, string format, format of a portion of a string, a property of one or more characters within a string, and string length; comparing the first set of strings from the array of strings against a comparison reference database comprising a plurality of records wherein a record comprises at least one data field element; dynamically generating a match pattern by selecting, from results of comparing the first set of strings from the array of strings against the comparison reference database, a set of matches to one or more data field elements within a record from the plurality of records in the comparison reference database to form the match pattern; determining a number of occurrences of the match pattern within records from the plurality of records in the comparison reference database; and responsive to the number of occurrences of the match pattern within records from the plurality of records in the comparison database being below a threshold number, associating the document file with the record corresponding with the set of matches from which the match pattern was formed.
[0006.3] According to a still further broad aspect of the present invention, there is provided a non-transitory computer-readable medium comprising one or more sets of instructions which, when executed by one or more processors, causes the one or more processors to perform a method for associating a document file with a record in a reference database, the method comprising: receiving the document file, the document file comprising unstructured data related to a record in the reference database; organizing data extracted from the unstructured data in the document file into an array of strings; obtaining a first set of strings by filtering at least a portion of the array of strings using at least one of: string position, position of a portion of a string, string value, value of a portion of a string, string format, format of a portion of a string, a property of one or more characters within a string, and string length; comparing the first set of strings from the array of strings against a comparison reference database comprising a plurality of records from the database, wherein a record comprises at least one data field element; dynamically generating a match pattern by selecting, from results of comparing the first set of strings from the array of strings against the comparison reference database, a set of matches to one or more data field elements within a record from the plurality of records in the comparison reference database to form the match pattern; determining a number of occurrences of the match pattern within records from the plurality of records in the comparison reference database; and responsive to the number of occurrences of the match pattern within records from the plurality of records in the comparison reference database being below a threshold number, associating the document file with the record corresponding with the set of matches from which the match pattern was formed.
[0006.4] According to a still further broad aspect of the present invention, there is provided a non-transitory computer-readable medium or media comprising one or more sequences of instructions which, when executed by one or more processors, causes steps to be performed comprising: obtaining a first set of criteria for identifying one or more document characteristics in document files comprising unstructured data, wherein each criterion in the first set of criteria comprises one or more conditions and is associated with one or more document characteristics, the first set of criteria being from a first source; obtaining a second set of criteria for identifying one or more document characteristics in document files comprising unstructured data, wherein each criterion in the second set of criteria comprises one or more conditions and is associated with one or more document characteristics, the second set of criteria being from a second source; and comparing the first and second sets of criteria to generate a set of match criteria for use in identifying one or more document characteristics for a document file comprising unstructured data, wherein each criterion in the set of match criteria comprises one or more conditions and is associated with one or more document characteristics.
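One possible reading of the criteria comparison above is sketched below. The text does not state how the two sets are compared, so the rule used here is an assumption: criteria from the two sources that name the same document characteristic are reduced to the conditions they share. The `Criterion` type, the `contains:` condition strings, and the sample data are hypothetical.

```python
from typing import FrozenSet, NamedTuple, Set


class Criterion(NamedTuple):
    conditions: FrozenSet[str]   # e.g. "contains:CBC" (hypothetical condition syntax)
    characteristic: str          # e.g. "lab report"


def generate_match_criteria(first: Set[Criterion], second: Set[Criterion]) -> Set[Criterion]:
    """Keep, per shared characteristic, the conditions both sources agree on (assumed rule)."""
    merged: Set[Criterion] = set()
    for a in first:
        for b in second:
            shared = a.conditions & b.conditions
            if a.characteristic == b.characteristic and shared:
                merged.add(Criterion(shared, a.characteristic))
    return merged


# Hypothetical criteria from two sources.
source_a = {
    Criterion(frozenset({"contains:CBC", "contains:Hemoglobin"}), "lab report"),
    Criterion(frozenset({"contains:Invoice"}), "bill"),
}
source_b = {
    Criterion(frozenset({"contains:CBC", "contains:WBC"}), "lab report"),
}
match_criteria = generate_match_criteria(source_a, source_b)
```

Other comparison rules (for example a union of conditions, or voting across more than two sources) would fit the same description equally well.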
[0006.5] According to a still further broad aspect of the present invention, there is provided a processor-implemented method for identifying a document characteristic comprising receiving, from a plurality of sources, a plurality of features for use in identifying one or more document characteristics of document files comprising unstructured data, wherein each feature comprises one or more elements and each feature is associated with a document characteristic; generating, from the plurality of features, a set of features and their associated document characteristics for use in identifying one or more characteristics in a document file; receiving a document file comprising unstructured data; comparing at least some of the features from the set of features with the document file comprising unstructured data; and responsive to a feature exceeding a threshold match with data in the document file, attributing the document characteristic associated with the matching feature in the document file.
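The feature-matching method above might be sketched as follows. The modelling choices are assumptions for illustration only: a feature is a set of key words or phrases, the match score is the fraction of a feature's elements found in the document text, and the threshold is 0.5. The source dictionaries and sample text are hypothetical.

```python
from typing import Dict, FrozenSet, Iterable, Set

Feature = FrozenSet[str]


def pool_features(sources: Iterable[Dict[Feature, str]]) -> Dict[Feature, str]:
    """Merge the feature-to-characteristic maps received from several sources."""
    pooled: Dict[Feature, str] = {}
    for source in sources:
        pooled.update(source)
    return pooled


def match_score(feature: Feature, text: str) -> float:
    """Fraction of the feature's elements that occur in the text (case-insensitive)."""
    if not feature:
        return 0.0
    lowered = text.lower()
    hits = sum(1 for element in feature if element.lower() in lowered)
    return hits / len(feature)


def attribute_characteristics(text: str, features: Dict[Feature, str],
                              threshold: float = 0.5) -> Set[str]:
    """Return the characteristics of every feature whose score exceeds the threshold."""
    return {c for f, c in features.items() if match_score(f, text) > threshold}


# Hypothetical features contributed by two sources.
clinic_a = {frozenset({"hemoglobin", "hematocrit", "platelet"}): "lab report"}
clinic_b = {frozenset({"invoice", "amount due"}): "bill"}
features = pool_features([clinic_a, clinic_b])

text = "CBC results: Hemoglobin 13.5, Platelet count 250"
found = attribute_characteristics(text, features)
```

Here two of the three lab-report elements occur in the text, so that feature exceeds the assumed threshold and its characteristic is attributed; the billing feature matches nothing.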
[0006.6] According to a still further broad aspect of the present invention, there is provided a system for detecting an object in an image, the system comprising one or more processors; and a non-transitory computer-readable medium or media comprising one or more sequences of instructions which, when executed by the one or more processors, causes steps to be performed comprising receiving, from a plurality of sources, a plurality of features for use in identifying one or more document characteristics of document files comprising unstructured data, wherein each feature comprises one or more elements and each feature is associated with a document characteristic; generating, from the plurality of features, a set of features and their associated document characteristics for use in identifying one or more characteristics; receiving a document file comprising unstructured data; comparing at least some of the features from the set of features with the document file comprising unstructured data; and responsive to a feature exceeding a threshold match with data in the document file, attributing the document characteristic associated with the matching feature to the document file.
[0006.7] According to a still further broad aspect of the present invention, there is provided a method for indexing a document file comprising a plurality of characters arranged into an array of strings, the method comprising: filtering the array of strings to obtain a set of strings; for each string in the set of strings, creating a first sequence list comprising a substring starting at a first character position in the string and a second sequence list comprising a substring starting at a second character position in the string; generating a comparison reference database by querying the first and second sequence lists against a reference database, the reference database comprising a plurality of records and each record comprising a plurality of data fields; for each record in the comparison reference database, generating a first set of substrings based upon a first set of data fields from the plurality of data fields in the record; and comparing the first set of substrings against the set of strings to identify a longest substring match, if any, for each of the first set of data fields from the record; filtering the comparison reference database to create a second comparison reference database by selecting each record that has a longest substring match for one or more data fields from the first set of data fields; assigning a point value for each match found in a record and summing the point value for the record; responsive to a record having a total point value exceeding a threshold match value, associating the document file with that record; and responsive to no records having a total point value exceeding the threshold match value, providing at least a portion of the plurality of records to a user to facilitate the user's selection of a record to associate with the document file.
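The scoring portion of the method above (longest substring match per data field, point values, and a threshold) can be sketched as below; the sequence-list query that builds the comparison reference database is omitted. The per-field point table, the minimum match length of 3, and the sample records are assumptions. The longest common substring is found with the standard library's `difflib.SequenceMatcher`, which is a substitute technique and not necessarily the one the inventors used.

```python
from difflib import SequenceMatcher
from typing import Dict, List, Optional, Tuple

Record = Dict[str, str]


def longest_match_length(value: str, strings: List[str]) -> int:
    """Length of the longest substring shared by a field value and any one string."""
    best = 0
    for s in strings:
        matcher = SequenceMatcher(None, value.lower(), s.lower(), autojunk=False)
        best = max(best, matcher.find_longest_match(0, len(value), 0, len(s)).size)
    return best


def score_record(record: Record, strings: List[str], points: Dict[str, int],
                 min_length: int = 3) -> int:
    """Sum the point values of the fields whose longest match reaches min_length."""
    return sum(points.get(field, 1) for field, value in record.items()
               if longest_match_length(value, strings) >= min_length)


def index_document(strings: List[str], records: List[Record], points: Dict[str, int],
                   threshold: int) -> Tuple[Optional[Record], List[Record]]:
    """Return (record, []) on a confident match, else (None, candidates for manual review)."""
    scores = [score_record(r, strings, points) for r in records]
    best = max(range(len(records)), key=scores.__getitem__, default=None)
    if best is not None and scores[best] > threshold:
        return records[best], []
    return None, list(records)


# Hypothetical records, extracted strings, and point values.
records = [
    {"last": "Smith", "first": "John", "mrn": "448812"},
    {"last": "Smith", "first": "Jane", "mrn": "773301"},
]
points = {"last": 1, "first": 1, "mrn": 3}
matched, review = index_document(["SMITH,", "JOHN", "MRN:448812"], records, points, threshold=3)
```

With these inputs the first record scores 5 (all three fields match) and the second scores 1, so only the first exceeds the assumed threshold of 3. When no record exceeds the threshold, the sketch returns the candidates so that a user can choose.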
[0006.8] According to a still further broad aspect of the present invention, there is provided a method for indexing a document file comprising a plurality of characters arranged into an array of strings, the method comprising: identifying date strings within the array of strings that correspond to a date and selecting a date string that corresponds to the earliest date; comparing the date string that corresponds to the earliest date against a reference database, the reference database comprising a plurality of records and each record comprising at least one data field, to generate a comparison reference database comprising records from the reference database that possess at least one data field that matches the date string; responsive to the comparison reference database comprising a plurality of records, performing a matching operation to reduce the number of records that comprise the comparison reference database; responsive to the comparison reference database comprising one record, associating the document file with that record; and responsive to the comparison reference database comprising a second plurality of records following performance of the matching operation, providing at least a portion of the second plurality of records to a user to facilitate the user's selection of a record to associate with the document file.
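The first two steps of the date-based method above (selecting the earliest date string and using it to build the comparison reference database) are sketched here. The accepted date formats, the field names, and the sample data are assumptions; the follow-on matching operation that narrows multiple candidates is not shown.

```python
from datetime import date, datetime
from typing import Dict, List, Optional

Record = Dict[str, str]
DATE_FORMATS = ("%m/%d/%Y", "%Y-%m-%d")  # assumed formats, not from the specification


def parse_date(text: str) -> Optional[date]:
    """Return the date a string represents, or None if it is not a date."""
    for fmt in DATE_FORMATS:
        try:
            return datetime.strptime(text, fmt).date()
        except ValueError:
            continue
    return None


def earliest_date(strings: List[str]) -> Optional[date]:
    """Pick the earliest date found among the strings."""
    dates = [d for d in map(parse_date, strings) if d is not None]
    return min(dates) if dates else None


def comparison_records(strings: List[str], records: List[Record]) -> List[Record]:
    """Select the records having at least one data field equal to the earliest date."""
    target = earliest_date(strings)
    if target is None:
        return []
    return [r for r in records if any(parse_date(v) == target for v in r.values())]


# Hypothetical extracted strings and reference records.
strings = ["Collected", "03/14/2016", "Reported", "03/16/2016", "Smith"]
records = [
    {"name": "Smith", "visit": "2016-03-14"},
    {"name": "Jones", "visit": "2016-03-16"},
]
candidates = comparison_records(strings, records)
```

If exactly one record remains, as here, the document file would be associated with it; otherwise a further matching operation or a user's selection would be needed.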
[0006.9] According to a still further broad aspect of the present invention, there is provided a method for indexing a document file comprising a plurality of characters arranged into an array of strings, the method comprising: generating a first sequence set comprising substrings from each string in a set of strings selected from the array of strings, the substrings being formed by taking a number of consecutive characters from the string starting at a first character position in the string; generating a second sequence set comprising substrings from each string in the set of strings selected from the array of strings, the substrings being formed by taking a number of consecutive characters from the string starting at a second character position in the string; querying one or more combinations of substrings from the first and second sequence sets against a reference database to form a comparison reference database, the reference database comprising a plurality of records and each record comprising a plurality of data fields; for each record in the comparison reference database, generating a set of string fragments; identifying in either the set of strings or the array of strings a string fragment from the set of string fragments that matches; and searching, in either the set of strings or the array of strings using one or more data fields from the record from which the string fragment that matches was obtained to identify the number of matches.
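A minimal sketch of the sequence sets described above follows. The substring length of 3, the zero-based start positions 0 and 1, and the query rule (a record qualifies when any field value begins with a substring from either set) are assumptions; the later string-fragment steps are not shown.

```python
from typing import Dict, List, Set

Record = Dict[str, str]


def sequence_set(strings: List[str], start: int, length: int = 3) -> Set[str]:
    """Take `length` consecutive characters from each string, beginning at `start`."""
    return {s[start:start + length].lower() for s in strings if len(s) >= start + length}


def comparison_records(strings: List[str], records: List[Record], length: int = 3) -> List[Record]:
    """Select records with a field value that begins with a substring from either set."""
    prefixes = sequence_set(strings, 0, length) | sequence_set(strings, 1, length)
    return [r for r in records
            if any(v[:length].lower() in prefixes for v in r.values() if len(v) >= length)]


# Hypothetical strings; "lJones" imitates a stray leading character from extraction.
strings = ["Smith", "lJones", "ab"]
records = [{"last": "Jones"}, {"last": "Smith"}, {"last": "Brown"}]
candidates = comparison_records(strings, records)
```

Starting the second set one character later lets a string with a spurious leading character still reach the right record, which is one plausible motivation for using two start positions.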
[0006.10] According to a still further broad aspect of the present invention, there is provided a processor-implemented method for indexing a document file comprising: receiving a document file, wherein the document file comprises a plurality of unstructured characters; organizing the plurality of unstructured characters into an array of strings; receiving at least a portion of a reference database from a client, wherein the reference database comprises a plurality of records wherein each record comprises at least one data field element; comparing a first set of strings from the array of strings against a comparison reference database obtained from the reference database; and responsive to at least a portion of the first set of strings exceeding a threshold match with at least a portion of a record in the comparison reference database, generating a structured message that associates the document file with the record.
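The final step above, generating a structured message once a threshold match is exceeded, might look like the following sketch. The JSON layout, the field names, the match measure (fraction of a record's field values found among the strings), and the threshold of 0.5 are all assumptions; the text does not specify a message format.

```python
import json
from typing import Dict, List, Optional

Record = Dict[str, str]


def match_fraction(strings: List[str], record: Record) -> float:
    """Fraction of the record's field values that appear among the strings."""
    tokens = {s.lower() for s in strings}
    values = [v.lower() for v in record.values()]
    return sum(v in tokens for v in values) / len(values) if values else 0.0


def build_message(document_id: str, strings: List[str], records: List[Record],
                  threshold: float = 0.5) -> Optional[str]:
    """Return a JSON message linking the document to the first record above threshold."""
    for record in records:
        score = match_fraction(strings, record)
        if score > threshold:
            return json.dumps(
                {"document": document_id, "record": record, "score": round(score, 2)},
                sort_keys=True,
            )
    return None


# Hypothetical client record and extracted strings.
records = [{"id": "R-17", "last": "Smith", "first": "John"}]
message = build_message("doc-1.pdf", ["John", "Smith", "Glucose"], records)
```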
[0006.11] According to a still further broad aspect of the present invention, there is provided a processor-implemented method for identifying a document file comprising: responsive to locating a recognized set of characters in a document file comprising a plurality of characters, using the recognized set of characters as an anchor point and performing the steps comprising: selecting an examination set of characters from the document file, the examination set being selected based upon proximity to the anchor point; and searching the examination set for one or more indicators to assist in uniquely identifying the document file.
[0006.12] According to a still further broad aspect of the present invention, there is provided a processor-implemented method for identifying a document comprising searching a document comprising a plurality of characters to identify an anchor point comprising a set of characters; and responsive to identifying an anchor point: assigning proximity weighting to at least some of the characters in the document based upon their position relative to the anchor point; selecting an examination set of characters from the document using the proximity weightings; and searching the examination set for one or more indicators to assist in uniquely identifying the document.
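The anchor-point approach above can be illustrated with a short sketch. The linear weighting function, the `reach` of 40 characters, the minimum weight of 0.5, and the use of a regular expression as the indicator search are assumptions chosen for illustration.

```python
import re
from typing import List, Optional


def proximity_weights(text: str, anchor: str, reach: int = 40) -> Optional[List[float]]:
    """Weight each character by its distance from the anchor (1.0 inside, falling to 0.0)."""
    start = text.find(anchor)
    if start < 0:
        return None
    end = start + len(anchor)
    weights = []
    for i in range(len(text)):
        if start <= i < end:
            distance = 0
        elif i < start:
            distance = start - i
        else:
            distance = i - end + 1
        weights.append(max(0.0, 1.0 - distance / reach))
    return weights


def examination_set(text: str, weights: List[float], min_weight: float = 0.5) -> str:
    """Keep the characters whose proximity weighting is at least min_weight."""
    return "".join(ch for ch, w in zip(text, weights) if w >= min_weight)


def find_indicators(text: str, anchor: str, pattern: str,
                    reach: int = 40, min_weight: float = 0.5) -> List[str]:
    """Search only the examination set around the anchor for the indicator pattern."""
    weights = proximity_weights(text, anchor, reach)
    if weights is None:
        return []
    return re.findall(pattern, examination_set(text, weights, min_weight))


# Hypothetical document text with six-digit numbers near and far from the anchor.
text = "Ref 555123 " + "x" * 60 + " Patient ID: 448812 " + "y" * 60 + " Fax 555999"
indicators = find_indicators(text, "Patient ID:", r"\d{6}")
```

Only the number close to the anchor falls inside the examination set, so the distant reference and fax numbers are never considered as identifiers.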
[0006.13] According to a still further broad aspect of the present invention, there is provided a system comprising one or more processors; and a non-transitory computer-readable medium or media comprising one or more sequences of instructions which, when executed by at least one of the one or more processors, causes steps to be performed comprising searching a document comprising a plurality of characters to identify an anchor point comprising a set of characters; and responsive to identifying an anchor point: assigning proximity weighting to at least some of the characters in the document based upon their position relative to the anchor point; selecting an examination set of characters from the document using the proximity weightings; and searching the examination set for one or more indicators to assist in uniquely identifying the document.

BRIEF DESCRIPTION OF THE DRAWINGS
[0007] Reference will be made to embodiments of the invention, examples of which may be illustrated in the accompanying figures. These figures are intended to be illustrative, not limiting. Although the invention is generally described in the context of these embodiments, it shall be understood that the scope of the invention is not limited to these particular embodiments.
[0008] Figure ("FIG.") 1 illustrates an exemplary environment in which embodiments of systems and methods of the present invention may operate.
[0009] Figure 2 is a functional block diagram illustrating an exemplary multi-computing-device system in which exemplary embodiments of the present invention may operate.
[0010] Figure 3 depicts an exemplary computing system according to an embodiment of the present invention.
[0011] Figure 4A depicts an exemplary laboratory report which may be embodied in a document file according to an embodiment of the invention.
[0012] Figure 4B depicts an exemplary reference database according to an embodiment of the present invention.
[0013] Figure 5 depicts an exemplary method for initially accessing an indexing service provider system according to an embodiment of the invention.
[0014] Figure 6 depicts an exemplary method for processing a document file or files according to an embodiment of the present invention.
[0015] Figure 7 depicts an exemplary method for transferring a document file or set of document files from a client system to an indexing service provider system according to an embodiment of the present invention.
[0016] Figure 8 depicts an exemplary method for decrypting and extracting/decompressing a received batch of document files according to an embodiment of the present invention.
[0017] Figure 9 depicts an exemplary method for extracting data from a document file according to an embodiment of the present invention.
[0018] Figure 10 depicts an exemplary method for extracting characters from a document file according to an embodiment of the present invention.
[0019] Figure 11 depicts an exemplary method for checking the extraction of characters from a document file according to an embodiment of the present invention.

[0020] Figure 12A depicts an exemplary plurality of characters obtained from a document file according to an embodiment of the present invention.
[0021] Figure 12B depicts exemplary arrays of strings obtained from a document file according to an embodiment of the present invention.
[0022] Figure 13 depicts an exemplary method for indexing a document file according to an embodiment of the present invention.
[0023] Figure 14 depicts an alternative embodiment of a method for indexing a document file according to an embodiment of the present invention.
[0024] Figure 15 depicts an exemplary method for determining a document type of a document file according to an embodiment of the present invention.
[0025] Figure 16 depicts an alternative embodiment of a method for indexing a document file according to an embodiment of the present invention.
[0026] Figure 17 depicts an alternative embodiment of a method for indexing a document file according to an embodiment of the present invention.
[0027] Figure 18 depicts an alternative embodiment of a method for indexing a document file according to an embodiment of the present invention.
[0028] Figure 19 depicts an exemplary method for determining a date of service of a document file according to an embodiment of the present invention.
[0029] Figure 20 depicts an embodiment of a method for determining a date of service for a document file according to an embodiment of the present invention.
[0030] Figure 21 depicts an alternative embodiment of a method for indexing a document file according to an embodiment of the present invention.
[0031] Figure 22 depicts an alternative embodiment of a method for indexing a document file according to an embodiment of the present invention.
[0032] Figure 23 depicts an exemplary method for determining a provider associated with a document file according to an embodiment of the present invention.
[0033] Figure 24 depicts an exemplary method for indexing a document file according to an embodiment of the present invention.
[0034] Figure 25 depicts an exemplary method for returning information related to processed document files to a client system according to an embodiment of the present invention.
[0035] Figure 26 depicts exemplary types of information that may be associated with a document file according to an embodiment of the present invention.

[0036] Figure 27 illustrates an exemplary composite message according to an embodiment of the present invention.
[0037] Figure 28 depicts an exemplary method for presenting files for manual review according to an embodiment of the present invention.
[0038] Figure 29 depicts an exemplary method for receiving and processing document files received from an indexing service provider according to an embodiment of the present invention.
[0039] Figure 30 graphically illustrates an exemplary file structure for indexing a plurality of files according to an embodiment of the present invention.

DETAILED DESCRIPTION OF THE INVENTION
[0040] According to an aspect of the present invention, systems and methods are disclosed that allow for the automated indexing and/or processing of information from a variety of documents, both from physical media and electronic media, which may be received from a plurality of sources. Although the features and advantages of the invention are generally described in this section in the context of embodiments, it shall be understood that the scope of the invention should not be limited to these particular embodiments. Many additional features and advantages will be apparent to one of ordinary skill in the art in view of the drawings, specification, and claims hereof.
[0041] In the following description, for purposes of explanation, specific details are set forth in order to provide an understanding of the invention. It will be apparent, however, to one skilled in the art that the invention can be practiced without these details. One skilled in the art will recognize that embodiments of the present invention, described below, may be performed in a variety of ways and using a variety of means and should not be construed as limited to the embodiments set forth herein. Rather, these embodiments are provided so that this disclosure will convey the scope of the invention to those skilled in the art. Those skilled in the art will also recognize additional modifications, applications, and embodiments are within the scope thereof, as are additional fields in which the invention may provide utility.
[0042] The embodiments of the present invention may be present in software, hardware, firmware, or combinations thereof. Structures and devices shown in block diagram are illustrative of exemplary embodiments and are meant to avoid obscuring the invention. Furthermore, connections between systems, services, components, and/or modules within the figures are not intended to be limited to direct connections. Rather, data between these systems, services, components, and/or modules may be modified, re-formatted, or otherwise changed by intermediary systems, services, components, and/or modules.
[0043] Reference in the specification to "one embodiment" or "an embodiment" means that a particular feature, structure, characteristic, or function described in connection with the embodiment is included in at least one embodiment of the invention. Furthermore, the appearances of the phrase "in one embodiment," "in an embodiment," or the like in various places in the specification are not necessarily all referring to the same embodiment.

A. EXEMPLARY SYSTEM IN WHICH EMBODIMENTS OF THE PRESENT INVENTION MAY OPERATE
[0044] Figure 1 graphically depicts an exemplary environment in which systems and methods of the present invention may operate. Figure 1 depicts a medical environment 050 in which a physician's office 010 receives information, in the form of electronic or physical files, from a plurality of sources 020. Those sources 020 may include, but are not limited to, hospitals 020A, patients 020B, government agencies 020C, insurance companies 020D, previous caregivers 020E, and laboratories 020F. It shall be noted that the present invention is not limited to use within medical systems, but may be employed in other settings, including without limitation, governmental, business, non-profit, and educational environments.
[0045] As noted previously, a physician's office may receive a number of files from a number of sources 020. The physician's office 010 must process all of these files received from the multiple sources. Processing these files has typically been performed by hand, or at least principally by hand, which requires huge amounts of time and expense.
[0046] As illustrated in Figure 1, an indexing service provider 030, communicatively connected with the physician's office 010, may be employed to automate the processing of the plurality of files received by the physician's office according to embodiments of the present invention. In the embodiment depicted in Figure 1, the indexing service provider 030 may be functionally and/or physically located in another location separate from the physician's office 010; alternatively, the indexing service provider may be functionally and/or physically located at the physician's office 010.
[0047] FIG. 2 is a functional block diagram illustrating an exemplary multi-computing-device system 200 in which exemplary embodiments of the present invention may operate. It shall be noted that the present invention may operate, and be embodied in, other systems as well.
[0048] Depicted in FIG. 2 is a first computer system or device 101 and a second computing device or system 201 communicatively connected to the first computer system 101. As will be apparent to those skilled in the art, first and second computing systems may be configured to communicate directly or may communicate indirectly via one or more intermediate computing devices. In an embodiment, in addition to being capable of being coupled in a variety of different manners, the first and second computing devices may communicate by any of a number of different communications protocols, including, but not limited to, standard networking and Internet communication protocols.

[0049] In an embodiment, first computing device 101 and second computing device 201 may be owned or operated by a single entity or may be housed within a single facility. Alternatively, first computing device 101 and second computing device 201 may be owned or operated by separate entities or may be housed in separate facilities. For example, first computing device 101 may be located at a physician's office 010, such as the one depicted in Figure 1, and the second computing device 201 may be operated by a service provider 030.
[0050] In an embodiment, first computing system 101 and second computing device 201 may comprise one or more services, or modules, to perform operations. These modules may be communicatively coupled together to perform the described operations or achieve the described results. It shall be noted that the terms "coupled" or "communicatively coupled," whether used in connection with modules, devices, or systems, shall be understood to include direct connections, indirect connections through one or more intermediary devices, and wireless connections. It shall also be understood that throughout this discussion services or modules may be described as separate functional units, but those skilled in the art will recognize that the various services, or portions thereof, may be divided into separate services or modules or may be integrated together, including integrating within a single computing system. One skilled in the art will also recognize that a service or module may be implemented in software, hardware, firmware, or a combination thereof. The term "services" may also be used interchangeably herein with "utilities" or "modules."
[0051] The embodiment of first computing system 101 depicted in Figure 2 may comprise a database interface 102 for interfacing with one or more databases 100 or database systems 110, and a communication services module 103. First computing system 101 may comprise fewer or additional services or modules to accomplish tasks illustrated by the embodiments described herein.
[0052] Second computing system 201 may comprise a database interface 202 for interfacing with one or more databases or database systems; a communications services module 203, which may communicate with other devices, systems, and/or applications and may include messaging services, encryption/decryption, compression/extracting services, and/or packaging services; unpack services 204, which may include unpacking batch files received from the first computing system; extraction services 205, which may include optical and/or voice recognition services, rotation utilities, and text verification services; indexing services 206, which may include one or more matching/filtering algorithms and may also include manual indexing functionality; observation services 207, which may take certain actions based upon observations made within a document file; messaging service 208, which may be part of or separate from communications services 203, for providing messages to indexing recipients, clients, and/or third parties; archiving and retrieval services 209 for providing data back-up for an indexing recipient; packaging services 210, which also may be part of communications services 203, for preparing files for transmission to an indexing recipient; and account and billing services 211, which may monitor indexing recipient's account(s) and provide billing when billing events have transpired. Additional information about these services is provided herein with description of the various embodiments.
[0053] One skilled in the art will recognize that these services may be integrated into a single computer system. One skilled in the art will also recognize that some services, such as packaging and unpacking services, may not be needed in single computing system embodiments. It shall also be noted that other systems and services may be configured and fall within the scope of the present invention.
[0054] First and/or second computing devices 101, 201 may be a general computing device, including, without limitation, a workstation, server, personal computer, and the like, or may be a specifically designed computing device. It shall be noted that employing one or more second computing systems 201 may be beneficial to reduce the processing and bandwidth loading on first computing system 101. In an embodiment, second computing system 201 may be communicatively coupled to database system 110. Alternatively, second computing system 201 may receive access to or receive files from database system 110 via first computing system 101. Although not depicted in FIG. 2, one skilled in the art will recognize that second computing system 201 may be communicatively coupled to the same or similar devices, inputs, and networks that are communicatively coupled to first computing device 101, which are described in more detail below. It shall be noted that the present invention may operate, and be embodied in, other systems as well.
[0055] In an embodiment, an aspect of the present invention is indexing and/or processing of data received by first computing device 101. At one or more instances of time, first computing system 101 may receive data from one or more of the sources of data. The data in files received by first computing system 101 may be originally embodied in electronic files or in physical media, such as paper reports and the like. Examples of received document files include, but are not limited to, faxes, papers, letters, email messages, instant messages, data files, text files, document files, HL-7 messages, ASTM messages, mark-up language files, image files, audio files, and the like. In some instances, a received document file is directly representative of the data contained in the physical or electronic media. For example, the data representing an HL-7 file directly represents the data of interest. In embodiments, the received document file indirectly represents the data contained in physical or electronic media. For example, the document file may represent an image of a report rather than the report data itself. As explained in more detail with reference to extraction services, data extraction may assist in extracting at least some of the plurality of characters for such received document files. For example, optical character recognition operations may assist in extracting some or all of the plurality of characters from a file. In either event, the document file may be associated with a plurality of characters. For the purposes of explanation, a file received by first computing system 101 for processing may be referred to herein as a document file. It shall be understood that the term "document file" refers to any file, regardless of the contents or type of data contained within or associated with the file, which is to be processed according to one or more embodiments of the present invention. That is, the file may not contain data associated with a "document," but for the purposes of explanation herein, it may be referred to as a "document file."
[0056] Regardless of its original form, a file may contain a plurality of characters, which may form identifying indicia that are useful for indexing and/or processing a document file. Identifying indicia may include, but are not limited to, first name, last name, account number, social security number, date of birth, document title, document type, document contents, identification number, product number, stock keeping unit (SKU) number, file type, file structure, file source, file name, document identification number, document source, transmission information, encryption information (such as key encryption, hash, and the like), hash number, metadata, and any other information useful for identifying, categorizing, or processing a document file.
[0057] Consider, by way of example, the report 400 illustrated in FIG. 4A. The report 400 may be represented in an electronic file. Report 400 comprises a plurality of characters, one or more portions of which may be used for processing the report 400. It should be noted that the plurality of characters are not limited to the data appearing on the face of a report or form. Rather, the plurality of characters shall be construed to include any information associated with the document file, which also includes any data or information useful in identifying, categorizing, or processing the file. Examples of the plurality of characters, in addition to the information included upon the face of the document, that may be associated with the file may also include, but are not limited to, file type, file source, file structure, file name, transmission information, encryption information (such as key encryption, hash, and the like), hash number, metadata, and the like.
[0058] Returning to Figure 2, in an embodiment, first computer system 101 is adapted to receive document files for processing. As depicted in the embodiment illustrated in FIG. 2, first computing system 101 may be communicatively coupled to receive data in a variety of manners and from a variety of sources. In an embodiment, first computing device 101 may communicate according to any of a variety of communications protocols, including, but not limited to, standard networking and Internet communications protocols.
[0059] In an embodiment, first computing device 101 may communicate wirelessly, such as by means of a wireless local area network (LAN) or a wireless wide area network (WAN), with one or more networks or devices, such as remote network 150 and mobile device 155. Mobile device 155 may comprise one or more mobile or wireless computing devices, including, but not limited to, a laptop computer, a mobile phone, a PDA, a wireless communication device, and the like. Alternatively, or in addition to the wireless connections, first computing system 101 may be connected via a wired LAN, wired WAN, or by any other wired connection, including but not limited to universal serial bus (USB), firewire, serial, and parallel port connections, to one or more devices or networks, such as to network 145 or to one or more storage devices 140. Storage devices include, but are not limited to, optical drives, disk drives, tape drives, flash memory drives, RAID arrays, and the like. Data may be received from a network 145, 150, and/or storage device 140. In an embodiment, network 145, 150 may provide access to one or more of the following: intranets, extranets, portals, the Internet, and one or more information servers. Examples of information servers include, but are not limited to, a transcription information server, a medical information server, a laboratory information server, an email server, databases, or any other data source known to those skilled in the art.
[0060] In an embodiment, first computing system 101 may be connected to one or more input devices 115. For example, first computing system 101 may receive data via a keyboard, touchpad, mouse, or the like. First computing system may also receive data via an audio/video input. An audio input may be recorded and manually transcribed or may be transcribed using speech recognition software or hardware, which may be resident within system 101 or system 201.
[0061] In an embodiment, first computing system 101 may also be communicatively coupled to other input devices. In one embodiment, a scanner 125 may provide data to first computing system 101, which data may be a digital representation of physical media, such as handwritten, typed, or printed documents. First computing system 101 may also be communicatively coupled to a fax machine and/or fax server 120 to receive facsimile data. In an embodiment, the scanned or faxed file may be an image of the physical media. In an alternative embodiment, the scanned or faxed data may include text and/or graphical data. Embodiments of the present invention may include a bar code reader and/or optical mark reader 135. A barcode or optical mark indicia imprinted or placed on an item, when scanned, may provide data to first computing system 101.
[0062] First computing system 101 may include a directory interface 102 for communicating with a directory or database system 110. In one embodiment, database system 110 may be implemented using Centricity EMR® (formerly Logician®), an electronic medical record system marketed by GE Healthcare. In an embodiment, database system 110 may be located on a local storage device, such as a hard drive. In an alternative embodiment, database system 110 may be stored remotely and accessed by first computing system 101 via a direct or networked connection.
[0063] In an embodiment, database system 110 may include one or more databases. In an embodiment, database system 110 stores data that has been received by first computing system 101. In one embodiment, database system 110 possesses a database 100, which includes a plurality of records comprising one or more sets of data, such as identifying indicia, that may be used as a comparison reference database, as explained in more detail below. Figure 4B depicts an exemplary database comprising a plurality of records 410, which comprise a plurality of data fields 405. Database 100 or database system 110 may also include one or more of the following: additional identifying indicia, instructions for processing certain data, composite data, or other data. In an embodiment, any database stored in database system 110 may be selectively queried. For example, reference database 100 may be queried using one or more key terms or identifying indicia, which may include but is not limited to, an account number, an individual's name, date of birth (DOB), social security number, item number, stock keeping unit (SKU) number, report data, associated provider, and the like.
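By way of illustration only, selectively querying a reference database by one or more identifying indicia might be sketched as follows. This is a minimal sketch and not the implementation of any embodiment; the field names (account_number, last_name, first_name, dob), the sample records, and the helper query_reference_database are hypothetical.

```python
from typing import Dict, List

# Hypothetical reference database: each record is a set of identifying indicia.
REFERENCE_DATABASE: List[Dict[str, str]] = [
    {"account_number": "1001", "last_name": "Smith", "first_name": "John", "dob": "1970-01-15"},
    {"account_number": "1002", "last_name": "Smith", "first_name": "Jane", "dob": "1982-06-30"},
    {"account_number": "1003", "last_name": "Jones", "first_name": "Alan", "dob": "1970-01-15"},
]


def query_reference_database(records: List[Dict[str, str]], **criteria: str) -> List[Dict[str, str]]:
    """Return the records whose fields equal every given criterion (case-insensitive)."""
    def matches(record: Dict[str, str]) -> bool:
        return all(
            record.get(field, "").lower() == str(value).lower()
            for field, value in criteria.items()
        )

    return [record for record in records if matches(record)]
```

Supplying more key terms narrows the result: querying by last name alone may return several records, while adding a date of birth may reduce the result to a single record.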
B. EXEMPLARY COMPUTING SYSTEMS
[0064] In an embodiment, first computer system 101, second computing system 201, or both may be implemented using a conventional computing device, such as a personal computer, a workstation, a server, a portable computing device, such as a laptop computer or PDA-type (personal data assistant) device, or the like. Alternatively, first computing device 101, second computing device 201, or both may be a specifically designed or configured computing device. FIG. 3 is a functional block diagram of an embodiment of a computing system 300 that may be used for first computing device 101, second computing device 201, or both.
[0065] As illustrated in FIG. 3, a processor 302 executes software instructions and interacts with other system components. In an embodiment, processor 302 may be a general purpose processor such as an AMD processor, an INTEL x86 processor, a SUN MICROSYSTEMS SPARC, or a POWERPC compatible-CPU, or the processor may be an application specific processor or processors. A storage device 304, coupled to processor 302, provides long-term storage of data and software programs. Storage device 304 may be a hard disk drive and/or another device capable of storing data, such as a computer-readable media (e.g., diskettes, tapes, compact disk, DVD, and the like) drive or a solid-state memory device. Storage device 304 may hold programs, instructions, and/or data for use with processor 302. In an embodiment, programs or instructions stored on or loaded from storage device 304 may be loaded into memory 306 and executed by processor 302. In an embodiment, storage device 304 holds programs or instructions for implementing an operating system on processor 302. In one embodiment, possible operating systems include, but are not limited to, UNIX, AIX, LINUX, Microsoft Windows, and the Apple MAC OS. The operating system executes on, and controls the operation of, the computing system 300.
[0066] An addressable memory 306, coupled to processor 302, may be used to store data and software instructions to be executed by processor 302. Memory 306 may be, for example, firmware, read only memory (ROM), flash memory, non-volatile random access memory (NVRAM), random access memory (RAM), or any combination thereof. In one embodiment, memory 306 stores a number of software objects, otherwise known as services, utilities, or modules. One skilled in the art will also recognize that storage 304 and memory 306 may be the same items and function in both capacities.
[0067] In an embodiment, computing system 300 provides the ability to communicate with other devices, other networks, or both. Computing system 300 may include one or more network interfaces or adapters 312, 314 to communicatively couple computing system 300 to other networks and devices. For example, computing system 300 may include a network interface 312, a communications port 314, or both, each of which are communicatively coupled to processor 302, and which may be used to couple computer system 300 to other computer systems, networks, and devices.
[0068] In an embodiment, computing system 300 may include one or more output devices 308, coupled to processor 302, to facilitate displaying graphics and text. Output devices 308 may include, but are not limited to, a display, LCD screen, CRT monitor, printer, touch screen, or other device for displaying information. Computing system 300 may also include a graphics adapter (not shown) to assist in displaying information or images on output device 308.
[0069] One or more input devices 310, coupled to processor 302, may be used to facilitate user input. Input devices 310 may include, but are not limited to, a pointing device, such as a mouse, trackball, or touchpad, and may also include a keyboard or keypad to input data or instructions into computing system 300. In an embodiment, one or more of the input devices 310 may be the same as input device 115 (FIG. 2).
[0070] One skilled in the art will recognize no computing system is critical to the practice of the present invention. One skilled in the art will also recognize that a number of the elements described above may be physically and/or functionally separated into sub-modules or combined together.
C. EMBODIMENTS OF FILE ACQUISITION SERVICES AND ACCOUNT VALIDATION
[0071] In an embodiment, the present invention may include file acquisition services and/or account validation services. Figure 5 depicts an exemplary method for file acquisition services (505) and account login (515) for a client system, which may be first computing system 101, according to an embodiment of the present invention. In an embodiment, the file acquisition services may comprise a program or function that monitors the receipt of document files received by the client system 101 from one or more sources. These document files may be received in multiple formats including, but not limited to, e-mails, instant messages, HL-7 files, scanned documents, text documents, audio files, transcription files, image files, ASTM message files, mark-up language files, and the like. In an embodiment, all document files may be stored in a specific folder or folders and the file acquisition services (505) monitors the specific folder or folders.
[0072] In an embodiment, the client system 101 may also include reference database acquisition services (510). In one embodiment, reference database acquisition services obtains from the client system 101 a reference database that may be used to index or match document files to records in the reference database. For example, a reference database in the physician's office 010 may comprise a database of records for patients. Consider the exemplary reference database 100A depicted in FIG. 4B. Reference database 100A includes a plurality of records 410-1 to 410-n containing one or more data fields 405A-D. In an embodiment, reference database 100A may include one or more fields 405E-x for including additional identifying indicia, additional data, links to files, notes, instructions for processing document files, and other data. As noted above, the fields of reference database 100A may be populated using one or more methods for including or entering data into a database. For example, assuming the database is used by a medical center and the entries represent patients of the medical center, the entries may be entered by a receptionist, imported from other databases, and/or obtained from previously indexed/processed files.
[0073] In one embodiment, during an initial setup, the entire reference database may be transmitted to the indexing service provider 201 for use for indexing and/or processing files according to embodiments of the present invention. In an embodiment of the invention, changes to the client system's database may be monitored by the reference database acquisition services so that only the differential changes need be sent to the indexing service provider 201. Alternatively, the entire reference database may be transmitted to the indexing service provider at periodic intervals or at the occurrence of certain events. In another alternative embodiment, the reference database 100 may be accessible by the indexing service provider.
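The idea of sending only differential changes can be illustrated with a short sketch. The approach shown, comparing two snapshots keyed by a record identifier, is an assumption made for illustration; the specification does not state how changes are detected, and the function name diff_reference_database and the snapshot layout are hypothetical.

```python
from typing import Any, Dict

Record = Dict[str, Any]


def diff_reference_database(previous: Dict[str, Record], current: Dict[str, Record]) -> Dict[str, Any]:
    """Compare two snapshots keyed by record identifier and return only the changes."""
    # Records present now but absent from the last transmitted snapshot.
    added = {key: rec for key, rec in current.items() if key not in previous}
    # Records that existed before and whose contents differ now.
    updated = {
        key: rec
        for key, rec in current.items()
        if key in previous and previous[key] != rec
    }
    # Identifiers that no longer exist in the current snapshot.
    removed = sorted(key for key in previous if key not in current)
    return {"added": added, "updated": updated, "removed": removed}
```

Only the returned dictionary would need to be transmitted; an unchanged database yields three empty collections.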
[0074] In an embodiment, the client system 101 may log in to access the indexing service provider system 201. In one embodiment, when a client logs (515) into the indexing service provider system 201, the client account may be validated to determine if the account is valid (525). If the account is valid, a notification (545) may be sent to the indexing recipient 101, and it may proceed with transferring of any files and reference database or reference database updates as part of the transfer services (555). If the account is not valid, the indexing service provider 201 may submit a notification (530) to the client system 101 that the account is not active. In an embodiment, the client system 101 may receive (535) a notification to activate the account and the process may end (540). In an embodiment, the notification may indicate what steps may need to be taken to activate the account, including without limitation, paying past due bills, subscribing to services, or updating other information, fees, or software.
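The login and validation flow described above may be sketched as a single decision. This is an illustrative sketch under assumed data structures; the Account class, its fields, and the returned dictionary keys are hypothetical, and the parenthesized numbers in the comments refer to the steps named in the paragraph above.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Union


@dataclass
class Account:
    account_id: str
    active: bool
    # Hypothetical list of actions required before the account can be activated.
    outstanding_steps: List[str] = field(default_factory=list)


def handle_login(account: Account) -> Dict[str, Union[str, List[str]]]:
    """Validate the account (525) and decide what the client does next."""
    if account.active:
        # Valid account: notify the client (545) and proceed to transfer services (555).
        return {"status": "valid", "next": "transfer_services", "message": "Account validated."}
    # Invalid account: notify the client (530, 535) and end the process (540).
    return {
        "status": "inactive",
        "next": "end",
        "message": "Account is not active.",
        "steps_to_activate": list(account.outstanding_steps),
    }
```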
D. EMBODIMENTS OF INDEXING SYSTEM OVERVIEW

[0075] FIG. 6 depicts an exemplary method for providing indexing services and additional processing services to a client according to an embodiment of the present invention. Illustrated in Figure 6 is an indexing recipient, or client, system 101 and an indexing service provider system 201. In an embodiment, the method for providing indexing and data processing services may include file(s) and reference database transfer services (605) in which received document files and a reference database may be transferred to the indexing service provider, as mentioned in the prior section. In an embodiment, the indexing service provider 201 may receive the document file or files and reference database or databases through its transfer and unpacking services (610). In an embodiment, the document file(s) and/or reference database(s) may be encrypted and/or may also be compressed. Accordingly, indexing service provider 201 may employ extraction services (615) to decrypt and decompress the data, if necessary.
[0076] The files received from the client system 101 may be processed by the indexing services (620) of the indexing service provider, which may also include the processing of document files that do not yield matches by manual indexing.
[0077] In an embodiment, observation services may also be performed (625) related to the document files. In an embodiment, observation services may include, but are not limited to, noting the occurrence of certain key characters or strings within a document file. For example, in a medical report certain terminology found to occur within a document file may trigger specified actions. In an embodiment, a specified action may be indicating to the client system 101, a recipient, or a third party that a certain terminology has been found. For example, the presence of certain terms, numbers, phrases, etc. being found in a document file may be used to alert a client. In an embodiment, additional data may be conditionally associated with data associated with the document file. For example, identification within the document file of testing positive for some marker may be associated with a selected action or actions, such as indicating that a follow-up appointment should be scheduled. In an embodiment, first computing system 101 or indexing service provider 201 may interface with one or more programs to initiate an action. For example, first or second computing systems 101, 201 may interface with a calendaring program to suggest or schedule appointments and may interface with a messaging program to notify a patient.
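As a rough illustration of observation services, the following sketch scans the text of a document file for key terms and returns the actions associated with the terms found. The rule table, the action strings, and the function name observe are hypothetical; whole-word, case-insensitive matching is an assumption chosen for the example and is not stated in the specification.

```python
import re
from typing import Dict, List

# Hypothetical observation rules: key term -> action to trigger when the term is found.
OBSERVATION_RULES: Dict[str, str] = {
    "positive": "suggest follow-up appointment",
    "abnormal": "alert client",
}


def observe(document_text: str, rules: Dict[str, str] = OBSERVATION_RULES) -> List[str]:
    """Return the actions whose key term occurs in the document text."""
    actions = []
    for term, action in rules.items():
        # Whole-word, case-insensitive match so that "normal" does not trigger "abnormal".
        if re.search(r"\b" + re.escape(term) + r"\b", document_text, re.IGNORECASE):
            actions.append(action)
    return actions
```

In a fuller system, each returned action could be handed to a calendaring or messaging program; that hand-off is outside the scope of this sketch.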
[0078] In an embodiment, indexing service provider 201 may comprise messaging services (630) in which notifications and transmissions of data may be sent to the client system 101, a recipient, and/or third parties. As explained in more detail below, the messaging services may create and transmit a structured message, a message with additional data that may be associated with a matched document file or structured message, and/or a composite message that combines items into a message.
[0079] In embodiments, the indexing service provider 201 may additionally provide archiving and retrieval services (635) for the indexing recipient 101. For example, the indexing service provider 201 may provide data back-up functionality for document files, reference databases, and other files, which files may be accessed by the indexing recipient 101.
[0080] In an embodiment, indexing service provider 201 may include packaging and transmission services (640) for transmitting data to the indexing recipient 101. In embodiments, the packaging and transmission services may include encryption and compression features or algorithms. In embodiments, the packaging and transmission services may be part of the messaging services; or alternatively, the messaging services may be part of the packaging and transmission services.
[0081] In an embodiment, one or more steps or event occurrences may be linked (650) to a billing event and noted in billing records or in a billing table for billing to an indexing recipient or third party.
[0082] One skilled in the art shall recognize that the above-described are embodiments and that other configurations, including with fewer or additional steps or services, fall within the scope of the present inventions. Aspects of the steps mentioned above shall be described in more detail below.
E. EMBODIMENTS OF TRANSFER SERVICES
[0083] FIG. 7 depicts an embodiment of transfer services performed in a client-server embodiment. As illustrated in the embodiment depicted in FIG. 7, an indexing recipient (or client) system 101 creates (705) a batch number and renames each file that is to be transmitted to an indexing service provider 201. In one embodiment, the client system 101 may generate a unique batch number by using a client account number plus a sequential number and/or a date/time number. In an embodiment, client system 101 may loop through each file to be processed to ensure that it has exclusive system access to the file. Client system 101 may not have exclusive access to a file if, for example, the file is still being received by client system 101 or is being used by another application on the client system. In an embodiment, the files may be renamed with a client account number or code, batch number, and a unique file name, and each renamed file may be moved or copied into a "transit" folder. One skilled in the art will recognize that the renaming and moving/copying of the files to the transit folder may occur as part of the same step.
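By way of illustration only, the batch numbering and renaming described above might be sketched in Python as follows. The function names, the separator characters, and the zero-padded sequence width are assumptions made for this sketch and do not appear in the described embodiments.

```python
from datetime import datetime


def make_batch_number(account: str, sequence: int, now: datetime) -> str:
    """Build a batch number from an account number, a sequential
    number, and a date/time stamp (format is an assumption)."""
    return f"{account}-{sequence:06d}-{now:%Y%m%d%H%M%S}"


def transit_name(account: str, batch_number: str, original_name: str) -> str:
    """Rename a file with the account code, batch number, and its own
    file name before it is moved to the transit folder."""
    return f"{account}_{batch_number}_{original_name}"
```

Including the date/time stamp keeps batch numbers unique even if the sequential counter is reset on the client.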
[0084] In an embodiment, the client system 101 may create an empty compression file with a file name comprising the batch number and a client account number or code. Each renamed file in the transit folder may be moved to the compression file. In one embodiment, the compression file may be a "Zip" file format. The compressed file may also be encrypted.
In one embodiment, the encrypted file may be identified by adding an extension, such as ".enc" to the file name used for the compressed file.
[0085] In the depicted embodiment, the client system 101 announces (710) the batch to the indexing service provider (server) 201, and may also transmit a number that represents the number of files that are in the batch, the name of the batch file, and a client account and/or sub-account identification. If the batch information is received in total by the server, a batch ID may be generated (715) and transmitted (720) to the client system 101. In an embodiment, the batch ID may be a sequential number representing the total number of batches received to date by the service provider. In an embodiment, the batch number may be related to the batch ID. For example, it may contain a time-date stamp and may be generated during the same transaction of transmitting files to the server.
[0086] As illustrated in FIG. 7, the batch ID is received (725) by the client system 101. If the batch ID is zero (0), an error has occurred. In an embodiment, any error information may be logged and the transmit procedure may restart at the beginning or at any intermediate step to the point of re-announcing the batch.
[0087] If the batch ID is not zero, then the announcement was successful.
That is, the client system 101 has successfully informed the server system 201 that a batch is about to be transmitted. In an embodiment, the client system 101 may loop through each file in the transit folder and announce (735) the file name and batch ID to the server system 201. The announcement of the file registers a filename and batch ID to later identify the file on the server. In an embodiment, a batch detail record may be created (740) on the server system 201. The batch detail record may be used to define a document file on the server side 201. In an embodiment, the batch detail record may be used to eventually contain all details about a document file. The batch record may then be updated (745) with the number of files.
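The announcement exchange may be easier to follow with a small sketch. The Python class below is a hypothetical in-memory stand-in for the server-side batch records. The class name, method names, and record fields are assumptions for illustration, and a real system would persist these records in a database.

```python
class BatchRegistry:
    """Hypothetical in-memory stand-in for the server's batch records."""

    def __init__(self) -> None:
        self.batches: list[dict] = []

    def announce_batch(self, account: str, batch_name: str, file_count: int) -> int:
        """Return a sequential batch ID, or 0 if the announcement is incomplete."""
        if not account or not batch_name or file_count <= 0:
            return 0  # a zero batch ID signals an error to the client
        self.batches.append(
            {"account": account, "name": batch_name,
             "expected": file_count, "files": []}
        )
        return len(self.batches)  # total number of batches received to date

    def announce_file(self, batch_id: int, filename: str) -> bool:
        """Register a file name under a previously issued batch ID."""
        if not 1 <= batch_id <= len(self.batches):
            return False
        self.batches[batch_id - 1]["files"].append(filename)
        return True
```

A client following the flow above would retry or log an error on a zero return and otherwise announce each file in the transit folder against the returned ID.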
[0088] For the server system to begin indexing, it must be able to determine when files are ready. In an embodiment, the server system may employ a file watcher service or monitor service to monitor or look for files. To prevent the watcher service from inadvertently finding files that are not yet completely transferred, the server system 201 may also look for a request file, or REQ file. In an embodiment, an REQ file is created (750), which file may be a blank file with the same file name as the encrypted batch package file name with an additional extension, such as ".req".
[0089] In the depicted embodiment, the client system 101 transfers (755) the batch package, which in this illustrated embodiment is a compressed and encrypted file containing the document files, and its associated REQ file. By transmitting the batch package first followed by the REQ, when the file watcher service finds the REQ file, the server system is assured that the batch package file has already been received.
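A minimal sketch of how a watcher might use the REQ file as a completion marker is given below. It assumes a hypothetical `.req` extension appended to the full package file name and a single polling pass over an input directory. The function name and the directory layout are illustrative only.

```python
from pathlib import Path

REQ_SUFFIX = ".req"  # assumed extension for the blank request file


def ready_packages(inbox: Path) -> list[Path]:
    """Return batch packages whose REQ marker file is present.

    Because the package is sent before its REQ file, finding the REQ
    file implies that the package transfer has already finished.
    """
    ready = []
    for req in sorted(inbox.glob("*" + REQ_SUFFIX)):
        package = req.with_name(req.name[: -len(REQ_SUFFIX)])
        if package.exists():
            ready.append(package)
    return ready
```

A package without a marker is simply ignored on this pass and picked up on a later one, once its REQ file arrives.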
[0090] In an embodiment, the server system 201 may notify (765) the client system 101 that it has received the files. In one embodiment, if either file is not received, the client system 101 may repeat all or part of the entire transfer. This may include renaming and moving the image files to their original locations. In an embodiment, if the batch package and REQ files were successfully received by the server system 201, the files in the transit folder may be moved to a "pending" folder, and the client system 101 may delete the batch package and REQ files from its system.
[0091] In an embodiment, after unpacking the batch package, the server system 201 may check the number of files that were announced to it. If the number of document files in the batch does not equal the number of files that were announced to the server system, an alert notification may be made by the server system and sent to an individual, a system administrator, and/or to the client system. In an embodiment, an automated process may be initiated to roll back the erroneous transmission and reinitiate the transfer.
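The file-count check may be illustrated with Python's standard `zipfile` module. This sketch assumes the package has already been decrypted and is a Zip archive held in memory. The function name is hypothetical, and the alerting and rollback steps are omitted.

```python
import io
import zipfile


def count_matches_announcement(package: bytes, announced: int) -> bool:
    """Compare the number of files in a Zip package with the announced count."""
    with zipfile.ZipFile(io.BytesIO(package)) as archive:
        # Ignore directory entries; only document files are counted.
        names = [n for n in archive.namelist() if not n.endswith("/")]
    return len(names) == announced
```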
F. EMBODIMENTS OF UNPACK SERVICES
[0092] Turning to FIG. 8, an exemplary method for unpacking files from the indexing recipient 101 according to an embodiment of the present invention is depicted.
Figure 8 may represent an embodiment of the unpack services (610) mentioned in Figure 6. As depicted in the embodiment of Figure 8, the indexing service provider 201 may monitor (805) an input directory for receipt of files from an indexing recipient or client system 101. In an embodiment, when files have been received or found (810) to be present in an input directory, the indexing service provider 201 may look up (812) the batch number that has been received.
[0093] As mentioned previously, embodiments of the present invention may include encrypting files to provide security. Embodiments of the present invention may also include utilizing compression algorithms to help reduce bandwidth requirements of transmitting data between the indexing client system 101 and the indexing service provider 201.
The embodiment depicted in Figure 8 is directed toward embodiments in which compression and encryption have been performed as part of the transmission process.
Alternative embodiments may not include encrypting, compression, or both.
[0094] Returning now to Figure 8, if the batch package is properly decrypted (815), the batch package may be extracted or uncompressed. If the decryption or extraction is not successful (820), the received batch package and its associated REQ file may be moved (850) to an error directory for additional processing. In an embodiment, the additional processing may include requesting the indexing recipient system 101 to retransmit the files or to change encryption or compression algorithms.
[0095] In an embodiment, following successful decryption and extraction, the indexing service provider system 201 may set (825) the batch status to "active" and, for each document file in the batch package, perform additional processing. In an embodiment, this additional processing may include marking (830) a document file in a database as being received, storing (835) each document file in a database, creating (840) an REQ file for the document file, and moving or otherwise noting that the document file is ready for further processing. In an embodiment, the noting that the document file is ready for further processing may be accomplished by moving (845) the document file or copying the document file and the new REQ file to an extraction directory. In an embodiment, a monitoring service may begin the extraction processes for a document file when its REQ file is present.
G. EMBODIMENTS OF EXTRACTION SERVICES
[0096] FIG. 9 depicts an embodiment of a method for extracting characters from a document file according to an embodiment of the present invention. In some instances, a document file may be directly representative of the data contained in a physical or electronic media. For example, a document file that is an HL-7 file directly represents the data contained therein. In some embodiments, a document file may indirectly represent the data contained in a physical or electronic media. For example, a document file of a scanned image indirectly represents the data contained within the scanned document. Data extraction may be beneficial in such cases to extract at least some of the plurality of characters that is related to the data of interest. For example, optical character recognition or voice character recognition operations may assist in extracting the data. In either event, the document file comprises data that comprises a plurality of characters.

[0097] In an embodiment, indexing service provider system 201 monitors (905) an extraction directory for receipt of a document file and its associated REQ file. When those files are received, a document file may have data extracted (910) from the document file.
[0098] In an embodiment, the extraction type performed by the indexing service provider system 201 may be determined by one or more characteristics such as, for example, file type or extension, client or account, or may be indicated in the REQ file. For example, an image file that is a portable document format (PDF), or some image file type such as a TIFF, GIF, JPEG, or the like, may be sent for optical character recognition. If the document file is an image file, the data contained within the document that the document file represents may be converted from the image file. In one embodiment, optical character recognition operations may be performed to convert the document file to obtain at least some of the plurality of characters, which characters may include alphanumeric text or graphics. In an embodiment, the optical character recognition may be performed on machine-generated documents and/or on handwritten documents.
[0099] Assume, by way of example, that the report 400 in Figure 4A is scanned to create a document file that is an image file of the report 400. By performing optical character recognition operations on the document file, at least some of the plurality of characters representing information on report 400 may be obtained. This data may include the alphanumeric text on report 400, for example, the patient's name, age, date of birth, account number, test results, and the like. This data may be used for indexing or processing of the document file.
[00100] Audio files, which may be denoted by having an audio extension such as, for example, .wav or .mp3, or the like, may be processed using voice recognition methods. In an embodiment, an audio file may be converted by using speech recognition software or hardware.
[00101] In an embodiment, extraction services may also be used for documents that are already in an electronic format that is character-based by extracting or parsing characters from structured fields. One skilled in the art will also recognize that certain file types may possess one or more fields which makes identifying strings and indexing files easier and more accurate. Consider, by way of illustration, an HL-7 data file or message. An HL-7 message is a structured ASCII file with delimiting characters, or pipes, that divide the file into segments or fields, which correspond to or can be used as identifying indicia.
For example, the first line of data in an HL-7 message is typically the message header segment which identifies the file producer and date the file was created. The file may also identify additional information including, but not limited to, to whom the message refers, internal account numbers, external account numbers, various patient information, and the provider of services.
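As a rough illustration of parsing characters from structured fields, the Python sketch below splits a pipe-delimited message into segments and fields. It is a simplified reading of HL-7 version 2 framing, with one segment per line, fields separated by `|`, and components separated by `^`, and it is not a complete HL-7 parser. The function name is hypothetical, and any sample message used with it should be treated as made-up data.

```python
def parse_hl7(message: str) -> dict[str, list[list[str]]]:
    """Group the fields of each segment by segment name (MSH, PID, ...)."""
    segments: dict[str, list[list[str]]] = {}
    # Segments are conventionally separated by carriage returns;
    # newlines are normalized first so that either form is accepted.
    normalized = message.replace("\r\n", "\r").replace("\n", "\r")
    for line in normalized.split("\r"):
        if not line.strip():
            continue
        fields = line.split("|")
        segments.setdefault(fields[0], []).append(fields)
    return segments
```

With this layout, index 2 of the split `MSH` segment holds the sending application, and index 5 of a `PID` segment holds the patient name, whose components can be separated with `split("^")`.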
[00102] In one embodiment, the file type is RTF, TXT, or other similar text-based file containing a plurality of characters that may be used for indexing or processing the file.
Consider, by way of illustration, a transcription file or message containing the transcript of an audio file. One skilled in the art will also recognize that text-based file types are inherently less prone to matching error based on individual match strings and thus may provide a high degree of resolution and computation.
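The selection of an extraction type by file extension, with an optional override carried in the REQ file, might be sketched as follows. The mapping table, the `.hl7` extension, the category labels, and the function name are all assumptions chosen for illustration.

```python
from pathlib import PurePath
from typing import Optional

# Assumed mapping from file extension to extraction method.
EXTRACTION_BY_EXTENSION = {
    ".pdf": "ocr", ".tif": "ocr", ".tiff": "ocr",
    ".gif": "ocr", ".jpg": "ocr", ".jpeg": "ocr",
    ".wav": "speech", ".mp3": "speech",
    ".hl7": "structured",
    ".txt": "text", ".rtf": "text",
}


def extraction_type(filename: str, req_hint: Optional[str] = None) -> str:
    """Pick an extraction method; a hint from the REQ file takes priority."""
    if req_hint:
        return req_hint
    extension = PurePath(filename).suffix.lower()
    return EXTRACTION_BY_EXTENSION.get(extension, "unknown")
```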
[00103] Second and/or first computing system 201/101 may be configured to index a structured document file based upon one or more of the identifying indicia strings contained within known fields in the file. In an embodiment, a set of one or more strings from the document file may be compared against a comparison reference database in the same or similar manner as described below with reference to indexing services for validation and/or error correction. Because these document files may contain errors, by comparing these files against the reference database, these errors may be identified and corrected, thereby improving the accuracy of the indexing process. For example, a provider of laboratory services that manually enters data by reading a specimen label may inherently produce erroneous structured message document files. This structured message document file may be reconciled against a reference database and corrected, thereby improving the accuracy of a previous manual process.
[00104] One skilled in the art will also recognize that other forms of data conversion may be performed on a document file wherein at least some of the plurality of characters may be obtained, regardless of the type of data originally received.
[00105] In an embodiment, when data has been extracted from a document file, indexing service provider system 201 may check (915) if the process has been successful. In one embodiment, if the process has been successful, the characters may be stored (935) in a file and that file and an REQ file may be moved (940) to the input of a matching/indexing utility.
[00106] In an embodiment, if the extraction process has not been successful, the document file may be subjected to a rotation utility (920) for rotating the image. An exemplary rotation utility is described below with reference to Figure 10. A rotation utility may or may not be performed depending on the file type. For example, if the document file is an audio file, TXT, RTF, XML, or HL-7 file, rotation would not be performed. If the rotation algorithm is successful (925), then the extracted character data may be stored (935) in a data file and associated with that document file from which it was extracted. In an embodiment, if the rotation utility is not successful or if no rotation utility is performed, then no characters may be stored (930) in the database data file associated with that document file.
[00107] Figure 10 depicts an exemplary method for extracting character data from a document file according to an embodiment of the present invention. In the depicted embodiment, the image data may be converted (1010), for example, through the use of an optical character recognition algorithm or algorithms, if necessary. The resulting character data obtained from the optical character recognition process may be used to identify all string candidates of length m or greater, where m may be preset or user selected. In one embodiment, m may be three or more characters; that is, each string composed of three or more characters is identified. These strings may then be compared (1020) against a reference dictionary or dictionaries. In an embodiment, a reference dictionary may be a dictionary of common words, or may be words specific to a client, account, or sub-account.
For example, if the client is a medical professional, the reference dictionary may have words that commonly occur within that client's practice. In an embodiment, the reference dictionary may contain words specific to an industry and common words not specific to any industry.
[00108] The comparison of the string candidates with the reference dictionary determines if any words are found (1025) from the character data obtained from the extraction process.
If no words are found, the data obtained from the extraction process is likely to be nonsensical and it is also likely that an error or problem occurred during the extraction process. An example of an error may be that the document file contained an image that when scanned or otherwise produced was in a layout that is different than the layout assumed by the extraction process. For example, the image may be in landscape view or somewhat skewed and the extraction process assumes a portrait layout. According to an embodiment of the present invention, the image may be rotated some n degrees (1030) and have the steps repeated again to see if the alignment is such that character data that yields words has been extracted. This process may be repeated a set number of times, until words are found, a user-selected number of times, or until all orientations have been checked.
[00109] In an embodiment, it may be set such that if the process has repeated steps (1010) through (1030) a number of times and no words are found, the process may end, return an alert that the process failed (that is, that no data was found), and store a blank character data file as mentioned in Figure 9 (step 930). If words are found, the character data may be added (1035) to the character data file.
[00110] In an embodiment, additional orientations may be checked (1040). For example, in some reports, the textual data may exist in different orientations. For example, some characters may be in landscape layout and some characters may be in portrait layout. In the embodiment depicted in Figure 10, additional orientations may be checked (1040) to capture that character data. If it is desired that additional orientations are checked, the image may be rotated a certain number of degrees, n, which may be preset or user selected, and the process repeated. If checking additional orientations (1040) is not desired, the character data obtained from the process may proceed (1045) to the next stage. As noted previously, if no character data was found, the character data file would be blank.
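The rotate-and-retry loop of Figure 10 can be outlined in Python. In this sketch the recognition engine is represented by a hypothetical callable, `ocr`, that takes a rotation angle in degrees and returns the recognized text. The default 90-degree step, the minimum candidate length, and the function names are assumptions, and the sketch stops at the first orientation that yields dictionary words rather than checking additional orientations.

```python
import re
from typing import Callable, Optional


def find_words(text: str, dictionary: set[str], m: int = 3) -> list[str]:
    """Return string candidates of length m or more found in the dictionary."""
    candidates = re.findall(r"[A-Za-z]{%d,}" % m, text)
    return [c for c in candidates if c.lower() in dictionary]


def extract_with_rotation(
    ocr: Callable[[int], str], dictionary: set[str], step: int = 90, m: int = 3
) -> tuple[Optional[int], str]:
    """Try each orientation until the recognized text contains known words."""
    for angle in range(0, 360, step):
        text = ocr(angle)
        if find_words(text, dictionary, m):
            return angle, text
    return None, ""  # no orientation produced words: blank character data
```

Returning an empty string when every orientation fails mirrors the blank character data file described above, which later stages may treat as an error indicator.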
[00111] Turning to FIG. 11, in an embodiment, the character data file obtained from the extraction services may be checked against a reference dictionary to determine if valid data was extracted. Figure 11 depicts an exemplary method for determining if valid data was obtained from the extraction services according to an embodiment of the invention. Similar to what was depicted with reference to Figure 10, the character data may be checked (1110) against one or more dictionaries. In embodiments, the reference dictionary may be specific to a client, may be a general dictionary, or may be some combination thereof. By checking strings against a reference dictionary, it may be determined (1115) whether valid words occur within the extracted character data file. In an embodiment, if no valid words are found, the file may be marked as "error" in a database of the indexing service provider 201. In one embodiment, a blank character data file may indicate that an error has occurred.
[00112] In an embodiment, if valid data is found, the process may proceed to generating (1120) an array of strings from the character data file, which comprises a plurality of characters.
H. EXEMPLARY CHARACTER DATA FILE AND EXEMPLARY ARRAY OF STRINGS
[00113] Figure 12A depicts an exemplary character data file 1200 comprising a plurality of characters that might be obtained from extraction services performed on the document file 400. In an embodiment, the plurality of characters may be organized into an array of strings 1205 or 1210 as depicted in FIG. 12B. In one embodiment, a string may be defined as a set of characters bounded by delimiters, such as space, tabs, punctuation, and the like. In the depicted embodiment in FIG. 12B, the strings are selected by space delimiters, and a string (e.g., 1220-1) may be assigned a position within the array (e.g., 1215-1).
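A short Python sketch shows one way a plurality of characters might be organized into a positioned array of strings. The function name, the one-based positions, and the default whitespace delimiter pattern are assumptions for illustration.

```python
import re


def string_array(text: str, delimiters: str = r"\s+") -> list[tuple[int, str]]:
    """Split character data on delimiters and number each string by position."""
    parts = [p for p in re.split(delimiters, text) if p]
    return list(enumerate(parts, start=1))
```

Passing a wider pattern such as `r"[\s:,]+"` treats punctuation as delimiters as well, so that a token like `Name:` is stored as `Name`.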

I. EMBODIMENTS OF INDEXING SERVICES
[00114] It should be noted, however, that difficulties may arise in processing a document file if the character data associated with it contain errors. For example, one or more of the character data, whether through data entry error or misidentification of a character or word by recognition operations, may be incorrect. Accordingly, in an embodiment, a comparison reference database may be employed to improve the accuracy of identifying, indexing, and/or processing of a document file.
[00115] In embodiments of the present invention, the array of strings obtained from a document file may be compared against a comparison reference database to help index and/or process the document file. The comparison reference database may be the full reference database obtained from the indexing recipient system 101, or alternatively, the comparison reference database may be the database resulting from one or more filtering operations performed upon the full reference database or on an already filtered reference database. It shall be noted that in some instances even after performing filtering, the comparison reference database may be equivalent to the reference database. Some embodiments of the present invention may utilize filters on the array of strings obtained from a document file, on the reference database, or both in attempts to reduce either or both files.
Reducing either or both the array of strings and the reference database speeds the indexing. It shall be noted that the terms "filter" and "filtering" may be construed to mean one or more filtering/matching operations.
[00116] As noted previously, the information contained in or converted from a document file includes a plurality of character elements. These character data elements may be used as identifying indicia for categorizing the document file. In an embodiment, the character data may match information in a comparison reference database with varying levels of accuracy.
The data string elements are generally arranged in proximity between respective pairs of data string elements that comprise identifying indicia. As such, the data may define identifying indicia to varying degrees of accuracy.
[00117] In order to improve the accuracy of the identifying indicia, second computing system 201 may analyze the data elements associated with the document file, in particular by utilizing approximate matching algorithms and comparing a reference database to data string elements at a plurality of points along the length of the data element.
[00118] Absent comparison with one or more known reference databases, the values of the data string elements derived from the document files may have errors since the document files may contain erroneous information from the primary data source, such as from missing, incorrect, or misspelled information, or from the extraction process, such as optical character recognition, speech recognition, or optical mark recognition.
[00119] In embodiments, second computing system 201 may interpret a value for data string elements contained within a document file derived from physical or electronic media. In one embodiment, one or more comparison reference databases may be applied to a data string element to obtain a value for that data string element. The value for a data string element may be the result of applying a comparison reference database to the original data string element. By repeating this process at a plurality of points consisting of data string elements, the identifying indicia contained within the document file may be extracted such that a resulting array of strings, a structured file comprising data from the document file, a composite, or a message representing the data contained within the document file is precise relative to the reference database. In the present embodiment, the resultant data file, which may be an array of strings or set of strings, may be utilized in automated indexing processes.
[00120] In an embodiment, in order to improve at least some of the plurality of data elements associated with the document file, second computing system 201 may be configured to automatically correct information associated with the document file according to a reference database or databases.
In an embodiment, second computing system 201 may apply a Levenshtein algorithm to correct the information associated with the document file. In one embodiment, second computing system 201 may apply a Levenshtein-distance algorithm, which is known to those skilled in the art and is disclosed in Algorithms and Theory of Computation Handbook, CRC Press LLC, 1999, "Levenshtein distance", in Dictionary of Algorithms and Data Structures, Paul E. Black, ed., U.S. National Institute of Standards and Technology (10 Nov. 2005), and which is also available at <http://www.nist.gov/dads/HTML/Levenshtein.html>. One skilled in the art will recognize that a variety of approximate matching and correction techniques may be utilized to correct information, such as the plurality of data elements associated with a document file, and such techniques are within the scope of the present invention.
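For readers unfamiliar with the technique, the following Python sketch computes the Levenshtein (edit) distance with the standard dynamic-programming recurrence and uses it to replace an extracted string with its nearest reference value. The `correct` helper, its `max_distance` cutoff of 2, and the example names are assumptions for illustration and are not prescribed by the embodiments.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions,
    and substitutions needed to turn a into b."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            current.append(min(
                previous[j] + 1,                       # deletion
                current[j - 1] + 1,                    # insertion
                previous[j - 1] + (char_a != char_b),  # substitution
            ))
        previous = current
    return previous[-1]


def correct(token: str, reference_values: list[str], max_distance: int = 2) -> str:
    """Replace token with the closest reference value if it is near enough."""
    best = min(reference_values, key=lambda r: levenshtein(token, r), default=None)
    if best is not None and levenshtein(token, best) <= max_distance:
        return best
    return token
```

For instance, a recognition error such as `Sm1th` lies one substitution away from `Smith` and would be corrected, while a string far from every reference value is left unchanged.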
[00121] In addition to the foregoing or as an alternative, as part of the extraction process and/or as part of the indexing/matching process(es), one or more techniques may be employed, including associative memory techniques that rely on learned coupling constraints or objective set definition procedures, such as, for example, bigrams. Other approaches to error-tolerant searching, which include but are not limited to deterministic finite automata, hash tables, associative memory, bipartite matching, longest-common-subsequence (LCS), glob-style matching, regular expression matching, and other approaches known to those skilled in the art, may also be employed. Searching methods are further described by Gonzalo Navarro and Mathieu Raffinot in Flexible Pattern Matching in Strings (Cambridge University Press, 2002); by Maxime Crochemore and Wojciech Rytter in Jewels of Stringology (World Scientific, 2002); and by Vladimir I. Levenshtein in Binary codes capable of correcting deletions, insertions, and reversals, Doklady Akademii Nauk SSSR, 163(4):845-848, 1965 (Russian) (English translation in Soviet Physics Doklady, 10(8):707-710, 1966).
[00122] One or more matching algorithms may be employed as part of or in combination with an indexing/processing method; exemplary indexing/processing methods are provided below for purposes of illustration.
[00123] FIG. 13 depicts an exemplary method 1300 for indexing a document file according to an embodiment of the present invention. In an embodiment, a matching algorithm and/or one or more filters may be selected (1310). The matching algorithm, filters, or both may be preset or may be user selected. One skilled in the art shall recognize that a matching algorithm may be a filter and a filter may be a matching algorithm. For example, filtering the reference database based on a characteristic or characteristics may filter the reference database to a single matching record. Alternatively, a matching algorithm may return two or more records that satisfy the matching criteria, thus effectively filtering the reference database, and this filtered reference database may be used in subsequent filtering and/or subsequent matching algorithms.
[00124] The filtering (if applicable) and matching algorithm are performed (1315), and the results obtained. If a sufficient threshold match has been found (1325), the document file may be indexed. In an embodiment, the threshold match value may be preset or user selected and may be based upon one or more factors including, but not limited to, the number of matching strings in the array of strings, the uniqueness of the matching strings, the degree of fuzziness allowed in the extraction and/or matching processes, the type of filters and/or matching algorithms used, the degree of matching with the next closest match or matches, and the like. In an embodiment, the indexing service provider system 201 may index a document file by associating the document file with a matching record by generating (1330) a structured message that links the document file to the matching record. In embodiments, the structured message may be an HL-7 message, a mark-up language file, a file in a database, a text file with associated information, some other file type, or a combination thereof.
[00125] If a threshold match has not been achieved, another match algorithm and/or filtering operation may be selected (1335) and the process repeated. In an embodiment, the process may be repeated until a match has been determined or until all the filters and/or algorithms have been utilized.
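The select, match, and retry loop of FIG. 13 might be sketched in Python as below. The scoring function, the 0.75 threshold, the record layout, and all names are assumptions chosen for illustration. The sketch returns the position of the matching record, or `None` when every matcher has been tried without reaching the threshold, in which case the document file would be handed to manual indexing.

```python
from typing import Callable, Optional

Record = dict[str, str]
Matcher = Callable[[list[str], Record], float]


def exact_score(strings: list[str], record: Record) -> float:
    """Fraction of the record's field values that appear in the array of strings."""
    tokens = {s.lower() for s in strings}
    values = [v.lower() for v in record.values()]
    return sum(v in tokens for v in values) / len(values)


def index_document(
    strings: list[str],
    reference: list[Record],
    matchers: list[Matcher],
    threshold: float = 0.75,
) -> Optional[int]:
    """Try each matcher in turn and return the first record index that
    meets the threshold, or None if no matcher produces a sufficient match."""
    for matcher in matchers:
        scored = sorted(
            ((matcher(strings, record), i) for i, record in enumerate(reference)),
            reverse=True,
        )
        if scored and scored[0][0] >= threshold:
            return scored[0][1]
    return None
```

A stricter matcher can be listed first and a more tolerant one, for example one based on edit distance, listed second, so that fuzzier comparison is attempted only when exact matching fails.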
[00126] If a threshold match still has not been found, the document file may be sent or loaded into a manual indexer (1340). The manual indexing services will be described in more detail below, but in an embodiment, the manual indexer may present the document file to an individual for manual matching. In an embodiment, one or more of the highest ranking matches (if any) may be associated with the document file and loaded into the manual indexer to provide matching suggestions to the user. In an embodiment, if a match is made via the manual indexer, the document may be associated with a record via a structured message (1330), as discussed previously.
[00127] In an embodiment, if, after manual review, an indexing match has not been made, the document file may be marked for deletion (1350) and/or it may be put into a queue for reprocessing. Reprocessing may be beneficial in certain instances. Consider, for example, if the comparison database does not yet contain a record to which the document file should be indexed. By waiting and reprocessing, the reference database may be updated and a match found.
[00128] Figure 14 represents an alternative embodiment of a method for indexing a document file. The method depicted in Figure 14 is the same as that disclosed with respect to Figure 13 with the exception of an additional step (1410). In an embodiment, the indexing services may attempt to determine the document type that the document file represents.
Figure 15 depicts an embodiment of a method for determining the document type of a document file.
[00129] Figure 15 depicts an exemplary method for determining a document type of a document file according to an embodiment of the present invention. In an embodiment, the array of strings for a document file may be compared (1510) against a phrase list or lists of document types. The phrase list may be specific to a client or industry or may be general.
For example, a specific phrase list for a medical office may include a list of medical lab reports and the like. If a phrase match is found (1515), the document type may be associated with the document file. In an embodiment, the document type may be associated with the document file by storing (1520) the document type in a structured file for the document file.
[00130] If a phrase match is not found, it may be recorded (1530) that the document type is not known. In an embodiment, the document type may be stored (1530) in a structured file as "Unsigned External Other," which means that it is not currently known. In an embodiment, if the document type is not known, a user may be alerted and requested (1535) to review the document file and input the document type, if any, and update the document type list. By updating the document type phrase list, more document types may be identified in subsequent document file processing. In an embodiment, the alert and review to determine document type may be performed via the manual indexer utility.
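By way of illustration only, the phrase-list comparison of Figure 15 might be sketched as follows. The phrase list contents, the function name, and the simple substring test are assumptions made for this sketch; only the "Unsigned External Other" label comes from the description above.

```python
# Hypothetical phrase list mapping a phrase to a document type (illustrative values).
PHRASE_LIST = {
    "pathology lab report": "Pathology Report",
    "radiology report": "Radiology Report",
}
UNKNOWN_TYPE = "Unsigned External Other"  # label used when the type is not known

def determine_document_type(strings, phrase_list=PHRASE_LIST):
    """Return the first document type whose phrase occurs in the array of strings."""
    text = " ".join(strings).lower()
    for phrase, doc_type in phrase_list.items():
        if phrase in text:
            return doc_type          # phrase match found (1515)
    return UNKNOWN_TYPE              # no match: record the type as not known (1530)
```

A caller could route any document file that returns the unknown label to a reviewer, whose answer may then be added to the phrase list.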
[00131] In one embodiment, an array of strings or set of strings may be associated with document types. The frequency of words, key words, or key word phrases may be calculated and tabulated. The association of word frequencies, key words, or key word phrases with the document type may be stored in a relational database. Subsequent string arrays from unknown document types may then be compared to previously stored associations and an estimation of document type may be obtained. In one embodiment, when a threshold of certainty for document match is reached, then the unknown document may be assigned a document type.
[00132] In one embodiment, system 201 may learn by experience to suggest the most likely document type match, and this suggestion may be associated with a document file submitted to a manual indexer. Based on a user's response, a match association is made.
That match association may be stored for use in other automated document type matches.
[00133] In an embodiment, the array of strings for a document file may be analyzed for word frequency and/or word associations and compared against known word frequencies, key words, or key word phrases contained in or associated with a phrase list or list of document types. The phrase list or list of document types may be specific to a client or industry or may be general. For example, a phrase list or list of document types for a medical office may include word frequency, key words, key word phrases, word/phrase associations, word/phrase proximity, and the like to help identify document types, such as radiology reports, pathology reports, medical lab reports, and the like. For example, a phrase list or list of document types may indicate that multiple instances of the key word "X-ray" or a few instances of the key word "X-ray" in connection with at least one instance of "Radiology" in an array of strings may result in a conclusion that a threshold match (1515) has been reached. If a threshold match is found (1515), the document type may be associated with the document file.
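The "X-ray"/"Radiology" rule described above might be sketched as a simple frequency test. The function name, the punctuation stripping, and the cutoff `min_xray` are assumptions for this sketch; the description does not state how many instances count as "multiple."

```python
from collections import Counter

def is_radiology_report(strings, min_xray=3):
    """Illustrative rule: many "X-ray" hits, or at least one "X-ray" plus "Radiology"."""
    # Count case-insensitive key words after trimming trailing punctuation.
    counts = Counter(s.strip(".,:;").lower() for s in strings)
    xray = counts["x-ray"]
    radiology = counts["radiology"]
    return xray >= min_xray or (xray >= 1 and radiology >= 1)
```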
[00134] In an embodiment, an examination of the array of strings using input from the phrase list may result in some matches but none that exceed a threshold match.
One or more of the document type matches may be provided to a user as suggestions. In an embodiment, the phrase list may be updated by identifying new document type associations.
In an embodiment, the alert and review to determine document type and document type associations may be performed via the manual indexer utility. In an embodiment, the system may record the matching configurations for future associations and matching, or utilize other adaptive learning techniques known to those skilled in the art to improve the matching processes.
[00135] One skilled in the art shall recognize that adaptive learning by the system helps increase indexing and processing and may be applied to other aspects of the system, including but not limited to embodiments of matching/filtering not limited to document type.
One skilled in the art shall also recognize that the phrase lists or dictionaries utilized as part of the matching/filtering may comprise not only key words and/or key phrases querying, but also utilize word frequencies, word proximities, conditional relationships, word associations, and the like and may be utilized in other matching/filtering applications described herein or known to those skilled in the art. It shall also be noted that a "word," such as in "key word,"
is a string.
[00136] Embodiments of the present invention may utilize one or more match/filter operations on the array of strings and/or on the comparison reference database to aid in the indexing. Consider the following exemplary methods for indexing a document file.
[00137] Let A be the input alphabet, a finite set of symbols. Elements of A are called the characters, which may be text or symbols. Examples of alphabets may include, but are not limited to, the set of all ordinary letters, the set of binary digits, and the set of 256 8-bit ASCII symbols. In an embodiment, words or strings over A are finite sequences of elements of A. The length (size) of a string may be the number of its elements, which may include repetitions. Thus, the length of "aba" is 3. The length of a string x may be denoted by |x|. The input data for an embodiment of a matching function may be a string, which may be the array of strings from a document file or a portion thereof.
[00138] The i-th element of string h is denoted by h[i] and i is its position on h. We denote by h[i...j] the factor h[i], h[i+1], ..., h[j] of h. If i is greater than j, by convention, the string h[i...j] is an empty string (a sequence of length 0), which may be denoted by ε. In an embodiment, the string h of length m may be referred to as a factor (also called a substring or subword) of the string y if h = y[i_n], where i_n is an increasing sequence of indices on y.
[00139] Instead of just one pattern, one can consider a finite set of patterns and ask whether a given string contains a pattern from the set. Information related to string matching has been discussed by Maxime Crochemore and Wojciech Rytter in Jewels of Stringology (World Scientific, 2002) at pp. 10-11.
[00140] Assume, for purposes of explanation, that an array of strings comprises strings h_1...h_n of the same or varying lengths, and let us assume that an array of strings that contain at least one capitalized character, CAPSTRING, comprises strings H_1...H_n. In an embodiment, the CAPSTRING strings, H_1...H_n, may be a subset of the array of strings h_1...h_n.
[00141] An embodiment of a matching function may comprise the following steps. In an embodiment, a filtering operation may be performed, which may comprise applying one or more filters, to reduce the size of the reference database and/or of the array of strings. For example, an array of strings obtained from a document file may be filtered to obtain all strings H_n of length |x| that contain a capital letter. In an embodiment, after identifying all strings H_n, strings surrounding these strings may also be included in the filtered result. That is, embodiments of the present invention may use identified strings as anchor points for including or excluding additional strings in the set of strings used for matching. In an embodiment, the inclusion or exclusion of strings may be symmetrically or asymmetrically disposed about the anchor points. In an embodiment, the filtered result may select strings within a selected location, p, of an identified string H_n, such that the filtered set of strings comprises the strings H_(n-p), ..., H_n, ..., H_(n+p). Assume, for the purposes of illustration, that p=1; then the set of strings would be H_(n-1), H_n, and H_(n+1). In an embodiment, the filtered set of strings may contain a plurality of sets of strings comprising strings within the same or different proximities of identified strings.
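A minimal sketch of the anchor-point filter, assuming a symmetric window of p positions on each side of every string that contains a capital letter. The function name and the list-based representation of the array of strings are assumptions for illustration.

```python
def filter_with_anchors(strings, p=1):
    """Keep strings containing a capital letter plus neighbours within p positions."""
    keep = set()
    for i, s in enumerate(strings):
        if any(ch.isupper() for ch in s):          # this string is an anchor point
            lo = max(0, i - p)
            hi = min(len(strings) - 1, i + p)
            keep.update(range(lo, hi + 1))         # anchor and its neighbours
    return [strings[i] for i in sorted(keep)]      # preserve document order
```

An asymmetric variant would simply take separate "before" and "after" distances in place of the single p.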
[00142] In an embodiment, a sequence list comprising sequences of length m that correspond to the first m characters in each string from a set of strings may be generated. If m=3 and if the set of strings comprises the strings with at least one capital letter, H_n, then the sequence list would comprise the sequence (H_n[1], H_n[2], H_n[3]) for each string H_n in the set of strings. In an embodiment, a sequence may be generated from a substring portion of the string, and in an embodiment, the sequence list may comprise one or more sequences from strings in the set of strings. It should be noted that these sequence lists may be used for matching/filtering purposes. One skilled in the art shall recognize that one benefit of using sequence lists, or substrings, is that if there exist some errors, such as from entry errors or as a result of the extraction process, matches may still be obtained by matching substring portions.
[00143] Consider now a comparison reference database, which may be a full reference database, a subset of a full reference database, or one or more subsets of a comparison reference database. Let the comparison reference database be composed of rows α_n, where α_1, α_2, ..., α_n represent the set of possible comparison reference database rows in the comparison reference database. In an embodiment, each row may correspond to a record, wherein each record comprises one or more data field elements. Examples of data field elements may include, but are not limited to, the fields 405 depicted in Figure 4B (i.e., name, date of birth, account number, service provider, provider, etc.). In an embodiment, a data field element may comprise a set of elements.
[00144] In an embodiment, a set of strings, which may comprise a list of sequences, may be compared to the comparison reference database to reduce the α (row) candidates by matching the sequences against α_1, α_2, ..., α_n. In an embodiment, search functions or algorithms may be employed, such as, for example, using the search engine marketed by dtSearch Corp. of Bethesda, Maryland. In an embodiment, the row candidates (α) may be ranked by number of matches per row.
[00145] If only one row candidate is returned (has a match or matches), then the document file may be associated with that record. Alternatively, if additional verification is desired, additional matching may be performed, including without limitation, checking some or all of the array of strings against the row or rows to determine if more matches are found.
[00146] In an embodiment, if more than one row candidate has a match or matches, each such row candidate may be searched against the entire array of strings, or a subset thereof, to identify matches. In an embodiment, the matches may be grouped by row (α_n) to derive a new function or pattern, denoted INDXMATCH. In an embodiment, INDXMATCH for a row α_n may be denoted INDXMATCH_α_n and equals the set of matches, MATCH_1, MATCH_2, ..., MATCH_φ occurring in row α_n. The number of matches found in row α_n is φ.
In an embodiment, the INDXMATCH results may be ranked by φ, which ranking may be used to index the document file. In an embodiment, certain values of matches in a row, MATCH_φ, may be given different weights for ranking purposes.
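The grouping of matches by row and the ranking by match count might be sketched as below. The dictionary layout of the rows and the exact, case-insensitive comparison are assumptions for this sketch; the description also contemplates sequence and substring matching and weighted matches, which are not shown.

```python
def rank_rows(rows, strings):
    """rows: dict of row id -> list of data field values.

    Returns [(row_id, matches)] sorted by the number of matches per row,
    where matches plays the role of the INDXMATCH pattern for that row.
    """
    string_set = {s.upper() for s in strings}
    indxmatch = {}
    for row_id, fields in rows.items():
        matches = [f for f in fields if f.upper() in string_set]
        if matches:                       # rows without any match are dropped
            indxmatch[row_id] = matches
    return sorted(indxmatch.items(), key=lambda kv: len(kv[1]), reverse=True)
```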
[00147] It shall be noted that each INDXMATCH_α_n forms a pattern that may be searched against the reference database, or subset portion thereof. Let the number of instances that the pattern INDXMATCH_α_n occurs in the reference database equal p. In an embodiment, the document file with an INDXMATCH_α_n that yields a value of p = 1 may be defined as a threshold match. In an embodiment, if INDXMATCH_α_n occurs in the reference database such that the value of p > 1, then those reference database rows containing the INDXMATCH_α_n pattern may undergo additional filtering/matching operations or may be submitted to a manual indexer. In an embodiment, the frequency of an INDXMATCH_α_n pattern may be defined as p/Σα_j, where j is the number of rows in the reference database in which the pattern is searched. The probability of the match, P_m, may be defined as 1 - (p/Σα_j).
In an embodiment, the probability, P_m, may be associated with the document file, such as in a structured message file. In an embodiment, the probability that a randomly selected identifying indicia unrelated to the INDXMATCH_α_n would coincidentally share the observed pattern profile is the product of the individual match frequencies.
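As a worked illustration of the frequency and probability definitions, the sketch below counts the rows in which every value of a pattern occurs and divides by the number of rows searched. Treating the denominator as the total row count, and representing each row as a tuple of field values, are assumptions made for this sketch.

```python
def pattern_probability(pattern, rows):
    """pattern: tuple of matched values; rows: non-empty list of tuples of field values.

    Returns (p, frequency, probability), where p is the number of rows
    containing every value in the pattern.
    """
    if not rows:
        raise ValueError("at least one row is required")
    p = sum(1 for row in rows if all(value in row for value in pattern))
    frequency = p / len(rows)
    return p, frequency, 1 - frequency
```

With four rows and a pattern that occurs in exactly one of them, p is 1 (a threshold match under the definition above), the frequency is 0.25, and the probability is 0.75.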
[00148] One skilled in the art will recognize that additional filtering/matching operations may be performed with the above described embodiments. For example, as discussed above, matching/filtering the array of strings based upon capital letters, using strings within a proximity of identified strings, using sequence lists, using INDXMATCH
pattern matching, and the like may be employed in combinations. Additional examples of filtering/matching operations include, but are not limited to, filtering by string size, filtering by dictionary/phrase list or lists, filtering by recently matched records, etc.
One skilled in the art shall also recognize that steps performed above may be rearranged, excluded, or repeated.
For example, in an embodiment, one or more anchor points may be selected strings in the array of strings that have matched something in the reference database. It should be noted that one benefit of using anchor points is to improve the searching/matching by introducing proximity weighting.
[00149] Consider, by way of illustration, the following example. An array of strings may be filtered to obtain strings that may correspond to a birth date. This set of strings may be searched against a comparison reference database. If a string or a plurality of strings matches data field elements in the comparison reference database, one or more of those strings may be used as anchor points to form a set of strings. In embodiments, other identifying indicia, such as name, account number, social security number, etc., are likely to be in proximity in the array of strings to the date of birth. In an embodiment, the set of strings obtained from proximity weighting may be used with INDXMATCH pattern matching or other filtering/matching algorithms.
[00150] Figure 16 depicts an alternative embodiment of a method for indexing a document file according to an embodiment of the present invention. In an embodiment, the array of strings obtained from the document file may be filtered (1610) to obtain all strings that include at least one capitalized letter. In an embodiment, additional filtering may be part of the filtering operation. As depicted in Figure 16, the set of strings obtained from the capital list filter may be filtered to remove (1615) common words, and may be filtered to select (1620) only strings that are greater than y characters in length.
[00151] In an embodiment, one or more sequence lists may be created. In the depicted embodiment, two sequence lists may be populated, m1 and m2. The first sequence list, m1, may be defined as a substring of length r starting at a first character position for each string in the set of strings obtained after the filtering operation, and the second sequence list, m2, may be defined as a substring of length s that starts at a second character position. In an embodiment, the first and second character positions may be the same and the character lengths r and s may be the same. For purposes of illustration, assume that r = s = 3 and that m1 starts at character position 1 and m2 starts at character position 2. The sequence lists for the string "test" would be m1 = tes and m2 = est. Returning to Figure 16, in an embodiment, the reference database may be filtered to generate a comparison reference database of just first name and last name data fields. Matching may be performed between the set of strings comprising the sequence lists, m1 and m2, and the comparison reference database. In an embodiment, one or more Boolean operators may be used in the searching procedure. For example, Boolean searching may comprise searching (1630) for first and last name matching both m1 and m2; first or last name matching both m1 and m2; first and last name matching either m1 or m2; first or last name matching either m1 or m2; or any combination thereof. The results obtained from this search may be considered a comparison reference database comprising a list of potential matching candidates, and one skilled in the art will recognize this as a filtering operation. In an embodiment, for each candidate data field, all possible substrings may be calculated (1635), and starting with the longest fragment, the array of strings, or a filtered subset thereof, may be searched to find the longest fragment that matches. In an embodiment, one or more of the matching strings within the array of strings may become an anchor point for their matching candidates. In an embodiment, the comparison reference database comprising the candidate list may be filtered (1645) to those rows or records where fragments of both the first and last name were found in the array of strings or subset thereof. In an embodiment, the comparison reference database obtained from step 1645 may be filtered (1650) based upon the proximity of the string fragments to each other. For example, in an embodiment, a filter may select only those candidates from the comparison reference database wherein the first name fragment match and the last name fragment match are within a set number of positions within the array.
[00152] In an embodiment, each candidate within the comparison reference database obtained from step 1650 may be compared against the array of strings or a portion thereof (such as a set of strings obtained from one or more of the filtering steps 1610-1620) to look (1655) for other matching strings of identifying indicia. In an embodiment, a match value or score may be assigned (1660) to each string from a record found within the array of strings or portion thereof. In embodiments, the match value may be the same value per match (e.g., each match regardless of what is matched receives the same value) or may be different values (e.g., longer string matches or matches to certain data fields may have higher point values).
In an embodiment, one point may be assigned (1660) to each string or substring from a record found within the array of strings or portion thereof. In an embodiment, if a single record has the highest match score and that score exceeds (1665) a threshold match value z, that record may be selected (1670) as the matching record and the document file may be associated with that record. If more than one record has yielded the highest match score or if the highest scoring record does not have a match score that exceeds a threshold match value z, then in an embodiment, the candidates with the highest score(s) may be listed in a manual indexer. In an embodiment, these candidates may be ranked according to their match scores and displayed with the ranks or in ranking order. In an embodiment, the matching string or string fragments for each of these candidates may also be displayed as part of the manual indexing process. One skilled in the art shall recognize that embodiments of the methods presented above may end if, following a filtering/matching step, one candidate is returned.
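The selection rule of steps 1665 and 1670 might be sketched as follows: a record is chosen only if it alone holds the highest score and that score exceeds the threshold z; otherwise the ranked candidates are handed to a manual indexer. The function name and return convention are assumptions for this sketch, and for simplicity it returns every scored candidate in rank order rather than only the highest scoring ones.

```python
def select_record(scores, z):
    """scores: dict of record id -> match score.

    Returns (record_id, None) when a single record has the highest score and
    that score exceeds z; otherwise (None, ranked_candidates) for manual review.
    """
    ranked = sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
    if not ranked:
        return None, []
    top_score = ranked[0][1]
    leaders = [rid for rid, score in ranked if score == top_score]
    if len(leaders) == 1 and top_score > z:
        return leaders[0], None
    return None, ranked
```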
[00153] One skilled in the art shall recognize the assignment of match values may be applied to any matching task or operation, including without limitation, document type, service provider, service recipient, recording events, observations, or other indexing tasks not specifically listed.

[00154] Consider the following exemplary embodiments of methods for matching a document file with a record or records. For purposes of illustration, assume that the document file is an image that has undergone an extraction process to yield the following array of strings:
TABLE 1 - SAMPLE ARRAY OF STRINGS
Acme Women's Medical Associates, Inc Board Certified Specialists in Women's Fiealth Care John J. Doe, MD, PhD, FACOG
Michael D. Gelring, MD, FACOG Jane Smith, RN, CNM
111 N. Crestwood PO Box 2222 Porterville, CA 93258 559 555 5555 Fax: 559 555 May 11, 2006 Page 1 Patient Information For: Dreda J Schmidlkobbler DOB: 0 911 511 94 0 Account #: 6463 ! Patient Consent for Use and Disclosure o Health Information ' I Dreda J Schinidlkobbler hereby give my consent for Acme Womens edical Associates, Inc to use and disclose protected health information about me to carry out treatment, p yinent, and health care operations.
____________________________________________________________ ...
[00155] In an embodiment, a filtering operation may be performed upon the array of strings to obtain a filtered set of strings. In an embodiment, the filtering operation may comprise one or more filters. An example of a filter may be a client/indexing recipient address filter that searches for and removes, if present, the address of the indexing recipient.
For example, if the client were Acme Women's Medical Association with an address of 111 N. Crestwood, PO Box 222, Porterville, CA 93258, the filter may look for these strings. In an embodiment, variants of the client's address and contact information may also be included.
[00156] Another example of a filter may be a size filter that removes all strings shorter than a set number of characters in length y. For example, if y = 4, all strings with three or fewer characters may be filtered out of the set of strings, such as, for example, Inc, in, MD, PhD, DO, RN, CNM, for, Use, and, o, g:, 1, J, my, and so forth.
[00157] Another example of a filter may be an exclusion list filter comprised of strings to be excluded, which may include general words and/or client specific words. For example, client employee names, such as John J. Doe and Jane Smith, may be excluded from the array of strings.
[00158] Yet another example of a filter may be a dictionary filter comprised of strings to be excluded, which may include general words and/or client specific words.
Examples of words that may be excluded from the above array of strings may include such words as Board, Certified, Specialists, Women's, Patient, Information, hereby, give, consent, disclose, protected, health, information, about and the like.
[00159] Another example of a filter may be a duplication filter, in which duplicate strings may be removed.
[00160] In an embodiment, after the filtering operation, the resulting set of strings filtered from the array of strings may be that listed in Table 2.
TABLE 2 - SET OF STRINGS
Fiealth Michael Gelring FACOG
Dreda Schmidlkobbler edical Lyman
[00161] In an embodiment, one or more sequence lists for each of the strings from the set of strings that contain a capital letter may be generated. For illustration purposes, assume each string with a capital letter (e.g., Michael, Gelring, Dreda, and Schmidlkobbler) has two sequences created, m1 and m2, where m1 = the first three characters of the string and m2 = the three characters of the string starting at the second character in the string.
Thus, the resulting sequences would be:
[00162] m1 = FIE, MIC, GEL, FAC, DRE, and SCH
[00163] m2 = IEA, ICH, ELR, ACO, RED, and CHM
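The two sequence lists for this example might be generated as sketched below. The function name is hypothetical, and the input list is an assumption: it contains the strings from which the listed sequences appear to derive, plus one string without a capital letter to show that it is skipped.

```python
def sequence_lists(strings, length=3):
    """Return (m1, m2) for strings containing a capital letter.

    m1 holds the first `length` characters of each string; m2 holds the
    `length` characters starting at the second character. Both are upper-cased.
    """
    capitalized = [s for s in strings if any(ch.isupper() for ch in s)]
    m1 = [s[:length].upper() for s in capitalized]
    m2 = [s[1:1 + length].upper() for s in capitalized]
    return m1, m2

strings = ["Fiealth", "Michael", "Gelring", "FACOG", "Dreda", "Schmidlkobbler", "edical"]
m1, m2 = sequence_lists(strings)
# m1 == ['FIE', 'MIC', 'GEL', 'FAC', 'DRE', 'SCH']
# m2 == ['IEA', 'ICH', 'ELR', 'ACO', 'RED', 'CHM']
```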
[00164] In an embodiment, relational pairs may be generated based upon proximity to each other in the set of strings. Table 3 shows relational pairs for the illustrated example:
TABLE 3 - RELATIONAL PAIRS
      m1         m2
r1    FIE,MIC    IEA,ICH
r2    MIC,GEL    ICH,ELR
r3    GEL,FAC    ELR,ACO
r4    FAC,DRE    ACO,RED
r5    DRE,SCH    RED,CHM
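Assuming that "proximity" here means immediate adjacency in the sequence list, the relational pairs could be produced by pairing each sequence with its successor. The function name is hypothetical.

```python
def relational_pairs(sequences):
    """Pair each sequence with the one that immediately follows it."""
    return list(zip(sequences, sequences[1:]))

m1_pairs = relational_pairs(["FIE", "MIC", "GEL", "FAC", "DRE", "SCH"])
# m1_pairs[0] == ('FIE', 'MIC') corresponds to r1; m1_pairs[4] == ('DRE', 'SCH') to r5
```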

[00165] In an embodiment, the set of strings in Table 3 may be compared against a comparison reference database to find records in the comparison reference database where the first name and last name data field elements match both m1, r_x and m2, r_x. For purposes of illustration, assume the records from the comparison reference database that satisfy the above-stated conditions are:
TABLE 4 - QUERY RESULTS
ID    SSN    FIRST NAME    M.I.    LAST NAME
I 108130 12-27. 2162 ABC- DREDRAM SCHMIDT
[00166] In an embodiment, the comparison reference database in Table 4 may be further reduced by taking the FIRST NAME and/or LAST NAME data field elements for each of the candidate records and creating a string fragment table. For example, the substring fragment list for DREDRAM may comprise: DREDRAM, DREDRA, REDRAM, EDRAM, EDRA, DRAM, DRE, RAM, and EDR; and the substring fragment list for DREDA may comprise: DREDA, DRED, REDA, DRE, RED, and EDA. It shall be noted that the size of the substring may be varied.
[00167] In an embodiment, starting with the longest fragment, the set of strings, which represents a filtered portion of the array of strings, may be searched to find the longest fragment that is present in the set of strings. In an embodiment, a matched string within the set of strings may become an anchor point for this record candidate, and a search may be performed for the longest FIRST NAME and/or LAST NAME substring within p string positions of the anchor point. In one embodiment, p may equal 2.
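The fragment table and the longest-fragment search might be sketched as follows, assuming a minimum fragment length of three characters (consistent with the DREDA list) and a case-insensitive containment test. Both function names are hypothetical.

```python
def fragments(name, min_len=3):
    """All substrings of at least min_len characters, longest first."""
    n = len(name)
    return [name[i:i + size]
            for size in range(n, min_len - 1, -1)
            for i in range(n - size + 1)]

def longest_match(name, strings, min_len=3):
    """Return (fragment, index) of the longest fragment of name found in strings.

    The index identifies the matched string, which may serve as an anchor point.
    """
    upper = [s.upper() for s in strings]
    for frag in fragments(name.upper(), min_len):
        for idx, s in enumerate(upper):
            if frag in s:
                return frag, idx
    return None
```

For the candidate first name DREDRAM and a set of strings containing "Dreda", the longest shared fragment under these assumptions is DRED.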
[00168] In an embodiment, if no match exists, this anchor point may be ignored and a search may be performed to find a new anchor point, wherein the process of searching the strings surrounding the anchor point is repeated.
[00169] In an embodiment, if a match for a substring exists for both FIRST NAME and LAST NAME, then a match score of 2 points may be assigned to that record.

[00170] In an embodiment, an INDXMATCH pattern comprising the matching sequences MATCH_1, MATCH_2, ..., and MATCH_φ for a record may be generated, where a substring match is MATCH_φ. In the illustrated example, INDXMATCH_2 is Dreda Schmidlkobbler or Schmidlkobbler, Dreda and INDXMATCH_1 is Dredram, Schmidt.
[00171] In an embodiment, let the number of instances (i.e., frequency) of INDXMATCH_n in the comparison reference database equal p_n. If a value or values of p_n = 1, then the records with the INDXMATCH patterns that produced that frequency may be associated or matched with the document file.
[00172] In an embodiment, if the records' INDXMATCH patterns generate frequency values p > 1, then it may be considered inconclusive whether such a record matches. In an embodiment, some or all of these records may form a comparison reference database and additional criteria or operations may be used to reduce the number of record candidates. In an embodiment, a document file may be associated with more than one record.
[00173] In an embodiment, the proximity of search strings surrounding anchor points may be increased or iteratively increased. For example, searches may be performed for the longest FIRST NAME and/or LAST NAME substring within 15 string positions of an anchor point. If no match exists, the candidate records from the comparison database may be sent to a manual indexer as suggestions from which a user may select.
[00174] In an embodiment, if additional match sequences are identified, an additional point may be assigned to a record for each such additional match sequence, wherein the points may be used to match a record to a document file and/or to rank the records. In one embodiment, let the number of match sequences that comprise INDXMATCH be φ, where φ = 1, 2, 3, .... The candidate record with the highest φ value may be matched with the document file.
[00175] Turning to FIG. 17, an alternative embodiment of a method for indexing a document file according to an embodiment of the present invention is depicted.
In the embodiment depicted in FIG. 17, the reference database may be filtered using dates obtained from the array of strings. The array of strings may be searched to identify (1705) all strings in sets of strings that may conform to a date format, and these may be sorted (1710) chronologically. The earliest date may be assumed (1715) to be the date of birth of a patient, and that date may be compared against the date of birth fields to identify (1725) all candidate records that have the same date of birth. In embodiments, if the date field for a record is empty, that record may be included or excluded as a candidate record. The resulting candidate records form a comparison reference database. In an embodiment, if a single candidate record is returned, the document file may be associated with that record.
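A minimal sketch of the date-based filter of Figure 17, assuming dates appear as MM/DD/YYYY tokens and that each record is a dictionary with a "dob" key holding a date or None. The regular expression, the record layout, and the function names are assumptions for illustration; real documents would need more date formats.

```python
import re
from datetime import datetime

DATE_RE = re.compile(r"\b\d{1,2}/\d{1,2}/\d{4}\b")  # assumed MM/DD/YYYY format

def find_dates(strings):
    """Identify date-like strings (1705) and sort them chronologically (1710)."""
    dates = []
    for s in strings:
        for token in DATE_RE.findall(s):
            try:
                dates.append(datetime.strptime(token, "%m/%d/%Y").date())
            except ValueError:
                pass  # matches the pattern but is not a valid calendar date
    return sorted(dates)

def candidates_by_birth_date(strings, records, include_empty=False):
    """Assume the earliest date is the birth date (1715) and filter records (1725)."""
    dates = find_dates(strings)
    if not dates:
        return []
    dob = dates[0]
    return [r for r in records
            if r.get("dob") == dob or (include_empty and r.get("dob") is None)]
```

The `include_empty` flag reflects the option of including or excluding records whose date field is empty.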
[00176] As depicted in Figure 17, the string in the array of strings that is assumed to be the date of birth may form an anchor point. Strings within x places from the anchor point may be searched (1730) against the comparison reference database. If a threshold match is found (1735), that record may be associated (1740) with the document file. As noted previously, a document file may be associated with a record by storing information to a structured message.
[00177] If a threshold match is not found (1735), the proximity filter may be expanded (1750). If the proximity filter is expanded (1755), the newly added strings may be compared against the comparison reference database. This process may be repeated until a match is found, a set number of times, until all the strings in the array of strings have been included, and/or until a user indicates to stop.
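The expanding proximity filter might be sketched as a loop that widens a window around the anchor point until a caller-supplied test reports a threshold match, the round limit is reached, or every string has been included. The function name, the `step` and `max_rounds` parameters, and the callback are assumptions; for simplicity the sketch re-tests the whole window on each round rather than only the newly added strings.

```python
def search_with_expanding_proximity(strings, anchor, is_threshold_match,
                                    x=2, step=2, max_rounds=5):
    """Widen the window around `anchor` until is_threshold_match(window) is true.

    Returns the matching window, or None if no threshold match is found.
    """
    for _ in range(max_rounds):
        lo = max(0, anchor - x)
        hi = min(len(strings), anchor + x + 1)
        window = strings[lo:hi]
        if is_threshold_match(window):
            return window
        if lo == 0 and hi == len(strings):
            break                      # every string has already been included
        x += step                      # expand the proximity filter
    return None
```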
[00178] If, after the above steps, a threshold match has not been exceeded, it may be recorded (1760) that no match was found. In an embodiment, the candidate records may be submitted to a manual indexer.
[00179] Turning to FIG. 18, an alternative embodiment of a method for indexing a document file according to an embodiment of the present invention is depicted.
In the embodiment depicted in FIG. 18, a record data field element or elements, such as a patient identifier, may be used to filter a document file. One or more record data elements, such as patient name, account number, social security number, date of birth, etc. may be used to search the array of strings of a document file, or a set of strings obtained from the array of strings, to locate (1820) a matching marker/identifying indicia. If a marker is found (1825), the marker may be used as an anchor point. In an embodiment, a set number of strings may be selected surrounding the anchor point. In an alternative embodiment, a set number of characters surrounding the marker may be selected (1830), and those characters may be placed into a set of strings (1835).
[00180] In an embodiment, the array of strings or the set of strings may be searched to obtain (1840) the oldest date, which may be assumed to be a birth date. The comparison reference database may be queried to obtain (1850) a listing of all records in which a person has a matching birth date, which may form a new comparison reference database.
One or more data fields from this comparison reference database, such as first and last name (1855), may be checked against the set of strings. If a match is found (1860), the document file may be associated with the matching record. In an embodiment, the document file may be associated with the matching record by storing (1865) information in a structured message. If more than one match was found or if no matches were found, one or more additional matching/filtering operations may be performed. In one embodiment, a matching algorithm or method, such as the one described with reference to Figure 16, may be employed (1870) and the results returned to generate a set of strings (1835) wherein the method may be repeated.
[00181] In an embodiment, if an initial marker is not located within the array of strings, the entire array of strings may be selected (1875) and the process may continue from step 1840 in like manner as described above.
[00182] Figure 19 depicts an exemplary method for determining a date of service of a document file according to an embodiment of the present invention. An embodiment of the date of service utility may begin by searching (1910) the array of strings, or a filtered version thereof, to identify specific data. In an embodiment, a dictionary list may be used to search for specific words. For example, a list of document types may be compared against the strings.
[00183] If identified data is found, a date of service may be found (1915) based on a specific algorithm related to that identified data. Consider, for example, the exemplary embodiment depicted in Figure 20. The identified data may help indicate where the information may be located within the document file. For example, if the phrase "pathology lab report" is found within the array of strings, it may be known that the date of service will be within a set distance (2010) from that phrase. Accordingly, the date of service may be easily identified. In such cases, the date of service information may be associated (1920/2030) with the document file. In an alternative embodiment, the date of service algorithm may look for a date relative to the dates within the report. Consider, for example, the following: assume that the report type is known and it is known that that report type contains three date fields: a birth date of a patient, a date of service, and the date the report was submitted to a client. The date-of-service algorithm may identify the date of service by finding the three dates within the set of strings and locating the middle date, since it will be after the birth date but before the date the report was submitted to the indexing recipient system.
[00184] In one embodiment, the date of service information may be stored in a structured message for the document file. One skilled in the art shall recognize that other algorithms may be used to determine information once another piece of identifying data has been found.
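The two approaches described above (a date located near an identifying phrase, and the middle of three known date fields) can be sketched as follows. This is only an illustrative sketch: the function names, the MM/DD/YYYY date pattern, and the choice to look only at strings that follow the phrase are assumptions, not details taken from the specification.

```python
import re
from datetime import datetime

# Assumed date format for illustration: MM/DD/YYYY.
DATE_RE = re.compile(r"\b\d{1,2}/\d{1,2}/\d{4}\b")

def parse_dates(strings):
    """Return every valid date found in the strings, in order of appearance."""
    found = []
    for s in strings:
        for text in DATE_RE.findall(s):
            try:
                found.append(datetime.strptime(text, "%m/%d/%Y").date())
            except ValueError:
                pass  # looks like a date but is not one (e.g. 13/45/2015)
    return found

def date_near_phrase(strings, phrase, max_distance):
    """Return the first date within max_distance strings after the phrase."""
    words = phrase.lower().split()
    lowered = [s.lower() for s in strings]
    for i in range(len(lowered) - len(words) + 1):
        if lowered[i:i + len(words)] == words:
            start = i + len(words)
            dates = parse_dates(strings[start:start + max_distance])
            if dates:
                return dates[0]
    return None

def middle_date(strings):
    """For a report type known to hold exactly three dates, return the middle one."""
    dates = sorted(set(parse_dates(strings)))
    return dates[1] if len(dates) == 3 else None
```

For a report whose strings are `["Pathology", "Lab", "Report", "Collected:", "06/02/2015", ...]`, calling `date_near_phrase(strings, "pathology lab report", 3)` would return the collection date, provided it falls inside the three-string window.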

[00185] Returning to Figure 19, if no data has been identified within the array of strings that is beneficial for identifying the date of service, the array of strings may be searched (1930) to identify all strings corresponding to a date format. The selected dates may be sorted (1935) chronologically, and a check (1940) made to see if the most recent date is the current date. If the most recent date is the current date, it may be that the date found is referencing the date the document file was submitted. Thus, in an embodiment, if a penultimate date is present (1945), that date may be set (1950) as the date of service and associated (1920) with the document file as discussed previously.
[00186] If the most recent date is not the current date (1940), then a check may be performed (1955) to determine whether that date is older than a set time interval. In an embodiment, it may be assumed that a document file has been received because of some recent activity; therefore, if the most recent date within the array of strings is relatively recent, then that date may be set (1965) as the date of service and associated (1920) with the document file, as discussed previously.
[00187] If the most recent date is not the current date (1940) and the most recent date is older than a set amount of time, the date of service may be set (1960) as "Unknown" and that information may be associated (1920) with the document file. In an embodiment, if no date strings were located within the array of strings, the date of service may similarly be set as "Unknown." In one embodiment, date candidates may be sent to the manual indexer as match information comprising date suggestions.
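The fallback logic of Figure 19 (steps 1930 through 1965) might be sketched as below. The function name, the 30-day default for the "set time interval," and the handling of the case where today's date is the only date found (which the text does not specify) are assumptions made for illustration.

```python
from datetime import date, timedelta

UNKNOWN = "Unknown"

def fallback_date_of_service(dates, today, max_age_days=30):
    """Pick a date of service when no identifying data was found.

    dates        -- date objects already extracted from the array of strings
    today        -- the current date
    max_age_days -- hypothetical "set time interval"; 30 is an assumed value
    """
    if not dates:
        return UNKNOWN                      # no date strings were located
    ordered = sorted(set(dates))            # chronological sort (1935)
    most_recent = ordered[-1]
    if most_recent == today:                # likely the submission date (1940)
        if len(ordered) >= 2:
            return ordered[-2]              # penultimate date (1945/1950)
        return UNKNOWN                      # assumption: text leaves this case open
    if today - most_recent > timedelta(days=max_age_days):
        return UNKNOWN                      # too old (1955/1960)
    return most_recent                      # relatively recent (1965)
```

Deduplicating the dates before sorting is a design choice of this sketch: it keeps a submission date that appears twice from being mistaken for the penultimate date.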
[00188] Turning to Figure 21, an embodiment of a method for indexing a document file is depicted. The method begins by generating (2105) a filtered set of the array of strings by selecting every string that has at least one capital letter, CAPSTRING. In an embodiment, this set of strings may optionally be further reduced (2110). In one embodiment, additional filtering may be performed on the set of strings by removing (2115) any strings from the set of strings that match strings in a list of strings, such as a dictionary list, client address/contact information list, or the like. One skilled in the art will recognize that other filtering steps may be performed as part of this initial filtering operation.
[00189] The set of strings obtained from the filtering operation may be used to find pattern matches in the reference database, or in certain fields within the reference database, such as, for example, first name, last name, and the like. In an embodiment, substrings from the set of strings may be used to find matches within the reference database. For example, substrings n characters in length may be used. The records that result in a match from the comparison (2120) may be considered a comparison reference database.
[00190] In an embodiment, if the pattern match process returns no record (i.e., the comparison reference database is the empty set), the filtering operation used to obtain the comparison reference database may be expanded (2130). If it is desired to change the filtering, one or more filtering parameters may be changed (2135). For example, the size of the substrings, n, may be decreased to obtain smaller substring sizes, and the process of comparing the substrings to the reference database may be repeated to obtain a comparison reference database. If expanding the filter is not desired (2130), it may be indicated (2165) that no record match was found for the document file. In an embodiment, the document file may be sent to a manual indexer for manual indexing of the document file.
[00191] If the comparison reference database is not the empty set, the array of strings, or a filtered array of strings such as the set of strings from step 2105, may be compared (2140) against the comparison reference database to identify additional matches. The records within the comparison reference database that yielded matches may be ranked (2145) according to ranking criteria. In an embodiment, the ranking criteria may be based on the number of matches within the record and may include weighting the ranks based upon which fields within the record were matched. If a record exceeds (2150) a threshold match level, the document file may be associated (2160) with the matching record. In an embodiment, the file may be associated with the record by storing information to a structured message file, which may include the information that was matched.
[00192] If no record exceeds a threshold match, the records with at least one match, or alternatively, only the top ranked records, may be sent (2165) to a manual indexer for manual indexing by a user. There may also be an indication (2165) that no record match was found for the document file. In an embodiment, the indication that no record match was found may be stored in a structured message for the document file.
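The flow of Figure 21 can be sketched roughly as follows: filter to capitalized strings, build a comparison reference database from n-character substring matches, shrink n if nothing is found, then rank by weighted field matches. The record layout (`first_name`, `last_name`), the weights, the threshold, and the starting and minimum values of n are all hypothetical values chosen for the sketch.

```python
def cap_strings(strings, stop_list=()):
    """Keep strings with at least one capital letter, minus a stop list (2105-2115)."""
    stops = {s.lower() for s in stop_list}
    return [s for s in strings
            if any(c.isupper() for c in s) and s.lower() not in stops]

def substrings(s, n):
    """All substrings of s that are exactly n characters long (lower-cased)."""
    s = s.lower()
    return {s[i:i + n] for i in range(len(s) - n + 1)}

def comparison_database(records, strings, n, fields=("first_name", "last_name")):
    """Records sharing at least one n-character substring with the strings (2120)."""
    grams = set()
    for s in strings:
        grams |= substrings(s, n)
    return [r for r in records if any(substrings(r[f], n) & grams for f in fields)]

def rank_records(records, strings, weights):
    """Score each record by the weighted fields found verbatim in the strings (2145)."""
    tokens = {s.lower() for s in strings}
    scored = [(sum(w for f, w in weights.items() if r[f].lower() in tokens), r)
              for r in records]
    return sorted(scored, key=lambda pair: pair[0], reverse=True)

def index_document(records, strings, n=4, min_n=3, threshold=2.0):
    """Return (matched record or None, ranked candidates for manual indexing)."""
    weights = {"first_name": 1.0, "last_name": 1.5}   # assumed field weights
    caps = cap_strings(strings, stop_list=("Lab", "Report"))
    candidates = []
    while n >= min_n and not candidates:              # expand the filter (2130/2135)
        candidates = comparison_database(records, caps, n)
        n -= 1
    if not candidates:
        return None, []                               # no record match found (2165)
    ranked = rank_records(candidates, caps, weights)
    best_score, best = ranked[0]
    return (best if best_score > threshold else None), ranked
```

When no record exceeds the threshold, the ranked list is still returned so that it could be passed to a manual indexer as suggested matches.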
[00193] Figure 22 depicts an alternative embodiment of a method for indexing a document file according to an embodiment of the present invention. In the embodiment depicted in Figure 22, the array of strings may initially be filtered to extract (2210) strings conforming to a date format for determining (2215) the oldest date.
[00194] Assuming the oldest date corresponds to a birth date, that date may be compared against the date of birth field in a reference database. The comparison reference database obtained from this operation may contain one or more records. To provide additional assurance that a record is the correct match or to further reduce the comparison reference database, one or more matching/filtering operations 2225-1 to 2225-n may be performed. In an embodiment, the matching/filtering operations may be tiered.
[00195] For purposes of illustration, consider the following tiered search embodiment. The search may begin by selecting the first name and last name from the date-of-birth filtered comparison reference database to look for those strings within 3 strings of each other (proximity value) in the array of strings. In an embodiment, the date of birth string may be used as an anchor point for reducing the array of strings. In an embodiment, the degree of match, herein referred to as fuzziness or the threshold match value, may be set to a specific value. In an embodiment, the fuzziness value may be set at a value that requires a close match.
[00196] A second tier matching/filtering operation may comprise the following matching/filtering process. If the comparison reference database comprises candidate records with matching date of birth but no matches were found during the first name and last name search, then in an embodiment, the threshold match value may be adjusted to allow for less exact matching and/or the proximity value may be increased.
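The first two tiers can be sketched as a proximity search that is retried with looser settings. This sketch uses the similarity ratio from Python's standard `difflib.SequenceMatcher` as a stand-in for the fuzziness measure (1.0 being an exact match); that choice, the function names, and the tier values `(3, 0.9)` and `(6, 0.7)` are assumptions, since the specification does not define how fuzziness is computed.

```python
from difflib import SequenceMatcher

def similar(a, b, fuzziness):
    """True if a and b match at or above the fuzziness threshold (1.0 = exact)."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio() >= fuzziness

def name_within_proximity(strings, first, last, proximity, fuzziness):
    """True if first and last name both occur within `proximity` strings of each other."""
    first_pos = [i for i, s in enumerate(strings) if similar(s, first, fuzziness)]
    last_pos = [i for i, s in enumerate(strings) if similar(s, last, fuzziness)]
    return any(i != j and abs(i - j) <= proximity
               for i in first_pos for j in last_pos)

def tiered_name_search(strings, candidates, tiers=((3, 0.9), (6, 0.7))):
    """Try each (proximity, fuzziness) tier in turn; stop at the first tier with hits.

    candidates -- records already filtered by date of birth
    """
    for proximity, fuzziness in tiers:
        hits = [r for r in candidates
                if name_within_proximity(strings, r["first_name"], r["last_name"],
                                         proximity, fuzziness)]
        if hits:
            return hits, (proximity, fuzziness)
    return [], None
```

With these assumed settings, an OCR misreading such as "Alarnb" for "Alamb" fails the strict first tier but is caught by the relaxed second tier.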
[00197] A third tier matching/filtering operation may comprise the following matching/filtering process. Additional fields from records within the comparison reference database may be utilized. In an embodiment, account number, patient ID, social security number, and the like may be used in the matching/filtering. In an embodiment, the fuzziness/threshold match value may be set to require a close match.
[00198] A fourth tier matching/filtering operation may comprise searching for first name and/or last name within a proximate range of one or more of the foregoing identifiers, i.e., account number, patient ID, social security number, and the like.
[00199] An embodiment of a last tier matching/filtering operation may comprise the following. First, a capital letter string filter may be applied to reduce the array of strings to a set of strings that comprise at least one capital letter in each string. The reference database may be filtered by identifying all candidate records that have the first three letters of the first name and the last name and the second three letters of the first name and the last name. In an embodiment, the comparison reference database may be reduced further by excluding all records that have not had any activity within a set number of days, for example, 45 days.
[00200] If no record has a match that exceeds a threshold value, a search may be performed to identify all candidate records that have the first three letters of the first name and the last name or the second three letters of the first name and the last name. In an embodiment, the comparison reference database may be reduced further by excluding all records that have not had any activity within a set number of days, for example, 45 days.
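One possible reading of the last tier is sketched below: each name contributes a "first three letters" fragment (characters 1 to 3) and a "second three letters" fragment (characters 4 to 6); the strict pass requires both sets of fragments to appear in the capitalized strings, and the relaxed pass requires either set. That interpretation, the treatment of fragments shorter than three letters as absent, and the record field names are assumptions made for this sketch.

```python
from datetime import date, timedelta

def fragment_found(strings, fragment):
    """True if a full three-letter fragment occurs inside any of the strings.

    Fragments shorter than three letters are treated as absent (an assumption).
    """
    return len(fragment) == 3 and any(fragment in s.lower() for s in strings)

def last_tier_candidates(records, cap_strings, today, strict=True, max_idle_days=45):
    """Filter records by three-letter name fragments and recent activity.

    strict=True  -- first AND second three letters of both names must be found
    strict=False -- first OR second three letters of both names must be found
    """
    hits = []
    for rec in records:
        if today - rec["last_activity"] > timedelta(days=max_idle_days):
            continue                                  # no activity within the window
        names = (rec["first_name"].lower(), rec["last_name"].lower())
        firsts = all(fragment_found(cap_strings, n[:3]) for n in names)
        seconds = all(fragment_found(cap_strings, n[3:6]) for n in names)
        ok = (firsts and seconds) if strict else (firsts or seconds)
        if ok:
            hits.append(rec)
    return hits
```

A caller would typically run the strict pass first and fall back to `strict=False` only when no record exceeds the match threshold.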
[00201] If a record is found to exceed a threshold match, the document file may be associated (2230) with that record, and the system may wait (2235) for the next document file or array of strings from a document file to be received for processing.
[00202] If, following the matching/filtering operations, the document file has not been successfully matched to a record, the final comparison reference database obtained from the matching/filtering operations may be examined (2240) to determine the number of records contained therein. If more than one record is a matching candidate, this information may be sent (2240) to a manual indexer for manual indexing by a user. If no records exist within the comparison reference database, it may be indicated (2245) that no record match was found. In an embodiment, the document file may be sent (2240) to a manual indexer for manual indexing. Alternatively, the document file may be put into a queue and may be reprocessed at a later date or following a specified event, such as, for example, receiving an update to the reference database.
[00203] One skilled in the art shall recognize that filters/matching algorithms may be used in any order and in any combination for any matching or tiering. In an embodiment, the application of a filtering operation or operations may be directed by processing times and/or match results. Filters may be applied to an unmatched array of strings or to a comparison reference database (which shall be construed to also include the reference database or a previously filtered comparison reference database). Examples of filters have been given herein and some are additionally given below, although one skilled in the art shall realize that other filters/matching algorithms not listed here may also be used.
[00204] Date of Service (DOS) Filter. The comparison reference database may be reduced by applying a DOS filter so that only candidates with activity (e.g., have been seen by a doctor) within or after a certain time period are used for matching algorithm(s).
[00205] Date of Indexing (DOI) Filter. The comparison reference database may be reduced by applying a DOI filter so that only candidates with activity (e.g., have been recently indexed) within or after a certain time period are used for matching algorithm(s).
[00206] DIDXMATCH Filter. The comparison reference database may be reduced by applying a filter so that only candidates derived from the reference database that meet pattern matching criteria are used for matching algorithm(s). In an embodiment, the pattern match filter may be derived from the array of strings by identifying string candidates of n length, considered as the longest common substring.
[00207] Boolean Filter. The comparison reference database may be reduced by applying a Boolean filter so that only candidates derived from the array of strings that meet Boolean criteria (AND/OR) are used for matching algorithm(s). In an embodiment, the complexity of the search criteria may be varied to include proximity searching, root expansion, wild card searching, conditional operators, string frequencies, string associations, match profiles, and the like, as well as Boolean operators.
[00208] CAPS Filter. The comparison reference database may be reduced by applying a set of strings that have been derived from the array of strings and wherein the strings in the set of strings have at least one capital letter. By applying such a filter, only reference database records that meet capital letter pattern matching criteria are used for matching algorithm(s).
[00209] Subtraction Filter. The comparison reference database or array of strings/set of strings may be reduced by applying a subtraction filter derived from the array of strings or from other source(s). An example of a subtraction filter may be removing common words from the array of strings. Another illustrative example may be attempting to process an array of strings from a document file that has multiple patient names. Once a patient name is identified, the identifiers for that patient (which may come from the matching record for that patient) may be subtracted from the array of strings and the array of strings may be reprocessed to look for other patients.
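A subtraction filter of the kind just described could be sketched as follows. The record fields (`first_name`, `last_name`, `dob`), the exact-token matching rule, and the function names are hypothetical and serve only to illustrate the subtract-then-reprocess loop.

```python
def subtract(strings, remove):
    """Return the strings with every entry of `remove` taken out (case-insensitive)."""
    drop = {r.lower() for r in remove}
    return [s for s in strings if s.lower() not in drop]

def find_all_patients(strings, records, common_words=()):
    """Repeatedly match a patient, subtract that patient's identifiers, and retry.

    Returns the ids of the matched records and the strings left over.
    """
    remaining = subtract(strings, common_words)       # remove common words first
    matched = []
    for _ in range(len(records)):                     # at most one pass per record
        tokens = {s.lower() for s in remaining}
        hit = next((r for r in records
                    if r["first_name"].lower() in tokens
                    and r["last_name"].lower() in tokens), None)
        if hit is None:
            break
        matched.append(hit["id"])
        # Subtract identifiers taken from the matching record, then reprocess.
        remaining = subtract(remaining,
                             [hit["first_name"], hit["last_name"], hit["dob"]])
    return matched, remaining
```

One limitation of this simple sketch is that two patients sharing a first or last name would interfere with each other, since subtraction removes every occurrence of the shared token.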
[00210] Fuzziness Filter. As noted previously, the threshold level of match identity may be varied. In an embodiment, a fuzziness of 1 may be an exact match, whereas a different fuzziness value may be a weak match. Adjusting the fuzziness can allow one to identify sequence candidates while allowing for the causes of mismatch, such as OCR error, misspellings, etc.
[00211] It shall be noted that filtering/matching algorithms may be adjusted based on match quality. One skilled in the art shall recognize that a number of matching/filtering operations may be performed as part of the embodiment depicted in Figure 22, including without limitation all those described herein. It shall also be noted that the embodiment depicted in Figure 22 is for purpose of illustration and that other embodiments may be employed.
[00212] Figure 23 depicts an exemplary method for determining a provider associated with a document file according to an embodiment of the present invention. The array of strings obtained from a document file may be searched to identify a provider. A provider may mean a recipient of the document file, an author of the document file, a patient, a subject of a document file, the owner of the document file, the user of a document file, and the like. In an embodiment, because the account information for the document file is known, that is, the indexing recipient information or account is known, a list of possible providers for this account may be accessed (2315). That list may be compared against the array of strings to identify (2320) providers.
[00213] In an embodiment, a provider may be determined based upon an association rather than from finding a direct match in the array of strings. In one embodiment, the list of providers may comprise more than just a listing of providers, and may also include associations with providers. For example, the list of providers may include key words or matching criteria that, when found, result in an association with a provider. In an embodiment, the provider may be associated with a document file based upon the document file being matched to a record in a reference database. For example, the provider may be associated with a record in a reference database and this information may be associated with the document file when the document file is matched to the record. In an embodiment, a provider may be associated with a document file based upon information provided within a record in the reference database to which the document file has been matched.
[00214] If no providers or more than one provider is identified within the array of strings, a default provider for that account may be assigned (2330). If one provider is found, that provider may be assigned or associated (2325) with that document file. In an embodiment, the provider information may be associated with a document file by storing the provider data into a structured message for that document file. In an embodiment, the client or indexing recipient or the provider identified may be billed. For example, if the document file is lab results and a provider that performed the lab testing has been identified, and the patient for which this testing has been performed has also been matched within the database, one embodiment of the present invention may involve billing the patient for the services provided by the laboratory. In an embodiment, the indexing recipient may be billed for services provided by the indexing service provider.
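The provider selection of Figure 23 might look like the sketch below, where each provider in the account's list carries key words that create an association when found. The dictionary layout, the substring matching on the joined text, and the function name are assumptions for illustration.

```python
def determine_provider(strings, providers, default_provider):
    """Return the single provider identified in the strings, else the account default.

    providers -- hypothetical mapping of provider name to a list of key words or
                 phrases (including the name itself) that imply that provider
    """
    text = " ".join(strings).lower()
    found = [name for name, keywords in providers.items()
             if any(k.lower() in text for k in keywords)]      # identify (2320)
    if len(found) == 1:
        return found[0]                                        # assign (2325)
    return default_provider                                    # none or several (2330)
```

Joining the strings before searching lets multi-word key phrases such as a clinic name match across adjacent strings.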
[00215] Turning to Figure 24, an exemplary method for indexing a document file according to an embodiment of the present invention is illustrated. In an embodiment, an unmatched document file may be compared (2410) against the match results of the previous document in the batch. In an embodiment, a comparison reference database may comprise the records which have been successfully matched to other document files within a batch. An array of strings for a non-indexed/unmatched document file may be compared (2420) against this comparison reference database. In an embodiment, the comparison reference database may also include records that were manually indexed. The array of strings may also be compared (2425) against the results of other matched or manually indexed documents from the same or recent batches. In an embodiment, the unmatched document file may be compared (2430) against a comparison reference database using one or more subtraction library techniques.
[00216] If any of the foregoing comparison methods successfully identify matching information, this matching information may be associated (2415) with the document file. If the foregoing matching techniques were unable to identify a matching record, the document file may be indicated (2435) as having no match and may be sent to a manual indexer for indexing by a user.
J. EXEMPLARY EMBODIMENTS OF ADDITIONAL DATA
[00217] In an embodiment, additional data may be related to or associated with a document file. In an embodiment, this additional data may include additional text, such as one or more standard or predefined paragraphs. In an embodiment, the additional data may include, but is not limited to, predefined text, predefined video, web site information, photographs, pictures or other images, letterhead, stationery, links to any of these items, a pointer to the document file's location, a link to the document file, or the like. In an embodiment, the additional data may include the information contained within a reference database or databases. The additional data may also include, but is not limited to, corrected identifying indicia such as name, date of birth, social security number or the like. The additional data may also include, but is not limited to, structured data, array of strings/set of strings, document identifying indicia such as document type, event observations, document content, interpretation of document content, and the like.
[00218] As a result of matching, interpolation, and/or approximation processes, second computing device 201 may effectively define or identify one or more additional data elements. In one embodiment, once a match between a document file and a reference database record has been made, additional data may be added to or associated with the document file. In an embodiment, one or more of the fields 405 may provide additional information that may be associated with the document file. For example, the additional data may include an account number or other information. In an embodiment, corrected data may be additional data and may be added to the associated data, the document file, structured data, and/or reference database.

[00219] By way of illustrative example, an account number may be additionally identified based upon associated data elements such as name, date of birth, or social security number. In an embodiment, if additional data is present in the reference database record associated with the document file but not in the document file, that data may be added to the structured message.
[00220] In yet another embodiment, the additional data may be data to include with the document file. For example, in an exemplary case, additional data such as, for example, notes from the physician, prior medical information, test results, or other data may be included with the document file.
[00221] In an embodiment, the additional data may include internal or external instructions for processing the document file. According to one aspect of the present invention, a user, client, or third party may provide first or second computing device 101/201 with instructions related to or associated with a document file, record, or account. These instructions may include additional data to be included with the document file. In an embodiment, the instructions may indicate that a message is to be generated and may also indicate the additional data that is to be provided in the message or messages to recipients and/or in certain types of messages. For example, the instructions may indicate that all messages to a particular indexing recipient should include a predefined letterhead or background image and should further include additional text that may have been previously stored. For example, Ms. Alamb may have set specific instructions that she wants a copy of all reports to be sent to her. The additional data may include instructions to first and/or second computing system 101/201 to transmit a copy of the document file 400 to her and may include an address, fax number, or email address for Ms. Alamb.
[00222] The first and/or second computing system 101/201 may be adapted to create a variety of different types of messages, including, but not limited to email messages, facsimiles, instant messages, and audio messages. In an embodiment, the type of message generated may depend upon either the instructions received by the first and/or second computing system 101/201 from a user or upon prior parameters that have been defined with respect to messages directed to the intended recipient.
K. EXEMPLARY EMBODIMENTS OF PACKAGING AND
TRANSMISSION SERVICES
[00223] Figure 25 depicts an exemplary method for returning information related to processed document files to a client system according to an embodiment of the present invention. In an embodiment, the method of Figure 25 may be performed by the packaging and transmission services of indexing service provider 201. As depicted in Figure 25, the indexing service provider system 201 may obtain (2505) a list of the current active batches, and count (2510) the number of complete document files and error document files. A check may be performed (2515) to verify that the batch has completed processing, that is, that the number of complete files plus error files equals the total number of files that batch contained. If the number of complete files plus error files does not equal the total number of files in that batch, the system 201 may wait for the batch to finish processing.
[00224] If the batch is completed, a package or folder for all files that are ready for packaging for this batch may be generated (2520) and all files not ready for packaging may be marked as incomplete. The files which may be ready for packaging may include, for example, a structured message file for each of the processed document files. The structured message files, which may be an XML, HL-7, text, or other file type, may be moved (2525) into the package file. The return status for the package and/or the files within the package may be indicated (2530) as "packaged." In an embodiment, a bill for the client/indexing recipient may be generated (2535) and may be included with the package or sent separately. In an embodiment, a rename file may be generated that instructs the client system 101 how to rename the document file to pair/index it with the structured messages. In this way, the document files need not be retransmitted to the client system 101. In an embodiment, the structured message may also be the rename file. In an embodiment, the package file may be compressed and/or encrypted (2545) as part of the transmission. In an embodiment, the package file may be placed (2550) on a server, such as a file transfer protocol server, for transmission to the client system 101, wherein the client may initiate the transmission. In an alternative embodiment, the package may be transmitted to the client system 101.
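The batch completion check and packaging steps of Figure 25 can be sketched in memory as follows. The batch and file dictionaries, the status strings, and the `.xml` naming of structured message files are hypothetical; billing, compression, encryption, and transfer are left out of the sketch.

```python
def batch_is_complete(batch):
    """True when complete files plus error files equal the batch total (2510/2515)."""
    done = sum(f["status"] in ("complete", "error") for f in batch["files"])
    return done == batch["total"]

def package_batch(batch):
    """Build a package for a finished batch, or return None if still processing.

    batch -- {"id": str, "total": int,
              "files": [{"name": str, "status": str, "message": str or None}]}
    """
    if not batch_is_complete(batch):
        return None                                   # wait for the batch to finish
    package = {"batch_id": batch["id"], "messages": {}, "incomplete": []}
    for f in batch["files"]:
        if f["status"] == "complete" and f.get("message") is not None:
            # Move the structured message file into the package (2525).
            package["messages"][f["name"] + ".xml"] = f["message"]
        else:
            package["incomplete"].append(f["name"])   # not ready for packaging
    package["status"] = "packaged"                    # return status (2530)
    return package
```

Because only the structured messages are packaged, the document files themselves would not need to travel back to the client system.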
[00225] Figure 26 depicts exemplary types of information that may be associated with a document file according to an embodiment of the present invention. As illustrated in Figure 26, a plurality of types of information may be associated with a document file, including but not limited to, document type information 2605, demographic data 2610, additional information 2615, array of strings and/or one or more sets of strings 2620, provider 2635, matched results 2625, and other matching information found through the process of matching or filtering. Additional information may mean the same as discussed previously, including, without limitation, information contained within one or more data fields of a matching record. In an embodiment, reference database 100A may also include one or more additional fields 405E-x for including additional indicia, additional data, links to files, notes, instructions for processing received files, and other data. Throughout the process of filtering or matching, a structured message may be populated with this information. This structured message may be stored in a directory (2640) by the indexing service provider 201 and returned to the client system 101 as part of the package. In an embodiment, the structured message may also include information instructing a client on how to index the document file.
L. EXEMPLARY EMBODIMENTS OF COMPOSITE MESSAGING
SERVICES
[00226] Figure 27 illustrates an exemplary composite message according to an embodiment of the present invention. In embodiments, it may be beneficial to create composites, such as, for example, when transmitting messages. A composite may comprise the combination of any additional data with one or more of the following: other additional data, a document file, the array of strings (or portion thereof), a reference database record, file location, image file, thumbnail, hyperlink, graphics, audio files, video files, and the like. One skilled in the art will recognize other items may be included in a composite.
[00227] In embodiments, first or second computing device 101/201 may create a composite, such as a composite image, message, record, or file, including both the document file and the additional composite items. In one embodiment, a composite message may be created that includes the image of the information contained within a document file and additional data included within the body of the same message. In embodiments, the document file may be superimposed upon the additional data, such as in instances in which the additional data is letterhead, stationery or some other background image. In these embodiments, first or second computing system can overlay the information contained within the document file upon the additional data in such a manner that the information contained within the document file properly overlaps the additional data. In an embodiment, the additional data may be treated as being transparent such that the information contained within the document file will appear to be overlaid upon the additional data. Composite messages are beneficial because, depending upon the embodiment, the intended recipient can receive a message that includes both the document file and any additional data that is related to or otherwise associated with the document file.
[00228] Consider, by way of illustration, the following example. Having identified that report 400 is a lab report for Mary Alamb by use of one or more of the methods discussed above, additional data may include instructions indicating that a composite message should be sent to Ms. Alamb. In one embodiment, Mary may have indicated that she desires to receive copies of all reports. In an alternative embodiment, one or more key words or phrases from the document file may indicate that a message should be generated and sent. For example, the indication in a "Notes" field that the results of the lab testing yielded a certain result, such as testing "positive," may trigger a message being sent to Mary. In an embodiment, additional data may be conditionally associated with data associated with the document file. For example, identification within the document file of testing positive for gram-negative bacteria may be associated with a selected text, such as text indicating that a follow-up appointment should be scheduled. In an embodiment, first or second computing system 101/201 may interface with one or more programs, such as a calendaring system, to suggest or schedule appointments, or to initiate an action.
[00229] Alternatively or additionally, first or second computing device may create a composite message that includes an image of the received data. In an embodiment, the received data to be incorporated into a message may be embedded as an image, such as a portable document format (.pdf), tagged image file format (.tif), or the like, into the hypertext mark-up language (html) of the email message or contained within an HL-7 message. For most current email clients, such as Lotus Notes or Outlook, the image of the data will appear in the body of the email message itself. However, some devices may utilize older or different email clients, such as older versions of Lotus Notes, for example, that will place the image of the data in an attachment to the email message. In some embodiments, the email client will determine whether the image will appear in the body of the email message or as an attachment. In alternative embodiments, where the first or second computing system 101/201 has information about the characteristics of the email client regarding images, the system may decide whether to send the message with the image appearing in the body of the message or as an attachment. In an embodiment, while the first or second computing device may transmit the email message directly to a computer or other computing device having an email client of the intended recipient, the first or second computing system may alternatively transmit the email message to an email server (not shown) for subsequent delivery to the intended recipient. In any event, the recipient can receive an email message and view the data. It should be noted that data, such as maps, diagrams, drawings, reports, documents, and various language characters, may also be readily transmitted.
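The choice between embedding the image in the html body and sending it as an attachment can be illustrated with Python's standard `email` package. This is a sketch of one way such a message could be assembled, not the system's actual implementation; the function name, subject text, file name, and the `example.invalid` domain are placeholders.

```python
from email.message import EmailMessage
from email.utils import make_msgid

def build_report_email(sender, recipient, image_bytes, inline=True):
    """Build an email carrying a TIFF image either inline (html) or as an attachment."""
    msg = EmailMessage()
    msg["From"] = sender
    msg["To"] = recipient
    msg["Subject"] = "Report available"               # placeholder subject
    msg.set_content("A report image is included with this message.")
    if inline:
        # Reference the image from the html body through a Content-ID.
        cid = make_msgid(domain="example.invalid")
        html = ('<html><body><p>Report:</p>'
                f'<img src="cid:{cid[1:-1]}"></body></html>')
        msg.add_alternative(html, subtype="html")
        msg.get_payload()[1].add_related(image_bytes, "image", "tiff", cid=cid)
    else:
        msg.add_attachment(image_bytes, maintype="image", subtype="tiff",
                           filename="report.tif")
    return msg
```

A sender that knows the recipient's email client handles embedded images poorly could call the function with `inline=False`; whether an inline image is actually rendered in the body still depends on the receiving client.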

[00230] Alternatively or additionally, the first or second computing system 101/201 may create a message that includes a file location reference or link to the document file. Consider, by way of illustration, the message 2700 depicted in FIG. 27. Depicted in FIG. 27 is an email message which may be generated by first or second computing system 101/201. Included within email message 2700 is a link 2705 to the report 400. In an embodiment, security and encryption may be employed to restrict access to the linked file 400.
[00231] One skilled in the art will recognize that other forms of communication may likewise be employed. In an embodiment, an instant message containing the received data or a link to the data may be transmitted from the first or second computing system to one or more devices or networks having instant messaging capability.
[00232] In yet another embodiment, the document file may be included in the body of a facsimile. In this regard, a facsimile coversheet, generally identifying the intended recipient as well as the phone and facsimile numbers of the intended recipient and name and phone number of the user transmitting the facsimile, may be defined, either by the user at the time of transmitting the information or at some prior time. In either instance, a user may define a custom facsimile coversheet tailored to the user or an intended recipient.
[00233] In an embodiment, first or second computing system 101/201 may store one or more of the following: the document file, the array of strings (or portions thereof), the additional data, a composite, or a message. In one embodiment, first or second computing system may place the indexed information, or a link thereto, in a predefined location for import into an electronic record or other database software application. A user of the electronic record or other database software application may access and view the document file, the additional data, the composite, and/or the message while using the application. In an alternative embodiment, first or second computing device may transmit the indexed data, or a link thereto, to a second device, for storing in a predefined location for import into an electronic record or other database software application.
M. EXEMPLARY EMBODIMENTS OF MANUAL INDEXING
[00234] Figure 28 depicts an exemplary method for presenting files for manual review or indexing according to an embodiment of the present invention. In an embodiment, the manual indexer may be part of a database interface system at the indexing recipient system 101 or the indexing service provider system 201. Incomplete files or files that have been marked as "no match found" may be sent (2805) to a manual indexing utility or service. In an embodiment, the manual indexer may also load (2810) and display suggested matches. A user may review (2815) the document file and indicate to which record the document file should be matched. In an embodiment, the manual indexer may additionally include (2820) features such as a search feature to search the array of strings for the document file and/or a reference database to obtain additional values or replace suggested match values. Following the manual review, the document file may be indexed (2825). In an embodiment, the index information may be processed in like manner as other indexed document files, including being sent to the packaging services, an embodiment of which is depicted in Figure 25.
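The review flow of Figure 28 may be illustrated with a minimal sketch, assuming the reference database is held as a dictionary of records keyed by record identifier. The names `ReviewItem`, `load_suggestions`, `search_records`, and `confirm_match` are hypothetical and chosen only for this example.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class ReviewItem:
    """A document file awaiting manual indexing (hypothetical structure)."""
    file_name: str
    strings: list                                     # array of strings for the file
    suggestions: list = field(default_factory=list)   # candidate record ids
    matched_record: Optional[str] = None


def load_suggestions(item, reference_db):
    """Step 2810: suggest records having a field value among the file's strings."""
    found = {s.lower() for s in item.strings}
    item.suggestions = [rid for rid, rec in reference_db.items()
                        if any(str(v).lower() in found for v in rec.values())]
    return item.suggestions


def search_records(reference_db, term):
    """Step 2820: return ids of records with any field containing `term`."""
    term = term.lower()
    return [rid for rid, rec in reference_db.items()
            if any(term in str(v).lower() for v in rec.values())]


def confirm_match(item, record_id, reference_db):
    """Steps 2815/2825: record the reviewer's choice so the file can be indexed."""
    if record_id not in reference_db:
        raise KeyError(record_id)
    item.matched_record = record_id
    return item
```

In this sketch the reviewer either accepts one of the preloaded suggestions or uses the search helper to find a different record before confirming the match.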
[00235] In an embodiment, a manual indexer user interface may comprise a viewing section or sections for displaying match information. In an embodiment, the manual indexer user interface may comprise a viewing section for optionally viewing items such as a document file, its associated array of strings or set of strings, the document file's associated structured message with the data that has been obtained through matching operations, and log information, which might contain system or processing information and additional information collected through the indexing/matching process. The manual indexer user interface may comprise a section to display and allow a user to review matched data fields for a document file and unmatched data fields, including any preloaded suggestions for the unmatched data fields obtained through the matching operations. In an embodiment, the user may review and approve or correct the matched data fields. Matching information, including possible matching candidates, rankings, structured data file, string matches, and any other of the data available for viewing as discussed above, may be displayed to the user. The user may provide other information that has not been identified in the document file. For example, the user may select and enter the document type or provider information. This information may be added to items such as phrase lists and the like and may be associated with matching profiles, thereby allowing the system to adaptively improve for subsequent matching operations. The manual indexer user interface may also comprise additional features and inputs that may be specific to an indexing client or to a database system.
[00236] In an embodiment, the manual indexer may utilize user-derived associations and feedback to modify the indexing processes in an adaptive method by providing document indicia suggestions and receiving user-responsive feedback to modify the matching/filtering elements, including but not limited to threshold match values, dictionary/phrase lists, match associations, and the like. In an embodiment, the manual indexer may provide the ability to add information to dictionary/phrase lists, such as exclusion lists, document type lists, provider lists, client contact lists, and the like. In an embodiment, the manual indexer may associate information, such as document type or provider, to a word frequency value, a key word, or key phrase, thereby enabling the indexer to suggest potential matches or values, and/or to improve subsequent matching operations.
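One possible way to picture such adaptive behavior is sketched below. The class `MatchingProfile`, its fixed adjustment step, and the rule used to raise or lower the threshold are assumptions made for illustration; the specification does not prescribe a particular update rule.

```python
class MatchingProfile:
    """Hypothetical container for the adjustable matching elements."""

    def __init__(self, threshold=0.80, step=0.02):
        self.threshold = threshold   # minimum score for an automatic match
        self.step = step             # size of each adjustment (assumed)
        self.phrase_lists = {"exclusion": set(), "document_type": set(),
                             "provider": set()}
        self.keyword_hints = {}      # key phrase -> (field, value)

    def add_phrase(self, list_name, phrase):
        """Add a reviewer-supplied phrase to one of the phrase lists."""
        self.phrase_lists[list_name].add(phrase.lower())

    def associate(self, phrase, field, value):
        """Associate a key phrase with a value such as a document type."""
        self.keyword_hints[phrase.lower()] = (field, value)

    def suggest(self, strings):
        """Return {field: value} for every key phrase found in the strings."""
        text = " ".join(strings).lower()
        return {f: v for p, (f, v) in self.keyword_hints.items() if p in text}

    def record_feedback(self, score, accepted):
        """Nudge the threshold using a reviewer's verdict on a candidate."""
        if accepted and score < self.threshold:
            # A correct candidate fell below the threshold: relax it.
            self.threshold = max(0.0, self.threshold - self.step)
        elif not accepted and score >= self.threshold:
            # A wrong candidate passed the threshold: tighten it.
            self.threshold = min(1.0, self.threshold + self.step)
```

For example, after a reviewer associates the phrase "complete blood count" with a document type, later files containing that phrase would receive the document type as a suggestion.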
N. EXEMPLARY EMBODIMENTS OF FILE INDEXING AND PROCESSING SERVICES
[00237] Figure 29 depicts an exemplary method for receiving information related to processed document files from an indexing service provider according to an embodiment of the present invention. In an embodiment, indexing recipient system 101 receives (2905) an encrypted package file and decrypts the package (2910) from the indexing service provider 201. In an embodiment, indexing recipient system 101 decompresses (2915) the rename files and structured message files and moves them to a pending folder. In an embodiment, the rename files indicate (2920) which document files in the pending folder are to be moved and to where they should be moved. In an embodiment, the indexing recipient system may verify (2925) that the document file is still in its pending folder. If the document file is no longer in its pending folder, the indexing recipient system may notify (2930) the indexing service provider that the document file no longer exists, and the document file may be restored (2935) from the indexing service provider. If the document file is still in the pending folder, the document file is renamed (2940) according to the definition in the rename file and may be moved to a specified server location. In an embodiment, the structured message file may also be moved (2945) to a specified location. In one embodiment, one or more of the files may be moved to a database or database inbox. In an embodiment, the indexing recipient system may notify the indexing service provider that the files have been delivered and processed (2950). In the depicted embodiment, the indexing recipient system may repeat the process for all pending document files for that package. If there are no remaining files to be processed (2950), the indexing recipient system may delete the confirmation file. In an embodiment, a confirmation file may be a zipped and encrypted package that contains structured messages, such as messages, and a rename file, which may be an XML file, that explains how the original files on the client machine are to be renamed and where they are to be stored. In an embodiment, the document files, the structured message, or both may be stored so as to be accessed by a database client, such as Centricity EMR.
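Steps 2920 through 2940 of Figure 29 may be sketched as follows. The layout of the rename file shown in the docstring is a hypothetical one invented for this example, since the specification states only that the rename file may be an XML file; decryption (2910) and decompression (2915) are assumed to have already taken place.

```python
import shutil
import xml.etree.ElementTree as ET
from pathlib import Path


def apply_rename_file(rename_xml, pending_dir, notify_missing=print):
    """Apply rename instructions to the document files in the pending folder.

    Assumed (hypothetical) rename-file layout:
        <renames>
          <file original="scan001.tif" new_name="1001_lab.tif"
                destination="/records/1001"/>
        </renames>
    Returns (moved, missing) lists of original file names.
    """
    pending_dir = Path(pending_dir)
    moved, missing = [], []
    for entry in ET.fromstring(rename_xml).iter("file"):
        original = entry.get("original")
        source = pending_dir / original
        if not source.is_file():
            # Steps 2925/2930: the file is gone, so report it; the provider
            # may then restore it (2935).
            notify_missing(original)
            missing.append(original)
            continue
        # Step 2940: rename the file and move it to the specified location.
        destination = Path(entry.get("destination"))
        destination.mkdir(parents=True, exist_ok=True)
        shutil.move(str(source), str(destination / entry.get("new_name")))
        moved.append(original)
    return moved, missing
```

The `notify_missing` callback stands in for whatever channel the indexing recipient system uses to inform the indexing service provider.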
[00238] In an embodiment, the array of strings of the document file, which may be construed to be a part of the document file, may also be indexed with the document file, such as in the case when the document file is an image or audio file and the associated data was created by converting the document file. In an embodiment, all document files obtained by the first computing system 101 may be stored into a common folder or location. The files may be stored locally, such as on storage device 304 on first computing system 101, or on a remote device or network, such as storage device 140, network 145, and/or remote network 150. In an embodiment, the document files may be stored within database system 110. In one embodiment, the document files may be indexed according to a unique identifier, which identifier may be one or more of the strings from the array of strings or one or more data field elements for the record matched to the document file.
[00239] Figure 30 graphically illustrates an exemplary file structure for indexing a plurality of files according to an embodiment of the present invention. As illustrated in FIG. 30, after a document file has been successfully identified, it may be moved from an unindexed folder 3005 to a folder associated with that individual or organization. For example, the received file 400 may be stored in a folder associated with Mary Alamb. In an embodiment, the folders 710x may be uniquely identified by an account number, patient name, or the like.
[00240] In an alternative embodiment, instead of or in addition to indexing the received files by storing them into specific folders or locations, the received file may be indexed by use of a pointer or link to the received data file. In an embodiment, a database indexes the file pointer or file link. In one embodiment, the database may be part of the reference database 100. For example, one of the fields of the reference database 100 may include file location information. Thus, the received file may be indexed by associating or linking its storage location to the matched record.
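The two indexing approaches just described, storing the file in a folder associated with the matched record and storing a pointer to the file in the matched record, may be sketched as follows. The function names, the `file_locations` field, and the use of a dictionary for the reference database are assumptions made for this example.

```python
import shutil
from pathlib import Path


def index_by_folder(document, root, identifier):
    """Move `document` out of the unindexed folder into root/<identifier>/.

    `identifier` might be an account number or patient name.
    """
    folder = Path(root) / identifier
    folder.mkdir(parents=True, exist_ok=True)
    target = folder / Path(document).name
    shutil.move(str(document), str(target))
    return target


def index_by_pointer(reference_db, record_id, location):
    """Record the file's storage location in the matched record."""
    record = reference_db[record_id]
    record.setdefault("file_locations", []).append(str(location))
    return record
```

The two functions may be used together, first moving the file and then linking its new location to the record, or the pointer alone may be stored while the file stays where it is.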
[00241] In an embodiment, first or second computing system 101/201 may place the document file and/or associated data (such as structured message file, array of strings, etc.) in a predefined location for import into an electronic record or other database software application. Accordingly, a user of the electronic record or other database software application can access and view the data using the electronic record or other database software application. One skilled in the art of information management will recognize other ways for indexing and storing the files, which are within the scope of the present invention.
O. EXEMPLARY EMBODIMENTS OF ACCOUNT SERVICES AND BILLING
[00242] Embodiments of the present invention may include archiving and retrieval services for an indexing recipient. As noted above, embodiments of the present invention may include archiving services such as for cases of missing files during the indexing process (see, e.g., Fig. 29, steps 2930-2935). Indexing service provider may also maintain copies of all files related to the indexing for an indexing recipient, including but not limited to document files, arrays of strings, reference databases, structured messages, rename files, additional data, and composite messages, and may also maintain file space for other files for an indexing recipient. One or more of these files may be restored in the event of lost or corrupted data in the indexing recipient system.
P. EXEMPLARY EMBODIMENTS OF ACCOUNT SERVICES AND BILLING
[00243] As noted previously, embodiments of the present invention may include billing services for billing indexing recipients and third parties. Billing services may include billing for indexing services, archiving services, messaging services, account services, observational services, error correction services, other services described herein, and other costs and fees. Embodiments of the present invention may also include billing associated with financial events and/or marketing events.
1. ASSOCIATE WITH A FINANCIAL EVENT
[00244] In embodiments, one or more of the steps performed according to the present invention may be associated with an individual and/or organization for the purposes of billing or financial event or events. The billing or financial event may be for the user or operator of first computing system 101, second computing system 201, or may be performed on behalf of another individual or organization. Consider, for example, the document file 400 from XYZ Laboratories, a medical diagnostics laboratory, and assume that the document file 400 has been successfully matched to a patient, Mary Alamb. In one embodiment, the matching of the document file 400 to a record (in this case a patient record) may trigger a message that an invoice needs to be sent to Ms. Alamb's insurance provider or a message that XYZ Laboratories needs to be paid for services performed. In one embodiment, an invoice may be automatically sent to Ms. Alamb's insurance carrier for the services performed. Additionally, in an embodiment, specific billing codes may be provided to the insurance company. In an embodiment, each instance a file is received and indexed or processed according to the present invention, a user of the indexing or processing services may be billed for such usage. One skilled in the art will recognize that other configurations may beneficially employ or be linked to financial events and are within the scope of the present invention.

2. ASSOCIATE WITH A MARKETING OR ADVERTISING EVENT
[00245] In embodiments, one or more of the steps performed according to the present invention may be associated with an individual and/or organization for the purpose of marketing or advertising. In an embodiment, the correlation between content data in the document file and an individual or organization may be used for marketing and advertising purposes. Consider, for example, document files containing information related to goods or services utilized by an individual or organization. In an embodiment, that information may be used to provide advertising or marketing services to that individual or organization or may be provided to advertising or marketing organizations. In another embodiment, aggregate information may be provided to advertising or marketing organizations. One skilled in the art will recognize that other configurations may beneficially employ or be linked to advertising or marketing events and are within the scope of the present invention.
[00246] While the invention is susceptible to various modifications and alternative forms, specific examples thereof have been shown in the drawings and are herein described in detail. It should be understood, however, that the invention is not to be limited to the particular form disclosed, but to the contrary, the invention is to cover all modifications, equivalents, and alternatives falling within the scope of the appended claims.
[00247] In addition, embodiments of the present invention further relate to computer products with a computer-readable medium that have computer code thereon for performing various computer-implemented operations. The media and computer code may be those specially designed and constructed for the purposes of the present invention, or they may be of the kind well known and available to those having skill in the relevant arts. Examples of computer-readable media include, but are not limited to: magnetic media such as hard disks, floppy disks, and magnetic tape; optical media such as CD-ROMs and holographic devices; magneto-optical media; and hardware devices that are specially configured to store or to store and execute program code, such as application-specific integrated circuits (ASICs), programmable logic devices (PLDs) and ROM and RAM devices. Examples of computer code include machine code, such as produced by a compiler, and files containing higher level code that are executed by a computer using an interpreter.

Claims (20)

CLAIMS:
1. A processor-implemented method for identifying a document file comprising:
responsive to locating a recognized set of characters in a document file comprising a plurality of characters, using the recognized set of characters as an anchor point and performing the steps comprising:
selecting an examination set of characters from the document file, the examination set being selected based upon proximity to the anchor point;
and searching the examination set for one or more indicators to assist in uniquely identifying the document file.
2. The processor-implemented method of claim 21, further comprising the step of:
responsive to not finding any indicators in the examination set, changing the proximity and selecting an examination set of characters using the changed proximity.
3. The processor-implemented method of claim 22, further comprising the step of:
iterating the step in claim 22, until a stop condition is reached.
4. The processor-implemented method of claim 23, wherein the stop condition comprises:
finding one or more indicators that uniquely identify the document file;
a set percentage of the characters of the document file have been included in the examination set; and a number of iterations has been reached.
5. The processor-implemented method of claim 21, wherein the proximity comprises selecting characters that are symmetrically positioned about the anchor point.
6. The processor-implemented method of claim 21, wherein the examination set comprises characters that are asymmetrically positioned about the anchor point.
7. The processor-implemented method of claim 21, further comprising the step of:
excluding from the examination set a set of characters associated with the anchor point.
searching a document comprising a plurality of characters to identify an anchor point comprising a set of characters; and responsive to identifying an anchor point:
assigning proximity weighting to at least some of the characters in the document based upon their position relative to the anchor point;
selecting an examination set of characters from the document using the proximity weightings; and searching the examination set for one or more indicators to assist in uniquely identifying the document.
9. The processor-implemented method of claim 28, further comprising the steps of:
receiving at least a portion of a reference database from a client, the reference database comprising a plurality of data elements; and comparing at least some of the data elements from the reference database against at least part of the document to identify one or more anchor points.
10. The processor-implemented method of claim 29, further comprising the steps of:
comparing the examination set against a comparison reference database obtained from a reference database; and responsive to at least a portion of the examination set exceeding a threshold match with at least a portion of a record in the comparison reference database, generating a structured message that associates the document with the record.
11. The processor-implemented method of claim 30, further comprising the step of:
using a set of characters selected from the document to filter the reference database to obtain the comparison reference database.
12. The processor-implemented method of claim 31, wherein the set of characters comprises the set of characters that form the anchor point.
13. A system comprising:
one or more processors; and a non-transitory computer-readable medium or media storing thereon one or more sequences of instructions which, when executed by at least one of the one or more processors, causes steps to be performed comprising:
searching a document comprising a plurality of characters to identify an anchor point comprising a set of characters; and responsive to identifying an anchor point:
assigning proximity weighting to at least some of the characters in the document based upon their position relative to the anchor point;
selecting an examination set of characters from the document using the proximity weightings; and searching the examination set for one or more indicators to assist in uniquely identifying the document.
14. The system of claim 33, wherein the steps to be performed further comprise:
responsive to receiving at least a portion of a reference database comprising a plurality of data elements from a client, comparing at least some of the data elements from the reference database against at least part of the document to identify one or more anchor points.
15. The system of claim 34, wherein the steps to be performed further comprise:
comparing the examination set against a comparison reference database obtained from a reference database; and responsive to at least a portion of the examination set exceeding a threshold match with at least a portion of a record in the comparison reference database, generating a structured message that associates the document with the record.
16. The system of claim 35, wherein the steps to be performed further comprise:
using a set of characters selected from the document to filter the reference database to obtain the comparison reference database.
17. The system of claim 36, wherein the set of characters comprises the set of characters that form the anchor point.
18. The system of claim 33, wherein the steps to be performed further comprise:
responsive to not finding any indicators in the examination set, changing at least some of the proximity weightings and selecting an examination set of characters using the changed proximity weightings.
19. The system of claim 38, wherein the steps to be performed further comprise:
iterating the step in claim 38 until a stop condition is reached.
20. The system of claim 33, wherein the proximity weighting comprises selecting characters that are symmetrically positioned about the anchor point.
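By way of illustration only, and without limiting the claims, the anchor point technique recited above might be sketched as follows. All function names, the default window size, the growth factor, the stop-condition values, and the weighting formula are assumptions chosen for this example; the claims do not prescribe them.

```python
def find_anchor(text, recognized):
    """Return (start, end) of the first recognized string found, else None."""
    for candidate in recognized:
        start = text.find(candidate)
        if start != -1:
            return start, start + len(candidate)
    return None


def examination_set(text, anchor, before, after):
    """Select characters near the anchor, excluding the anchor itself.

    Equal `before`/`after` values give a symmetric selection; unequal
    values give an asymmetric one. The two sides are joined with a space
    so that characters from opposite sides cannot fuse into a false match.
    """
    start, end = anchor
    return text[max(0, start - before):start] + " " + text[end:end + after]


def identify(text, recognized, indicators, radius=20, growth=2,
             max_iterations=5, max_fraction=0.9):
    """Search a growing window around the anchor for indicators."""
    anchor = find_anchor(text, recognized)
    if anchor is None:
        return []
    for _ in range(max_iterations):              # stop: iteration limit
        window = examination_set(text, anchor, radius, radius)
        found = [i for i in indicators if i in window]
        if found:                                 # stop: indicator found
            return found
        if len(window) >= max_fraction * len(text):
            break                                 # stop: coverage limit
        radius *= growth                          # change the proximity
    return []


def proximity_weights(text, anchor, scale=50.0):
    """Assign each character a weight that decays with distance from the anchor."""
    start, end = anchor
    weights = []
    for i in range(len(text)):
        if start <= i < end:
            distance = 0
        else:
            distance = min(abs(i - start), abs(i - (end - 1)))
        weights.append(1.0 / (1.0 + distance / scale))
    return weights


def select_by_weight(text, weights, minimum):
    """Select the characters whose proximity weight is at least `minimum`."""
    return "".join(c for c, w in zip(text, weights) if w >= minimum)
```

Under these assumptions, a label such as "Patient:" serves as the anchor point, and a patient name or account number found near it serves as an indicator; lowering `minimum` in `select_by_weight` plays the same role as enlarging `radius` in `identify`.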
CA2975694A 2005-07-15 2006-07-14 Systems and methods for data indexing and processing Active CA2975694C (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CA3074633A CA3074633C (en) 2005-07-15 2006-07-14 Systems and methods for data indexing and processing

Applications Claiming Priority (3)

Application Number Priority Date Filing Date Title
US69989305P 2005-07-15 2005-07-15
US60/699,893 2005-07-15
CA2928051A CA2928051C (en) 2005-07-15 2006-07-14 Systems and methods for data indexing and processing

Related Parent Applications (1)

Application Number Title Priority Date Filing Date
CA2928051A Division CA2928051C (en) 2005-07-15 2006-07-14 Systems and methods for data indexing and processing

Related Child Applications (1)

Application Number Title Priority Date Filing Date
CA3074633A Division CA3074633C (en) 2005-07-15 2006-07-14 Systems and methods for data indexing and processing

Publications (2)

Publication Number Publication Date
CA2975694A1 true CA2975694A1 (en) 2007-01-25
CA2975694C CA2975694C (en) 2020-12-08

Family

ID=37669452

Family Applications (4)

Application Number Title Priority Date Filing Date
CA2928051A Active CA2928051C (en) 2005-07-15 2006-07-14 Systems and methods for data indexing and processing
CA2975694A Active CA2975694C (en) 2005-07-15 2006-07-14 Systems and methods for data indexing and processing
CA3074633A Active CA3074633C (en) 2005-07-15 2006-07-14 Systems and methods for data indexing and processing
CA2657212A Active CA2657212C (en) 2005-07-15 2006-07-14 Systems and methods for data indexing and processing

Family Applications Before (1)

Application Number Title Priority Date Filing Date
CA2928051A Active CA2928051C (en) 2005-07-15 2006-07-14 Systems and methods for data indexing and processing

Family Applications After (2)

Application Number Title Priority Date Filing Date
CA3074633A Active CA3074633C (en) 2005-07-15 2006-07-14 Systems and methods for data indexing and processing
CA2657212A Active CA2657212C (en) 2005-07-15 2006-07-14 Systems and methods for data indexing and processing

Country Status (3)

Country Link
US (8) US8112441B2 (en)
CA (4) CA2928051C (en)
WO (1) WO2007011841A2 (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20210295033A1 (en) * 2020-03-18 2021-09-23 Fujifilm Business Innovation Corp. Information processing apparatus and non-transitory computer readable medium

Families Citing this family (104)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US7222297B2 (en) 2002-01-14 2007-05-22 International Business Machines Corporation System and method for using XML to normalize documents
WO2007011841A2 (en) * 2005-07-15 2007-01-25 Indxit Systems, Inc. Systems and methods for data indexing and processing
US7783615B1 (en) * 2005-09-30 2010-08-24 Emc Corporation Apparatus and method for building a file system index
US8954426B2 (en) * 2006-02-17 2015-02-10 Google Inc. Query language
US20070185870A1 (en) 2006-01-27 2007-08-09 Hogue Andrew W Data object visualization using graphs
US8954412B1 (en) 2006-09-28 2015-02-10 Google Inc. Corroborating facts in electronic documents
JP2008268995A (en) * 2007-04-16 2008-11-06 Sony Corp Dictionary data generation device, character input device, dictionary data generation method and character input method
US7720883B2 (en) * 2007-06-27 2010-05-18 Microsoft Corporation Key profile computation and data pattern profile computation
US8392816B2 (en) * 2007-12-03 2013-03-05 Microsoft Corporation Page classifier engine
US8250469B2 (en) * 2007-12-03 2012-08-21 Microsoft Corporation Document layout extraction
US20090144277A1 (en) * 2007-12-03 2009-06-04 Microsoft Corporation Electronic table of contents entry classification and labeling scheme
US20090144334A1 (en) * 2007-12-04 2009-06-04 Mcafee Randolph Preston System and method for contact management
US8774374B2 (en) * 2007-12-13 2014-07-08 Verizon Patent And Licensing Inc. Managing visual voicemail from multiple devices
JP5123032B2 (en) * 2008-04-10 2013-01-16 株式会社リコー Information distribution apparatus, information distribution method, information distribution program, and recording medium
US7860735B2 (en) * 2008-04-22 2010-12-28 Xerox Corporation Online life insurance document management service
US8266168B2 (en) * 2008-04-24 2012-09-11 Lexisnexis Risk & Information Analytics Group Inc. Database systems and methods for linking records and entity representations with sufficiently high confidence
US20100088338A1 (en) * 2008-10-03 2010-04-08 Pavoni Jr Donald Gordon Red flag identification verification system and method
US8060573B2 (en) * 2008-11-20 2011-11-15 MeetMyKind, LLC Matching social network users
US20110087687A1 (en) * 2009-10-14 2011-04-14 International Business Machines Corporation Position sensitive type-ahead matching for resource navigation
US9049117B1 (en) * 2009-10-21 2015-06-02 Narus, Inc. System and method for collecting and processing information of an internet user via IP-web correlation
CN102054171A (en) * 2009-10-30 2011-05-11 株式会社东芝 Device and method for identifying types of document files
WO2011086637A1 (en) * 2010-01-18 2011-07-21 日本電気株式会社 Requirements extraction system, requirements extraction method and requirements extraction program
CA2791292A1 (en) 2010-02-26 2011-09-01 Mmodal Ip Llc Clinical data reconciliation as part of a report generation solution
US8522199B2 (en) * 2010-02-26 2013-08-27 Mcafee, Inc. System, method, and computer program product for applying a regular expression to content based on required strings of the regular expression
US8468119B2 (en) * 2010-07-14 2013-06-18 Business Objects Software Ltd. Matching data from disparate sources
US8429556B2 (en) * 2010-07-20 2013-04-23 Apple Inc. Chunking data records
US8463673B2 (en) * 2010-09-23 2013-06-11 Mmodal Ip Llc User feedback in semi-automatic question answering systems
CA2811942A1 (en) * 2010-09-23 2012-03-29 Mmodal Ip Llc User feedback in semi-automatic question answering systems
US9160693B2 (en) 2010-09-27 2015-10-13 Blackberry Limited Method, apparatus and system for accessing applications and content across a plurality of computers
US20120079043A1 (en) * 2010-09-27 2012-03-29 Research In Motion Limited Method, apparatus and system for accessing an application across a plurality of computers
US9135512B2 (en) * 2011-04-30 2015-09-15 Hewlett-Packard Development Company, L.P. Fiducial marks on scanned image of document
US20120303728A1 (en) * 2011-05-26 2012-11-29 Fitzsimmons Andrew P Report generation system with reliable transfer
US20130046560A1 (en) * 2011-08-19 2013-02-21 Garry Jean Theus System and method for deterministic and probabilistic match with delayed confirmation
US20160358142A1 (en) * 2011-09-02 2016-12-08 Humana Inc. Financial intermediary for electronic health claims processing
US8930492B2 (en) 2011-10-17 2015-01-06 Blackberry Limited Method and electronic device for content sharing
US9015809B2 (en) 2012-02-20 2015-04-21 Blackberry Limited Establishing connectivity between an enterprise security perimeter of a device and an enterprise
US9396540B1 (en) * 2012-03-28 2016-07-19 Emc Corporation Method and system for identifying anchors for fields using optical character recognition data
US8892579B2 (en) * 2012-04-26 2014-11-18 Anu Pareek Method and system of data extraction from a portable document format file
US11487707B2 (en) 2012-04-30 2022-11-01 International Business Machines Corporation Efficient file path indexing for a content repository
US8903929B2 (en) 2012-07-05 2014-12-02 Microsoft Corporation Forgotten attachment detection
WO2014028529A2 (en) 2012-08-13 2014-02-20 Mmodal Ip Llc Maintaining a discrete data representation that corresponds to information contained in free-form text
EP2901303A4 (en) * 2012-09-25 2016-06-01 Moneydesktop Inc Aggregation source routing
US8914356B2 (en) 2012-11-01 2014-12-16 International Business Machines Corporation Optimized queries for file path indexing in a content repository
US9323761B2 (en) 2012-12-07 2016-04-26 International Business Machines Corporation Optimized query ordering for file path indexing in a content repository
US9047343B2 (en) * 2013-01-15 2015-06-02 International Business Machines Corporation Find regular expression instruction on substring of larger string
US10346551B2 (en) * 2013-01-24 2019-07-09 New York University Systems, methods and computer-accessible mediums for utilizing pattern matching in stringomes
US20140324908A1 (en) * 2013-04-29 2014-10-30 General Electric Company Method and system for increasing accuracy and completeness of acquired data
US10803102B1 (en) * 2013-04-30 2020-10-13 Walmart Apollo, Llc Methods and systems for comparing customer records
US9292510B2 (en) * 2013-06-18 2016-03-22 Blink Forward, L.L.C. Systems and methods for indexing and linking electronic documents
US20160259819A1 (en) * 2013-06-18 2016-09-08 Blink Forward, L.L.C. Error identification, indexing and linking construction documents
US9959584B1 (en) * 2013-11-08 2018-05-01 Document Imaging Systems Corp. Automated system and method for electronic health record indexing
US9355152B2 (en) 2013-12-02 2016-05-31 Qbase, LLC Non-exclusionary search within in-memory databases
US9922032B2 (en) * 2013-12-02 2018-03-20 Qbase, LLC Featured co-occurrence knowledge base from a corpus of documents
US9619571B2 (en) * 2013-12-02 2017-04-11 Qbase, LLC Method for searching related entities through entity co-occurrence
US9430464B2 (en) 2013-12-20 2016-08-30 International Business Machines Corporation Identifying unchecked criteria in unstructured and semi-structured data
US9304657B2 (en) 2013-12-31 2016-04-05 Abbyy Development Llc Audio tagging
CN103795735B (en) * 2014-03-07 2017-11-07 深圳市迈科龙电子有限公司 Safety means, server and server info safety implementation method
US11169773B2 (en) 2014-04-01 2021-11-09 TekWear, LLC Systems, methods, and apparatuses for agricultural data collection, analysis, and management via a mobile device
WO2018022301A1 (en) * 2016-07-12 2018-02-01 TekWear, LLC Systems, methods, and apparatuses for agricultural data collection, analysis, and management via a mobile device
WO2016055085A1 (en) * 2014-10-06 2016-04-14 Swiss Reinsurance Company Ltd. System and method for pattern-recognition based monitoring and controlled processing of data objects based on conformity measurements
US9921731B2 (en) 2014-11-03 2018-03-20 Cerner Innovation, Inc. Duplication detection in clinical documentation
US10042837B2 (en) 2014-12-02 2018-08-07 International Business Machines Corporation NLP processing of real-world forms via element-level template correlation
US9928284B2 (en) * 2014-12-31 2018-03-27 Zephyr Health, Inc. File recognition system and method
US10950329B2 (en) 2015-03-13 2021-03-16 Mmodal Ip Llc Hybrid human and computer-assisted coding workflow
CN105045928A (en) * 2015-08-27 2015-11-11 北京金山安全软件有限公司 To-be-cleaned data display method and device and electronic equipment
CN106815268A (en) * 2015-12-01 2017-06-09 中广核工程有限公司 The structuring processing method and system of magnanimity destructuring e-file
EP3449452B1 (en) 2016-04-29 2022-06-29 Nchain Holdings Limited Implementing logic gate functionality using a blockchain
GB201607477D0 (en) * 2016-04-29 2016-06-15 Eitc Holdings Ltd A method and system for controlling the performance of a contract using a distributed hash table and a peer to peer distributed ledger
US11188864B2 (en) * 2016-06-27 2021-11-30 International Business Machines Corporation Calculating an expertise score from aggregated employee data
AU2017320475B2 (en) 2016-09-02 2022-02-10 FutureVault Inc. Automated document filing and processing methods and systems
US10438083B1 (en) * 2016-09-27 2019-10-08 Matrox Electronic Systems Ltd. Method and system for processing candidate strings generated by an optical character recognition process
CN114528369A (en) * 2016-12-21 2022-05-24 伊姆西Ip控股有限责任公司 Method and device for creating index
WO2018136417A1 (en) 2017-01-17 2018-07-26 Mmodal Ip Llc Methods and systems for manifestation and transmission of follow-up notifications
US10204082B2 (en) 2017-03-31 2019-02-12 Dropbox, Inc. Generating digital document content from a digital image
US11899632B1 (en) 2017-04-28 2024-02-13 Verato, Inc. System and method for secure linking and matching of data elements across independent data systems
US11907187B1 (en) * 2017-04-28 2024-02-20 Verato, Inc. Methods and systems for facilitating data stewardship tasks
US10740365B2 (en) * 2017-06-14 2020-08-11 International Business Machines Corporation Gap identification in corpora
US10872105B2 (en) * 2017-10-11 2020-12-22 Adobe Inc. Method to identify and extract fragments among large collections of digital documents using repeatability and semantic information
US11475209B2 (en) * 2017-10-17 2022-10-18 Handycontract Llc Device, system, and method for extracting named entities from sectioned documents
WO2019089888A1 (en) 2017-11-01 2019-05-09 Walmart Apollo, Llc Systems and methods for dynamic hierarchical metadata storage and retrieval
CN107886948A (en) * 2017-11-16 2018-04-06 百度在线网络技术(北京)有限公司 Voice interactive method and device, terminal, server and readable storage medium storing program for executing
US11282596B2 (en) 2017-11-22 2022-03-22 3M Innovative Properties Company Automated code feedback system
CN108363729B (en) * 2018-01-12 2021-01-26 中国平安人寿保险股份有限公司 Character string comparison method and device, terminal equipment and storage medium
US10482540B2 (en) * 2018-02-02 2019-11-19 Accenture Global Solutions Limited Data translation
JP2019197321A (en) * 2018-05-08 2019-11-14 京セラドキュメントソリューションズ株式会社 Image processing apparatus and image forming apparatus
US11269934B2 (en) 2018-06-13 2022-03-08 Oracle International Corporation Regular expression generation using combinatoric longest common subsequence algorithms
US11941018B2 (en) 2018-06-13 2024-03-26 Oracle International Corporation Regular expression generation for negative example using context
US11354305B2 (en) 2018-06-13 2022-06-07 Oracle International Corporation User interface commands for regular expression generation
US11580166B2 (en) 2018-06-13 2023-02-14 Oracle International Corporation Regular expression generation using span highlighting alignment
US10997192B2 (en) * 2019-01-31 2021-05-04 Splunk Inc. Data source correlation user interface
US11631266B2 (en) 2019-04-02 2023-04-18 Wilco Source Inc Automated document intake and processing system
US10754638B1 (en) 2019-04-29 2020-08-25 Splunk Inc. Enabling agile functionality updates using multi-component application
US11151125B1 (en) 2019-10-18 2021-10-19 Splunk Inc. Efficient updating of journey instances detected within unstructured event data
US11170045B2 (en) 2019-12-06 2021-11-09 Adp, Inc. Method and system for interactive search indexing
CN110928725B (en) * 2019-12-17 2021-06-25 西安电子科技大学 Matrix representation-based multiple permutation code construction and decoding method in flash memory
US11604825B2 (en) * 2020-07-13 2023-03-14 Nice Ltd. Artificial intelligence model for predicting playback of media data
US11741131B1 (en) 2020-07-31 2023-08-29 Splunk Inc. Fragmented upload and re-stitching of journey instances detected within event data
TWI765422B (en) * 2020-11-20 2022-05-21 全友電腦股份有限公司 Data capturing method, template generating method and non-transitory computer readable storage medium
US11616744B2 (en) 2021-07-29 2023-03-28 Intuit Inc. Context-dependent message extraction and transformation
US20230035551A1 (en) * 2021-07-29 2023-02-02 Intuit Inc. Multiple source audit log generation
US11809390B2 (en) 2021-07-29 2023-11-07 Intuit Inc. Context-dependent event cleaning and publication
US11734318B1 (en) 2021-11-08 2023-08-22 Servicenow, Inc. Superindexing systems and methods
US20230185954A1 (en) * 2021-12-15 2023-06-15 Bank Of America Corporation Transmission of Sensitive Data in a Communication Network
CN117573704B (en) * 2024-01-17 2024-04-12 上海合见工业软件集团有限公司 Method, device, equipment and medium for indexing composite document of EDA software

Family Cites Families (80)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5450598A (en) * 1985-12-27 1995-09-12 Xerox Corporation Finite state machine data storage where data transition is accomplished without the use of pointers
US5488719A (en) * 1991-12-30 1996-01-30 Xerox Corporation System for categorizing character strings using acceptability and category information contained in ending substrings
US5511159A (en) * 1992-03-18 1996-04-23 At&T Corp. Method of identifying parameterized matches in a string
US5606690A (en) * 1993-08-20 1997-02-25 Canon Inc. Non-literal textual search using fuzzy finite non-deterministic automata
JP2683870B2 (en) * 1994-05-23 1997-12-03 日本アイ・ビー・エム株式会社 Character string search system and method
US5752051A (en) * 1994-07-19 1998-05-12 The United States Of America As Represented By The Secretary Of Nsa Language-independent method of generating index terms
US5845255A (en) * 1994-10-28 1998-12-01 Advanced Health Med-E-Systems Corporation Prescription management system
US5757959A (en) * 1995-04-05 1998-05-26 Panasonic Technologies, Inc. System and method for handwriting matching using edit distance computation in a systolic array processor
US5619199A (en) * 1995-05-04 1997-04-08 International Business Machines Corporation Order preserving run length encoding with compression codeword extraction for comparisons
US5778361A (en) * 1995-09-29 1998-07-07 Microsoft Corporation Method and system for fast indexing and searching of text in compound-word languages
JPH09198398A (en) * 1996-01-16 1997-07-31 Fujitsu Ltd Pattern retrieving device
US5884033A (en) * 1996-05-15 1999-03-16 Spyglass, Inc. Internet filtering system for filtering data transferred over the internet utilizing immediate and deferred filtering actions
US6104834A (en) * 1996-08-01 2000-08-15 Ricoh Company Limited Matching CCITT compressed document images
US6032121A (en) * 1997-05-15 2000-02-29 International Business Machines Corporation Method for proactive planning
JP3143079B2 (en) * 1997-05-30 2001-03-07 松下電器産業株式会社 Dictionary index creation device and document search device
US5956721A (en) * 1997-09-19 1999-09-21 Microsoft Corporation Method and computer program product for classifying network communication packets processed in a network stack
US6092065A (en) * 1998-02-13 2000-07-18 International Business Machines Corporation Method and apparatus for discovery, clustering and classification of patterns in 1-dimensional event streams
JP3692764B2 (en) * 1998-02-25 2005-09-07 株式会社日立製作所 Structured document registration method, search method, and portable medium used therefor
US6047283A (en) * 1998-02-26 2000-04-04 Sap Aktiengesellschaft Fast string searching and indexing using a search tree having a plurality of linked nodes
JP3696731B2 (en) * 1998-04-30 2005-09-21 株式会社日立製作所 Structured document search method and apparatus, and computer-readable recording medium recording a structured document search program
US6178417B1 (en) * 1998-06-29 2001-01-23 Xerox Corporation Method and means of matching documents based on text genre
US6915254B1 (en) 1998-07-30 2005-07-05 A-Life Medical, Inc. Automatically assigning medical codes using natural language processing
US6240409B1 (en) * 1998-07-31 2001-05-29 The Regents Of The University Of California Method and apparatus for detecting and summarizing document similarity within large document sets
US6635088B1 (en) * 1998-11-20 2003-10-21 International Business Machines Corporation Structured document and document type definition compression
US6295529B1 (en) * 1998-12-24 2001-09-25 Microsoft Corporation Method and apparatus for indentifying clauses having predetermined characteristics indicative of usefulness in determining relationships between different texts
JP3696745B2 (en) * 1999-02-09 2005-09-21 株式会社日立製作所 Document search method, document search system, and computer-readable recording medium storing document search program
US6678681B1 (en) * 1999-03-10 2004-01-13 Google Inc. Information extraction from a database
US6782505B1 (en) * 1999-04-19 2004-08-24 Daniel P. Miranker Method and system for generating structured data from semi-structured data sources
EP1049030A1 (en) * 1999-04-28 2000-11-02 SER Systeme AG Produkte und Anwendungen der Datenverarbeitung Classification method and apparatus
US6438543B1 (en) * 1999-06-17 2002-08-20 International Business Machines Corporation System and method for cross-document coreference
US6397224B1 (en) * 1999-12-10 2002-05-28 Gordon W. Romney Anonymously linking a plurality of data records
JP2001202466A (en) * 2000-01-18 2001-07-27 Hitachi Ltd Slip type discriminator
WO2001077898A1 (en) * 2000-04-04 2001-10-18 Globalscape, Inc. Method and system for conducting a full text search on a client system by a server system
US6618724B1 (en) * 2000-04-17 2003-09-09 Sun Microsystems, Inc. Human-natural string compare for filesystems
US7251665B1 (en) * 2000-05-03 2007-07-31 Yahoo! Inc. Determining a known character string equivalent to a query string
AU2001264928A1 (en) 2000-05-25 2001-12-03 Kanisa Inc. System and method for automatically classifying text
US6718325B1 (en) * 2000-06-14 2004-04-06 Sun Microsystems, Inc. Approximate string matcher for delimited strings
US6757675B2 (en) * 2000-07-24 2004-06-29 The Regents Of The University Of California Method and apparatus for indexing document content and content comparison with World Wide Web search service
US7328211B2 (en) * 2000-09-21 2008-02-05 Jpmorgan Chase Bank, N.A. System and methods for improved linguistic pattern matching
AUPR082400A0 (en) * 2000-10-17 2000-11-09 Telstra R & D Management Pty Ltd An information retrieval system
US20020103811A1 (en) 2001-01-26 2002-08-01 Fankhauser Karl Erich Method and apparatus for locating and exchanging clinical information
WO2002065286A2 (en) * 2001-02-12 2002-08-22 Lto Limited Client software enabling a client to run a network based application
US6931418B1 (en) * 2001-03-26 2005-08-16 Steven M. Barnes Method and system for partial-order analysis of multi-dimensional data
US6920448B2 (en) * 2001-05-09 2005-07-19 Agilent Technologies, Inc. Domain specific knowledge-based metasearch system and methods of using
US6687697B2 (en) * 2001-07-30 2004-02-03 Microsoft Corporation System and method for improved string matching under noisy channel conditions
US6868411B2 (en) * 2001-08-13 2005-03-15 Xerox Corporation Fuzzy text categorizer
US6980976B2 (en) * 2001-08-13 2005-12-27 Oracle International Corp. Combined database index of unstructured and structured columns
ES2375403T3 (en) * 2001-08-27 2012-02-29 BDGB Enterprise Software Sàrl A METHOD FOR THE AUTOMATIC INDEXATION OF DOCUMENTS.
US6802810B2 (en) * 2001-09-21 2004-10-12 Active Health Management Care engine
US7222297B2 (en) * 2002-01-14 2007-05-22 International Business Machines Corporation System and method for using XML to normalize documents
US7647320B2 (en) * 2002-01-18 2010-01-12 Peoplechart Corporation Patient directed system and method for managing medical information
US7257530B2 (en) * 2002-02-27 2007-08-14 Hongfeng Yin Method and system of knowledge based search engine using text mining
US6925467B2 (en) * 2002-05-13 2005-08-02 Innopath Software, Inc. Byte-level file differencing and updating algorithms
US7047235B2 (en) 2002-11-29 2006-05-16 Agency For Science, Technology And Research Method and apparatus for creating medical teaching files from image archives
US7233938B2 (en) * 2002-12-27 2007-06-19 Dictaphone Corporation Systems and methods for coding information
US7490116B2 (en) * 2003-01-23 2009-02-10 Verdasys, Inc. Identifying history of modification within large collections of unstructured data
US7627552B2 (en) * 2003-03-27 2009-12-01 Microsoft Corporation System and method for filtering and organizing items based on common elements
US7093231B2 (en) * 2003-05-06 2006-08-15 David H. Alderson Grammer for regular expressions
US7296011B2 (en) * 2003-06-20 2007-11-13 Microsoft Corporation Efficient fuzzy match for evaluating data records
EP1494116A1 (en) * 2003-07-01 2005-01-05 Amadeus S.A.S. Method and system for graphical interfacing
AU2003279999A1 (en) * 2003-10-21 2005-06-08 Nielsen Media Research, Inc. Methods and apparatus for fusing databases
US7698383B2 (en) * 2004-02-27 2010-04-13 Research In Motion Limited System and method for building component applications using metadata defined mapping between message and data domains
US7627567B2 (en) * 2004-04-14 2009-12-01 Microsoft Corporation Segmentation of strings into structured records
US7742997B1 (en) * 2004-04-23 2010-06-22 Jpmorgan Chase Bank, N.A. System and method for management and delivery of content and rules
JP4448537B2 (en) * 2004-04-26 2010-04-14 コダック グラフィック コミュニケーションズ カナダ カンパニー System and method for comparing documents containing graphic elements
US7840571B2 (en) * 2004-04-29 2010-11-23 Hewlett-Packard Development Company, L.P. System and method for information management using handwritten identifiers
WO2005109291A2 (en) * 2004-05-05 2005-11-17 Ims Health Incorporated Data record matching algorithms for longitudinal patient level databases
US20050278623A1 (en) * 2004-05-17 2005-12-15 Dehlinger Peter J Code, system, and method for generating documents
US20050278378A1 (en) * 2004-05-19 2005-12-15 Metacarta, Inc. Systems and methods of geographical text indexing
US7707169B2 (en) * 2004-06-10 2010-04-27 Siemens Corporation Specification-based automation methods for medical content extraction, data aggregation and enrichment
US20060080278A1 (en) * 2004-10-08 2006-04-13 Neiditsch Gerard D Automated paperless file management
US7359895B2 (en) * 2004-11-18 2008-04-15 Industrial Technology Research Institute Spiral string matching method
EP1846881A4 (en) * 2005-01-28 2009-08-26 United Parcel Service Inc Registration and maintenance of address data for each service point in a territory
US20060200464A1 (en) * 2005-03-03 2006-09-07 Microsoft Corporation Method and system for generating a document summary
US20060265357A1 (en) * 2005-04-26 2006-11-23 Potts Matthew P Method of efficiently parsing a file for a plurality of strings
WO2007011841A2 (en) * 2005-07-15 2007-01-25 Indxit Systems, Inc. Systems and methods for data indexing and processing
WO2008107997A1 (en) * 2007-03-08 2008-09-12 Fujitsu Limited Slip category identifying program, slip category identifying method and slip category identifying device
US20090116757A1 (en) * 2007-11-06 2009-05-07 Copanion, Inc. Systems and methods for classifying electronic documents by extracting and recognizing text and image features indicative of document categories
US20120041955A1 (en) * 2010-08-10 2012-02-16 Nogacom Ltd. Enhanced identification of document types
US20140122479A1 (en) * 2012-10-26 2014-05-01 Abbyy Software Ltd. Automated file name generation

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20210295033A1 (en) * 2020-03-18 2021-09-23 Fujifilm Business Innovation Corp. Information processing apparatus and non-transitory computer readable medium

Also Published As

Publication number Publication date
US20130110844A1 (en) 2013-05-02
US10474701B2 (en) 2019-11-12
US9754017B2 (en) 2017-09-05
US11361006B2 (en) 2022-06-14
WO2007011841A3 (en) 2007-12-06
CA2657212A1 (en) 2007-01-25
US20120096036A1 (en) 2012-04-19
CA2928051A1 (en) 2007-01-25
CA2975694C (en) 2020-12-08
US8370387B2 (en) 2013-02-05
US20070013968A1 (en) 2007-01-18
US20150149488A1 (en) 2015-05-28
US8954470B2 (en) 2015-02-10
US20220318282A1 (en) 2022-10-06
CA2657212C (en) 2017-02-28
US7860844B2 (en) 2010-12-28
US20070013967A1 (en) 2007-01-18
CA3074633A1 (en) 2007-01-25
US11947576B2 (en) 2024-04-02
US20200050615A1 (en) 2020-02-13
WO2007011841A2 (en) 2007-01-25
CA2928051C (en) 2018-07-24
US20170357711A1 (en) 2017-12-14
CA3074633C (en) 2022-11-08
US8112441B2 (en) 2012-02-07

Similar Documents

Publication Publication Date Title
US11947576B2 (en) Systems and methods for facilitating improved automated document indexing utilizing manual indexing input
US9262584B2 (en) Systems and methods for managing a master patient index including duplicate record detection
US11036808B2 (en) System and method for indexing electronic discovery data
US8315997B1 (en) Automatic identification of document versions
US10572461B2 (en) Systems and methods for managing a master patient index including duplicate record detection
US8671112B2 (en) Methods and apparatus for automated image classification
US20200005329A1 (en) Unique documents determination
US20080147642A1 (en) System for discovering data artifacts in an on-line data object
US20080147641A1 (en) Method for prioritizing search results retrieved in response to a computerized search query
US20080147588A1 (en) Method for discovering data artifacts in an on-line data object
Dusetzina et al. An overview of record linkage methods
US20110023034A1 (en) Reducing processing overhead and storage cost by batching task records and converting to audit records
Durham et al. Private medical record linkage with approximate matching
US20130212118A1 (en) System for managing litigation history and methods thereof
US8819021B1 (en) Efficient and phased method of processing large collections of electronic data known as “best match first”™ for electronic discovery and other related applications
US11593439B1 (en) Identifying similar documents in a file repository using unique document signatures

Legal Events

Date Code Title Description
EEER Examination request Effective date: 20170807