WO2001011492A1 - System and method for language extraction and encoding - Google Patents
- Publication number
- WO2001011492A1 (PCT/US2000/021515)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- text data
- parsing
- referring
- structured
- segmenting
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/20—Natural language analysis
- G06F40/205—Parsing
- G06F40/211—Syntactic parsing, e.g. based on context-free grammar [CFG] or unification grammars
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/20—Natural language analysis
- G06F40/205—Parsing
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/20—Natural language analysis
- G06F40/253—Grammatical analysis; Style critique
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/20—Natural language analysis
- G06F40/279—Recognition of textual entities
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/20—Natural language analysis
- G06F40/279—Recognition of textual entities
- G06F40/289—Phrasal analysis, e.g. finite state techniques or chunking
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/30—Semantic analysis
- G06F40/35—Discourse or dialogue representation
Definitions
- Appendix A A microfiche appendix containing source code utilized in practicing an exemplary embodiment of the invention is included as part of the Specification and is hereinafter referred to as Appendix A.
- Appendix A includes a total of 5 microfiche and a total of 465 frames.
- This invention relates to the computerized processing of natural-language phrases used in specialized areas of expertise such as medicine, clinical sciences, genomics, etc. More particularly, the present invention is related to the extraction and encoding of information from natural-language text sources such as physician reports and technical and scientific literature.
BACKGROUND OF THE INVENTION
- RECIT uses syntactical information to recognize the structure of local phrases and interleaves phrase recognition with semantic knowledge in order to assemble semantically relevant groupings and representations. See Zweigenbaum et al., "A Multi-Lingual Architecture for Building a Normalized Conceptual Representation from Medical Language," Proceedings of the 19th Annual SCAMC. pp. 357-361 (1995).
- SPRUS, which was initially purely semantically driven, uses semantic information relating to words in a sentence along with expectations about findings, locations and conditions associated with the words. See, e.g., G. Hripcsak et al., "Unlocking Clinical Data from Narrative Reports," Ann. of Int.
- a special interest group has been formed to further promote the use of SGML and XML in the electronic patient records.
- HL7 SGML/XML Special Interest Group web site at http://www.mcis.duke.edu/standards/HL7/committees/sgml/.
- the group's effort involves specifications for embedding XML within the HL7 structure and for developing a model of medical documents to facilitate exchange of documents between users.
- the HL7 Document Patient Record Architecture http://www.mcis.duke.edu/standards/HL7/committee/sgml/WhitePapers/Prap.
- Another goal is to enable automated applications to process the documents after the document exchange has been made.
- Zweigenbaum also proposed the adoption of an "enriched-document" paradigm based on SGML and natural language processing to further the dissemination of applications that utilize natural language processing methodology. See P. Zweigenbaum et al., "From Text to Knowledge: a Unifying Document-oriented View of Analyzed Medical Language," Workshop on Medical Concept Representation and Natural Language Processing, IMIA WG6. pp. 21-29 (1997). A number of benefits of using a document-oriented model were discussed, including the ability to use annotated text as a valuable resource to further the development of language processing systems.
- Zweigenbaum also proposed embedding a conceptual graphical representation into each sentence of a document.
- a primary object is to provide a natural language processing system for extracting information from a natural language document input that can be easily adapted for use in a variety of areas of expertise by modifying, if necessary, one or more corresponding knowledge components.
- the natural language processing system of the present invention can be used for extracting medical/clinical data from physician reports and genomics-related information from electronic text sources.
- a preferred method for extracting information from natural language data includes basic steps here designated as phrase parsing and regularizing and, optionally, code selection. Further included, preferably, is a step of pre-processing prior to phrase parsing, and a step of output filtering.
- a structured output can be generated in the form of a printout, as a monitor display, as a database entry, or via the Internet, for example. Preferably, the structured output is then mapped back to the words in the original sentences of the text data input.
- one or several parameters are referred to.
- the parameters are associated with options. To choose an option, the appropriate value is assigned to the parameter.
- a parameter can have a value by default. Of particular importance is the inclusion of a parameter which is associated with the medical/clinical domain or sub-field of the input data. Other parameters may be associated with the level of parsing accuracy desired, whether code selection is desired, the type of filtering, or the format of the output.
- the method can be expressed in a high-level computer language such as Prolog, for example, for execution as a system on a suitable general-purpose computer.
- the method and the system will be referred to by the acronym MedLEE, short for Medical Language Extraction and Encoding. Further objects, features and advantages of the invention will become apparent from the following detailed description taken in conjunction with the accompanying figures showing illustrative embodiments of the invention.
- FIG. 1 is a block diagram of an information extraction system in accordance with a preferred embodiment of the present invention;
- FIG. 2 is a block diagram of a system or application having an interface for the MedLEE system of FIG. 1;
- FIG. 3 is a block diagram of an information extraction system in accordance with another preferred embodiment of the present invention;
- FIG. 4 is an example XML tagging scheme for address entry;
- FIG. 5 is an example document type definition (DTD) corresponding to the tagging scheme of FIG. 4;
- FIG. 6 is an example of a structured component in accordance with the information extraction system of FIG. 3;
- FIG. 7 is an example of a tagged text component in accordance with the information extraction system of FIG. 3;
- FIG. 8 is another example of a structured component in accordance with the information extraction system of FIG. 3;
- FIG. 9 is a sample output wherein information extracted from a document input is highlighted and viewable via a web browser in accordance with the information extraction system of FIG. 3.
- a microfiche appendix attached hereto includes a printout of computer source code for the MedLEE computer program.
- FIG. 1 is a block diagram of an information extraction system in accordance with a preferred embodiment of the present invention.
- the information extraction system of FIG. 1, known as MedLEE, is designed for use as a general processor within the medical domain, e.g., radiography, mammography, neuroradiology, pathology, and electrocardiography. Although used for language extraction in the medical/clinical context, MedLEE can be adapted for use in other domains such as genomics.
- An additional preferred embodiment of MedLEE, shown in FIG. 3 and written in Quintus Prolog for the Unix or Windows operating systems, is described in detail below.
- the Appendix below provides both "stand-alone" and "server" versions, ml_parser.pl and ml_server.pl, respectively, of the information extraction system of FIG.
- the stand-alone version ml_parser.pl is a top level control program that establishes input and output data streams, writes error messages to the user, and calls the main program radrec.pl.
- the server version ml_server.pl is similar to the stand-alone except that it is designed to begin processing upon receiving a request from a client server.
- Medleefunc.pl is a function that allows another process to compile the MedLEE program into a host system by using a function call.
- a natural-language phrase included in a text document is understood as a delimited string comprising natural-language terms or words.
- the string is computer-readable as obtained, e.g., from a pre-existing database, or from keyboard input, optical scanning of typed or handwritten text, or processed voice input.
- the delimiter may be a period, a semicolon, an end-of-message signal, a new-paragraph signal, or any other suitable symbol recognizable for this purpose.
- the terms are separated by another delimiter, e.g., a blank or another suitable symbol.
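The delimiter scheme described above can be sketched in a few lines. This is a minimal Python illustration, not the Prolog code of Appendix A; the delimiter set (period and semicolon for phrases, blank for terms) and the function name `split_phrases` are assumptions chosen for the example.

```python
import re

# Assumed delimiters: the text names a period and a semicolon as phrase
# delimiters and a blank as a term delimiter, among other possibilities.
PHRASE_DELIMITERS = r"[.;]"
TERM_DELIMITER = " "


def split_phrases(text):
    """Split text into phrases, then split each phrase into its terms."""
    phrases = []
    for raw in re.split(PHRASE_DELIMITERS, text):
        terms = [t for t in raw.strip().split(TERM_DELIMITER) if t]
        if terms:  # drop empty fragments left by trailing delimiters
            phrases.append(terms)
    return phrases


print(split_phrases("heart is enlarged; no pleural effusion."))
```

A splitter this naive would also break abbreviations such as "P.E." and dates written with periods, which is one reason a real system resolves abbreviations against a lexicon before fixing phrase boundaries.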
- terms in a natural-language phrase are classified, e.g., as referring to a body part, a body location, a clinical condition or a degree of certainty of a clinical condition, and the relationships between the terms are established and represented in a standard form.
- for example, in the phrase "moderate cardiac enlargement", "moderate" is related to "enlargement" and "cardiac" is also related to "enlargement".
- similarly, "protein X" and "gene Y" are related to the action "activate" in the phrase "X activates Y".
- parsing can be domain or sub-domain specific in accordance with the value of a domain parameter used by the system. Depending on the value of the domain parameter, the appropriate rules can be referred to in parsing by the system.
- while parsing may be based primarily on semantics or meaning, use of syntactic or grammatical information is not precluded.
- Regularizing involves bringing together terms which may be discontiguous in a natural-language phrase but which belong together conceptually as a structured word term. Regular forms or composites ("compositional mappings") are obtained. Regularizing may involve reference to a separate knowledge base. For example, from each of the phrases "heart is enlarged", "enlarged heart", "heart shows enlargement" and "cardiac enlargement", a regularizer can generate "enlarged heart".
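One way to picture regularization is as a two-step lookup: reduce each word to a canonical concept, then map the resulting set of concepts to a composite term. The sketch below is a simplified Python illustration under that assumption; the tables `CANONICAL` and `COMPOSITIONS` are hypothetical stand-ins for the knowledge base and do not reproduce maps.pl or mmaps.pl.

```python
# Hypothetical knowledge: word -> canonical concept.
CANONICAL = {
    "heart": "heart",
    "cardiac": "heart",
    "enlarged": "enlarged",
    "enlargement": "enlarged",
}

# Hypothetical compositional mapping: set of concepts -> regular form.
COMPOSITIONS = {
    frozenset({"enlarged", "heart"}): "enlarged heart",
}


def regularize(phrase):
    """Return the regular form for a phrase, or None if no mapping applies."""
    concepts = frozenset(
        CANONICAL[word] for word in phrase.lower().split() if word in CANONICAL
    )
    return COMPOSITIONS.get(concepts)


for phrase in ("heart is enlarged", "enlarged heart",
               "heart shows enlargement", "cardiac enlargement"):
    print(phrase, "->", regularize(phrase))
```

Using an unordered set of concepts is what lets discontiguous or reordered words reach the same composite; the actual system works on parsed frames rather than bare word sets.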
- in code selection, which is optional, a common, unique vocabulary term or code is assigned to each regular term by reference to yet another knowledge base, which may also be chosen to be domain specific.
- the term "cystic disease" has a different meaning as compared with the domain of mammography.
- FIG. 1 shows a preprocessor module 11 by which natural-language input text is received.
- the preprocessor uses the lexicon knowledge base 101 and handles abbreviations, which may be domain dependent.
- the preprocessor 11 thus performs lexical lookup to identify and categorize multi-word and single-word phrases within each sentence.
- the output of this component consists of a list of word positions where each position is associated with a word or multi-word phrase in the report. For example, assuming that the sentence "spleen appears to be moderately enlarged" is at the beginning of the report, it would be represented as the list where position 1 is associated with "spleen", position 2 with the multi-word phrase "appears to be", position 5 with "moderately", and position 6 with "enlarged". The remainder of the list of word positions would be associated with the remaining words in the report.
- the preprocessor refers to the proper knowledge base. For example, depending on the domain, the abbreviation "P.E." can be understood as physical examination or as pleural effusion. Also, the preprocessor determines phrase or sentence boundaries, and generates a list form for each phrase for further processing by the parser module 12.
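The position list produced by the preprocessor can be illustrated with a longest-match lexical lookup. The following Python sketch is an assumption about how such a lookup might work, not the method of Appendix A; the lexicon contents, the category names and the `max_len` limit are invented for the example.

```python
# Hypothetical lexicon: phrase -> semantic category.
LEXICON = {
    "spleen": "bodyloc",
    "appears to be": "certainty",
    "moderately": "degree",
    "enlarged": "problem",
}


def lookup_positions(sentence, lexicon, max_len=4):
    """Return (position, phrase) pairs, preferring the longest lexicon match.

    Positions are 1-based word indices of the first word of each phrase.
    Words not found in the lexicon are kept as single-word entries.
    """
    words = sentence.lower().split()
    result = []
    i = 0
    while i < len(words):
        for n in range(min(max_len, len(words) - i), 0, -1):
            candidate = " ".join(words[i:i + n])
            if candidate in lexicon or n == 1:
                result.append((i + 1, candidate))
                i += n
                break
    return result


print(lookup_positions("spleen appears to be moderately enlarged", LEXICON))
```

For the sample sentence this yields positions 1, 2, 5 and 6, matching the list described in the text.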
- the second component of the MedLEE system is the parser. It utilizes the grammar and categories assigned to the phrases of a sentence to recognize well-formed syntactic and semantic patterns in the sentence and to generate intermediate forms.
- the target form generated by the parser for the sample phrase "spleen is moderately enlarged" would be the frame:
- the value of each frame is a number representing the position of the corresponding phrase in the report.
- the number will be replaced by an output form that is the canonical output specified by the lexical entry of the word or phrase in that position and a reference to the position in the text.
- the parser proceeds by starting at the beginning of the sentence position list and following the grammar rules. When a semantic or syntactic category is reached in the grammar, the lexical item corresponding to the next available unmatched position is obtained and its corresponding lexical definition is checked to see whether or not it matches the grammar category.
- the parser module 12 uses the lexicon 101, and a grammar module
- sub-phrase parsing can be used to advantage where highest accuracy is not required.
- one or several attempts can be made to parse a portion of the phrase for obtaining useful information in spite of some possible loss of information. For example, in the phrase "spleen was enlarged after going to the movies", the words "spleen was enlarged" are processed and the remaining words are skipped.
- the next component of the natural language processing system performs phrase regularization. It first replaces each position number with the canonical output form specified in the lexical definition of the phrase associated with its position in the report. It also adds a new modifier frame idref for each position number that is replaced. For example, the sample output form shown above would be changed to:
- This stage also composes multi-word phrases, i.e., compositional mappings, which are separated in the documents.
- the individual components of the multi-word term "enlarged spleen" are separated.
- "Spleen" and "enlarged" are composed during phrase regularization and mapped into the target form "enlarged spleen" so that the output at this stage would be: [problem,enlarged spleen,[idref,[6,1]],[bodyloc,spleen,[idref,1]],
- maps.pl and mmaps.pl are used by the regularizer 13.
- maps.pl is a knowledge base of "standard", automatically generated compositional mappings, and mmaps.pl is a knowledge base of manually generated compositional mappings.
- the phrase regularizer 13 composes regular terms as described above.
- the filter module 14 deletes information on the basis of parameter settings. For example, a parameter can be set to call for removal of negative findings.
- a preferred embodiment of the filter module 14 is shown by the removefromtarg routine of the radrec.pl file provided in the Appendix.
- the next component performs the encoding. This consists of mapping the canonical forms into controlled vocabulary terms if applicable. In this example, we assume the controlled term for "enlarged spleen" is "splenomegaly", the controlled term for "moderate" is "moderate degree", and the controlled term for "appears" is "moderate certainty". The target form would be translated into:
- the encoder module 15 uses a table of codes 104 to translate the regularized forms into unique concepts which are compatible with a clinical controlled vocabulary.
- a preferred embodiment of the encoder module 15 is shown by the computecode routine of the newform.pl file provided in the Appendix.
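The encoding step amounts to a table lookup applied to every value in a frame. The Python sketch below is illustrative only and is not the computecode routine; it assumes a frame is a nested list of the form [type, value, modifier, ...], and the table `CODES` holds just the three sample terms named in the text.

```python
# Sample table of codes, taken from the example terms in the text.
CODES = {
    "enlarged spleen": "splenomegaly",
    "moderate": "moderate degree",
    "appears": "moderate certainty",
}


def encode(frame, codes):
    """Replace each frame value with its controlled term, if one exists.

    A frame is assumed to be [type, value, *modifier_frames]. Values with
    no entry in the table (or non-string values) are left unchanged.
    """
    kind, value, *modifiers = frame
    if isinstance(value, str):
        value = codes.get(value, value)
    return [kind, value] + [encode(m, codes) for m in modifiers]


frame = ["problem", "enlarged spleen",
         ["degree", "moderate"],
         ["certainty", "appears"],
         ["bodyloc", "spleen"]]
print(encode(frame, CODES))
```

Leaving unmapped values untouched reflects the "if applicable" wording: terms without a controlled-vocabulary entry, such as "spleen" here, pass through unchanged.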
- tagger 16 of FIG. 3 is used to "tag" the original text data with a structured data component.
- system of the present invention will generate the following output for the phrase "spleen is moderately enlarged" discussed above:
- FIG. 2 shows an interface module 21, and the MedLEE program 22 of FIG. 1.
- the interface module 21 may be domain-specific, and it may serve, e.g., to separate formatted sections from non-formatted sections in a report. Also, the interface 21 may serve to pass chosen parameter values to the MedLEE system 22 and to pass output from the MedLEE system. For example, such an interface can be designed for communication over the World-Wide Web or a local network, for input to or output from MedLEE.
- each module is software-implemented and stored in random-access memory of a suitable computer, e.g., a work-station computer.
- the software can be in the form of executable object code, obtained, e.g., by compiling from source code. Source code interpretation is not precluded.
- Source code can be in the form of sequence-controlled instructions as in Fortran, Pascal or "C", for example.
- a rule-based system can be used such as Prolog, where suitable sequencing is chosen by the system at run-time.
- Process_sents, together with get_inputsents, process_sects and outputresults, reads in an input stream, processes sections of the input stream according to parameter settings, and produces output according to the settings.
- parameters supplied to Process_sents are the following: Exam (specifying the sub-domain in a medical/clinical domain), Mode (specifying the parsing mode), Amount (specifying the type of filtering), Type (specifying the output format) and Protocol (html or plain).
- Process_sents is called by another predicate, after user-specified parameters have been processed.
- error recovery block 17 utilizes various error recovery techniques in order to achieve at least a partial analysis of the phrase. These error recovery techniques include, for example, segmenting a sentence or phrase at pre-defined locations and processing the corresponding sentence portions or sub-phrases. Each recovery technique is likely to increase sensitivity but decrease specificity and precision.
- Sensitivity is the performance measure equal to the true positive information rate of the natural language system, i.e., the ratio of the amount of information actually extracted by the natural language processing system to the amount of information that should have been extracted.
- Specificity is the performance measure equal to the true negative information rate of the system, i.e., the ratio of the amount of information not extracted to the amount of information that should not have been extracted. In processing a report, the most specific mode is attempted first, and successive less specific modes are used only if needed.
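The two measures can be written as simple ratios. The Python sketch below uses the conventional true/false positive/negative counts as an assumed way of quantifying "amount of information"; the text itself does not prescribe how that amount is counted.

```python
def sensitivity(true_pos, false_neg):
    """Information extracted / information that should have been extracted."""
    return true_pos / (true_pos + false_neg)


def specificity(true_neg, false_pos):
    """Information correctly not extracted / information that should not
    have been extracted."""
    return true_neg / (true_neg + false_pos)


# Example: 8 of 10 expected findings extracted, 1 spurious finding among
# 10 items that should have been left out.
print(sensitivity(true_pos=8, false_neg=2))
print(specificity(true_neg=9, false_pos=1))
```

A recovery technique that extracts more from a sentence tends to raise the first ratio while risking extra false positives, which lowers the second.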
- the parser 12 of FIGS. 1 and 3 includes five parsing modes, Modes 1 through 5, for parsing sentences or phrases. Nominally, the parser 12 is configured to first select Mode 1 and then Modes 2 through 5 successively until parsing is completed. With Mode 1, the initial segment is the entire sentence and all words in the segment must be defined. This mode requires a well-formed pattern for the complete segment. Mode 2 requires that the sentence or phrase be segmented at certain types of words or phrases, e.g., "...consistent with...".
- Mode 3 requires a well-formed pattern for the "largest" prefix of the segment, i.e., usually a prefix occurring in the beginning of a sentence.
- Mode 3 is useful when a sentence contains a pattern at the end which is not included in the grammar but a beginning portion that is included. For example, in the phrase "severe pain in arm developed on the fifth floor", the ending "on the fifth floor" will be skipped and "severe pain in arm developed" will be parsed.
- Mode 4 requires that undefined words be skipped and an analysis be attempted in accordance with Mode 1. Mode 4 processing is useful where there are typographical errors and unknown words. For example, in the phrase "a lxgre suspicious calcification was seen", the term "lxgre" will be skipped but the remainder of the phrase will be parsed.
- Mode 5 first requires that the first word or phrase in the segment associated with a primary finding (e.g., "infiltrate", "mastectomy", "penicillin", etc.) be found. Next, an attempt is made to recognize the phrase starting with the leftmost recognizable modifier. For example, in "during severe pain in arm up to the fifth floor", the phrase "severe pain in arm" will be parsed and the remaining words will be skipped. If no analysis is found, recognition is retried at the next modifier to the right.
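The fallback from stricter to more permissive modes can be sketched as a cascade of attempts. The Python code below is a toy illustration covering Modes 1, 3 and 4 only; Modes 2 and 5 are omitted. The lexicon and the `well_formed` test are assumptions that stand in for the real grammar: a segment counts as well formed here if every word is defined and at least one word names a finding.

```python
# Toy lexicon: word -> semantic category (assumed for illustration).
LEXICON = {
    "a": "det", "severe": "degree", "pain": "problem", "in": "prep",
    "arm": "bodyloc", "developed": "status", "suspicious": "descriptor",
    "calcification": "problem", "was": "verb", "seen": "verb",
}


def well_formed(words):
    """Stand-in for the grammar: all words defined, one names a finding."""
    return bool(words) and all(w in LEXICON for w in words) and any(
        LEXICON[w] == "problem" for w in words)


def mode1(words):
    """Whole segment must be well formed."""
    return words if well_formed(words) else None


def mode3(words):
    """Longest well-formed prefix; trailing words are skipped."""
    for end in range(len(words), 0, -1):
        if well_formed(words[:end]):
            return words[:end]
    return None


def mode4(words):
    """Skip undefined words, then analyze as in Mode 1."""
    return mode1([w for w in words if w in LEXICON])


def parse(sentence):
    """Try modes from most to least specific; return (mode, parsed words)."""
    words = sentence.lower().split()
    for number, attempt in ((1, mode1), (3, mode3), (4, mode4)):
        result = attempt(words)
        if result:
            return number, result
    return None, []


print(parse("severe pain in arm developed on the fifth floor"))
print(parse("a lxgre suspicious calcification was seen"))
```

Returning the mode number alongside the result mirrors the idea that the mode used can later serve as a rough indicator of how reliable the interpretation is.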
- Setargs sets arguments or parameter values based on user input or by default.
- Removefromtarg filters formatted output by leaving only positive clinical information and by removing negative findings and possibly findings associated with past information from the formatted output. If an input parameter is pos, only negative findings are removed; if the parameter is pac, both negative findings and findings associated with past information are removed. Any number of different filters can be included as required.
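Parameter-driven filtering can be pictured as a table of predicates keyed by the filter setting. The Python sketch below is not the removefromtarg routine; it implements only the pos setting, and the finding representation (a dictionary with a boolean `negated` field) is an assumption made for the example.

```python
# Filter table: setting -> predicate that decides which findings are kept.
# Only "pos" is sketched here; further settings could be added as entries.
FILTERS = {
    "pos": lambda finding: not finding["negated"],
}


def filter_findings(findings, amount=None):
    """Keep the findings allowed by the filter named in `amount`.

    With no filter (or an unknown name) all findings are returned unchanged.
    """
    keep = FILTERS.get(amount)
    if keep is None:
        return list(findings)
    return [f for f in findings if keep(f)]


findings = [
    {"problem": "enlarged spleen", "negated": False},
    {"problem": "pleural effusion", "negated": True},
]
print(filter_findings(findings, amount="pos"))
```

Keeping the filters in a table reflects the remark that any number of different filters can be included as required.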
- Write_structured generates the structured component of the output depending on the output format specified by the user.
- Writelines produces one line per finding in list format, whereas writenested generates all findings at once.
- Writeindentform and writeindentform2 produce output in indented form.
- writesgmlfroms generates output in XML form, and writetabular generates output in tabular form.
- the routine markupsents envelops the original sentence with tags so that the clinical information is highlighted.
- the markup_text routine is used to add identifier tags to the original text. Different types of information can be highlighted in different colors by using an appropriate Internet browser program such as
- the outputhl7 routine is used to convert the MedLEE output to an appropriate form for storage in a database (xformtodb) and to write the MedLEE output in an HL7 coded format.
- This process uses synonym knowledge and an encoding knowledge base.
- the output generated by the MedLEE program is a frame-based representation wherein each frame specifies the informational type, value, and modifier slots (which are also frames). See C. Friedman, J. Starren, S. Johnson, "Architectural Requirements For a Multipurpose Natural Language Processor in The Clinical Environment," Proceedings of SCAMC. pp. 347-351 (1995).
- a corresponding intermediate output is a frame denoting a problem, which has the value enlarged; in addition, there are degree and body location modifiers with the values moderate and spleen respectively:
- the intermediate output undergoes several mappings before the corresponding structured word term (structured output) is created.
- Compositional mapping, for example, is required in order to compose components of multi-word phrases that are separated in the original text; another type of mapping is necessary to translate target terms into controlled vocabulary concepts.
- a final mapping is generally performed in order to translate the frame format to the final structured output format of the MedLEE program.
- FIG. 3 shows a block diagram of a second embodiment of the information extraction (MedLEE) program of FIG. 1.
- the modified program 300 includes a tagger routine 16 for linking the structured output described previously with respect to FIG. 1 to the corresponding words in the original sentences of the text data input.
- the tagger 16 utilizes markup languages, such as Hypertext Markup Language (HTML) and Extensible Markup Language (XML), which are derived from Standard Generalized Markup Language (SGML) and which are used for rendering documents for the World Wide Web.
- widespread adoption of markup languages is evidenced by: the Text Encoding Initiative (TEI), which uses SGML to encode literature; Chemical Markup Language (CML), which involves documentation of chemical compounds using SGML; and Open Financial Exchange (OFE), which is an SGML standard format for interchange of financial transactions.
- the tagging schema disclosed herein integrates content-centric and document-centric approaches in that salient clinical information is represented in a structured XML form that contains references to identifiers in the unstructured report where the original words and phrases are assigned unique identifiers.
- This design is optimal both for searching because it is not dependent on the ordering of the phrases within the text, and for rendering text to users because the structured XML form contains references to appropriate portions of the original text.
- XML is a subset of SGML that is computationally less complex than SGML, and therefore simpler and more efficient to process.
- XML is a language that provides the ability to add additional elements of information, i.e., tags, to textual documents so as to provide documents that have machine-independent formats.
- Documents with such formats can be easily manipulated across a variety of different computing platforms, and are structured using varying levels of complexity, i.e., sections, paragraphs, sentences, phrases, etc.
- FIG. 4 below is an example of a document which represents an address using XML tags.
- the street, city, state and zipcode tags are nested within an address tag
- the number and street_name tags are nested within the street tag.
- Having an address in this form provides a way to manipulate documents with address tags in different ways. For example, documents with a specified zip code and street name can be retrieved easily by searching for the text enclosed by the zipcode and street_name tags.
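Retrieval by tag content can be illustrated with a short example. FIG. 4 is not reproduced in this text, so the document below is a hypothetical reconstruction: only the tag names (address, street, number, street_name, city, state, zipcode) come from the description, while the `addresses` wrapper element and all values are invented. The sketch uses Python's standard `xml.etree.ElementTree` module.

```python
import xml.etree.ElementTree as ET

# Hypothetical document; tag names follow the description, values are made up.
DOC = """
<addresses>
  <address>
    <street><number>12</number><street_name>Main Street</street_name></street>
    <city>Springfield</city><state>NY</state><zipcode>10001</zipcode>
  </address>
  <address>
    <street><number>7</number><street_name>Oak Avenue</street_name></street>
    <city>Springfield</city><state>NY</state><zipcode>10002</zipcode>
  </address>
</addresses>
"""


def find_addresses(root, zipcode, street_name):
    """Return address elements matching both the zip code and street name."""
    return [
        address for address in root.iter("address")
        if address.findtext("zipcode") == zipcode
        and address.findtext("street/street_name") == street_name
    ]


root = ET.fromstring(DOC)
for match in find_addresses(root, "10001", "Main Street"):
    print(match.findtext("street/number"), match.findtext("street/street_name"))
```

Because the search targets the text enclosed by specific tags, it does not depend on where in the document the address appears.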
- the structure of XML documents is specified using a DTD, which is a set of blueprints related to information about the organization of the document type and consists of specifications concerning the structure of the document. The DTD is used by an XML parser to ensure that a document is valid according to the DTD.
- FIGS. 5-7 are examples of a document tagging schema in accordance with a preferred embodiment of the present invention.
- the document tagging schema uses a document structure based on Extensible Markup Language (XML), a subset of Standard Generalized Markup Language (SGML), designed for ease of implementation and interoperability with SGML and HTML standards used by most Internet web browsers.
- the schema embeds a tagged, structured and encoded representation of the informational content of an original document within an enriched version of the original document.
- radiologists can enrich reports by mapping textual findings of references to regions of a digitized image.
- FIG. 5 shows, by way of example and not limitation, a (simplified) document structure or document type definition (DTD) of a clinical report (medleeOut) generated by the MedLEE computer program of FIG. 3. DTDs can be further customized as required depending upon the specific report being generated.
- the section element of the DTD includes two components: a structured component structured containing structured data, and a tagged textual element tt.
- the structured component provides a content-centric view of the report, and is essential for enabling reliable and efficient access to information in the document.
- the structured component also contains information that references corresponding textual portions of the report.
- the structured component itself includes one or more components corresponding to a primary finding called problem or procedure.
- the component problem in turn contains one or more components corresponding to modifiers of the findings, for example, certainty, degree, status, change, bodyloc, region, sid, and idref.
- the modifier components are also defined in the DTD; those having no nested structures are defined using the keyword EMPTY, e.g., the definition of sid which specifies a sentence identifier.
- tags representing the primary findings and modifiers also have attributes.
- problem has an attribute v which must be present
- the tagged textual element tt is also specified in FIG. 5. It provides a document-centric view of the report because it consists of the original report enriched with tags that delineate and identify textual elements sent (marking sentences) and #PCDATA which is the original textual data.
- the component sent consists of textual data, phrases phr, or undefined words undef.
- the component phr has an attribute id whose value is a unique identifier within the report.
- the idref attributes of the elements of the structured components correspond to the id attributes of the phrases.
- the idref attributes of the sid elements of the structured components correspond to the id attributes of the sentences (sent).
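The link between the structured component and the tagged text can be sketched by resolving each idref against the id attributes in tt. The fragment below is a hypothetical, heavily simplified instance and not the content of FIG. 6 or FIG. 7: the element names follow the description, but the identifier values, the "s1.1.1" sentence-identifier format and the placement of idref as an attribute are assumptions. The sketch uses Python's standard `xml.etree.ElementTree` module.

```python
import xml.etree.ElementTree as ET

# Hypothetical section; identifiers and their format are assumed.
SECTION = """
<section>
  <structured>
    <problem v="pain" idref="p2">
      <bodyloc v="abdomen" idref="p4"/>
      <sid idref="s1.1.1"/>
    </problem>
  </structured>
  <tt><sent id="s1.1.1"><phr id="p1">Intermittent</phr> <phr id="p2">pain</phr> in <phr id="p4">lower abdomen</phr> developed on 3/4/95.</sent></tt>
</section>
"""


def build_index(root):
    """Map each id in the tagged text to its element."""
    return {el.get("id"): el for el in root.find("tt").iter() if el.get("id")}


def resolve(element, index):
    """Return the original text an element's idref points to, or None."""
    target = index.get(element.get("idref"))
    return None if target is None else "".join(target.itertext())


root = ET.fromstring(SECTION)
index = build_index(root)
for el in root.find("structured").iter():
    if el.get("idref"):
        print(el.tag, el.get("v"), "->", resolve(el, index))
```

Note how the bodyloc value "abdomen" resolves to the original wording "lower abdomen", and how sid resolves to the whole sentence.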
- FIG. 6 shows an example of a structured component utilized by MedLEE for "tagging" the following input from a "History of Present Illness” section of a physician report: "Intermittent pain in lower abdomen developed on 3/4/95.
- the structured output includes two problem tags corresponding to the informational type problem.
- the first problem tag has the value "pain", itself having a reference to identifier p2 along with other modifiers, themselves also having their own values and identifiers.
- the second problem tag has the value "swelling" and reference identifier p13, the "swelling" value itself having the modifiers certainty, body location (bodyloc) and sentence identifier (sid).
- the sentence identifier sid includes a section number, paragraph number within the section, and the sentence number within the paragraph.
- the problem tag also has embedded tags which are modifiers.
- the bodyloc modifier has an attribute v whose value is "abdomen" and also an idref attribute.
- Tags that correspond to phrases in the original textual report have idref attributes. However some tags do not have an idref attribute because they do not correspond to a phrase in the original report but to contextual information added during parsing.
- parsemode specifies the method used to structure the information. The parse mode is a measure of accuracy of the output based on the mode used to interpret the sentence and obtain the structured form. Mode 1 is likely to be the most accurate interpretation whereas Mode 5 is likely to be the least accurate.
- the values of the v attribute are frequently the same as the corresponding words and phrases in the report.
- the value of the v attribute can be different from the corresponding phrase in the actual report because it corresponds to a controlled vocabulary term, e.g., "splenomegaly", which is different from the canonical textual form, e.g., "enlarged spleen".
- FIG. 7 shows a tagged text element tt corresponding to the text data input described in connection with FIG. 6.
- the tt component is the same as the original report except that it is enriched with tags that uniquely identify sentences and phrases.
- a tag sent notes the beginning of a new sentence and includes an attribute id whose value identifies the section number, paragraph number and sentence number of the sentence in the original report. This information is useful for certain applications. For example, discharge summaries in hospitals generally include a "History of Present Illness" section containing the chief complaint, and a "Hospital Course" section containing the discharge plan.
- sentences which are adjacent and in the same paragraph generally refer to the same body locations and time period, unless another body location or time period is explicitly stated.
- Writesgmlform writes structured output in XML form similar to the example shown in FIG. 6.
- Writeflats writes structured output in tabular form, which is convenient for importing into a database or spreadsheet.
- Markup_text writes the tagged textual portion similar to the example shown in FIG. 7.
- the tag phr denotes the beginning of a single or multi-word phrase of the report. It has an id attribute, whose value is a unique identifier of the phrase within the report. Phrases which are referenced in the structured component are shown in FIG. 7 and those that are not referenced are omitted. For example, the word "no" in the phrase "no evidence of" is preceded by the begin phr tag identified by "p11", and the word "of" is followed by the end phr tag. A sent element may also contain an undef tag, which surrounds words that are not found in the lexicon. This may prove useful for other applications, such as further training of the NLP system or identification of proper names.
- FIG. 8 shows a structured component for the phrase "the spleen and liver appear to be moderately enlarged".
- the values of the id attributes of the tag phr are based on the assumption that the phrase appears at the beginning of the report so that the first word "the" of the phrase is assigned a position 1.
- the attribute idref for splenomegaly has two values that reference the individual components "enlarged" and "spleen" that constitute the concept splenomegaly.
- FIG. 9 shows a sample "tagged" report generated by the MedLEE program of FIG. 3.
- the output shows the "description" section of a radiological report associated with the clinical condition "congestive heart failure" where terms associated with congestive heart failure are highlighted.
- the report was retrieved and highlighted using a JAVA program and structured output generated by MedLEE.
- the identifiers (idrefs) corresponding to the structured findings associated with the condition were used to highlight the appropriate phrases in the textual report.
- the tagged report of FIG. 9 was retrieved as being positive for a finding of congestive heart failure.
- the term "congestive heart failure" does not itself appear in the report, but findings suggestive of congestive heart failure do, namely cardiomegaly, pulmonary vascular congestion, and pleural effusions. These three phrases are highlighted in the report. When a relevant finding associated with congestive heart failure is detected, the value of its idref attribute(s) is used to identify the textual phrases to be highlighted.
- the structured output in the text report is placed at the beginning of each section of the report.
- it may be placed at the beginning of the report so that conceptually it is thought of as an index or codification of the contents of the report. It could also be made to precede each sentence.
- a more substantial variation of the schema involves including more information in the phr tags by adding attribute-value pairs other than the id attribute.
- the semantic and syntactic categories of the phrases could also be supplied by adding the appropriate attributes sem and syn to the phrase tag.
- it may be desirable to display different types of relevant information; for example, modifiers may be displayed as well as primary findings. As such, it may be desirable to highlight more than just the primary finding by using different colors to highlight different types of information, including but not limited to body location, degree and certainty modifiers.
- the XML output documents were then parsed successfully using the DTD and an XML validating parser.
- the validating parser was used to automatically convert the XML structured output (output A) to a line format output (output B).
- a previous version of MedLEE was used to process the same report inputs and generate a structured output (output C) in the same line format used to create output B. Outputs B and C were then compared and verified to be identical.
- a computer system that embeds structured encoded information within a textual report using XML. Having the capability to associate structured output with portions of the original report adds significant functionality to the report.
- Applications or users of the above-described system can utilize the structured component of the XML output to obtain highly specific retrieval capabilities and then highlight relevant information, thereby facilitating manual review.
- a special browser can be used to highlight specific information, such as diagnoses, procedures performed, medications given, or pertinent history, in order to assist the user in the reading of a report.
- natural language processing can be used to automatically create an enriched document that contains a structured component whose elements are linked to corresponding portions of the original textual report.
- the integrated document model used by the tagging feature of the above-described system provides a representation wherein textual documents or reports containing specific information can be accurately and efficiently retrieved automatically by querying the structured components. If manual review of the documents is desired, the salient information in the original reports can also be identified and highlighted.
- Using an XML model of tagging provides an additional benefit that software tools that manipulate XML documents are readily available.
- the above-described natural language processing system of FIGS. 1-3 can further be adapted to extract a variety of different information from scientific or technical natural language text sources.
- the natural language processing system of the present invention can be adapted for extracting, for example, gene, protein and other related information from genomics-related literature.
- An example of such a lexicon, lexsemsub.tmp, lexsemact.tmp, and lexsyn.tmp collectively, is provided in the Appendix.
- gengram.pl, which is also provided in the Appendix, is an example of a grammar for use with genomics literature.
- the genomics-related application of the above-described natural language processing system may also require related mapping and coding knowledge bases.
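
The idref linkage described above for FIGS. 6-9, in which a structured finding points back to the phrases and sentence it was derived from, can be illustrated with a minimal Python sketch. This is not the MedLEE implementation or its actual output format. The tag and attribute names (problem, sid, sent, phr, v, id, idref) come from the description; the wrapper elements `report`, `structured` and `tt`, the `s<section>.<paragraph>.<sentence>` identifier format, the phrase ids, and the helper function names are all assumptions made for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical document. Only the tag/attribute names problem, sid, sent,
# phr, v, id and idref are taken from the description; the wrapper elements,
# the id values and the sentence-id format are assumptions.
DOC = """
<report>
  <structured>
    <problem v="splenomegaly" idref="p2 p9">
      <sid idref="s1.1.1"/>
    </problem>
  </structured>
  <tt>
    <sent id="s1.1.1">the <phr id="p2">spleen</phr> and liver appear to be moderately <phr id="p9">enlarged</phr></sent>
  </tt>
</report>
"""


def parse_sid(sid):
    """Split an assumed 's1.2.3' id into (section, paragraph, sentence)."""
    return tuple(int(part) for part in sid.lstrip("s").split("."))


def phrase_index(root):
    """Map each phr id to the text of the phrase it encloses."""
    return {phr.get("id"): "".join(phr.itertext()) for phr in root.iter("phr")}


def highlight(sent, ids):
    """Rebuild a sentence, wrapping phrases whose id is in `ids` in [[...]]."""
    parts = [sent.text or ""]
    for child in sent:
        text = "".join(child.itertext())
        parts.append(f"[[{text}]]" if child.get("id") in ids else text)
        parts.append(child.tail or "")
    return "".join(parts)


def highlight_findings(root, term):
    """Return highlighted sentences for every problem whose v equals `term`."""
    sentences = {sent.get("id"): sent for sent in root.iter("sent")}
    results = []
    for problem in root.iter("problem"):
        if problem.get("v") != term:
            continue
        ids = set(problem.get("idref", "").split())
        sid = problem.find("sid")
        sent = sentences.get(sid.get("idref")) if sid is not None else None
        if sent is not None:
            results.append(highlight(sent, ids))
    return results


if __name__ == "__main__":
    root = ET.fromstring(DOC.strip())
    print(highlight_findings(root, "splenomegaly"))
```

Querying by the controlled-vocabulary term ("splenomegaly") while highlighting the original words ("spleen", "enlarged") mirrors the retrieval-then-review use described for FIG. 9; a real application would presumably render the highlights in a browser rather than with bracket markers.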
Priority Applications (3)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CA2381251A CA2381251C (en) | 1999-08-06 | 2000-08-04 | System and method for language extraction and encoding |
AU65263/00A AU773723B2 (en) | 1999-08-06 | 2000-08-04 | System and method for language extraction and encoding |
GB0203590A GB2368432B (en) | 1999-08-06 | 2000-08-04 | System and method for language extraction and encoding |
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US09/370,329 | 1999-08-06 | ||
US09/370,329 US6182029B1 (en) | 1996-10-28 | 1999-08-06 | System and method for language extraction and encoding utilizing the parsing of text data in accordance with domain parameters |
Publications (1)
Publication Number | Publication Date |
---|---|
WO2001011492A1 (en) | 2001-02-15 |
Family
ID=23459192
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
PCT/US2000/021515 WO2001011492A1 (en) | 1999-08-06 | 2000-08-04 | System and method for language extraction and encoding |
Country Status (5)
Country | Link |
---|---|
US (1) | US6182029B1 (en) |
AU (1) | AU773723B2 (en) |
CA (1) | CA2381251C (en) |
GB (1) | GB2368432B (en) |
WO (1) | WO2001011492A1 (en) |
Cited By (10)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
WO2003042859A2 (en) * | 2001-11-15 | 2003-05-22 | Forinnova As | Method and apparatus for textual exploration and discovery |
WO2005050475A1 (en) * | 2003-11-21 | 2005-06-02 | Agency For Science, Technology And Research | Method and system for validating the content of technical documents |
EP1803076A2 (en) * | 2004-10-20 | 2007-07-04 | Motorola, Inc. | An electronic device and method for visual text interpretation |
WO2008042716A2 (en) * | 2006-09-29 | 2008-04-10 | Agiledelta, Inc. | Knowledge based encoding of data with multiplexing to facilitate compression |
US7509572B1 (en) * | 1999-07-16 | 2009-03-24 | Oracle International Corporation | Automatic generation of document summaries through use of structured text |
US7650573B2 (en) | 2005-08-11 | 2010-01-19 | Microsoft Corporation | Layout rules for whitespace sensitive literals |
TWI406199B (en) * | 2009-02-17 | 2013-08-21 | Univ Nat Yunlin Sci & Tech | Online system and method for reading text |
US9152623B2 (en) | 2012-11-02 | 2015-10-06 | Fido Labs, Inc. | Natural language processing system and method |
US9800536B2 (en) | 2015-03-05 | 2017-10-24 | International Business Machines Corporation | Automated document lifecycle management |
US10956670B2 (en) | 2018-03-03 | 2021-03-23 | Samurai Labs Sp. Z O.O. | System and method for detecting undesirable and potentially harmful online behavior |
Families Citing this family (335)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US6816830B1 (en) * | 1997-07-04 | 2004-11-09 | Xerox Corporation | Finite state data structures with paths representing paired strings of tags and tag combinations |
US6694055B2 (en) | 1998-07-15 | 2004-02-17 | Microsoft Corporation | Proper name identification in chinese |
US6915254B1 (en) | 1998-07-30 | 2005-07-05 | A-Life Medical, Inc. | Automatically assigning medical codes using natural language processing |
AUPP577298A0 (en) * | 1998-09-09 | 1998-10-01 | Oon, Yeong Kuang Dr | Automation oriented health care delivery system based on medical scripting language |
JP2000099419A (en) * | 1998-09-18 | 2000-04-07 | Matsushita Graphic Communication Systems Inc | Device and method for editing electronic mail address, and input device |
US8006177B1 (en) * | 1998-10-16 | 2011-08-23 | Open Invention Network, Llc | Documents for commerce in trading partner networks and interface definitions based on the documents |
US20070156458A1 (en) * | 2005-10-04 | 2007-07-05 | Anuthep Benja-Athon | Sieve of words in health-care data |
US6314125B1 (en) * | 1998-12-09 | 2001-11-06 | Qualcomm Incorporated | Method and apparatus for the construction and transmission of binary quasi orthogonal vectors |
GB9904662D0 (en) * | 1999-03-01 | 1999-04-21 | Canon Kk | Natural language search method and apparatus |
US6633819B2 (en) * | 1999-04-15 | 2003-10-14 | The Trustees Of Columbia University In The City Of New York | Gene discovery through comparisons of networks of structural and functional relationships among known genes and proteins |
WO2000077709A1 (en) * | 1999-06-14 | 2000-12-21 | Integral Development Corporation | System and method for conducting web-based financial transactions in capital markets |
US8862507B2 (en) | 1999-06-14 | 2014-10-14 | Integral Development Corporation | System and method for conducting web-based financial transactions in capital markets |
US7882011B2 (en) * | 2000-10-31 | 2011-02-01 | Integral Development Corp. | Systems and methods of conducting financial transactions |
US6405211B1 (en) * | 1999-07-08 | 2002-06-11 | Cohesia Corporation | Object-oriented representation of technical content and management, filtering, and synthesis of technical content using object-oriented representations |
US6741992B1 (en) * | 1999-07-29 | 2004-05-25 | Xtenit | Flexible rule-based communication system and method for controlling the flow of and access to information between computer users |
US6907564B1 (en) * | 1999-08-30 | 2005-06-14 | International Business Machines Corporation | Representing IMS messages as XML documents |
US7086002B2 (en) * | 1999-09-27 | 2006-08-01 | International Business Machines Corporation | System and method for creating and editing, an on-line publication |
JP2003520366A (en) | 1999-11-01 | 2003-07-02 | インテグラル ディヴェロップメント コーポレイション | System and method for conducting web-based financial transactions in a capital market |
US6678409B1 (en) * | 2000-01-14 | 2004-01-13 | Microsoft Corporation | Parameterized word segmentation of unsegmented text |
JP3368883B2 (en) * | 2000-02-04 | 2003-01-20 | インターナショナル・ビジネス・マシーンズ・コーポレーション | Data compression device, database system, data communication system, data compression method, storage medium, and program transmission device |
US7069592B2 (en) | 2000-04-26 | 2006-06-27 | Ford Global Technologies, Llc | Web-based document system |
US6745181B1 (en) | 2000-05-02 | 2004-06-01 | Iphrase.Com, Inc. | Information access method |
US6704728B1 (en) * | 2000-05-02 | 2004-03-09 | Iphase.Com, Inc. | Accessing information from a collection of data |
US8478732B1 (en) * | 2000-05-02 | 2013-07-02 | International Business Machines Corporation | Database aliasing in information access system |
US7127450B1 (en) | 2000-05-02 | 2006-10-24 | International Business Machines Corporation | Intelligent discard in information access system |
US6711561B1 (en) | 2000-05-02 | 2004-03-23 | Iphrase.Com, Inc. | Prose feedback in information access system |
US7099809B2 (en) * | 2000-05-04 | 2006-08-29 | Dov Dori | Modeling system |
US6996776B1 (en) * | 2000-05-16 | 2006-02-07 | International Business Machines Corporation | Method and system for SGML-to-HTML migration to XML-based system |
US6684202B1 (en) * | 2000-05-31 | 2004-01-27 | Lexis Nexis | Computer-based system and method for finding rules of law in text |
US7712024B2 (en) | 2000-06-06 | 2010-05-04 | Microsoft Corporation | Application program interfaces for semantically labeling strings and providing actions based on semantically labeled strings |
US7421645B2 (en) | 2000-06-06 | 2008-09-02 | Microsoft Corporation | Method and system for providing electronic commerce actions based on semantically labeled strings |
US7716163B2 (en) | 2000-06-06 | 2010-05-11 | Microsoft Corporation | Method and system for defining semantic categories and actions |
US7770102B1 (en) | 2000-06-06 | 2010-08-03 | Microsoft Corporation | Method and system for semantically labeling strings and providing actions based on semantically labeled strings |
US7788602B2 (en) * | 2000-06-06 | 2010-08-31 | Microsoft Corporation | Method and system for providing restricted actions for recognized semantic categories |
US6757692B1 (en) * | 2000-06-09 | 2004-06-29 | Northrop Grumman Corporation | Systems and methods for structured vocabulary search and classification |
US9699129B1 (en) | 2000-06-21 | 2017-07-04 | International Business Machines Corporation | System and method for increasing email productivity |
US6408277B1 (en) | 2000-06-21 | 2002-06-18 | Banter Limited | System and method for automatic task prioritization |
US8290768B1 (en) | 2000-06-21 | 2012-10-16 | International Business Machines Corporation | System and method for determining a set of attributes based on content of communications |
US20050027570A1 (en) * | 2000-08-11 | 2005-02-03 | Maier Frith Ann | Digital image collection and library system |
US11526940B2 (en) | 2000-10-31 | 2022-12-13 | Integral Development Corporation | System and method for conducting web-based financial transactions in capital markets |
US7213265B2 (en) * | 2000-11-15 | 2007-05-01 | Lockheed Martin Corporation | Real time active network compartmentalization |
US7225467B2 (en) * | 2000-11-15 | 2007-05-29 | Lockheed Martin Corporation | Active intrusion resistant environment of layered object and compartment keys (airelock) |
CA2431341A1 (en) * | 2000-12-12 | 2002-06-20 | Time Warner Entertainment Company, L.P. | Digital asset data type definitions |
US7644057B2 (en) * | 2001-01-03 | 2010-01-05 | International Business Machines Corporation | System and method for electronic communication management |
US6714939B2 (en) * | 2001-01-08 | 2004-03-30 | Softface, Inc. | Creation of structured data from plain text |
US20020091614A1 (en) * | 2001-01-09 | 2002-07-11 | Ramzi Yehia | Method and system for automatic contract reconciliation in a multilateral environment |
US20020091579A1 (en) * | 2001-01-09 | 2002-07-11 | Partnercommunity, Inc. | Method and system for managing and correlating orders in a multilateral environment |
US7249018B2 (en) * | 2001-01-12 | 2007-07-24 | International Business Machines Corporation | System and method for relating syntax and semantics for a conversational speech application |
JP2002269114A (en) * | 2001-03-14 | 2002-09-20 | Kousaku Ookubo | Knowledge database, and method for constructing knowledge database |
EP1246077A1 (en) * | 2001-03-26 | 2002-10-02 | LION Bioscience AG | Method and apparatus for structuring and searching sets of signals |
US7373600B2 (en) * | 2001-03-27 | 2008-05-13 | Koninklijke Philips Electronics N.V. | DICOM to XML generator |
US7136846B2 (en) | 2001-04-06 | 2006-11-14 | 2005 Keel Company, Inc. | Wireless information retrieval |
US7778816B2 (en) * | 2001-04-24 | 2010-08-17 | Microsoft Corporation | Method and system for applying input mode bias |
US7802183B1 (en) * | 2001-05-17 | 2010-09-21 | Essin Daniel J | Electronic record management system |
AUPR511301A0 (en) * | 2001-05-18 | 2001-06-14 | Mastersoft Research Pty Limited | Parsing system |
WO2002095616A1 (en) * | 2001-05-18 | 2002-11-28 | Mastersoft Research Pty Limited | Parsing system |
US6990451B2 (en) * | 2001-06-01 | 2006-01-24 | Qwest Communications International Inc. | Method and apparatus for recording prosody for fully concatenated speech |
US6829745B2 (en) * | 2001-06-28 | 2004-12-07 | Koninklijke Philips Electronics N.V. | Method and system for transforming an XML document to at least one XML document structured according to a subset of a set of XML grammar rules |
US20030028401A1 (en) * | 2001-07-17 | 2003-02-06 | Leon Kaufman | Customizable lung report generator |
US7130457B2 (en) * | 2001-07-17 | 2006-10-31 | Accuimage Diagnostics Corp. | Systems and graphical user interface for analyzing body images |
US6901277B2 (en) | 2001-07-17 | 2005-05-31 | Accuimage Diagnostics Corp. | Methods for generating a lung report |
US6877000B2 (en) * | 2001-08-22 | 2005-04-05 | International Business Machines Corporation | Tool for converting SQL queries into portable ODBC |
US8234412B2 (en) * | 2001-09-10 | 2012-07-31 | International Business Machines Corporation | Method and system for transmitting compacted text data |
US7849400B2 (en) * | 2001-09-13 | 2010-12-07 | Speech Products, Inc. | Electronic charting system |
US7483938B2 (en) * | 2001-09-27 | 2009-01-27 | International Business Machines Corporation | System for character validation and method therefor |
US7555425B2 (en) | 2001-10-18 | 2009-06-30 | Oon Yeong K | System and method of improved recording of medical transactions |
US7437302B2 (en) * | 2001-10-22 | 2008-10-14 | Siemens Medical Solutions Usa, Inc. | System for managing healthcare related information supporting operation of a healthcare enterprise |
GB2383662B (en) * | 2001-11-26 | 2005-05-11 | Evolution Consulting Group Plc | Creating XML documents |
JP3773447B2 (en) * | 2001-12-21 | 2006-05-10 | 株式会社日立製作所 | Binary relation display method between substances |
US6917969B2 (en) * | 2002-01-03 | 2005-07-12 | International Business Machines Corporation | Portable bean-based content rendering |
US7194402B2 (en) | 2002-01-09 | 2007-03-20 | International Business Machines Corporation | Method and system for converting files to a specified markup language |
US20040073453A1 (en) * | 2002-01-10 | 2004-04-15 | Nenov Valeriy I. | Method and system for dispensing communication devices to provide access to patient-related information |
US20030144886A1 (en) * | 2002-01-29 | 2003-07-31 | Taira Rick K. | Method and system for generating textual medical reports |
US20030149562A1 (en) * | 2002-02-07 | 2003-08-07 | Markus Walther | Context-aware linear time tokenizer |
US7343372B2 (en) * | 2002-02-22 | 2008-03-11 | International Business Machines Corporation | Direct navigation for information retrieval |
US7325194B2 (en) | 2002-05-07 | 2008-01-29 | Microsoft Corporation | Method, system, and apparatus for converting numbers between measurement systems based upon semantically labeled strings |
US7805302B2 (en) * | 2002-05-20 | 2010-09-28 | Microsoft Corporation | Applying a structured language model to information extraction |
US7707024B2 (en) * | 2002-05-23 | 2010-04-27 | Microsoft Corporation | Method, system, and apparatus for converting currency values based upon semantically labeled strings |
US7742048B1 (en) | 2002-05-23 | 2010-06-22 | Microsoft Corporation | Method, system, and apparatus for converting numbers based upon semantically labeled strings |
US7281245B2 (en) * | 2002-06-05 | 2007-10-09 | Microsoft Corporation | Mechanism for downloading software components from a remote source for use by a local software application |
US7827546B1 (en) | 2002-06-05 | 2010-11-02 | Microsoft Corporation | Mechanism for downloading software components from a remote source for use by a local software application |
US7356537B2 (en) * | 2002-06-06 | 2008-04-08 | Microsoft Corporation | Providing contextually sensitive tools and help content in computer-generated documents |
US20030236813A1 (en) * | 2002-06-24 | 2003-12-25 | Abjanic John B. | Method and apparatus for off-load processing of a message stream |
US7716676B2 (en) | 2002-06-25 | 2010-05-11 | Microsoft Corporation | System and method for issuing a message to a program |
US7392479B2 (en) | 2002-06-27 | 2008-06-24 | Microsoft Corporation | System and method for providing namespace related information |
US7533335B1 (en) | 2002-06-28 | 2009-05-12 | Microsoft Corporation | Representing fields in a markup language document |
US7584419B1 (en) | 2002-06-28 | 2009-09-01 | Microsoft Corporation | Representing non-structured features in a well formed document |
US7650566B1 (en) | 2002-06-28 | 2010-01-19 | Microsoft Corporation | Representing list definitions and instances in a markup language document |
US7562295B1 (en) | 2002-06-28 | 2009-07-14 | Microsoft Corporation | Representing spelling and grammatical error state in an XML document |
US7607081B1 (en) | 2002-06-28 | 2009-10-20 | Microsoft Corporation | Storing document header and footer information in a markup language document |
US7209915B1 (en) | 2002-06-28 | 2007-04-24 | Microsoft Corporation | Method, system and apparatus for routing a query to one or more providers |
US7565603B1 (en) | 2002-06-28 | 2009-07-21 | Microsoft Corporation | Representing style information in a markup language document |
US7523394B2 (en) * | 2002-06-28 | 2009-04-21 | Microsoft Corporation | Word-processing document stored in a single XML file that may be manipulated by applications that understand XML |
US7028038B1 (en) | 2002-07-03 | 2006-04-11 | Mayo Foundation For Medical Education And Research | Method for generating training data for medical text abbreviation and acronym normalization |
US7567902B2 (en) * | 2002-09-18 | 2009-07-28 | Nuance Communications, Inc. | Generating speech recognition grammars from a large corpus of data |
US7151864B2 (en) * | 2002-09-18 | 2006-12-19 | Hewlett-Packard Development Company, L.P. | Information research initiated from a scanned image media |
US20040083466A1 (en) * | 2002-10-29 | 2004-04-29 | Dapp Michael C. | Hardware parser accelerator |
US20070061884A1 (en) * | 2002-10-29 | 2007-03-15 | Dapp Michael C | Intrusion detection accelerator |
US7146643B2 (en) * | 2002-10-29 | 2006-12-05 | Lockheed Martin Corporation | Intrusion detection accelerator |
US7080094B2 (en) | 2002-10-29 | 2006-07-18 | Lockheed Martin Corporation | Hardware accelerated validating parser |
US7725330B2 (en) * | 2002-12-03 | 2010-05-25 | Siemens Medical Solutions Usa, Inc. | Systems and methods for automated extraction and processing of billing information in patient records |
US20040167910A1 (en) * | 2002-12-06 | 2004-08-26 | Attensity Corporation | Integrated data products of processes of integrating mixed format data |
US7233938B2 (en) * | 2002-12-27 | 2007-06-19 | Dictaphone Corporation | Systems and methods for coding information |
US8065277B1 (en) | 2003-01-17 | 2011-11-22 | Daniel John Gardner | System and method for a data extraction and backup database |
US8375008B1 (en) | 2003-01-17 | 2013-02-12 | Robert Gomes | Method and system for enterprise-wide retention of digital or electronic data |
US8943024B1 (en) | 2003-01-17 | 2015-01-27 | Daniel John Gardner | System and method for data de-duplication |
US8630984B1 (en) | 2003-01-17 | 2014-01-14 | Renew Data Corp. | System and method for data extraction from email files |
US7606714B2 (en) * | 2003-02-11 | 2009-10-20 | Microsoft Corporation | Natural language classification within an automated response system |
US20040162724A1 (en) * | 2003-02-11 | 2004-08-19 | Jeffrey Hill | Management of conversations |
US7783614B2 (en) | 2003-02-13 | 2010-08-24 | Microsoft Corporation | Linking elements of a document to corresponding fields, queries and/or procedures in a database |
US20040172287A1 (en) * | 2003-02-19 | 2004-09-02 | O'toole Michael | Method and apparatus for obtaining and distributing healthcare information |
US7958443B2 (en) | 2003-02-28 | 2011-06-07 | Dictaphone Corporation | System and method for structuring speech recognized text into a pre-selected document format |
US20040172584A1 (en) * | 2003-02-28 | 2004-09-02 | Microsoft Corporation | Method and system for enhancing paste functionality of a computer software application |
AU2003277247A1 (en) * | 2003-02-28 | 2004-09-28 | Lockheed Martin Corporation | Hardware accelerator state table compiler |
US20040243552A1 (en) * | 2003-05-30 | 2004-12-02 | Dictaphone Corporation | Method, system, and apparatus for viewing data |
US20040243545A1 (en) * | 2003-05-29 | 2004-12-02 | Dictaphone Corporation | Systems and methods utilizing natural language medical records |
US8095544B2 (en) * | 2003-05-30 | 2012-01-10 | Dictaphone Corporation | Method, system, and apparatus for validation |
US8290958B2 (en) * | 2003-05-30 | 2012-10-16 | Dictaphone Corporation | Method, system, and apparatus for data reuse |
US7711550B1 (en) | 2003-04-29 | 2010-05-04 | Microsoft Corporation | Methods and system for recognizing names in a computer-generated document and for providing helpful actions associated with recognized names |
US8495002B2 (en) * | 2003-05-06 | 2013-07-23 | International Business Machines Corporation | Software tool for training and testing a knowledge base |
US20050187913A1 (en) * | 2003-05-06 | 2005-08-25 | Yoram Nelken | Web-based customer service interface |
US8504380B2 (en) * | 2003-06-05 | 2013-08-06 | Medidata Solutions, Inc. | Assistance for clinical trial protocols |
WO2005001651A2 (en) * | 2003-06-23 | 2005-01-06 | Wms Gaming Inc. | Gaming network environment providing a cashless gaming service |
US7739588B2 (en) * | 2003-06-27 | 2010-06-15 | Microsoft Corporation | Leveraging markup language data for semantically labeling text strings and data and for providing actions based on semantically labeled text strings and data |
US7296027B2 (en) | 2003-08-06 | 2007-11-13 | Sbc Knowledge Ventures, L.P. | Rhetorical content management with tone and audience profiles |
US7860717B2 (en) * | 2003-09-25 | 2010-12-28 | Dictaphone Corporation | System and method for customizing speech recognition input and output |
US20050120300A1 (en) * | 2003-09-25 | 2005-06-02 | Dictaphone Corporation | Method, system, and apparatus for assembly, transport and display of clinical data |
US7542909B2 (en) * | 2003-09-30 | 2009-06-02 | Dictaphone Corporation | Method, system, and apparatus for repairing audio recordings |
US8024176B2 (en) * | 2003-09-30 | 2011-09-20 | Dictaphone Corporation | System, method and apparatus for prediction using minimal affix patterns |
US7818308B2 (en) * | 2003-10-01 | 2010-10-19 | Nuance Communications, Inc. | System and method for document section segmentation |
US20050144184A1 (en) * | 2003-10-01 | 2005-06-30 | Dictaphone Corporation | System and method for document section segmentation |
US7774196B2 (en) * | 2003-10-01 | 2010-08-10 | Dictaphone Corporation | System and method for modifying a language model and post-processor information |
US7996223B2 (en) * | 2003-10-01 | 2011-08-09 | Dictaphone Corporation | System and method for post processing speech recognition output |
US20050131725A1 (en) * | 2003-10-14 | 2005-06-16 | Gretchen Sleeper | Mapping algorithm for identifying data required to file for state and federal tax credits related to enterprise zones, renewal communities, and empowerment zones |
US7315852B2 (en) * | 2003-10-31 | 2008-01-01 | International Business Machines Corporation | XPath containment for index and materialized view matching |
US20050108316A1 (en) * | 2003-11-18 | 2005-05-19 | Sbc Knowledge Ventures, L.P. | Methods and systems for organizing related communications |
WO2005050474A2 (en) | 2003-11-21 | 2005-06-02 | Philips Intellectual Property & Standards Gmbh | Text segmentation and label assignment with user interaction by means of topic specific language models and topic-specific label statistics |
US7404195B1 (en) | 2003-12-09 | 2008-07-22 | Microsoft Corporation | Programmable object model for extensible markup language markup in an application |
US7178102B1 (en) | 2003-12-09 | 2007-02-13 | Microsoft Corporation | Representing latent data in an extensible markup language document |
US7434157B2 (en) | 2003-12-09 | 2008-10-07 | Microsoft Corporation | Programmable object model for namespace or schema library support in a software application |
US7487515B1 (en) | 2003-12-09 | 2009-02-03 | Microsoft Corporation | Programmable object model for extensible markup language schema validation |
US7315811B2 (en) * | 2003-12-31 | 2008-01-01 | Dictaphone Corporation | System and method for accented modification of a language model |
US7509573B1 (en) | 2004-02-17 | 2009-03-24 | Microsoft Corporation | Anti-virus security information in an extensible markup language document |
US20050182617A1 (en) * | 2004-02-17 | 2005-08-18 | Microsoft Corporation | Methods and systems for providing automated actions on recognized text strings in a computer-generated document |
US7783474B2 (en) * | 2004-02-27 | 2010-08-24 | Nuance Communications, Inc. | System and method for generating a phrase pronunciation |
CA2498728A1 (en) * | 2004-02-27 | 2005-08-27 | Dictaphone Corporation | A system and method for normalization of a string of words |
US7415106B2 (en) * | 2004-03-09 | 2008-08-19 | Sbc Knowledge Ventures, Lp | Network-based voice activated auto-attendant service with B2B connectors |
US7899827B2 (en) * | 2004-03-09 | 2011-03-01 | International Business Machines Corporation | System and method for the indexing of organic chemical structures mined from text documents |
US20050203776A1 (en) * | 2004-03-15 | 2005-09-15 | Godwin Sharen A. | Method of identifying clinical trial participants |
US7379946B2 (en) * | 2004-03-31 | 2008-05-27 | Dictaphone Corporation | Categorization of information using natural language processing and predefined templates |
US20050223316A1 (en) * | 2004-04-01 | 2005-10-06 | Sun Microsystems, Inc. | Compiled document type definition verifier |
US7933763B2 (en) * | 2004-04-30 | 2011-04-26 | Mdl Information Systems, Gmbh | Method and software for extracting chemical data |
US20050273705A1 (en) * | 2004-06-08 | 2005-12-08 | Fortellio, Llc | Method and system for automatically creating network software applications |
US8335688B2 (en) | 2004-08-20 | 2012-12-18 | Multimodal Technologies, Llc | Document transcription system training |
US7584103B2 (en) * | 2004-08-20 | 2009-09-01 | Multimodal Technologies, Inc. | Automated extraction of semantic content and generation of a structured document from speech |
US8412521B2 (en) * | 2004-08-20 | 2013-04-02 | Multimodal Technologies, Llc | Discriminative training of document transcription system |
US7925658B2 (en) * | 2004-09-17 | 2011-04-12 | Actuate Corporation | Methods and apparatus for mapping a hierarchical data structure to a flat data structure for use in generating a report |
US7970600B2 (en) * | 2004-11-03 | 2011-06-28 | Microsoft Corporation | Using a first natural language parser to train a second parser |
US8677274B2 (en) * | 2004-11-10 | 2014-03-18 | Apple Inc. | Highlighting items for search results |
CN101031912A (en) * | 2004-11-12 | 2007-09-05 | 佳思腾软件公司 | Data processing device and data processing method |
US8756234B1 (en) * | 2004-11-16 | 2014-06-17 | The General Hospital Corporation | Information theory entropy reduction program |
US8069151B1 (en) | 2004-12-08 | 2011-11-29 | Chris Crafford | System and method for detecting incongruous or incorrect media in a data recovery process |
US20060168511A1 (en) * | 2005-01-21 | 2006-07-27 | International Business Machines Corporation | Method of passing information from a preprocessor to a parser |
US8527468B1 (en) | 2005-02-08 | 2013-09-03 | Renew Data Corp. | System and method for management of retention periods for content in a computing system |
US7657521B2 (en) * | 2005-04-15 | 2010-02-02 | General Electric Company | System and method for parsing medical data |
US20060277028A1 (en) * | 2005-06-01 | 2006-12-07 | Microsoft Corporation | Training a statistical parser on noisy data by filtering |
CA2614233A1 (en) * | 2005-07-05 | 2007-01-11 | Dictaphone Corporation | System and method for auto-reuse of document text |
KR100631086B1 (en) | 2005-07-22 | 2006-10-04 | 한국전자통신연구원 | Method and apparatus for text normalization using extensible markup language(xml) |
US7564999B2 (en) * | 2005-07-25 | 2009-07-21 | Carestream Health, Inc. | Method for identifying markers in radiographic images |
US7992085B2 (en) | 2005-09-26 | 2011-08-02 | Microsoft Corporation | Lightweight reference user interface |
US7788590B2 (en) * | 2005-09-26 | 2010-08-31 | Microsoft Corporation | Lightweight reference user interface |
US20070169021A1 (en) * | 2005-11-01 | 2007-07-19 | Siemens Medical Solutions Health Services Corporation | Report Generation System |
US7665016B2 (en) * | 2005-11-14 | 2010-02-16 | Sun Microsystems, Inc. | Method and apparatus for virtualized XML parsing |
US7725417B2 (en) * | 2006-02-09 | 2010-05-25 | Ebay Inc. | Method and system to analyze rules based on popular query coverage |
US8380698B2 (en) * | 2006-02-09 | 2013-02-19 | Ebay Inc. | Methods and systems to generate rules to identify data items |
US7849047B2 (en) | 2006-02-09 | 2010-12-07 | Ebay Inc. | Method and system to analyze domain rules based on domain coverage of the domain rules |
US9443333B2 (en) | 2006-02-09 | 2016-09-13 | Ebay Inc. | Methods and systems to communicate information |
US7739225B2 (en) * | 2006-02-09 | 2010-06-15 | Ebay Inc. | Method and system to analyze aspect rules based on domain coverage of an aspect-value pair |
US7640234B2 (en) * | 2006-02-09 | 2009-12-29 | Ebay Inc. | Methods and systems to communicate information |
US7949538B2 (en) | 2006-03-14 | 2011-05-24 | A-Life Medical, Inc. | Automated interpretation of clinical encounters with cultural cues |
US7610192B1 (en) * | 2006-03-22 | 2009-10-27 | Patrick William Jamieson | Process and system for high precision coding of free text documents against a standard lexicon |
US8731954B2 (en) * | 2006-03-27 | 2014-05-20 | A-Life Medical, Llc | Auditing the coding and abstracting of documents |
WO2007115095A2 (en) * | 2006-03-29 | 2007-10-11 | The Trustees Of Columbia University In The City Of New York | Systems and methods for using molecular networks in genetic linkage analysis of complex traits |
US7831423B2 (en) * | 2006-05-25 | 2010-11-09 | Multimodal Technologies, Inc. | Replacing text representing a concept with an alternate written form of the concept |
US8150827B2 (en) * | 2006-06-07 | 2012-04-03 | Renew Data Corp. | Methods for enhancing efficiency and cost effectiveness of first pass review of documents |
US20070299665A1 (en) * | 2006-06-22 | 2007-12-27 | Detlef Koll | Automatic Decision Support |
US10796390B2 (en) * | 2006-07-03 | 2020-10-06 | 3M Innovative Properties Company | System and method for medical coding of vascular interventional radiology procedures |
US8346555B2 (en) * | 2006-08-22 | 2013-01-01 | Nuance Communications, Inc. | Automatic grammar tuning using statistical language model generation |
US20080126385A1 (en) * | 2006-09-19 | 2008-05-29 | Microsoft Corporation | Intelligent batching of electronic data interchange messages |
US20080071806A1 (en) * | 2006-09-20 | 2008-03-20 | Microsoft Corporation | Difference analysis for electronic data interchange (edi) data dictionary |
US8108767B2 (en) * | 2006-09-20 | 2012-01-31 | Microsoft Corporation | Electronic data interchange transaction set definition based instance editing |
US20080126386A1 (en) * | 2006-09-20 | 2008-05-29 | Microsoft Corporation | Translation of electronic data interchange messages to extensible markup language representation(s) |
US8161078B2 (en) * | 2006-09-20 | 2012-04-17 | Microsoft Corporation | Electronic data interchange (EDI) data dictionary management and versioning system |
US20080120142A1 (en) * | 2006-11-20 | 2008-05-22 | Vivalog Llc | Case management for image-based training, decision support, and consultation |
US20080140722A1 (en) * | 2006-11-20 | 2008-06-12 | Vivalog Llc | Interactive viewing, asynchronous retrieval, and annotation of medical images |
US20080168081A1 (en) * | 2007-01-09 | 2008-07-10 | Microsoft Corporation | Extensible schemas and party configurations for edi document generation or validation |
US20080168109A1 (en) * | 2007-01-09 | 2008-07-10 | Microsoft Corporation | Automatic map updating based on schema changes |
WO2008112548A1 (en) * | 2007-03-09 | 2008-09-18 | The Trustees Of Columbia University In The City Of New York | Methods and system for extracting phenotypic information from the literature via natural language processing |
US7945438B2 (en) * | 2007-04-02 | 2011-05-17 | International Business Machines Corporation | Automated glossary creation |
US7908552B2 (en) | 2007-04-13 | 2011-03-15 | A-Life Medical Inc. | Mere-parsing with boundary and semantic driven scoping |
US8682823B2 (en) | 2007-04-13 | 2014-03-25 | A-Life Medical, Llc | Multi-magnitudinal vectors with resolution based on source vector features |
US7895189B2 (en) * | 2007-06-28 | 2011-02-22 | International Business Machines Corporation | Index exploitation |
US8086597B2 (en) * | 2007-06-28 | 2011-12-27 | International Business Machines Corporation | Between matching |
US9946846B2 (en) | 2007-08-03 | 2018-04-17 | A-Life Medical, Llc | Visualizing the documentation and coding of surgical procedures |
US20090048866A1 (en) * | 2007-08-17 | 2009-02-19 | Prakash Mahesh | Rules-Based System For Routing Evidence and Recommendation Information to Patients and Physicians By a Specialist Based on Mining Report Text |
US8654139B2 (en) * | 2007-08-29 | 2014-02-18 | Mckesson Technologies Inc. | Methods and systems to transmit, view, and manipulate medical images in a general purpose viewing agent |
US8239455B2 (en) * | 2007-09-07 | 2012-08-07 | Siemens Aktiengesellschaft | Collaborative data and knowledge integration |
US8868479B2 (en) * | 2007-09-28 | 2014-10-21 | Telogis, Inc. | Natural language parsers to normalize addresses for geocoding |
US20090093686A1 (en) * | 2007-10-08 | 2009-04-09 | Xiao Hu | Multi Automated Severity Scoring |
EP2211688A4 (en) * | 2007-10-08 | 2012-01-11 | Univ California Ucla Office Of Intellectual Property | Generation and dissemination of automatically pre-populated clinical notes |
US20090132285A1 (en) * | 2007-10-31 | 2009-05-21 | Mckesson Information Solutions Llc | Methods, computer program products, apparatuses, and systems for interacting with medical data objects |
US8520978B2 (en) * | 2007-10-31 | 2013-08-27 | Mckesson Technologies Inc. | Methods, computer program products, apparatuses, and systems for facilitating viewing and manipulation of an image on a client device |
US20090192822A1 (en) * | 2007-11-05 | 2009-07-30 | Medquist Inc. | Methods and computer program products for natural language processing framework to assist in the evaluation of medical care |
KR100966590B1 (en) * | 2007-12-11 | 2010-06-29 | 한국전자통신연구원 | Method and system for collaborating of physiological signal measure devices |
US8615490B1 (en) | 2008-01-31 | 2013-12-24 | Renew Data Corp. | Method and system for restoring information from backup storage media |
US9864838B2 (en) * | 2008-02-20 | 2018-01-09 | Medicomp Systems, Inc. | Clinically intelligent parsing |
US20090217194A1 (en) * | 2008-02-24 | 2009-08-27 | Neil Martin | Intelligent Dashboards |
US20100057646A1 (en) * | 2008-02-24 | 2010-03-04 | Martin Neil A | Intelligent Dashboards With Heuristic Learning |
US8510126B2 (en) * | 2008-02-24 | 2013-08-13 | The Regents Of The University Of California | Patient monitoring |
US8924881B2 (en) * | 2008-02-24 | 2014-12-30 | The Regents Of The University Of California | Drill down clinical information dashboard |
US9424339B2 (en) | 2008-08-15 | 2016-08-23 | Athena A. Smyros | Systems and methods utilizing a search engine |
US7742933B1 (en) * | 2009-03-24 | 2010-06-22 | Harrogate Holdings | Method and system for maintaining HIPAA patient privacy requirements during auditing of electronic patient medical records |
US8634677B2 (en) * | 2009-03-30 | 2014-01-21 | The Regents Of The University Of California | PACS optimization techniques |
US8600772B2 (en) * | 2009-05-28 | 2013-12-03 | 3M Innovative Properties Company | Systems and methods for interfacing with healthcare organization coding system |
US10586616B2 (en) * | 2009-05-28 | 2020-03-10 | 3M Innovative Properties Company | Systems and methods for generating subsets of electronic healthcare-related documents |
US20100305969A1 (en) * | 2009-05-28 | 2010-12-02 | 3M Innovative Properties Company | Systems and methods for generating subsets of electronic healthcare-related documents |
US20110009731A1 (en) | 2009-07-08 | 2011-01-13 | Fonar Corporation | Method and system for performing upright magnetic resonance imaging of various anatomical and physiological conditions |
US9317256B2 (en) | 2009-11-24 | 2016-04-19 | International Business Machines Corporation | Identifying syntaxes of disparate components of a computer-to-computer message |
WO2011072172A1 (en) * | 2009-12-09 | 2011-06-16 | Renew Data Corp. | System and method for quickly determining a subset of irrelevant data from large data content |
WO2011075610A1 (en) | 2009-12-16 | 2011-06-23 | Renew Data Corp. | System and method for creating a de-duplicated data set |
US9773056B1 (en) * | 2010-03-23 | 2017-09-26 | Intelligent Language, LLC | Object location and processing |
US8832079B2 (en) * | 2010-04-05 | 2014-09-09 | Mckesson Financial Holdings | Methods, apparatuses, and computer program products for facilitating searching |
US8463673B2 (en) * | 2010-09-23 | 2013-06-11 | Mmodal Ip Llc | User feedback in semi-automatic question answering systems |
US8959102B2 (en) | 2010-10-08 | 2015-02-17 | Mmodal Ip Llc | Structured searching of dynamic structured document corpuses |
BR112013015641A2 (en) * | 2010-12-23 | 2016-10-11 | Koninkl Philips Electronics Nv | system and method for automatically extracting a site of an abnormality from an anatomical structure from a computer program report, workstation, and product to be loaded by a computer array |
US8799021B2 (en) | 2011-02-18 | 2014-08-05 | Nuance Communications, Inc. | Methods and apparatus for analyzing specificity in clinical documentation |
US10032127B2 (en) | 2011-02-18 | 2018-07-24 | Nuance Communications, Inc. | Methods and apparatus for determining a clinician's intent to order an item |
US10460288B2 (en) | 2011-02-18 | 2019-10-29 | Nuance Communications, Inc. | Methods and apparatus for identifying unspecified diagnoses in clinical documentation |
US8694335B2 (en) | 2011-02-18 | 2014-04-08 | Nuance Communications, Inc. | Methods and apparatus for applying user corrections to medical fact extraction |
US9916420B2 (en) | 2011-02-18 | 2018-03-13 | Nuance Communications, Inc. | Physician and clinical documentation specialist workflow integration |
US8768723B2 (en) | 2011-02-18 | 2014-07-01 | Nuance Communications, Inc. | Methods and apparatus for formatting text for clinical fact extraction |
US9679107B2 (en) | 2011-02-18 | 2017-06-13 | Nuance Communications, Inc. | Physician and clinical documentation specialist workflow integration |
US8788289B2 (en) | 2011-02-18 | 2014-07-22 | Nuance Communications, Inc. | Methods and apparatus for linking extracted clinical facts to text |
US9904768B2 (en) | 2011-02-18 | 2018-02-27 | Nuance Communications, Inc. | Methods and apparatus for presenting alternative hypotheses for medical facts |
US8738403B2 (en) | 2011-02-18 | 2014-05-27 | Nuance Communications, Inc. | Methods and apparatus for updating text in clinical documentation |
US9412369B2 (en) * | 2011-06-17 | 2016-08-09 | Microsoft Technology Licensing, Llc | Automated adverse drug event alerts |
GB2506807A (en) * | 2011-07-29 | 2014-04-09 | Trustees Of Columbia In The City Of New York | System and method for language extraction and encoding |
US8949111B2 (en) | 2011-12-14 | 2015-02-03 | Brainspace Corporation | System and method for identifying phrases in text |
US8793199B2 (en) | 2012-02-29 | 2014-07-29 | International Business Machines Corporation | Extraction of information from clinical reports |
US20130311207A1 (en) * | 2012-05-17 | 2013-11-21 | Innodata Synodex, Llc | Medical Record Processing |
KR101416712B1 (en) * | 2012-07-12 | 2014-07-09 | 김영근 | Method For Implementation Of XML Document With Formal Data and Informal Data |
EP2883203B1 (en) | 2012-08-13 | 2018-10-03 | MModal IP LLC | Maintaining a discrete data representation that corresponds to information contained in free-form text |
US9710431B2 (en) | 2012-08-18 | 2017-07-18 | Health Fidelity, Inc. | Systems and methods for processing patient information |
US9460069B2 (en) | 2012-10-19 | 2016-10-04 | International Business Machines Corporation | Generation of test data using text analytics |
US8478584B1 (en) * | 2012-11-06 | 2013-07-02 | AskZiggy, Inc. | Method and system for domain-optimized semantic tagging and task execution using task classification encoding |
US8996353B2 (en) * | 2013-02-08 | 2015-03-31 | Machine Zone, Inc. | Systems and methods for multi-user multi-lingual communications |
US8990068B2 (en) | 2013-02-08 | 2015-03-24 | Machine Zone, Inc. | Systems and methods for multi-user multi-lingual communications |
US9031829B2 (en) | 2013-02-08 | 2015-05-12 | Machine Zone, Inc. | Systems and methods for multi-user multi-lingual communications |
US9298703B2 (en) | 2013-02-08 | 2016-03-29 | Machine Zone, Inc. | Systems and methods for incentivizing user feedback for translation processing |
US10650103B2 (en) | 2013-02-08 | 2020-05-12 | Mz Ip Holdings, Llc | Systems and methods for incentivizing user feedback for translation processing |
US9600473B2 (en) | 2013-02-08 | 2017-03-21 | Machine Zone, Inc. | Systems and methods for multi-user multi-lingual communications |
US8996352B2 (en) | 2013-02-08 | 2015-03-31 | Machine Zone, Inc. | Systems and methods for correcting translations in multi-user multi-lingual communications |
US9231898B2 (en) | 2013-02-08 | 2016-01-05 | Machine Zone, Inc. | Systems and methods for multi-user multi-lingual communications |
US8996355B2 (en) | 2013-02-08 | 2015-03-31 | Machine Zone, Inc. | Systems and methods for reviewing histories of text messages from multi-user multi-lingual communications |
US9218568B2 (en) | 2013-03-15 | 2015-12-22 | Business Objects Software Ltd. | Disambiguating data using contextual and historical information |
US9262550B2 (en) | 2013-03-15 | 2016-02-16 | Business Objects Software Ltd. | Processing semi-structured data |
US9299041B2 (en) | 2013-03-15 | 2016-03-29 | Business Objects Software Ltd. | Obtaining data from unstructured data for a structured data collection |
US8688447B1 (en) | 2013-08-21 | 2014-04-01 | Ask Ziggy, Inc. | Method and system for domain-specific noisy channel natural language processing (NLP) |
WO2015035193A1 (en) | 2013-09-05 | 2015-03-12 | A-Life Medical, Llc | Automated clinical indicator recognition with natural language processing |
US10133727B2 (en) | 2013-10-01 | 2018-11-20 | A-Life Medical, Llc | Ontologically driven procedure coding |
GB2542288A (en) | 2014-04-25 | 2017-03-15 | Mayo Foundation | Enhancing reading accuracy, efficiency and retention |
US10162811B2 (en) | 2014-10-17 | 2018-12-25 | Mz Ip Holdings, Llc | Systems and methods for language detection |
US9372848B2 (en) | 2014-10-17 | 2016-06-21 | Machine Zone, Inc. | Systems and methods for language detection |
US20160162467A1 (en) * | 2014-12-09 | 2016-06-09 | Idibon, Inc. | Methods and systems for language-agnostic machine learning in natural language processing using feature extraction |
US9678941B2 (en) | 2014-12-23 | 2017-06-13 | International Business Machines Corporation | Domain-specific computational lexicon formation |
US10490306B2 (en) | 2015-02-20 | 2019-11-26 | Cerner Innovation, Inc. | Medical information translation system |
US10332511B2 (en) | 2015-07-24 | 2019-06-25 | International Business Machines Corporation | Processing speech to text queries by optimizing conversion of speech queries to text |
US10180989B2 (en) | 2015-07-24 | 2019-01-15 | International Business Machines Corporation | Generating and executing query language statements from natural language |
US10765956B2 (en) | 2016-01-07 | 2020-09-08 | Machine Zone Inc. | Named entity recognition on chat data |
US10042613B2 (en) * | 2016-08-19 | 2018-08-07 | International Business Machines Corporation | System, method, and recording medium for validating computer documentation |
CN107977368B (en) * | 2016-10-21 | 2021-12-10 | 京东方科技集团股份有限公司 | Information extraction method and system |
WO2019060353A1 (en) | 2017-09-21 | 2019-03-28 | Mz Ip Holdings, Llc | System and method for translating chat messages |
US10923232B2 (en) | 2018-01-09 | 2021-02-16 | Healthcare Interactive, Inc. | System and method for improving the speed of determining a health risk profile of a patient |
US20200183936A1 (en) * | 2018-12-10 | 2020-06-11 | Teradata Us, Inc. | Predictive query parsing time and optimization |
US11263396B2 (en) * | 2019-01-09 | 2022-03-01 | Woodpecker Technologies, LLC | System and method for document conversion to a template |
US11471729B2 (en) | 2019-03-11 | 2022-10-18 | Rom Technologies, Inc. | System, method and apparatus for a rehabilitation machine with a simulated flywheel |
US11185735B2 (en) | 2019-03-11 | 2021-11-30 | Rom Technologies, Inc. | System, method and apparatus for adjustable pedal crank |
US20200289889A1 (en) | 2019-03-11 | 2020-09-17 | Rom Technologies, Inc. | Bendable sensor device for monitoring joint extension and flexion |
US11157475B1 (en) * | 2019-04-26 | 2021-10-26 | Bank Of America Corporation | Generating machine learning models for understanding sentence context |
US11801423B2 (en) | 2019-05-10 | 2023-10-31 | Rehab2Fit Technologies, Inc. | Method and system for using artificial intelligence to interact with a user of an exercise device during an exercise session |
US11433276B2 (en) | 2019-05-10 | 2022-09-06 | Rehab2Fit Technologies, Inc. | Method and system for using artificial intelligence to independently adjust resistance of pedals based on leg strength |
US11904207B2 (en) | 2019-05-10 | 2024-02-20 | Rehab2Fit Technologies, Inc. | Method and system for using artificial intelligence to present a user interface representing a user's progress in various domains |
US11957960B2 (en) | 2019-05-10 | 2024-04-16 | Rehab2Fit Technologies Inc. | Method and system for using artificial intelligence to adjust pedal resistance |
US11599720B2 (en) * | 2019-07-29 | 2023-03-07 | Shl (India) Private Limited | Machine learning models for electronic messages analysis |
US11071597B2 (en) | 2019-10-03 | 2021-07-27 | Rom Technologies, Inc. | Telemedicine for orthopedic treatment |
US11701548B2 (en) | 2019-10-07 | 2023-07-18 | Rom Technologies, Inc. | Computer-implemented questionnaire for orthopedic treatment |
USD928635S1 (en) | 2019-09-18 | 2021-08-24 | Rom Technologies, Inc. | Goniometer |
US20210142893A1 (en) | 2019-10-03 | 2021-05-13 | Rom Technologies, Inc. | System and method for processing medical claims |
US20210134412A1 (en) | 2019-10-03 | 2021-05-06 | Rom Technologies, Inc. | System and method for processing medical claims using biometric signatures |
US11887717B2 (en) | 2019-10-03 | 2024-01-30 | Rom Technologies, Inc. | System and method for using AI, machine learning and telemedicine to perform pulmonary rehabilitation via an electromechanical machine |
US11282608B2 (en) | 2019-10-03 | 2022-03-22 | Rom Technologies, Inc. | Method and system for using artificial intelligence and machine learning to provide recommendations to a healthcare provider in or near real-time during a telemedicine session |
US20210134432A1 (en) | 2019-10-03 | 2021-05-06 | Rom Technologies, Inc. | Method and system for implementing dynamic treatment environments based on patient information |
US20210134425A1 (en) | 2019-10-03 | 2021-05-06 | Rom Technologies, Inc. | System and method for using artificial intelligence in telemedicine-enabled hardware to optimize rehabilitative routines capable of enabling remote rehabilitative compliance |
US11955220B2 (en) | 2019-10-03 | 2024-04-09 | Rom Technologies, Inc. | System and method for using AI/ML and telemedicine for invasive surgical treatment to determine a cardiac treatment plan that uses an electromechanical machine |
US11923065B2 (en) | 2019-10-03 | 2024-03-05 | Rom Technologies, Inc. | Systems and methods for using artificial intelligence and machine learning to detect abnormal heart rhythms of a user performing a treatment plan with an electromechanical machine |
US11515028B2 (en) | 2019-10-03 | 2022-11-29 | Rom Technologies, Inc. | Method and system for using artificial intelligence and machine learning to create optimal treatment plans based on monetary value amount generated and/or patient outcome |
US11515021B2 (en) | 2019-10-03 | 2022-11-29 | Rom Technologies, Inc. | Method and system to analytically optimize telehealth practice-based billing processes and revenue while enabling regulatory compliance |
US20210134463A1 (en) | 2019-10-03 | 2021-05-06 | Rom Technologies, Inc. | Systems and methods for remotely-enabled identification of a user infection |
US11915816B2 (en) | 2019-10-03 | 2024-02-27 | Rom Technologies, Inc. | Systems and methods of using artificial intelligence and machine learning in a telemedical environment to predict user disease states |
US11955223B2 (en) | 2019-10-03 | 2024-04-09 | Rom Technologies, Inc. | System and method for using artificial intelligence and machine learning to provide an enhanced user interface presenting data pertaining to cardiac health, bariatric health, pulmonary health, and/or cardio-oncologic health for the purpose of performing preventative actions |
US11282604B2 (en) | 2019-10-03 | 2022-03-22 | Rom Technologies, Inc. | Method and system for use of telemedicine-enabled rehabilitative equipment for prediction of secondary disease |
US11955222B2 (en) | 2019-10-03 | 2024-04-09 | Rom Technologies, Inc. | System and method for determining, based on advanced metrics of actual performance of an electromechanical machine, medical procedure eligibility in order to ascertain survivability rates and measures of quality-of-life criteria |
US11101028B2 (en) | 2019-10-03 | 2021-08-24 | Rom Technologies, Inc. | Method and system using artificial intelligence to monitor user characteristics during a telemedicine session |
US11955221B2 (en) | 2019-10-03 | 2024-04-09 | Rom Technologies, Inc. | System and method for using AI/ML to generate treatment plans to stimulate preferred angiogenesis |
US11265234B2 (en) | 2019-10-03 | 2022-03-01 | Rom Technologies, Inc. | System and method for transmitting data and ordering asynchronous data |
US11139060B2 (en) | 2019-10-03 | 2021-10-05 | Rom Technologies, Inc. | Method and system for creating an immersive enhanced reality-driven exercise experience for a user |
US20210134458A1 (en) | 2019-10-03 | 2021-05-06 | Rom Technologies, Inc. | System and method to enable remote adjustment of a device during a telemedicine session |
US11830601B2 (en) | 2019-10-03 | 2023-11-28 | Rom Technologies, Inc. | System and method for facilitating cardiac rehabilitation among eligible users |
US11337648B2 (en) | 2020-05-18 | 2022-05-24 | Rom Technologies, Inc. | Method and system for using artificial intelligence to assign patients to cohorts and dynamically controlling a treatment apparatus based on the assignment during an adaptive telemedical session |
US20210127974A1 (en) | 2019-10-03 | 2021-05-06 | Rom Technologies, Inc. | Remote examination through augmented reality |
US20210128080A1 (en) | 2019-10-03 | 2021-05-06 | Rom Technologies, Inc. | Augmented reality placement of goniometer or other sensors |
US11075000B2 (en) | 2019-10-03 | 2021-07-27 | Rom Technologies, Inc. | Method and system for using virtual avatars associated with medical professionals during exercise sessions |
US11317975B2 (en) | 2019-10-03 | 2022-05-03 | Rom Technologies, Inc. | Method and system for treating patients via telemedicine using sensor data from rehabilitation or exercise equipment |
US11087865B2 (en) | 2019-10-03 | 2021-08-10 | Rom Technologies, Inc. | System and method for use of treatment device to reduce pain medication dependency |
US11915815B2 (en) | 2019-10-03 | 2024-02-27 | Rom Technologies, Inc. | System and method for using artificial intelligence and machine learning and generic risk factors to improve cardiovascular health such that the need for additional cardiac interventions is mitigated |
US11325005B2 (en) | 2019-10-03 | 2022-05-10 | Rom Technologies, Inc. | Systems and methods for using machine learning to control an electromechanical device used for prehabilitation, rehabilitation, and/or exercise |
US11282599B2 (en) | 2019-10-03 | 2022-03-22 | Rom Technologies, Inc. | System and method for use of telemedicine-enabled rehabilitative hardware and for encouragement of rehabilitative compliance through patient-based virtual shared sessions |
US11069436B2 (en) | 2019-10-03 | 2021-07-20 | Rom Technologies, Inc. | System and method for use of telemedicine-enabled rehabilitative hardware and for encouraging rehabilitative compliance through patient-based virtual shared sessions with patient-enabled mutual encouragement across simulated social networks |
US11961603B2 (en) | 2019-10-03 | 2024-04-16 | Rom Technologies, Inc. | System and method for using AI ML and telemedicine to perform bariatric rehabilitation via an electromechanical machine |
US11270795B2 (en) | 2019-10-03 | 2022-03-08 | Rom Technologies, Inc. | Method and system for enabling physician-smart virtual conference rooms for use in a telehealth context |
US11756666B2 (en) | 2019-10-03 | 2023-09-12 | Rom Technologies, Inc. | Systems and methods to enable communication detection between devices and performance of a preventative action |
US11826613B2 (en) | 2019-10-21 | 2023-11-28 | Rom Technologies, Inc. | Persuasive motivation for orthopedic treatment |
USD907143S1 (en) | 2019-12-17 | 2021-01-05 | Rom Technologies, Inc. | Rehabilitation device |
US11107591B1 (en) * | 2020-04-23 | 2021-08-31 | Rom Technologies, Inc. | Method and system for describing and recommending optimal treatment plans in adaptive telemedical or other contexts |
US20220261538A1 (en) * | 2021-02-17 | 2022-08-18 | Inteliquet, Inc. | Skipping natural language processor |
Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US5799268A (en) * | 1994-09-28 | 1998-08-25 | Apple Computer, Inc. | Method for extracting knowledge from online documentation and creating a glossary, index, help database or the like |
US5832496A (en) * | 1995-10-12 | 1998-11-03 | Ncr Corporation | System and method for performing intelligent analysis of a computer database |
US6038668A (en) * | 1997-09-08 | 2000-03-14 | Science Applications International Corporation | System, method, and medium for retrieving, organizing, and utilizing networked data |
US6055494A (en) * | 1996-10-28 | 2000-04-25 | The Trustees Of Columbia University In The City Of New York | System and method for medical language extraction and encoding |
US6076088A (en) * | 1996-02-09 | 2000-06-13 | Paik; Woojin | Information extraction system and method using concept relation concept (CRC) triples |
Family Cites Families (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US4358824A (en) * | 1979-12-28 | 1982-11-09 | International Business Machines Corporation | Office correspondence storage and retrieval system |
EP0280866A3 (en) * | 1987-03-03 | 1992-07-08 | International Business Machines Corporation | Computer method for automatic extraction of commonly specified information from business correspondence |
US5708825A (en) * | 1995-05-26 | 1998-01-13 | Iconovex Corporation | Automatic summary page creation and hyperlink generation |
US5920854A (en) * | 1996-08-14 | 1999-07-06 | Infoseek Corporation | Real-time document collection search engine with phrase indexing |
EP0864988A1 (en) * | 1997-03-11 | 1998-09-16 | Matsushita Electric Industrial Co., Ltd. | Document management system and document management method |
WO1999005614A1 (en) * | 1997-07-23 | 1999-02-04 | Datops S.A. | Information mining tool |
- 1999-08-06: US application US09/370,329, published as US6182029B1 (not active: Expired - Lifetime)
- 2000-08-04: AU application AU65263/00A, published as AU773723B2 (not active: Ceased)
- 2000-08-04: GB application GB0203590A, published as GB2368432B (not active: Expired - Fee Related)
- 2000-08-04: CA application CA2381251A, published as CA2381251C (not active: Expired - Fee Related)
- 2000-08-04: WO application PCT/US2000/021515, published as WO2001011492A1 (active: IP Right Grant)
Non-Patent Citations (1)
Title |
---|
FRIEDMAN C. ET AL.: "Natural language processing in an operational clinical information system", NATURAL LANGUAGE ENGINEERING, vol. 1, no. 1, May 1995 (1995-05-01), pages 83 - 108, XP002932997 * |
Cited By (20)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US7509572B1 (en) * | 1999-07-16 | 2009-03-24 | Oracle International Corporation | Automatic generation of document summaries through use of structured text |
US8032827B2 (en) | 1999-07-16 | 2011-10-04 | Oracle International Corporation | Automatic generation of document summaries through use of structured text |
WO2003042859A3 (en) * | 2001-11-15 | 2003-09-18 | Forinnova As | Method and apparatus for textual exploration and discovery |
US8265925B2 (en) | 2001-11-15 | 2012-09-11 | Texturgy As | Method and apparatus for textual exploration discovery |
WO2003042859A2 (en) * | 2001-11-15 | 2003-05-22 | Forinnova As | Method and apparatus for textual exploration and discovery |
WO2005050475A1 (en) * | 2003-11-21 | 2005-06-02 | Agency For Science, Technology And Research | Method and system for validating the content of technical documents |
GB2424103A (en) * | 2003-11-21 | 2006-09-13 | Agency Science Tech & Res | Method and system for validating the content of technical documents |
EP1803076A4 (en) * | 2004-10-20 | 2008-03-05 | Motorola Inc | An electronic device and method for visual text interpretation |
EP1803076A2 (en) * | 2004-10-20 | 2007-07-04 | Motorola, Inc. | An electronic device and method for visual text interpretation |
US7650573B2 (en) | 2005-08-11 | 2010-01-19 | Microsoft Corporation | Layout rules for whitespace sensitive literals |
WO2008042716A3 (en) * | 2006-09-29 | 2008-07-10 | Agiledelta Inc | Knowledge based encoding of data with multiplexing to facilitate compression |
WO2008042716A2 (en) * | 2006-09-29 | 2008-04-10 | Agiledelta, Inc. | Knowledge based encoding of data with multiplexing to facilitate compression |
US8120515B2 (en) | 2006-09-29 | 2012-02-21 | Agiledelta, Inc. | Knowledge based encoding of data with multiplexing to facilitate compression |
TWI406199B (en) * | 2009-02-17 | 2013-08-21 | Univ Nat Yunlin Sci & Tech | Online system and method for reading text |
US9152623B2 (en) | 2012-11-02 | 2015-10-06 | Fido Labs, Inc. | Natural language processing system and method |
US9800536B2 (en) | 2015-03-05 | 2017-10-24 | International Business Machines Corporation | Automated document lifecycle management |
US10956670B2 (en) | 2018-03-03 | 2021-03-23 | Samurai Labs Sp. Z O.O. | System and method for detecting undesirable and potentially harmful online behavior |
US11151318B2 (en) | 2018-03-03 | 2021-10-19 | SAMURAI LABS sp. z. o.o. | System and method for detecting undesirable and potentially harmful online behavior |
US11507745B2 (en) | 2018-03-03 | 2022-11-22 | Samurai Labs Sp. Z O.O. | System and method for detecting undesirable and potentially harmful online behavior |
US11663403B2 (en) | 2018-03-03 | 2023-05-30 | Samurai Labs Sp. Z O.O. | System and method for detecting undesirable and potentially harmful online behavior |
Also Published As
Publication number | Publication date |
---|---|
CA2381251C (en) | 2011-02-15 |
US6182029B1 (en) | 2001-01-30 |
GB0203590D0 (en) | 2002-04-03 |
GB2368432B (en) | 2004-05-19 |
AU6526300A (en) | 2001-03-05 |
AU773723B2 (en) | 2004-06-03 |
CA2381251A1 (en) | 2001-02-15 |
GB2368432A (en) | 2002-05-01 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CA2381251C (en) | System and method for language extraction and encoding |
Friedman et al. | Representing information in patient reports using natural language processing and the extensible markup language | |
US20220269865A1 (en) | System for knowledge acquisition | |
US6055494A (en) | System and method for medical language extraction and encoding | |
Mutalik et al. | Use of general-purpose negation detection to augment concept indexing of medical documents: a quantitative study using the UMLS | |
US7373597B2 (en) | Conversion of text data into a hypertext markup language | |
US7610192B1 (en) | Process and system for high precision coding of free text documents against a standard lexicon | |
US7233938B2 (en) | Systems and methods for coding information | |
Friedman et al. | Natural language processing in an operational clinical information system | |
CA2813608C (en) | Structured searching of dynamic structured document corpuses | |
US8442814B2 (en) | Conceptual world representation natural language understanding system and method | |
US20040168119A1 (en) | method and apparatus for creating a report | |
Kugler et al. | Translator’s workbench: Tools and terminology for translation and text processing | |
Abulaish et al. | A concept-driven biomedical knowledge extraction and visualization framework for conceptualization of text corpora | |
Hishiki et al. | Developing NLP tools for genome informatics: An information extraction perspective | |
Friedman | Semantic text parsing for patient records | |
Grover et al. | XML-based data preparation for robust deep parsing | |
JP2002534741A (en) | Method and apparatus for processing semi-structured text data | |
US20040117776A1 (en) | Type-specific objects from markup and web-oriented languages, and systems and methods therefor | |
Wang et al. | Radiology text analysis system (RadText): architecture and evaluation | |
Georg et al. | A document engineering environment for clinical guidelines | |
JP2004334382A (en) | Structured document summarizing apparatus, program, and recording medium | |
DeRose et al. | The TEI hypertext guidelines | |
Wilks et al. | LaSIE jumps the GATE | |
WO2001024053A9 (en) | System and method for automatic context creation for electronic documents |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
| | AK | Designated states | Kind code of ref document: A1; Designated state(s): AE AG AL AM AT AU AZ BA BB BG BR BY BZ CA CH CN CR CU CZ DE DK DM DZ EE ES FI GB GD GE GH GM HR HU ID IL IN IS JP KE KG KP KR KZ LC LK LR LS LT LU LV MA MD MG MK MN MW MX MZ NO NZ PL PT RO RU SD SE SG SI SK SL TJ TM TR TT TZ UA UG US UZ VN YU ZA ZW |
| | AL | Designated countries for regional patents | Kind code of ref document: A1; Designated state(s): GH GM KE LS MW MZ SD SL SZ TZ UG ZW AM AZ BY KG KZ MD RU TJ TM AT BE CH CY DE DK ES FI FR GB GR IE IT LU MC NL PT SE BF BJ CF CG CI CM GA GN GW ML MR NE SN TD TG |
| | 121 | Ep: the epo has been informed by wipo that ep was designated in this application | |
| | DFPE | Request for preliminary examination filed prior to expiration of 19th month from priority date (pct application filed before 20040101) | |
| | WWE | Wipo information: entry into national phase | Ref document number: 2381251; Country of ref document: CA |
| | WWE | Wipo information: entry into national phase | Ref document number: 65263/00; Country of ref document: AU |
| | ENP | Entry into the national phase | Ref country code: GB; Ref document number: 200203590; Kind code of ref document: A; Format of ref document f/p: F |
| | REG | Reference to national code | Ref country code: DE; Ref legal event code: 8642 |
| | WWE | Wipo information: entry into national phase | Ref document number: 10048686; Country of ref document: US |
| | 122 | Ep: pct application non-entry in european phase | |
| | NENP | Non-entry into the national phase | Ref country code: JP |
| | WWG | Wipo information: grant in national office | Ref document number: 65263/00; Country of ref document: AU |