US20120197910A1 - Method and system for performing classified document research - Google Patents
- Publication number
- US20120197910A1 (application US 13/501,362)
- Authority
- US
- United States
- Prior art keywords
- classification
- unique classification
- unique
- classification values
- search
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Abandoned
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/90—Details of database functions independent of the retrieved data types
- G06F16/93—Document management systems
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/30—Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
- G06F16/35—Clustering; Classification
Definitions
- the present invention relates to the field of document research, and more particularly to methods and systems for locating relevant classifications.
- Document research involves finding relevant subject matter within a set of documents as may be found in a document repository.
- Search engines, for example, use “key” words or phrases as search arguments to locate text passages containing those words or phrases.
- Classification systems provide another means for assessing context. In a classification system, documents with common threads are grouped together in classes. A field of context, therefore, can be narrowed by selecting relevant classes.
- Patents and patent-related documentation databases are examples of database repositories that implement classification systems. The most commonly used classification system for patents and published patent applications, at least in the U.S., is the USPTO (United States Patent and Trademark Office) Patent Classification System. Two other classification systems in common usage on the international scene include: the “IPC” (International Patent Classification) and the “ECLA” (European Classification).
- The USPTO classification system currently comprises at least 984 classes and numerous digests (collections of certain subjects) within each class. Each class is broken into subclasses; each subclass may be further broken into subclasses, and so on. Patents are thus grouped into categories, which are broken down into sub-categories, and those into further sub-categories, as required.
- the USPTO examiners decide the class/subclass in which to file a particular invention. To add further complexity, any one invention can be filed in more than one class/subclass, and most are filed in several classes/subclasses.
- U.S. Patent Application 20020022974 to Lindh shows a method for display of patent information that involves applying statistical analysis to groups of references containing classifications. Lindh does not show additional cross referencing to a search history in order to locate unsearched classifications, which again is important in progressively narrowing and focusing a search.
- U.S. Patent Application 20090313221 to Chen shows a patent technology association classification method. While Chen has shown the method of removing classifications and counting frequency, Chen fails to show the additional function of comparing classification frequencies to search histories, nor does Chen show additional broad and narrow reporting schemes for use at different stages of a patent search.
- Huang shows a technical classification method for searching patents, which includes generating counts from a group of references. The method shows the researcher a quality of a search, but falls short in that Huang does not assist the researcher in locating additional classification areas to search in a next iteration.
- U.S. Patent Application 20020073095 to Ohga shows a patent classification displaying method and apparatus having some similarities to the present invention.
- the apparatus provides a classification counting system, wherein the most frequently occurring codes are sorted to the top of the list.
- Other systems, such as Thomson Delphion, have reporting features like this.
- Several critical components are however missing when viewed next to the present invention.
- the classification codes on the report should be cross referenced against a running tally of codes kept by the researcher in a given search project. With this additional function, the researcher sees not only relevant classifications, but also classifications that have not been searched yet.
- Ohga fails to show additional modes of class counting and weighting that are used at different stages of a patent research project, such that the researcher can use broad analysis in the beginning and narrow analysis during the iterative part of the patent search.
- U.S. Patent Application 20010027452 to Tropper shows a system and method to identify documents in a database which relate to a given document by using recursive searching and no keywords. While Tropper realizes the benefits of using the latest search results to form new searches, he fails to teach the accumulation of classification codes, weighting the codes, ranking the codes and then comparing the rankings to the researcher's search history.
- the present invention provides a system and method for efficiently and accurately identifying relevant document classifications.
- the system receives one or more classified reference documents in a document set along with a relevancy indicator for each document.
- the system retrieves all document classifications from the document set, and arranges a classification analysis interface.
- the researcher has four modes for the interface, which are called: Main, Parents, Subclass, and Primary mode—wherein Main is the broadest and Primary is the narrowest.
- the researcher is provided GUI tools to select classification codes from the classification analysis interface, and add them to a classification search history which is stored along with the document set in a project file.
- the researcher uses the Main and the Parents mode during the first hour of the search project, and the Subclass mode for the remaining 3-4 hours.
- In the Main mode, the researcher is shown the occurrence of main classes in the document set, which provides a broad base for class/text searching.
- In Parents mode, the researcher is shown the common occurrence of parent sub-classifications of the document classifications, while the document classifications themselves are not shown. With this information, the researcher can inspect child classifications of the parents in a classification schedule.
- the researcher uses the Subclass mode.
- the document classifications are collected, counted, scored, and sorted—providing the researcher quick viewing of potentially relevant classifications.
- Once the researcher locates potentially relevant classifications, he or she executes searches in the newly located classifications, and then adds documents along with relevancy indicators to the expanding document set.
- the researcher then re-executes Subclass Mode classification analysis on the document set.
- the classification analysis module scores classification codes and then cross references against the classification search history.
- the resulting classification analysis interface is displayed along with various sensory indicators (e.g. a color) that show the researcher relevant classifications that are 1) un-searched, 2) partially searched, or 3) fully searched. In this manner the researcher may quickly determine where a next iteration in the search project should be directed.
- the researcher may continuously iterate through the process of locating new classification areas, searching the new classification areas, augmenting the document set with new documents, and then using the classification analysis tool to locate additional unsearched classification areas.
- The researcher is encouraged to add many (i.e., 50-100) documents to the project file, using a document management interface to tag even moderately relevant documents, so that many hundreds of classification codes are used in the scoring.
- the process continues until the top 5-10 classifications presented by the classification analysis interface are indicated as fully searched, at which point the search project can be brought to a close.
- important classification areas are very difficult to overlook, regardless of the experience level of the researcher.
- FIG. 1A is a block diagram illustrating a document research system in accordance with an exemplary embodiment of the invention.
- FIG. 1B is a sample of a document.
- FIG. 2A is an interface diagram in accordance with an exemplary embodiment of the invention.
- FIG. 2B is an interface diagram in accordance with an exemplary embodiment of the invention.
- FIG. 2C is an interface diagram in accordance with an exemplary embodiment of the invention.
- FIG. 2D is an interface diagram in accordance with an exemplary embodiment of the invention.
- FIG. 2E is an interface diagram in accordance with an exemplary embodiment of the invention.
- FIG. 2F is an interface diagram in accordance with an exemplary embodiment of the invention.
- FIG. 2G is an interface diagram in accordance with an exemplary embodiment of the invention.
- FIG. 2H is a diagram of a project file created and used by the present invention.
- FIG. 2I is an interface diagram in accordance with an exemplary embodiment of the invention.
- FIG. 2J is an interface diagram in accordance with an exemplary embodiment of the invention.
- FIG. 3A is a flow diagram illustrating a process that may be carried out in accordance with the exemplary system of FIG. 1A .
- FIG. 3B is a flow diagram illustrating a process that may be carried out in accordance with the exemplary system of FIG. 1A .
- FIG. 3C is a flow diagram illustrating a process that may be carried out in accordance with the exemplary system of FIG. 1A .
- FIG. 3D is a search indicator to sensory indicator color scheme table.
- FIG. 4 is a block diagram illustrating a document analysis system in accordance with another exemplary embodiment of the invention.
- FIG. 5 is a flow diagram illustrating a process that may be carried out in accordance with the exemplary system of FIG. 1A .
- the document analysis system 1000 comprises a client device 1010 , which may be a computer.
- the client device 1010 includes a classification analysis module 1012 , an interface module 1014 and a user Input/Output (I/O) interface 1018 .
- The client device 1010 may be a computing device having a processor, such as a personal computer, a phone, a mobile phone, or a personal digital assistant.
- the document analysis system 1000 may also comprise a document provider 1030 , a classification data provider 1040 and a network 1020 .
- the document provider 1030 is configured to deliver one or more documents, labeled generally as 1032 .
- the documents 1032 may be electronic files containing patent data or any type of electronic file that contains textual data. See FIG. 1B for an example of a document 1032 .
- the document 1032 has multiple document classifications 135 that are further divided into a class 136 and a subclass 137 .
- The body of the document is composed of multiple sections (e.g. abstract, description, claims), and those sections are further divided into paragraphs 138.
- the textual data of each document 1032 includes content data and one or more classification codes 135 .
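- As an illustration only, the document 1032 described above could be modeled as follows. The names `Document` and `Classification`, and the "class/subclass" code format (e.g. "707/737"), are assumptions made for this sketch and are not taken from the specification.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Classification:
    """One document classification 135, split into class 136 and subclass 137."""
    main_class: str
    subclass: str

    @classmethod
    def parse(cls, code: str) -> "Classification":
        # Assumes a "class/subclass" code such as "707/737" (illustrative format).
        main_class, _, subclass = code.partition("/")
        return cls(main_class.strip(), subclass.strip())

    def __str__(self) -> str:
        return f"{self.main_class}/{self.subclass}"


@dataclass
class Document:
    """A document 1032: content sections plus one or more classification codes."""
    number: str
    title: str
    classifications: list[Classification] = field(default_factory=list)
    # Section name -> list of paragraphs 138 (e.g. "abstract", "claims").
    sections: dict[str, list[str]] = field(default_factory=dict)


# Hypothetical example document; the number and codes are made up.
doc = Document(
    number="EXAMPLE-1",
    title="Example document",
    classifications=[
        Classification.parse("707/737"),
        Classification.parse("707/999.003"),
    ],
    sections={"abstract": ["A sample paragraph."]},
)
```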
- the document provider 1030 may be a remote server and may also include a search engine 1034 for retrieving the one or more documents 1032 from a document data repository (not shown) based on a search query.
- The search engine may be that provided by the United States Patent and Trademark Office (USPTO), FreePatentsOnLine, Micropatent®, Delphion®, PatentCafe®, Thomson Innovation or Google®.
- The document provider 1030 may retrieve the document data from a local repository or from one or more remote document repositories. Examples of such a document repository include patent databases including those provided by EP (European patents), WO (PCT publications), JP (Japan abstracts) and DWPI (Derwent World Patent Index for patent families).
- The document provider 1030 may be a cloud-based bulk storage system, such as Amazon Simple Storage Service.
- the classification data provider 1040 is configured to provide access to a classification data repository 1042 .
- the classification data repository 1042 may be a database or file storage element that stores hierarchical classification data entries 1044 .
- Each classification data entry 1044 includes a classification code.
- Each classification data entry 1044 may also include a classification code description field.
- the classification data provider 1040 may be a remote server provided by the United States Patent and Trademark Office (USPTO).
- the classification data may be representative of a document classification system such as the Manual of Classification issued by the USPTO.
- The classification data provider may retrieve the document data from a local repository or from one or more remote document repositories. It is noted that while shown as separate components, the document provider and classification data provider may be co-located on a single remote server.
- the interface module 1014 is configured to receive one or more documents 1032 from the document provider 1030 , and to retrieve classification data 1044 from the classification data provider 1040 by way of network 1020 .
- the network may be the Internet.
- the interface module 1014 may alternatively be configured to receive the documents 1032 or classification data 1044 through the user I/O interface 1018 .
- The documents 1032 may be stored on a portable storage device (not shown) such as a CD, DVD or solid state device, and the user I/O interface 1018 may include a communications interface such as a wireless interface, a CD/DVD drive or a USB drive for retrieving data from the portable storage device.
- the documents 1032 may alternately be paper-based documents and may be provided to the interface module 1014 by use of a scanner (not shown) that is configured with the I/O interface 1018 .
- the client device 1010 may also include a data storage element 1016 , which may be at least one of a computer readable medium and a memory.
- the interface module 1014 may also be configured to receive a set of one or more concepts from a researcher by way of the I/O interface 1018 .
- the I/O interface 1018 may also include at least one input device such as a keyboard, mouse, microphone or a touch screen for receiving the concepts from the researcher.
- Each concept is comprised of one or more text-based keywords or sets of text-based keywords which are used to determine the relevancy of each of the documents 1032.
- the client device 1010 may alternatively include a document analysis module that generates statistical data based on the user-defined concepts and the documents 1032 .
- the statistical data may be used by the researcher to quickly assess the relevancy of each document 1032 to each of the user-defined concepts.
- the document analysis module may transmit the statistical data to the interface module 1014 which presents the data to the researcher by way of the I/O interface 1018 .
- The I/O interface 1018 may also include a display such as an LCD or CRT monitor configured to display a graphical user interface (GUI) for presenting information such as the statistical data to the researcher.
- Referring to FIG. 2A through FIG. 2G, diagrams are shown illustrating a document analysis GUI 200 in accordance with an exemplary embodiment of the invention.
- FIG. 3A which illustrates an exemplary method 1300 for performing classification-based document analysis will also be discussed.
- the interface module 1014 may receive concept data from the researcher.
- the interface module 1014 first generates a document analysis GUI 200 and displays the GUI 200 to the researcher by way of the display device included with user I/O interface 1018 . As shown in FIG.
- the document analysis GUI 200 includes a document relevance interface 220 , a document management interface 250 , and a document image window 254 .
- The researcher may start a research project by entering one or more concepts 272.
- Each concept 272 may have one or more words or word groups associated therewith.
- the document analysis GUI 200 includes a keyword entry interface 210 .
- the keyword entry interface 210 comprises multiple rows of alphanumeric entry fields 212 .
- One or more keywords 213 may be entered by a researcher into each entry field 212 , wherein each keyword 213 is conceptually related such that each line represents a keyword group 214 .
- the researcher is also provided with a user thesaurus 211 and web thesaurus 219 .
- the user thesaurus 211 can be edited and stored in the data storage element 1016 , and the web thesaurus 219 may be accessed through the network 1020 by the interface module 1014 .
- Five alphanumeric entry fields 212 are shown to be filled in FIG. 2B .
- Each concept 272 and corresponding keyword group 214 may be determined manually by the researcher or may be received from an external source.
- the concepts may be reduced to a manageable number of concepts (e.g. 4-5 concepts). Keywords 213 may then be chosen for each of the concepts and entered into one of the alphanumeric fields 212 to form the keyword group 214 .
- the researcher may then exit the keyword entry interface 210 and proceed to analysis of a set of documents based on the user-defined concepts.
- the interface module 1014 will receive one or more reference documents 1032 .
- the interface module 1014 is configured to receive one or more documents 1032 from the document provider 1030 by way of network 1020 .
- the interface module 1014 may be configured to allow the researcher to request a predetermined set of documents 1032 .
- the researcher may initiate a request for a specific set of patent documents or a set of patent documents that fall within a specific category or classification.
- the researcher may also initiate a search of a remote document repository through a search interface window 230 (shown in FIG. 2D ) provided by the document analysis GUI 200 .
- the search may be initiated by entering a set of search parameters, such as keywords, into one or more search fields 232 located on the search interface window 230 .
- Boolean operators, wildcards and proximity indicators may be used to link the keywords together in logic sets.
- the search interface window 230 may also provide a search assistance window 234 that allows the previously defined keywords 213 to be added to the set of search parameters in response to a single user action (e.g. a mouse click).
- the search assistance window 234 thereby facilitates the loading of search parameters into the one or more search fields 232 .
- the researcher is provided with a classification search history 290 , which contains a table for documenting the search project strategy (discussed in detail later). The researcher may pick classification codes from the classification search history 290 .
- the interface module 1014 may alternatively be configured to receive one or more documents 1032 through the user I/O interface 1018 .
- The documents 1032 may be stored on a portable storage device (not shown) such as a CD, DVD or solid state device, and the user I/O interface 1018 may include a communications interface such as a wireless interface, a CD/DVD drive or a USB drive for retrieving data from the portable storage device.
- Upon receiving the one or more reference documents, the interface module 1014 will populate a document management table 252 located on a document management interface 250 (shown in FIG. 2E) with selectable rows 253, each having information descriptive of one of the received documents 1032.
- each row may include a reference document number 255 and document title 256 .
- the interface module 1014 receives and stores data from the researcher that indicates relevancy of a currently selected document 1032 to the one or more user-defined concepts.
- The interface module 1014 will populate the document management table 252 (shown in FIG. 2E) with selectable rows 253, each having information descriptive of one of the received reference documents.
- the document management table 252 also includes one or more additional columns for allowing the researcher to indicate (by way of a mouse-click or similar navigation event) the relevance of the currently selected document.
- Each row of the document management table 252 may have a relevancy column 257 that contains an input field for indicating an overall relevance of the associated reference document.
- the interface module 1014 may provide the researcher with the ability to select an indicia (e.g. using a drop-down menu list) such as “A” for highest relevance, “B” for suspected relevance, and “C” for uncertain relevance.
- Irrelevant documents may be marked with an “I” to place a marker in a project file 205 ( FIG. 2H ) indicating that a reference document was reviewed.
- Each row of the document management table 252 may also have one or more additional columns labeled generally as 258 that contain an input field for indicating whether a specific concept has been verified to appear in the currently selected reference document.
- the interface module 1014 may provide the researcher with the ability to toggle a field (one such field is labeled as 259 ) corresponding to a specific concept “on” or “off” (e.g. by a mouse-click) when indicating whether a particular concept 272 does or does not exist inside the selected document.
- a column may be provided for each of the previously discussed concepts 272 .
- the interface module 1014 may provide the researcher with a concept management window 270 (see FIG. 2F ) for allowing the researcher to define different concepts 272 which the additional columns 258 may be derived from. In this manner, the researcher is able to track higher-level or more abstract concepts than were initially defined and may also provide more user-friendly naming of the concepts 272 (useful for example for report generation).
- the interface module 1014 may also store the previously discussed relevancy indicators in the project file 205 , which is located in the data storage element 1016 in FIG. 1A . By storing each of the indicators, the interface module 1014 is able to provide information to the classification analysis module 1012 .
- the classification analysis module 1012 will now be discussed in greater detail.
- classification analysis begins with the interface module 1014 first displaying a classification analysis interface 280 , which is shown in FIG. 2G .
- the classification analysis interface 280 can include a classification search history 290 , which is retrieved by the interface module 1014 from the project file 205 .
- the classification search history 290 shows a previously identified classification code 291 and a corresponding previously identified classification title 292 .
- Each previously identified classification code 291 also has a search extent indicator 294 and a search status indicator 293 , both of which can be manipulated by the researcher to various states.
- the classification analysis interface 280 may include a document selection field 281 and a classification analysis mode selection field 282 .
- the document selection field 281 provides one or more options to the researcher for selecting a set of documents which the classification analysis will be performed on.
- the researcher may select all documents in the project file 205 that have previously been indicated to be relevant to any of the concepts 272 (i.e.
- the classification analysis interface 280 also has a class weighting 286 option and a relevancy weighting 287 option.
- the class weighting 286 instructs the classification analysis module 1012 to account for total size of a classification, which balances the effect of large classifications overshadowing smaller classifications in un-weighted frequency counts.
- the relevancy weighting 287 allows the researcher to assign greater weight in the scoring to documents 1032 of higher relevance recorded in the relevancy column 257 .
- The classification analysis mode selection field 282 provides one or more options to the researcher for selecting the mode of classification analysis to be performed. The most common mode is the Subclass mode, which is discussed in the next step. (Detailed discussions of all four modes are found immediately following.)
- Step 1340 may proceed after the researcher confirms the previously described classification analysis options.
- the interface module 1014 then instructs the classification analysis module 1012 to perform classification analysis on the selected set of documents.
- documents 1032 have one or more document classifications 135 associated therewith, which can be further divided into a class 136 and a subclass 137 .
- the classification analysis module 1012 will retrieve the document classifications 135 from each document and then generate a count of instances of each document classifications 135 over the entire selected set.
- The classification analysis module 1012 will then send each document classification 135 and its corresponding count or score to the interface module 1014 to be displayed (step 1350) via the classification analysis interface 280, where each unique code will be displayed in a separate row.
- the unique codes may be displayed in a classification code column 284 while the corresponding score will be displayed in a classification score column 283 .
- the rows may be sorted based on the score of each unique code.
- The score for each code may be multiplied by a weighting factor that accounts for the size of each subclass (i.e., the number of documents in the subclass) or by a weighting factor that accounts for the document relevance.
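- The counting and sorting described above can be sketched, without any weighting, as a simple frequency count. The function name `count_classifications`, the dictionary layout and the sample codes are hypothetical; this is one possible reading, not the implementation of the classification analysis module 1012.

```python
from collections import Counter


def count_classifications(documents):
    """Count how often each classification code occurs across a document set.

    documents: dict mapping a document number to its list of classification codes.
    Returns a list of (code, count) pairs, most frequent first.
    """
    counts = Counter()
    for codes in documents.values():
        counts.update(codes)
    # most_common() returns the pairs ordered by count, highest first.
    return counts.most_common()


# Made-up document set for illustration.
documents = {
    "DOC-1": ["707/737", "707/738"],
    "DOC-2": ["707/737", "715/200"],
    "DOC-3": ["707/737"],
}
rows = count_classifications(documents)
```

- Each resulting pair corresponds to one row of the interface, with the code in the classification code column 284 and the count in the classification score column 283.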
- the interface module 1014 may also retrieve a classification description for each unique code from the classification data provider 1040 , using each unique classification code to look up the corresponding classification code entry 1044 .
- the classification description may also be displayed in a classification title column 285 of the classification analysis interface 280 .
- the classification analysis module 1012 will use a search indicator to sensory indicator table 241 , as seen in FIG. 3D , to determine a sensory indicator (e.g. a color) for each unique classification code that appears in the classification analysis interface.
- The classification analysis module 1012 determines the sensory indicator by first determining whether the corresponding classification code has been previously searched and to what extent. If a code appears in the classification code column 284 and does not appear as a previously identified classification code 291 in the classification search history 290, then the code is assumed to be unsearched. If a code appears in the classification code column 284, also appears as a previously identified classification code 291, and the corresponding search status indicator 293 shows “No”, then the code is assumed to be partially searched.
- the classification analysis window 280 thus allows the researcher to quickly determine (e.g. by visual inspection) which classification codes have been cited most frequently as well as which classification codes have not yet been searched. In this manner the researcher may very quickly determine where a next iteration of a search project should be directed.
- At step 1350, the researcher will determine whether to add a new classification code to the search project.
- the researcher is provided the ability to quickly add entries to the classification search history 290 directly from the classification code column 284 using a mouse click. In doing so, the process will return to step 1320 , as indicated by dashed arrow 1360 , at which point the interface module 1014 provides a new search inquiry to the document provider 1030 and a new set of reference documents 1032 will be received.
- steps 1330 through 1350 are repeated to determine the relevancy of the new set of reference documents to the user-defined concepts and whether the search should be expanded to a new classification. Steps 1320 through 1350 may be repeated until the researcher is satisfied that the most relevant classes have been searched.
- the researcher may make this determination when a threshold number of the most frequently occurring classifications are highlighted in red, which indicates that all are present on the classification search history 290 , and all are indicated as complete by the search status indicator 293 .
- The threshold may be at least ten red highlighted classifications in the classification analysis interface 280.
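- The stopping criterion above could be checked along the following lines. The function `search_complete` and the shape of the `history` mapping are assumptions made for this sketch; the specification leaves the determination to the researcher.

```python
def search_complete(ranked_codes, history, threshold=10):
    """Return True when the top `threshold` ranked codes are all marked complete.

    ranked_codes: classification codes sorted by score, highest first.
    history: dict mapping a code to True when its search status is complete;
             codes missing from the history count as not searched.
    """
    top = ranked_codes[:threshold]
    # Require a full set of `threshold` codes, all of them fully searched.
    return len(top) >= threshold and all(history.get(code, False) for code in top)


# Made-up ranking and history, using a small threshold for brevity.
ranked = ["707/737", "707/738", "715/200"]
done = search_complete(
    ranked, {"707/737": True, "707/738": True, "715/200": True}, threshold=3
)
```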
- The classification analysis performed by the classification analysis module 1012 may be performed by first specifying a mode using the classification analysis mode selection field 282.
- the classification analysis modes may include: a Main Classes mode, a Subclass Parents mode, a Subclass Mode and a Primary Subclass mode. Referring to FIG. 3B , all four modes are shown, and will now be discussed in detail. In addition, FIG. 3C shows the process of FIG. 3B along with actual numbers. Steps 701 - 706 are run in all modes, and will be discussed first.
- the classification analysis module 1012 retrieves the documents 1032 from the project file 205 .
- the documents are then filtered according to the preference of the researcher using document selection field 281 .
- the researcher may run just “B” tagged documents or just documents having a specific element tagged in the document management table 252 .
- the classification analysis module 1012 compiles all document classifications 135 into a 2D-Array 750 containing document classification 135 , relevancy, score, and primary (see for example array 750 in FIG. 3C ).
- the relevancy is originally set by the researcher in relevancy column 257 as A,B,C,D, or E. Score is initially set to 1.
- Primary is an indication as to whether the document classification 135 is the first listed.
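- A minimal sketch of how the 2D-Array 750 might be compiled is given below. The `Row` type, the `compile_rows` function and the input layout are hypothetical; only the four columns (classification, relevancy, score, primary) and the initial score of 1 come from the description.

```python
from typing import NamedTuple


class Row(NamedTuple):
    classification: str  # document classification 135
    relevancy: str       # relevancy tag set by the researcher
    score: float         # initially 1
    primary: bool        # True when the classification is the first listed


def compile_rows(documents):
    """Build one row per (document, classification) pair.

    documents: iterable of (relevancy, [classification codes]) tuples,
               with each document's codes in their listed order.
    """
    rows = []
    for relevancy, codes in documents:
        for position, code in enumerate(codes):
            rows.append(Row(code, relevancy, 1.0, position == 0))
    return rows


# Made-up input: one "A" document with two codes, one "B" document with one.
rows = compile_rows([("A", ["707/737", "707/738"]), ("B", ["707/737"])])
```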
- At step 703, if the class weighting 286 is turned on, then move to step 704.
- The interface module 1014 requests the classification size (i.e., the total number of documents currently classified therein) for each classification in the 2D-Array 750 from the classification data provider 1040.
- the classification analysis module 1012 divides the score in 2D-Array 750 by the classification size, which effectively weights each classification inversely according to classification size.
- At step 705, if the relevancy weighting 287 is turned on, then move to step 706.
- the classification analysis module 1012 multiplies the score in 2D-Array 750 by a relevancy factor according to the relevancy listed in 2D-Array 750 .
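- The two optional weighting steps can be illustrated as follows. The relevancy factor values in `RELEVANCY_FACTORS` are assumed for this sketch, since the description does not state them, and `weight_score` is a hypothetical helper name.

```python
# Assumed relevancy factors; the actual values are not given in the description.
RELEVANCY_FACTORS = {"A": 3.0, "B": 2.0, "C": 1.0}


def weight_score(score, classification_size=None, relevancy=None):
    """Apply class weighting and/or relevancy weighting to a single score.

    classification_size: total documents in the classification, or None
                         when class weighting 286 is turned off.
    relevancy: relevancy tag of the source document, or None when
               relevancy weighting 287 is turned off.
    """
    if classification_size:
        # Class weighting: divide by size, so large classifications count less.
        score = score / classification_size
    if relevancy is not None:
        # Relevancy weighting: multiply by the factor for the document's tag.
        score = score * RELEVANCY_FACTORS.get(relevancy, 1.0)
    return score
```

- For example, under these assumed factors, a score of 1 for a "B" document in a classification of 4 documents becomes 1 / 4 × 2 = 0.5.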
- Main Classes Mode: If the classification analysis mode selection field 282 is set to “Main”, then proceed through step 717 to step 718.
- the document classifications 135 in the 2D-Array 750 are rewritten to show only the classes 136 .
- In step 718, the 2D-Array 750 is rearranged by summing the scores of repeat classification entries and eliminating all repeats. The 2D-Array 750 is then sorted high to low according to score, and the class description is added for step 720, which is the display in interface 280. See FIG. 2I for an example of the interface 280 after a run in Main Classes mode.
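As a rough sketch of the Main Classes mode just described, the rows may be reduced to their classes, summed and sorted. The "class/subclass" separator and the representation of a row as a `(classification, score)` pair are assumptions made only for this example.

```python
from collections import defaultdict


def main_classes(rows):
    """rows: iterable of (classification, score) pairs."""
    totals = defaultdict(float)
    for classification, score in rows:
        # Keep only the class 136, dropping the subclass (separator assumed).
        main_class = classification.split("/", 1)[0]
        totals[main_class] += score  # sum the scores of repeat entries
    # Sort high to low according to score.
    return sorted(totals.items(), key=lambda item: item[1], reverse=True)
```

For instance, rows for 707/736, 707/5 and 715/200 with a score of 1 each collapse to class 707 with a score of 2 and class 715 with a score of 1.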
- SubClass Parents Mode If classification analysis mode selection field 282 is set to “Subclass Parents” then proceed through step 714 and on to step 715 .
- the classification analysis module 1012 requests all ancestors of the document classifications 135 in the 2D-Array 750 from the classification data provider 1040 via the interface module 1014 .
- the ancestors are then inserted into the 2D-Array 750 , and simultaneously the original document classifications are deleted from the 2D-Array 750 .
- the 2D-Array 750 is rearranged by summing the scores of repeat classification entries and eliminating all repeats.
- the resulting table is displayed in the classification analysis interface 280 . See FIG. 2J for an example of the interface 280 after a run in SubClass Parents mode.
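The Subclass Parents mode can be sketched in the same way. Here `ancestors_of` is a hypothetical callable standing in for the request made to the classification data provider 1040, and the final sort is an assumption, since the description only states that repeats are summed before display.

```python
def subclass_parents(rows, ancestors_of):
    """rows: iterable of (classification, score) pairs.

    ancestors_of: hypothetical lookup returning the ancestor codes of a
    classification, standing in for the classification data provider.
    """
    totals = {}
    for classification, score in rows:
        # The ancestors replace the original document classification.
        for ancestor in ancestors_of(classification):
            totals[ancestor] = totals.get(ancestor, 0.0) + score
    # Sorting high to low is assumed, mirroring the other modes.
    return sorted(totals.items(), key=lambda item: item[1], reverse=True)
```

Because the original classifications are dropped, two sibling subclasses that share a parent reinforce that parent's score rather than competing with it.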
- SubClass Mode If classification analysis mode selection field 282 is set to “Subclass” then proceed through step 710 and on to step 711 .
- the classification analysis module 1012 rearranges the previously generated 2D-Array 750 by summing the scores and eliminating repeats. The resulting 2D-Array 750 is sorted according to score from high to low.
- the classification analysis module 1012 compares all rows in 2D-Array 750 to all rows of the classification search history 290, and assigns colors according to the following scheme (see also FIG. 3D for the scheme): 1) if a classification is in 2D-Array 750 and is not in the classification search history 290 then assign green; 2) if a classification is in 2D-Array 750 and is in the classification search history 290 with a search status 293 of “No” and a search extent 294 of “No” then assign light yellow; 3) if a classification is in 2D-Array 750 and is in the classification search history 290 with a search status 293 of “Yes” then assign red; 4) if a classification is in 2D-Array 750 and is in the classification search history 290 with a search status 293 of “No” and a search extent 294 of “Yes” then assign bright yellow.
- the resulting table is displayed along with the color scheme in the classification analysis interface 280 . See FIG. 2G for an example of the interface 280 after a run in SubClass mode.
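The color scheme described above may be expressed as a small lookup. The sketch assumes the classification search history 290 is held as a dictionary mapping each classification code to a `(search status, search extent)` pair of “Yes”/“No” strings; that representation is illustrative only.

```python
def assign_color(code, history):
    """Return the sensory indicator color for one classification code.

    history: dict mapping code -> (search_status, search_extent),
    each "Yes" or "No" (representation assumed for this sketch).
    """
    if code not in history:
        return "green"            # not yet in the search history
    status, extent = history[code]
    if status == "Yes":
        return "red"              # search status 293 is "Yes"
    if extent == "Yes":
        return "bright yellow"    # status "No", search extent 294 "Yes"
    return "light yellow"         # status "No", extent "No"
```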
- Primary Subclass Mode If classification analysis mode selection field 282 is set to “Primary” then proceed through step 707 and on to step 708.
- the classification analysis module 1012 sorts through 2D-Array 750 and removes all but the entries labeled as primary.
- the resulting table is displayed in the classification analysis interface 280 .
- document analysis system 800 is similar to the document analysis system of FIG. 1A but provides a client-server architecture. Accordingly, document analysis system 800 includes a client device 810 and a server device 880.
- the server device 880 may be a computing device having a processor, such as a personal computer, or may be implemented on a high performance server, such as an HP, IBM or Sun computer using an operating system such as, but not limited to, Windows, Solaris or UNIX.
- the server device 880 includes a classification analysis module similar in function to the classification analysis module 1012 of the embodiment of FIG. 1A.
- an exemplary method 2100 of performing a patent search using multiple modes of the present invention comprises the following steps:
- Step 2101 Synthesizing a proposition into one or more key concepts 272 ;
- Step 2102 Developing one or more keyword groups 214 based on the key concepts 272 ;
- Step 2103 Conducting a text search with text search inquiry over a database of documents having text, images and one or more document classifications 135 therein using the keyword groups 214 ;
- Step 2104 Compiling a search file of documents 1032 from the text search inquiry;
- Step 2105 Selecting a first set of documents from the file of documents 1032 and creating a project file 205 ;
- Step 2106 Tagging documents 1032 in the project file 205 using a document management interface 250 , with indicia in a relevancy column 257 and concepts 272 in additional columns 258 ;
- Step 2107 Instructing a classification analysis module 1012 to run in Main Class Mode to locate a set of classes 136 by counting and ranking according to frequency;
- Step 2108 Conducting a first class & text search over the database using the top-ranked classes 136 combined with text from the keyword groups 214 ;
- Step 2109 Compiling a second search file of documents 1032 from the classification & text search;
- Step 2110 Selecting a second set of 4-5 documents 1032 and appending the set to the project file 205 ;
- Step 2111 Tagging untagged documents in the project file 205 as appropriate, and particularly the second set of documents, using a document management interface 250 , with indicia in a relevancy column 257 and concepts 272 in additional columns 258 ;
- Step 2112 Instructing the classification analysis module 1012 to run in Subclass Parents Mode to locate a second set of document classifications 135 by counting and ranking according to frequency;
- Step 2113 Inspecting a classification schedule to locate potentially relevant child classifications of the second set located in step 2112 and adding said classifications to the classification search history 290 ;
- Step 2114 Conducting a third classification & text search over the database using the classifications from 2113 combined with text from the keyword groups 214 ;
- Step 2115 Compiling a third search file of documents 1032 from the third classification & text search;
- Step 2116 Selecting a third set of 4-5 documents 1032 and appending the set to the project file 205 ;
- Step 2117 Tagging untagged documents in the project file 205 as appropriate, and particularly the third set of documents, using a document management interface 250 , with indicia in a relevancy column 257 and concepts 272 in additional columns 258 ;
- Step 2118 Instructing the classification analysis module 1012 to run in Subclass Mode by counting and ranking document classifications 135 according to frequency and cross referencing results against the classification search history 290 to locate an nth document classification 135 to add to the classification search history 290 ;
- Step 2119 Conducting an nth search over the database using the nth classification from step 2118 either combined with text from the keyword groups 214 or inspecting the nth classification in its entirety;
- Step 2120 Compiling an nth search file of documents 1032 from the nth classification & text search;
- Step 2121 Selecting all relevant documents 1032 and appending the set to the project file 205 ;
- Step 2122 Tagging untagged documents in the project file 205 as appropriate, and particularly the nth set of documents, using a document management interface 250 , with indicia in a relevancy column 257 and concepts 272 in additional columns 258 ;
- Step 2123 Inspecting the classification search history 290 for a minimum of ten document classification codes and optionally repeating from 2118 to 2123 ;
- Step 2124 Conducting forward and backward citation search (not shown) on the selected high-relevance documents from the project file 205 and adding relevant documents to the project file;
- Step 2125 End.
Abstract
A system and method for efficiently and accurately identifying relevant document classifications is contemplated. The document analysis system receives classified reference documents along with a relevancy indicator for each document and generates sensory indicators that assist a researcher in identifying relevant classifications that have not been previously researched. In one aspect, the document analysis system generates a table of classifications, the classifications being determined by scoring of each classification cited within each relevant document. The system then determines a sensory indicator (e.g. a color) for each classification that indicates the extent to which the classification has been previously searched. The classification analysis window thus allows the researcher to quickly determine (e.g. by visual inspection) which classification codes have been cited most frequently as well as which classification codes require further search. In this manner the researcher may quickly determine where to direct a next iteration of a search.
Description
- This application claims the benefit of U.S. Provisional Patent Application Ser. No. 61/250,557 filed on Oct. 11, 2009 by the inventor of the present invention, the entire contents of which are incorporated herein by reference.
- The present invention relates to the field of document research, and more particularly to methods and systems for locating relevant classifications.
- Document research involves finding relevant subject matter within a set of documents as may be found in a document repository. Search engines, for example, use “key” words or phrases as search arguments to locate text passages containing those words or phrases. Classification systems provide another means for assessing context. In a classification system, documents with common threads are grouped together in classes. A field of context, therefore, can be narrowed by selecting relevant classes. Patents and patent-related documentation databases are examples of database repositories that implement classification systems. The most commonly used classification system for patents and published patent applications, at least in the U.S., is the USPTO (United States Patent and Trademark Office) Patent Classification System. Two other classification systems in common usage on the international scene include: the “IPC” (International Patent Classification) and the “ECLA” (European Classification).
- Documentation classifications systems provide a means for improving the productivity of a document researcher. However, in many large-scale databases the classification system itself may be complex. Patents and patent-related documentation databases provide examples of such large-scale classified database systems with corresponding complex classification systems. The USPTO classification system currently comprises at least 984 classes and numerous digests (collections of certain subjects) within each class. Each class is broken into subclasses; each subclass may be further broken into subclasses and so on. Patents are thus grouped into categories, which are broken down into sub-categories, and sub-categories into more sub's, as required. The USPTO examiners decide the class/subclass in which to file a particular invention. To add further complexity, any one invention can be filed in more than one class/subclass, and most are filed in several classes/subclasses.
- The challenge of performing document research in such a large-scale document repository, therefore, is to develop an experienced understanding of the classification system. Existing classification analysis tools provide some assistance in navigating classification. See for example U.S. Pat. No. 7,333,984 to Oosta. A counting and sorting technique is shown in FIG. 8. However, the analysis is broad, and does not show a researcher where he or she needs to search, which is important because a patent search involves many iterations of polling a database, and with each iteration, the researcher should progressively narrow the size of the field of interest.
- U.S. Patent Application 20020022974 to Lindh shows a method for display of patent information that involves applying statistical analysis to groups of references containing classifications. Lindh does not show additional cross referencing to a search history in order to locate unsearched classifications, which again is important in progressively narrowing and focusing a search.
- U.S. Patent Application 20090313221 to Chen shows a patent technology association classification method. While Chen has shown the method of removing classifications and counting frequency, Chen fails to show the additional function of comparing classification frequencies to search histories, nor does Chen show additional broad and narrow reporting schemes for use at different stages of a patent search.
- U.S. Patent Application 20080228724 to Huang et al. seeks to assist a researcher in performing classification-based research. Huang shows a technical classification method for searching patents, which includes generating counts from a group of references. The method shows the researcher a quality of a search, but falls short in that Huang does not assist the researcher in locating additional classification areas to search in a next iteration.
- U.S. Patent Application 20020073095 to Ohga shows a patent classification displaying method and apparatus having some similarities to the present invention. As seen in FIG. 4, the apparatus provides a classification counting system, wherein the most frequently occurring codes are sorted to the top of the list. Other systems, such as Thompson Delphion, have reporting features like this. Several critical components are however missing when viewed next to the present invention. First, the classification codes on the report should be cross referenced against a running tally of codes kept by the researcher in a given search project. With this additional function, the researcher sees not only relevant classifications, but also classifications that have not been searched yet. In addition, Ohga fails to show additional modes of class counting and weighting that are used at different stages of a patent research project, such that the researcher can use broad analysis in the beginning and narrow analysis during the iterative part of the patent search.
- U.S. Patent Application 20010027452 to Tropper shows a system and method to identify documents in a database which relate to a given document by using recursive searching and no keywords. While Tropper realizes the benefits of using latest search results to form new searches, he fails to teach the accumulation of classification codes, weighting the codes, ranking of the codes and then comparing the rankings to the researcher's search history.
- A need thus exists for an improved classification analysis system, not only for the less-experienced document researcher, but also for the efficiency of those with established skill and experience with a particular classification system. Embodiments of the present invention address many of the shortfalls in the prior art while presenting, what will hereinafter become apparent to be, a pioneering document analysis technology.
- It is a first object of the present invention to provide a classification analysis system that equips a researcher with broad scope reporting for the initial phase of a search project. It is a second object to enable the researcher to progressively narrow the scope of the search project. Yet another object of the present invention is to enable the researcher to track a classification search history such that duplication is avoided. Still another object of the present invention is to provide a system of narrow classification analysis cross referenced against the classification search history. Yet another object of the present invention is to enable the researcher to effectively cycle through the narrow phase of a search project. Still another object of the present invention is to provide a system that permits the researcher to confidently end a classification based search project.
- The present invention provides a system and method for efficiently and accurately identifying relevant document classifications. The system receives one or more classified reference documents in a document set along with a relevancy indicator for each document. The system retrieves all document classifications from the document set, and arranges a classification analysis interface. The researcher has four modes for the interface, which are called: Main, Parents, Subclass, and Primary mode—wherein Main is the broadest and Primary is the narrowest. The researcher is provided GUI tools to select classification codes from the classification analysis interface, and add them to a classification search history which is stored along with the document set in a project file.
- In use, the researcher uses the Main and the Parents mode during the first hour of the search project, and the Subclass mode for the remaining 3-4 hours. In the Main mode, the researcher is shown occurrence of main classes in the document set, which provides a broad base for class/text searching. In Parents mode, the researcher is shown common occurrence of parent sub-classifications of the document classifications, while the document classifications are not shown. With this information, the researcher can inspect child classifications of the parents in a classification schedule. For the bulk of the search project, the researcher uses the Subclass mode. In the Subclass mode, the document classifications are collected, counted, scored, and sorted, providing the researcher quick viewing of potentially relevant classifications. Once the researcher locates potentially relevant classifications, he or she executes searches in the newly located classifications, and then adds documents along with relevancy indicators to the expanding document set. The researcher then re-executes Subclass Mode classification analysis on the document set. The classification analysis module scores classification codes and then cross references against the classification search history. The resulting classification analysis interface is displayed along with various sensory indicators (e.g. a color) that show the researcher relevant classifications that are 1) un-searched, 2) partially searched, or 3) fully searched. In this manner the researcher may quickly determine where a next iteration in the search project should be directed. The researcher may continuously iterate through the process of locating new classification areas, searching the new classification areas, augmenting the document set with new documents, and then using the classification analysis tool to locate additional unsearched classification areas. The researcher is encouraged to add many (i.e. 50-100) documents to the project file using a document management interface to tag even moderately relevant documents for the purpose of utilizing many hundreds of classification codes in the scoring. The process continues until the top 5-10 classifications presented by the classification analysis interface are indicated as fully searched, at which point the search project can be brought to a close. With the present invention, important classification areas are very difficult to overlook, regardless of the experience level of the researcher.
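The stopping condition described above can be illustrated with a short check. This is a sketch under stated assumptions: the ranked list of codes, the dictionary form of the search history, and the reading of a search status of “Yes” as "fully searched" are all choices made for this example rather than details given in the description.

```python
def search_complete(ranked_codes, search_status, top_n=5):
    """Return True when the top-ranked classifications are all fully searched.

    ranked_codes: classification codes sorted high to low by score.
    search_status: dict mapping code -> "Yes" or "No"; codes missing from
    the dict are treated as unsearched (an assumption of this sketch).
    """
    top = ranked_codes[:top_n]
    return all(search_status.get(code, "No") == "Yes" for code in top)
```

With `top_n` set anywhere in the 5-10 range mentioned above, a `False` result indicates that at least one highly ranked classification still warrants another search iteration.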
- FIG. 1A is a block diagram illustrating a document research system in accordance with an exemplary embodiment of the invention.
- FIG. 1B is a sample of a document.
- FIG. 2A is an interface diagram in accordance with an exemplary embodiment of the invention.
- FIG. 2B is an interface diagram in accordance with an exemplary embodiment of the invention.
- FIG. 2C is an interface diagram in accordance with an exemplary embodiment of the invention.
- FIG. 2D is an interface diagram in accordance with an exemplary embodiment of the invention.
- FIG. 2E is an interface diagram in accordance with an exemplary embodiment of the invention.
- FIG. 2F is an interface diagram in accordance with an exemplary embodiment of the invention.
- FIG. 2G is an interface diagram in accordance with an exemplary embodiment of the invention.
- FIG. 2H is a diagram of a project file created and used by the present invention.
- FIG. 2I is an interface diagram in accordance with an exemplary embodiment of the invention.
- FIG. 2J is an interface diagram in accordance with an exemplary embodiment of the invention.
- FIG. 3A is a flow diagram illustrating a process that may be carried out in accordance with the exemplary system of FIG. 1A.
- FIG. 3B is a flow diagram illustrating a process that may be carried out in accordance with the exemplary system of FIG. 1A.
- FIG. 3C is a flow diagram illustrating a process that may be carried out in accordance with the exemplary system of FIG. 1A.
- FIG. 3D is a search indicator to sensory indicator color scheme table.
- FIG. 4 is a block diagram illustrating a document analysis system in accordance with another exemplary embodiment of the invention.
- FIG. 5 is a flow diagram illustrating a process that may be carried out in accordance with the exemplary system of FIG. 1A.
- Reference will now be made in detail to the present exemplary embodiments of the invention, examples of which are illustrated in the accompanying drawings.
- Referring to FIG. 1A, a block diagram is shown illustrating a document analysis system 1000, or search system, in accordance with an exemplary embodiment of the invention. The document analysis system 1000 comprises a client device 1010, which may be a computer. The client device 1010 includes a classification analysis module 1012, an interface module 1014 and a user Input/Output (I/O) interface 1018. By way of example, the client device 1010 may be a computing device having a processor such as a personal computer, a phone, a mobile phone, or a personal digital assistant. The document analysis system 1000 may also comprise a document provider 1030, a classification data provider 1040 and a network 1020. The document provider 1030 is configured to deliver one or more documents, labeled generally as 1032. By way of example, the documents 1032 may be electronic files containing patent data or any type of electronic file that contains textual data. See FIG. 1B for an example of a document 1032. As seen, the document 1032 has multiple document classifications 135 that are further divided into a class 136 and a subclass 137. In addition, notice the body of the document is composed of multiple sections (e.g. abstract, description, claims), and that sections are further divided into paragraphs 138. The textual data of each document 1032 includes content data and one or more classification codes 135. The document provider 1030 may be a remote server and may also include a search engine 1034 for retrieving the one or more documents 1032 from a document data repository (not shown) based on a search query. By way of example, the search engine may be that provided by the United States Patent and Trademark Office (USPTO), FreePatentsOnLine, Micropatent®, Delphian®, PatentCafe®, Thompson Innovation or Google®. The document provider 1030 may retrieve the document data from a local repository or from one or more remote document repositories.
Examples of such a document repository include patent databases including those provided by EP (European patents), WO (PCT publications), JP (Japan abstracts) and DWPI (Derwent World Patent Index for patent families). Moreover, the document provider 1030 may be a cloud-based bulk storage system, such as Amazon Simple Storage Service.
- The classification data provider 1040 is configured to provide access to a classification data repository 1042. The classification data repository 1042 may be a database or file storage element that stores hierarchical classification data entries 1044. Each classification data entry 1044 includes a classification code. Each classification data entry 1044 may also include a classification code description field. The classification data provider 1040 may be a remote server provided by the United States Patent and Trademark Office (USPTO). The classification data may be representative of a document classification system such as the Manual of Classification issued by the USPTO. The classification data provider may retrieve the document data from a local repository or from one or more remote document repositories. It is noted that while shown as separate components, the document provider and classification data provider may be co-located on a single remote server.
- The interface module 1014 is configured to receive one or more documents 1032 from the document provider 1030, and to retrieve classification data 1044 from the classification data provider 1040 by way of network 1020. By way of example, the network may be the Internet. The interface module 1014 may alternatively be configured to receive the documents 1032 or classification data 1044 through the user I/O interface 1018. In such an embodiment, the documents 1032 may be stored on a portable storage device (not shown) such as a CD, DVD or solid state device and the user I/O interface 1018 may include a communications interface such as a wireless interface, a CD/DVD drive or a USB drive for retrieving data from the portable storage device. The documents 1032 may alternately be paper-based documents and may be provided to the interface module 1014 by use of a scanner (not shown) that is configured with the I/O interface 1018. The client device 1010 may also include a data storage element 1016, which may be at least one of a computer readable medium and a memory. The interface module 1014 may also be configured to receive a set of one or more concepts from a researcher by way of the I/O interface 1018. The I/O interface 1018 may also include at least one input device such as a keyboard, mouse, microphone or a touch screen for receiving the concepts from the researcher. Each concept is comprised of one or more text-based keywords or sets of text-based keywords which are used to determine the relevancy of each of the documents 1032. The client device 1010 may alternatively include a document analysis module that generates statistical data based on the user-defined concepts and the documents 1032. The statistical data may be used by the researcher to quickly assess the relevancy of each document 1032 to each of the user-defined concepts.
The document analysis module may transmit the statistical data to the interface module 1014 which presents the data to the researcher by way of the I/O interface 1018. The I/O interface 1018 may also include a display such as an LCD or CRT monitor configured to display a graphical user interface (GUI) for presenting information such as the statistical data to the researcher. The GUI will now be discussed in greater detail.
- Referring now to FIG. 2A, FIG. 2B, FIG. 2C, FIG. 2D, FIG. 2E, FIG. 2F and FIG. 2G, diagrams are shown illustrating a document analysis GUI 200 in accordance with an exemplary embodiment of the invention. FIG. 3A, which illustrates an exemplary method 1300 for performing classification-based document analysis, will also be discussed. At a first step labeled as 1310, the interface module 1014 may receive concept data from the researcher. The interface module 1014 first generates a document analysis GUI 200 and displays the GUI 200 to the researcher by way of the display device included with user I/O interface 1018. As shown in FIG. 2A, the document analysis GUI 200 includes a document relevance interface 220, a document management interface 250, and a document image window 254. As seen in FIG. 2F, the researcher may start a research project by entering one or more concepts 272. Each concept 272 may have one or more words or word groups associated therewith. As shown in FIG. 2B, the document analysis GUI 200 includes a keyword entry interface 210. The keyword entry interface 210 comprises multiple rows of alphanumeric entry fields 212. One or more keywords 213 may be entered by a researcher into each entry field 212, wherein each keyword 213 is conceptually related such that each line represents a keyword group 214. The researcher is also provided with a user thesaurus 211 and web thesaurus 219. The user thesaurus 211 can be edited and stored in the data storage element 1016, and the web thesaurus 219 may be accessed through the network 1020 by the interface module 1014. Five alphanumeric entry fields 212 are shown to be filled in FIG. 2B. Each concept 272 and corresponding keyword group 214 may be determined manually by the researcher or may be received from an external source. By way of example, the concepts may be reduced to a manageable number of concepts (e.g. 4-5 concepts). Keywords 213 may then be chosen for each of the concepts and entered into one of the alphanumeric fields 212 to form the keyword group 214.
After entering each of the desired concepts, the researcher may then exit the keyword entry interface 210 and proceed to analysis of a set of documents based on the user-defined concepts.
- At a next step labeled as 1320 the interface module 1014 will receive one or more reference documents 1032. As discussed the interface module 1014 is configured to receive one or more documents 1032 from the document provider 1030 by way of network 1020. The interface module 1014 may be configured to allow the researcher to request a predetermined set of documents 1032. By way of example, the researcher may initiate a request for a specific set of patent documents or a set of patent documents that fall within a specific category or classification. The researcher may also initiate a search of a remote document repository through a search interface window 230 (shown in FIG. 2D) provided by the document analysis GUI 200. The search may be initiated by entering a set of search parameters, such as keywords, into one or more search fields 232 located on the search interface window 230. Boolean operators, wildcards and proximity indicators may be used to link the keywords together in logic sets. The search interface window 230 may also provide a search assistance window 234 that allows the previously defined keywords 213 to be added to the set of search parameters in response to a single user action (e.g. a mouse click). The search assistance window 234 thereby facilitates the loading of search parameters into the one or more search fields 232. In addition, the researcher is provided with a classification search history 290, which contains a table for documenting the search project strategy (discussed in detail later). The researcher may pick classification codes from the classification search history 290. As discussed, the interface module 1014 may alternatively be configured to receive one or more documents 1032 through the user I/O interface 1018.
In such an embodiment, thedocuments 1032 may be stored on a portable storage device (not shown) such as a CD, DVD or solid state device and the user I/O interface 1018 may include a communications interface such as a wireless interface, a CD/DVD drive or a USB drive for retrieving data from the personal storage device. Upon receiving the one or more reference documents theinterface module 1014 will populate a document management table 252 located on a document management interface 250 (shown inFIG. 2E ) withselectable rows 253 each having information descriptive of one of the receiveddocuments 1032. By way of example, each row may include areference document number 255 anddocument title 256. - At a next step labeled as 1330 the
interface module 1014 receives and stores data from the researcher that indicates relevancy of a currently selecteddocument 1032 to the one or more user-defined concepts. As discussed, theinterface module 1012 will populate the document management table 252 (shown inFIG. 2E ) withselectable rows 253 each having information descriptive of one of the received reference documents. In the exemplary embodiment, the document management table 252 also includes one or more additional columns for allowing the researcher to indicate (by way of a mouse-click or similar navigation event) the relevance of the currently selected document. Each row of the document management table 252 may have arelevancy column 257 that contains an input field for indicating an overall relevance of the associated reference document. By way of example theinterface module 1014 may provide the researcher with the ability to select an indicia (e.g. using a drop-down menu list) such as “A” for highest relevance, “B” for suspected relevance, and “C” for uncertain relevance. Irrelevant documents may be marked with an “I” to place a marker in a project file 205 (FIG. 2H ) indicating that a reference document was reviewed. Each row of the document management table 252 may also have one or more additional columns labeled generally as 258 that contain an input field for indicating whether a specific concept has been verified to appear in the currently selected reference document. Theinterface module 1014 may provide the researcher with the ability to toggle a field (one such field is labeled as 259) corresponding to a specific concept “on” or “off” (e.g. by a mouse-click) when indicating whether aparticular concept 272 does or does not exist inside the selected document. A column may be provided for each of the previously discussedconcepts 272. As discussed, theinterface module 1014 may provide the researcher with a concept management window 270 (seeFIG. 
2F) for allowing the researcher to define different concepts 272 from which the additional columns 258 may be derived. In this manner, the researcher is able to track higher-level or more abstract concepts than were initially defined and may also provide more user-friendly naming of the concepts 272 (useful, for example, for report generation). The interface module 1014 may also store the previously discussed relevancy indicators in the project file 205, which is located in the data storage element 1016 in FIG. 1A. By storing each of the indicators, the interface module 1014 is able to provide information to the classification analysis module 1012. The classification analysis module 1012 will now be discussed in greater detail. - At a next step labeled as 1340 classification analysis begins with the
interface module 1014 first displaying a classification analysis interface 280, which is shown in FIG. 2G. The classification analysis interface 280 can include a classification search history 290, which is retrieved by the interface module 1014 from the project file 205. The classification search history 290 shows a previously identified classification code 291 and a corresponding previously identified classification title 292. Each previously identified classification code 291 also has a search extent indicator 294 and a search status indicator 293, both of which can be set by the researcher to various states. By way of example, if the researcher has already searched or plans to search a previously identified classification code 291 in its entirety, he or she may indicate this with the word “Yes” in the search extent indicator 294. In addition, the researcher may keep a record of which previously identified classification codes 291 have been properly addressed with either text-limited searching or full searching by similarly indicating in the search status indicator 293. The classification analysis interface 280 may include a document selection field 281 and a classification analysis mode selection field 282. The document selection field 281 provides one or more options to the researcher for selecting a set of documents on which the classification analysis will be performed. By way of example, the researcher may select all documents in the project file 205 that have previously been indicated to be relevant to any of the concepts 272 (i.e. all documents selected in any of columns 258), all documents relevant to a specific concept (i.e. all documents selected in one of columns 258) or documents that have been indicated to have a specific overall relevance (e.g. all documents having a relevancy of “A” from relevancy column 257). The classification analysis interface 280 also has a class weighting 286 option and a relevancy weighting 287 option. 
The class weighting 286 instructs the classification analysis module 1012 to account for the total size of a classification, which counteracts the tendency of large classifications to overshadow smaller classifications in un-weighted frequency counts. The relevancy weighting 287 allows the researcher to assign greater weight in the scoring to documents 1032 of higher relevance, as recorded in the relevancy column 257. The classification analysis mode selection field 282 provides one or more options to the researcher for selecting the mode of classification analysis to be performed. The most common mode is the Subclass mode, which is discussed in the next step. (Detailed discussions of all four modes follow below.)
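As a rough illustration of the two weighting options just described, the following Python sketch scores a single occurrence of a classification on one document. It assumes the initial score of 1, the division by classification size, and the relevancy factors (A=1.5, B=1, C=0.75, D=0.5, E=0.5) that the description gives for steps 702 through 706. The function name `weighted_score` and its parameters are hypothetical and do not appear in the specification.

```python
# Relevancy factors as listed in the description for step 706.
RELEVANCY_FACTORS = {"A": 1.5, "B": 1.0, "C": 0.75, "D": 0.5, "E": 0.5}


def weighted_score(relevancy, classification_size,
                   class_weighting=False, relevancy_weighting=False):
    """Score one occurrence of a classification on one document.

    Hypothetical helper: the name and signature are illustrative only.
    """
    score = 1.0  # every occurrence starts with a score of 1 (step 702)
    if class_weighting:
        # Divide by the number of documents in the classification so that
        # large classifications do not overshadow smaller ones (step 704).
        score /= classification_size
    if relevancy_weighting:
        # Weight documents the researcher rated as more relevant (step 706).
        score *= RELEVANCY_FACTORS[relevancy]
    return score


# One "A" document in a classification of 200 documents, both options on.
print(weighted_score("A", 200, class_weighting=True, relevancy_weighting=True))
```

With both options off, every occurrence contributes exactly 1, so the summed scores reduce to the plain frequency count.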
Step 1340 may proceed after the researcher confirms the previously described classification analysis options. The interface module 1014 then instructs the classification analysis module 1012 to perform classification analysis on the selected set of documents. Referring back to FIG. 1B, documents 1032 have one or more document classifications 135 associated therewith, each of which can be further divided into a class 136 and a subclass 137. The classification analysis module 1012 will retrieve the document classifications 135 from each document and then generate a count of instances of each document classification 135 over the entire selected set. The classification analysis module 1012 will then send each document classification 135 and its corresponding count or score to the interface module 1014 to be displayed (step 1350) via the classification analysis interface 280, where each unique code will be displayed in a separate row. The unique codes may be displayed in a classification code column 284 while the corresponding scores will be displayed in a classification score column 283. The rows may be sorted based on the score of each unique code. In an alternative embodiment discussed later, the score for each code may be multiplied by a weighting factor that accounts for the size of each subclass (i.e. the number of documents in the subclass) or by a weighting factor that accounts for the document relevance. The interface module 1014 may also retrieve a classification description for each unique code from the classification data provider 1040, using each unique classification code to look up the corresponding classification code entry 1044. The classification description may also be displayed in a classification title column 285 of the classification analysis interface 280. The classification analysis module 1012 will use a search-indicator-to-sensory-indicator table 241, as seen in FIG. 3D, to determine a sensory indicator (e.g. 
a color) for each unique classification code that appears in the classification analysis interface. The classification analysis module 1012 determines the sensory indicator by first determining whether the corresponding classification code has been previously searched and to what extent. If a code appears in the classification code column 284 and does not appear as a previously identified classification code 291 in the classification search history 290, then the code is assumed to be unsearched. If a code appears in the classification code column 284 and also appears as a previously identified classification code 291, and the corresponding search status indicator 293 shows “No”, then the code is assumed to be at least partially searched. If a code appears in the classification code column 284 and appears as a previously identified classification code 291, and the corresponding search status indicator 293 shows “Yes”, then the code is assumed to be fully searched. The sensory indicator may be a green highlighting if the code is unsearched, a yellow highlighting if it has been partially searched, or a red highlighting if the code has been fully searched. The classification analysis interface 280 thus allows the researcher to quickly determine (e.g. by visual inspection) which classification codes have been cited most frequently as well as which classification codes have not yet been searched. In this manner the researcher may very quickly determine where a next iteration of a search project should be directed. - At
step 1350 the researcher will determine whether to add a new classification code to the search project. The researcher is provided the ability to quickly add entries to the classification search history 290 directly from the classification code column 284 using a mouse click. In doing so, the process will return to step 1320, as indicated by dashed arrow 1360, at which point the interface module 1014 provides a new search inquiry to the document provider 1030 and a new set of reference documents 1032 will be received. Each of steps 1330 through 1350 is repeated to determine the relevancy of the new set of reference documents to the user-defined concepts and whether the search should be expanded to a new classification. Steps 1320 through 1350 may be repeated until the researcher is satisfied that the most relevant classes have been searched. By way of example, the researcher may make this determination when a threshold number of the most frequently occurring classifications are highlighted in red, which indicates that all are present in the classification search history 290 and all are indicated as complete by the search status indicator 293. By way of example, the threshold may be at least ten red-highlighted classifications in the classification analysis interface 280. - Modes of Operation: As discussed, the classification analysis performed by the
classification analysis module 1012 may be performed by first specifying a mode using the classification analysis mode selection field 282. By way of example, the classification analysis modes may include: a Main Classes mode, a Subclass Parents mode, a Subclass mode and a Primary Subclass mode. Referring to FIG. 3B, all four modes are shown, and will now be discussed in detail. In addition, FIG. 3C shows the process of FIG. 3B along with actual numbers. Steps 701-706 are run in all modes, and will be discussed first. - As seen at
step 701, the classification analysis module 1012 retrieves the documents 1032 from the project file 205. The documents are then filtered according to the preference of the researcher using the document selection field 281. As an example, the researcher may run just “B” tagged documents or just documents having a specific element tagged in the document management table 252. Next, at step 702, the classification analysis module 1012 compiles all document classifications 135 into a 2D-Array 750 containing document classification 135, relevancy, score, and primary (see for example array 750 in FIG. 3C). The relevancy is originally set by the researcher in relevancy column 257 as A, B, C, D, or E. Score is initially set to 1. Primary is an indication as to whether the document classification 135 is the first listed. Next, at step 703, if the class weighting 286 is turned on, then move to step 704. At step 704, the interface module 1014 requests the classification size (i.e. the total number of documents currently classified therein) for each classification in the 2D-Array 750 from the classification data provider 1040. Next, the classification analysis module 1012 divides the score in 2D-Array 750 by the classification size, which effectively weights each classification inversely according to classification size. Next, at step 705, if the relevancy weighting 287 is turned on, then move to step 706. At step 706, the classification analysis module 1012 multiplies the score in 2D-Array 750 by a relevancy factor according to the relevancy listed in 2D-Array 750. Current relevancy factors are A=1.5, B=1, C=0.75, D=0.5, E=0.5. - Main Classes Mode: If classification analysis
mode selection field 282 is set to “Main” then proceed through step 717 to step 718. At step 718, the document classifications 135 in the 2D-Array 750 are rewritten to show only the classes 136. Next, at step 718, the 2D-Array 750 is rearranged by summing the scores of repeat classification entries and eliminating all repeats. The 2D-Array 750 is then sorted high to low according to score, and the class description is added for step 720, which is the display in the classification analysis interface 280. See FIG. 2I for an example of the interface 280 after a run in Main Classes mode. - Subclass Parents Mode: If classification analysis
mode selection field 282 is set to “Subclass Parents” then proceed through step 714 and on to step 715. Next, the classification analysis module 1012 requests all ancestors of the document classifications 135 in the 2D-Array 750 from the classification data provider 1040 via the interface module 1014. The ancestors are then inserted into the 2D-Array 750, and simultaneously the original document classifications are deleted from the 2D-Array 750. Next, at step 716, the 2D-Array 750 is rearranged by summing the scores of repeat classification entries and eliminating all repeats. The resulting table is displayed in the classification analysis interface 280. See FIG. 2J for an example of the interface 280 after a run in Subclass Parents mode. - Subclass Mode: If classification analysis
mode selection field 282 is set to “Subclass” then proceed through step 710 and on to step 711. Next, the classification analysis module 1012 rearranges the previously generated 2D-Array 750 by summing the scores and eliminating repeats. The resulting 2D-Array 750 is sorted according to score from high to low. Next, at step 712, the classification analysis module 1012 compares all rows in 2D-Array 750 to all rows of the classification search history 290, and assigns colors according to the following scheme (see also FIG. 3D for the scheme): 1) if a classification is in 2D-Array 750 and is not in the classification search history 290 then assign green; 2) if a classification is in 2D-Array 750 and is in the classification search history 290 with a search status 293 of “No” and a search extent 294 of “No” then assign light yellow; 3) if a classification is in 2D-Array 750 and is in the classification search history 290 with a search status 293 of “Yes” then assign red; 4) if a classification is in 2D-Array 750 and is in the classification search history 290 with a search status 293 of “No” and a search extent 294 of “Yes” then assign bright yellow. At step 720, the resulting table is displayed along with the color scheme in the classification analysis interface 280. See FIG. 2G for an example of the interface 280 after a run in Subclass mode. - Primary Mode: If classification analysis
mode selection field 282 is set to “Primary” then proceed through step 707 and on to step 708. Next, the classification analysis module 1012 sorts through 2D-Array 750 and removes all but the entries labeled as primary. At step 720, the resulting table is displayed in the classification analysis interface 280. - Referring to
FIG. 4, a block diagram is shown illustrating a document analysis system 800 in accordance with another exemplary embodiment of the invention. The document analysis system 800 is similar to the document analysis system of FIG. 1A but provides a client-server architecture. Accordingly, the document analysis system 800 includes a client device 810 and a server device 880. The server device 880 may be a computing device having a processor, such as a personal computer, or may be implemented on a high performance server, such as an HP, IBM or Sun computer using an operating system such as, but not limited to, Windows, Solaris or UNIX. The server device 880 includes a classification analysis module similar in function to the classification analysis module 1012 of the embodiment of FIG. 1A. - Thus, a document analysis system having the benefits of allowing for efficient and accurate identification of potentially relevant classifications is contemplated. Referring now to
FIG. 5, an exemplary method 2100 of performing a patent search using multiple modes of the present invention comprises the following steps:
- Step 2101: Synthesizing a proposition into one or more key concepts 272;
- Step 2102: Developing one or more keyword groups 214 based on the key concepts 272;
- Step 2103: Conducting a text search with a text search inquiry over a database of documents having text, images and one or more document classifications 135 therein, using the keyword groups 214;
- Step 2104: Compiling a search file of documents 1032 from the text search inquiry;
- Step 2105: Selecting a first set of documents from the file of documents 1032 and creating a project file 205;
- Step 2106: Tagging documents 1032 in the project file 205 using a document management interface 250, with indicia in a relevancy column 257 and concepts 272 in additional columns 258;
- Step 2107: Instructing a classification analysis module 1012 to run in Main Classes Mode to locate a set of classes 136 by counting and ranking according to frequency;
- Step 2108: Conducting a first class & text search over the database using the top-ranked classes 136 combined with text from the keyword groups 214;
- Step 2109: Compiling a second search file of documents 1032 from the classification & text search;
- Step 2110: Selecting a second set of 4-5 documents 1032 and appending the set to the project file 205;
- Step 2111: Tagging untagged documents in the project file 205 as appropriate, and particularly the second set of documents, using a document management interface 250, with indicia in a relevancy column 257 and concepts 272 in additional columns 258;
- Step 2112: Instructing the classification analysis module 1012 to run in Subclass Parents Mode to locate a second set of document classifications 135 by counting and ranking according to frequency;
- Step 2113: Inspecting a classification schedule to locate potentially relevant child classifications of the second set located in step 2112 and adding said classifications to the classification search history 290;
- Step 2114: Conducting a third classification & text search over the database using the classifications from step 2113 combined with text from the keyword groups 214;
- Step 2115: Compiling a third search file of documents 1032 from the third classification & text search;
- Step 2116: Selecting a third set of 4-5 documents 1032 and appending the set to the project file 205;
- Step 2117: Tagging untagged documents in the project file 205 as appropriate, and particularly the third set of documents, using a document management interface 250, with indicia in a relevancy column 257 and concepts 272 in additional columns 258;
- Step 2118: Instructing the classification analysis module 1012 to run in Subclass Mode by counting and ranking document classifications 135 according to frequency and cross-referencing results against the classification search history 290 to locate an nth document classification 135 to add to the classification search history 290;
- Step 2119: Conducting an nth search over the database using the nth classification from step 2118, either combined with text from the keyword groups 214 or by inspecting the nth classification in its entirety;
- Step 2120: Compiling an nth search file of documents 1032 from the nth classification & text search;
- Step 2121: Selecting all relevant documents 1032 and appending the set to the project file 205;
- Step 2122: Tagging untagged documents in the project file 205 as appropriate, and particularly the nth set of documents, using a document management interface 250, with indicia in a relevancy column 257 and concepts 272 in additional columns 258;
- Step 2123: Inspecting the classification search history 290 for a minimum of ten document classification codes and optionally repeating steps 2118 to 2123;
- Step 2124: Conducting a forward and backward citation search (not shown) on the selected high-relevance documents from the project file 205 and adding relevant documents to the project file 205;
- Step 2125: End.
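The classification analysis invoked at steps 2107, 2112 and 2118 can be sketched in Python as follows. This is only a minimal illustration of the four modes and the color scheme described earlier, not the patented implementation. It assumes that classification codes are written as "class/subclass", that the ancestor lookup is a plain dictionary standing in for the classification data provider 1040, and that scores are also summed in Primary mode (the description only states that non-primary entries are removed). The names `aggregate`, `analyze` and `search_color` and all sample codes are hypothetical.

```python
from collections import defaultdict

# Each row mirrors one entry of the 2D-Array 750:
# (classification code, relevancy, score, is_primary).


def aggregate(rows):
    """Sum the scores of repeated codes, then sort high to low by score."""
    totals = defaultdict(float)
    for code, _relevancy, score, _primary in rows:
        totals[code] += score
    return sorted(totals.items(), key=lambda item: item[1], reverse=True)


def analyze(rows, mode, ancestors=None):
    """Return (code, score) pairs for one of the four analysis modes."""
    if mode == "Main":
        # Keep only the class part of each "class/subclass" code.
        rows = [(c.split("/")[0], r, s, p) for c, r, s, p in rows]
    elif mode == "Subclass Parents":
        # Replace every classification with its ancestor classifications.
        lookup = ancestors or {}
        rows = [(a, r, s, p) for c, r, s, p in rows for a in lookup.get(c, [])]
    elif mode == "Primary":
        # Keep only the first-listed classification of each document.
        rows = [row for row in rows if row[3]]
    elif mode != "Subclass":
        raise ValueError(f"unknown mode: {mode}")
    return aggregate(rows)


def search_color(code, history):
    """Map a code to a highlight color.

    history maps code -> (search_status, search_extent), each "Yes" or "No".
    """
    if code not in history:
        return "green"          # not yet in the classification search history
    status, extent = history[code]
    if status == "Yes":
        return "red"            # fully searched
    return "bright yellow" if extent == "Yes" else "light yellow"


# Made-up sample data for illustration only.
rows = [
    ("707/737", "A", 1.0, True),
    ("707/738", "B", 1.0, False),
    ("707/737", "B", 1.0, False),
    ("715/200", "C", 1.0, True),
]
history = {"707/737": ("Yes", "Yes"), "707/738": ("No", "No")}

for code, score in analyze(rows, "Subclass"):
    print(code, score, search_color(code, history))
```

In this sketch the researcher would look for high-scoring rows still shown in green, since those are frequently cited classifications that have not yet been added to the search history.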
- While the foregoing invention has been described with reference to the above-described embodiments, various modifications and changes can be made without departing from the spirit of the invention. Accordingly, all such modifications and changes are considered to be within the scope of the appended claims.
Claims (35)
1. A search system for searching through a plurality of documents that are organized using a classification system to define each of the plurality of documents as a classified document, wherein a search is conducted based on a predetermined subject matter, the system comprising:
a program module stored on at least one of a computer readable medium and a memory of a computer, the program module comprising instructions executable by a processor of the computer to determine document classifications that are relevant to the subject matter of the search, the program module comprising a classification analysis module;
wherein the classification analysis module:
receives a set of documents, the set of documents including at least one document, each document in the set of documents having a relevancy indicator and at least one classification value, each classification value being defined as a unique classification value;
determines a score of each of the unique classification values appearing in the at least one document in the set of documents, the score being defined as a frequency of occurrence of each of the unique classification values appearing in the at least one document;
determines a search indicator for each of the unique classification values, the search indicator providing an indication of a level to which each of the unique classification values has been previously searched; and
generates and displays a table of each of the unique classification values along with at least one of the score of each of the unique classification values and the search indicator for each of the unique classification values.
2. A system according to claim 1 wherein the table is sorted based on the score of each of the unique classification values.
3. A system according to claim 1 wherein each of the unique classification values is assigned a predetermined value that corresponds to a weight of each of the unique classification values to define a weighted classification value, and wherein the weighted classification value is used to modify the score.
4. A system according to claim 1 wherein each of the unique classification values relating to a document located in the search that is determined to be a relevant document is assigned a predetermined value that corresponds to a weight of each of the unique classification values to define a weighted classification value, wherein the predetermined value is derived from the overall relevance of the document located in the search, and wherein the weighted classification value is used to modify the score.
5. A system according to claim 1 wherein each of the unique classification values relating to a document located in the search is assigned a predetermined value that corresponds to a weight of each of the unique classification values to define a weighted classification value based on the number of documents located in the classification, and wherein the weighted classification value is used to modify the score.
6. A system according to claim 1 wherein the classification analysis module separates the unique classification values to display only the unique classification values of those documents located in the search that were determined to be relevant.
7. A system according to claim 1 wherein each of the unique classification values are organized in a hierarchy providing each of the unique classification values with at least one ancestor node; and wherein each of the unique classification values is replaced with the at least one ancestor node.
8. A system according to claim 1 wherein each of the unique classification values includes a class value and a subclass value; and wherein each of the unique classification values is replaced with the class value.
9. A system according to claim 1 wherein the classification analysis module determines the search indicator for each unique classification value by receiving both an alphanumeric indicator relating to a search status of the unique classification value and an alphanumeric indicator relating to a search extent of the unique classification value.
10. A system according to claim 1 wherein the classification analysis module assigns a color to be displayed on a user interface relating to the search indicator.
11. A system according to claim 1 wherein the computer is a server and the system further comprises a client computer, the server communicatively coupled to the client computer; and wherein the program module is located on the client computer and the classification analysis module is located on the server.
12. A system according to claim 7 wherein each of the unique classification values are grouped by adding the scores of each of the unique classification values after being replaced; wherein the grouped unique classification values are sorted according to the scores; and wherein the sorted grouped unique classification values are displayed on the table.
13. A method of searching through a plurality of documents that are organized using a classification system to define each of the plurality of documents as a classified document, the plurality of classified documents being searched based on a predetermined subject matter, the method comprising:
determining record classifications that are relevant to the subject matter of the search using a program module stored on at least one of a computer readable medium and a memory of a computer, the program module comprising instructions executable by a processor of the computer to determine document classifications;
receiving a set of documents, the set of documents including at least one document, each document in the set of documents having a relevancy indicator and at least one classification value, each classification value being defined as a unique classification value;
determining a score of each of the unique classification values appearing in the at least one document in the set of documents, the score being defined as a frequency of occurrence of each of the unique classification values appearing in the at least one document;
determining a search indicator for each of the unique classification values, the search indicator providing an indication of a level to which each of the unique classification values has been previously searched; and
generating and displaying a table of each of the unique classification values along with at least one of the score of each of the unique classification values and the search indicator for each of the unique classification values.
14. A method according to claim 13 further comprising sorting the table based on the score of each of the unique classification values.
15. A method according to claim 13 further comprising assigning a predetermined value to each of the unique classification values that corresponds to a weight of each of the unique classification values to define a weighted classification value; and wherein the weighted classification value is used to modify the score.
16. A method according to claim 13 further comprising assigning a predetermined value to each of the unique classification values relating to a document located in the search that is determined to be a relevant document that corresponds to a weight of each of the unique classification values to define a weighted classification value, and wherein the weighted classification value is used to modify the score.
17. A method according to claim 13 further comprising assigning a predetermined value to each of the unique classification values relating to a document located in the search that corresponds to a weight of each of the unique classification values to define a weighted classification value based on the number of documents located in the classification, and wherein the weighted classification value is used to modify the score.
18. A method according to claim 13 further comprising separating the unique classification values to display only the unique classification values of those documents located in the search that were determined to be relevant.
19. A method according to claim 13 further comprising organizing each of the unique classification values in a hierarchy providing each of the unique classification values with at least one ancestor node; and further comprising replacing each of the unique classification values with the at least one ancestor node.
20. A method according to claim 13 wherein each of the unique classification values includes a class value and a subclass value; and further comprising replacing each of the unique classification values with the class value.
21. A method according to claim 13 further comprising determining the search indicator for each unique classification value by receiving both an alphanumeric indicator relating to a search status of the unique classification value and an alphanumeric indicator relating to a search extent of the unique classification value.
22. A method according to claim 13 further comprising assigning a color to be displayed on a user interface relating to the search indicator.
23. A method according to claim 19 further comprising grouping each of the unique classification values by adding the scores of each of the unique classification values after being replaced; sorting the grouped unique classification values according to the scores; and displaying the sorted grouped unique classification values on the table.
24. A method of searching through a plurality of documents that are organized using a classification system to define each of the plurality of documents as a classified document, the plurality of classified documents being searched based on a predetermined subject matter, the method comprising:
determining record classifications that are relevant to the subject matter of the search using a program module stored on at least one of a computer readable medium and a memory of a computer, the program module comprising instructions executable by a processor of the computer to determine document classifications;
receiving a set of documents, the set of documents including at least one document, each document in the set of documents having a relevancy indicator and at least one classification value, each classification value being defined as a unique classification value;
determining a score of each of the unique classification values appearing in the at least one document in the set of documents, the score being defined as a frequency of occurrence of each of the unique classification values appearing in the at least one document;
determining a search indicator for each of the unique classification values, the search indicator providing an indication of a level to which each of the unique classification values has been previously searched, wherein the search indicator is determined by receiving both an alphanumeric indicator relating to a search status of the unique classification value and an alphanumeric indicator relating to a search extent of the unique classification value;
generating and displaying a table of each of the unique classification values along with at least one of the score of each of the unique classification values and the search indicator for each of the unique classification values; and
assigning a color to be displayed on a user interface relating to the search indicator.
25. A method according to claim 24 further comprising sorting the table based on the score of each of the unique classification values.
26. A method according to claim 24 further comprising assigning a predetermined value to each of the unique classification values that corresponds to a weight of each of the unique classification values to define a weighted classification value; and wherein the weighted classification value is used to modify the score.
27. A method according to claim 24 further comprising assigning a predetermined value to each of the unique classification values relating to a document located in the search that is determined to be a relevant document that corresponds to a weight of each of the unique classification values to define a weighted classification value, and wherein the weighted classification value is used to modify the score.
28. A method according to claim 24 further comprising assigning a predetermined value to each of the unique classification values relating to a document located in the search that corresponds to a weight of each of the unique classification values to define a weighted classification value based on the number of documents located in the classification, and wherein the weighted classification value is used to modify the score.
29. A method according to claim 24 further comprising separating the unique classification values to display only the unique classification values of those documents located in the search that were determined to be relevant.
30. A method according to claim 24 further comprising organizing each of the unique classification values in a hierarchy providing each of the unique classification values with at least one ancestor node; and further comprising replacing each of the unique classification values with the at least one ancestor node.
31. A method according to claim 24 wherein each of the unique classification values includes a class value and a subclass value; and further comprising replacing each of the unique classification values with the class value.
32. A method according to claim 30 further comprising grouping each of the unique classification values by adding the scores of each of the unique classification values after being replaced; sorting the grouped unique classification values according to the scores; and displaying the sorted grouped unique classification values on the table.
33. A system according to claim 8 wherein each of the unique classification values are grouped by adding the scores of each of the unique classification values after being replaced; wherein the grouped unique classification values are sorted according to the scores; and wherein the sorted grouped unique classification values are displayed on the table.
34. A method according to claim 20 further comprising grouping each of the unique classification values by adding the scores of each of the unique classification values after being replaced; sorting the grouped unique classification values according to the scores; and displaying the sorted grouped unique classification values on the table.
35. A method according to claim 31 further comprising grouping each of the unique classification values by adding the scores of each of the unique classification values after being replaced; sorting the grouped unique classification values according to the scores; and displaying the sorted grouped unique classification values on the table.
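The scoring, weighting, roll-up, and table steps recited in claims 24 to 35 can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the applicant's implementation. It assumes that the score of a classification value is the (optionally weighted) count of search-result documents carrying it, that values have a `class/subclass` form such as `707/737`, and that the search indicator is a flag marking classifications that have already been searched. All function names, the sample classification values, and the color names are hypothetical.

```python
from collections import defaultdict


def score_classifications(documents, weights=None):
    """Score each unique classification value found in the search results.

    Assumption: the score is the sum of a per-value weight over the
    documents that carry the value (weight 1.0 when none is given).
    """
    weights = weights or {}
    scores = defaultdict(float)
    for doc in documents:
        # set() so a value listed twice on one document counts once
        for value in set(doc["classifications"]):
            scores[value] += weights.get(value, 1.0)
    return dict(scores)


def class_of(value):
    """Hypothetical parent lookup: 'class/subclass' -> 'class'."""
    return value.split("/", 1)[0]


def roll_up(scores, to_parent):
    """Replace each value with its parent and add the scores together."""
    grouped = defaultdict(float)
    for value, score in scores.items():
        grouped[to_parent(value)] += score
    return dict(grouped)


def build_table(scores, searched=frozenset()):
    """Return rows sorted by descending score, with a color per search indicator."""
    ordered = sorted(scores.items(), key=lambda item: (-item[1], item[0]))
    rows = []
    for value, score in ordered:
        is_searched = value in searched
        rows.append({
            "classification": value,
            "score": score,
            "searched": is_searched,
            # Color names are placeholders, not taken from the patent.
            "color": "green" if is_searched else "gray",
        })
    return rows


if __name__ == "__main__":
    docs = [
        {"classifications": ["707/737", "707/999"]},
        {"classifications": ["707/737", "715/200"]},
        {"classifications": ["715/200"]},
    ]
    grouped = roll_up(score_classifications(docs), class_of)
    for row in build_table(grouped, searched={"707"}):
        print(row)
```

Passing a different `to_parent` function (for example, a lookup into a classification hierarchy) would cover the ancestor-node variant of claim 30, while `class_of` corresponds to the class and subclass variant of claim 31.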
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US13/501,362 US20120197910A1 (en) | 2009-10-11 | 2010-10-12 | Method and system for performing classified document research |
Applications Claiming Priority (3)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US25055709P | 2009-10-11 | 2009-10-11 | |
US13/501,362 US20120197910A1 (en) | 2009-10-11 | 2010-10-12 | Method and system for performing classified document research |
PCT/US2010/052315 WO2011044578A1 (en) | 2009-10-11 | 2010-10-12 | Method and system for performing classified document research |
Publications (1)
Publication Number | Publication Date |
---|---|
US20120197910A1 (en) | 2012-08-02 |
Family
ID=43857190
Family Applications (3)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US13/501,370 Expired - Fee Related US8739032B2 (en) | 2009-10-11 | 2010-10-12 | Method and system for document presentation and analysis |
US13/501,362 Abandoned US20120197910A1 (en) | 2009-10-11 | 2010-10-12 | Method and system for performing classified document research |
US14/252,393 Abandoned US20140229475A1 (en) | 2009-10-11 | 2014-04-14 | Method and system for document presentation and analysis |
Family Applications Before (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US13/501,370 Expired - Fee Related US8739032B2 (en) | 2009-10-11 | 2010-10-12 | Method and system for document presentation and analysis |
Family Applications After (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US14/252,393 Abandoned US20140229475A1 (en) | 2009-10-11 | 2014-04-14 | Method and system for document presentation and analysis |
Country Status (2)
Country | Link |
---|---|
US (3) | US8739032B2 (en) |
WO (2) | WO2011044578A1 (en) |
Cited By (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20120059851A1 (en) * | 2010-03-05 | 2012-03-08 | Hans Lercher | Function-Oriented Mapping of Technological Concepts |
US20210064621A1 (en) * | 2019-09-04 | 2021-03-04 | Wertintelligence | Optimizing method of search formula for patent document and device therefor |
Families Citing this family (31)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
WO2011044578A1 (en) | 2009-10-11 | 2011-04-14 | Patrick Walsh | Method and system for performing classified document research |
US10956475B2 (en) | 2010-04-06 | 2021-03-23 | Imagescan, Inc. | Visual presentation of search results |
US9836460B2 (en) * | 2010-06-11 | 2017-12-05 | Lexisnexis, A Division Of Reed Elsevier Inc. | Systems and methods for analyzing patent-related documents |
JP5852361B2 (en) * | 2011-08-22 | 2016-02-03 | International Business Machines Corporation | Apparatus and method for developing a bill of materials |
US10467273B2 (en) * | 2011-10-24 | 2019-11-05 | Image Scan, Inc. | Apparatus and method for displaying search results using cognitive pattern recognition in locating documents and information within |
US11010432B2 (en) | 2011-10-24 | 2021-05-18 | Imagescan, Inc. | Apparatus and method for displaying multiple display panels with a progressive relationship using cognitive pattern recognition |
US9286273B1 (en) * | 2013-03-11 | 2016-03-15 | Parallels IP Holding GmbH | Method and system for implementing a website builder |
US9128994B2 (en) | 2013-03-14 | 2015-09-08 | Microsoft Technology Licensing, Llc | Visually representing queries of multi-source data |
US20140304579A1 (en) * | 2013-03-15 | 2014-10-09 | SnapDoc | Understanding Interconnected Documents |
US20150113388A1 (en) * | 2013-10-22 | 2015-04-23 | Qualcomm Incorporated | Method and apparatus for performing topic-relevance highlighting of electronic text |
JP6612752B2 (en) * | 2013-11-26 | 2019-11-27 | コーニンクレッカ フィリップス エヌ ヴェ | System and method for determining missing time course information in radiological imaging reports |
US20150294220A1 (en) * | 2014-04-11 | 2015-10-15 | Khalid Ragaei Oreif | Structuring data around a topical matter and a.i./n.l.p./ machine learning knowledge system that enhances source content by identifying content topics and keywords and integrating associated/related contents |
US9886422B2 (en) | 2014-08-06 | 2018-02-06 | International Business Machines Corporation | Dynamic highlighting of repetitions in electronic documents |
FR3027130B1 (en) * | 2014-10-14 | 2016-12-30 | Airbus Operations Sas | AUTOMATIC INTEGRATION OF DATA RELATING TO A MAINTENANCE OPERATION |
JP6447161B2 (en) | 2015-01-20 | 2019-01-09 | 富士通株式会社 | Semantic structure search program, semantic structure search apparatus, and semantic structure search method |
US9928300B2 (en) * | 2015-07-16 | 2018-03-27 | NewsRx, LLC | Artificial intelligence article analysis interface |
US10474672B2 (en) * | 2015-08-25 | 2019-11-12 | Schlafender Hase GmbH Software & Communications | Method for comparing text files with differently arranged text sections in documents |
US20170116194A1 (en) * | 2015-10-23 | 2017-04-27 | International Business Machines Corporation | Ingestion planning for complex tables |
US10552539B2 (en) * | 2015-12-17 | 2020-02-04 | Sap Se | Dynamic highlighting of text in electronic documents |
US10460023B1 (en) | 2016-03-10 | 2019-10-29 | Matthew Connell Shriver | Systems, methods, and computer readable media for creating slide presentations for an annotation set |
US10445327B2 (en) | 2016-04-07 | 2019-10-15 | RELX Inc. | Systems and methods for providing a visualizable results list |
US10445355B2 (en) * | 2016-04-07 | 2019-10-15 | RELX Inc. | Systems and methods for providing a visualizable results list |
US11030259B2 (en) * | 2016-04-13 | 2021-06-08 | Microsoft Technology Licensing, Llc | Document searching visualized within a document |
US10740407B2 (en) | 2016-12-09 | 2020-08-11 | Microsoft Technology Licensing, Llc | Managing information about document-related activities |
US9965460B1 (en) * | 2016-12-29 | 2018-05-08 | Konica Minolta Laboratory U.S.A., Inc. | Keyword extraction for relationship maps |
US10726074B2 (en) | 2017-01-04 | 2020-07-28 | Microsoft Technology Licensing, Llc | Identifying among recent revisions to documents those that are relevant to a search query |
US10891947B1 (en) | 2017-08-03 | 2021-01-12 | Wells Fargo Bank, N.A. | Adaptive conversation support bot |
CN109376238B (en) * | 2018-09-14 | 2021-01-05 | 大连理工大学 | Paper correlation degree quantification method based on reference document list overlapping degree |
US11645295B2 (en) | 2019-03-26 | 2023-05-09 | Imagescan, Inc. | Pattern search box |
US11366964B2 (en) | 2019-12-04 | 2022-06-21 | International Business Machines Corporation | Visualization of the entities and relations in a document |
US20210224264A1 (en) * | 2020-01-17 | 2021-07-22 | nference, inc. | Systems and methods for mapping a term to a vector representation in a semantic space |
Citations (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20060089924A1 (en) * | 2000-09-25 | 2006-04-27 | Bhavani Raskutti | Document categorisation system |
US20060187027A1 (en) * | 2005-02-08 | 2006-08-24 | User-Centric Enterprises, Inc. | Electronically tracking a path history |
US20070027902A1 (en) * | 1999-03-31 | 2007-02-01 | Verizon Laboratories Inc. | Semi-automatic index term augmentation in document retrieval |
US20080162476A1 (en) * | 2006-12-27 | 2008-07-03 | Fujitsu Limited | Medium storing document retrieval program, document retrieval apparatus and document retrieval method |
US20080249999A1 (en) * | 2007-04-06 | 2008-10-09 | Xerox Corporation | Interactive cleaning for automatic document clustering and categorization |
US20090287779A1 (en) * | 2008-05-15 | 2009-11-19 | International Business Machines Corporation | Methods and systems to selectively mark email as partially reviewed |
Family Cites Families (39)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JP2977260B2 (en) * | 1990-09-27 | 1999-11-15 | 株式会社東芝 | Information presentation device |
US5369577A (en) * | 1991-02-01 | 1994-11-29 | Wang Laboratories, Inc. | Text searching system |
US5940624A (en) * | 1991-02-01 | 1999-08-17 | Wang Laboratories, Inc. | Text management system |
US5724571A (en) * | 1995-07-07 | 1998-03-03 | Sun Microsystems, Inc. | Method and apparatus for generating query responses in a computer-based document retrieval system |
US5920859A (en) * | 1997-02-05 | 1999-07-06 | Idd Enterprises, L.P. | Hypertext document retrieval system and method |
US5920854A (en) * | 1996-08-14 | 1999-07-06 | Infoseek Corporation | Real-time document collection search engine with phrase indexing |
US5845278A (en) * | 1997-09-12 | 1998-12-01 | Infoseek Corporation | Method for automatically selecting collections to search in full text searches |
US6415282B1 (en) * | 1998-04-22 | 2002-07-02 | Nec Usa, Inc. | Method and apparatus for query refinement |
US6363373B1 (en) * | 1998-10-01 | 2002-03-26 | Microsoft Corporation | Method and apparatus for concept searching using a Boolean or keyword search engine |
US6549897B1 (en) * | 1998-10-09 | 2003-04-15 | Microsoft Corporation | Method and system for calculating phrase-document importance |
US6473753B1 (en) * | 1998-10-09 | 2002-10-29 | Microsoft Corporation | Method and system for calculating term-document importance |
US7181438B1 (en) * | 1999-07-21 | 2007-02-20 | Alberti Anemometer, Llc | Database access system |
US7130848B2 (en) | 2000-08-09 | 2006-10-31 | Gary Martin Oosta | Methods for document indexing and analysis |
US6823331B1 (en) | 2000-08-28 | 2004-11-23 | Entrust Limited | Concept identification system and method for use in reducing and/or representing text content of an electronic document |
US20020078045A1 (en) * | 2000-12-14 | 2002-06-20 | Rabindranath Dutta | System, method, and program for ranking search results using user category weighting |
US6694331B2 (en) | 2001-03-21 | 2004-02-17 | Knowledge Management Objects, Llc | Apparatus for and method of searching and organizing intellectual property information utilizing a classification system |
US6952700B2 (en) * | 2001-03-22 | 2005-10-04 | International Business Machines Corporation | Feature weighting in κ-means clustering |
US20020138473A1 (en) | 2001-03-26 | 2002-09-26 | Whewell Jean E. | Preliminary patent prosecution reports |
US20020186252A1 (en) * | 2001-06-07 | 2002-12-12 | International Business Machines Corporation | Method, apparatus and computer program product for providing context to a computer display window |
US7194693B2 (en) | 2002-10-29 | 2007-03-20 | International Business Machines Corporation | Apparatus and method for automatically highlighting text in an electronic document |
US6842182B2 (en) | 2002-12-13 | 2005-01-11 | Sun Microsystems, Inc. | Perceptual-based color selection for text highlighting |
US20050010559A1 (en) | 2003-07-10 | 2005-01-13 | Joseph Du | Methods for information search and citation search |
US20050210042A1 (en) | 2004-03-22 | 2005-09-22 | Goedken James F | Methods and apparatus to search and analyze prior art |
US7702611B2 (en) * | 2005-01-07 | 2010-04-20 | Xerox Corporation | Method for automatically performing conceptual highlighting in electronic text |
US7716226B2 (en) * | 2005-09-27 | 2010-05-11 | Patentratings, Llc | Method and system for probabilistically quantifying and visualizing relevance between two or more citationally or contextually related data objects |
US7770100B2 (en) * | 2006-02-27 | 2010-08-03 | Microsoft Corporation | Dynamic thresholds for conditional formats |
US8131747B2 (en) * | 2006-03-15 | 2012-03-06 | The Invention Science Fund I, Llc | Live search with use restriction |
US8725729B2 (en) * | 2006-04-03 | 2014-05-13 | Steven G. Lisa | System, methods and applications for embedded internet searching and result display |
US8019754B2 (en) * | 2006-04-03 | 2011-09-13 | Needlebot Incorporated | Method of searching text to find relevant content |
US20080103886A1 (en) * | 2006-10-27 | 2008-05-01 | Microsoft Corporation | Determining relevance of a term to content using a combined model |
US7912875B2 (en) * | 2006-10-31 | 2011-03-22 | Business Objects Software Ltd. | Apparatus and method for filtering data using nested panels |
US20080114750A1 (en) * | 2006-11-14 | 2008-05-15 | Microsoft Corporation | Retrieval and ranking of items utilizing similarity |
US8065307B2 (en) * | 2006-12-20 | 2011-11-22 | Microsoft Corporation | Parsing, analysis and scoring of document content |
US7840604B2 (en) | 2007-06-04 | 2010-11-23 | Precipia Systems Inc. | Method, apparatus and computer program for managing the processing of extracted data |
US9053195B2 (en) * | 2007-07-19 | 2015-06-09 | Grant Chieh-Hsiang Yang | Method and system for user and reference ranking in a database |
US7925652B2 (en) * | 2007-12-31 | 2011-04-12 | Mastercard International Incorporated | Methods and systems for implementing approximate string matching within a database |
US20090276694A1 (en) | 2008-05-02 | 2009-11-05 | Accupatent, Inc. | System and Method for Document Display |
US8606815B2 (en) * | 2008-12-09 | 2013-12-10 | International Business Machines Corporation | Systems and methods for analyzing electronic text |
WO2011044578A1 (en) | 2009-10-11 | 2011-04-14 | Patrick Walsh | Method and system for performing classified document research |
2010
- 2010-10-12 WO PCT/US2010/052315 (WO2011044578A1): Application Filing, active
- 2010-10-12 US US13/501,370 (US8739032B2): Expired - Fee Related, not active
- 2010-10-12 US US13/501,362 (US20120197910A1): Abandoned, not active
- 2010-10-12 WO PCT/US2010/052321 (WO2011044579A1): Application Filing, active
2014
- 2014-04-14 US US14/252,393 (US20140229475A1): Abandoned, not active
Patent Citations (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20070027902A1 (en) * | 1999-03-31 | 2007-02-01 | Verizon Laboratories Inc. | Semi-automatic index term augmentation in document retrieval |
US20060089924A1 (en) * | 2000-09-25 | 2006-04-27 | Bhavani Raskutti | Document categorisation system |
US20060187027A1 (en) * | 2005-02-08 | 2006-08-24 | User-Centric Enterprises, Inc. | Electronically tracking a path history |
US20070188320A1 (en) * | 2005-02-08 | 2007-08-16 | User-Centric Ip, Lp | Electronically tracking a path history |
US20080162476A1 (en) * | 2006-12-27 | 2008-07-03 | Fujitsu Limited | Medium storing document retrieval program, document retrieval apparatus and document retrieval method |
US20080249999A1 (en) * | 2007-04-06 | 2008-10-09 | Xerox Corporation | Interactive cleaning for automatic document clustering and categorization |
US20090287779A1 (en) * | 2008-05-15 | 2009-11-19 | International Business Machines Corporation | Methods and systems to selectively mark email as partially reviewed |
Also Published As
Publication number | Publication date |
---|---|
US8739032B2 (en) | 2014-05-27 |
WO2011044578A1 (en) | 2011-04-14 |
US20140229475A1 (en) | 2014-08-14 |
WO2011044579A1 (en) | 2011-04-14 |
US20120204104A1 (en) | 2012-08-09 |
Similar Documents
Publication | Title | Publication Date |
---|---|---|
US20120197910A1 (en) | Method and system for performing classified document research | |
CN102226900B (en) | Phrase-based searching in an information retrieval system | |
US9146999B2 (en) | Search keyword improvement apparatus, server and method | |
Janssens et al. | Novel citation-based search method for scientific literature: a validation study | |
EP0979470B1 (en) | Method and apparatus for searching a database of records | |
US7401087B2 (en) | System and method for implementing a knowledge management system | |
US7788265B2 (en) | Taxonomy-based object classification | |
US6725217B2 (en) | Method and system for knowledge repository exploration and visualization | |
US20040015481A1 (en) | Patent data mining | |
US20100125566A1 (en) | System and method for conducting a patent search | |
US20060288001A1 (en) | System and method for dynamically identifying the best search engines and searchable databases for a query, and model of presentation of results - the search assistant | |
US20070198578A1 (en) | Patent mapping | |
US20100174704A1 (en) | Searching method and system | |
US20080228752A1 (en) | Technical correlation analysis method for evaluating patents | |
US8583679B2 (en) | Method of providing by-viewpoint patent map and system thereof | |
EP2394228A1 (en) | Method and apparatus for real time text analysis and text navigation | |
US20120179709A1 (en) | Apparatus, method and program product for searching document | |
WO2007011129A1 (en) | Information search method and information search apparatus on which information value is reflected | |
JP5943756B2 (en) | Search for ambiguous points in data | |
US20010051942A1 (en) | Information retrieval user interface method | |
JP2013174988A (en) | Similar document retrieval support apparatus and similar document retrieval support program | |
US20140201193A1 (en) | Intellectual property asset information retrieval system | |
US20080228725A1 (en) | Problem/function-oriented searching method for a patent database system | |
Doğan | Google Scholar as a data source for research assessment in the social sciences |
Legal Events
Code | Title | Description |
---|---|---|
STCV | Information on status: appeal procedure | Free format text: EXAMINER'S ANSWER TO APPEAL BRIEF MAILED |
STCV | Information on status: appeal procedure | Free format text: ON APPEAL -- AWAITING DECISION BY THE BOARD OF APPEALS |
STCV | Information on status: appeal procedure | Free format text: BOARD OF APPEALS DECISION RENDERED |
STCB | Information on status: application discontinuation | Free format text: ABANDONED -- AFTER EXAMINER'S ANSWER OR BOARD OF APPEALS DECISION |