US20120197910A1

US20120197910A1 - Method and system for performing classified document research

Info

Publication number: US20120197910A1
Application number: US13/501,362
Authority: US
Inventors: Patrick Sander Walsh
Original assignee: Individual
Current assignee: Individual
Priority date: 2009-10-11
Filing date: 2010-10-12
Publication date: 2012-08-02
Also published as: US8739032B2; WO2011044578A1; US20140229475A1; WO2011044579A1; US20120204104A1

Abstract

A system and method for efficiently and accurately identifying relevant document classifications is contemplated. The document analysis system receives classified reference documents along with a relevancy indicator for each document and generates sensory indicators that assist a researcher in identifying relevant classifications that have not been previously researched. In one aspect, the document analysis system generates a table of classifications, the classifications being determined by scoring of each classification cited within each relevant document. The system then determines a sensory indicator (e.g. a color) for each classification that indicates the extent to which the classification has been previously searched. The classification analysis window thus allows the researcher to quickly determine (e.g. by visual inspection) which classification codes have been cited most frequently as well as which classification codes require further search. In this manner the researcher may quickly determine where to direct a next iteration of a search.

Description

RELATED APPLICATIONS

This application claims the benefit of U.S. Provisional Patent Application Ser. No. 61/250,557 filed on Oct. 11, 2009 by the inventor of the present invention, the entire contents of which are incorporated herein by reference.

FIELD OF THE INVENTION

The present invention relates to the field of document research, and more particularly to methods and systems for locating relevant classifications.

BACKGROUND

Document research involves finding relevant subject matter within a set of documents as may be found in a document repository. Search engines, for example, use “key” words or phrases as search arguments to locate text passages containing those words or phrases. Classification systems provide another means for assessing context. In a classification system, documents with common threads are grouped together in classes. A field of context, therefore, can be narrowed by selecting relevant classes. Patents and patent-related documentation databases are examples of database repositories that implement classification systems. The most commonly used classification system for patents and published patent applications, at least in the U.S., is the USPTO (United States Patent and Trademark Office) Patent Classification System. Two other classification systems in common usage on the international scene include: the “IPC” (International Patent Classification) and the “ECLA” (European Classification).
Document classification systems provide a means for improving the productivity of a document researcher. However, in many large-scale databases the classification system itself may be complex. Patents and patent-related documentation databases provide examples of such large-scale classified database systems with corresponding complex classification systems. The USPTO classification system currently comprises at least 984 classes and numerous digests (collections of certain subjects) within each class. Each class is broken into subclasses; each subclass may be further broken into subclasses and so on. Patents are thus grouped into categories, which are broken down into sub-categories, and sub-categories into further sub-categories, as required. The USPTO examiners decide the class/subclass in which to file a particular invention. To add further complexity, any one invention can be filed in more than one class/subclass, and most are filed in several classes/subclasses.
The challenge of performing document research in such a large-scale document repository, therefore, is to develop an experienced understanding of the classification system. Existing classification analysis tools provide some assistance in navigating classifications. See, for example, U.S. Pat. No. 7,333,984 to Oosta. A counting and sorting technique is shown in FIG. 8. However, the analysis is broad, and does not show a researcher where he or she needs to search, which is important because a patent search involves many iterations of polling a database, and with each iteration, the researcher should progressively narrow the size of the field of interest.
U.S. Patent Application 20020022974 to Lindh shows a method for display of patent information that involves applying statistical analysis to groups of references containing classifications. Lindh does not show additional cross referencing to a search history in order to locate unsearched classifications, which again is important in progressively narrowing and focusing a search.
U.S. Patent Application 20090313221 to Chen shows a patent technology association classification method. While Chen has shown the method of removing classifications and counting frequency, Chen fails to show the additional function of comparing classification frequencies to search histories, nor does Chen show additional broad and narrow reporting schemes for use at different stages of a patent search.
U.S. Patent Application 20080228724 to Huang et al. seeks to assist a researcher in performing classification-based research. Huang shows a technical classification method for searching patents, which includes generating counts from a group of references. The method shows the researcher a quality of a search, but falls short in that Huang does not assist the researcher in locating additional classification areas to search in a next iteration.
U.S. Patent Application 20020073095 to Ohga shows a patent classification displaying method and apparatus having some similarities to the present invention. As seen in FIG. 4, the apparatus provides a classification counting system, wherein the most frequently occurring codes are sorted to the top of the list. Other systems, such as Thomson Delphion, have reporting features like this. Several critical components are, however, missing when viewed next to the present invention. First, the classification codes on the report should be cross referenced against a running tally of codes kept by the researcher in a given search project. With this additional function, the researcher sees not only relevant classifications, but also classifications that have not been searched yet. In addition, Ohga fails to show additional modes of class counting and weighting that are used at different stages of a patent research project, such that the researcher can use broad analysis in the beginning and narrow analysis during the iterative part of the patent search.
U.S. Patent Application 20010027452 to Tropper shows a system and method to identify documents in a database which relate to a given document by using recursive searching and no keywords. While Tropper realizes the benefits of using latest search results to form new searches, he fails to teach the accumulation of classification codes, weighting the codes, ranking of the codes and then comparing the rankings to the researcher's search history.
A need thus exists for an improved classification analysis system, not only for the less-experienced document researcher, but also for the efficiency of those with established skill and experience with a particular classification system. Embodiments of the present invention address many of the shortfalls in the prior art while presenting, what will hereinafter become apparent to be, a pioneering document analysis technology.

BRIEF SUMMARY OF THE PRESENT INVENTION

It is a first object of the present invention to provide a classification analysis system that equips a researcher with broad scope reporting for the initial phase of a search project. It is a second object to enable the researcher to progressively narrow the scope of the search project. Yet another object of the present invention is to enable the researcher to track a classification search history such that duplication is avoided. Still another object of the present invention is to provide a system of narrow classification analysis cross referenced against the classification search history. Yet another object of the present invention is to enable the researcher to effectively cycle through the narrow phase of a search project. Still another object of the present invention is to provide a system that permits the researcher to confidently end a classification based search project.
The present invention provides a system and method for efficiently and accurately identifying relevant document classifications. The system receives one or more classified reference documents in a document set along with a relevancy indicator for each document. The system retrieves all document classifications from the document set, and arranges a classification analysis interface. The researcher has four modes for the interface, which are called: Main, Parents, Subclass, and Primary mode—wherein Main is the broadest and Primary is the narrowest. The researcher is provided GUI tools to select classification codes from the classification analysis interface, and add them to a classification search history which is stored along with the document set in a project file.
In use, the researcher uses the Main and the Parents mode during the first hour of the search project, and the Subclass mode for the remaining 3-4 hours. In the Main mode, the researcher is shown occurrence of main classes in the document set, which provides a broad base for class/text searching. In Parents mode, the researcher is shown common occurrence of parent sub-classifications of the document classifications, while the document classifications are not shown. With this information, the researcher can inspect child classifications of the parents in a classification schedule. For the bulk of the search project, the researcher uses the Subclass mode. In the Subclass mode, the document classifications are collected, counted, scored, and sorted, providing the researcher quick viewing of potentially relevant classifications. Once the researcher locates potentially relevant classifications, he or she executes searches in the newly located classifications, and then adds documents along with relevancy indicators to the expanding document set. The researcher then re-executes Subclass Mode classification analysis on the document set. The classification analysis module scores classification codes and then cross references against the classification search history. The resulting classification analysis interface is displayed along with various sensory indicators (e.g. a color) that show the researcher relevant classifications that are 1) un-searched, 2) partially searched, or 3) fully searched. In this manner the researcher may quickly determine where a next iteration in the search project should be directed. The researcher may continuously iterate through the process of locating new classification areas, searching the new classification areas, augmenting the document set with new documents, and then using the classification analysis tool to locate additional unsearched classification areas. The researcher is encouraged to add many (i.e. 50-100) documents to the project file using a document management interface to tag even moderately relevant documents for the purpose of utilizing many hundreds of classification codes in the scoring. The process continues until the top 5-10 classifications presented by the classification analysis interface are indicated as fully searched, at which point the search project can be brought to a close. With the present invention, important classification areas are very difficult to overlook, regardless of the experience level of the researcher.

BRIEF DESCRIPTION OF THE DRAWINGS

FIG. 1A is a block diagram illustrating a document research system in accordance with an exemplary embodiment of the invention.

FIG. 1B is a sample of a document.

FIG. 2A is an interface diagram in accordance with an exemplary embodiment of the invention.

FIG. 2B is an interface diagram in accordance with an exemplary embodiment of the invention.

FIG. 2C is an interface diagram in accordance with an exemplary embodiment of the invention.

FIG. 2D is an interface diagram in accordance with an exemplary embodiment of the invention.

FIG. 2E is an interface diagram in accordance with an exemplary embodiment of the invention.

FIG. 2F is an interface diagram in accordance with an exemplary embodiment of the invention.

FIG. 2G is an interface diagram in accordance with an exemplary embodiment of the invention.

FIG. 2H is a diagram of a project file created and used by the present invention.

FIG. 2I is an interface diagram in accordance with an exemplary embodiment of the invention.

FIG. 2J is an interface diagram in accordance with an exemplary embodiment of the invention.

FIG. 3A is a flow diagram illustrating a process that may be carried out in accordance with the exemplary system of FIG. 1A.

FIG. 3B is a flow diagram illustrating a process that may be carried out in accordance with the exemplary system of FIG. 1A.

FIG. 3C is a flow diagram illustrating a process that may be carried out in accordance with the exemplary system of FIG. 1A.

FIG. 3D is a search indicator to sensory indicator color scheme table.

FIG. 4 is a block diagram illustrating a document analysis system in accordance with another exemplary embodiment of the invention.

FIG. 5 is a flow diagram illustrating a process that may be carried out in accordance with the exemplary system of FIG. 1A.

DETAILED DESCRIPTION

Reference will now be made in detail to the present exemplary embodiments of the invention, examples of which are illustrated in the accompanying drawings.
Referring to FIG. 1A, a block diagram is shown illustrating a document analysis system 1000, or search system, in accordance with an exemplary embodiment of the invention. The document analysis system 1000 comprises a client device 1010, which may be a computer. The client device 1010 includes a classification analysis module 1012, an interface module 1014 and a user Input/Output (I/O) interface 1018. By way of example, the client device 1010 may be a computing device having a processor such as a personal computer, a phone, a mobile phone, or a personal digital assistant. The document analysis system 1000 may also comprise a document provider 1030, a classification data provider 1040 and a network 1020. The document provider 1030 is configured to deliver one or more documents, labeled generally as 1032. By way of example, the documents 1032 may be electronic files containing patent data or any type of electronic file that contains textual data. See FIG. 1B for an example of a document 1032. As seen, the document 1032 has multiple document classifications 135 that are further divided into a class 136 and a subclass 137. In addition, notice the body of the document is composed of multiple sections (e.g. abstract, description, claims), and that the sections are further divided into paragraphs 138. The textual data of each document 1032 includes content data and one or more classification codes 135. The document provider 1030 may be a remote server and may also include a search engine 1034 for retrieving the one or more documents 1032 from a document data repository (not shown) based on a search query. By way of example, the search engine may be that provided by the United States Patent and Trademark Office (USPTO), FreePatentsOnLine, Micropatent®, Delphion®, PatentCafe®, Thomson Innovation or Google®. The document provider 1030 may retrieve the document data from a local repository or from one or more remote document repositories. 
Examples of such a document repository include patent databases including those provided by EP (European patents), WO (PCT publications), JP (Japan abstracts) and DWPI (Derwent World Patent Index for patent families). Moreover, the document provider 1030 may be a cloud-based bulk storage system, such as Amazon Simple Storage Service.
The classification data provider 1040 is configured to provide access to a classification data repository 1042. The classification data repository 1042 may be a database or file storage element that stores hierarchical classification data entries 1044. Each classification data entry 1044 includes a classification code. Each classification data entry 1044 may also include a classification code description field. The classification data provider 1040 may be a remote server provided by the United States Patent and Trademark Office (USPTO). The classification data may be representative of a document classification system such as the Manual of Classification issued by the USPTO. The classification data provider may retrieve the document data from a local repository or from one or more remote documents repositories. It is noted that while shown as separate components, the document provider and classification data provider may be co-located on a single remote server.
The interface module 1014 is configured to receive one or more documents 1032 from the document provider 1030, and to retrieve classification data 1044 from the classification data provider 1040 by way of network 1020. By way of example, the network may be the Internet. The interface module 1014 may alternatively be configured to receive the documents 1032 or classification data 1044 through the user I/O interface 1018. In such an embodiment, the documents 1032 may be stored on a portable storage device (not shown) such as a CD, DVD or solid state device and the user I/O interface 1018 may include a communications interface such as a wireless interface, a CD/DVD drive or a USB drive for retrieving data from the portable storage device. The documents 1032 may alternately be paper-based documents and may be provided to the interface module 1014 by use of a scanner (not shown) that is configured with the I/O interface 1018. The client device 1010 may also include a data storage element 1016, which may be at least one of a computer readable medium and a memory. The interface module 1014 may also be configured to receive a set of one or more concepts from a researcher by way of the I/O interface 1018. The I/O interface 1018 may also include at least one input device such as a keyboard, mouse, microphone or a touch screen for receiving the concepts from the researcher. Each concept is comprised of one or more text-based keywords or sets of text-based keywords which are used to determine the relevancy of each of the documents 1032. The client device 1010 may alternatively include a document analysis module that generates statistical data based on the user-defined concepts and the documents 1032. The statistical data may be used by the researcher to quickly assess the relevancy of each document 1032 to each of the user-defined concepts. 
The document analysis module may transmit the statistical data to the interface module 1014 which presents the data to the researcher by way of the I/O interface 1018. The I/O interface 1018 may also include a display such as an LCD or CRT monitor configured to display a graphical user interface (GUI) for presenting information such as the statistical data to the researcher. The GUI will now be discussed in greater detail.
Referring now to FIG. 2A, FIG. 2B, FIG. 2C, FIG. 2D, FIG. 2E, FIG. 2F and FIG. 2G, diagrams are shown illustrating a document analysis GUI 200 in accordance with an exemplary embodiment of the invention. FIG. 3A, which illustrates an exemplary method 1300 for performing classification-based document analysis, will also be discussed. At a first step labeled as 1310, the interface module 1014 may receive concept data from the researcher. The interface module 1014 first generates a document analysis GUI 200 and displays the GUI 200 to the researcher by way of the display device included with user I/O interface 1018. As shown in FIG. 2A, the document analysis GUI 200 includes a document relevance interface 220, a document management interface 250, and a document image window 254. As seen in FIG. 2F, the researcher may start a research project by entering one or more concepts 272. Each concept 272 may have one or more words or word groups associated therewith. As shown in FIG. 2B, the document analysis GUI 200 includes a keyword entry interface 210. The keyword entry interface 210 comprises multiple rows of alphanumeric entry fields 212. One or more keywords 213 may be entered by a researcher into each entry field 212, wherein each keyword 213 is conceptually related such that each line represents a keyword group 214. The researcher is also provided with a user thesaurus 211 and web thesaurus 219. The user thesaurus 211 can be edited and stored in the data storage element 1016, and the web thesaurus 219 may be accessed through the network 1020 by the interface module 1014. Five alphanumeric entry fields 212 are shown to be filled in FIG. 2B. Each concept 272 and corresponding keyword group 214 may be determined manually by the researcher or may be received from an external source. By way of example, the concepts may be reduced to a manageable number of concepts (e.g. 4-5 concepts). 
Keywords 213 may then be chosen for each of the concepts and entered into one of the alphanumeric fields 212 to form the keyword group 214. After entering each of the desired concepts, the researcher may then exit the keyword entry interface 210 and proceed to analysis of a set of documents based on the user-defined concepts.
At a next step labeled as 1320 the interface module 1014 will receive one or more reference documents 1032. As discussed the interface module 1014 is configured to receive one or more documents 1032 from the document provider 1030 by way of network 1020. The interface module 1014 may be configured to allow the researcher to request a predetermined set of documents 1032. By way of example, the researcher may initiate a request for a specific set of patent documents or a set of patent documents that fall within a specific category or classification. The researcher may also initiate a search of a remote document repository through a search interface window 230 (shown in FIG. 2D) provided by the document analysis GUI 200. The search may be initiated by entering a set of search parameters, such as keywords, into one or more search fields 232 located on the search interface window 230. Boolean operators, wildcards and proximity indicators may be used to link the keywords together in logic sets. The search interface window 230 may also provide a search assistance window 234 that allows the previously defined keywords 213 to be added to the set of search parameters in response to a single user action (e.g. a mouse click). The search assistance window 234 thereby facilitates the loading of search parameters into the one or more search fields 232. In addition, the researcher is provided with a classification search history 290, which contains a table for documenting the search project strategy (discussed in detail later). The researcher may pick classification codes from the classification search history 290. As discussed, the interface module 1014 may alternatively be configured to receive one or more documents 1032 through the user I/O interface 1018. 
In such an embodiment, the documents 1032 may be stored on a portable storage device (not shown) such as a CD, DVD or solid state device and the user I/O interface 1018 may include a communications interface such as a wireless interface, a CD/DVD drive or a USB drive for retrieving data from the portable storage device. Upon receiving the one or more reference documents, the interface module 1014 will populate a document management table 252 located on a document management interface 250 (shown in FIG. 2E) with selectable rows 253 each having information descriptive of one of the received documents 1032. By way of example, each row may include a reference document number 255 and document title 256.
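The loading of keyword groups 214 into the search fields 232 can be illustrated with a minimal sketch. The description states only that Boolean operators, wildcards and proximity indicators may link the keywords; joining keywords with OR inside a group and AND between groups, the function name `build_query`, and the sample keywords are all assumptions made for illustration.

```python
def build_query(keyword_groups):
    """Combine keyword groups into one Boolean search string.

    Assumption: keywords within a group are alternatives (OR) and
    every group must be matched (AND). Multi-word keywords are quoted.
    """
    clauses = []
    for group in keyword_groups:
        terms = [f'"{kw}"' if " " in kw else kw for kw in group if kw.strip()]
        if terms:
            clauses.append("(" + " OR ".join(terms) + ")")
    return " AND ".join(clauses)


# Hypothetical keyword groups, one per concept.
groups = [["bicycle", "bike"], ["brake*", "disc brake"]]
query = build_query(groups)
print(query)
```

Empty groups are skipped so that a blank entry field does not produce an empty clause in the query.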
At a next step labeled as 1330 the interface module 1014 receives and stores data from the researcher that indicates relevancy of a currently selected document 1032 to the one or more user-defined concepts. As discussed, the interface module 1014 will populate the document management table 252 (shown in FIG. 2E) with selectable rows 253 each having information descriptive of one of the received reference documents. In the exemplary embodiment, the document management table 252 also includes one or more additional columns for allowing the researcher to indicate (by way of a mouse-click or similar navigation event) the relevance of the currently selected document. Each row of the document management table 252 may have a relevancy column 257 that contains an input field for indicating an overall relevance of the associated reference document. By way of example the interface module 1014 may provide the researcher with the ability to select an indicia (e.g. using a drop-down menu list) such as “A” for highest relevance, “B” for suspected relevance, and “C” for uncertain relevance. Irrelevant documents may be marked with an “I” to place a marker in a project file 205 (FIG. 2H) indicating that a reference document was reviewed. Each row of the document management table 252 may also have one or more additional columns labeled generally as 258 that contain an input field for indicating whether a specific concept has been verified to appear in the currently selected reference document. The interface module 1014 may provide the researcher with the ability to toggle a field (one such field is labeled as 259) corresponding to a specific concept “on” or “off” (e.g. by a mouse-click) when indicating whether a particular concept 272 does or does not exist inside the selected document. A column may be provided for each of the previously discussed concepts 272. As discussed, the interface module 1014 may provide the researcher with a concept management window 270 (see FIG. 2F) for allowing the researcher to define different concepts 272 which the additional columns 258 may be derived from. In this manner, the researcher is able to track higher-level or more abstract concepts than were initially defined and may also provide more user-friendly naming of the concepts 272 (useful for example for report generation). The interface module 1014 may also store the previously discussed relevancy indicators in the project file 205, which is located in the data storage element 1016 in FIG. 1A. By storing each of the indicators, the interface module 1014 is able to provide information to the classification analysis module 1012. The classification analysis module 1012 will now be discussed in greater detail.
At a next step labeled as 1340 classification analysis begins with the interface module 1014 first displaying a classification analysis interface 280, which is shown in FIG. 2G. The classification analysis interface 280 can include a classification search history 290, which is retrieved by the interface module 1014 from the project file 205. The classification search history 290 shows a previously identified classification code 291 and a corresponding previously identified classification title 292. Each previously identified classification code 291 also has a search extent indicator 294 and a search status indicator 293, both of which can be manipulated by the researcher to various states. By way of example, if the researcher has already searched or plans to search previously identified classification code 291 in its entirety, he or she may indicate this with the word “Yes” in the search extent indicator 294. In addition, the researcher may keep record of which previously identified classification codes 291 have been properly addressed with either text limited searching or full searching by similarly indicating in the search status indicator 293. The classification analysis interface 280 may include a document selection field 281 and a classification analysis mode selection field 282. The document selection field 281 provides one or more options to the researcher for selecting a set of documents which the classification analysis will be performed on. By way of example, the researcher may select all documents in the project file 205 that have previously been indicated to be relevant to any of the concepts 272 (i.e. all documents selected in any of columns 258), all documents relevant to a specific concept (i.e. all documents selected in one of columns 258) or documents that have been indicated to have a specific overall relevance (e.g. all documents having a relevancy of “A” from relevancy column 257). 
The classification analysis interface 280 also has a class weighting 286 option and a relevancy weighting 287 option. The class weighting 286 instructs the classification analysis module 1012 to account for total size of a classification, which balances the effect of large classifications overshadowing smaller classifications in un-weighted frequency counts. The relevancy weighting 287 allows the researcher to assign greater weight in the scoring to documents 1032 of higher relevance recorded in the relevancy column 257. The classification analysis mode selection field 282 provides one or more options to the researcher for selecting the mode of classification analysis to be performed. The most common mode is the Subclass mode, which is discussed in the next step. (Detailed discussions of all four modes are found immediately following.)
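The three document selection options offered by the document selection field 281 can be sketched as a simple filter. This is an illustrative sketch only: the `ProjectDocument` record, its field names and the `select_documents` function are hypothetical and are not taken from the project file 205 layout of FIG. 2H.

```python
from dataclasses import dataclass, field


@dataclass
class ProjectDocument:
    """Hypothetical record for one row of the document management table 252."""
    number: str
    relevancy: str                                        # e.g. "A", "B", "C" or "I"
    concepts: set = field(default_factory=set)            # concepts toggled "on" (columns 258)
    classifications: list = field(default_factory=list)   # document classifications 135


def select_documents(docs, relevancy=None, concept=None, any_concept=False):
    """Return the documents matching the chosen selection option(s)."""
    selected = []
    for doc in docs:
        if relevancy is not None and doc.relevancy != relevancy:
            continue  # option: a specific overall relevance
        if concept is not None and concept not in doc.concepts:
            continue  # option: a specific concept
        if any_concept and not doc.concepts:
            continue  # option: relevant to any concept
        selected.append(doc)
    return selected


# Made-up sample data.
docs = [
    ProjectDocument("US1", "A", {"frame"}, ["280/200"]),
    ProjectDocument("US2", "B", set(), ["180/219"]),
    ProjectDocument("US3", "A", {"brake"}, ["280/200", "188/24.11"]),
]
a_docs = select_documents(docs, relevancy="A")
brake_docs = select_documents(docs, concept="brake")
tagged_docs = select_documents(docs, any_concept=True)
```

Only the document classifications of the selected documents would then be passed on to the classification analysis module 1012.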
Step 1340 may proceed after the researcher confirms the previously described classification analysis options. The interface module 1014 then instructs the classification analysis module 1012 to perform classification analysis on the selected set of documents. Referring back to FIG. 1B, documents 1032 have one or more document classifications 135 associated therewith, which can be further divided into a class 136 and a subclass 137. The classification analysis module 1012 will retrieve the document classifications 135 from each document and then generate a count of instances of each document classification 135 over the entire selected set. The classification analysis module 1012 will then send each document classification 135 and its corresponding count or score to the interface module 1014 to be displayed (step 1350) via the classification analysis interface 280 where each unique code will be displayed in a separate row. The unique codes may be displayed in a classification code column 284 while the corresponding score will be displayed in a classification score column 283. The rows may be sorted based on the score of each unique code. In an alternative embodiment discussed later, the score for each code may be multiplied by a weighting factor that accounts for the size of each subclass (i.e. the number of documents in the subclass) or by a weighting factor that accounts for the document relevance. The interface module 1014 may also retrieve a classification description for each unique code from the classification data provider 1040, using each unique classification code to look up the corresponding classification data entry 1044. The classification description may also be displayed in a classification title column 285 of the classification analysis interface 280. The classification analysis module 1012 will use a search indicator to sensory indicator table 241, as seen in FIG. 3D, to determine a sensory indicator (e.g. a color) for each unique classification code that appears in the classification analysis interface. The classification analysis module 1012 determines the sensory indicator by first determining whether the corresponding classification code has been previously searched and to what extent. If a code appears in the classification code column 284, and does not appear as a previously identified classification code 291 in the classification search history 290, then the code is assumed to be unsearched. If a code appears in the classification code column 284, and also appears as a previously identified classification code 291, and the corresponding search status indicator 293 shows “No”, then the code is assumed to be partially searched. If a code appears in the classification code column 284, and appears as a previously identified classification code 291, and the corresponding search status indicator 293 shows “Yes”, then the code is assumed to be fully searched. The sensory indicator may be a green highlighting if the code is unsearched, a yellow highlighting if it has been partially searched, or a red highlighting if the code has been fully searched. The classification analysis window 280 thus allows the researcher to quickly determine (e.g. by visual inspection) which classification codes have been cited most frequently as well as which classification codes have not yet been searched. In this manner the researcher may very quickly determine where a next iteration of a search project should be directed.
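The counting and color assignment just described can be sketched as follows. This is a minimal sketch under stated assumptions: the classification search history 290 is modeled as a dictionary mapping each previously identified code to its search status indicator 293, any status other than “Yes” is treated as partially searched, and the function names and sample codes are hypothetical.

```python
from collections import Counter


def count_classifications(documents):
    """Count instances of each classification code over the selected set.

    `documents` is a list of per-document code lists. The result is a
    list of (code, count) pairs sorted by descending count.
    """
    counts = Counter()
    for codes in documents:
        counts.update(codes)
    return counts.most_common()


def sensory_indicator(code, search_history):
    """Map a code to a highlight color using the search history."""
    status = search_history.get(code)
    if status is None:
        return "green"   # not in the history: unsearched
    if status == "Yes":
        return "red"     # in the history and marked complete: fully searched
    return "yellow"      # in the history but not complete: partially searched


# Made-up sample data.
documents = [["280/200", "280/281.1"], ["280/200"], ["180/219", "280/200"]]
history = {"280/200": "Yes", "180/219": "No"}
rows = [(code, n, sensory_indicator(code, history))
        for code, n in count_classifications(documents)]
```

Each entry of `rows` corresponds to one row of the classification analysis interface 280: the code, its count, and the color used to highlight it.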
At step 1350 the researcher will determine whether to add a new classification code to the search project. The researcher is provided the ability to quickly add entries to the classification search history 290 directly from the classification code column 284 using a mouse click. In doing so, the process will return to step 1320, as indicated by dashed arrow 1360, at which point the interface module 1014 provides a new search inquiry to the document provider 1030 and a new set of reference documents 1032 will be received. Each of steps 1330 through 1350 is repeated to determine the relevancy of the new set of reference documents to the user-defined concepts and whether the search should be expanded to a new classification. Steps 1320 through 1350 may be repeated until the researcher is satisfied that the most relevant classes have been searched. By way of example, the researcher may make this determination when a threshold number of the most frequently occurring classifications are highlighted in red, which indicates that all are present on the classification search history 290, and all are indicated as complete by the search status indicator 293. By way of example, the threshold may be at least ten red highlighted classifications in the classification analysis interface 280.
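The stopping determination is made by the researcher, but one possible reading of the example threshold can be sketched in code. The sketch below assumes rows of the form (code, score, color) already sorted by score from high to low, and treats the search as complete when the top `threshold` rows are all red; the function name and this interpretation are assumptions, not part of the described system.

```python
def search_is_complete(rows, threshold=10):
    """Return True when the `threshold` highest-scoring rows are all red.

    rows: list of (code, score, color) tuples sorted high to low by score.
    """
    top = rows[:threshold]
    return len(top) >= threshold and all(color == "red" for _, _, color in top)
```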
Modes of Operation: As discussed, the classification analysis performed by the classification analysis module 1012 may be performed by first specifying a mode using the classification analysis mode selection field 282. By way of example, the classification analysis modes may include: a Main Classes mode, a Subclass Parents mode, a Subclass mode and a Primary Subclass mode. Referring to FIG. 3B, all four modes are shown, and will now be discussed in detail. In addition, FIG. 3C shows the process of FIG. 3B along with actual numbers. Steps 701-706 are run in all modes, and will be discussed first.
As seen at step 701, the classification analysis module 1012 retrieves the documents 1032 from the project file 205. The documents are then filtered according to the preference of the researcher using document selection field 281. As an example, the researcher may run just “B” tagged documents or just documents having a specific element tagged in the document management table 252. Next, at step 702, the classification analysis module 1012 compiles all document classifications 135 into a 2D-Array 750 containing document classification 135, relevancy, score, and primary (see for example array 750 in FIG. 3C). The relevancy is originally set by the researcher in relevancy column 257 as A, B, C, D, or E. Score is initially set to 1. Primary is an indication as to whether the document classification 135 is the first listed. Next, at step 703, if the class weighting 286 is turned on, then move to step 704. At step 704, the interface module 1014 requests the classification size (i.e., the total number of documents currently classified therein) for each classification in the 2D-Array 750 from the classification data provider 1040. Next, the classification analysis module 1012 divides the score in 2D-Array 750 by the classification size, which effectively weights each classification inversely according to classification size. Next, at step 705, if the relevancy weighting 287 is turned on, then move to step 706. At step 706, the classification analysis module 1012 multiplies the score in 2D-Array 750 by a relevancy factor according to the relevancy listed in 2D-Array 750. Current relevancy factors are A=1.5, B=1, C=0.75, D=0.5, E=0.5.
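Steps 702 through 706 can be sketched as follows. This is an illustrative sketch under stated assumptions: rows of the 2D-Array are modeled as dictionaries, the classification sizes are passed in as a plain mapping rather than requested from a data provider, and the function and parameter names are hypothetical. The relevancy factors are the ones listed above.

```python
# Relevancy factors as listed in the description.
RELEVANCY_FACTORS = {"A": 1.5, "B": 1.0, "C": 0.75, "D": 0.5, "E": 0.5}

def compile_rows(documents):
    """Step 702: one row per classification, score initialised to 1."""
    rows = []
    for doc in documents:
        for position, code in enumerate(doc["classifications"]):
            rows.append({
                "classification": code,
                "relevancy": doc["relevancy"],
                "score": 1.0,
                "primary": position == 0,  # first-listed classification
            })
    return rows

def apply_weighting(rows, class_sizes=None, use_relevancy=False):
    """Steps 703-706: optional class-size and relevancy weighting.

    class_sizes: assumed mapping of code -> number of documents in that
    classification; pass None to leave class weighting turned off.
    """
    for row in rows:
        if class_sizes is not None:
            row["score"] /= class_sizes[row["classification"]]
        if use_relevancy:
            row["score"] *= RELEVANCY_FACTORS[row["relevancy"]]
    return rows
```

For example, a classification containing 100 documents that is cited by an “A” document would receive a score of 1/100 × 1.5 = 0.015 with both weightings turned on.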
Main Classes Mode: If classification analysis mode selection field 282 is set to “Main” then proceed through step 717 to step 718. At step 718, the document classifications 135 in the 2D-Array 750 are rewritten to show only the classes 136. Next, at step 718, the 2D-Array 750 is rearranged by summing the scores of repeat classification entries and eliminating all repeats. The 2D-Array 750 is then sorted high to low according to score, and the class description is added for step 720, which is the display in interface 280. See FIG. 2I for an example of the interface 280 after a run in Main Classes mode.
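A minimal sketch of the Main Classes rollup follows. It assumes each classification is written as a "class/subclass" string (as in USPTO-style codes) and that rows are simple (code, score) pairs; both the string format and the function name are assumptions for illustration.

```python
from collections import defaultdict

def main_classes(rows):
    """Collapse (code, score) rows to their main class and sum the scores."""
    totals = defaultdict(float)
    for code, score in rows:
        main_class = code.split("/", 1)[0]  # keep only the class portion
        totals[main_class] += score         # repeats are merged by summing
    # Sort high to low by score; ties are broken by class for stable output.
    return sorted(totals.items(), key=lambda item: (-item[1], item[0]))
```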
SubClass Parents Mode: If classification analysis mode selection field 282 is set to “Subclass Parents” then proceed through step 714 and on to step 715. Next, the classification analysis module 1012 requests all ancestors of the document classifications 135 in the 2D-Array 750 from the classification data provider 1040 via the interface module 1014. The ancestors are then inserted into the 2D-Array 750, and simultaneously the original document classifications are deleted from the 2D-Array 750. Next, at step 716, the 2D-Array 750 is rearranged by summing the scores of repeat classification entries and eliminating all repeats. The resulting table is displayed in the classification analysis interface 280. See FIG. 2J for an example of the interface 280 after a run in SubClass Parents mode.
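The ancestor substitution can be sketched as shown below. The sketch assumes the hierarchy is available as a mapping from each classification to a list of its ancestors (standing in for the request to the classification data provider), that every ancestor inherits the full score of the original entry, and that the result is sorted by score for display; the names and example codes are hypothetical.

```python
from collections import defaultdict

def subclass_parents(rows, ancestors):
    """Replace each (code, score) row with its ancestors and sum repeats.

    ancestors: assumed mapping of code -> list of ancestor codes.
    """
    totals = defaultdict(float)
    for code, score in rows:
        for ancestor in ancestors.get(code, []):
            totals[ancestor] += score  # original entry is dropped
    return sorted(totals.items(), key=lambda item: (-item[1], item[0]))
```

Because sibling subclasses share ancestors, a parent whose children are each cited only once can still rise to the top of the table.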
SubClass Mode: If classification analysis mode selection field 282 is set to “Subclass” then proceed through step 710 and on to step 711. Next, the classification analysis module 1012 rearranges the previously generated 2D-Array 750 by summing the scores and eliminating repeats. The resulting 2D-Array 750 is sorted according to score from high to low. Next, at step 712, the classification analysis module 1012 compares all rows in 2D-Array 750 to all rows of the classification search history 290, and assigns colors according to the following scheme (see also FIG. 3D for the scheme): 1) if a classification is in 2D-Array 750 and is not in the classification search history 290 then assign green, 2) if a classification is in 2D-Array 750 and is in the classification search history 290 with a search status 293 of “No” and a search extent 294 of “No” then assign light yellow, 3) if a classification is in 2D-Array 750 and is in the classification search history 290 with a search status 293 of “Yes” then assign red, 4) if a classification is in 2D-Array 750 and is in the classification search history 290 with a search status 293 of “No” and a search extent 294 of “Yes” then assign bright yellow. At step 720, the resulting table is displayed along with the color scheme in the classification analysis interface 280. See FIG. 2G for an example of the interface 280 after a run in SubClass mode.
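The color scheme of step 712 reduces to a small decision function. The following is an illustrative sketch in which the classification search history is modeled as a mapping from code to a (search status, search extent) pair of “Yes”/“No” strings; that data layout and the function name are assumptions.

```python
def assign_color(code, search_history):
    """Return the highlight color for a classification code.

    search_history: assumed mapping of code -> (status, extent), where
    each value is the string "Yes" or "No".
    """
    entry = search_history.get(code)
    if entry is None:
        return "green"          # not in the search history
    status, extent = entry
    if status == "Yes":
        return "red"            # search complete
    if extent == "Yes":
        return "bright yellow"  # status "No", extent "Yes"
    return "light yellow"       # status "No", extent "No"
```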
Primary Mode: If classification analysis mode selection field 282 is set to “Primary” then proceed through step 707 and on to step 708. Next, the classification analysis module 1012 sorts through 2D-Array 750 and removes all but the entries labeled as primary. At step 720, the resulting table is displayed in the classification analysis interface 280.
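Primary mode amounts to a filter over the compiled rows. A minimal sketch, assuming each row is a dictionary carrying a boolean "primary" flag (a hypothetical layout chosen for the example):

```python
def primary_only(rows):
    """Keep only the rows flagged as the first-listed classification."""
    return [row for row in rows if row["primary"]]
```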
Referring to FIG. 4, a block diagram is shown illustrating a document analysis system 800 in accordance with another exemplary embodiment of the invention. The document analysis system 800 is similar to the document analysis system of FIG. 1A but provides a client-server architecture. Accordingly, document analysis system 800 includes a client device 810 and a server device 880. The server device 880 may be a computing device having a processor, such as a personal computer, or may be implemented on a high performance server, such as an HP, IBM or Sun computer using an operating system such as, but not limited to, Windows, Solaris or UNIX. The server device 880 includes a classification analysis module similar in function to the classification analysis module 1012 of the embodiment of FIG. 1A.
Thus, a document analysis system having the benefits of allowing for efficient and accurate identification of potentially relevant classifications is contemplated. Referring now to FIG. 5, an exemplary method 2100 of performing a patent search using multiple modes of the present invention comprises the following:
Step 2101: Synthesizing a proposition into one or more key concepts 272;
Step 2102: Developing one or more keyword groups 214 based on the key concepts 272;
Step 2103: Conducting a text search with text search inquiry over a database of documents having text, images and one or more document classifications 135 therein using the keyword groups 214;
Step 2104: Compiling a search file of documents 1032 from the text search inquiry;
Step 2105: Selecting a first set of documents from the file of documents 1032 and creating a project file 205;
Step 2106: Tagging documents 1032 in the project file 205 using a document management interface 250, with indicia in a relevancy column 257 and concepts 272 in additional columns 258;
Step 2107: Instructing a classification analysis module 1012 to run in Main Class Mode to locate a set of classes 136 by counting and ranking according to frequency;
Step 2108: Conducting a first class & text search over the database using the top-ranked classes 136 combined with text from the keyword groups 214;
Step 2109: Compiling a second search file of documents 1032 from the classification & text search;
Step 2110: Selecting a second set of 4-5 documents 1032 and appending the set to the project file 205;
Step 2111: Tagging untagged documents in the project file 205 as appropriate, and particularly the second set of documents, using a document management interface 250, with indicia in a relevancy column 257 and concepts 272 in additional columns 258;
Step 2112: Instructing the classification analysis module 1012 to run in Subclass Parents Mode to locate a second set of document classifications 135 by counting and ranking according to frequency;
Step 2113: Inspecting a classification schedule to locate potentially relevant child classifications of the second set located in step 2112 and adding said classifications to the classification search history 290;
Step 2114: Conducting a third classification & text search over the database using the classifications from 2113 combined with text from the keyword groups 214;
Step 2115: Compiling a third search file of documents 1032 from the third classification & text search;
Step 2116: Selecting a third set of 4-5 documents 1032 and appending the set to the project file 205;
Step 2117: Tagging untagged documents in the project file 205 as appropriate, and particularly the third set of documents, using a document management interface 250, with indicia in a relevancy column 257 and concepts 272 in additional columns 258;
Step 2118: Instructing the classification analysis module 1012 to run in Subclass Mode by counting and ranking document classifications 135 according to frequency and cross referencing results against the classification search history 290 to locate an nth document classification 135 to add to the classification search history 290;
Step 2119: Conducting an nth search over the database using the nth classification from step 2118 either combined with text from the keyword groups 214 or inspecting the nth classification in its entirety;
Step 2120: Compiling an nth search file of documents 1032 from the nth classification & text search;
Step 2121: Selecting all relevant documents 1032 and appending the set to the project file 205;
Step 2122: Tagging untagged documents in the project file 205 as appropriate, and particularly the nth set of documents, using a document management interface 250, with indicia in a relevancy column 257 and concepts 272 in additional columns 258;
Step 2123: Inspecting the classification search history 290 for a minimum of ten document classification codes and optionally repeating from 2118 to 2123;
Step 2124: Conducting a forward and backward citation search (not shown) on the selected high-relevance documents from the project file 205 and adding relevant documents to the project file;
Step 2125: End.
While the foregoing invention has been described with reference to the above-described embodiments, various modifications and changes can be made without departing from the spirit of the invention. Accordingly, all such modifications and changes are considered to be within the scope of the appended claims.

Claims

1. A search system for searching through a plurality of documents that are organized using a classification system to define each of the plurality of documents as a classified document, wherein a search is conducted based on a predetermined subject matter, the system comprising:

a program module stored on at least one of a computer readable medium and a memory of a computer, the program module comprising instructions executable by a processor of the computer to determine document classifications that are relevant to the subject matter of the search, the program module comprising a classification analysis module;

wherein the classification analysis module:

receives a set of documents, the set of documents including at least one document, each document in the set of documents having a relevancy indicator and at least one classification value, each classification value being defined as a unique classification value;

determines a score of each of the unique classification values appearing in the at least one document in the set of documents, the score being defined as a frequency of occurrence of each of the unique classification values appearing in the at least one document;

determines a search indicator for each of the unique classification values, the search indicator providing an indication of a level to which each of the unique classification values has been previously searched; and

generates and displays a table of each of the unique classification values along with at least one of the score of each of the unique classification values and the search indicator for each of the unique classification values.

2. A system according to claim 1 wherein the table is sorted based on the score of each of the unique classification values.

3. A system according to claim 1 wherein each of the unique classification values is assigned a predetermined value that corresponds to a weight of each of the unique classification values to define a weighted classification value, and wherein the weighted classification value is used to modify the score.

4. A system according to claim 1 wherein each of the unique classification values relating to a document located in the search that is determined to be a relevant document is assigned a predetermined value that corresponds to a weight of each of the unique classification values to define a weighted classification value, wherein the predetermined value is derived from the overall relevance of the document located in the search, and wherein the weighted classification value is used to modify the score.

5. A system according to claim 1 wherein each of the unique classification values relating to a document located in the search is assigned a predetermined value that corresponds to a weight of each of the unique classification values to define a weighted classification value based on the number of documents located in the classification, and wherein the weighted classification value is used to modify the score.

6. A system according to claim 1 wherein the classification analysis module separates the unique classification values to display only the unique classification values of those documents located in the search that were determined to be relevant.

7. A system according to claim 1 wherein each of the unique classification values are organized in a hierarchy providing each of the unique classification values with at least one ancestor node; and wherein each of the unique classification values is replaced with the at least one ancestor node.

8. A system according to claim 1 wherein each of the unique classification values includes a class value and a subclass value; and wherein each of the unique classification values is replaced with the class value.

9. A system according to claim 1 wherein the classification analysis module determines the search indicator for each unique classification value by receiving both an alphanumeric indicator relating to a search status of the unique classification value and an alphanumeric indicator relating to a search extent of the unique classification value.

10. A system according to claim 1 wherein the classification analysis module assigns a color to be displayed on a user interface relating to the search indicator.

11. A system according to claim 1 wherein the computer is a server and the system further comprises a client computer, the server communicatively coupled to the client computer; and wherein the program module is located on the client computer and the classification analysis module is located on the server.

12. A system according to claim 7 wherein each of the unique classification values are grouped by adding the scores of each of the unique classification values after being replaced; wherein the grouped unique classification values are sorted according to the scores; and wherein the sorted grouped unique classification values are displayed on the table.

13. A method of searching through a plurality of documents that are organized using a classification system to define each of the plurality of documents as a classified document, the plurality of classified documents being searched based on a predetermined subject matter, the method comprising:

determining record classifications that are relevant to the subject matter of the search using a program module stored on at least one of a computer readable medium and a memory of a computer, the program module comprising instructions executable by a processor of the computer to determine document classifications;

receiving a set of documents, the set of documents including at least one document, each document in the set of documents having a relevancy indicator and at least one classification value, each classification value being defined as a unique classification value;

determining a score of each of the unique classification values appearing in the at least one document in the set of documents, the score being defined as a frequency of occurrence of each of the unique classification values appearing in the at least one document;

determining a search indicator for each of the unique classification values, the search indicator providing an indication of a level to which each of the unique classification values has been previously searched; and

generating and displaying a table of each of the unique classification values along with at least one of the score of each of the unique classification values and the search indicator for each of the unique classification values.

14. A method according to claim 13 further comprising sorting the table based on the score of each of the unique classification values.

15. A method according to claim 13 further comprising assigning a predetermined value to each of the unique classification values that corresponds to a weight of each of the unique classification values to define a weighted classification value; and wherein the weighted classification value is used to modify the score.

16. A method according to claim 13 further comprising assigning a predetermined value to each of the unique classification values relating to a document located in the search that is determined to be a relevant document that corresponds to a weight of each of the unique classification values to define a weighted classification value, and wherein the weighted classification value is used to modify the score.

17. A method according to claim 13 further comprising assigning a predetermined value to each of the unique classification values relating to a document located in the search that corresponds to a weight of each of the unique classification values to define a weighted classification value based on the number of documents located in the classification, and wherein the weighted classification value is used to modify the score.

18. A method according to claim 13 further comprising separating the unique classification values to display only the unique classification values of those documents located in the search that were determined to be relevant.

19. A method according to claim 13 further comprising organizing each of the unique classification values in a hierarchy providing each of the unique classification values with at least one ancestor node; and further comprising replacing each of the unique classification values with the at least one ancestor node.

20. A method according to claim 13 wherein each of the unique classification values includes a class value and a subclass value; and further comprising replacing each of the unique classification values with the class value.

21. A method according to claim 13 further comprising determining the search indicator for each unique classification value by receiving both an alphanumeric indicator relating to a search status of the unique classification value and an alphanumeric indicator relating to a search extent of the unique classification value.

22. A method according to claim 13 further comprising assigning a color to be displayed on a user interface relating to the search indicator.

23. A method according to claim 19 further comprising grouping each of the unique classification values by adding the scores of each of the unique classification values after being replaced; sorting the grouped unique classification values according to the scores; and displaying the sorted grouped unique classification values on the table.

24. A method of searching through a plurality of documents that are organized using a classification system to define each of the plurality of documents as a classified document, the plurality of classified documents being searched based on a predetermined subject matter, the method comprising:

determining a search indicator for each of the unique classification values, the search indicator providing an indication of a level to which each of the unique classification values has been previously searched, wherein the search indicator is determined by receiving both an alphanumeric indicator relating to a search status of the unique classification value and an alphanumeric indicator relating to a search extent of the unique classification value;

generating and displaying a table of each of the unique classification values along with at least one of the score of each of the unique classification values and the search indicator for each of the unique classification values; and

assigning a color to be displayed on a user interface relating to the search indicator.

25. A method according to claim 24 further comprising sorting the table based on the score of each of the unique classification values.

26. A method according to claim 24 further comprising assigning a predetermined value to each of the unique classification values that corresponds to a weight of each of the unique classification values to define a weighted classification value; and wherein the weighted classification value is used to modify the score.

27. A method according to claim 24 further comprising assigning a predetermined value to each of the unique classification values relating to a document located in the search that is determined to be a relevant document that corresponds to a weight of each of the unique classification values to define a weighted classification value, and wherein the weighted classification value is used to modify the score.

28. A method according to claim 24 further comprising assigning a predetermined value to each of the unique classification values relating to a document located in the search that corresponds to a weight of each of the unique classification values to define a weighted classification value based on the number of documents located in the classification, and wherein the weighted classification value is used to modify the score.

29. A method according to claim 24 further comprising separating the unique classification values to display only the unique classification values of those documents located in the search that were determined to be relevant.

30. A method according to claim 24 further comprising organizing each of the unique classification values in a hierarchy providing each of the unique classification values with at least one ancestor node; and further comprising replacing each of the unique classification values with the at least one ancestor node.

31. A method according to claim 24 wherein each of the unique classification values includes a class value and a subclass value; and further comprising replacing each of the unique classification values with the class value.

32. A method according to claim 30 further comprising grouping each of the unique classification values by adding the scores of each of the unique classification values after being replaced; sorting the grouped unique classification values according to the scores; and displaying the sorted grouped unique classification values on the table.

33. A system according to claim 8 wherein each of the unique classification values are grouped by adding the scores of each of the unique classification values after being replaced; wherein the grouped unique classification values are sorted according to the scores; and wherein the sorted grouped unique classification values are displayed on the table.

34. A method according to claim 20 further comprising grouping each of the unique classification values by adding the scores of each of the unique classification values after being replaced; sorting the grouped unique classification values according to the scores; and displaying the sorted grouped unique classification values on the table.

35. A method according to claim 31 further comprising grouping each of the unique classification values by adding the scores of each of the unique classification values after being replaced; sorting the grouped unique classification values according to the scores; and displaying the sorted grouped unique classification values on the table.