WO2002021313A2

WO2002021313A2 - Unsupervised method of identifying aberrant behavior by an entity with respect to healthcare claim transactions and associated computer software program product, computer device, and system

Info

Publication number: WO2002021313A2
Application number: PCT/US2001/027516
Authority: WO
Inventors: Samir Ibrahim Abed; Michael Yvan Wallace; Aaron Michael Seib; David Stanton Whipple; Paul A. Dubose
Original assignee: Bloodhound Software, Inc.
Priority date: 2000-09-05
Filing date: 2001-09-05
Publication date: 2002-03-14
Also published as: AU2001287082A1; WO2002021313A3

Abstract

An unsupervised method of identifying aberrant behavior by an entity with respect to healthcare claim transactions is provided. A coordinate space is defined with a dimensionality corresponding to a number of selected ranked variables defining a representative behavior of a peer group with respect to a reference source of claim transactions, the peer group comprising a plurality of entities. A central tendency of the representative behavior is then determined from the claim transactions of the reference source, as a result of the selected ranked variables being applied thereto, and with respect to the coordinate space. The selected ranked variables are then applied to a new source of claim transactions for entities corresponding to the peer group so as to define a corresponding element for each entity. After mapping the elements with respect to the coordinate space, a score is then determined for each element with respect to the central tendency. A threshold criteria is then applied to the scores such that a threshold-exceeding score indicates an aberrant behavior by the corresponding entity. An associated computer software program product, computer device, and system are also provided.

Description

UNSUPERVISED METHOD OF IDENTIFYING ABERRANT BEHAVIOR BY AN ENTITY WITH RESPECT TO HEALTHCARE CLAIM TRANSACTIONS AND ASSOCIATED COMPUTER SOFTWARE PROGRAM PRODUCT, COMPUTER DEVICE, AND SYSTEM

FIELD OF THE INVENTION

The present invention relates to the identification of suspicious and/or unusual behavior by an entity with respect to healthcare claim transactions and, more particularly, to an unsupervised method of identifying aberrant behavior by patients, healthcare providers, pharmacists, or other entities with respect to healthcare or pharmacy claim transactions submitted thereby, along with an associated computer software program product, computer device, and system.

BACKGROUND OF THE INVENTION

The GAO estimates that 10% of healthcare dollars spent fall in the overall category of errors, abuse, and fraud. Since healthcare costs typically represent a large part of the national economy, now 14% and still growing, there is a significant need for a process capable of minimizing or solving this problem. Note that, as used herein, the term "healthcare claim" is used for illustrative purposes only, wherein the techniques and principles discussed herein with respect to healthcare claims may be similarly applicable to a healthcare claim, a prescription claim, or groups of such claims, wherein the claims may be organized by entity, an entity comprising, for example, a patient, a healthcare provider, a group of healthcare providers, a pharmacist, a group of pharmacists, an institution, or other individual or organization acting on behalf of patients. However, though fraudulent or abusive healthcare claims represent a significant problem, there are relatively few cases where a single claim or a group of claims can be definitively classified as being fraudulent or abusive. Note that, for brevity, the term "fraudulent" will be used herein in reference to both fraudulent and abusive healthcare claims. Even so, if a claim appears to be fraudulent, the individual associated with creating that claim can reasonably state and justify, for instance, that a clerical error caused the problem. Also, because of the associated legal ramifications, the words "fraud" and "abuse" must typically be used with care. As such, when investigators become concerned with a patient, healthcare provider or pharmacist due to questionable claims, the particular individual or group is generally approached with various suggestions for resolving the associated paid claims. For instance, the particular individual or group may be asked for reimbursement, requested to adjust applicable fees, or faced with payment withholding on future claims.
Generally, legal action is only taken in extreme cases and these cases are often settled out of court. Further, only in those particular cases where legal judgements are rendered can it be stated that fraud and abuse has occurred. Accordingly, even if fraud and abuse cases and the associated healthcare claims were compiled into a database, the database would be small in size, with many types of potential fraud not even being represented therein. Thus, it is currently not practical to apply predictive modeling technology or supervised methods to detect abnormal or aberrant behavior by an entity that represents fraud and abuse. Reviewers of healthcare or prescription claims have a number of ad hoc approaches that are used to detect suspicious or unusual entities from associated claims. One approach is to rely on and investigate tips of potential fraud and abuse, wherein the subject claims submitted by an entity are reviewed and then followed up with a field investigation. Other approaches rely on the analysis of claims databases to find suspicious cases, wherein the particular suspicious cases are then followed up with a field investigation. More particularly, at the individual claim level, a number of logic-based systems are used to review individual claims. For example, HBOC's CLAIM CHECK and IDX's GMIS are available products that use expert rules to identify claim submission and data entry errors. However, since these systems apply predetermined logical rules, only aberrant claims consistent with the predetermined identification factors will be detected. While helpful and important in identifying cases of fraud and abuse, such systems are not solely sufficient to resolve the general problem.

Often an individual claim may appear legitimate, and a problematic trend becomes apparent only when a group of claims defining a behavior is analyzed. For example, if an expensive medical procedure is utilized by one entity such as, for example, a provider, at a much higher rate than his/her peers, then a problem may exist, although this trend cannot be detected from a single claim. Another problem with hard-coding specific logic is that new loopholes not conforming to the predetermined logical rules are continually being discovered by creative providers, who may then share these loopholes with other providers. Thus, at the healthcare provider level, databases are sometimes used to aggregate values across claims, wherein these values are used to compare to similar healthcare providers. However, such comparisons are often performed manually and using a single variable per analysis. While helpful, the results of a single variable analysis may be quite misleading, particularly, in some instances, when the analysis variables have not been transformed to normalize the distributions of the examined data. In addition, such a process is based on a manual search and lacks a standard criteria by which to identify claims requiring further scrutiny. Also, such a process depends on the identification of appropriately similar healthcare providers. However, in reality, there are many types of specialties and each provider has different types of patients exhibiting varying parameters such as, for example, the severity of a particular illness. Accordingly, unless an overly generalized criteria is used in examining the claims, similarity matches often yield either suspect results or few similar providers.

Thus, there exists a need for an improved method for identifying suspicious, unusual, or otherwise aberrant entities or such behavior by an entity with respect to healthcare claims. Such a method should desirably involve automatic scanning of a database so as to identify claims or group of claims being of most interest. Relevant information with respect to a reason why the particular claim case was identified as being of interest would also be provided. In addition, it would be desirable for such a method to be capable of automatically identifying and applying an appropriate reference basis on which to compare entities, preferably without generating or resorting to a multitude of special cases and without creating very small subsets of entities, since such cases typically are not able to provide meaningful statistical significance between normal patterns and potentially fraudulent cases. One particularly advantageous aspect of such a method would be to facilitate the identification of potentially problematic behavior before large economic impact has occurred. Legal expenses for prosecuting fraud and abuse often mitigate the effectiveness of prosecution. Instead, a preferred approach is an automated method that scrutinizes many entities, detects potential problems commensurately with their occurrence, appropriately identifies and indicates potential problems, and provides readily interpretable and understood reasons and other necessary information such that proper corrective action can be taken.

SUMMARY OF THE INVENTION

The above and other needs are met by the present invention which, in one embodiment, provides an unsupervised method of identifying aberrant behavior by an entity with respect to healthcare claim transactions. A coordinate space is defined with a dimensionality corresponding to a number of selected ranked variables defining a representative behavior of a peer group with respect to a reference source of claim transactions, wherein the peer group comprises a plurality of entities. A central tendency of the representative behavior is then determined from the claim transactions of the reference source, as a result of the selected ranked variables being applied thereto, and with respect to the coordinate space. The selected ranked variables are then applied to a new source of claim transactions for the peer group so as to define a corresponding element for each entity. After mapping the elements with respect to the coordinate space, a score is then determined for each element with respect to the central tendency. A threshold criteria is then applied to the scores such that a threshold-exceeding score indicates an aberrant behavior by the corresponding entity within the peer group with respect to the new source of claim transactions. An associated computer software program product, computer device, and system are also provided. Accordingly, as indicated, the present invention may be accomplished through software, hardware, or a combination of software and hardware, as will be appreciated by one skilled in the art.

Alternatively stated, the present invention provides an automated method for detecting suspicious or otherwise unusual behavior of an entity within a peer group with respect to claim transactions using a form of multivariate outlier detection, also known as multivariate distribution characterization. Healthcare or prescription claims from entities such as patients, healthcare providers or pharmacists are evaluated so as to identify fraudulent and/or abusive behavior by an entity based on related historical multivariate statistical distributions. Such an evaluation of healthcare or prescription claims, based on historical statistical distributions, facilitates the presentation of a score and/or a probability for a particular entity, the score indicating whether or not the claims submitted by that particular entity comprise typical behavior among a peer group of such entities. The score, probability and/or the values or reasons for the results of the evaluation of the particular entity may then be manually evaluated by, for example, a human decision-maker involved in analyzing the claim transactions. Alternatively, the scores and/or probabilities for the respective entities within a peer group may be automatically monitored and the decision-maker alerted when a score and/or probability exceeds a predetermined threshold value. In addition, the historical statistical distributions may be continuously or otherwise periodically updated based upon the evaluation of new claim transactions for the peer group or updated when the historical statistical distributions have undergone significant change as a result of the evaluation of further claim transactions.
In this manner, the described method is capable of effective operation without requiring or without being based upon previous particular examples of fraudulent or suspicious claim transactions or corresponding behavior by an entity within the peer group, which would be required for a predictive modeling system such as a predictive neural network, and therefore provides an effective unsupervised method for identifying the described aberrant behavior by an entity with respect to healthcare claim transactions.
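The scoring step summarized above can be illustrated with a minimal sketch. It assumes, purely for illustration, that the central tendency is the per-metric mean of the reference elements and that the score is a standardized Euclidean distance from that center; the text does not fix either choice. The function names, entity identifiers, data values, and threshold are all hypothetical.

```python
import math
import statistics

def fit_reference(reference):
    """Per-metric mean and sample standard deviation of the reference elements."""
    columns = list(zip(*reference))
    center = [statistics.mean(col) for col in columns]
    scale = [statistics.stdev(col) for col in columns]
    return center, scale

def outlier_score(element, center, scale):
    """Standardized Euclidean distance of one element from the center (assumed score)."""
    return math.sqrt(sum(((x - c) / s) ** 2 for x, c, s in zip(element, center, scale)))

def flag_aberrant(scores, threshold):
    """Map each entity to True where its score exceeds the threshold."""
    return {entity: score > threshold for entity, score in scores.items()}

# Hypothetical reference elements: one row per peer-group entity,
# one column per selected metric (a two-dimensional coordinate space).
reference = [
    [1.0, 0.50], [1.2, 0.55], [0.9, 0.45],
    [1.1, 0.52], [1.0, 0.48], [0.8, 0.50],
]
center, scale = fit_reference(reference)

# Hypothetical elements derived from a new source of claim transactions.
new_elements = {"provider_A": [1.05, 0.51], "provider_B": [2.40, 0.90]}
scores = {e: outlier_score(v, center, scale) for e, v in new_elements.items()}
flags = flag_aberrant(scores, threshold=3.0)
```

In this sketch `provider_A` lies close to the center while `provider_B` lies far from it in both dimensions, so only the latter exceeds the assumed threshold. A production system would likely need a distance measure that also accounts for correlation between metrics.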

Thus, embodiments of the present invention provide an improved method for identifying suspicious, unusual, or otherwise aberrant behavior by an entity with respect to healthcare claims by providing for automatic scanning of an appropriate database of healthcare claim transactions so as to identify entities of most interest within a peer group. Relevant information with respect to a reason why the particular entity was identified as being of interest is also provided as a part of the results. Embodiments of the present invention are also capable of automatically identifying and applying an appropriate basis on which to compare such entities, without generating or resorting to a multitude of special or previously identified cases and without having to create very small subsets of specific entities. Accordingly, the present invention facilitates the identification of potentially problematic behavior by an entity before large economic impact has occurred, since the system is capable of scrutinizing many peer groups and entities within those peer groups, detecting potential problems commensurately with their occurrence, appropriately identifying and indicating potential problem entities, and providing readily interpretable and understood reasons and other necessary information such that proper corrective action can be taken.

BRIEF DESCRIPTION OF THE DRAWINGS

Having thus described the invention in general terms, reference will now be made to the accompanying drawings, which are not necessarily drawn to scale, and wherein:

FIG. 1 is a block diagram of an implementation of an unsupervised method of identifying aberrant behavior by an entity with respect to healthcare claim transactions according to one embodiment of the present invention;

FIG. 2 is a flowchart illustrating the function of an unsupervised method of identifying aberrant behavior by an entity with respect to healthcare claim transactions according to one embodiment of the present invention;

FIG. 3 is a block diagram showing a system architecture corresponding to an unsupervised method of identifying aberrant behavior by an entity with respect to healthcare claim transactions according to one embodiment of the present invention;

FIG. 4 is a flowchart showing a process for statistically analyzing historical claims and computing behavioral metrics according to one embodiment of the present invention;

FIG. 5 is a diagram showing a process of determining a non-linear transformation when analyzing a behavioral metric according to one embodiment of the present invention;

FIG. 6 is a report showing an example of non-linear transformation parameters for a behavioral metric according to one embodiment of the present invention;

FIG. 7 is a flowchart showing a process of selecting a compact set of best behavioral metrics according to one embodiment of the present invention;

FIG. 8 is a report showing a list of best groups of transformed behavioral metrics according to one embodiment of the present invention;

FIG. 9 is a diagram showing the generation of a configuration file for storing the information required for calculating behavioral metrics and computing proximities for detecting multivariate outliers according to one embodiment of the present invention;

FIG. 10 is a schematic representation of the application of a statistical characterization of a database of historical claim transactions to new or current claim transactions according to one embodiment of the present invention; and

FIG. 11 is an example output showing detected unusual activity by an entity with respect to healthcare claim transactions according to one embodiment of the present invention.

DETAILED DESCRIPTION OF THE INVENTION

The present invention now will be described more fully hereinafter with reference to the accompanying drawings, in which preferred embodiments of the invention are shown. This invention may, however, be embodied in many different forms and should not be construed as limited to the embodiments set forth herein; rather, these embodiments are provided so that this disclosure will be thorough and complete, and will fully convey the scope of the invention to those skilled in the art. Like numbers refer to like elements throughout.

FIG. 1 shows a block diagram of an example implementation of a method according to the present invention in the form of a computer device, generally indicated by the numeral 100. A central processing unit ("CPU") 103 runs or otherwise executes instructions stored in a computer program storage module 104, whereby the CPU 103 is directed to perform various functions as described herein. In the various embodiments of the present invention, executable software programs stored in the program storage module 104 may be written in several programming languages, including, for example, Oracle, Microsoft SQL, Java, and C++, and may be executed via a variety of conventional computer hardware as will be appreciated by one skilled in the art. For example, the CPU 103 may comprise an Intel PENTIUM microprocessor operating with a Microsoft or Linux operating system. The CPU 103 of the computer device 100 may receive claim information from a claim information source 101 through a data network 102 connected therebetween. The claim information source 101 may comprise, for example, an insurance company, a healthcare organization, or a pharmacy benefit management group, who may, in turn, obtain information on the corresponding claim transactions from healthcare providers or pharmacists (not shown). In some instances, however, the claim information source 101 may comprise healthcare providers or pharmacists, wherein, in some of those instances, the computer device 100 may be operated by the appropriate insurance company, healthcare organization, or pharmacy benefit management group. Typically, one or more software programs instruct the CPU 103 to store claim transaction data obtained from the claim information source 101 in a data storage module 105. The computer device 100 also includes a random access memory ("RAM") module 106, which is implemented as a workspace as will be appreciated by one skilled in the art.
Accordingly, the CPU 103, the data storage module 105, and the program storage module 104 are capable of cooperating so as to implement a method for detecting suspicious or unusual behavior of an entity with respect to healthcare claim transactions, according to the present invention. After the claim information has been processed, as described further herein, into the differences between particular claims and "typical" historical claims, an appropriate signal indicative of the unusual or suspicious behavior of an entity determined from a particular outlying claim or claims is sent from the CPU 103 to an output device 108, the described elements thus comprising a system 107 according to one embodiment of the present invention. The output device 108 may comprise, for example, a monitor for displaying the results, a printing device for printing the results, an Internet web site accessible via a web browser, or a disk storage device for storing the results in a file or database for further use. FIG. 2 is a flowchart illustrating the general function of the system 107, according to one embodiment of the present invention. Initially, a plurality of historical claim transactions from the claim information source 101 or the data storage module 105 is statistically analyzed to compute appropriate behavioral metrics (Block 201). In healthcare or pharmacy claims, such metrics typically indicate or otherwise provide relevant information with respect to the examined historical claims of a peer group. Accordingly, such metrics may comprise data resulting from variables or analyzed parameters being applied to the one or more claim transactions in a database of claim transactions. In some instances, a metric may comprise a ratio between parameters or may otherwise represent a relationship between a plurality of parameters for a peer group. The behavioral metrics are then stored in a configuration file (Block 202) in, for instance, the data storage module 105. 
Once the appropriate metrics are determined, as described in further detail below, current or new claim transactions for the peer group can be analyzed. In some instances, the current claim transactions may be analyzed per entity. Accordingly, current claim transaction data is obtained, for example, from the claim information source 101 and then analyzed so as to determine the corresponding behavioral metrics (Block 203) for each entity. The metrics of the current claims for each entity are then compared to the previously analyzed historical metrics (Block 204) for that peer group. Based on this comparison, the system 107 determines an outlier score and associated reason for particularly prominent behavioral metrics for an entity with respect to the current claim transactions, wherein the results are subsequently output (Block 205) to, for example, a user, a web site, or an appropriate file or database. FIG. 3 illustrates an architecture of a system 107 according to one embodiment of the present invention. In this instance, the system 107 additionally includes, within the described elements or as separate individual elements, a statistical analysis module 301 and a configuration file module 302. The statistical analysis module 301 is configured to analyze the historical claim transactions 303 for a peer group received from the claim information source 101 and/or the data storage module 105 so as to develop the corresponding behavioral metrics from the data within the claim transactions. More particularly, the statistical analysis module 301 analyzes the statistical distributions of the historical claims for the peer group to determine the appropriate non-linear transformations for computing the behavioral metrics. 
For example, outlier trimming and exponential powers may be used to find the non-linear transforms that best improve the distribution symmetry in the behavioral metrics determined from the historical claim transactions, as will be appreciated by one skilled in the art, though it will be understood that other transformations may be applicable to achieve similar results. The metrics are then ranked using, for example, a combination of domain knowledge and statistical techniques, so as to determine the "best" metrics for representing the behavior of the peer group based on the analysis of the historical claim transactions, before a portion or subset of the ranked metrics is selected to form a compact set of selected ranked metrics providing an appropriate representation of a normal behavior of the peer group. For example, a genetic algorithm may be used to find combinations of behavioral metrics with minimal pair-wise correlation value, as will be appreciated by one skilled in the art, though it will be understood that other search algorithms and/or fitness criteria may be used to determine an appropriate subset of metrics capable of achieving similar results. The configuration file module 302 may be configured to perform multiple functions such as, for example, performing a repository function for the results of the statistical analysis executed by the statistical analysis module 301. In addition, the configuration file module 302 may be configured to receive information on current or new claims 304 for an entity or peer group corresponding to the peer group of the examined historical claim transactions. In such instances, the configuration file module 302 analyzes the current claims for the entity or peer group, according to the statistical analysis parameters previously received from the statistical analysis module 301, so as to produce corresponding current behavioral metrics 305.
Upon completion of the analysis of the current claim transactions to produce the current behavioral metrics, appropriate statistical parameters and associated information are made available to the other elements of the system 107. As previously described, the system 107 thereafter determines an appropriate signal for prominent entities based on the current behavioral metrics 305, wherein the respective signals are then indicated as an output 306 and/or provided to a database 307.
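As noted above, a behavioral metric may comprise a ratio between parameters computed per entity. The following minimal sketch shows one hypothetical ratio metric: the average number of line items per claim for each entity, divided by the peer-group average. The claim record layout (`entity`, `line_items`) and the function names are assumptions made for illustration only, not a format given in the text.

```python
from collections import defaultdict

def line_items_per_claim(claims):
    """Average number of line items per claim for each entity."""
    totals = defaultdict(lambda: [0, 0])  # entity -> [total line items, claim count]
    for claim in claims:
        totals[claim["entity"]][0] += claim["line_items"]
        totals[claim["entity"]][1] += 1
    return {entity: items / count for entity, (items, count) in totals.items()}

def ratio_to_peer_average(per_entity):
    """Each entity's value divided by the mean value over the peer group."""
    peer_average = sum(per_entity.values()) / len(per_entity)
    return {entity: value / peer_average for entity, value in per_entity.items()}

# Hypothetical claim transactions for a three-provider peer group.
claims = [
    {"entity": "P1", "line_items": 2},
    {"entity": "P1", "line_items": 4},
    {"entity": "P2", "line_items": 3},
    {"entity": "P3", "line_items": 9},
]
per_entity = line_items_per_claim(claims)   # P1: 3.0, P2: 3.0, P3: 9.0
metric = ratio_to_peer_average(per_entity)  # peer average is 5.0
```

Here provider `P3` bills 1.8 times the peer-group average number of line items per claim, while `P1` and `P2` each sit at 0.6 times the average.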

FIG. 4 illustrates a process for creating historical behavioral metrics from the historical claim transaction data according to one embodiment of the present invention. Initially, appropriate data is extracted from historical claim transactions (Block 401), wherein the historical claim transactions may be stored in an appropriate claim database. Typically, each claim transaction has the same number of attributes, each comprising the same type of data, and such claim transactions can be grouped according to the submitting healthcare entity such as a provider, group of healthcare providers, patient, group of patients, pharmacist, or group of pharmacists. A number of statistics are then computed (Block 402) with respect to both the submitting group and the overall population of historical claim transactions being examined. The resulting statistics are then used to determine a set of initial behavioral metrics (Block 403). In some instances, some behavioral metrics compare the individuals or entities within a peer group to the overall peer group through, for example, ratios of total line items per claim transaction for a particular provider compared to the average provider within that group. Once the initial behavioral metrics are established, optimal parameters for transforming these metrics are determined (Block 404), such a transformation facilitating the examination of the corresponding statistical distributions of the metrics. FIG. 5 illustrates a process of determining the optimal parameters for transforming the metrics according to one embodiment of the present invention. This process is important since some metrics may comprise such extreme outlying values that, without the transformation, less extreme, but nonetheless significant, outlying values may become hidden, thereby allowing possible fraudulent or abusive entities to escape detection. 
Accordingly, in embodiments of the present invention, outlier trimming and exponential power techniques are used to optimize the resulting non-linear transforms so as to improve the statistical distribution symmetry of the data comprising a metric. Generally, a highly skewed or asymmetric statistical distribution may include a small number of cases with extremely unusual values (Block 501). Though there exist a number of techniques for identifying statistically outlying values, embodiments of the present invention implement an iterative loop that generally identifies and removes one or more of the most extreme cases from a distribution and then magnifies the remaining portion of that distribution (Block 502) for further examination. According to such an outlier trimming or clipping process, a new minimum and maximum value may be established for each metric at each iteration. Values below the new minimum or above the new maximum may then be clipped when computing, for example, correlation values or standard deviations for the data. However, when computing the proximity of a value with respect to the corresponding population center, the unclipped value may, in some instances, be a more appropriate representation of the data.
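The iterative trimming loop described above might be sketched as follows. The stopping rule (a target absolute skewness and a cap on the number of trimmed values) and the choice to remove whichever end lies farther from the median are assumptions made for this illustration; the text does not specify them. All names are hypothetical.

```python
import statistics

def skewness(values):
    """Population skewness: third central moment over the cubed standard deviation."""
    mean = statistics.fmean(values)
    sd = statistics.pstdev(values)
    if sd == 0:
        return 0.0
    return sum((v - mean) ** 3 for v in values) / (len(values) * sd ** 3)

def trim_bounds(values, max_trim, target_skew=1.0):
    """Iteratively drop the most extreme value until |skewness| <= target_skew
    or max_trim values have been dropped; return the new (minimum, maximum)."""
    kept = sorted(values)
    for _ in range(max_trim):
        if abs(skewness(kept)) <= target_skew:
            break
        median = statistics.median(kept)
        # Remove whichever end of the distribution lies farther from the median.
        if kept[-1] - median >= median - kept[0]:
            kept.pop()
        else:
            kept.pop(0)
    return kept[0], kept[-1]

def clip(values, lo, hi):
    """Clip values to [lo, hi], e.g. before computing correlations or deviations."""
    return [min(max(v, lo), hi) for v in values]

# Hypothetical metric values with a single extreme case.
values = [1, 2, 2, 3, 3, 3, 4, 4, 5, 120]
lo, hi = trim_bounds(values, max_trim=3)
clipped = clip(values, lo, hi)
```

With this data a single iteration removes the value 120, the remaining values are symmetric, and the new bounds become 1 and 5. The unclipped `values` list is left untouched so that it remains available for proximity calculations.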

The distribution symmetry of the data may then be further improved by other non-linear transformations. In order to determine the fitness of the data, a skewness statistic, which measures the degree of asymmetry, is used as the fitness function for determining the optimal distribution for the particular metric. A skewness value of zero represents a perfectly symmetric distribution, and thus a normal distribution will have a skewness value of zero. Many statistical methodologies generally rely on a distribution being an approximately normal distribution, including methodologies estimating probability values for data points based on proximity to the population mean. Accordingly, embodiments of the present invention apply a search algorithm to the data so as to determine an optimal exponential transform for improving the distribution symmetry to be closer to a "normal" distribution (Block 503). Thus, according to one embodiment, an algorithm is applied to the data using an appropriate exponential value so as to provide a distribution having a skewness value close to zero for the particular metric.
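As a rough illustration of using skewness as the fitness function, the sketch below performs a simple grid search over candidate exponents and keeps the one whose transformed values have skewness closest to zero. The grid search, the candidate range, and the sample data are assumptions; the text only states that a search algorithm is applied. The sketch also assumes non-negative metric values.

```python
import statistics

def skewness(values):
    """Population skewness: third central moment over the cubed standard deviation."""
    mean = statistics.fmean(values)
    sd = statistics.pstdev(values)
    if sd == 0:
        return 0.0
    return sum((v - mean) ** 3 for v in values) / (len(values) * sd ** 3)

def best_exponent(values, candidates):
    """Return (exponent, skewness) minimizing |skewness| of values ** exponent."""
    best = None
    for p in candidates:
        transformed = [v ** p for v in values]
        s = skewness(transformed)
        if best is None or abs(s) < abs(best[1]):
            best = (p, s)
    return best

# Hypothetical candidate exponents 0.05, 0.10, ..., 1.00.
candidates = [round(0.05 * k, 2) for k in range(1, 21)]

# Hypothetical right-skewed metric values (after any clipping).
values = [1, 1, 2, 2, 3, 4, 6, 9, 15, 40]
exponent, skew = best_exponent(values, candidates)
```

For right-skewed data such as this, exponents below 1 compress the upper tail, so the selected exponent yields a distribution noticeably more symmetric than the raw values.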

FIG. 6 illustrates a portion of a report that demonstrates the generation of optimal parameters for transforming behavioral metrics according to one embodiment of the present invention. For example, the report indicates that there are 42 behavioral metrics in the particular file, with 18,015 records processed (601). For each behavioral metric processed, starting with metric 0 (602) or the first of the 42 metrics, the number of valid records (17,987) within the population is determined, along with the best exponent (0.05) for transforming the metric. According to the analysis, the maximum value for the valid records of metric 0 is 123.73 (603), but by clipping only records corresponding to the largest 23 values (607), the new maximum value becomes about 22.276 (603). Thus, clipping, in this instance, lowers the standard deviation of the metric from about 13.199 to about 6.299 (606) and lowers the skewness of the metric from about 6.944 to about 1.962 (604). Further, when the metric is raised to an exponent of 0.05 (602), the skewness drops further to about 1.194 (605).

FIG. 7 illustrates a process for selecting a compact optimal set of behavioral metrics for representing the historical claim transactions for a peer group according to one embodiment of the present invention. Such a process is important since, at this point, it is not clear which behavioral metrics most appropriately represent the behavior of the peer group with respect to the database of historical claim transactions. Generally, the total number of identified behavioral metrics (42 behavioral metrics in the above example) may often result in a lengthy analysis, possibly with redundant or irrelevant results. For example, a correlation matrix of the metrics may likely reveal that a plurality of metrics are highly correlated, meaning that such metrics represent redundant information and may thus bias the population representation. Further, the time required to analyze proximities, for instance, may increase by a factor proportional to the square of the number of variables involved. Accordingly, an efficient computer-implemented process desirably uses as few metrics as possible. However, using too few variables may undesirably result in an incomplete representation of the population. Thus, the transformed behavioral metrics (Block 701) are input to, for example, a statistical routine that computes the corresponding correlation matrix (Block 702). The correlation matrix (Block 702) is used to facilitate the selection of an appropriate set of metrics by indicating the metrics having the smallest maximum pair-wise correlation values. That is, the appropriate set of metrics desirably comprises selected metrics having the maximum independence therebetween. Such a statistical routine may pose a very difficult computational task, since determining the number of metrics is, for example, a factorial computation. More particularly, if there are N metrics and K metrics out of the N metrics are to be selected, then N! / (K! * (N-K)!) combinations of metrics must be analyzed.
To allow such a computation within a reasonable time period, embodiments of the present invention employ a search algorithm which uses a technique for quickly determining the optimal, or as close to the optimal as possible, combination of metrics without examining all possible combinations. More particularly, a genetic algorithm (Block 703) is employed to analyze the data so as to determine the best combinations of metrics, wherein such a search may result in, for example, a list (Block 704) of the best 2 variables, best 3 variables, and so on for the best M variables, each with a corresponding characterization of the fitness or appropriateness of each of the groups of variables. In further detail, based on the submitted data, the genetic algorithm implements a fitness equation to ascertain whether or not a certain combination of variables is appropriate for continued analysis. The eventual selection of the best M variables is accomplished by selecting the best group of variables that, on average, are more fit than other groups of variables of the same order, as indicated by the fitness equation. Further, a group of variables may produce a new generation of offspring variables. In such an instance, a crossover procedure may be performed on the resultant offspring variables, depending on the probability of crossover. Generally, crossover is a random exchange of attributes defined by individual variables from a superset of two parent variable groups. Once production of offspring variables is concluded, mutation and insertion of the offspring variables into the population of variables may be performed, wherein the number of generations comprises a parameter of the genetic algorithm. Thus, according to embodiments of the present invention, a skilled human analyst, for example, then reviews the list (Block 704) and selects the "best" group of metrics (Block 705).
Note, however, that the selection of the best group of metrics may, in some instances, comprise an automated process which may be implemented in software, hardware, or a combination of software and hardware within the spirit and scope of the present invention, as will be appreciated by one skilled in the art.

FIG. 8 illustrates a report including an example list of groups of variables resulting from the above-described analysis and presented for selection, according to one embodiment of the present invention. As shown, the best three metrics (801) are indicated as a group of three variables, with the particular name of each of the three variables also being indicated (802). The correlation statistics for these three variables are shown (803), wherein the absolute value of the largest pair-wise correlation between the variables is about 0.067 (803 and 804). The best four metrics are also shown (805), with the corresponding correlation matrix (806) showing the maximum absolute correlation increasing to a value of about 0.288 (807). The best group of five metrics (808) provides a corresponding correlation matrix with a maximum absolute correlation value of about 0.766 (809). As shown, the addition of a fifth metric, which produces a best group of five metrics, does not improve the maximum absolute correlation value over the best group of four metrics.

Accordingly, the best group of four metrics may be appropriate for representing the analyzed population, in this instance. Thus, in choosing the best group of metrics, consideration is given to the group having the highest number of variables with as low a maximum absolute correlation value as possible. A higher maximum absolute correlation value for a group of variables indicates less independence between at least two variables within the group.
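
The subset search described for FIG. 7 can be sketched as a small genetic algorithm. The fitness equation itself is not given in the description, so this sketch assumes that fitness is the negated largest absolute pair-wise correlation within a group. Crossover draws a child at random from the superset of two parents, as described above; the mutation rule, the elitist selection, and every parameter value and function name are hypothetical.

```python
import numpy as np

def fitness(corr, subset):
    """Negated largest absolute pair-wise correlation in the subset (higher is fitter)."""
    idx = np.array(sorted(subset))
    block = np.abs(corr[np.ix_(idx, idx)])
    np.fill_diagonal(block, 0.0)  # ignore each metric's correlation with itself
    return -float(block.max())

def crossover(rng, parent_a, parent_b, k):
    """Draw k distinct metrics at random from the superset of two parent groups."""
    pool = sorted(parent_a | parent_b)
    return frozenset(rng.choice(pool, size=k, replace=False).tolist())

def mutate(rng, subset, n_metrics, rate):
    """With probability `rate`, swap one selected metric for an unselected one."""
    outside = [m for m in range(n_metrics) if m not in subset]
    if not outside or rng.random() >= rate:
        return subset
    inside = sorted(subset)
    drop = inside[rng.integers(len(inside))]
    add = outside[rng.integers(len(outside))]
    return frozenset((subset - {drop}) | {add})

def select_metrics(corr, k, pop_size=40, generations=60, p_cross=0.8, p_mut=0.2, seed=0):
    """Search for a k-metric group with a low maximum absolute pair-wise correlation."""
    rng = np.random.default_rng(seed)
    n = corr.shape[0]
    population = [frozenset(rng.choice(n, size=k, replace=False).tolist())
                  for _ in range(pop_size)]
    for _ in range(generations):
        ranked = sorted(population, key=lambda s: fitness(corr, s), reverse=True)
        parents = ranked[: pop_size // 2]  # the fitter half survives unchanged
        children = []
        while len(children) < pop_size - len(parents):
            a = parents[rng.integers(len(parents))]
            b = parents[rng.integers(len(parents))]
            child = crossover(rng, a, b, k) if rng.random() < p_cross else a
            children.append(mutate(rng, child, n, p_mut))
        population = parents + children
    best = max(population, key=lambda s: fitness(corr, s))
    return sorted(best), -fitness(corr, best)

# Synthetic data: metrics 4..7 are near-duplicates of metrics 0..3.
data_rng = np.random.default_rng(1)
base = data_rng.normal(size=(500, 4))
data = np.hstack([base, base + 0.1 * data_rng.normal(size=(500, 4))])
corr = np.corrcoef(data, rowvar=False)

chosen, max_abs_corr = select_metrics(corr, k=4)
print("selected metrics:", chosen, "max |correlation|:", round(max_abs_corr, 3))
```

Running the search once for each group size from 2 to M would yield a list comparable to the one shown in FIG. 8, from which an analyst could pick the largest group whose maximum absolute correlation remains acceptably low.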

FIG. 9 illustrates the generation of an appropriate configuration file 901 for the selected metrics according to one embodiment of the present invention. The configuration file 901 generally comprises a stored file including the relevant statistics and parameters used to calculate the selected behavioral metrics. Further, the configuration file 901 may also include details of the selected metrics with regard to, for example, the "representative population characteristics" or central tendency thereof for computing proximities of entities with respect to the current claim transaction so as to detect multivariate outliers. More particularly, a list of the claims variables 903 used to determine the selected best group of behavioral metrics 902 is stored in the configuration file 901. In addition, the formulas, parameters, or other information used to create the pre-transformed metrics 904 may be included in the configuration file 901, along with the parameters used to compute the non-linear transformations 905, such as the minimum and maximum clipping values and the selected exponential power.
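
One possible layout for such a configuration file is sketched below. The description does not specify a file format, so JSON is assumed here, and all class names, field names, metric names, and formulas are hypothetical. The clipping maximum and exponent echo the FIG. 6 example; the remaining values are placeholders.

```python
import json
from dataclasses import dataclass, asdict, field

@dataclass
class MetricConfig:
    """Parameters needed to recompute one selected behavioral metric (hypothetical layout)."""
    name: str
    claim_variables: list   # claims variables used to build the metric (903)
    formula: str            # how the pre-transformed metric is created (904)
    clip_min: float         # non-linear transformation parameters (905)
    clip_max: float
    exponent: float
    center: float           # central tendency of the transformed metric
    spread: float

@dataclass
class PeerGroupConfig:
    """Configuration file contents for one peer group (901)."""
    peer_group: str
    metrics: list = field(default_factory=list)

    def to_json(self):
        return json.dumps(asdict(self), indent=2)

    @classmethod
    def from_json(cls, text):
        raw = json.loads(text)
        return cls(peer_group=raw["peer_group"],
                   metrics=[MetricConfig(**m) for m in raw["metrics"]])

# Placeholder values for illustration only.
config = PeerGroupConfig(
    peer_group="example_peer_group",
    metrics=[MetricConfig(
        name="cost_per_claim",
        claim_variables=["total_cost", "claim_count"],
        formula="total_cost / claim_count",
        clip_min=0.0, clip_max=22.276, exponent=0.05,
        center=1.1, spread=0.04,
    )],
)
restored = PeerGroupConfig.from_json(config.to_json())
```

Storing the transformation parameters alongside the metric definitions lets the same clipping limits and exponent be reapplied unchanged to a new claim transaction source.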

FIG. 10 illustrates a process of analyzing new or current claims for entities corresponding to the peer group, once the peer group has been characterized from the historical claim transactions and the appropriate metrics or variables chosen, according to one embodiment of the present invention. Once the group of best metrics is selected, a coordinate space corresponding in dimensionality to the number of metrics within the selected group may be established. More generally, for a group of M selected metrics, the corresponding coordinate space is defined as having M dimensions. For example, if a group of five metrics is selected, the corresponding coordinate space is defined in five dimensions. However, for the sake of example and for clarity of illustration, a two dimensional coordinate space 1001 is shown which would, in an analysis according to the present invention, correspond to a group of two selected metrics. Once the appropriate coordinate space 1001 has been determined, the statistical distribution of the respective transformed metrics may be established with respect to the coordinate space so as to define "representative population characteristics" or central tendency 1002 or behavior of the peer group to which the behavior of entities within the same peer group and submitting the new or current claims is compared. A source of related new or current claim transactions 1003 for the peer group, to which the outlier detection system is to be applied, is then accessed, whereafter the configuration file 901 corresponding to the central tendency 1002 is then applied to the new claim transaction source 1003. More particularly, the new claim transactions are analyzed according to the same process initially applied to the historical claim transactions so as to prepare a corresponding set of behavioral metrics 1004 for each of the entities represented by the new source of claim transactions. 
The same non-linear transformation parameters 905 are then applied to the new behavioral metrics 1004 so as to complete the necessary transformation of the data within the new claim transaction source 1003. After transformation of data within the source of new claim transactions 1003 to produce the corresponding new behavioral metrics 1004 for the corresponding entities, the previously selected group of variables 902 is applied to the new behavioral metrics so as to determine a corresponding data point or element 1005 for each respective entity represented within the new source of claim transactions. The elements 1005 are then applied or mapped to the defined coordinate space 1001 in relation to the central tendency 1002. For each element 1005, a proximity or distance calculation 1006 is performed between the respective element 1005 and the central tendency 1002. Accordingly, such a proximity or distance calculation 1006 may be translated or otherwise related to a corresponding score 1007 for each respective element 1005. Thereafter, since the scores 1007 for the respective elements 1005 may also vary in magnitude, embodiments of the present invention may provide for the establishment of a threshold criteria 1008 with respect to the determined scores 1007. Such a threshold criteria 1008 may be, in some instances, manually established by a user upon examination of the mapped elements or, in other instances, determined from a factor or other information, or derived from the previously determined parameters, stored in the configuration file 901. Accordingly, elements 1005 exceeding the threshold criteria 1008 may then be designated as suspicious or unusual cases 1009 and marked or otherwise indicated for further investigation. 
As previously discussed, the suspicious or unusual cases 1009 may be indicated by, for example, an aural or visual alarm, displayed on a monitor, forwarded to a printing device, relayed to an Internet web site, or stored to a storage device for later or further processing. Once the new claim transaction source 1003 has been analyzed according to the described methodology, the analyzed new claims may be added to, replace, or otherwise modify the historical claim transactions used to initiate the detection system. Accordingly, the analytical parameters may be iteratively updated or modified so as to provide a continually updated set of metrics with which to analyze other entities corresponding to the peer group and submitting further new claim transactions. Alternatively, the set of metrics may be periodically updated by, for example, time, date, number of processed claims, or other factors. Still further, the effect of the analyzed entity behaviors, determined from the new claim transactions, on the characteristics of the behaviors of the historical peer group, determined from the historical claims transactions, may be monitored and the set of metrics updated accordingly only after the analyzed new entity behaviors have attained a threshold effect. Thus, it will also be appreciated by one skilled in the art that embodiments of the present invention provide a readily adaptable methodology for identifying suspicious, unusual, or otherwise aberrant behaviors of entities with respect to healthcare claim transactions, which may be expediently adjusted to account for changes within, for example, a representative population or a criteria for identifying a suspicious or unusual behavior of an entity.
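
The scoring and thresholding steps of FIG. 10 can be sketched as follows. This is a minimal illustration under stated assumptions: the central tendency 1002 is taken to be the mean vector of the reference elements, the distance calculation 1006 is taken to be a Mahalanobis distance (one common normalized multivariate distance), and the threshold criteria 1008 is taken to be the 99th percentile of the reference scores. The description does not mandate any of these choices, and all function names, entity identifiers, and data are hypothetical.

```python
import numpy as np

def fit_central_tendency(reference):
    """Mean vector and inverse covariance of the reference (historical) elements."""
    reference = np.asarray(reference, dtype=float)
    center = reference.mean(axis=0)
    cov = np.atleast_2d(np.cov(reference, rowvar=False))
    return center, np.linalg.pinv(cov)

def score_elements(elements, center, inv_cov):
    """Mahalanobis distance of each element from the central tendency (assumed score)."""
    diff = np.asarray(elements, dtype=float) - center
    squared = np.einsum("ij,jk,ik->i", diff, inv_cov, diff)
    return np.sqrt(np.maximum(squared, 0.0))

def flag_outliers(entity_ids, scores, threshold):
    """Entities whose score exceeds the threshold are marked for further investigation."""
    return [eid for eid, s in zip(entity_ids, scores) if s > threshold]

# Synthetic two-dimensional example, matching the two-metric illustration of FIG. 10.
rng = np.random.default_rng(0)
reference = rng.normal(size=(1000, 2))           # transformed historical elements
center, inv_cov = fit_central_tendency(reference)

# Assumed threshold: 99th percentile of the reference population's own scores.
threshold = float(np.quantile(score_elements(reference, center, inv_cov), 0.99))

entity_ids = ["P1", "P2", "P3", "P4"]            # hypothetical provider numbers
new_elements = np.array([[0.1, -0.2], [0.5, 0.4], [-0.3, 0.2], [8.0, -8.0]])
scores = score_elements(new_elements, center, inv_cov)
suspicious = flag_outliers(entity_ids, scores, threshold)
print("threshold:", round(threshold, 3), "flagged:", suspicious)
```

Refitting `fit_central_tendency` on a reference set that includes recently analyzed claims is one way the iterative updating described above could be realized.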

FIG. 11 shows an example output from a multivariate outlier detection system according to one embodiment of the present invention. Such a report may include, for example, navigation tabs 1101 for allowing a user to browse through the various results and portions of the reports. In one advantageous embodiment, the report may provide, for outlying entities detected by the system, information on a particular entity 1102, such as a provider, as indicated by, for example, a provider number, the corresponding type of provider submitting the claim 1103, the total cost 1104 submitted by the provider, the total claims 1105 submitted by the provider, the resulting classification status of the provider 1106, and one or more reasons for the indicated classification status of the provider 1107.

Note that embodiments of the present invention do not require the historical entity behaviors to be classified as either fraudulent or not fraudulent for the purposes of the analysis described herein. Accordingly, without a predefined fraud standard, other techniques, such as regression or neural networks, may not be capable of providing a predictive model for analyzing the claim transactions and subsequent behavioral metrics. Further, such a predefined fraud standard may, in some cases, limit the effectiveness of such an analysis by failing to indicate entities which, though not falling within the standard, may nonetheless be exhibiting fraudulent or abusive behavior. Instead, embodiments of the present invention involve preparing a statistical characterization of the behavior of a peer group from a representative population of claim transactions, wherein a proximity measurement is then used to determine whether the behavior of an entity submitting a new claim transaction is unusual, and to what extent, as compared to the representative population or peer group. That is, in statistics terminology, embodiments of the present invention are directed to the detection of statistical outliers, more appropriately termed multivariate outlier detection where multiple metrics are used to determine the proximity measurement.

Note also that the foregoing description illustrates and supports a method for identifying suspicious, unusual, or otherwise aberrant behavior by an entity with respect to healthcare claims. In addition, it will be realized and appreciated by one skilled in the art that such a described method may be computer-implemented, as referenced herein, thereby supporting an associated computer software program product and a computer device configured to implement the described method. In addition, it will also be understood that, in addition to the method, computer software program product, and computer device contemplated by embodiments of the present invention, various systems may also be provided. For example, either or both of the databases used to provide input of historical claim transactions and new claim transactions may be combined with an appropriately configured computer device to form such a system. Further, such a system may also be combined with the various forms of result output, as previously described, to form still further systems in accordance with the spirit and scope of the present invention.

Thus, embodiments of the present invention provide an improved method for identifying suspicious, unusual, or otherwise aberrant behavior by an entity with respect to healthcare claims by providing for automatic scanning of an appropriate database so as to identify entities of most interest and the associated claim transactions. Relevant information with respect to a reason why the particular claim case was identified as being of interest is also provided as a part of the results. Embodiments of the present invention are also capable of automatically identifying and applying an appropriate basis on which to compare entities, without generating or resorting to a multitude of special cases and without creating very small subsets of entities. Accordingly, the present invention facilitates the identification of potentially problematic behavior before large economic impact has occurred, since the system is capable of scrutinizing many entities, detecting potential problems commensurately with their occurrence, appropriately identifying and indicating potential problems, and providing readily interpretable and understood reasons and other necessary information such that proper corrective action can be taken.

Many modifications and other embodiments of the invention will come to mind to one skilled in the art to which this invention pertains having the benefit of the teachings presented in the foregoing descriptions and the associated drawings. Therefore, it is to be understood that the invention is not to be limited to the specific embodiments disclosed and that modifications and other embodiments are intended to be included within the scope of the appended claims. Although specific terms are employed herein, they are used in a generic and descriptive sense only and not for purposes of limitation.

Claims

THAT WHICH IS CLAIMED:

1. An unsupervised method for identifying an aberrant behavior by an entity with respect to healthcare claim transactions, said method comprising: defining a coordinate space having a dimensionality corresponding to a number of selected ranked variables defining a representative behavior of a peer group with respect to a reference source of claim transactions, the peer group comprising a plurality of entities; determining a central tendency of the representative behavior gleaned from the claim transactions of the reference source, as a result of the selected ranked variables being applied thereto, and with respect to the coordinate space; applying the selected ranked variables to a new source of claim transactions for the peer group so as to define a corresponding element for each entity; mapping the elements with respect to the coordinate space; determining a score for each element with respect to the central tendency; and applying a threshold criteria to the scores, the threshold criteria being applied to the scores such that a threshold-exceeding score indicates an aberrant behavior by the corresponding entity within the peer group with respect to the new source of claim transactions.

2. A method according to Claim 1 further comprising compiling a plurality of historical claims so as to form the reference source of claim transactions, before defining the coordinate space.

3. A method according to Claim 2 wherein compiling a plurality of historical claims further comprises saving the plurality of historical claims in a database.

4. A method according to Claim 2 wherein compiling a plurality of historical claims further comprises compiling a plurality of at least one of healthcare claims and pharmacy claims.

5. A method according to Claim 1 further comprising identifying a plurality of variables defining the representative behavior of the peer group with respect to the reference source of claim transactions, before defining the coordinate space.

6. A method according to Claim 5 further comprising ranking the plurality of identified variables.

7. A method according to Claim 6 further comprising selecting a number of the ranked variables.

8. A method according to Claim 1 further comprising applying an outlier trimming procedure to the result of the application of the selected ranked variables to the reference source so as to facilitate an analysis of a corresponding statistical distribution of the result.

9. A method according to Claim 1 further comprising applying a nonlinear transformation to the result of the application of the selected ranked variables to the reference source so as to facilitate an analysis of a corresponding statistical distribution of the result.

10. A method according to Claim 1 wherein determining a score further comprises determining a score corresponding to a proximity measurement, with respect to the coordinate space, between the respective element and the central tendency.

11. A method according to Claim 1 wherein determining a score further comprises determining a score corresponding to a normalized multivariate distance, with respect to the coordinate space, between the respective element and the central tendency.

12. A method according to Claim 1 wherein determining a score further comprises determining a score as a probability that the respective element comprises an aberrant behavior by the corresponding entity.

13. An unsupervised method for identifying an aberrant behavior by an entity with respect to healthcare claim transactions, said method comprising: identifying a plurality of variables defining a representative behavior of a peer group with respect to a reference source of claim transactions, the peer group comprising a plurality of entities; ranking the plurality of variables; selecting a number of the ranked variables; defining a coordinate space having a dimensionality corresponding to the number of selected ranked variables; determining a central tendency of the representative behavior gleaned from the claim transactions of the reference source, as a result of the selected ranked variables being applied thereto, and with respect to the coordinate space; applying the selected ranked variables to a new source of claim transactions for the peer group so as to define a corresponding element for each entity; mapping the elements with respect to the coordinate space; determining a score for each element with respect to the central tendency; and applying a threshold criteria to the scores, the threshold criteria being applied to the scores such that a threshold-exceeding score indicates an aberrant behavior by the corresponding entity within the peer group with respect to the new source of claim transactions.

14. A method according to Claim 13 further comprising compiling a plurality of historical claims so as to form the reference source of claim transactions, before defining the coordinate space.

15. A method according to Claim 14 wherein compiling a plurality of historical claims further comprises saving the plurality of historical claims in a database.

16. A method according to Claim 14 wherein compiling a plurality of historical claims further comprises compiling a plurality of at least one of healthcare claims and pharmacy claims.

17. A method according to Claim 13 further comprising applying an outlier trimming procedure to the result of the application of the selected ranked variables to the reference source so as to facilitate an analysis of a corresponding statistical distribution of the result.

18. A method according to Claim 13 further comprising applying a non-linear transformation to the result of the application of the selected ranked variables to the reference source so as to facilitate an analysis of a corresponding statistical distribution of the result.

19. A method according to Claim 13 wherein determining a score further comprises determining a score corresponding to a proximity measurement, with respect to the coordinate space, between the respective element and the central tendency.

20. A method according to Claim 13 wherein determining a score further comprises determining a score corresponding to a normalized multivariate distance, with respect to the coordinate space, between the respective element and the central tendency.

21. A method according to Claim 13 wherein determining a score further comprises determining a score as a probability that the respective element comprises an aberrant behavior by the corresponding entity.

22. A computer software program product for implementing an unsupervised method of identifying an aberrant behavior by an entity with respect to healthcare claim transactions, said computer software program product being executable on a computer device and comprising: an executable portion configured to define a coordinate space having a dimensionality corresponding to a number of selected ranked variables defining a representative behavior of a peer group with respect to a reference source of claim transactions, the peer group comprising a plurality of entities; an executable portion configured to determine a central tendency of the representative behavior gleaned from the claim transactions of the reference source, as a result of the selected ranked variables being applied thereto, and with respect to the coordinate space; an executable portion configured to apply the selected ranked variables to a new source of claim transactions for the peer group so as to define a corresponding element for each entity; an executable portion configured to map the elements with respect to the coordinate space; an executable portion configured to determine a score for each element with respect to the central tendency; and an executable portion configured to apply a threshold criteria to the scores, the threshold criteria being applied to the scores such that a threshold-exceeding score indicates an aberrant behavior by the corresponding entity within the peer group with respect to the new source of claim transactions.

23. A computer software program product according to Claim 22 further comprising an executable portion configured to compile a plurality of historical claims so as to form the reference source of claim transactions before execution of the executable portion configured to define the coordinate space.

24. A computer software program product according to Claim 23 wherein the executable portion configured to compile the plurality of historical claims is further configured to store the plurality of historical claims in a database.

25. A computer software program product according to Claim 23 wherein the executable portion configured to compile the plurality of historical claims is further configured to compile a plurality of at least one of healthcare claims and pharmacy claims.

26. A computer software program product according to Claim 22 further comprising an executable portion configured to identify a plurality of variables defining the representative behavior of the peer group with respect to the reference source of claim transactions, before execution of the executable portion configured to define the coordinate space.

27. A computer software program product according to Claim 26 further comprising an executable portion configured to rank the plurality of identified variables.

28. A computer software program product according to Claim 27 further comprising an executable portion configured to select a number of the ranked variables.

29. A computer software program product according to Claim 22 further comprising an executable portion configured to apply an outlier trimming procedure to the result of the application of the selected ranked variables to the reference source so as to facilitate an analysis of a corresponding statistical distribution of the result.

30. A computer software program product according to Claim 22 further comprising an executable portion configured to apply a non-linear transformation to the result of the application of the selected ranked variables to the reference source so as to facilitate an analysis of a corresponding statistical distribution of the result.

31. A computer software program product according to Claim 22 wherein the executable portion configured to determine a score is further configured to determine a score corresponding to a proximity measurement, with respect to the coordinate space, between the respective element and the central tendency.

32. A computer software program product according to Claim 22 wherein the executable portion configured to determine a score is further configured to determine a score corresponding to a normalized multivariate distance, with respect to the coordinate space, between the respective element and the central tendency.

33. A computer software program product according to Claim 22 wherein the executable portion configured to determine a score is further configured to determine a score as a probability that the respective element comprises an aberrant behavior by the corresponding entity.

34. A computer device capable of implementing an unsupervised method of identifying an aberrant behavior by an entity with respect to healthcare claim transactions, said computer device comprising: a processing portion capable of defining a coordinate space having a dimensionality corresponding to a number of selected ranked variables defining a representative behavior of a peer group with respect to a reference source of claim transactions, the peer group comprising a plurality of entities; a processing portion capable of determining a central tendency of the representative behavior gleaned from the claim transactions of the reference source, as a result of the selected ranked variables being applied thereto, and with respect to the coordinate space; a processing portion capable of applying the selected ranked variables to a new source of claim transactions for the peer group so as to define a corresponding element for each entity; a processing portion capable of mapping the elements with respect to the coordinate space; a processing portion capable of determining a score for each element with respect to the central tendency; and a processing portion capable of applying a threshold criteria to the scores, the threshold criteria being applied such that a threshold-exceeding score indicates an aberrant behavior by the corresponding entity within the peer group with respect to the new source of claim transactions.

35. A computer device according to Claim 34 further comprising a processing portion capable of compiling a plurality of historical claims so as to form the reference source of claim transactions before actuation of the processing portion defining the coordinate space.

36. A computer device according to Claim 35 wherein the processing portion for compiling the plurality of historical claims is further capable of storing the plurality of historical claims in a database.

37. A computer device according to Claim 35 wherein the processing portion for compiling the plurality of historical claims is further capable of compiling a plurality of at least one of healthcare claims and pharmacy claims.

38. A computer device according to Claim 34 further comprising a processing portion capable of identifying a plurality of variables defining the representative behavior of the peer group with respect to the reference source of claim transactions for the peer group, before actuation of the processing portion for defining the coordinate space.

39. A computer device according to Claim 38 further comprising a processing portion capable of ranking the plurality of identified variables.

40. A computer device according to Claim 39 further comprising a processing portion capable of selecting a number of the ranked variables.

41. A computer device according to Claim 34 further comprising a processing portion capable of applying an outlier trimming procedure to the result of the application of the selected ranked variables to the reference source so as to facilitate an analysis of a corresponding statistical distribution of the result.

42. A computer device according to Claim 34 further comprising a processing portion capable of applying a non-linear transformation to the result of the application of the selected ranked variables to the reference source so as to facilitate an analysis of a corresponding statistical distribution of the result.

43. A computer device according to Claim 34 wherein the processing portion for determining a score is further capable of determining a score corresponding to a proximity measurement, with respect to the coordinate space, between the respective element and the central tendency.

44. A computer device according to Claim 34 wherein the processing portion for determining a score is further capable of determining a score corresponding to a normalized multivariate distance, with respect to the coordinate space, between the respective element and the central tendency.

45. A computer device according to Claim 34 wherein the processing portion for determining a score is further capable of determining a score as a probability that the respective element comprises an aberrant behavior by the corresponding entity.

46. A system for implementing an unsupervised method of identifying an aberrant behavior by an entity with respect to healthcare claim transactions, said system comprising: a reference source of claim transactions; and a computer device in communication with the reference source and comprising: a processing portion capable of defining a coordinate space having a dimensionality corresponding to a number of selected ranked variables defining a representative behavior of a peer group with respect to the reference source, the peer group comprising a plurality of entities; a processing portion capable of determining a central tendency of the representative behavior gleaned from the claim transactions of the reference source, as a result of the selected ranked variables being applied thereto, and with respect to the coordinate space; a processing portion capable of applying the selected ranked variables to a new source of claim transactions for the peer group so as to define a corresponding element for each entity; a processing portion capable of mapping the elements with respect to the coordinate space; a processing portion capable of determining a score for each element with respect to the central tendency; and a processing portion capable of applying a threshold criteria to the scores, the threshold criteria being applied such that a threshold-exceeding score indicates an aberrant behavior by the corresponding entity within the peer group with respect to the new source of claim transactions.

47. A system according to Claim 46 wherein the reference source comprises a plurality of historical claims.

48. A system according to Claim 46 wherein the reference source comprises a database having a plurality of historical claims stored therein.

49. A system according to Claim 46 wherein the reference source comprises a plurality of at least one of healthcare claims and pharmacy claims.

50. A system according to Claim 46 wherein the computer device further comprises a processing portion capable of identifying a plurality of variables defining the representative behavior of the peer group with respect to the reference source, before actuation of the processing portion for defining the coordinate space.

51. A system according to Claim 50 wherein the computer device further comprises a processing portion capable of ranking the plurality of identified variables.

52. A system according to Claim 51 wherein the computer device further comprises a processing portion capable of selecting a number of the ranked variables.

53. A system according to Claim 46 wherein the computer device further comprises a processing portion capable of applying an outlier trimming procedure to the result of the application of the selected ranked variables to the reference source so as to facilitate an analysis of a corresponding statistical distribution of the result.
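
One simple form that an outlier trimming procedure might take is to discard a fixed fraction of the smallest and largest values of a variable before its distribution is analyzed. The sketch below assumes that symmetric, fraction-based approach; the claim does not specify the procedure, and `trim_outliers` is a hypothetical name.

```python
def trim_outliers(values, fraction=0.05):
    """Drop the lowest and highest `fraction` of values and return the rest, sorted.

    `fraction` is the share removed from EACH tail, so it must be below 0.5.
    """
    if not 0 <= fraction < 0.5:
        raise ValueError("fraction must be in the range [0, 0.5)")
    ordered = sorted(values)
    k = int(len(ordered) * fraction)  # number of values to drop per tail
    return ordered[k:len(ordered) - k]


# Hypothetical variable values for ten entities, one of them extreme.
trimmed = trim_outliers([1, 2, 3, 4, 5, 6, 7, 8, 9, 1000], fraction=0.1)
```

Trimming before estimating the central tendency keeps a few extreme entities in the reference source from distorting the peer-group profile against which new elements are scored.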

54. A system according to Claim 46 wherein the computer device further comprises a processing portion capable of applying a non-linear transformation to the result of the application of the selected ranked variables to the reference source so as to facilitate an analysis of a corresponding statistical distribution of the result.
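
For illustration, a logarithmic transformation is one common non-linear transformation for right-skewed quantities such as billed amounts or claim counts. The sketch below assumes non-negative inputs and uses `log(1 + x)` so that zero maps to zero; the claim does not identify a specific transformation, and `log_transform` is a hypothetical name.

```python
import math


def log_transform(values):
    """Apply log(1 + x) to each value to compress a long right tail."""
    if any(v < 0 for v in values):
        raise ValueError("log_transform expects non-negative values")
    return [math.log1p(v) for v in values]


# Hypothetical skewed variable: most entities are small, one is very large.
raw = [0, 10, 20, 30, 5000]
transformed = log_transform(raw)
```

The transformation preserves the ordering of the entities while shrinking the gap between the bulk of the values and the extreme ones, which tends to make the resulting distribution easier to summarize with a central tendency and a spread.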

55. A system according to Claim 46 wherein the processing portion for determining a score is further capable of determining a score corresponding to a proximity measurement, with respect to the coordinate space, between the respective element and the central tendency.

56. A system according to Claim 46 wherein the processing portion for determining a score is further capable of determining a score corresponding to a normalized multivariate distance, with respect to the coordinate space, between the respective element and the central tendency.

57. A system according to Claim 46 wherein the processing portion for determining a score is further capable of determining a score as a probability that the respective element comprises an aberrant behavior by the corresponding entity.

58. A system according to Claim 46 wherein the computer device further comprises a processing portion capable of forming an indicia of at least one threshold-exceeding score.

59. A system according to Claim 58 further comprising an output module operably engaged with the computer device, the output module being configured to receive the indicia of the at least one threshold-exceeding score and to process the indicia so as to indicate the aberrant behavior and the corresponding entity.

60. A system according to Claim 59 wherein the output module comprises at least one of an aural alarm, a visual alarm, a monitor, a printing device, an Internet web site, and an electronic storage device.