US20140279752A1 - System and Method for Generating Ultimate Reason Codes for Computer Models - Google Patents


Info

Publication number
US20140279752A1
Authority
US
United States
Prior art keywords
reason
ultimate
reason code
model
computer
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Abandoned
Application number
US14/209,135
Inventor
Joseph Milana
Yonghui Chen
Lujia Chen
Weiqiang Wang
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
ElectrifAI LLC
Original Assignee
Opera Solutions LLC
Application filed by Opera Solutions LLC
Priority to US14/209,135 (published as US20140279752A1)
Assigned to OPERA SOLUTIONS, LLC. Assignment of assignors interest (see document for details). Assignors: MILANA, JOSEPH; WANG, WEIQIANG; CHEN, LUJIA; CHEN, YONGHUI
Publication of US20140279752A1
Assigned to OPERA SOLUTIONS U.S.A., LLC. Assignment of assignors interest (see document for details). Assignors: OPERA SOLUTIONS, LLC
Assigned to WHITE OAK GLOBAL ADVISORS, LLC. Security agreement. Assignors: BIQ, LLC; LEXINGTON ANALYTICS INCORPORATED; OPERA PAN ASIA LLC; OPERA SOLUTIONS GOVERNMENT SERVICES, LLC; OPERA SOLUTIONS USA, LLC; OPERA SOLUTIONS, LLC
Assigned to OPERA SOLUTIONS OPCO, LLC. Transfer statement and assignment. Assignors: WHITE OAK GLOBAL ADVISORS, LLC
Priority to US16/511,743 (published as US20190340514A1)
Assigned to ELECTRIFAI, LLC. Change of name (see document for details). Assignors: OPERA SOLUTIONS OPCO, LLC
Legal status: Abandoned

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N5/00Computing arrangements using knowledge-based models


Abstract

A system and method for generating ultimate reason codes for computer models is provided. The system comprises a computer system for receiving a data set, and an ultimate reason code generation engine stored on the computer system which, when executed by the computer system, causes the computer system to train a base model with a plurality of reason codes, wherein each reason code includes one or more variables, each of which belongs to only one reason code, train a subsequent model using a subset of the plurality of reason codes, determine whether a high score exists in the base model, determine a score difference if a high score exists in the base model, and designate a reason code having a largest drop of score as an ultimate reason code.

Description

    CROSS-REFERENCE TO RELATED APPLICATION
  • This application claims priority to U.S. Provisional Patent Application No. 61/786,010 filed on Mar. 14, 2013, which is incorporated herein by reference in its entirety.
  • BACKGROUND
  • 1. Field of the Disclosure
  • The present disclosure relates generally to a system and method for providing reason codes by training a series of computer models. More specifically, the present disclosure relates to a system and method for generating ultimate reason codes for computer models.
  • 2. Related Art
  • Currently, for big data applications, clients typically require high-performance models, which are usually advanced, complex models. In business settings (e.g., consumer finance and risk, health care, and marketing research), many non-linear modeling approaches are used (e.g., neural networks, gradient boosting trees, ensemble models, etc.). At the same time, high score reason codes are often required for business reasons. One example is the fraud detection area, where neural network models are used for scoring and reason codes are provided for investigation.
  • There are different techniques to provide reason codes for non-linear complex models in the big data industry. Many methods utilize a single base model and compute the derivative of the input reasons (e.g., the impact of a particular input variable on the model score), which is similar to a sensitivity-analysis approximation. Other methods apply an approximation of the scoring model to compute reasons. All of them are based on a single model, under the assumption that modifying the input without retraining leaves the score consistent with the probability of the target. In other words, one assumption of utilizing a single base model is that the probability consistency holds even if one input variable is knocked out without retraining. This assumption does not necessarily hold, because each sub-model's parameters are not optimized by training, such as by maximum likelihood (i.e., the knocked-out model is not retrained).
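The single-model approximation criticized above can be sketched as follows. This is an illustration of the prior-art technique, not the method of the present disclosure; the `score_fn`, baseline values, and toy coefficients are invented for the example:

```python
def sensitivity_reasons(score_fn, record, baselines):
    # Rank each input by the score drop observed when that input is
    # replaced by a population baseline, WITHOUT retraining the model.
    # This is the single-model approximation discussed above: the
    # perturbed inputs were never seen during training, so the perturbed
    # score need not remain consistent with the probability of the target.
    s0 = score_fn(record)
    drops = {}
    for i, base in enumerate(baselines):
        probe = list(record)
        probe[i] = base
        drops[i] = s0 - score_fn(probe)
    return sorted(drops, key=drops.get, reverse=True)

# Toy scoring function in which variable 0 dominates the score.
toy_score = lambda x: 0.9 * x[0] + 0.1 * x[1]
print(sensitivity_reasons(toy_score, [1.0, 1.0], [0.2, 0.2]))  # → [0, 1]
```

Because no sub-model is retrained, this approach is cheap at scoring time, which is exactly the property the disclosure trades away in exchange for scores that stay calibrated.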
  • SUMMARY
  • The system and method of the present disclosure generates ultimate reason codes for high score records in real time. The system utilizes a four-step approach to identify reason codes for high score records in real time in production. The system provides ultimate reasons for the first reason based on assumptions and results. The system can provide any arbitrary number of reason codes by approximation.
  • The system for generating ultimate reason codes for computer models comprises a computer system for receiving a data set, and an ultimate reason code generation engine stored on the computer system which, when executed by the computer system, causes the computer system to train a base model with a plurality of reason codes, wherein each reason code includes one or more variables, each of which belongs to only one reason code, train a subsequent model using a subset of the plurality of reason codes, determine whether a high score exists in the base model, determine a score difference if a high score exists in the base model, and designate a reason code having a largest drop of score as an ultimate reason code.
  • BRIEF DESCRIPTION OF THE DRAWINGS
  • The foregoing features of the disclosure will be apparent from the following Detailed Description, taken in connection with the accompanying drawings, in which:
  • FIG. 1 is a diagram illustrating the system of the present disclosure;
  • FIG. 2 illustrates processing steps carried out by the system of the present disclosure;
  • FIG. 3 is a graph illustrating a score generated by the system in proportion to the probability of the target;
  • FIG. 4 is a graph comparing ultimate reasons generated by the system with logistic regression reasons; and
  • FIG. 5 is a diagram showing hardware and software components of the system.
  • DETAILED DESCRIPTION
  • The present disclosure relates to a system and method for generating ultimate reason codes for computer models, as discussed in detail below in connection with FIGS. 1-5. The system can be used as an add-on package for any individual classification product to provide reason codes. The system could be an individual product for model deployment, and could be sold to any industries or companies requiring high-performance analytics models as well as robust reasons. The system could be used internally to provide services to customers (e.g., credit issuers and credit bureaus), and could be applied to various applications (e.g., health care, collections, marketing, etc.). The system and method of the present disclosure provides ultimate reason codes based on both solid assumptions and experimental results. By the term "ultimate reason code" is meant a final reason code for a particular data set being modeled by a computer model, driven by the relationships within the data and not by the specific model.
  • FIG. 1 is a diagram showing a system for generating ultimate reason codes for computer models, indicated generally at 10. The system 10 comprises a computer system 12 (e.g., a server) having a database 14 stored therein and an ultimate reason code generation engine 16. The computer system 12 could be any suitable computer server (e.g., a server with an INTEL microprocessor, multiple processors, multiple processing cores) running any suitable operating system (e.g., Windows by Microsoft, Linux, etc.). The database 14 could be stored on the computer system 12, or located externally (e.g., in a separate database server in communication with the system 10).
  • The system 10 could be web-based and remotely accessible such that the system 10 communicates through a network 20 with one or more of a variety of computer systems 22 (e.g., personal computer system 26 a, a smart cellular telephone 26 b, a tablet computer 26 c, or other devices). Network communication could be over the Internet using standard TCP/IP communications protocols (e.g., hypertext transfer protocol (HTTP), secure HTTP (HTTPS), file transfer protocol (FTP), electronic data interchange (EDI), etc.), through a private network connection (e.g., wide-area network (WAN) connection, emails, electronic data interchange (EDI) messages, extensible markup language (XML) messages, file transfer protocol (FTP) file transfers, etc.), or any other suitable wired or wireless electronic communications format.
  • The reason code generation system and method of the present disclosure is utilized to provide "ultimate" reason codes based on a few assumptions described below. A neural network (NN) fraud detection model with a sample dataset is used as an example. An NN trained with mean squared error will approach the a-posteriori probability P(Bad|x) for a binary outcome, which is validated by the results described in more detail below. Ultimate reason code technology is used to identify an arbitrary number of reason codes by retraining a group of sub-models, each with an individual knocked-out reason.
  • FIG. 2 illustrates processing steps 50 of the system of the present disclosure. In step 52, variables are grouped into reasons manually. A reason can contain one or more variables, and a single variable belongs to only one reason. It is difficult to automate this process, as it usually involves expert knowledge of the data, the domain, and the variables. In the examples discussed below, this step is skipped to avoid human intervention; thus, every reason contains only one variable and every variable is a unique reason. In step 54, a base model, M_0, is trained with all N reasons. N subsequent models (M_1, M_2, . . . , M_N) are trained by removing one reason at a time. For example, M_1 is trained on the same data with reasons (R_2, R_3, . . . , R_N), and without R_1. In step 56, when a high score record occurs in the base model, the score differences between S_0 and (S_1, S_2, . . . , S_N) are compared. In step 58, the knocked-out reason with the largest drop of score, max(S_0 - S_k), is defined as the ultimate reason code. If more than one reason code is needed, the reason with the next largest drop is taken, and so on. In production, all N+1 models (M_0, M_1, . . . , M_N) are deployed. For a high score record, all N+1 scores, S_0, S_1, . . . , S_N, are obtained. This only increases the overall running time by a small percentage.
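The steps of FIG. 2 can be sketched end to end as below. This is a hedged illustration, not the disclosed implementation: a closed-form least-squares linear model stands in for the neural network so the example is self-contained, and the synthetic data, reason names `R1`/`R2`, and helper functions are invented for the sketch.

```python
import random

def train_ls(rows, ys):
    # Ordinary least squares with intercept, solved via the normal
    # equations and Gaussian elimination (pure Python; fine for a
    # handful of features). Stands in for "train a model".
    n = len(rows[0]) + 1
    X = [[1.0] + list(r) for r in rows]
    A = [[sum(X[k][i] * X[k][j] for k in range(len(X))) for j in range(n)] for i in range(n)]
    b = [sum(X[k][i] * ys[k] for k in range(len(X))) for i in range(n)]
    for col in range(n):                      # forward elimination with pivoting
        piv = max(range(col, n), key=lambda r: abs(A[r][col]))
        A[col], A[piv] = A[piv], A[col]
        b[col], b[piv] = b[piv], b[col]
        for r in range(col + 1, n):
            f = A[r][col] / A[col][col]
            for c in range(col, n):
                A[r][c] -= f * A[col][c]
            b[r] -= f * b[col]
    w = [0.0] * n                             # back substitution
    for i in reversed(range(n)):
        w[i] = (b[i] - sum(A[i][j] * w[j] for j in range(i + 1, n))) / A[i][i]
    return w

def score(w, row):
    return w[0] + sum(wi * xi for wi, xi in zip(w[1:], row))

def ultimate_reasons(rows, ys, reasons, record, top=1):
    # Steps 54-58: train base model M_0 plus one genuinely retrained
    # knocked-out model per reason, then rank reasons by the score drop
    # S_0 - S_k for the high-score record.
    m0 = train_ls(rows, ys)
    s0 = score(m0, record)
    drops = {}
    for k, name in enumerate(reasons):
        sub_rows = [r[:k] + r[k + 1:] for r in rows]
        mk = train_ls(sub_rows, ys)
        drops[name] = s0 - score(mk, record[:k] + record[k + 1:])
    return sorted(drops, key=drops.get, reverse=True)[:top]

random.seed(0)
rows = [[random.random(), random.random()] for _ in range(200)]
ys = [1.0 if r[0] > 0.7 else 0.0 for r in rows]   # target driven by R1 only
record = [0.95, 0.5]                               # a "high score" record
print(ultimate_reasons(rows, ys, ["R1", "R2"], record))  # → ['R1']
```

Note that each sub-model is genuinely retrained on the reduced feature set; this retraining is what distinguishes the approach from the single-model sensitivity approximation discussed in the Related Art.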
  • This technique is based on a few assumptions, described below. The first assumption is that the score is consistent with the probability of the target for all N+1 trained models. This is a known property of neural networks (as well as of other model paradigms): as long as there is enough sample data and the model is trained well enough, the final score should converge to the probability of the target (validated in the examples below). A second assumption is that all N+1 models behave consistently between training data and production data. This can be monitored through the score distributions of all N+1 models; if an inconsistency appears in any one model, that model should be retrained. Statistically this assumption holds, though sampling error can produce occasional outliers that remain within the expected statistical range. The third assumption is that, compared to the original model M_0, each sub-model M_k (1 ≤ k ≤ N) produces a lower score for a suspicious record, due to the information missing from the knocked-out reason. As shown in the results below, the score decreases for nearly all high-score transactions in the knocked-out models. In rare cases, all sub-models score higher than the original, due to statistical fluctuations affecting the original model; in that scenario, the knocked-out reason of the lowest-scoring model is chosen as the first reason code.
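The second assumption, that training and production score distributions stay consistent, requires some monitoring statistic. The disclosure does not prescribe one; the Population Stability Index below is one common choice, offered here purely as an assumed example:

```python
import math

def psi(expected, actual, n_bins=10):
    # Population Stability Index between a reference (training) score
    # distribution and an observed (production) one. Bins are equal-width
    # over [0, 1); empty bins are floored to avoid log(0).
    def frac(scores, lo, hi):
        return max(sum(lo <= s < hi for s in scores) / len(scores), 1e-6)
    total = 0.0
    for b in range(n_bins):
        lo, hi = b / n_bins, (b + 1) / n_bins
        e, a = frac(expected, lo, hi), frac(actual, lo, hi)
        total += (a - e) * math.log(a / e)
    return total

train_scores = [i / 100 for i in range(100)]                  # uniform scores
prod_scores = [min(i / 100 + 0.2, 0.99) for i in range(100)]  # drifted scores
print(psi(train_scores, train_scores) < 1e-9)  # → True (no drift)
print(psi(train_scores, prod_scores) > 0.25)   # → True (drift detected)
```

By a common rule of thumb, a PSI above roughly 0.25 signals drift worth retraining for; the threshold is a convention, not part of the disclosure.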
  • FIG. 3 is a graph illustrating the score in proportion to the probability of the target. In this example, the base model M_0 was used to validate the assumption that the score is consistent with the probability of the target. The data came from a fraud dataset, and the targets were the frauds. A three-layer neural network model was trained with 30 input variables. The X axis represents the scores in 100 bins, where a score of 0.87 corresponds to bin 87. The Y axis represents the probability of the target. As shown, the score was very consistent with the probability of the target, with an R-square close to 1.
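The FIG. 3 check, observed target rate per score bin, can be reproduced on synthetic data as follows. The data here is generated so that P(target) equals the score by construction, purely to illustrate the binning; it is not the dataset of the disclosure:

```python
import random
from collections import defaultdict

def calibration_bins(scores, targets, n_bins=100):
    # Observed target rate per score bin (a score of 0.87 falls in
    # bin 87, matching the binning described for FIG. 3). For a
    # well-calibrated model the rate in bin b sits near b / n_bins.
    hits, counts = defaultdict(int), defaultdict(int)
    for s, t in zip(scores, targets):
        b = min(int(s * n_bins), n_bins - 1)
        counts[b] += 1
        hits[b] += t
    return {b: hits[b] / counts[b] for b in counts}

random.seed(1)
scores = [random.random() for _ in range(50000)]
targets = [1 if random.random() < s else 0 for s in scores]
rates = calibration_bins(scores, targets)
print(abs(rates[87] - 0.875) < 0.05)  # → True: bin 87 rate near 0.87
```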
  • FIG. 4 is a graph comparing ultimate reasons with logistic regression reasons. Logistic regression models are often used in production, since the weights are usually explainable and the score is intended to be interpreted as a probability. In this example, the first ultimate reason was compared with the first reason generated by logistic regression. The general approach for determining the logistic regression reasons is to assign a relevance of each input variable to the overall score generated by the model; the reason codes are then ranked by that relevance. The first logistic regression reason is the variable x_i whose coefficient β_i introduces the maximal deviation of the product x_iβ_i from its average value x̄_iβ_i. After ranking, the top few (e.g., 3 or 4) reason codes were selected. The X axis represents the score bins, and the Y axis represents the first-reason-code matching rate between ultimate reasons and logistic regression in each score bin. As shown, the first reason matches well in most score bins; in high score bins above 95, the matching rate increases significantly.
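The logistic-regression reason convention just described can be sketched as below. The coefficients, record, and averages are invented, and measuring the deviation as an absolute difference is an assumption about how "maximal deviation" is taken:

```python
def logistic_first_reason(betas, record, averages):
    # First reason under the convention above: the index i whose
    # contribution x_i * beta_i deviates most from its average
    # contribution xbar_i * beta_i (absolute deviation assumed).
    devs = [abs(x * b - xb * b) for x, b, xb in zip(record, betas, averages)]
    return max(range(len(devs)), key=devs.__getitem__)

# Variable 0 contributes 6.0 against an average contribution of 2.0,
# the largest deviation, so it is reported as the first reason.
print(logistic_first_reason([2.0, -0.5, 1.0], [3.0, 1.0, 0.2], [1.0, 1.0, 0.2]))  # → 0
```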
  • Information related to the present disclosure includes (1) http://en.wikipedia.org/wiki/Maximum_likelihood, (2) M. D. Richard, et al., "Neural network classifiers estimate Bayesian a-posteriori probabilities," Neural Computation, 3(4):461-483 (1991), and (3) Yonghui Chen, et al., "System and Method for Developing Proxy Model," U.S. Provisional Patent Application No. 61/759,682, the disclosures of which are incorporated herein by reference.
  • FIG. 5 is a diagram showing hardware and software components of a computer system 100 on which the system of the present disclosure could be implemented. The system 100 comprises a processing server 102 which could include a storage device 104, a network interface 108, a communications bus 110, a central processing unit (CPU) (microprocessor) 112, a random access memory (RAM) 114, and one or more input devices 116, such as a keyboard, mouse, etc. The server 102 could also include a display (e.g., liquid crystal display (LCD), cathode ray tube (CRT), etc.). The storage device 104 could comprise any suitable, computer-readable storage medium, such as disk or non-volatile memory (e.g., read-only memory (ROM), erasable programmable ROM (EPROM), electrically-erasable programmable ROM (EEPROM), flash memory, field-programmable gate array (FPGA), etc.). The server 102 could be a networked computer system, a personal computer, a smart phone, a tablet computer, etc. It is noted that the server 102 need not be a networked server, and indeed could be a stand-alone computer system.
  • The functionality provided by the present disclosure could be provided by an ultimate reason code generation program/engine 106, which could be embodied as computer-readable program code stored on the storage device 104 and executed by the CPU 112 using any suitable, high or low level computing language, such as Python, Java, C, C++, C#, .NET, MATLAB, etc. The network interface 108 could include an Ethernet network interface device, a wireless network interface device, or any other suitable device which permits the server 102 to communicate via the network. The CPU 112 could include any suitable single- or multiple-core microprocessor of any suitable architecture that is capable of implementing and running the ultimate reason code generation program 106 (e.g., Intel processor). The random access memory 114 could include any suitable, high-speed, random access memory typical of most modern computers, such as dynamic RAM (DRAM), etc.
  • Having thus described the system and method in detail, it is to be understood that the foregoing description is not intended to limit the spirit or scope thereof. It will be understood that the embodiments of the present disclosure described herein are merely exemplary, and that a person skilled in the art may make variations and modifications without departing from the spirit and scope of the disclosure. All such variations and modifications, including those discussed above, are intended to be included within the scope of the disclosure. What is desired to be protected is set forth in the following claims.

Claims (18)

What is claimed is:
1. A system for generating ultimate reason codes for computer models comprising:
a computer system for receiving a data set;
an ultimate reason code generation engine stored on the computer system which, when executed by the computer system, causes the computer system to:
train a base model with a plurality of reason codes, wherein each reason code includes one or more variables, each of which belongs to only one reason code;
train a subsequent model using a subset of the plurality of reason codes;
determine whether a high score exists in the base model;
determine a scored difference if a high score exists in the base model; and
designate a reason code having a largest drop of score as an ultimate reason code.
2. The system of claim 1, further comprising comparing a score difference between the base model and subsequent model.
3. The system of claim 1, further comprising defining a next largest drop for a second ultimate reason code.
4. The system of claim 3, wherein the ultimate reason codes are generated in real time.
5. The system of claim 1, further comprising obtaining, for a high score record, one or more scores from the base model and subsequent model.
6. The system of claim 1, further comprising ranking the reason codes based on relevance.
7. A method for generating ultimate reason codes for computer models comprising:
receiving a data set at a computer system;
training a base model with a plurality of reason codes by an ultimate reason code generation engine stored on and executed by the computer system, wherein each reason code includes one or more variables, each of which belongs to only one reason code;
training by the ultimate reason code generation engine a subsequent model using a subset of the plurality of reason codes;
determining by the ultimate reason code generation engine whether a high score exists in the base model;
determining by the ultimate reason code generation engine a score difference if a high score exists in the base model; and
designating by the ultimate reason code generation engine a reason code having the largest drop in score as the ultimate reason code.
8. The method of claim 7, further comprising comparing a score difference between the base model and subsequent model.
9. The method of claim 7, further comprising defining a next largest drop for a second ultimate reason code.
10. The method of claim 9, wherein the ultimate reason codes are generated in real time.
11. The method of claim 7, further comprising obtaining, for a high score record, one or more scores from the base model and subsequent model.
12. The method of claim 7, further comprising ranking the reason codes based on relevance.
13. A non-transitory computer-readable medium having computer-readable instructions stored thereon which, when executed by a computer system, cause the computer system to perform the steps of:
receiving a data set at the computer system;
training a base model with a plurality of reason codes by an ultimate reason code generation engine stored on and executed by the computer system, wherein each reason code includes one or more variables, each of which belongs to only one reason code;
training by the ultimate reason code generation engine a subsequent model using a subset of the plurality of reason codes;
determining by the ultimate reason code generation engine whether a high score exists in the base model;
determining by the ultimate reason code generation engine a score difference if a high score exists in the base model; and
designating by the ultimate reason code generation engine a reason code having the largest drop in score as the ultimate reason code.
14. The computer-readable medium of claim 13, further comprising comparing a score difference between the base model and subsequent model.
15. The computer-readable medium of claim 13, further comprising defining a next largest drop for a second ultimate reason code.
16. The computer-readable medium of claim 15, wherein the ultimate reason codes are generated in real time.
17. The computer-readable medium of claim 13, further comprising obtaining, for a high score record, one or more scores from the base model and subsequent model.
18. The computer-readable medium of claim 13, further comprising ranking the reason codes based on relevance.
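For illustration only, the procedure recited in claims 1, 7, and 13 can be sketched as follows. The reason-code groups, variable names, fixed weights, score threshold, and logistic scorer below are all hypothetical stand-ins, not the patent's actual models; training a separate subsequent model per reason code is approximated here by re-scoring with that reason code's variable group dropped.

```python
import math

# Each reason code owns a disjoint group of input variables (claim 1).
# Groups, variables, and weights are illustrative assumptions.
REASON_CODES = {
    "R1": ["txn_amount", "txn_velocity"],
    "R2": ["geo_mismatch"],
    "R3": ["account_age"],
}
WEIGHTS = {"txn_amount": 1.2, "txn_velocity": 0.8,
           "geo_mismatch": 2.0, "account_age": -0.5}
ALL_VARS = [v for group in REASON_CODES.values() for v in group]

def score(record, variables):
    """Logistic score using only the given variables; omitting one reason
    code's group mimics the corresponding 'subsequent model'."""
    z = sum(WEIGHTS[v] * record[v] for v in variables)
    return 1.0 / (1.0 + math.exp(-z))

def ultimate_reason_codes(record, threshold=0.7, top=2):
    base_score = score(record, ALL_VARS)        # base model score
    if base_score < threshold:                  # only high scores are explained
        return base_score, []
    drops = {}
    for code, group in REASON_CODES.items():
        subset = [v for v in ALL_VARS if v not in group]
        drops[code] = base_score - score(record, subset)  # score difference
    # Largest drop -> ultimate reason code; next largest -> second, etc.
    ranked = sorted(drops, key=drops.get, reverse=True)
    return base_score, ranked[:top]

record = {"txn_amount": 1.5, "txn_velocity": 1.0,
          "geo_mismatch": 1.0, "account_age": 0.2}
base, reasons = ultimate_reason_codes(record)
print(round(base, 3), reasons)  # prints: 0.989 ['R1', 'R2']
```

Because every variable belongs to exactly one reason code, each score difference cleanly attributes the high score to one group, and ranking the drops yields the ultimate and second ultimate reason codes of claims 3, 9, and 15.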
US14/209,135 2013-03-14 2014-03-13 System and Method for Generating Ultimate Reason Codes for Computer Models Abandoned US20140279752A1 (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
US14/209,135 US20140279752A1 (en) 2013-03-14 2014-03-13 System and Method for Generating Ultimate Reason Codes for Computer Models
US16/511,743 US20190340514A1 (en) 2013-03-14 2019-07-15 System and method for generating ultimate reason codes for computer models

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
US201361786010P 2013-03-14 2013-03-14
US14/209,135 US20140279752A1 (en) 2013-03-14 2014-03-13 System and Method for Generating Ultimate Reason Codes for Computer Models

Related Child Applications (1)

Application Number Title Priority Date Filing Date
US16/511,743 Continuation US20190340514A1 (en) 2013-03-14 2019-07-15 System and method for generating ultimate reason codes for computer models

Publications (1)

Publication Number Publication Date
US20140279752A1 true US20140279752A1 (en) 2014-09-18

Family

ID=51532863

Family Applications (2)

Application Number Title Priority Date Filing Date
US14/209,135 Abandoned US20140279752A1 (en) 2013-03-14 2014-03-13 System and Method for Generating Ultimate Reason Codes for Computer Models
US16/511,743 Pending US20190340514A1 (en) 2013-03-14 2019-07-15 System and method for generating ultimate reason codes for computer models

Family Applications After (1)

Application Number Title Priority Date Filing Date
US16/511,743 Pending US20190340514A1 (en) 2013-03-14 2019-07-15 System and method for generating ultimate reason codes for computer models

Country Status (1)

Country Link
US (2) US20140279752A1 (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2017003499A1 (en) * 2015-06-29 2017-01-05 Wepay, Inc. System and methods for generating reason codes for ensemble computer models
US9734447B1 (en) 2014-10-30 2017-08-15 Sas Institute Inc. Generating accurate reason codes with complex non-linear modeling and neural networks

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20220004704A1 (en) * 2020-03-13 2022-01-06 DataRobot, Inc. Methods for documenting models, and related systems and apparatus

Citations (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5819226A (en) * 1992-09-08 1998-10-06 Hnc Software Inc. Fraud detection using predictive modeling
US20060212386A1 (en) * 2005-03-15 2006-09-21 Willey Dawn M Credit scoring method and system
US7458508B1 (en) * 2003-05-12 2008-12-02 Id Analytics, Inc. System and method for identity-based fraud detection
US20090018026A1 (en) * 2006-02-28 2009-01-15 Chul Woo Kim Protein markers for diagnosing stomach cancer and the diagnostic kit using them
US20090076866A1 (en) * 2007-09-18 2009-03-19 Zoldi Scott M Revenue Assurance Analytics
US7788195B1 (en) * 2006-03-24 2010-08-31 Sas Institute Inc. Computer-implemented predictive model generation systems and methods
US20110119213A1 (en) * 1999-10-27 2011-05-19 Health Discovery Corporation Support vector machine - recursive feature elimination (svm-rfe)
US8296257B1 (en) * 2009-04-08 2012-10-23 Google Inc. Comparing models
US8805737B1 (en) * 2009-11-02 2014-08-12 Sas Institute Inc. Computer-implemented multiple entity dynamic summarization systems and methods

Patent Citations (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5819226A (en) * 1992-09-08 1998-10-06 Hnc Software Inc. Fraud detection using predictive modeling
US20110119213A1 (en) * 1999-10-27 2011-05-19 Health Discovery Corporation Support vector machine - recursive feature elimination (svm-rfe)
US7458508B1 (en) * 2003-05-12 2008-12-02 Id Analytics, Inc. System and method for identity-based fraud detection
US20060212386A1 (en) * 2005-03-15 2006-09-21 Willey Dawn M Credit scoring method and system
US20090018026A1 (en) * 2006-02-28 2009-01-15 Chul Woo Kim Protein markers for diagnosing stomach cancer and the diagnostic kit using them
US7788195B1 (en) * 2006-03-24 2010-08-31 Sas Institute Inc. Computer-implemented predictive model generation systems and methods
US20090076866A1 (en) * 2007-09-18 2009-03-19 Zoldi Scott M Revenue Assurance Analytics
US8296257B1 (en) * 2009-04-08 2012-10-23 Google Inc. Comparing models
US8805737B1 (en) * 2009-11-02 2014-08-12 Sas Institute Inc. Computer-implemented multiple entity dynamic summarization systems and methods

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
"An Introduction to Variable and Feature Selection" Guyon et al, Journal of Machine Learning Research 3 (2003) 1157-1182 Submitted 11/02; Published 3/03 *
"An Introduction to Variable and Feature Selection"Guyon et al, Journal of Machine Learning Research 3 (2003) 1157-1182 Submitted 11/02; Published 3/03 *

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US9734447B1 (en) 2014-10-30 2017-08-15 Sas Institute Inc. Generating accurate reason codes with complex non-linear modeling and neural networks
WO2017003499A1 (en) * 2015-06-29 2017-01-05 Wepay, Inc. System and methods for generating reason codes for ensemble computer models
US10387800B2 (en) * 2015-06-29 2019-08-20 Wepay, Inc. System and methods for generating reason codes for ensemble computer models

Also Published As

Publication number Publication date
US20190340514A1 (en) 2019-11-07

Similar Documents

Publication Publication Date Title
CN109858737B (en) Grading model adjustment method and device based on model deployment and computer equipment
US20190034766A1 (en) Machine learning predictive labeling system
US20190050368A1 (en) Machine learning predictive labeling system
US20200302540A1 (en) Applying a trained model to predict a future value using contextualized sentiment data
CN109165840A (en) Risk profile processing method, device, computer equipment and medium
WO2021000678A1 (en) Business credit review method, apparatus, and device, and computer-readable storage medium
US11551026B2 (en) Dynamic reconfiguration training computer architecture
US11093833B1 (en) Multi-objective distributed hyperparameter tuning system
US20190340514A1 (en) System and method for generating ultimate reason codes for computer models
US11200514B1 (en) Semi-supervised classification system
US20140279815A1 (en) System and Method for Generating Greedy Reason Codes for Computer Models
US11374919B2 (en) Memory-free anomaly detection for risk management systems
WO2020253038A1 (en) Model construction method and apparatus
CN110717597A (en) Method and device for acquiring time sequence characteristics by using machine learning model
CN111582932A (en) Inter-scene information pushing method and device, computer equipment and storage medium
CN113785314A (en) Semi-supervised training of machine learning models using label guessing
CN112995414B (en) Behavior quality inspection method, device, equipment and storage medium based on voice call
CN114647554A (en) Performance data monitoring method and device of distributed management cluster
CN113392920B (en) Method, apparatus, device, medium, and program product for generating cheating prediction model
CN114090601B (en) Data screening method, device, equipment and storage medium
CN115099875A (en) Data classification method based on decision tree model and related equipment
CN109145115B (en) Product public opinion discovery method, device, computer equipment and storage medium
US20230377004A1 (en) Systems and methods for request validation
US20240095598A1 (en) Data processing methods and computer systems for wavelakes signal intelligence
CN117172632B (en) Enterprise abnormal behavior detection method, device, equipment and storage medium

Legal Events

Date Code Title Description
AS Assignment

Owner name: OPERA SOLUTIONS, LLC, NEW JERSEY

Free format text: ASSIGNMENT OF ASSIGNORS INTEREST;ASSIGNORS:MILANA, JOSEPH;CHEN, YONGHUI;CHEN, LUJIA;AND OTHERS;SIGNING DATES FROM 20140423 TO 20140527;REEL/FRAME:032983/0836

AS Assignment

Owner name: OPERA SOLUTIONS U.S.A., LLC, NEW JERSEY

Free format text: ASSIGNMENT OF ASSIGNORS INTEREST;ASSIGNOR:OPERA SOLUTIONS, LLC;REEL/FRAME:039089/0761

Effective date: 20160706

AS Assignment

Owner name: WHITE OAK GLOBAL ADVISORS, LLC, CALIFORNIA

Free format text: SECURITY AGREEMENT;ASSIGNORS:OPERA SOLUTIONS USA, LLC;OPERA SOLUTIONS, LLC;OPERA SOLUTIONS GOVERNMENT SERVICES, LLC;AND OTHERS;REEL/FRAME:039277/0318

Effective date: 20160706

AS Assignment

Owner name: OPERA SOLUTIONS OPCO, LLC, CALIFORNIA

Free format text: TRANSFER STATEMENT AND ASSIGNMENT;ASSIGNOR:WHITE OAK GLOBAL ADVISORS, LLC;REEL/FRAME:047276/0107

Effective date: 20181010

STPP Information on status: patent application and granting procedure in general

Free format text: FINAL REJECTION MAILED

STCB Information on status: application discontinuation

Free format text: ABANDONED -- FAILURE TO RESPOND TO AN OFFICE ACTION

AS Assignment

Owner name: ELECTRIFAI, LLC, NEW JERSEY

Free format text: CHANGE OF NAME;ASSIGNOR:OPERA SOLUTIONS OPCO, LLC;REEL/FRAME:057047/0300

Effective date: 20191209