WO2004097587A2 - Method and system for evaluation fit of raw data to model data - Google Patents


Info

Publication number
WO2004097587A2
Authority
WO
WIPO (PCT)
Prior art keywords
item
assessment
examinee
mastery
class
Prior art date
Application number
PCT/US2004/013397
Other languages
French (fr)
Other versions
WO2004097587A3 (en)
Inventor
William F. Stout
Sarah M. Hartz
Louis Roussos
Original Assignee
Educational Testing Service
Priority date
Filing date
Publication date
Application filed by Educational Testing Service
Publication of WO2004097587A2
Publication of WO2004097587A3

Classifications

    • G: PHYSICS
    • G09: EDUCATION; CRYPTOGRAPHY; DISPLAY; ADVERTISING; SEALS
    • G09B: EDUCATIONAL OR DEMONSTRATION APPLIANCES; APPLIANCES FOR TEACHING, OR COMMUNICATING WITH, THE BLIND, DEAF OR MUTE; MODELS; PLANETARIA; GLOBES; MAPS; DIAGRAMS
    • G09B7/00: Electrically-operated teaching apparatus or devices working with questions and answers

Definitions

  • FIG. 1 depicts an exemplary probabilistic structure for an item response function according to an embodiment of the present invention.
  • FIG. 2 depicts an exemplary hierarchical Bayesian model for the examinee parameters according to an embodiment of the present invention.
  • FIG. 3 depicts an exemplary hierarchical Bayesian model according to an embodiment of the present invention.
  • FIG. 4 depicts a flow chart for an exemplary method for generating examinee statistics according to an embodiment of the present invention.
  • FIG. 5 depicts a flow chart for an exemplary method for generating assessment item statistics according to an embodiment of the present invention.
  • FIG. 6 depicts a flow chart for an exemplary method for generating coordinated examinee and assessment item statistics according to an embodiment of the present invention.
  • FIG. 7 depicts a flow chart for an exemplary method for optimizing mastery thresholds according to an embodiment of the present invention.
  • FIG. 8 depicts a flow chart of the interrelationship between the exemplary methods according to an embodiment of the present invention.
  • The present invention relates to a method and system for evaluating the fit of raw data to model data.
  • Item response functions may be used to determine the probability that an examinee will answer an assessment item correctly. The item response function of the present invention further models whether an examinee has mastered the attributes associated with an item; an examinee that has mastered more evaluated attributes may be expected to have a higher likelihood of having mastered non-evaluated attributes than an examinee that has mastered fewer.
  • In the item response function, α_jk denotes the cognitive attributes k specified for examinee j, and θ_j denotes a unidimensional projection of examinee j's ability to perform skills outside of the Q matrix attributes.
  • The present invention uses a Bayesian model to estimate parameters. A Bayesian model may permit flexibility in parameter relationships and simplify estimation procedures, and may use hierarchical priors. A prior distribution refers to information about a parameter that is available before the data are observed; estimated values for the unknown variables are determined predominantly by the evaluation data and not by the prior distributions.
  • Bayesian priors for the parameters are added to the model, which define relationships between variables. Non-informative priors may be constructed for the parameters; for example, the correlations may have a uniform prior over some interval, and a hyperparameter κ_k may be used.
  • A Beta distribution on the unit interval may be used to represent each of the parameters. The parameters of a Beta distribution are a and b, which define the shape of the distribution; the mean of the Beta distribution is a/(a + b). The parameters of each Beta distribution may be re-formulated in terms of the mean.
  • The parameters may be estimated by an MCMC (Markov Chain Monte Carlo) simulation-based computational approach. The MCMC approach may permit flexibility in inferring stochastic relationships among parameters, and the MCMC algorithm may be implemented by Metropolis-Hastings-within-Gibbs sampling.
  • The MCMC simulation may produce a "chain" of random numbers whose distribution is approximately equal to the required posterior distribution. MCMC may simulate a Markov Chain in a parameter space S of the model; estimating the posterior distribution entails performing a series of draws of S. The Metropolis-Hastings algorithm may perform the draws, with the jumping distribution J(θ → θ') chosen to be symmetric.
  • The Gibbs sampler, also known as "alternating conditional sampling," may partition the parameters θ into item parameters and examinee parameters. Sampling the hyperparameters may include drawing candidate values, and the likelihoods may be calculated using a model-based relationship.
  • The correlations ρ_k may have a uniform prior on (0, 1). The covariance matrix is constrained to be a positive definite matrix; AA^T is a positive definite matrix whose diagonal elements are one and whose off-diagonal elements are positive numbers less than one.
  • Although the Gibbs step for c is separate from the Gibbs steps for π* and r* (steps 4 and 5, respectively), the structures for the two Gibbs steps may be nearly identical. The candidate parameters are drawn as (π*)' ~ N((π*)^(t-1), σ_π), (r*)' ~ N((r*)^(t-1), σ_r), and c' ~ N(c^(t-1), σ_c), where σ_π, σ_r, and σ_c are determined experimentally. The likelihoods are calculated by computing P(item i responses | parameters), with Beta distributions using the hyperparameters from step t.
  • The estimation software is referred to herein as the Arpeggio software application. The methodology may produce multiple Markov Chain runs with imputed values of various parameters and summary files.
  • A software application, referred to herein as the IMStats software application, may provide examinee statistics. FIG. 4 depicts a flow chart of the operation of the IMStats software application, which may receive the Q matrix {q_ik}, the response matrix {X_ij}, and the mastery estimate matrix {α_jk}. Examinees may be grouped into mastery classes; for example, four types of mastery classes may be created for each item, where the non-masters of an item are a superset containing all high and low non-masters. In an embodiment, the IMStats software application may not compute one or more of the classes. After the examinees are assigned to each class for an item, the proportion of masters of item i answering correctly may be outputted; the proportions may be outputted for each class.
  • The present invention may include a software application referred to herein as the EMStats software application. FIG. 5 depicts a flow chart of the operation of the EMStats software application, which may receive the Q matrix {q_ik}, the response matrix, allowability limits, and criterion levels. The allowability limits may be equal to the minimum number of responses required for an item class. Four item classes may be created for each examinee: 1) "mastered items" (those items for which the examinee has mastered all attributes), 2) "non-mastered items" (a superset containing the remaining classes), 3) "high non-mastered items," and 4) "low non-mastered items."
  • A group of allowable examinees may be determined using the allowability limits for each item class. For each allowable examinee n = 1, ..., N and for each of the item classes, the proportion of items in the class that the allowable examinee answered correctly may be computed. A hypothesis test may be performed on each examinee in each allowable examinee group, and a list of the examinees in each group who were rejected by the hypothesis test may be generated. The EMStats software application may output one or more of the average proportion of items answered correctly, the individual proportions, and the rejection lists.
  • The present invention may include a software application referred to herein as the FusionStats software application. FIG. 6 depicts a flow chart of the operation of the FusionStats software application, which may receive the Q matrix {q_ik}, the response matrix, and parameter estimates. A residual examinee score may be computed, and a residual item score may be computed by subtracting the observed item score from the predicted item score; the residual score may be the absolute value of the difference. The predicted, observed, and residual scores may be outputted for each examinee and each item.
  • The present invention may include a software application referred to herein as the GAMEStats software application. The GAMEStats software may utilize a genetic algorithm to determine how to optimize the mastery settings and the fit between the model and the data. In the maximization criterion, p̄_nmh corresponds to the average proportion correct for item high non-masters and p̄_nml corresponds to the average proportion correct for item low non-masters. FIG. 7 is a flow chart of the GAMEStats software application, which may receive the Q-matrix and an item response data set; its output may include, for example, the top 100 mastery settings with a value for each.
  • FIG. 8 depicts a flow chart of the operation of the above-described software applications in combination. Inputs to the classifier may include one or more of the files described above, and the output of the classifier may include the item parameters (π*, r*, c) and the mastery estimates. The GAMEStats software application may be performed first to converge the mastery settings; IMStats, EMStats, and FusionStats may then be performed. Upon completion, a summary fit report may be generated.
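The Metropolis step described above can be sketched generically. The target density, proposal scale, and all names below are illustrative assumptions, not the patent's model; the sketch only shows how a symmetric proposal reduces the acceptance ratio to a ratio of target densities:

```python
import math
import random

def metropolis_chain(log_target, start, scale, n_steps, seed=0):
    """Draw a Markov chain from `log_target` using a symmetric normal proposal.

    Because the proposal is symmetric, the Metropolis-Hastings acceptance
    probability reduces to min(1, target(candidate) / target(current)).
    """
    rng = random.Random(seed)
    current = start
    chain = [current]
    for _ in range(n_steps):
        candidate = current + rng.gauss(0.0, scale)
        # Accept with probability min(1, exp(log_target(cand) - log_target(curr)))
        if math.log(rng.random()) < log_target(candidate) - log_target(current):
            current = candidate
        chain.append(current)
    return chain
```

For a standard normal target (log density -x²/2 up to a constant), the chain's sample mean should settle near zero.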

Abstract

Methods and systems for evaluating the fit of raw data to model data are disclosed. Relationships between an assessment item and a tested attribute, responses from examinees to an assessment item, mastery states for an examinee for a tested attribute, one or more parameters based on expected assessment item performance, estimates for non-tested attributes for each examinee, likelihoods that an examinee that has mastered attributes pertaining to an assessment item will answer the item correctly, likelihoods that an examinee that has not mastered an attribute for an assessment item will answer the item correctly, and/or other variables may be received. For each item and/or for each examinee, a class for the examinee and/or item may be determined. Statistics may also be generated for each examinee, each item, an examination and/or any other basis.

Description

METHOD AND SYSTEM FOR EVALUATION FIT OF RAW DATA TO MODEL
DATA
CLAIM OF PRIORITY [0001] This application claims priority to U.S. Provisional Patent Application Serial
No. 60/466,319, filed April 29, 2003, entitled "Method for Evaluation Fit of Raw Data to
Model Data," and pending U.S. Patent Application Serial No. 09/838,129, filed April 20, 2001, entitled "A Latent Property Diagnosing Procedure," each of which is incorporated
herein by reference in its entirety.
TECHNICAL FIELD
[0002] The present invention relates generally to the field of assessment evaluation.
In particular, the invention relates to providing a method and system for evaluating
assessment examinees on a plurality of attributes based on responses to assessment items,
evaluating assessment items based on their ability to determine one or more attributes of an
assessment examinee, and evaluating an assessment examination based on the responses of
assessment examinees.
BACKGROUND
[0003] Standardized testing is prevalent in the United States today. Such testing is
used for higher education entrance examinations and achievement testing at the primary and
secondary school levels. The prevalence of standardized testing in the United States has been
further bolstered by the No Child Left Behind Act of 2001, which emphasizes nationwide
test-based assessment of student achievement.
[0004] At the same time, standardized testing is accused of a variety of failings. One criticism of standardized testing is that it can only assess a student's abilities generally, but cannot adequately determine whether a student has mastered a particular ability or not. Accordingly, standardized testing is seen as inadequate in assisting teachers with developing
a level of mastery for a student in all subject matters.
[0005] Because of this limitation, cognitive modeling methods, also known as skills
assessment or skills profiling, have been developed for assessing students' abilities.
Cognitive diagnosis statistically analyzes the process of evaluating each examinee on the
basis of the level of competence on an array of skills and using this evaluation to make relatively fine-grained categorical teaching and learning decisions about each examinee.
Traditional educational testing, such as the use of an SAT score to determine overall ability,
performs summative assessment. In contrast, cognitive diagnosis performs formative
assessment, which partitions answers for an assessment examination into fine-grained (often
discrete or dichotomous) cognitive skills or abilities in order to evaluate an examinee with
respect to his level of competence for each skill or ability. For example, if a designer of an
algebra test is interested in evaluating a standard set of algebra attributes, such as factoring,
laws of exponents, quadratic equations, and the like, cognitive diagnosis attempts to evaluate
each examinee with respect to each such attribute, whereas summative analysis simply
evaluates each examinee with respect to an overall score on the algebra test.
[0006] One assumption of all cognitive diagnosis models is that the assessment
items (i = 1, ..., I) relate to a set of cognitive attributes (k = 1, ..., K) in a particular manner.
The relationships between assessment items and cognitive attributes are generally represented
in a matrix of size I × K having values Q = {q_ik}, where q_ik = 1 when attribute k is
required by item i and q_ik = 0 when attribute k is not required by item i.
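The Q matrix convention in paragraph [0006] can be illustrated with a short Python sketch. The 3-item, 3-attribute matrix and the helper function below are hypothetical examples, not part of the disclosed method:

```python
# Hypothetical Q matrix: rows are assessment items, columns are cognitive
# attributes; Q[i][k] = 1 means attribute k is required by item i.
Q = [
    [1, 0, 1],  # item 0 requires attributes 0 and 2
    [0, 1, 0],  # item 1 requires attribute 1 only
    [1, 1, 1],  # item 2 requires all three attributes
]

def required_attributes(Q, item):
    """Return the indices of the attributes required by the given item."""
    return [k for k, q in enumerate(Q[item]) if q == 1]
```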
[0007] Using the Q matrix representation, conventional approaches to cognitive
diagnosis were developed to either (1) diagnose examinees, by assigning mastery or non-
mastery of each attribute to each examinee, without determining the cognitive structure of the exam, or (2) cognitively evaluate the exam, by statistically evaluating the relationships
between the items and the attributes, without diagnosing the cognitive abilities of the
examinees. If items are not cognitively evaluated, a cognitive diagnosis has little meaning except with relation to the particular assessment examination. Likewise, if cognitive
diagnosis is not performed on examinees, a cognitive evaluation of the items cannot be aligned with the observed examinee response data. As a result, the interpretation of an
evaluation of the assessment item does not relate to examinees. Moreover, in order to be
useful for evaluation purposes, the parameters of a cognitive diagnosis model must be statistically identifiable. No conventional method incorporates all of these requirements.
[0008] What is needed is a method and system for performing cognitive diagnosis
that evaluates both assessment items and assessment examinees using statistically identifiable
parameters.
[0009] A further need exists for a method and system for evaluating assessment
examinees with respect to a plurality of attributes based on responses to assessment items.
[0010] A further need exists for a method and system for evaluating whether
assessment items assist in determining one or more attributes of an assessment examinee.
[0011] A still further need exists for a method and system for evaluating an
assessment examination based on the responses of assessment examinees.
[0012] The present invention is directed to solving one or more of the problems
described above.
SUMMARY [0013] Before the present methods, systems and materials are described, it is to be understood that this invention is not limited to the particular methodologies, systems and
materials described, as these may vary. It is also to be understood that the terminology used in the description is for the purpose of describing the particular versions or embodiments only, and is not intended to limit the scope of the invention which will be limited only by the
appended claims.
[0014] It must also be noted that as used herein and in the appended claims, the
singular forms "a," "an," and "the" include plural references unless the context clearly dictates otherwise. Thus, for example, reference to an "assessment item" is a reference to one
or more assessment items and equivalents thereof known to those skilled in the art, and so
forth. Unless defined otherwise, all technical and scientific terms used herein have the same
meanings as commonly understood by one of ordinary skill in the art. Although any methods,
materials, and devices similar or equivalent to those described herein can be used in the
practice or testing of embodiments of the invention, the preferred methods, materials, and
devices are now described. All publications mentioned herein are incorporated by reference.
Nothing herein is to be construed as an admission that the invention is not entitled to antedate
such disclosure by virtue of prior invention.
[0015] In an embodiment, a method for evaluating the fit of raw data to model data
includes receiving a plurality of associations, wherein each association pertains to a
relationship between one of a plurality of assessment items for an assessment examination
and one of a plurality of attributes, receiving a plurality of responses, wherein each response
pertains to an answer by one of a plurality of examinees to one of the plurality of assessment
items, receiving a plurality of mastery estimates, wherein each mastery estimate represents
whether one of the plurality of examinees has mastered one of the plurality of attributes, determining whether an examinee falls into at least one of a plurality of examinee classes for
an assessment item based upon the mastery estimates for the examinee and the associations
for the assessment item, generating one or more statistics, wherein each statistic is based on one or more of the associations, responses and mastery estimates, and outputting at least one of the one or more statistics. The one or more statistics may include a percentage of correct
answers for an assessment item for each of the plurality of examinee classes associated with the assessment item and/or a percentage of correct answers for the assessment examination
for each of the plurality of examinee classes.
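The percentage-of-correct-answers statistic described in the paragraph above can be sketched as follows, assuming dichotomous 0/1 responses; the function and variable names are illustrative, not from the patent:

```python
def percent_correct(responses, examinees_in_class):
    """Percentage of correct (1) answers to one item among the examinees in a class.

    responses: dict mapping examinee id -> 0/1 answer to the item.
    examinees_in_class: iterable of examinee ids belonging to the class.
    """
    answers = [responses[e] for e in examinees_in_class if e in responses]
    if not answers:
        return 0.0
    return 100.0 * sum(answers) / len(answers)
```

The same helper applies whether the class is the master class, a non-master class, or the whole examination pool.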
[0016] The plurality of examinee classes may include a master class associated with an assessment item. The master class includes all examinees having a mastery estimate for
each attribute associated with the assessment item. The plurality of examinee classes may
further include a high non-master class associated with an assessment item, and a low non-
master class associated with the assessment item. The high non-master class includes all
examinees having a mastery estimate for at least one-half of all attributes, but not all
attributes, associated with the assessment item. The low non-master class comprises all
examinees having a mastery estimate for less than one-half of all attributes associated with
the assessment item. The one or more statistics may include a difference between a
percentage of correct answers for the assessment examination for the master class for an
assessment item and the percentage of correct answers for the assessment examination for the
low non-master class for the assessment item, a difference between a percentage of correct
answers for the assessment examination for the master classes for all assessment items and
the percentage of correct answers for the assessment examination for the low non-master
classes for all assessment items, a difference between a percentage of correct answers for the
assessment examination for the master class for an assessment item and the percentage of
correct answers for the assessment examination for the high non-master class for the
assessment item, and/or a difference between a percentage of correct answers for the
assessment examination for the master classes for all assessment items and the percentage of correct answers for the assessment examination for the high non-master classes for all
assessment items. [0017] The plurality of examinee classes may further include a non-master class.
The non-master class includes all examinees not having a mastery estimate for at least one
attribute associated with an assessment item. The one or more statistics include a difference
between a percentage of correct answers for the assessment examination for the master class for an assessment item and the percentage of correct answers for the assessment examination
for the non-master class for the assessment item and/or a difference between a percentage of
correct answers for the assessment examination for the master classes for all assessment items and the percentage of correct answers for the assessment examination for the non-master
classes for all assessment items.
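The examinee classes defined in the two preceding paragraphs (master, high non-master, low non-master, with non-master covering both non-master groups) can be sketched as follows; the function and variable names are chosen for illustration:

```python
def classify_examinee(mastery, item_attrs):
    """Assign an examinee to a class for one item based on mastery estimates.

    mastery: set of attribute ids the examinee is estimated to have mastered.
    item_attrs: set of attribute ids the item requires (from the Q matrix).
    Returns 'master', 'high non-master', or 'low non-master'; the plain
    non-master class is the union of the two non-master results.
    """
    mastered = len(mastery & item_attrs)
    total = len(item_attrs)
    if mastered == total:
        return "master"
    if mastered >= total / 2:  # at least one-half, but not all, attributes
        return "high non-master"
    return "low non-master"
```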
[0018] In an embodiment, a method for evaluating the fit of raw data to model data
includes receiving a plurality of associations, wherein each association pertains to a
relationship between one of a plurality of assessment items for an assessment examination
and one of a plurality of attributes, receiving a plurality of responses, wherein each response
pertains to an answer by one of a plurality of examinees to one of the plurality of assessment
items, receiving a plurality of mastery estimates, wherein each mastery estimate represents
whether one of the plurality of examinees has mastered one of the plurality of attributes,
receiving one or more parameters based on an expected assessment item performance,
determining whether an item for an examinee falls into at least one of a plurality of item
classes based upon the mastery estimates for the examinee and the associations for the
assessment item, generating one or more statistics, wherein each statistic is based on one or
more of the associations, responses, mastery estimates and parameters, and outputting at least
one of the one or more statistics. In an embodiment, the method may further include
receiving one or more allowability limits. Each allowability limit defines a threshold number
of items for each of a plurality of item classes required for an examinee to be an allowable examinee. In an embodiment, the determining step may include defining a plurality of item types, determining a list of allowable examinees for each item class based on the allowability
limit and, for each allowable examinee for an item class, computing a proportion of items in the item class that the allowable examinee answered correctly. In an embodiment, the
determining step further includes computing the average proportion of items for all allowable
examinees for each item class. In an embodiment, the determining step further includes
performing a binomial hypothesis test on each allowable examinee, and determining whether
the allowable examinee meets the criterion level for the item class.
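One way to realize the binomial hypothesis test on an allowable examinee's proportion correct is sketched below. The lower-tail form of the test, the default significance level, and all names are illustrative assumptions, not the patent's specification:

```python
from math import comb

def meets_criterion(correct, n, p0, alpha=0.05):
    """One-sided exact binomial test of an examinee's item-class performance.

    correct: number of items in the class answered correctly.
    n: number of items in the class.
    p0: criterion level (expected proportion correct).
    Returns True when the examinee meets the criterion, i.e. the lower-tail
    p-value P(X <= correct) under Binomial(n, p0) is not small enough to
    reject the null hypothesis.
    """
    p_value = sum(comb(n, x) * p0**x * (1 - p0) ** (n - x) for x in range(correct + 1))
    return p_value >= alpha
```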
[0019] The one or more statistics may include one or more results of the binomial hypothesis tests, a list of allowable examinees that did not meet the criterion level for the item
class, a proportion correct for each allowable examinee, and/or an average proportion correct
for all allowable examinees for an item class. The plurality of item classes may include a
master class associated with an examinee. The master class comprises all items for which an
examinee has a mastery estimate for each attribute associated with an assessment item. In an
embodiment, the plurality of item classes may further include a high non-master class
associated with an examinee, and a low non-master class associated with an examinee. The
high non-master class comprises all items for which an examinee has a mastery estimate for
at least one-half of all attributes, but not all attributes, associated with an assessment item.
The low non-master class comprises all items for which an examinee has a mastery estimate
for less than one-half of all attributes associated with an assessment item. In an embodiment, the plurality of item classes further comprises a non-master class. The non-master class
comprises all items for which an examinee does not have a mastery estimate for at least one
attribute associated with an assessment item.
[0020] In an embodiment, a method for evaluating the fit of raw data to model data
including receiving a plurality of associations, wherein each association pertains to a
relationship between one of a plurality of assessment items for an assessment examination and one of a plurality of attributes, receiving a plurality of responses, wherein each response
pertains to an answer by one of a plurality of examinees to one of the plurality of assessment
items, receiving a plurality of mastery estimates, wherein each mastery estimate represents
whether one of the plurality of examinees has mastered one of the plurality of attributes,
receiving a plurality of proficiency estimates for non-tested attributes for each examinee,
receiving a plurality of item parameter estimates for each item, generating one or more
statistics, wherein each of the statistics is based on one or more of the associations, responses,
mastery probabilities, proficiency estimates, and item probabilities, and outputting at least one
of the one or more statistics. Receiving the plurality of item parameter estimates may include receiving
a plurality of first probabilities, wherein each first probability is a measure of a likelihood that
an examinee that has mastered the attributes pertaining to an assessment item will answer the
assessment item correctly, receiving a plurality of second probabilities, wherein each second
probability is a measure of a likelihood that an examinee that has not mastered an attribute pertaining to an assessment item will answer the assessment item correctly, and receiving a
plurality of weights, wherein each weight is a measure of the relevance of the plurality of
proficiency estimates for an examinee. Generating the one or more statistics may include, for each examinee and each item, determining a probability that the examinee answered the item
correctly based on the associations, mastery estimates, proficiency estimates and item
parameter estimates. The one or more statistics may further include, for each examinee, one
or more of an observed score computed by summing the responses by the examinee, a
predicted score computed by summing the probabilities associated with the examinee, and a residual score computed by subtracting the observed score from the predicted score. The one
or more statistics may further include, for each item, one or more of an observed score
computed by summing the responses to the item, a predicted score computed by summing the probabilities associated with the item, and a residual score computed by subtracting the observed score from the predicted score.
[0021] In an embodiment, a method for evaluating the fit of raw data to model data
may include receiving a plurality of associations, wherein each association pertains to a
relationship between one of a plurality of assessment items for an assessment examination
and one of a plurality of attributes, receiving a plurality of responses, wherein each response
pertains to an answer by one of a plurality of examinees to one of the plurality of assessment items, receiving a plurality of mastery estimates files, wherein each mastery estimate files
contains a set of mastery estimates, wherein each mastery estimate represents whether one of
the plurality of examinees has mastered one of the plurality of attributes, receiving a plurality of mastery parameters, determining whether each examinee falls into at least one of a
plurality of examinee classes for an assessment item based upon the mastery probabilities for
the examinee and the associations for the assessment item, optimizing a plurality of mastery
thresholds, generating one or more statistics, wherein each statistic is based on one or more of
the associations, responses, mastery probabilities and mastery parameters, wherein the one or
more statistics comprise one or more mastery thresholds, and outputting at least one of the
one or more statistics. In an embodiment, optimizing a plurality of mastery thresholds may
include, for each examinee class, performing an algorithm using the associations, responses,
mastery estimates and mastery parameters to obtain a result, calculating maximization criteria
based on the results for each examinee class, applying the maximization criteria to the mastery thresholds, and repeating the performing, calculating and applying steps until the mastery
thresholds converge within a threshold range. The plurality of examinee classes may include
a master class, wherein the master class comprises all examinees having a mastery state for each attribute associated with an assessment item, a high non-master class, wherein the high
non-master class comprises all examinees having a mastery state for at least one-half of all attributes, but not all attributes, associated with an assessment item, and a low non-master
class, wherein the low non-master class comprises all examinees having a mastery state for
less than one-half of all attributes associated with an assessment item.
[0022] In an embodiment, the maximization criteria comprise maximizing the average of (i) the difference between the average proportion correct for the master class and the average proportion correct for the high non-master class and (ii) the difference between the average proportion correct for the master class and the average proportion correct for the low non-master class. The one or more statistics may include one or more of the maximization
criteria for a group of mastery thresholds, the average proportion correct for the master class,
the average proportion correct for the high non-master class, and the average proportion
correct for the low non-master class.
[0023] In an embodiment, a system for evaluating the fit of raw data to model data
includes a processor and a computer-readable storage medium operably connected to the
processor for performing one or more of the methods described above.
BRIEF DESCRIPTION OF THE DRAWINGS
[0024] Aspects, features, benefits and advantages of the embodiments of the present invention will be apparent with regard to the following description, appended claims and
accompanying drawings where:
[0025] FIG. 1 depicts an exemplary probabilistic structure for an item response
according to an embodiment of the present invention.
[0026] FIG. 2 depicts an exemplary hierarchical Bayesian model for the examinee parameters according to an embodiment of the present invention.
[0027] FIG. 3 depicts an exemplary hierarchical Bayesian model according to an
embodiment of the present invention. [0028] FIG. 4 depicts a flow chart for an exemplary method for generating examinee
statistics according to an embodiment of the present invention.
[0029] FIG. 5 depicts a flow chart for an exemplary method for generating
assessment item statistics according to an embodiment of the present invention.
[0030] FIG. 6 depicts a flow chart for an exemplary method for generating coordinated examinee and assessment item statistics according to an embodiment of the
present invention.
[0031] FIG. 7 depicts a flow chart for an exemplary method for optimizing mastery
threshold parameters according to an embodiment of the present invention.
[0032] FIG. 8 depicts a flow chart of the interrelationship between the exemplary
methods described in FIGs. 4-7.
DETAILED DESCRIPTION [0033] The present invention relates to a method and system for evaluating
assessment items, assessment examinations and examinees using statistically identifiable
parameters.
[0034] Item response functions may be used to determine the probability that an
examinee correctly answers a particular assessment item based on the examinee's mastery of
one or more attributes evaluated with respect to an assessment examination. The item
response function of the present invention further models whether an examinee has mastered
non-evaluated attributes that are relevant to correctly answering a particular assessment item.
In particular, an examinee that has mastered more evaluated attributes may be expected to have a higher likelihood of having mastered non-evaluated attributes than an examinee that
has mastered fewer evaluated attributes. This is particularly the case for a cognitive test where the evaluated abilities are dimensionally close to one another (i.e., the attributes are not
statistically independent of each other).
[0035] In an embodiment, the item response function may be as follows:

P(X_ij = 1 | α_j, θ_j) = π_i* · ∏_{k=1}^{K} (r_ik*)^((1 − α_jk)·q_ik) · P_{c_i}(θ_j)

where:

α_j = the cognitive attributes k specified for examinee j;

θ_j = a unidimensional projection of examinee j's ability to perform skills outside of the Q matrix attributes;

π_i* = ∏_{k=1}^{K} (π_ik)^{q_ik} = the baseline probability that an examinee that has mastered all attributes for a particular item i will correctly apply all the attributes when solving item i;

r_ik* = P(Y_ijk = 1 | α_jk = 0) / P(Y_ijk = 1 | α_jk = 1) = the baseline probability that an examinee lacking an attribute k that is required for a particular item i will correctly answer item i, relative to an examinee that has mastered attribute k;

Y_ijk = 1 when examinee j correctly applies attribute k to item i; and

c_i = the amount the item response function relies on θ_j through the term P_{c_i}(θ_j), after accounting for the attributes in the Q matrix (0 < c_i < 3).
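For illustration only, the item response function may be sketched in Python. All names are illustrative, and the residual-ability term P_c(θ) is assumed here to take a logistic (Rasch-type) form; that functional form is an assumption of this sketch, not a requirement of the specification:

```python
import math

def rum_probability(alpha, q, pi_star, r_star, theta, c):
    """Probability that an examinee answers an item correctly (illustrative).

    alpha   : list of 0/1 mastery indicators for the examinee
    q       : list of 0/1 Q-matrix entries for the item
    pi_star : baseline probability for an examinee mastering all required attributes
    r_star  : list of baseline probability ratios r*_ik, one per attribute
    theta   : residual ability outside the Q-matrix attributes
    c       : reliance of the item on theta (0 < c < 3)
    """
    p = pi_star
    # Each attribute that is required (q_ik = 1) but not mastered (alpha_jk = 0)
    # multiplies in its penalty ratio r*_ik.
    for a, qk, rk in zip(alpha, q, r_star):
        p *= rk ** ((1 - a) * qk)
    # Residual-ability term P_c(theta), assumed logistic for this sketch.
    p *= 1.0 / (1.0 + math.exp(-1.7 * (theta + c)))
    return p
```

Lowering any required attribute from mastered to non-mastered scales the probability by the corresponding r*_ik, matching the multiplicative structure of the equation above.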
[0036] In an embodiment, the present invention uses a Bayesian model to estimate
unknown parameters. A Bayesian model may permit flexibility in parameter relationships and simplify estimation procedures. In addition, the Bayesian model may use hierarchical
relationships and correlations between attributes. While conventional Bayesian networks do
not allow probabilistic relationships to restrict conditional probabilities, the probability
structure of interest in the present invention may be combined with relationships between the
attributes by using hierarchical Bayesian modeling.
[0037] In constructing a Bayesian model, the prior distributions and their
parameters, as well as the prior distributions of the parameters, are constructed so that the
estimated values for the unknown variables are determined predominantly by the evaluation data and not by the prior distributions. A prior distribution (or "prior") refers to information
available before (and in addition to) information coming from the collected data.
[0038] FIG. 1 depicts an exemplary probabilistic structure for an item response
according to an embodiment of the present invention. As shown in FIG. 1, the item response
for an item i performed by an examinee j is dependent on examinee parameters α_j (examinee j's attributes) and θ_j (examinee j's residual ability), and item parameters π_i* (item difficulty for item i), r_ik* (attribute discrimination for attributes 1...K for item i), and c_i (item difficulty not associated with attributes 1...K). To construct a Bayesian model from the probabilistic
structure in FIG. 1 , Bayesian priors for the parameters are added to the model, which define relationships between variables. In addition, non-informative priors may be constructed for
the unknown relationships or distributions so that the data reveals detailed aspects of the
relationships between the variables.
[0039] The continuous mastery parameters ã_jk for k = 1, ..., K may have standard normal priors that are used to generate the dichotomous α_jk, k = 1, ..., K. Likewise, θ_j may have a standard normal prior. Since the examinee attributes have a positive correlation, (ã_j, θ_j) ~ N(0, Σ), where Σ = {ρ_k1k2} has 1's on the diagonal for the marginal variances of the attributes and the non-negative pairwise correlations between (ã_j, θ_j) as the off-diagonal elements. Correlations between attributes may be defined as hyperparameters. The correlations may have a uniform prior over some interval, ρ_k1k2 ~ Unif(a, b), where 0 < a < b < 1. The hyperparameter κ_k may be used as a 'cutoff' for mastery of attribute k: when ã_jk > κ_k, α_jk = 1, and the examinee may be considered to have mastered the attribute. FIG. 2 depicts an exemplary hierarchical Bayesian model for the examinee parameters according to an embodiment of the present invention.

[0040] A Beta distribution on the unit interval may be used to represent each of the item parameters π_i*, r_ik*, and c_i/3 (c_i ∈ (0, 3)). Each Beta distribution may use two parameters, a and b, which define the shape of the distribution. The mean of the Beta distributions is μ = a/(a + b) and the variance is σ² = μ(1 − μ)/(1 + a + b). The following priors may be assigned: π_i* ~ β(a_π, b_π), r_ik* ~ β(a_r, b_r), and c_i/3 ~ β(a_c, b_c). The hyperparameters are given Uniform priors over a restricted range to permit flexibility in the shape of the distribution. The (a, b) parameters of each Beta distribution may be re-formulated into the mean μ and an inverse measure of the spread s = a + b (for constant μ, σ² = μ(1 − μ)/(1 + s) is inversely proportional to (1 + s)). FIG. 3 depicts an exemplary hierarchical Bayesian model according to an embodiment of the present invention, incorporating each of the item and examinee parameters, hyperparameters and priors.
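For illustration, the re-formulation between the (a, b) shape parameters and the (μ, s) mean and inverse-spread parameters may be sketched as follows (function names are illustrative):

```python
def beta_mean_spread(a, b):
    """Convert Beta(a, b) shape parameters to mean, inverse-spread, and variance."""
    s = a + b
    mu = a / s
    var = mu * (1.0 - mu) / (1.0 + s)   # variance expressed through mu and s
    return mu, s, var

def beta_shape(mu, s):
    """Recover the (a, b) shape parameters from the mean and s = a + b."""
    return mu * s, (1.0 - mu) * s
```

Holding μ fixed and increasing s shrinks the variance, which is why s acts as an inverse measure of the spread.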
[0041] The use of complex Bayesian models with many parameters has become a
reasonable foundation for practical statistical inference because of the Markov Chain Monte
Carlo (MCMC) simulation-based computational approach. MCMC statistically analyzes data
sets produced by Bayesian models by bypassing the computation of complicated posterior
distributions of the parameters. A Markov Chain Monte Carlo (MCMC) approach to data
analysis may be used to perform cognitive diagnosis of the Bayesian model described above.
The MCMC approach may also permit flexibility in inferring stochastic relationships among
the model parameters and in easily incorporating additional parameters into the model. In an
embodiment, the MCMC algorithm may be implemented by the Metropolis-Hastings within
Gibbs sampler algorithm. The Metropolis-Hastings within Gibbs sampler algorithm is one of numerous variations of the MCMC approach. Accordingly, the invention disclosed herein is not limited to this algorithm, but is meant to encompass all MCMC algorithms.
[0042] In the case of MCMC, the simulated random numbers of the Markov Chain
are probabilistically dependent. The MCMC simulation may produce a "chain" of random
numbers whose steady state probability distribution is the desired posterior distribution. If the
chain is run for a sufficient number of iterations, the observed distribution of its simulated
random numbers may be approximately equal to the required posterior distribution.
[0043] MCMC may simulate a Markov Chain in a parameter space S of the statistical model that converges to p(ϑ | data), the posterior distribution of the parameter vector ϑ given the data. From the Bayesian perspective, all statistical inference depends upon the estimation of the posterior distribution p(ϑ | data). In a MCMC-based statistical analysis, a Markov model having a stationary distribution equal to the posterior distribution p(ϑ | data) may be created and simulated. Estimating the posterior distribution entails performing a series of draws of ϑ from a particular probability distribution, where ϑ^t represents draw t of the Markov Chain. A sufficient number of draws may be required to accurately estimate the stationary distribution.

[0044] The Metropolis-Hastings algorithm is designed to estimate the stationary distribution using a transition distribution for the current state ϑ^t depending on the previous state ϑ^(t−1). At each step t, the Metropolis-Hastings algorithm stochastically decides whether to remain at the previous state or move to a new state. The candidate new state ϑ* is drawn from the user-defined probability distribution J_t(ϑ*, ϑ^(t−1)), representing the probability of drawing ϑ* as the candidate parameter given that ϑ^(t−1) was the last step of the Markov Chain. [0045] In an embodiment, the Metropolis-Hastings algorithm may perform the
following steps:
1) Sample ϑ* from N(ϑ^(t−1), Σ), for a suitable covariance matrix Σ;

2) Calculate

r = [P(ϑ* | data) / J_t(ϑ* | ϑ^(t−1))] / [P(ϑ^(t−1) | data) / J_t(ϑ^(t−1) | ϑ*)]
  = [P(data | ϑ*)·P(ϑ*) / J_t(ϑ* | ϑ^(t−1))] / [P(data | ϑ^(t−1))·P(ϑ^(t−1)) / J_t(ϑ^(t−1) | ϑ*)]; and

3) Set ϑ^t = ϑ* with probability min(r, 1); otherwise, set ϑ^t = ϑ^(t−1).
[0046] The Bayesian model, such as the one depicted in FIG. 3, supplies the likelihood P(data | ϑ) and the prior distribution P(ϑ). For item response theory applications, J_t(ϑ* | ϑ^(t−1)) = J_t(ϑ^(t−1) | ϑ*) (i.e., J_t is chosen to be symmetric).
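For illustration only, with a symmetric candidate distribution the J_t terms cancel and the three steps above reduce to a random-walk Metropolis sampler. A minimal one-dimensional sketch follows; the target distribution, step size, and names are illustrative and not part of the disclosed system:

```python
import math
import random

def metropolis(log_post, x0, n_steps, step_sd=0.5, seed=0):
    """Random-walk Metropolis sampler with a symmetric Normal proposal.

    log_post : function returning log P(x | data) up to an additive constant
    x0       : starting state of the chain
    """
    rng = random.Random(seed)
    chain = [x0]
    x = x0
    for _ in range(n_steps):
        x_cand = rng.gauss(x, step_sd)          # step 1: draw the candidate
        log_r = log_post(x_cand) - log_post(x)  # step 2: log Metropolis ratio
        if math.log(rng.random()) < log_r:      # step 3: accept w.p. min(r, 1)
            x = x_cand
        chain.append(x)
    return chain

# Example target: a standard Normal posterior, log p(x) = -x^2/2 + const.
chain = metropolis(lambda x: -0.5 * x * x, x0=0.0, n_steps=5000)
```

After enough iterations, the empirical distribution of the chain approximates the target posterior, which is the convergence property described above.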
[0047] The Gibbs sampler, also known as "alternating conditional sampling," may simplify the required Metropolis-Hastings likelihood calculations by partitioning ϑ = (ϑ_1, ..., ϑ_d) into lower dimensional sub-vectors ϑ_j. The MCMC algorithm may then be performed for each sub-vector, where the likelihood for ϑ_j is determined by using the conditional distribution of ϑ_j given all the other components of ϑ: P(ϑ_j | ϑ^t_−j, data), where ϑ^t_−j = (ϑ^t_1, ..., ϑ^t_(j−1), ϑ^(t−1)_(j+1), ..., ϑ^(t−1)_d). Thus, the Metropolis-Hastings algorithm within the Gibbs sampler may partition ϑ into item parameters and examinee parameters such that the item parameters depend only on the examinee parameters and the examinee parameters depend only on the item parameters. Accordingly, the algorithm for a tth step of a Markov Chain may be computed using the following steps:
1) For each item i, sample (ϑ_i | ϑ^(t−1)_examinees, data) using the Metropolis-Hastings ratio r as follows:

r = [P(data | ϑ_i*, ϑ^(t−1)_examinees)·P(ϑ_i*)] / [P(data | ϑ_i^(t−1), ϑ^(t−1)_examinees)·P(ϑ_i^(t−1))]; and

2) For each examinee j, sample (ϑ_j | ϑ^t_items, data) using the Metropolis-Hastings ratio r as follows:

r = [P(data | ϑ^t_items, ϑ_j*)·P(ϑ_j*)] / [P(data | ϑ^t_items, ϑ_j^(t−1))·P(ϑ_j^(t−1))].
[0048] Applying the Metropolis-Hastings algorithm with Gibbs sampling to the present invention results in the following steps:
1) Sample the hyperparameters μ^t = (μ_π^t, μ_r^t, μ_c^t) and s^t = (s_π^t, s_r^t, s_c^t);

2) For each attribute k = 1, ..., K, sample the cutpoints κ_k^t;

3) Sample the correlations ρ^t_k1k2, k1 ≠ k2 ∈ (1, ..., M);

4) For each item i, sample the item parameters (π_i*)^t and (r_ik*)^t, k = 1, ..., K;

5) For each item i, sample the item parameter c_i^t; and

6) For each examinee j, sample the examinee parameters (ã_j, θ_j)^t.
[0049] Sampling the hyperparameters (step 1) may include drawing the candidate parameters {μ*, s*} from a Normal distribution with mean {μ^(t−1), s^(t−1)}, the parameters from the previous step in the Markov Chain, and a variance of 0.1. Since only the item parameters depend on the hyperparameters, the likelihoods of the data and all other model parameters given the two candidate vectors, P(data, ϑ_items, ϑ_examinees, ϑ_correlations, ϑ_cutpoints | {μ*, s*}) and P(data, ϑ_items, ϑ_examinees, ϑ_correlations, ϑ_cutpoints | {μ^(t−1), s^(t−1)}), reduce to P(ϑ_items | {μ*, s*}) and P(ϑ_items | {μ^(t−1), s^(t−1)}), respectively. In an embodiment, the likelihoods are calculated by using (π_i*)^(t−1), (r_ik*)^(t−1), and (c_i)^(t−1), given that (π_i*)^(t−1) ~ β(μ_π, s_π), (r_ik*)^(t−1) ~ β(μ_r, s_r), and c_i^(t−1) ~ β(μ_c, s_c).
[0050] Each mastery cutpoint κ_k (step 2) may be dependent only on the dichotomous α_jk for each examinee j = 1, ..., J. Each candidate parameter may be drawn κ_k* ~ N(κ_k^(t−1), 0.03), where the distribution is truncated at |κ_k*| < a for appropriately chosen values a > 0. For example, a = 0.52 may correspond to truncating κ_k at the thirtieth and seventieth percentiles of the standard normal distribution. Since α_jk is linked to ã_jk via κ_k (α_jk = 1 if ã_jk > κ_k; α_jk = 0, otherwise), the examinee candidate parameters α_k* = {α_1k*, ..., α_Jk*} are defined by the candidate parameter κ_k*. The calculation of the likelihoods may use the following model-based relationship: P(data | α*, θ^(t−1), (π*)^(t−1), (r*)^(t−1), (c)^(t−1)).
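For illustration, the link between the continuous mastery parameters and the dichotomous mastery indicators, and the stated percentile interpretation of the truncation bound a = 0.52, may be sketched as follows (names are illustrative):

```python
from statistics import NormalDist

def dichotomize(a_tilde, kappa):
    """alpha_jk = 1 when the continuous mastery value exceeds cutpoint kappa."""
    return [[1 if a > kappa else 0 for a in row] for row in a_tilde]

# The bound a = 0.52 approximately equals the 70th percentile of the standard
# normal, so truncating |kappa| at 0.52 keeps the cutpoint between the 30th
# and 70th percentiles.
upper = NormalDist().inv_cdf(0.70)
```

`dichotomize` takes the continuous J x K examinee matrix and returns the 0/1 mastery matrix used elsewhere in the model.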
[0051] The correlations ρ_k1k2 (step 3) may have a uniform prior on (0, 1). The necessary likelihoods in the Metropolis-Hastings algorithm require only the examinee parameters (ã^(t−1), θ^(t−1)) and may be calculated by assuming (ã^(t−1), θ^(t−1)) are normally distributed with the correlations ρ_k1k2. Since the prior is uniform and the candidate distribution is symmetric, the only components of the Metropolis-Hastings ratio r are the likelihoods. [0052] Each collection of correlations {ρ_k1k2*, k1 ≠ k2 ∈ (1, ..., M)} must form a positive definite matrix. This condition must be satisfied at each step of the MCMC. However, if Σ is a positive definite correlation matrix and Σ* differs from Σ by only one off-diagonal matrix entry, a positive determinant of Σ* is a necessary and sufficient condition for Σ* to be a positive definite matrix. Moreover, the determinant of Σ* is a quadratic function of the correlation element that is changed. Thus, from the zeroes of this quadratic function, the range of the particular correlation element that satisfies the positive definite criterion may be determined.
[0053] An initial positive definite correlation matrix Σ_(m×m) may be used, where m is one greater than the number of attributes (i.e., the dimension of (ã, θ)). The matrix may be formed by generating an m × m matrix A whose row vectors lie on the positive surface of a unit sphere. Then, AA^T is a positive definite matrix whose diagonal elements are one and off-diagonal elements are positive numbers less than one.
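For illustration, the construction in the preceding paragraph may be sketched as follows: each row of A is a unit vector with non-negative components, so AA^T has ones on the diagonal and positive off-diagonal entries below one (names are illustrative):

```python
import math
import random

def initial_correlation_matrix(m, seed=0):
    """Build an m x m positive definite correlation matrix as A A^T, where
    each row of A is a unit vector in the positive orthant of the sphere."""
    rng = random.Random(seed)
    A = []
    for _ in range(m):
        v = [abs(rng.gauss(0.0, 1.0)) for _ in range(m)]   # non-negative entries
        norm = math.sqrt(sum(x * x for x in v))
        A.append([x / norm for x in v])                    # normalize to length 1
    # Sigma = A A^T: unit diagonal, positive off-diagonal entries less than one.
    return [[sum(A[i][k] * A[j][k] for k in range(m)) for j in range(m)]
            for i in range(m)]
```

Each diagonal entry is the squared length of a unit row (exactly one), and each off-diagonal entry is a dot product of two distinct non-negative unit vectors, hence strictly between zero and one in general.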
[0054] Using this initial Σ as the first step in the Markov Chain, each candidate ρ_k1k2* is drawn uniformly from the allowed correlations, satisfying both positive definiteness and the condition that they be non-negative. A Gibbs step is taken for each ρ_k1k2 using ρ_k1k2* as its candidate parameter. The likelihoods for this step are only based on the relationship between the current examinee parameters and the estimated correlational structure: P(ã_j, θ_j | Σ). The diagonal elements of Σ equal one and the off-diagonal elements equal ρ_k1k2.
[0055] Although the Gibbs step for c is separate from the Gibbs step for π* and r* (steps 4 and 5, respectively), the structures for the two Gibbs steps may be nearly identical. For each item i, the candidate parameters are drawn: (π_i*)* ~ N((π_i*)^(t−1), σ_π), (r_i*)* ~ N((r_i*)^(t−1), σ_r), and c_i* ~ N(c_i^(t−1), σ_c). σ_π, σ_r, and σ_c are determined experimentally by monitoring the time series plots of the Markov Chain. Since the candidate distribution is symmetric, the J_t term is cancelled out of the Metropolis-Hastings ratio r.
[0056] The likelihoods are calculated by computing P(item i responses | π_i*, r_i*, c_i^(t−1), (θ_j, α_j)^(t−1), j = 1, ..., n) for the Gibbs step involving π_i* and r_i*, and P(item i responses | c_i*, (θ_j, α_j)^(t−1), j = 1, ..., n) for the Gibbs step involving c_i. The priors are computed using the Beta distributions with the hyperparameters from step t.
[0057] The candidate vector of examinee parameters (ã_j*, θ_j*) in step 6 is drawn from N(0, Σ), where the off-diagonal elements of Σ are ρ^t_k1k2. The elements of α_j* are then defined by the relationship between {ã_jk*} and the cutpoint κ_k for each attribute k. Likelihoods are computed using P(item i responses | θ_j*, α_j*, π_i*, r_i*, c_i, i = 1, ..., I).
[0058] Thus, the ratio r used in the computation of the Metropolis-Hastings algorithm reduces as follows:

r = [P((ã*, α*, θ*) | data) / P(ã*, α*, θ*)] / [P((ã^(t−1), α^(t−1), θ^(t−1)) | data) / P(ã^(t−1), α^(t−1), θ^(t−1))]

  = [P(data | (ã*, α*, θ*))·P(ã*, α*, θ*) / P(ã*, α*, θ*)] / [P(data | (ã^(t−1), α^(t−1), θ^(t−1)))·P(ã^(t−1), α^(t−1), θ^(t−1)) / P(ã^(t−1), α^(t−1), θ^(t−1))]

  = P(data | (ã*, α*, θ*)) / P(data | (ã^(t−1), α^(t−1), θ^(t−1)))
[0059] Software embodying the MCMC may be used to provide estimates of the
examinee mastery parameters, estimates of the item parameters of the reparameterized unified
model, and estimates of the Bayesian hyperparameters having non-fixed values. This software is referred to herein as the Arpeggio software application, although additional or
other software may be used to perform similar functions. The methodology may produce multiple Markov Chain runs with imputed values of various parameters and summary files
that provide information on parameter estimates for the various assessment items and
assessment examinees analyzed. In addition, the Arpeggio software may run MCMC
simulations and provide summaries of the operation of these chain simulations.
[0060] Several companion software methodologies utilizing the output data received
from the Arpeggio software to optimize the fit between the model produced by the software
and the underlying data, are also described. For example, a software application may provide
a statistical analysis of the mastery of one or more assessment items found on an assessment
examination. This software application is referred to herein as the IMStats software application.
[0061] FIG. 4 depicts a flow chart of the operation of the IMStats software
application. The IMStats software application may receive the Q matrix {q_ik}, the response matrix, X_ij, and the mastery estimate matrix, α_jk. For each item i = 1, ..., I, examinees may be grouped into mastery classes. For example, four types of mastery classes may be created for each item: 1) "masters of item i" (i.e., those examinees that have mastered all attributes required by item i), 2) "non-masters of item i" (i.e., those examinees that have not mastered all attributes required by item i), 3) "high non-masters of item i" (i.e., those examinees that have mastered at least one-half of all attributes, but not all attributes, required by item i), and 4) "low non-masters of item i" (i.e., those examinees that have mastered less than one-half of all attributes required by item i). Non-masters of item i are a superset containing all high non-masters of item i and all low non-masters of item i. In an embodiment, the IMStats software application may not compute one or more of the classes. [0062] After the examinees are assigned to each class for an item, the proportion of
examinees in each class that answered the item correctly is determined. These proportions are then averaged over all items under consideration for each type of class (e.g., the average proportion for "masters of item i" for all i = 1, ..., I). In an embodiment, the resulting output may include the averages for each type of class, the difference between the average proportion for all "masters of item i" and the average proportion for all "non-masters of item i," and the difference between the average proportion for all "masters of item i" and the average proportion for all "low non-masters of item i." In an embodiment, the difference between the average proportion for all "masters of item i" and the average proportion for all "high non-masters of item i" may be outputted. In an embodiment, the proportions may be outputted for one or more items. In an embodiment, the difference between the proportion for "masters of item i" and the proportion for "non-masters of item i," the proportion for "high non-masters of item i," and/or the proportion for "low non-masters of item i" for a particular item may be outputted.
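For illustration, the grouping and averaging procedure described for IMStats may be sketched as follows (the data layout and all names are illustrative):

```python
def item_mastery_stats(Q, X, alpha):
    """Per-item proportion correct for masters, high non-masters, and low
    non-masters (illustrative sketch of the IMStats grouping step).

    Q     : I x K Q-matrix of 0/1 item-attribute associations
    X     : J x I response matrix of 0/1 scores (examinee by item)
    alpha : J x K matrix of 0/1 mastery estimates
    """
    stats = []
    for i, q_row in enumerate(Q):
        required = [k for k, q in enumerate(q_row) if q]
        classes = {"master": [], "high_nonmaster": [], "low_nonmaster": []}
        for j, a_row in enumerate(alpha):
            n_mastered = sum(a_row[k] for k in required)
            if n_mastered == len(required):
                label = "master"                  # all required attributes
            elif 2 * n_mastered >= len(required):
                label = "high_nonmaster"          # at least half, but not all
            else:
                label = "low_nonmaster"           # fewer than half
            classes[label].append(X[j][i])
        stats.append({name: (sum(r) / len(r) if r else None)
                      for name, r in classes.items()})
    return stats
```

Averaging each class proportion over all items then yields the per-class averages and differences described above.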
[0063] In an embodiment, the present invention may include a software application
that statistically analyzes the mastery of allowable examinees on a particular assessment examination. This software is referred to herein as the EMStats software application.
[0064] FIG. 5 depicts a flow chart of the operation of the EMStats software
application. The EMStats software application may receive the Q matrix {q_ik}, the response matrix, X_ij, the mastery estimate matrix, α_jk, allowability limits for each mastery class defined with respect to the IMStats software application {a_M, a_NM, a_NMH, a_NML}, and criterion limits for each mastery class {C_M, C_NM, C_NMH, C_NML}.
[0065] The allowability limits may be equal to the minimum number of responses to assessment items required for an examinee to be counted as a member of a corresponding mastery class. For example, if a_M = 14 items, an examinee that has mastered 12 items would not be included when generating the outputs. In contrast, an examinee that has mastered 15 items would be included when generating the outputs. The criterion levels may be equal to the proportion correct criteria required to perform a binomial hypothesis test at the 0.05 significance level for each of the mastery classes.
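For illustration, a one-sided binomial hypothesis test of the kind described above may be sketched as follows; the function names, and the choice of a lower-tail test against the criterion proportion, are assumptions of this sketch:

```python
from math import comb

def binomial_lower_tail(n, x, p):
    """P(X <= x) for X ~ Binomial(n, p), computed from the exact pmf."""
    return sum(comb(n, k) * p**k * (1 - p)**(n - k) for k in range(x + 1))

def meets_criterion(n_items, n_correct, criterion_p, alpha=0.05):
    """Return False when the number correct is significantly below the
    criterion proportion at significance level alpha (one-sided test)."""
    return binomial_lower_tail(n_items, n_correct, criterion_p) >= alpha
```

An examinee whose proportion correct is far below the class criterion produces a small lower-tail probability and is flagged as not meeting the criterion level.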
[0066] For each examinee j = 1, ..., J, items may be grouped into item classes. For
example, four item classes may be created for each examinee: 1) "mastered items" (i.e., those items for which the examinee has mastered all attributes), 2) "non-mastered items" (i.e., those
items for which the examinee has not mastered all attributes), 3) "high non-mastered items"
(i.e., those items for which the examinee has mastered at least one-half, but not all,
attributes), and 4) "low non-mastered items" (i.e., those items for which the examinee has
mastered less than one-half of all attributes). Non-mastered items are a superset containing
high non-mastered items and low non-mastered items. In an embodiment, the EMStats
software application may not compute one or more of the classes.
[0067] After the items are assigned to each class for an examinee, a group of allowable examinees may be determined using the allowability limits for each item class. For
each allowable examinee n = 1, ..., N and for each of the item classes, the proportion of items that the allowable examinee answered correctly out of the set of items may be computed. The
average proportion of items answered correctly by each examinee may then be computed for
each item class. A hypothesis test may be performed on each examinee in each allowable
examinee group. A list of the examinees in each allowable examinee group who were
rejected by the hypothesis test may be generated for each allowable examinee group. The
EMStats software application may output one or more of the average proportion of items
answered correctly for each item class, the number of items answered correctly by each
allowable examinee, the lists of examinees that were rejected by the hypothesis test, and/or any other statistics generated by the EMStats software application. [0068] In an embodiment, the present invention may include a software application
that utilizes a reparameterized unified model in computing expected examinee performance
across assessment items as a function of an examinee's predicted skill set. This software application is referred to herein as the FusionStats software application.
[0069] FIG. 6 depicts a flow chart of the operation of the FusionStats software
application. The FusionStats software application may receive the Q matrix {q_ik}, the response matrix, X_ij, the mastery estimate matrix, α_jk, estimates for the attributes not included in the Q matrix for each examinee, θ_j, and item parameter estimates {π*, r*, c}.
[0070] For each examinee j = 1, ..., J and for each item i = 1, ..., I, the probability that examinee j answered item i correctly may be computed using the Q matrix, the mastery estimate matrix, the estimates for the attributes not included in the Q matrix, and the item parameter estimates. These values may be entered into an I × J matrix where each entry P_ij represents the probability that examinee j answered item i correctly. For each examinee j, summing the entries in P_ij corresponding to the examinee may produce a predicted examinee score. For each item i, summing the entries in P_ij corresponding to the item may produce a predicted item score. Similarly, summing the rows and columns of X_ij may produce observed examinee scores and observed item scores. A residual examinee score may then be computed by subtracting the observed examinee score from the predicted examinee score. A residual item score may be computed by subtracting the observed item score from the predicted item score. In an embodiment, the residual score may be the absolute value of the differences computed above.
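For illustration, the score computations in the preceding paragraph may be sketched as follows, taking the probability matrix P as an already-computed input (all names are illustrative):

```python
def score_summaries(P, X):
    """Predicted, observed, and residual scores per examinee and per item.

    P : I x J matrix, P[i][j] = modeled probability that examinee j answers
        item i correctly
    X : I x J matrix of observed 0/1 responses, same layout as P
    """
    I, J = len(P), len(P[0])
    pred_exam = [sum(P[i][j] for i in range(I)) for j in range(J)]
    obs_exam = [sum(X[i][j] for i in range(I)) for j in range(J)]
    pred_item = [sum(P[i][j] for j in range(J)) for i in range(I)]
    obs_item = [sum(X[i][j] for j in range(J)) for i in range(I)]
    # Residual = predicted minus observed (one embodiment takes absolute values).
    resid_exam = [p - o for p, o in zip(pred_exam, obs_exam)]
    resid_item = [p - o for p, o in zip(pred_item, obs_item)]
    return pred_exam, obs_exam, resid_exam, pred_item, obs_item, resid_item
```

Large residuals for a particular examinee or item point to a poor fit between the model and the raw response data for that examinee or item.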
[0071] The predicted, observed, and residual scores may be outputted for each
examinee and item. Other outputs may include average residual scores for all examinees and items. Such measurements may be used to evaluate the efficacy of the parameters used. [0072] In an embodiment, the present invention may include a software application
that evaluates the effectiveness of the fit of the fusion model to the data. The software
application is referred to herein as the GAMEStats software application. The GAMEStats
software application may be used to create optimal mastery settings that identify the level of skill required for achieving the mastery level in the tested underlying latent space. Deriving
these optimal settings from the data may require optimizing the fit between the model and the
underlying data. The GAMEStats software may utilize a genetic algorithm to determine how to optimize these settings and the fit between the model and the data. The genetic algorithm
may maximize the equation V = average(pm − pnmh, pm − pnml) to calculate the average of
the difference between masters and high non-masters and the difference between masters and
low non-masters. In this notation, pm corresponds to the average proportion correct for item
masters, pnmh corresponds to the average proportion correct for item high non-masters, and
pnml corresponds to the average proportion correct for item low non-masters.
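As a sketch, the criterion of averaging the master-versus-high-non-master and master-versus-low-non-master differences might be written as follows (illustrative names, not from the GAMEStats source):

```python
def maximization_criterion(p_m, p_nmh, p_nml):
    """Average of (masters - high non-masters) and (masters - low non-masters).

    p_m   : average proportion correct for item masters
    p_nmh : average proportion correct for item high non-masters
    p_nml : average proportion correct for item low non-masters
    """
    return ((p_m - p_nmh) + (p_m - p_nml)) / 2.0
```

A larger value indicates that the mastery settings separate masters more sharply from both non-master groups, which is what the optimization rewards.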
[0073] FIG. 7 is a flow chart of the GAMEStats software application. As shown in
FIG. 7, the GAMEStats software application may receive the Q-matrix, an item response data
matrix and various mastery estimates and parameters for optimizing the mastery settings as
inputs. These inputs may be iteratively processed and a calculation of the maximization
criteria for each mastery setting may be performed. These operations are iteratively performed until the maximization criteria converge within an acceptable range. The resulting
output may include, for example, the top 100 mastery settings with a value for the
maximization criteria for each setting and values for the constraint criteria for each setting.
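The iterative loop of FIG. 7 — propose mastery settings, score each against the maximization criteria, and repeat until convergence — might be sketched generically as follows. This is a hypothetical genetic-style search; the actual GAMEStats algorithm, its population scheme, and its constraint handling are not disclosed at this level of detail:

```python
import random

def optimize_mastery_settings(evaluate, random_setting, mutate,
                              pop_size=50, keep=100, tol=1e-4, max_iter=200):
    """Generic genetic-style search over mastery settings.

    evaluate(setting) -> maximization criterion value (higher is better)
    random_setting()  -> a new candidate setting
    mutate(setting)   -> a perturbed copy of a setting
    Returns up to `keep` (value, setting) pairs, best first.
    """
    population = [random_setting() for _ in range(pop_size)]
    best_value = float("-inf")
    for _ in range(max_iter):
        scored = sorted(((evaluate(s), s) for s in population),
                        key=lambda t: t[0], reverse=True)
        top_value = scored[0][0]
        if abs(top_value - best_value) < tol:  # convergence within range
            break
        best_value = top_value
        # Keep the better half; refill by mutating random survivors
        survivors = [s for _, s in scored[:pop_size // 2]]
        population = survivors + [mutate(random.choice(survivors))
                                  for _ in range(pop_size - len(survivors))]
    scored = sorted(((evaluate(s), s) for s in population),
                    key=lambda t: t[0], reverse=True)
    return scored[:keep]
```

Calling code would supply `evaluate` (the maximization criterion for one mastery setting), a generator of candidate settings, and a mutation operator; the returned list then plays the role of the top mastery settings, with a criterion value for each, described above.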
[0074] FIG. 8 depicts a flow chart of the operation of the above-defined software
applications with the classifier, such as Arpeggio™. Inputs to the classifier may include one
or more of the Q matrix {qjk}, the response matrix Xij, and any run-time parameters specific to the assessment examination. The operations of the classifier may then be performed. In an embodiment, the output of the classifier may include item parameters {π*, r*, c}; the mastery
estimates αi; a correlation matrix between attributes {ρk1k2}; and a proportion of the number
of masters for each attribute with respect to the number of examinees {pk}. If the proportions of the number of masters for each attribute with respect to the number of examinees do not
converge, the GAMEStats software application may be executed to converge the
proportions. Otherwise, or upon completion of GAMEStats, the operations of one or more of
IMStats, EMStats, and FusionStats may be performed. Upon completion, a summary fit report
may be generated denoting the fit of the raw data to the model data.
[0075] While the present invention has been described in conjunction with particular
applications as outlined above, it is evident that many alternatives, modifications and
variations will be apparent to one of ordinary skill in the art. Accordingly, the particular
applications of this invention as set forth above are intended to be illustrative, not limiting.
Modifications or changes may be made without departing from the spirit or scope of the
invention, or may become obvious to one skilled in the art after review of the present
invention. Such modifications or changes are intended to be included within the scope of this
present application.

Claims

What is claimed is:
1. A method for evaluating the fit of raw data to model data, the method
comprising:
receiving a plurality of associations, wherein each association pertains to a
relationship between one of a plurality of assessment items for an assessment examination
and one of a plurality of attributes; receiving a plurality of responses, wherein each response pertains to an answer
by one of a plurality of examinees to one of the plurality of assessment items;
receiving a plurality of mastery estimates, wherein each mastery estimate
represents whether one of the plurality of examinees has mastered one of the plurality of
attributes;
determining whether an examinee falls into at least one of a plurality of
examinee classes for an assessment item based upon the mastery estimates for the examinee
and the associations for the assessment item;
generating one or more statistics, wherein each statistic is based on one or
more of the associations, responses and mastery estimates; and outputting at least one of the one or more statistics.
2. The method of claim 1 wherein the one or more statistics comprise a
percentage of correct answers for an assessment item for each of the plurality of examinee
classes associated with the assessment item.
3. The method of claim 1 wherein the one or more statistics comprise a percentage of correct answers for the assessment examination for each of the plurality of
examinee classes.
4. The method of claim 1 wherein the plurality of examinee classes comprises a
master class associated with an assessment item, wherein the master class comprises all
examinees having a mastery estimate for each attribute associated with the assessment item.
5. The method of claim 4 wherein the plurality of examinee classes further
comprises: a high non-master class associated with an assessment item; and
a low non-master class associated with the assessment item,
wherein the high non-master class comprises all examinees having a mastery
estimate for at least one-half of all attributes, but not all attributes, associated with the
assessment item, wherein the low non-master class comprises all examinees having a mastery
estimate for less than one-half of all attributes associated with the assessment item.
6. The method of claim 5 wherein the one or more statistics comprise a
difference between a percentage of correct answers for the assessment examination for the
master class for an assessment item and the percentage of correct answers for the assessment
examination for the low non-master class for the assessment item.
7. The method of claim 5 wherein the one or more statistics comprise a difference between a percentage of correct answers for the assessment examination for the master classes for all assessment items and the percentage of correct answers for the
assessment examination for the low non-master classes for all assessment items.
8. The method of claim 5 wherein the one or more statistics comprise a
difference between a percentage of correct answers for the assessment examination for the
master class for an assessment item and the percentage of correct answers for the assessment examination for the high non-master class for the assessment item.
9. The method of claim 5 wherein the one or more statistics comprise a
difference between a percentage of correct answers for the assessment examination for the
master classes for all assessment items and the percentage of correct answers for the
assessment examination for the high non-master classes for all assessment items.
10. The method of claim 4 wherein the plurality of examinee classes further comprises a non-master class, wherein the non-master class comprises all examinees not
having a mastery estimate for at least one attribute associated with an assessment item.
11. The method of claim 10 wherein the one or more statistics comprise a
difference between a percentage of correct answers for the assessment examination for the
master class for an assessment item and the percentage of correct answers for the assessment
examination for the non-master class for the assessment item.
12. The method of claim 10 wherein the one or more statistics comprise a difference between a percentage of correct answers for the assessment examination for the master classes for all assessment items and the percentage of correct answers for the
assessment examination for the non-master classes for all assessment items.
13. A method for evaluating the fit of raw data to model data, the method comprising:
receiving a plurality of associations, wherein each association pertains to a
relationship between one of a plurality of assessment items for an assessment examination
and one of a plurality of attributes;
receiving a plurality of responses, wherein each response pertains to an answer
by one of a plurality of examinees to one of the plurality of assessment items;
receiving a plurality of mastery estimates, wherein each mastery estimate
represents whether one of the plurality of examinees has mastered one of the plurality of
attributes; receiving one or more parameters based on an expected assessment item
performance;
determining whether an item for an examinee falls into at least one of a
plurality of item classes based upon the mastery estimates for the examinee and the associations for the assessment item;
generating one or more statistics, wherein each statistic is based on one or
more of the associations, responses, mastery estimates and parameters; and
outputting at least one of the one or more statistics.
14. The method of claim 13, further comprising:
receiving one or more allowability limits, wherein each allowability limit
defines a threshold number of items for each of a plurality of item classes required for an
examinee to be an allowable examinee.
15. The method of claim 14 wherein the determining step comprises: defining a plurality of item types; determining a list of allowable examinees for each item class based on the
allowability limit; and for each allowable examinee for an item class, computing a proportion of
items in the item class that the allowable examinee answered correctly.
16. The method of claim 15 wherein the determining step further comprises:
computing the average proportion of items for all allowable examinees for
each item class.
17. The method of claim 15 wherein the determining step further comprises: performing a binomial hypothesis test on each allowable examinee; and
determining whether the allowable examinee meets the criterion level for the
item class.
18. The method of claim 17 wherein the one or more statistics comprise one or
more results of the binomial hypothesis tests.
19. The method of claim 17 wherein the one or more statistics comprise a list of allowable examinees that did not meet the criterion level for the item class.
20. The method of claim 15 wherein the one or more statistics comprise a
proportion correct for each allowable examinee.
21. The method of claim 15 wherein the one or more statistics comprise an
average proportion correct for all allowable examinees for an item class.
22. The method of claim 13 wherein the plurality of item classes comprises a
master class associated with an examinee, wherein the master class comprises all items for
which an examinee has a mastery estimate for each attribute associated with an assessment
item.
23. The method of claim 22 wherein the plurality of item classes further
comprises: a high non-master class associated with an examinee; and
a low non-master class associated with an examinee,
wherein the high non-master class comprises all items for which an examinee
has a mastery estimate for at least one-half of all attributes, but not all attributes, associated with an assessment item, wherein the low non-master class comprises all items for which an
examinee has a mastery estimate for less than one-half of all attributes associated with an assessment item.
24. The method of claim 22 wherein the plurality of item classes further comprises a non-master class, wherein the non-master class comprises all items for which an examinee
does not have a mastery estimate for at least one attribute associated with an assessment item.
25. A method for evaluating the fit of raw data to model data, the method
comprising: receiving a plurality of associations, wherein each association pertains to a
relationship between one of a plurality of assessment items for an assessment examination
and one of a plurality of attributes; receiving a plurality of responses, wherein each response pertains to an answer
by one of a plurality of examinees to one of the plurality of assessment items;
receiving a plurality of mastery estimates, wherein each mastery estimate
represents whether one of the plurality of examinees has mastered one of the plurality of
attributes; receiving a plurality of proficiency estimates for non-tested attributes for each
examinee;
receiving a plurality of item parameter estimates for each item;
generating one or more statistics, wherein each of the statistics is based on one
or more of the associations, responses, mastery probabilities, proficiency estimates, and item
probabilities; and outputting at least one of the one or more statistics.
26. The method of claim 25 wherein the plurality of item parameter estimates
comprises: receiving a plurality of first probabilities, wherein each first probability is a
measure of a likelihood that an examinee that has mastered the attributes pertaining to an assessment item will answer the assessment item correctly;
receiving a plurality of second probabilities, wherein each second probability
is a measure of a likelihood that an examinee that has not mastered an attribute pertaining to
an assessment item will answer the assessment item correctly; and
receiving a plurality of weights, wherein each weight is a measure of the
relevance of the plurality of proficiency estimates for an examinee.
27. The method of claim 25 wherein the one or more statistics comprise:
for each examinee and each item, determining a probability that the examinee
answered the item correctly based on the associations, mastery estimates, proficiency
estimates and item parameter estimates.
28. The method of claim 27 wherein the one or more statistics further comprise
one or more of the following, for each examinee:
an observed score computed by summing the responses by the examinee;
a predicted score computed by summing the probabilities associated with the
examinee; and
a residual score computed by subtracting the observed score from the predicted score.
29. The method of claim 27 wherein the one or more statistics further comprise one or more of the following, for each item:
an observed score computed by summing the responses to the item; a predicted score computed by summing the probabilities associated with the
item; and
a residual score computed by subtracting the observed score from the predicted
score.
30. A method for evaluating the fit of raw data to model data, the method
comprising: receiving a plurality of associations, wherein each association pertains to a relationship between one of a plurality of assessment items for an assessment examination
and one of a plurality of attributes;
receiving a plurality of responses, wherein each response pertains to an answer
by one of a plurality of examinees to one of the plurality of assessment items;
receiving a plurality of mastery estimate files, wherein each mastery estimate
file contains a set of mastery estimates, wherein each mastery estimate represents whether
one of the plurality of examinees has mastered one of the plurality of attributes;
receiving a plurality of mastery parameters;
determining whether each examinee falls into at least one of a plurality of
examinee classes for an assessment item based upon the mastery probabilities for the examinee and the associations for the assessment item; optimizing a plurality of mastery thresholds; generating one or more statistics, wherein each statistic is based on one or
more of the associations, responses, mastery probabilities and mastery parameters, wherein
the one or more statistics comprise one or more mastery thresholds; and outputting at least one of the one or more statistics.
31. The method of claim 30 wherein optimizing a plurality of mastery thresholds
comprises: for each examinee class, performing an algorithm using the associations,
responses, mastery estimates and mastery parameters to obtain a result;
calculating maximization criteria based on the results for each examinee class;
applying the maximization criteria to the mastery thresholds; and
repeating the performing, calculating and applying steps until the mastery
thresholds converge within a threshold range.
32. The method of claim 31 wherein the plurality of examinee classes comprises: a master class, wherein the master class comprises all examinees having a
mastery state for each attribute associated with an assessment item;
a high non-master class, wherein the high non-master class comprises all
examinees having a mastery state for at least one-half of all attributes, but not all attributes, associated with an assessment item; and
a low non-master class, wherein the low non-master class comprises all
examinees having a mastery state for less than one-half of all attributes associated with an assessment item.
33. The method of claim 32 wherein the maximization criteria comprises maximizing the average of the difference between the average proportion correct for the
master class and the average proportion correct for the high non-master class and the difference between the average proportion correct for the master class and the average
proportion correct for the low non-master class.
34. The method of claim 33 wherein the one or more statistics comprise one or
more of the following:
a plurality of mastery settings;
maximization criteria corresponding to each of the mastery settings; and
mastery parameters corresponding to each of the mastery settings.
35. A system for evaluating the fit of raw data to model data, the system
comprising:
a processor;
a computer-readable storage medium operably connected to the processor,
wherein the computer-readable storage medium contains one or more programming
instructions for performing a method for evaluating the fit of raw data to model data, the
method comprising:
receiving a plurality of associations, wherein each association pertains
to a relationship between one of a plurality of assessment items for an assessment examination and one of a plurality of attributes,
receiving a plurality of responses, wherein each response pertains to an
answer by one of a plurality of examinees to one of the plurality of assessment items, receiving a plurality of mastery estimates, wherein each mastery estimate represents whether one of the plurality of examinees has mastered one of the
plurality of attributes,
determining whether an examinee falls into at least one of a plurality of
examinee classes for an assessment item based upon the mastery estimates for the
examinee and the associations for the assessment item,
generating one or more statistics, wherein each statistic is based on one
or more of the associations, responses and mastery estimates, and
outputting at least one of the one or more statistics.
36. The system of claim 35 wherein the one or more statistics comprise a
percentage of correct answers for an assessment item for each of the plurality of examinee
classes associated with the assessment item.
37. The system of claim 35 wherein the one or more statistics comprise a
percentage of correct answers for the assessment examination for each of the plurality of
examinee classes.
38. The system of claim 35 wherein the plurality of examinee classes comprises a
master class associated with an assessment item, wherein the master class comprises all
examinees having a mastery estimate for each attribute associated with the assessment item.
39. The system of claim 38 wherein the plurality of examinee classes further
comprises:
a high non-master class associated with an assessment item; and a low non-master class associated with the assessment item,
wherein the high non-master class comprises all examinees having a mastery
estimate for at least one-half of all attributes, but not all attributes, associated with the
assessment item, wherein the low non-master class comprises all examinees having a mastery
estimate for less than one-half of all attributes associated with the assessment item.
40. The system of claim 39 wherein the one or more statistics comprise a
difference between a percentage of correct answers for the assessment examination for the
master class for an assessment item and the percentage of correct answers for the assessment
examination for the low non-master class for the assessment item.
41. The system of claim 39 wherein the one or more statistics comprise a
difference between a percentage of correct answers for the assessment examination for the
master classes for all assessment items and the percentage of correct answers for the
assessment examination for the low non-master classes for all assessment items.
42. The system of claim 39 wherein the one or more statistics comprise a
difference between a percentage of correct answers for the assessment examination for the
master class for an assessment item and the percentage of correct answers for the assessment
examination for the high non-master class for the assessment item.
43. The system of claim 39 wherein the one or more statistics comprise a
difference between a percentage of correct answers for the assessment examination for the
master classes for all assessment items and the percentage of correct answers for the
assessment examination for the high non-master classes for all assessment items.
44. The system of claim 38 wherein the plurality of examinee classes further
comprises a non-master class, wherein the non-master class comprises all examinees not having a mastery estimate for at least one attribute associated with an assessment item.
45. The system of claim 44 wherein the one or more statistics comprise a
difference between a percentage of correct answers for the assessment examination for the
master class for an assessment item and the percentage of correct answers for the assessment
examination for the non-master class for the assessment item.
46. The system of claim 44 wherein the one or more statistics comprise a
difference between a percentage of correct answers for the assessment examination for the
master classes for all assessment items and the percentage of correct answers for the
assessment examination for the non-master classes for all assessment items.
47. A system for evaluating the fit of raw data to model data, the system
comprising:
a processor;
a computer-readable storage medium operably connected to the processor, wherein the computer-readable storage medium contains one or more programming instructions for performing a method for evaluating the fit of raw data to model data, the
method comprising:
receiving a plurality of associations, wherein each association pertains
to a relationship between one of a plurality of assessment items for an assessment
examination and one of a plurality of attributes,
receiving a plurality of responses, wherein each response pertains to an
answer by one of a plurality of examinees to one of the plurality of assessment items,
receiving a plurality of mastery estimates, wherein each mastery
estimate represents whether one of the plurality of examinees has mastered one of the plurality of attributes,
receiving one or more parameters based on an expected assessment
item performance,
determining whether an item for an examinee falls into at least one of a
plurality of item classes based upon the mastery estimates for the examinee and the associations for the assessment item,
generating one or more statistics, wherein each statistic is based on one
or more of the associations, responses, mastery estimates and parameters, and outputting at least one of the one or more statistics.
48. The system of claim 47, where the computer-readable storage medium further
contains one or more programming instructions for receiving one or more allowability limits,
wherein each allowability limit defines a threshold number of items for each of a plurality of
item classes required for an examinee to be an allowable examinee.
49. The system of claim 48 wherein the one or more programming instructions for
the determining step comprise:
defining a plurality of item types;
determining a list of allowable examinees for each item class based on the
allowability limit; and for each allowable examinee for an item class, computing a proportion of
items in the item class that the allowable examinee answered correctly.
50. The system of claim 49 wherein the one or more programming instructions for
the determining step further comprise:
computing the average proportion of items for all allowable examinees for
each item class.
51. The system of claim 49 wherein the one or more programming instructions for
the determining step further comprise:
performing a binomial hypothesis test on each allowable examinee; and
determining whether the allowable examinee meets the criterion level for the
item class.
52. The system of claim 51 wherein the one or more statistics comprise one or
more results of the binomial hypothesis tests.
53. The system of claim 51 wherein the one or more statistics comprise a list of allowable examinees that did not meet the criterion level for the item class.
54. The system of claim 51 wherein the one or more statistics comprise a proportion correct for each allowable examinee.
55. The system of claim 49 wherein the one or more statistics comprise an average proportion correct for all allowable examinees for an item class.
56. The system of claim 47 wherein the plurality of item classes comprises a master class associated with an examinee, wherein the master class comprises all items for
which an examinee has a mastery estimate for each attribute associated with an assessment item.
57. The system of claim 56 wherein the plurality of item classes further comprises:
a high non-master class associated with an examinee; and
a low non-master class associated with an examinee,
wherein the high non-master class comprises all items for which an examinee
has a mastery estimate for at least one-half of all attributes, but not all attributes, associated with an assessment item, wherein the low non-master class comprises all items for which an
examinee has a mastery estimate for less than one-half of all attributes associated with an assessment item.
58. The system of claim 56 wherein the plurality of item classes further comprises a non-master class, wherein the non-master class comprises all items for which an examinee
does not have a mastery estimate for at least one attribute associated with an assessment item.
59. A system for evaluating the fit of raw data to model data, the system comprising:
a processor;
a computer-readable storage medium operably connected to the processor,
wherein the computer-readable storage medium contains one or more programming instructions for performing a method for evaluating the fit of raw data to model data, the
method comprising: receiving a plurality of associations, wherein each association pertains
to a relationship between one of a plurality of assessment items for an assessment
examination and one of a plurality of attributes,
receiving a plurality of responses, wherein each response pertains to an
answer by one of a plurality of examinees to one of the plurality of assessment items,
receiving a plurality of mastery estimates, wherein each mastery
estimate represents whether one of the plurality of examinees has mastered one of the
plurality of attributes,
receiving a plurality of proficiency estimates for non-tested attributes for each examinee,
receiving a plurality of item parameter estimates for each item,
generating one or more statistics, wherein each of the statistics is based
on one or more of the associations, responses, mastery probabilities, proficiency
estimates, and item probabilities, and
outputting at least one of the one or more statistics.
60. The system of claim 59 wherein the plurality of item parameter estimates
comprises:
receiving a plurality of first probabilities, wherein each first probability is a
measure of a likelihood that an examinee that has mastered the attributes pertaining to an assessment item will answer the assessment item correctly;
receiving a plurality of second probabilities, wherein each second probability
is a measure of a likelihood that an examinee that has not mastered an attribute pertaining to
an assessment item will answer the assessment item correctly; and
receiving a plurality of weights, wherein each weight is a measure of the relevance of the plurality of proficiency estimates for an examinee.
61. The system of claim 59 wherein the one or more statistics comprise:
for each examinee and each item, determining a probability that the examinee
answered the item correctly based on the associations, mastery estimates, proficiency
estimates and item parameter estimates.
62. The system of claim 61 wherein the one or more statistics further comprise one
or more of the following, for each examinee:
an observed score computed by summing the responses by the examinee;
a predicted score computed by summing the probabilities associated with the examinee; and
a residual score computed by subtracting the observed score from the predicted score.
63. The system of claim 61 wherein the one or more statistics further comprise one
or more of the following, for each item:
an observed score computed by summing the responses to the item;
a predicted score computed by summing the probabilities associated with the item; and
a residual score computed by subtracting the observed score from the predicted
score.
64. A system for evaluating the fit of raw data to model data, the system
comprising:
a processor;
a computer-readable storage medium operably connected to the processor,
wherein the computer-readable storage medium contains one or more programming
instructions for performing a method for evaluating the fit of raw data to model data, the
method comprising: receiving a plurality of associations, wherein each association pertains
to a relationship between one of a plurality of assessment items for an assessment
examination and one of a plurality of attributes,
receiving a plurality of responses, wherein each response pertains to an
answer by one of a plurality of examinees to one of the plurality of assessment items,
receiving a plurality of mastery estimate files, wherein each mastery
estimate file contains a set of mastery estimates, wherein each mastery estimate
represents whether one of the plurality of examinees has mastered one of the plurality
of attributes, receiving a plurality of mastery parameters, determining whether each examinee falls into at least one of a plurality
of examinee classes for an assessment item based upon the mastery probabilities for the examinee and the associations for the assessment item,
optimizing a plurality of mastery thresholds,
generating one or more statistics, wherein each statistic is based on one
or more of the associations, responses, mastery probabilities and mastery parameters, wherein the one or more statistics comprise one or more mastery thresholds; and outputting at least one of the one or more statistics.
65. The system of claim 64 wherein the one or more programming instructions for
optimizing a plurality of mastery thresholds comprise:
for each examinee class, performing an algorithm using the associations,
responses, mastery estimates and mastery parameters to obtain a result;
calculating maximization criteria based on the results for each examinee class;
applying the maximization criteria to the mastery thresholds; and
repeating the performing, calculating and applying steps until the mastery
thresholds converge within a threshold range.
66. The system of claim 65 wherein the plurality of examinee classes comprises:
a master class, wherein the master class comprises all examinees having a
mastery state for each attribute associated with an assessment item;
a high non-master class, wherein the high non-master class comprises all
examinees having a mastery state for at least one-half of all attributes, but not all attributes, associated with an assessment item; and a low non-master class, wherein the low non-master class comprises all
examinees having a mastery state for less than one-half of all attributes associated with an assessment item.
67. The system of claim 66 wherein the maximization criteria comprises
maximizing the average of the difference between the average proportion correct for the master class and the average proportion correct for the high non-master class and the
difference between the average proportion correct for the master class and the average
proportion correct for the low non-master class.
68. The system of claim 67 wherein the one or more statistics comprise one or more of the following:
a plurality of mastery settings; maximization criteria corresponding to each of the mastery settings; and
mastery parameters corresponding to each of the mastery settings.
PCT/US2004/013397 2003-04-29 2004-04-29 Method and system for evaluation fit of raw data to model data WO2004097587A2 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
US46631903P 2003-04-29 2003-04-29
US60/466,319 2003-04-29

Publications (2)

Publication Number Publication Date
WO2004097587A2 true WO2004097587A2 (en) 2004-11-11
WO2004097587A3 WO2004097587A3 (en) 2005-02-10

Family

ID=33418364

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/US2004/013397 WO2004097587A2 (en) 2003-04-29 2004-04-29 Method and system for evaluation fit of raw data to model data

Country Status (1)

Country Link
WO (1) WO2004097587A2 (en)

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US6460035B1 (en) * 1998-01-10 2002-10-01 International Business Machines Corporation Probabilistic data clustering
US6484010B1 (en) * 1997-12-19 2002-11-19 Educational Testing Service Tree-based approach to proficiency scaling and diagnostic assessment
US6676413B1 (en) * 2002-04-17 2004-01-13 Voyager Expanded Learning, Inc. Method and system for preventing illiteracy in substantially all members of a predetermined set


Also Published As

Publication number Publication date
WO2004097587A3 (en) 2005-02-10

Similar Documents

Publication Publication Date Title
US7095979B2 (en) Method of evaluation fit of raw data to model data
Hartz A Bayesian framework for the unified model for assessing cognitive abilities: Blending theory with practicality
US8550822B2 (en) Method for estimating examinee attribute parameters in cognitive diagnosis models
US8348674B2 (en) Test discrimination and test construction for cognitive diagnosis
De La Torre et al. Model evaluation and multiple strategies in cognitive diagnosis: An analysis of fraction subtraction data
US8798518B2 (en) Method and system for calibrating evidence models
Kaplan et al. Bayesian statistical methods
US7418458B2 (en) Method for estimating examinee attribute parameters in a cognitive diagnosis model
Henson et al. Cognitive diagnostic attribute-level discrimination indices
Chen et al. Joint Discovery of Skill Prerequisite Graphs and Student Models.
KR102233255B1 (en) Mehtod and apparatuse for learner diagnosis using reliability of cognitive diagnosis medel
Willse Mixture Rasch models with joint maximum likelihood estimation
Xiang Nonlinear penalized estimation of true Q-Matrix in cognitive diagnostic models
Ayers et al. Incorporating student covariates in cognitive diagnosis models
Hendrawan et al. The effect of person misfit on classification decisions
US7440725B2 (en) Method of evaluation fit of raw data to model data
Finkelman et al. Automated test assembly for cognitive diagnosis models using a genetic algorithm
Bao et al. A diagnostic classification model for polytomous attributes
Thompson Bayesian psychometrics for diagnostic assessments: A proof of concept
Shojima Test Data Engineering: Latent Rank Analysis, Biclustering, and Bayesian Network
Li Estimation of Q-matrix for DINA Model Using the Constrained Generalized DINA Framework
Turhan Multilevel 2PL item response model vertical equating with the presence of differential item functioning
Johnson et al. 17 Hierarchical Item Response Theory Models
WO2004097587A2 (en) Method and system for evaluation fit of raw data to model data
Xu Statistical inference for diagnostic classification models

Legal Events

Date Code Title Description
AK Designated states

Kind code of ref document: A2

Designated state(s): AE AG AL AM AT AU AZ BA BB BG BR BW BY BZ CA CH CN CO CR CU CZ DE DK DM DZ EC EE EG ES FI GB GD GE GH GM HR HU ID IL IN IS JP KE KG KP KR KZ LC LK LR LS LT LU LV MA MD MG MK MN MW MX MZ NA NI NO NZ OM PG PH PL PT RO RU SC SD SE SG SK SL SY TJ TM TN TR TT TZ UA UG UZ VC VN YU ZA ZM ZW

AL Designated countries for regional patents

Kind code of ref document: A2

Designated state(s): BW GH GM KE LS MW MZ NA SD SL SZ TZ UG ZM ZW AM AZ BY KG KZ MD RU TJ TM AT BE BG CH CY CZ DE DK EE ES FI FR GB GR HU IE IT LU MC NL PL PT RO SE SI SK TR BF BJ CF CG CI CM GA GN GQ GW ML MR NE SN TD TG

121 Ep: the epo has been informed by wipo that ep was designated in this application
122 Ep: pct application non-entry in european phase