Using simulation to evaluate retest reliability of assessment results
As diagnostic assessment systems become more prevalent as large-scale operational assessments, consideration must be given to how reliability is reported. Alternatives to traditional reliability methods must be explored that are consistent with the design, scoring, and reporting level of diagnostic assessment systems. When practical constraints make a second empirical administration infeasible, retest reliability can instead be evaluated with simulation methodology. The purpose of this paper is to summarize the method and application of using a simulated second test administration to report reliability for one large-scale operational diagnostic assessment program. Using operational administration data, student responses were simulated from model-calibrated parameters. Reliability estimates were then calculated to provide a measure of association between true and estimated mastery of skills. Overall, results support reporting reliability via simulation-based methods and the valid interpretation and use of skill mastery information provided in diagnostic score reports. This paper includes a summary of the methods used, a presentation of example results, broad implications for application within the measurement field, and future directions.
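The general logic of the simulation approach can be sketched as follows. This is a minimal illustration, not the program's actual model: it assumes a single skill measured by five items under a hypothetical two-class (master / non-master) latent class model, with made-up "calibrated" response probabilities and a made-up population mastery rate. True mastery states are drawn, item responses are simulated from the conditional response probabilities, the simulated responses are rescored, and reliability is summarized as the association (agreement and Cohen's kappa) between true and estimated mastery.

```python
import numpy as np

rng = np.random.default_rng(7)

# Hypothetical "calibrated" parameters for one skill measured by 5 items:
# probability of a correct response for non-masters (class 0) and masters (class 1).
p_correct = {0: np.array([0.20, 0.25, 0.15, 0.30, 0.20]),   # non-masters
             1: np.array([0.85, 0.90, 0.80, 0.88, 0.92])}   # masters
base_rate = 0.55          # assumed population mastery rate (illustrative)
n_students = 10_000

# 1. Draw true mastery states and simulate item responses from the model.
true_mastery = rng.random(n_students) < base_rate
probs = np.where(true_mastery[:, None], p_correct[1], p_correct[0])
responses = rng.random((n_students, 5)) < probs

# 2. Rescore the simulated responses: posterior probability of mastery
#    under the same two-class model, classified at 0.5.
def posterior_mastery(resp):
    like = {c: np.prod(np.where(resp, p_correct[c], 1 - p_correct[c]), axis=1)
            for c in (0, 1)}
    post = base_rate * like[1] / (base_rate * like[1]
                                  + (1 - base_rate) * like[0])
    return post >= 0.5

est_mastery = posterior_mastery(responses)

# 3. Reliability as association between true and estimated mastery:
#    raw classification agreement and Cohen's kappa.
agree = np.mean(true_mastery == est_mastery)
p1, p2 = true_mastery.mean(), est_mastery.mean()
chance = p1 * p2 + (1 - p1) * (1 - p2)
kappa = (agree - chance) / (1 - chance)
print(f"agreement = {agree:.3f}, kappa = {kappa:.3f}")
```

In an operational application, the response model, item parameters, and classification rule would be those of the assessment program itself, and the agreement statistics would be computed per skill across the full set of reported attributes.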

National Council on Measurement in Education