Differential item functioning (DIF) is an important topic in educational testing and psychometrics. It refers to the phenomenon in which test-takers who have identical ability on the trait that a test item is designed to measure nevertheless have different probabilities of answering the item correctly. DIF assessment is therefore necessary for achieving the goal of setting an unbiased test. A common metric is usually established to serve as the matching variable for assessing whether an item in a target test exhibits DIF. A usual approach is to derive the common metric from an anchor set comprising carefully identified items from the test. It is highly desirable to have an accurate, DIF-free anchor set, because the purity of the anchor set significantly affects the accuracy of the item and person calibrations, which in turn affects the success of DIF detection. Normally, the more accurate the anchor set, the higher the power and the better controlled the Type I error rate of the DIF detection process. This thesis proposes a new anchor selection method for improving the accuracy of DIF detection. The new method, abbreviated as IRCI, is based on an Iterative Randomized Constant Item selection process coupled with scale purification: it repeatedly identifies DIF items using randomized short anchor sets and filters those DIF items out of the candidate anchor items. A computer simulation program is implemented as a platform for comparing the performance of the IRCI method against existing anchor selection methods, namely AOI (All-Other-Item), AOI-SP (All-Other-Item with Scale-Purification) and CI (Constant-Item). The methods are evaluated on different test data under different parameter settings (e.g., number of items, DIF contamination rate). Our simulation results show that the IRCI method improves anchor selection accuracy in a number of parameter settings.
In general, the power of the IRCI method in DIF detection is higher than that of the AOI, AOI-SP and CI methods, and its Type I error rate is better controlled. The new method is particularly effective under high DIF contamination: when the proportion of DIF items is 30% or above, it outperforms the AOI, AOI-SP and CI methods by a larger margin. Furthermore, when the sample sizes of the reference and focal groups are smaller, the new method also shows improvement over the AOI and AOI-SP methods. All rights reserved.
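The two evaluation criteria named in the abstract, power and Type I error rate, can be illustrated with a minimal sketch. It assumes the item-level definitions commonly used in DIF simulation studies: power is the share of truly DIF items that are flagged, and the Type I error rate is the share of DIF-free items that are wrongly flagged. The function name `detection_rates` and the example data are hypothetical and are not taken from the thesis.

```python
def detection_rates(flagged, true_dif, n_items):
    """Return (power, type_i_error) for one simulated replication.

    flagged  -- indices of items a detection method flagged as DIF
    true_dif -- indices of items simulated with DIF
    n_items  -- total number of items in the test
    """
    flagged = set(flagged)
    true_dif = set(true_dif)
    dif_free = set(range(n_items)) - true_dif

    # Power: proportion of truly DIF items that were flagged.
    power = len(flagged & true_dif) / len(true_dif) if true_dif else float("nan")
    # Type I error rate: proportion of DIF-free items that were flagged.
    type_i = len(flagged & dif_free) / len(dif_free) if dif_free else float("nan")
    return power, type_i


# Hypothetical replication: 10 items, items 0-2 simulated with DIF,
# and a detection method that flags items 0, 1 and 5.
power, type_i = detection_rates(flagged=[0, 1, 5], true_dif=[0, 1, 2], n_items=10)
print(f"power = {power:.3f}, Type I error rate = {type_i:.3f}")
```

In a simulation study these two rates would typically be averaged over many replications for each parameter setting before the methods are compared.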
Publication status: Published - 2014
- Educational tests and measurements
- Theses and Dissertations
- Thesis (Ed.D.)--The Hong Kong Institute of Education, 2014