Empirical Research Background: Guessing is common when answering multiple-choice (MC) items, and successful guessing has been found to be related to ability; recently, the one-parameter logistic model for ability-based guessing (1PL-AG; San Martín, del Pino, & De Boeck, 2006) was proposed to represent the effect of ability on guessing. The testlet effect, which arises among items sharing a common stimulus, is a significant issue because it may violate the assumption of local independence; the Rasch testlet model (Wang & Wilson, 2005) was proposed to address this problem. To model the influences of ability-based guessing and testlet effects simultaneously in the context of testlet-based MC items, the testlet response model for ability-based guessing (TRM-AG) has been suggested (Lin & Wang, 2009). This study aimed to demonstrate the application of the TRM-AG and the influence of item representations on guessing and testlet effects.

Empirical Research Aims: Since different tasks activate different cognitive processes, this study examined whether different types of item representation would also elicit different cognitive processes from examinees and thereby cause unexpected test performance. The newly developed TRM-AG was used to calibrate a test data set composed of examinees' responses to two tests, the Symbolic Representation Test (SR Test) and the Pictorial Representation Test (PR Test). The data analysis focused on the differences between the SR and PR Tests in testlet effects, guessing, and ability-weighted parameters. Because the TRM-AG calibrates these parameters simultaneously, these effects can be compared directly across items and tests.

Empirical Research Sample: The SR and PR Tests and the test data were designed and collected in a research project sponsored by the National Science Council of Taiwan in 2005.
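As a sketch, the TRM-AG response function can be written in the following form, combining the Rasch testlet model with the 1PL-AG ability-based guessing mechanism. The parameterization here is an illustration assembled from the cited component models, not a quotation from Lin and Wang (2009):

```latex
P(Y_{ij}=1 \mid \theta_i)
  = \Psi\!\bigl(\theta_i + \gamma_{i\,d(j)} - \beta_j\bigr)
  + \bigl[1 - \Psi\!\bigl(\theta_i + \gamma_{i\,d(j)} - \beta_j\bigr)\bigr]\,
    \Psi\!\bigl(\lambda\theta_i + \delta_j\bigr),
\qquad
\Psi(x) = \frac{e^{x}}{1 + e^{x}},
```

where \(\theta_i\) is the ability of examinee \(i\), \(\beta_j\) the difficulty of item \(j\), \(\gamma_{i\,d(j)}\) the testlet effect of examinee \(i\) on the testlet containing item \(j\), \(\delta_j\) an item guessing parameter, and \(\lambda\) the ability-weighted parameter that lets guessing success depend on ability. In the adjusted model used in this study there are two such \(\lambda\) parameters, one for the SR Test and one for the PR Test.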
Participants in this project comprised volunteer teachers and students, and the data set analyzed in this study was part of the resulting database. The effective sample size was 681 fifth-grade elementary school students (345 female, 336 male). In the test administration, the test-taking order of the SR and PR Tests was counterbalanced to reduce potential order effects.

Empirical Research Method: First, to compare the ability-weighted parameters for the SR and PR Tests, the TRM-AG was adjusted to include two ability-weighted parameters, one for each test; the original TRM-AG has only one ability-weighted parameter. Second, the model fit of the TRM-AG and of competing models (the Rasch model, the Rasch testlet model, the 1PL-AG, and the TRM-AG with only one ability-weighted parameter) was examined. WinBUGS was used for all calibrations, so the model-fit indicator was the DIC, and each model calibration used 4,000 burn-in iterations followed by 6,000 subsequent iterations.

Empirical Research RASCH: Since the SR and PR Tests were administered to the same students at the same time, the response data for the two tests were combined. The calibration results of the TRM-AG with two ability-weighted parameters were analyzed to directly compare the SR and PR Tests in testlet effects, guessing, and ability-weighted parameters.

Empirical Research Results:
1. The DIC values indicate that the TRM-AG with two ability-weighted parameters was the preferred model in this study; that is, the SR and PR Tests each had a specific ability-weighted parameter.
2. The average item difficulty of the SR Test was lower than that of the PR Test.
3. The testlet effect of the SR Test was higher than that of the PR Test.
4. The average guessing parameter of the SR Test was slightly lower than that of the PR Test.
5. The ability-weighted parameter of the SR Test was higher than that of the PR Test.
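Model selection via the DIC can be illustrated with a minimal sketch. The function below computes the DIC from posterior draws of the deviance, using the variance-based effective-parameter estimate pD ≈ var(D)/2 (one common variant, due to Gelman and colleagues; Spiegelhalter's original pD instead uses the deviance evaluated at the posterior mean). The function name and inputs are illustrative, not taken from the study's WinBUGS scripts:

```python
from statistics import mean, variance

def dic(deviance_samples):
    """Deviance Information Criterion from posterior deviance draws.

    DIC = mean deviance + pD, with the effective number of
    parameters approximated as pD = var(deviance) / 2.
    Lower DIC indicates better fit after penalizing complexity.
    """
    d_bar = mean(deviance_samples)        # posterior mean deviance
    p_d = variance(deviance_samples) / 2  # effective parameter count
    return d_bar + p_d
```

In a comparison like the one reported here, the DIC would be computed for each candidate model (Rasch, Rasch testlet, 1PL-AG, and the TRM-AG variants) from its post-burn-in draws, and the model with the smallest DIC preferred.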
Empirical Research Conclusions: For measurement, the results show that the TRM-AG can be used to observe the influence of testlet-based item characteristics on test performance. This advantage of the TRM-AG will become more practical as testlets grow increasingly popular in test design. For teaching practice, since symbolic representation is more common than pictorial representation in math tests, it could be concluded that students' ability to decipher symbolic representation is stronger than their ability to decipher pictorial representation. Since pictorial representation is also part of the math course, some adjustments to teaching materials and test design would be expected.
Publication status: Published - 2009