When humans are enlisted to judge competency in many fields, rating errors are inevitable and can have serious consequences. Major rater errors include severity, inconsistency, centrality, and similarity (Myford & Wolfe, 2004). There are two IRT frameworks for rater effects. In the facets framework, raters are treated as independent judges. In the Hierarchical Rater Model (HRM) framework, the ratings given by raters are treated as indicators of the ideal "true" category for the work being judged (e.g., an essay). Both approaches have focused on raters' severity and inconsistency, leaving centrality and similarity almost untouched. This study developed a class of IRT models that account for various rater errors in the HRM framework. Like the traditional HRM, the rating process in the new models contains two stages. In the first stage, the ideal rating that a rater with perfect reliability would assign to the item response follows a standard IRT model (e.g., the rating scale model). In the second stage, raters give ratings to an item response, which may differ from the ideal rating due to rater errors. By incorporating specific parameters for response criteria, the new models can handle raters' severity, inconsistency, and centrality simultaneously. A series of simulations was conducted to assess parameter recovery of the new models. Results showed that the parameters can be recovered well with the freeware JAGS. An empirical example was provided to demonstrate the implications and applications of the new models. A discussion on extending the new models to capture raters' similarity was also given. Copyright © 2017 International Meeting of the Psychometric Society.
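The two-stage structure described above can be sketched in equations. The specific forms below follow the common HRM formulation in the literature (a rating scale model in the first stage and a discrete signal-detection model in the second) and are illustrative assumptions, not quoted from this abstract:

```latex
% Stage 1 (assumed rating scale model): the ideal rating \xi_{pi}
% of person p's response to item i, with K + 1 categories,
% person ability \theta_p, item difficulty \delta_i, thresholds \tau_j
P(\xi_{pi} = k \mid \theta_p)
  = \frac{\exp \sum_{j=0}^{k} (\theta_p - \delta_i - \tau_j)}
         {\sum_{m=0}^{K} \exp \sum_{j=0}^{m} (\theta_p - \delta_i - \tau_j)}

% Stage 2 (assumed signal-detection form): rater r's observed rating
% X_{pir} is a noisy copy of \xi_{pi}, with severity \phi_r acting as a
% location shift and inconsistency \psi_r as the spread
P(X_{pir} = k \mid \xi_{pi} = \eta)
  \propto \exp\!\left[ -\frac{(k - \eta - \phi_r)^2}{2\psi_r^2} \right]
```

Under this sketch, the centrality parameters the abstract mentions would enter as rater-specific response criteria in the second stage, allowing a rater to compress ratings toward middle categories independently of severity and inconsistency.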
Publication status: Published - Jul 2017