Evaluating the performance of ChatGPT and GPT-4o in coding classroom discourse data: A study of synchronous online mathematics instruction

Simin XU, Xiaowei HUANG, Chung Kwan LO, Gaowei CHEN, Morris Siu-yung JONG

Research output: Contribution to journal › Articles › peer-review


High-quality instruction is essential to facilitating student learning, prompting many professional development (PD) programmes for teachers to focus on improving classroom dialogue. However, during PD programmes, analysing discourse data is time-consuming, delaying feedback on teachers' performance and potentially impairing the programmes' effectiveness. We therefore explored the use of ChatGPT (a fine-tuned GPT-3.5 series model) and GPT-4o to automate the coding of classroom discourse data. We equipped these AI tools with a codebook designed for mathematics discourse and academically productive talk. Our dataset consisted of over 400 authentic talk turns in Chinese from synchronous online mathematics lessons. The coding outcomes of ChatGPT and GPT-4o were quantitatively compared against a human standard. Qualitative analysis was conducted to understand their coding decisions. The overall agreement between the human standard, ChatGPT output, and GPT-4o output was moderate (Fleiss's Kappa = 0.46) when classifying talk turns into major categories. Pairwise comparisons indicated that GPT-4o (Cohen's Kappa = 0.69) had better performance than ChatGPT (Cohen's Kappa = 0.33). However, at the code level, the performance of both AI tools was unsatisfactory. Based on the identified competences and weaknesses, we propose a two-stage approach to classroom discourse analysis. Specifically, GPT-4o can be employed for the initial category-level analysis, following which teacher educators can conduct a more detailed code-level analysis and refine the coding outcomes. This approach can facilitate timely provision of analytical resources for teachers to reflect on their teaching practices. Copyright © 2024 The Authors. Published by Elsevier Ltd.
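
The agreement statistics named in the abstract (Cohen's Kappa for pairwise comparisons, Fleiss's Kappa for three or more raters) can be illustrated with a minimal sketch. This is not the authors' analysis code: the function names `cohen_kappa` and `fleiss_kappa` and the placeholder labels "A", "B" and "C" are assumptions made for illustration, and the toy data bear no relation to the study's dataset or codebook.

```python
from collections import Counter


def cohen_kappa(rater_a, rater_b):
    """Cohen's kappa for two raters who labelled the same items."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("need two equal-length, non-empty label lists")
    n = len(rater_a)
    # Observed agreement: share of items given the same label by both raters.
    observed = sum(x == y for x, y in zip(rater_a, rater_b)) / n
    # Chance agreement: product of each rater's marginal label proportions.
    count_a, count_b = Counter(rater_a), Counter(rater_b)
    expected = sum(count_a[c] * count_b[c] for c in count_a) / (n * n)
    if expected == 1.0:
        # Both raters used a single identical label; kappa is undefined,
        # so this sketch returns 1.0 by convention (an assumption).
        return 1.0
    return (observed - expected) / (1 - expected)


def fleiss_kappa(ratings):
    """Fleiss's kappa; `ratings` holds one list of labels per item,
    with the same number of raters (at least two) for every item."""
    if not ratings:
        raise ValueError("need at least one rated item")
    n_items = len(ratings)
    n_raters = len(ratings[0])
    if n_raters < 2 or any(len(item) != n_raters for item in ratings):
        raise ValueError("every item needs the same number (>= 2) of ratings")
    label_totals = Counter()
    agreement_sum = 0.0
    for item in ratings:
        counts = Counter(item)
        label_totals.update(counts)
        # Proportion of agreeing rater pairs for this item.
        agreement_sum += (sum(c * c for c in counts.values()) - n_raters) / (
            n_raters * (n_raters - 1)
        )
    mean_agreement = agreement_sum / n_items
    total = n_items * n_raters
    chance = sum((t / total) ** 2 for t in label_totals.values())
    if chance == 1.0:
        # Only one label was ever used; same convention as above.
        return 1.0
    return (mean_agreement - chance) / (1 - chance)


# Hypothetical toy labels for six talk turns (not data from the study).
human = ["A", "A", "B", "C", "B", "A"]
model_1 = ["A", "B", "B", "C", "B", "A"]
model_2 = ["A", "A", "B", "C", "C", "A"]

print("Cohen (human vs model_1):", cohen_kappa(human, model_1))
print("Fleiss (all three):", fleiss_kappa(list(zip(human, model_1, model_2))))
```

In this sketch, each pairwise comparison against the human standard would use `cohen_kappa`, while the overall three-way agreement would use `fleiss_kappa` on the per-turn label triples.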

Original language: English
Article number: 100325
Journal: Computers and Education: Artificial Intelligence
Early online date: Oct 2024
Publication status: Published - Dec 2024


Xu, S., Huang, X., Lo, C. K., Chen, G., & Jong, M. S.-Y. (2024). Evaluating the performance of ChatGPT and GPT-4o in coding classroom discourse data: A study of synchronous online mathematics instruction. Computers and Education: Artificial Intelligence, 7, Article 100325. https://doi.org/10.1016/j.caeai.2024.100325


  • ChatGPT
  • GPT-4o
  • Classroom discourse analysis
  • Professional development
  • Mathematics instruction

