A pilot study of probing before trusting large language models in self-learning

Zhengyuan WEI, Victor C. S. LEE, W. K. CHAN

Research output: Chapter in Book/Report/Conference proceeding › Chapter

Abstract

A critical and general problem of large language models (LLMs) is that they may hallucinate, generating specious answers, especially when they interpret concepts expressed in natural language queries using incorrect (sub)domain knowledge. Nevertheless, LLMs are in widespread use. To embrace LLMs in education, students can adopt them to seek quick feedback on their questions and enhance their inquisitive approaches to self-learning, provided there is a safeguard against hallucination. This paper introduces DomainProbe, the first approach to domain-level hallucination detection that leverages metamorphic testing to address the test oracle problem and improve the trustworthiness of LLM-generated feedback. Given a question posed by a student, DomainProbe prompts the LLM to extract key topical terms from the question and to provide an explanation for each term. The student then checks whether any term-explanation pair is inconsistent. If such an inconsistency is identified, the LLM's answer to the question is flagged as untrustworthy. We evaluate DomainProbe on MMLU, a widely used question-answering benchmark dataset, and show that it achieves promising results. We further discuss our vision for applying the approach to promote students' learning objectives and outline future work on the metamorphic relations formulated in DomainProbe. Copyright © 2025 IEEE.
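The probe-then-trust workflow summarized in the abstract can be sketched as a small function. This is a minimal illustrative sketch, not the paper's implementation: the function names (`extract_terms`, `explain_term`, `judge_consistent`) and the stubbed responses are hypothetical, standing in for the LLM prompts and the student's judgment described above.

```python
def probe_before_trusting(question, extract_terms, explain_term, judge_consistent):
    """Return True if no term-explanation inconsistency is found,
    False if the LLM's answer should be flagged as untrustworthy.

    extract_terms / explain_term stand in for LLM calls;
    judge_consistent stands in for the student's check."""
    for term in extract_terms(question):
        explanation = explain_term(term)
        if not judge_consistent(term, explanation):
            return False  # inconsistent pair found: flag answer as untrustworthy
    return True  # no inconsistency detected: answer may be trusted


# Self-contained demonstration with stubbed (hypothetical) LLM behavior.
def extract_terms(question):
    return ["overfitting", "recursion"]

def explain_term(term):
    # The explanation for "recursion" is deliberately wrong,
    # mimicking a hallucinated (sub)domain interpretation.
    return {
        "overfitting": "fitting noise in the training data",
        "recursion": "a hardware instruction for loops",
    }[term]

def judge_consistent(term, explanation):
    # The student spots that the "recursion" explanation is specious.
    return term != "recursion"

trusted = probe_before_trusting(
    "Why does my model overfit when I use recursion?",
    extract_terms, explain_term, judge_consistent,
)
print(trusted)
```

Because the inconsistency check fails on one extracted term, the sketch flags the answer rather than trusting it, mirroring the domain-level detection step DomainProbe performs before the student accepts the LLM's feedback.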

Original language: English
Title of host publication: Proceedings of 2025 International Symposium on Educational Technology, ISET 2025
Editors: Kwok Tai CHUI, Chaiporn JAIKAEO, Jitti NIRAMITRANON, Wattana KAEWMANEE, Kwan-Keung NG, Pornthipa ONGKUNARUK
Place of Publication: Danvers, MA
Publisher: IEEE
Pages: 190-195
ISBN (Electronic): 9798331595500
DOIs: https://doi.org/10.1109/ISET65607.2025.00046
Publication status: Published - 2025

Citation

Wei, Z., Lee, V. C. S., & Chan, W. K. (2025). A pilot study of probing before trusting large language models in self-learning. In K. T. Chui, C. Jaikaeo, J. Niramitranon, W. Kaewmanee, K.-K. Ng, & P. Ongkunaruk (Eds.), Proceedings of 2025 International Symposium on Educational Technology, ISET 2025 (pp. 190-195). IEEE. https://doi.org/10.1109/ISET65607.2025.00046

Keywords

  • AI hallucination detection
  • Metamorphic testing
  • Test oracle
  • Educational technology
  • Generative AI
