Abstract
Adaptive assessment has been widely studied as an important approach to formative assessment. It usually relies on a substantial question bank, primarily comprising auto-graded items of varying difficulty. Creating such questions manually is very time-consuming for instructors and therefore remains a barrier to the widespread adoption of adaptive assessment. To address this problem, large language model (LLM) technologies have been explored for automatically generating questions. However, LLMs inherently tend to produce flawed or erroneous auto-generated questions (AGQs), necessitating human oversight guided by question validation criteria (QVC). It is therefore important to find out how LLM techniques can be used to generate high-quality questions in the first place. Our study investigates whether, and to what extent, embedding QVC directly in the prompt can improve the quality of auto-generated multiple-choice questions (MCQs). We first created a set of QVC based on MCQ-writing guidelines in the literature. We then conducted a comparison study to evaluate the quality difference between AGQs generated with and without QVC in the prompt. The results show that embedding QVC in the prompt produces more valid, high-quality questions. Our findings help researchers in the LLM community recognize an important way of improving AGQ quality.
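The abstract does not include the study's materials or code; the following Python sketch is only a rough illustration of the two prompting conditions being compared. The criteria listed, the topic, and the helper names (QVC, build_prompt) are illustrative assumptions, not the actual QVC set or prompts used in the study.

```python
# Illustrative sketch: building MCQ-generation prompts with and without embedded QVC.
# The QVC items below are examples of common MCQ-writing guidelines, not the study's criteria.

QVC = [
    "The stem poses a single, clearly stated problem.",
    "Exactly one option is correct.",
    "Distractors are plausible but unambiguously wrong.",
    "Options are similar in length and grammatical form.",
    "Avoid 'all of the above' and 'none of the above' options.",
]

def build_prompt(topic: str, difficulty: str, with_qvc: bool) -> str:
    """Return an MCQ-generation prompt, optionally embedding the validation criteria."""
    prompt = (
        f"Generate one multiple-choice question on '{topic}' at {difficulty} difficulty, "
        "with four options and the correct answer marked."
    )
    if with_qvc:
        criteria = "\n".join(f"- {c}" for c in QVC)
        prompt += f"\n\nThe question must satisfy all of these validation criteria:\n{criteria}"
    return prompt

# The two conditions of the comparison study would send these prompts to the same LLM
# (via whatever chat-completion API is used) and compare the resulting question quality.
baseline_prompt = build_prompt("binary search trees", "medium", with_qvc=False)
qvc_prompt = build_prompt("binary search trees", "medium", with_qvc=True)
```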
Details
Presentation Type
Paper Presentation in a Themed Session
Theme
Keywords
Generative AI, Adaptive Assessment, Automatic Question Generation, Question Validation Criteria