The Role of ChatGPT in Dental Examination: A Study on Reliability and Efficiency in Automated Essay Scoring
Abstract
The integration of artificial intelligence (AI) in dental education and assessment has gained significant attention in recent years. This study evaluates the role of ChatGPT in dental examinations, specifically focusing on its reliability and efficiency in automated essay scoring. The research aims to assess how effectively ChatGPT can evaluate dental students’ written responses, considering factors such as accuracy, consistency, and grading bias.
A dataset of subjective dental examination answers was analyzed using ChatGPT’s natural language processing (NLP) capabilities. The AI-generated scores were compared with manual grading by dental educators, using metrics such as correlation with expert scores, intra-rater reliability, and time efficiency. Results indicate that ChatGPT demonstrates high consistency and efficiency in grading, significantly reducing the time required for evaluation. However, challenges such as contextual misinterpretation, grading fairness, and domain-specific limitations were observed.
This study concludes that ChatGPT has promising potential in automated essay scoring for dental examinations, offering a scalable and time-saving solution. However, human oversight remains essential to ensure clinical relevance and fairness in assessment. Future research should focus on refining AI models to better understand dental-specific terminology and reasoning for improved accuracy.
Aims
- Assess the accuracy of ChatGPT’s grading compared to manual scoring by dental educators.
- Analyze the consistency of AI-generated scores across different responses.
- Evaluate time efficiency, determining whether ChatGPT can reduce the time required for essay evaluation.
- Identify limitations and challenges, such as contextual misinterpretation or bias in grading (a minimal scoring sketch follows this list).
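To make the grading step concrete, the sketch below shows how an AI-generated score for one subjective answer might be obtained. This is a minimal illustration, not the study's actual pipeline: the OpenAI Python client (openai>=1.0), the "gpt-4o" model name, the prompt wording, and the marking scale are all assumptions introduced here.

```python
# Hypothetical sketch: obtaining an AI-generated score for one subjective answer.
# Assumptions (not from the study): the OpenAI Python client, the "gpt-4o" model
# name, and the prompt/rubric wording are illustrative only.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

def score_essay(question: str, answer: str, max_marks: int = 10) -> float:
    """Ask the model to grade one answer and return the numeric mark it assigns."""
    prompt = (
        f"You are a dental examiner. Grade the following answer out of {max_marks}.\n"
        f"Question: {question}\n"
        f"Student answer: {answer}\n"
        "Reply with the numeric score only."
    )
    response = client.chat.completions.create(
        model="gpt-4o",
        messages=[{"role": "user", "content": prompt}],
        temperature=0,  # deterministic settings favour intra-rater consistency
    )
    return float(response.choices[0].message.content.strip())
```

Setting the temperature to zero is one way to probe the consistency question in the aims: with sampling randomness removed, any remaining score variation across repeated passes reflects the model itself rather than the decoding settings.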
Objectives
- To analyze the accuracy of ChatGPT’s automated essay scoring in dental examinations by comparing AI-generated scores with those given by expert dental educators.
- To evaluate the reliability of ChatGPT in maintaining consistency across multiple essay responses.
- To measure the efficiency of ChatGPT in terms of the time taken for evaluation compared to manual grading (a sketch of these comparison metrics follows this list).
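A minimal sketch of how the accuracy and consistency metrics above could be computed, assuming one set of expert marks and two independent ChatGPT grading passes over the same answers; all scores below are invented for illustration.

```python
# Sketch of the evaluation metrics named above, using SciPy. The score arrays
# are made-up placeholders standing in for real grading data.
import numpy as np
from scipy.stats import pearsonr, spearmanr

expert = np.array([7.0, 8.5, 6.0, 9.0, 5.5])   # illustrative expert marks
ai_run1 = np.array([7.5, 8.0, 6.5, 9.0, 5.0])  # ChatGPT, first grading pass
ai_run2 = np.array([7.5, 8.5, 6.0, 9.0, 5.5])  # ChatGPT, repeat grading pass

# Accuracy: agreement between AI and expert scores.
r, p = pearsonr(ai_run1, expert)
rho, _ = spearmanr(ai_run1, expert)

# Intra-rater reliability: agreement of the AI with itself across passes.
r_intra, _ = pearsonr(ai_run1, ai_run2)

print(f"AI vs expert: Pearson r={r:.2f} (p={p:.3f}), Spearman rho={rho:.2f}")
print(f"Intra-rater (run1 vs run2): Pearson r={r_intra:.2f}")
print(f"Mean absolute difference: {np.mean(np.abs(ai_run1 - expert)):.2f} marks")

# Efficiency could be assessed by recording wall-clock time around each grading
# call (e.g. with time.perf_counter) and comparing it with manual grading time.
```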
Method
A cross-sectional survey was conducted among 204 dental students (57 males and 147 females). The survey included 14 questions. Responses were analyzed by gender and year of study using chi-square tests to identify statistically significant differences, as sketched below.
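A minimal sketch of the chi-square analysis, assuming responses to one survey question are cross-tabulated by gender; the counts are hypothetical and chosen only to be consistent with the reported sample sizes (57 males, 147 females).

```python
# Sketch of the chi-square test described above: a 2x3 contingency table of
# survey responses by gender for one of the 14 questions. Counts are invented.
from scipy.stats import chi2_contingency

# Rows: male (n=57), female (n=147); columns: Agree / Neutral / Disagree.
observed = [
    [30, 15, 12],   # male responses (sums to 57)
    [80, 40, 27],   # female responses (sums to 147)
]

chi2, p, dof, expected = chi2_contingency(observed)
print(f"chi2={chi2:.2f}, dof={dof}, p={p:.4f}")
if p < 0.05:
    print("Response distribution differs significantly by gender.")
else:
    print("No statistically significant gender difference detected.")
```

The same test would be repeated per question, and again with year of study as the row variable, to cover both comparisons named in the method.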