A COMPARATIVE STUDY OF ORTHOPEDIC SURGEONS AND AI MODELS IN THE CLINICAL EVALUATION OF SPINAL SURGERY


Demir M. T., KÜLTÜR Y.

Journal of Turkish Spinal Surgery, vol.36, no.3, pp.125-129, 2025 (Scopus, TRDizin)

  • Publication Type: Article / Full Article
  • Volume: 36 Issue: 3
  • Publication Date: 2025
  • DOI: 10.4274/jtss.galenos.2025.62207
  • Journal Name: Journal of Turkish Spinal Surgery
  • Journal Indexes: Scopus, TR DİZİN (ULAKBİM)
  • Page Numbers: pp.125-129
  • Keywords: Artificial intelligence, large language model, spinal surgery
  • Affiliated with İstanbul Yeni Yüzyıl Üniversitesi: No

Abstract

Objective: Spinal surgery (SS) is characterized by considerable intraoperative challenges and higher complication rates than several other surgical specialties. This study evaluates the effectiveness of three artificial intelligence (AI) tools, Chat Generative Pre-trained Transformer (ChatGPT)-4o, DeepSeek-V3, and Gemini Pro, in patient assessment and clinical decision-making compared with orthopedic surgery specialists, using a series of case-based and knowledge-based questions relevant to SS. Materials and Methods: Two experienced orthopedic surgeons created a set of 50 questions: 25 presented as cases requiring clinical judgement and 25 testing theoretical knowledge. The test was given to two groups: Group 1 comprised three AI programs (ChatGPT-4.0, DeepSeek-V3, Gemini Pro) and Group 2 comprised ten experienced orthopedic surgeons. The answers were scored independently by the two expert surgeons. Results: Group 2 performed significantly better than Group 1 on the case-based questions (p=0.025), whereas there was no significant difference between the groups on the knowledge-based questions (p=1.000). In total correct responses, Group 2 also performed significantly better (p=0.036). Conclusion: The AI tools performed comparably to clinicians on knowledge-based tasks but were significantly inferior in areas requiring clinical judgement and case analysis. Although AI algorithms can serve as auxiliary tools, they should not replace the clinician as the decision-maker.