Gastroenterological Disease Detection using Transformer-Based Medical Image Analysis: Evaluating ViT-B16 on Curated Colon Data
DOI:
https://doi.org/10.64105/ghbweq62
Keywords:
Curated Colon Dataset (CCD), Detection Subtle Anomalies, Gastrointestinal diseases, Prediction Patient Outcome, Vision Transformer (ViT-B16).
Abstract
The early detection of gastroenterological diseases can both improve patient outcomes and reduce the burden of late-stage diagnosis. Traditional models, including CNNs, have been limited in capturing complex patterns in medical imaging datasets, which has motivated the investigation of transformer architectures such as the Vision Transformer (ViT). However, the use of ViT models in medical image analysis for gastroenterological disease detection remains relatively underexplored. This study evaluates the effectiveness of the ViT-B16 variant in predicting patient outcomes and detecting subtle anomalies using the Curated Colon Dataset (CCD). The transformer-based model was trained and tested on this dataset, and its performance was compared with that of traditional CNNs. ViT-B16 reached 99.5% accuracy, while ResNet and EfficientNet reached 91.3% and 92.5%, respectively. Precision, recall, and AUC were also high; the AUC was approximately 0.99, which indicates accurate discrimination between disease classes. These results demonstrate that the ViT-B16 model has potential for medical diagnostic tasks, particularly classification and prediction of patient outcomes, with possible applicability in real-world clinical settings, where informed decision-making, explainable AI approaches, and clinical interpretability remain essential. However, challenges such as clinical data integration and ethical considerations in diagnostics, alongside the need for multimodal image fusion and improved diagnostic performance on minority classes, highlight areas for future work.
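
For readers less familiar with the reported metrics, the following is a minimal sketch of how accuracy, precision, recall, and AUC can be computed from model outputs. It assumes a simplified binary setting (diseased vs. normal), whereas the study discriminates between several disease classes. The function name `binary_metrics`, the 0.5 threshold, and the toy labels and scores are illustrative assumptions; they are not the study's evaluation code or data.

```python
from typing import Dict, Sequence


def binary_metrics(
    y_true: Sequence[int], y_score: Sequence[float], threshold: float = 0.5
) -> Dict[str, float]:
    """Accuracy, precision, and recall at a threshold, plus threshold-free AUC."""
    # Turn predicted probabilities into hard labels at the chosen threshold.
    y_pred = [1 if s >= threshold else 0 for s in y_score]

    # Confusion-matrix counts.
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)

    accuracy = (tp + tn) / len(y_true)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0

    # AUC as the probability that a randomly chosen positive sample
    # receives a higher score than a randomly chosen negative one
    # (ties count as one half).
    pos = [s for t, s in zip(y_true, y_score) if t == 1]
    neg = [s for t, s in zip(y_true, y_score) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    auc = wins / (len(pos) * len(neg)) if pos and neg else float("nan")

    return {"accuracy": accuracy, "precision": precision, "recall": recall, "auc": auc}


# Toy labels and scores, invented for illustration only (not CCD results).
y_true = [1, 1, 1, 1, 0, 0, 0, 0]
y_score = [0.95, 0.80, 0.60, 0.40, 0.55, 0.30, 0.20, 0.10]
metrics = binary_metrics(y_true, y_score)
print(metrics)
```

On this toy input, accuracy, precision, and recall each equal 0.75 while the AUC is 0.9375. The first three depend on the chosen threshold and the AUC does not, which is why the AUC is commonly reported alongside them.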




