Scientific Reports volume 16, Article number: 3780 (2026)
In today’s world, Diabetic Retinopathy (DR) remains a leading cause of vision loss globally, necessitating early detection and accurate diagnosis for timely intervention. Traditional machine learning and deep learning-based approaches, while effective, often suffer from issues such as limited interpretability, static decision-making, and inadequate generalization across diverse patient data. This research introduces an Agentic-AI Driven Framework for Diabetic Retinopathy Analysis (AADR-AI), which leverages intelligent agent-based learning mechanisms to enhance decision-making autonomy, dynamic adaptability, and contextual understanding of retinal fundus images. The novelty lies in incorporating agentic intelligence principles, autonomy, reactivity, and proactivity into DR detection systems, allowing real-time analysis and adaptive feature learning based on patient-specific variations. The proposed AADR-AI framework integrates a multi-agent ensemble of convolutional and transformer-based networks, coordinated through a decision fusion layer for robust classification. Key contributions include improved classification accuracy (up to 96.7%), enhanced model efficiency with reduced computational overhead, and real-time adaptability to varying image qualities and disease progression stages. Extensive experimentation on benchmark datasets demonstrates superior performance compared to existing state-of-the-art methods. This work highlights the transformative potential of agentic AI in medical imaging, paving the way for more autonomous and interpretable clinical decision-support systems.
Diabetic retinopathy is a progressive complication of diabetes that can lead to permanent blindness if left untreated1. Early diagnosis through retinal fundus image analysis is the key to successful clinical intervention2. Older DR detection methods, such as rule-based models and classical deep learning architectures, often perform poorly in diverse settings, are hard to interpret, and do not generalize well across datasets3. In practice, their diagnostic value is limited because they are static and cannot adapt to patient-specific patterns4. This study introduces a new framework, AADR-AI, to address these problems5.
This approach is based on the idea that agentic intelligence, which means being self-reliant, flexible, and acting with a purpose, should be a part of DR diagnosis6. Unlike other models, AADR-AI adjusts the way it makes decisions in real-time when it gets new data7. This proposed system uses a decision fusion method to connect transformer models with convolutional neural networks in a multi-agent architecture8. This makes the system more responsive to changes in imaging conditions, less expensive to run, and more accurate at diagnosing problems9. This new development is a big step forward from earlier methods since it sets the stage for a smarter and more reliable way to find DR10.
The current approaches for finding diabetic retinopathy have problems with inconsistent results across multiple retinal scans, limited flexibility, and hard-to-understand results. In real-life clinical settings, these static models often do not work for other circumstances. The authors recommend the AADR-AI as a way to fix these difficulties. It includes goal-directed agents that are independent, flexible, and enhance diagnostic accuracy, efficiency, and responsiveness in real time without losing sight of their main aims.
The suggested system enables autonomy, reactivity, and proactivity in the diagnostic process by combining transformer models, convolutional neural networks, and a hybrid CNN–ViT architecture within a multi-agent ensemble. In contrast to traditional deep learning systems, the agentic design provides better diagnostic precision across all disease phases, transparent interpretability through explainable agents, and real-time flexibility. The contributions are both scientific (the use of JADE/Mesa to execute real agentic behavior) and clinical (a reliable, comprehensible, and scalable methodology for retinal disease screening).
This study presents the AADR-AI framework, which uses AI to make DR diagnosis more independent and adaptable.
It has a multi-agent deep learning architecture that combines decisions in real time, which makes it more accurate and efficient.
The system has better performance on benchmark datasets for robust retinal disease diagnosis since it can be scaled, understood, and tailored to each patient.
The remainder of this paper is organized as follows: Sect. “Background” reviews past studies on DR detection methods, which often suffer from limited interpretability, static decision-making, and inadequate generalization across diverse patient data. Section “Problem statement” describes the proposed AADR-AI process. Section “Contributions of this paper” compares the suggested approach with conventional approaches. Section “The three major contributions are” concludes the paper with a discussion of possible future studies.
This study goes into detail about the past and present state of the art in diagnosing diabetic retinopathy using AI, agent-based modelling, deep learning, and hybrid CNN-transformer frameworks. This review looks at these works to find out what the present problems and limitations are in terms of accuracy, flexibility, and ease of understanding. These findings directly support the development of the proposed AADR-AI architecture, which includes agentic intelligence.
The authors in the paper elucidate how deep learning algorithms can be used to find diabetic retinopathy in retinal fundus images (DR-RFI). It shows how well pre-trained networks and models based on convolutional neural networks can diagnose things automatically11. The study says that deep learning is better at classifying data, and it still has problems with data imbalance, and it needs to understand how it is used in clinical settings.
This research explores how machine learning and deep learning can be used to diagnose DR. The authors examine both deep CNN architectures and regular classifiers12. Deep learning models are better than standard ones, and their research demonstrates that real-time deployment is hard because of concerns with dataset diversity, limited generalizability, and a lack of transparency.
This paper surveys new developments in using AI to find DR in retinal images, covering deep learning, segmentation, and hybrid models, with a focus on how beneficial they are for supporting diagnosis13. Although performance has improved, the results show that data variability and black-box behaviour remain obstacles to validation, interpretation, and real-world use.
This paper examines computer-assisted strategies for detecting DR (CaS-DDR) that are based on analyzing fundus images. It talks about how to sort things, get features out of pictures, and prepare pictures for use14. The report highlights that integrating traditional image processing with AI approaches makes things work better, and there is still a lot of research to be done on dependable early detection and strong real-time diagnostic systems.
The decentralized agent-based architecture for putting together medical images was discussed in this paper. The approach uses smart agents that talk to each other in a distributed setting to handle imaging data15. The results show that the system is more autonomous and scalable, which means that agent-based methods provide a new, flexible choice for difficult medical imaging tasks.
This study examines how the integration of agent-based modelling with machine learning (Ia-M-ML) might help in modelling biological systems, and it details healthcare uses of agents for dynamic simulation and prediction16. The authors say that combining ML with agent-based systems improves biological diagnostics and analysis by making them more flexible, allowing for patient-specific modelling, and supporting decisions at the system level.
This paper gives an overview of how vision transformers (ViTs) and hybrid convolutional neural network (CNN)-transformer models have evolved over the years. The authors describe an architecture that combines the CNN’s ability to capture local features with the transformer’s ability to learn global context17. Their research shows that hybrid models classify better than standalone models, especially in DR and medical imaging applications.
This article illustrates a hybrid CNN-Vision Transformer design (CNN-ViT) for the purpose of organizing pictures. It recommends a structural mix to help with long-range interdependence and extracting spatial features18. The results reveal that hybrid architectures are more accurate and durable than solo models. This is good news for medical diagnostic uses like processing fundus pictures.
This article presents AI-powered transparent decision-support systems (AI-TDSS) for medical diagnosis, which combine explainable models with healthcare workflows to make decisions easier to understand and to increase trust19. The study’s finding that openness improves decision acceptance and system dependability points to a shift away from opaque AI models and toward diagnostic systems that put the patient first.
The AI-powered tools transforming how doctors diagnose and treat patients are examined in this paper20. It proposes a smart system that uses machine learning to understand clinical data. The results show that AI makes diagnoses far more accurate and timelier, and that adding explainability and real-time analysis to clinical decision-making makes it stronger.
The literature shows that both classic and new AI methods for diagnosing DR have their pros and cons. There is a lot of talk about systems that can work on their own, understand what they are doing, and change as needed. These results show that the proposed AADR-AI system is effective because it combines real-time adaptation with multi-agent deep learning to fix the problems with current fundus image analysis diagnostic tools.
Table 1 shows various methods in related work. The AADR-AI model uses a modular multi-agent system instead of a single fusion stream, unlike CNN + ViT designs. It uses numerous flexible feature extraction agents in tandem to capture more retinal characteristics for diabetic retinopathy detection than CNN and ViT agents. A dedicated decision fusion layer blends agent outputs using probability averaging or majority voting to improve robustness and adaptability. The model has an explainable AI interface and real-time physician verification, ensuring transparency and clinical reliability, unlike most CNN + ViT hybrids. The platform supports independent agent specialization and dynamic adaptation to image quality and diagnostic needs using agentic reasoning. Its pipeline includes preprocessing, multi-agent extraction, fusion, classification, and explainability, making it more intelligent, interpretable, and clinically deployable than CNN + ViT fusion models.
The primary benefit of this research is the development of an Agentic AI framework (AADR-AI) for the identification of diabetic retinopathy, which operationalizes actual agentic behavior in the context of medical imaging. In contrast to previous research that uses CNNs or transformers as static classifiers, this approach uses JADE/Mesa to build these models as autonomous agents, allowing them to be autonomous, reactive, and proactive. Through confidence-weighted negotiation methods and shared knowledge structures, these agents, which include a CNN Agent, Transformer Agent, Hybrid CNN–ViT Agent, Fusion Agent, and Explanation Agent, interact dynamically to form an ensemble that instantly adjusts to changing input conditions. Additionally, by generating Grad-CAM visuals and clinician-informed reasons, the addition of an Explanation Agent guarantees interpretability and closes the gap between transparency and accuracy. A new paradigm for adaptable, interpretable, and repeatable medical AI systems has been established by this agent-oriented design, which represents an important change from traditional ensemble or hybrid approaches.
Compared to current hybrid CNN–ViT methods, which mainly combine local and global representations into a single static model, our framework operationalizes these architectures as independent agents capable of adaptively re-weighting their effect according to dependability and confidence. While conventional multi-model ensembles for DR detection usually use globally learned or fixed fusion procedures, our Fusion Agent adds a dynamic negotiation mechanism that modifies weights per input, allowing for proactive and reactive behavior that has not been seen in previous research. In contrast to interpretable DR systems that often offer post-hoc explanations, our Explanation Agent incorporates interpretability right into the decision-making process by creating adaptive Grad-CAM maps and revising explanations in response to clinician input. When taken together, these contributions create a technical difference: whereas previous works add interpretability as an auxiliary step or combine architectures for accuracy, our system incorporates true agentic intelligence into the ensemble itself, producing a diagnostic tool that is more transparent, adaptive, and clinically relevant.
This method suggests an AADR-AI to help diagnose diabetic retinopathy using pictures of the retina. It uses autonomous, adaptable agents to extract features and make diagnoses. By using convolutional neural networks, transformers, and a decision fusion layer, the system gets beyond the problems with traditional static diagnostic models. This makes the system more accurate, able to change in real time, and easier to understand.
The architecture and parameters of the model are as follows:
The CNN Agent uses a ResNet-50 backbone, a 50-layer residual network with batch normalization and ReLU activations, followed by a global average pooling layer and a fully connected softmax classifier with 5 output neurons. It was trained with a dropout rate of 0.2. The Transformer Agent is based on the ViT-Base architecture: 16 × 16 patches, an embedding dimension of 768, 12 self-attention heads, 12 encoder layers, and an MLP hidden dimension of 3072. All Transformer blocks use layer normalization and dropout (0.1). The Hybrid Agent is a CNN-Transformer model that employs a concatenation-based fusion layer followed by a fully connected layer (512 units) and a softmax output. Adaptive fusion weights are estimated with a confidence-based mechanism. All models were optimized with cross-entropy loss, He-normal weight initialization, and L2 regularization. The parameters are consistent between agents and can be used to reproduce the performance of the system.
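As a quick consistency check on the ViT-Base figures quoted above, the following arithmetic may help. It assumes a 512 × 512 RGB input to the Transformer Agent (the preprocessing size reported for the dataset); the source does not state the ViT input resolution, so the patch count is illustrative only. The symbols $H$, $W$, $P$, $C$, $D$, and $h$ (image height and width, patch size, channels, embedding dimension, number of heads) are notation introduced here.

$$N_{patches} = \frac{H}{P} \cdot \frac{W}{P} = \frac{512}{16} \cdot \frac{512}{16} = 1024$$

$$P^2 C = 16 \cdot 16 \cdot 3 = 768 = D, \qquad \frac{D}{h} = \frac{768}{12} = 64, \qquad 4D = 3072$$

Under this assumption, each flattened RGB patch has exactly as many values as the embedding dimension, each attention head works in a 64-dimensional subspace, and the MLP hidden size is four times the embedding dimension, which matches the usual ViT-Base configuration.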
In this study, the suggested AADR-AI framework has been implemented as a multi-agent system using JADE (Java Agent Development Framework) and Mesa (a Python agent-based modelling library) to ensure that the diagnostic pipeline demonstrates true agentic behaviour. With defined objectives, rules for making decisions, and procedures for communicating, every agent functions as an independent entity. When input quality falls below a threshold, a pre-processing agent proactively re-invokes denoising or contrast adjustment and automatically improves image quality. A CNN agent that focuses on local lesions, a Transformer agent that captures global structural dependencies, and a Hybrid CNN-ViT agent that combines local and global data are the three independent agents that perform feature extraction. The Transformer Agent, for instance, increases attention-head weighting when the CNN Agent reports low certainty. These agents do more than just forward results; they continuously assess their confidence levels and adjust their processing processes in response. In JADE, a central Fusion Agent uses FIPA-ACL communication to coordinate output negotiation, dynamically modifying decision weights using consensus procedures like confidence-weighted aggregation or majority voting. An Explanation Agent creates textual reasons and visual saliency maps to improve interpretability, and it responds to clinician overrides by proactively updating its policies. Structured message forwarding and coordination between agents are made possible by JADE, and agents can proactively create new objectives like improving thresholds or starting more feature checks by using Mesa to model adaptive behaviors in the diagnostic environment. 
This architecture makes the novelty of the suggested framework tangible and repeatable by embodying the three characteristics of agentic intelligence: autonomy in making decisions on its own, reactivity in responding to changes in the environment, and proactivity in starting diagnostic improvements.
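The reactive behaviour described for the pre-processing agent (re-invoking enhancement when input quality falls below a threshold) can be illustrated with a minimal, library-free sketch. It does not use JADE or Mesa and is not the authors' implementation. The class `PreprocessingAgent`, the quality proxy `contrast_score`, the enhancement `stretch_contrast`, and the threshold of 0.2 are all hypothetical choices made for illustration.

```python
from statistics import pstdev
from typing import List

Image = List[List[float]]  # toy grayscale image, intensities in [0, 1]


def contrast_score(img: Image) -> float:
    """Hypothetical quality proxy: population std-dev of pixel intensities."""
    return pstdev([px for row in img for px in row])


def stretch_contrast(img: Image) -> Image:
    """Linear contrast stretch to [0, 1]; stands in for denoising/CLAHE here."""
    flat = [px for row in img for px in row]
    lo, hi = min(flat), max(flat)
    if hi == lo:  # perfectly flat image: nothing to stretch
        return img
    return [[(px - lo) / (hi - lo) for px in row] for row in img]


class PreprocessingAgent:
    """Re-applies enhancement while the quality score stays below a threshold."""

    def __init__(self, threshold: float = 0.2, max_rounds: int = 3) -> None:
        self.threshold = threshold    # assumed value, not taken from the paper
        self.max_rounds = max_rounds  # cap on quality checks per image
        self.log: List[float] = []    # quality score observed at each check

    def step(self, img: Image) -> Image:
        for _ in range(self.max_rounds):
            score = contrast_score(img)
            self.log.append(score)
            if score >= self.threshold:
                break                     # quality is acceptable: stop reacting
            img = stretch_contrast(img)   # react to low quality and re-check
        return img


agent = PreprocessingAgent()
dull = [[0.45, 0.50], [0.50, 0.55]]
enhanced = agent.step(dull)
print(agent.log)
```

The agent stops as soon as the observed score reaches the threshold, so a good image passes through untouched while a low-contrast one is enhanced and re-checked.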
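To generate the final diabetic retinopathy stage, the outputs of all feature extraction agents (CNN, Transformer, Hybrid CNN-ViT) are combined using standard ensemble fusion techniques, as given in Eqs. 1, 2, and 3.
When using confidence-weighted fusion, the final probability distribution $p_{final}$ is computed from the agents' distributions $p_i$ as:
$$p_{final} = \sum_{i=1}^{N} w_i \, p_i \qquad (1)$$
where $w_i$ denotes the normalized confidence of agent $i$ and $N$ is the number of agents.
When equal weighting is desired, the probabilities are averaged:
$$p_{final} = \frac{1}{N} \sum_{i=1}^{N} p_i \qquad (2)$$
Alternatively, for class-level decisions over the $N$ agents, majority voting is used:
$$\hat{y} = \operatorname{mode}\{\hat{y}_1, \hat{y}_2, \ldots, \hat{y}_N\} \qquad (3)$$
where $\hat{y}_i$ is the class predicted by agent $i$.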
These fusion strategies improve robustness by leveraging diverse agent perspectives.
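The three fusion strategies described above (confidence-weighted fusion, equal-weight averaging, and majority voting) can be sketched in a few lines of plain Python. This is an illustrative sketch, not the authors' code. The function names, the example probabilities, and the tie-breaking rule in `majority_vote` (lowest class index wins) are assumptions introduced here.

```python
from collections import Counter
from typing import List, Sequence


def confidence_weighted_fusion(probs: Sequence[Sequence[float]],
                               confidences: Sequence[float]) -> List[float]:
    """Weight each agent's class distribution by its normalized confidence."""
    total = sum(confidences)
    if total <= 0:
        raise ValueError("confidences must sum to a positive value")
    weights = [c / total for c in confidences]  # normalized so they sum to 1
    n_classes = len(probs[0])
    return [sum(w * p[k] for w, p in zip(weights, probs))
            for k in range(n_classes)]


def average_fusion(probs: Sequence[Sequence[float]]) -> List[float]:
    """Equal weighting: the arithmetic mean of the agents' distributions."""
    n_agents = len(probs)
    return [sum(p[k] for p in probs) / n_agents for k in range(len(probs[0]))]


def majority_vote(probs: Sequence[Sequence[float]]) -> int:
    """Each agent votes for its arg-max class; the most common vote wins."""
    votes = [max(range(len(p)), key=p.__getitem__) for p in probs]
    counts = Counter(votes)
    top = max(counts.values())
    # Assumed tie-break: the lowest class index among the tied classes.
    return min(label for label, c in counts.items() if c == top)


# Hypothetical outputs of the CNN, Transformer, and Hybrid agents (5 DR grades).
cnn = [0.10, 0.60, 0.20, 0.05, 0.05]
vit = [0.05, 0.30, 0.50, 0.10, 0.05]
hyb = [0.05, 0.50, 0.30, 0.10, 0.05]

fused = confidence_weighted_fusion([cnn, vit, hyb], confidences=[0.6, 0.5, 0.5])
print(fused, average_fusion([cnn, vit, hyb]), majority_vote([cnn, vit, hyb]))
```

With equal confidences, the weighted rule reduces to plain averaging, which is a convenient sanity check when wiring the fusion layer together.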
The proposed AADR-AI framework has a multi-agent architecture for detecting and diagnosing DR from retinal fundus images. A preprocessing module enhances the images by equalizing contrast and removing noise. In Fig. 1, the dotted ovals “Combines Outputs from CNN” and “Transformer” show how the model merges input from several processing agents before reaching a conclusion. The locally generated spatial attributes or predictions of the CNN-based agents are combined into a single representation, while the Transformer and hybrid CNN-ViT agent outputs provide global contextualization. Local (CNN) and global (Transformer) information are merged in the decision fusion layer to capture fine-grained visual details and contextual patterns. This split-and-fuse structure reflects the architecture’s multi-stream design, which improves classification of Diabetic Retinopathy stages.
Fig. 1. Proposed system.
The enhanced images are then passed to specialized feature extraction agents (CNN, Transformer, and Hybrid CNN-ViT), which learn spatial and contextual information independently. These agents embody agentic AI principles such as flexibility and independence, and they can work alone or together, as shown in Fig. 1. A centralized Decision Fusion Layer receives their outputs and combines them to produce accurate DR stage classifications. This multilayer decision-making approach supports better and more reliable diagnostic accuracy21. The model’s decisions are then shown through an explainable support interface, which makes them easier to understand in a clinical setting. This architecture brings dynamic, responsive, and interpretable AI to bear on the problems of static, monolithic models by working well, adapting in real time, and remaining reliable under diverse imaging conditions.
The proposed explainable decision support system improves trustworthiness and transparency in medical applications of artificial intelligence. Visual and feature attribution explanations are generated with techniques such as Grad-CAM, SHAP, and LIME, which lets clinicians understand the reasoning behind each diagnosis. This not only makes the model’s decision-making more transparent but also improves collaboration between humans and AI, boosts clinician confidence, and makes potential flaws easier to identify. To improve diagnostic accuracy, raise user trust, and meet the ethical need for openness in healthcare AI, the system provides interpretable insights that are aligned with clinical reasoning. As a result, it represents a substantial advance in clinical decision support.
The agentic framework was implemented with a combination of JADE (Java Agent DEvelopment Framework) and Mesa (a Python-based agent simulation library). The four agents (CNN Agent, Transformer Agent, Hybrid Fusion Agent, and Support/Explainability Agent) were each implemented as an autonomous JADE agent. Mesa managed agent state changes, observation changes, and adaptive weight recalibration internally. Messages between agents used the FIPA-ACL protocol; the message types request, inform, and propose were used to negotiate during decision fusion.
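To make the request/inform/propose exchange concrete, the sketch below models one negotiation round with a small Python data class. It only mimics the shape of a FIPA-ACL message (performative, sender, receiver, content, conversation id); it does not call JADE, and `AclLikeMessage`, `fusion_round`, the agent names, and the "highest confidence wins" rule are hypothetical simplifications.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Any, Dict, List, Tuple


class Performative(Enum):
    REQUEST = "request"
    INFORM = "inform"
    PROPOSE = "propose"


@dataclass
class AclLikeMessage:
    """Plain-Python stand-in for an ACL message; not the JADE class."""
    performative: Performative
    sender: str
    receiver: str
    content: Dict[str, Any]
    conversation_id: str


def fusion_round(image_id: str,
                 learner_outputs: Dict[str, Tuple[int, float]]
                 ) -> List[AclLikeMessage]:
    """Return the message trace of one request -> inform -> propose round."""
    trace: List[AclLikeMessage] = []
    for name, (label, confidence) in learner_outputs.items():
        # The fusion agent asks each learner for a prediction ...
        trace.append(AclLikeMessage(Performative.REQUEST, "fusion", name,
                                    {"image_id": image_id}, image_id))
        # ... and the learner answers with its label and confidence.
        trace.append(AclLikeMessage(Performative.INFORM, name, "fusion",
                                    {"label": label, "confidence": confidence},
                                    image_id))
    # Simplified negotiation: propose the most confident learner's label.
    best = max(learner_outputs, key=lambda n: learner_outputs[n][1])
    label, confidence = learner_outputs[best]
    trace.append(AclLikeMessage(Performative.PROPOSE, "fusion", "explainer",
                                {"label": label, "confidence": confidence,
                                 "source": best}, image_id))
    return trace


trace = fusion_round("img-001", {"cnn": (2, 0.81), "vit": (1, 0.64),
                                 "hybrid": (2, 0.90)})
for msg in trace:
    print(msg.performative.value, msg.sender, "->", msg.receiver, msg.content)
```

Keeping the conversation id constant across the round is what lets a receiver group the messages that belong to the same image.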
Algorithm 1: Agentic-AI driven diabetic retinopathy detection algorithm.
Algorithm 1 processes a retinal image using three agents: CNN, Transformer, and Hybrid CNN-ViT. Each agent extracts features and produces a classification score. The system selects the label with the best score, merges ties when necessary, and generates an explanation. The final output includes the predicted DR stage and confidence score.
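Since the listing of Algorithm 1 is not reproduced here, the following is a hedged reconstruction of the steps just described, with stub agents in place of trained models. The function `diagnose`, the tolerance `tol`, and the tie-merging rule (most votes among tied agents, then the more severe grade) are assumptions; the text only says that ties are merged. The stage names follow the usual 0 to 4 grading of the Kaggle dataset.

```python
from typing import Callable, Dict, Tuple

# Grade names for classes 0-4 (standard Kaggle/EyePACS labelling).
DR_STAGES = ["No DR", "Mild", "Moderate", "Severe", "Proliferative DR"]

# An agent maps an image to (predicted stage, classification score).
Agent = Callable[[object], Tuple[int, float]]


def diagnose(image: object, agents: Dict[str, Agent],
             tol: float = 1e-6) -> Dict[str, object]:
    """Run every agent, keep the best-scoring label, merge ties, explain."""
    results = {name: agent(image) for name, agent in agents.items()}
    best_score = max(score for _, score in results.values())
    # Agents whose score is within `tol` of the best are treated as tied.
    tied = {name: label for name, (label, score) in results.items()
            if best_score - score <= tol}
    labels = list(tied.values())
    # Assumed tie merge: most votes among tied agents, then the higher grade.
    stage = max(set(labels), key=lambda lab: (labels.count(lab), lab))
    supporters = sorted(n for n, lab in tied.items() if lab == stage)
    explanation = (f"Stage {stage} ({DR_STAGES[stage]}) selected with score "
                   f"{best_score:.2f}; supported by: {', '.join(supporters)}")
    return {"stage": stage, "confidence": best_score,
            "explanation": explanation}


# Stub agents standing in for the CNN, Transformer, and Hybrid models.
agents = {
    "cnn": lambda img: (2, 0.80),
    "transformer": lambda img: (1, 0.70),
    "hybrid": lambda img: (2, 0.90),
}
print(diagnose("fundus.png", agents))
```

Preferring the more severe grade on an exact tie is a conservative choice for screening; a deployment could just as well defer such cases to a clinician.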
The first step of the AADR-AI system is to acquire images of the retinal fundus and enhance them with a preprocessing module. This module cleans up the images, increases their contrast, and adjusts brightness, size, and noise levels. These operations make micro-aneurysms and haemorrhages, two small but important signs for correct DR diagnosis, easier to see.
Fig. 2. Image acquisition and preprocessing.
Preprocessing guarantees that the input to the feature extraction agents that come after it is clean and consistent, which helps decrease differences caused by varied imaging settings in Fig. 2. This step improves the system’s accuracy and dependability by making the input data more consistent and of better quality22. It fixes issues with current systems that make it hard to generalize and make wrong predictions when using raw, unedited photographs. It is an important first step in getting the multi-agent architecture ready to run at its best. This, in turn, helps the downstream models learn DR detection patterns more accurately.
In the second step, a CNN-based agent, a transformer-based agent, and a hybrid CNN-ViT agent work together to process the enhanced fundus images. Each agent automatically captures visual and contextual patterns, and the CNN and transformer agents have complementary strengths. The CNN agent efficiently learns local spatial features, such as lesions and textures that indicate early diabetic retinopathy, while the transformer agent captures global context by modeling relationships across regions, which is necessary for identifying extensive retinal structural abnormalities. The Hybrid CNN + Transformer agent in Fig. 3 combines these local and global representations into a feature set with comprehensive local information and holistic contextual patterns. This dual approach lets the model process both fine and broad visual cues, so it can detect minute aberrations and assess overall retinal health. The modular, parallel processing technique improves feature diversity and lowers overfitting, enabling more accurate diabetic retinopathy categorization across image qualities and complexities. The hybrid agent thus provides a comprehensive, adaptable, and contextually enriched feature set for accurate medical diagnosis.
Fig. 3. Multi-agent feature extraction.
The CNN agent is good at learning about local properties, like lesions, and the transformer agent is good at learning about global relationships, like retinal structure. These agents can work on their own and adapt to pictures of varied levels of complexity and quality, which is a key idea in agentic AI in Fig. 3. Parallel processing makes the system more efficient and reliable overall by covering more features and reducing overfitting. This modular strategy gets around the problems with single-model systems and makes it feasible to get better classification accuracy by getting a wider range of information23. Because of this, the system is now better able to respond to both localized and systemic changes in retinal illness. This makes it more accurate and flexible when diagnosing24.
JADE and Mesa were used to implement the AADR-AI system. The agents (CNN Agent, Transformer Agent, Hybrid Fusion Agent, and Explainability Agent) acted independently and exchanged messages using FIPA-ACL. The Fusion Agent used an adaptive confidence-based weighting of predictions made by the learning agents and reprocessed where inconsistencies were found.
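One plausible reading of "adaptive confidence-based weighting" with re-processing on inconsistency is sketched below. The source does not specify how confidence is measured, so this sketch assumes an entropy-based confidence (1 minus the normalized Shannon entropy of each agent's output) and flags an input for re-processing whenever the agents' top classes disagree. `entropy_confidence` and `adaptive_fuse` are hypothetical names.

```python
import math
from typing import Dict, List, Tuple


def entropy_confidence(p: List[float]) -> float:
    """1 - normalized Shannon entropy: 1.0 for one-hot, 0.0 for uniform."""
    h = -sum(x * math.log(x) for x in p if x > 0)
    return max(0.0, 1.0 - h / math.log(len(p)))  # clamp tiny negative values


def adaptive_fuse(agent_probs: Dict[str, List[float]]
                  ) -> Tuple[List[float], Dict[str, float], bool]:
    """Fuse per-input: weights follow each agent's confidence on this image."""
    conf = {name: entropy_confidence(p) for name, p in agent_probs.items()}
    total = sum(conf.values())
    if total > 0:
        weights = {name: c / total for name, c in conf.items()}
    else:  # every agent is maximally uncertain: fall back to equal weights
        weights = {name: 1.0 / len(conf) for name in conf}
    n_classes = len(next(iter(agent_probs.values())))
    fused = [sum(weights[n] * p[k] for n, p in agent_probs.items())
             for k in range(n_classes)]
    # Inconsistency check: do the agents' arg-max classes differ?
    top_classes = {max(range(n_classes), key=p.__getitem__)
                   for p in agent_probs.values()}
    needs_reprocessing = len(top_classes) > 1
    return fused, weights, needs_reprocessing


probs = {
    "cnn": [0.90, 0.05, 0.03, 0.01, 0.01],
    "vit": [0.10, 0.10, 0.60, 0.10, 0.10],
    "hybrid": [0.70, 0.10, 0.10, 0.05, 0.05],
}
fused, weights, flag = adaptive_fuse(probs)
print(weights, flag)
```

Because the weights are recomputed for every image, a sharply peaked agent dominates on inputs where it is certain and recedes where it is not, which is the per-input behaviour the text attributes to the Fusion Agent.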
In step three, the process uses a Multi-Agent Decision Fusion Layer to merge the outputs from the feature extraction agents. This layer uses confidence weighting and consensus-based assessment methods, like averaged probability or a majority vote, to come up with the final Diabetic Retinopathy stage categorization.
Fig. 4. Decision fusion and classification.
The fusion mechanism makes use of each agent’s strengths and offsets their weaknesses to make the overall prediction more reliable. It illustrates how numerous agents can work together proactively to reach a shared goal. As shown in Fig. 4, this cooperation can make classification more accurate and the model more stable, which is especially helpful when image features are ambiguous or imaging conditions vary. This step is essential for achieving the higher diagnostic accuracy that the proposed system targets, which it does by balancing the views of different agents25. The combined confidence measures help reduce the number of false positives and false negatives. This step is important for real-world decision-making, and it usually leads to a more reliable, consensus-based diagnosis26.
The fourth step is to show the user the DR classification results through an Explainable Decision Support Interface. This module explains the decision process and gives visual justification for each decision, such as confidence scores and saliency maps (like Grad-CAM). It ensures that openness and interpretability are among the proposed system’s most important features27. Because they are opaque, traditional black-box AI models often fail to win over doctors; this step addresses that.
Fig. 5. Explainable decision support interface.
By delivering results that are easy to understand, it makes things easier to use and more acceptable in the clinic. Adding agentic reasoning findings fits with the system’s main goal, which is to give clear, flexible, and real-time diagnoses in Fig. 5. It lets doctors and nurses double-check AI decisions, make smart choices, and step in when they need to. This step ensures that AI improves, not replaces, human decision-making in diagnosing diabetic retinopathy28. It supports the system’s crucial features, like being understandable and being ready for use in the real world.
The procedure begins with the acquisition of a retinal fundus image, followed by a preprocessing module that applies contrast correction, noise reduction, resizing, and brightness normalization. This yields enhanced retinal images for further analysis. Several feature extraction agents (CNN, transformer, and hybrid CNN-ViT models) then process the preprocessed images in parallel. Each agent specializes in a different type of feature: CNNs concentrate on spatial and local details, transformers concentrate on global contextual patterns, and hybrid models combine both for a full representation. The results from these agents are passed to a decision fusion layer, where the agents’ opinions are fused using methods such as probability averaging, majority voting, or confidence weighting. The goal of this layer is to classify diabetic retinopathy accurately and to minimize the likelihood of incorrect predictions. An explainable decision interface accompanies the final classification result with visual explanations, saliency maps, and confidence scores, which increase clinical trust and transparency. Real-time decision support and physician verification procedures further safeguard the dependability of the automated diagnosis process. The modularity and clarity of each stage make the pipeline straightforward to implement, replicate, or adapt to other medical image analysis contexts.
The CNN Agent has a ResNet-50 backbone (a 50-layer residual network) with ReLU activation and a global average pooling classifier head. The Transformer Agent (ViT-Base) has 12 attention heads, 12 encoder layers, and 16 × 16 patch embeddings. The Hybrid Agent combines CNN and Transformer functionality, with a weighted feature fusion layer that incorporates a dropout of 0.2. Cross-entropy loss is used in all models, and He-normal initialization is applied.
Fig. 6. Overall flow diagram.
The first phase in the complete AADR-AI system is to make retinal fundus photos seem better. Three independent agents, CNN, Transformer, and Hybrid CNN-ViT, process these better images at the same time to get different spatial and contextual features in Fig. 6. A decision fusion layer combines its results to get the final Diabetic Retinopathy classification. The explainable interface for the outcome includes visual explanations and confidence scores. This thorough procedure fixes the problems with traditional diagnostic methods while making them more accurate, flexible in real time, and easy to understand, all of which are important for the proposed system. The recommended AADR-AI system combines agentic intelligence ideas with multi-agent deep learning to make sure that DR is diagnosed correctly. It can improve computing efficiency, clinical interpretability, and classification accuracy through decision fusion and parallel feature extraction. The system’s explainable output makes it clear, which makes it suitable for use in real-world medical settings, and it gets around major problems that current AI-based diagnostic systems have.
The Kaggle Diabetic Retinopathy Detection dataset consists of high-resolution retinal images and is used to assess the presence and severity of diabetic retinopathy (DR). It includes over 88,000 annotated images divided into five classes representing the stages of DR (0–4); each image carries a clinical DR grade and shows either the left or the right eye. Deep learning algorithms can be trained on this dataset to diagnose DR automatically. The dataset first appeared in a Kaggle competition staged by the California Healthcare Foundation and EyePACS, and it has since become a standard benchmark in studies of DR detection and medical imaging29. Patient-level stratification was used to separate the dataset into training (70%), validation (15%), and testing (15%) sets to avoid data leakage. All images were preprocessed with CLAHE, Gaussian denoising (parameter 0.5), downsizing to 512 × 512, and normalization. Data augmentation with rotation (0–15 degrees), horizontal/vertical flipping, and brightness/contrast changes was applied to enhance model generalization. Weighted sampling and augmentation were used to curb the class imbalance among the five grades of DR. These measures improved generalization and made learning more uniform across all levels of severity.
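The patient-level 70/15/15 split can be sketched as follows. The point of splitting by patient is that both eyes of one person always land in the same subset, which is what prevents leakage. This sketch groups by patient only and does not additionally balance DR grades across subsets. The function `patient_level_split`, the seed, and the `"<patient>_<eye>"` identifier pattern (for example `10_left`) are assumptions for illustration.

```python
import random
from typing import Callable, Dict, List, Sequence


def patient_level_split(image_ids: Sequence[str],
                        patient_of: Callable[[str], str],
                        fractions=(0.70, 0.15, 0.15),
                        seed: int = 0) -> Dict[str, List[str]]:
    """Assign whole patients (not single images) to train/val/test."""
    patients = sorted({patient_of(i) for i in image_ids})
    random.Random(seed).shuffle(patients)  # reproducible shuffle
    n = len(patients)
    n_train = round(fractions[0] * n)
    n_val = round(fractions[1] * n)
    groups = {
        "train": set(patients[:n_train]),
        "val": set(patients[n_train:n_train + n_val]),
        "test": set(patients[n_train + n_val:]),  # remainder goes to test
    }
    split: Dict[str, List[str]] = {name: [] for name in groups}
    for image_id in image_ids:
        patient = patient_of(image_id)
        for name, members in groups.items():
            if patient in members:
                split[name].append(image_id)
                break
    return split


# Hypothetical identifiers following a "<patient>_<eye>" pattern.
ids = [f"{p}_{eye}" for p in range(20) for eye in ("left", "right")]
split = patient_level_split(ids, patient_of=lambda s: s.split("_")[0])
print({name: len(members) for name, members in split.items()})
```

A random image-level split would place the left and right eye of the same patient in different subsets, which tends to inflate test accuracy because the two eyes are strongly correlated.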
Programming Language: Python – for scripting model development, data preprocessing, and evaluation pipelines.
Deep Learning Frameworks: TensorFlow & PyTorch – used for implementing CNNs, Vision Transformers, and model fusion components.
Model Architecture: Hybrid CNN–Transformer Ensemble – combines convolutional neural networks for spatial feature extraction and Vision Transformers for global context modeling.
Agent-Based Modeling: JADE (Java-based) and Mesa (Python-based) – to simulate agentic behaviors such as autonomy, proactivity, and adaptability in the learning system.
Dataset Management: Kaggle API & Pandas – for dataset retrieval (EyePACS, DIARETDB1), preprocessing, and tabular data handling.
Deployment & Visualization: Flask or FastAPI (for REST APIs), with Streamlit or Plotly – to build real-time diagnostic interfaces and visualize model outputs and decision pathways.
Traditional Machine Learning Models: Support Vector Machine (SVM), Random Forest, k-Nearest Neighbors (k-NN).
Standard Deep Learning Models: Convolutional Neural Networks (CNN), ResNet-50, VGG-16.
Vision Transformers: ViT (Vision Transformer), Swin Transformer.
Hybrid CNN-RNN Architectures: CNN + LSTM models for temporal/feature sequence learning.
Agent-less Ensemble Models: Voting/Stacking ensembles of CNNs and Transformers without agentic coordination.
AutoML-Based Models: Google AutoML Vision, AutoKeras – for automated feature and architecture selection.
The multi-agent system was implemented with the JADE and Mesa frameworks to realize genuine agentic behavior. The CNN, Transformer, Hybrid, and Support agents were instantiated as modules in these environments. To coordinate decision fusion, JADE agents exchanged request, inform, and propose messages using the FIPA-ACL protocol. Agent interactions, state updates, and adaptive weight changes were managed dynamically in Mesa’s global environment. To verify operation, full runtime logs, negotiation traces, and adaptive updates were captured for both the Python (Mesa) and Java (JADE) components. These records show real-time agent negotiation, fusion weight adjustment, and task coordination in the decision fusion layer. The end-to-end workflow, from preprocessing to classification and explainability, exhibited multi-agent adaptivity, communication, and consensus in Diabetic Retinopathy stage classification, as reflected in the performance results and explainability modules.
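To make the message flow concrete, the following plain-Python sketch mimics one request/propose/inform round. It does not use the JADE or Mesa APIs; only the performative names come from FIPA-ACL, while the `ACLMessage` class, the `negotiation_round` function, and the confidence-summing rule are hypothetical illustrations.

```python
from dataclasses import dataclass, field

# Performative names follow FIPA-ACL; everything else is a hypothetical sketch.
PERFORMATIVES = {"request", "propose", "inform"}

@dataclass
class ACLMessage:
    performative: str
    sender: str
    receiver: str
    content: dict = field(default_factory=dict)

    def __post_init__(self):
        if self.performative not in PERFORMATIVES:
            raise ValueError(f"unsupported performative: {self.performative}")

def negotiation_round(image_id, agent_outputs):
    """Run one request/propose/inform round between a fusion agent and classifiers.

    agent_outputs maps agent name -> (predicted grade, confidence in [0, 1]).
    Returns the message log and the grade with the largest summed confidence.
    """
    log = []
    for name in agent_outputs:
        log.append(ACLMessage("request", "fusion", name, {"image": image_id}))
    votes = {}
    for name, (grade, conf) in agent_outputs.items():
        log.append(ACLMessage("propose", name, "fusion",
                              {"grade": grade, "confidence": conf}))
        votes[grade] = votes.get(grade, 0.0) + conf
    decision = max(votes, key=votes.get)
    for name in agent_outputs:
        log.append(ACLMessage("inform", "fusion", name,
                              {"image": image_id, "grade": decision}))
    return log, decision

log, decision = negotiation_round(
    "img_001", {"cnn": (2, 0.70), "transformer": (3, 0.55), "hybrid": (2, 0.60)}
)
print(decision, len(log))
```

In this toy round, grade 2 collects a summed confidence of 1.30 against 0.55 for grade 3, so the fusion agent informs all classifiers of grade 2.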
For robustness and transparency, the AADR-AI system was trained using a standardized, reproducible process. Fundus images were contrast-enhanced, Gaussian-denoised, and pixel-normalized, then divided into 70% training, 15% validation, and 15% testing sets that preserve class balance across the five DR stages. The CNN, Transformer, Hybrid CNN-ViT, and Supplementary Extractor agents were trained independently with optimized hyperparameters, Adam or SGD optimizers, and early stopping. Fusion weights were initialized and refined using confidence-weighted averaging and agent negotiation techniques. With an inference delay under 0.8 s, the final model reached 96.7% accuracy, a 0.95 F1-score, and a 0.98 ROC-AUC. Explanations and clinician verification supported interpretability and dependability, and full logs and metrics were kept for reproducibility.
The AADR-AI framework is evaluated on eight performance dimensions that capture different aspects of how agentic AI performance is measured: classification accuracy, model interpretability, real-time adaptability, computational efficiency, robustness to data variability, decision fusion effectiveness, scalability, and robustness across datasets. Together, these dimensions show how agentic AI can make the detection of diabetic retinopathy in the clinic more trustworthy, efficient, and reliable.
To maintain class proportions, experiments used a stratified 70/15/15 train/validation/test split of the Kaggle Diabetic Retinopathy Detection dataset. CLAHE, Gaussian denoising (σ = 0.5), ImageNet normalization, and conventional augmentations (random rotation of ±15°, flips, brightness/contrast jitter) were applied to all 512 × 512 images. The baseline models were VGG-16, a ViT (patch size 16), an SVM trained on ResNet feature embeddings, and a fine-tuned ResNet-50. The proposed Agentic AADR-AI ensemble consists of three agents (CNN, Transformer, and Hybrid CNN-ViT) coordinated by a Fusion agent that performs majority voting and confidence-weighted aggregation, adjusting its weights dynamically according to agent confidence. The models were trained in PyTorch with AdamW, a cosine annealing scheduler, batch size 16, mixed precision, an initial learning rate of 1e-4, and early stopping on validation F1 (patience = 6). Performance is reported as accuracy, per-class precision/recall/F1, macro F1, and one-vs-rest ROC-AUC; each model’s inference time (ms/image) and GPU memory use were also recorded. Experiments were repeated with five-fold cross-validation, and the mean ± standard deviation is reported. Explainability maps were generated with Grad-CAM for each agent, and agent confidence was checked by logging clinician overrides. For transparency, the project repository contains all code, configuration files, hyperparameters, dataset split indices, trained checkpoints, training logs, and evaluation scripts, as well as Colab notebooks for simple replication.
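The two aggregation modes attributed to the Fusion agent can be sketched as follows. The paper does not give the exact weighting formula, so this example assumes each agent is weighted by its top-class probability; the function names and the probabilities are made up for illustration.

```python
from collections import Counter

def confidence_weighted_fusion(agent_probs):
    """Fuse class probabilities, weighting each agent by its top-class confidence.

    agent_probs: dict of agent name -> list of class probabilities (grades 0-4).
    Returns (fused probabilities, normalized agent weights).
    """
    confidences = {name: max(p) for name, p in agent_probs.items()}
    total = sum(confidences.values())
    weights = {name: c / total for name, c in confidences.items()}
    n_classes = len(next(iter(agent_probs.values())))
    fused = [
        sum(weights[name] * probs[k] for name, probs in agent_probs.items())
        for k in range(n_classes)
    ]
    return fused, weights

def majority_vote(agent_probs):
    """Return the grade predicted by most agents (ties go to the lowest grade)."""
    votes = Counter(
        max(range(len(p)), key=p.__getitem__) for p in agent_probs.values()
    )
    best = max(votes.values())
    return min(g for g, v in votes.items() if v == best)

# Made-up probabilities for a single image.
agent_probs = {
    "cnn":         [0.05, 0.10, 0.60, 0.20, 0.05],
    "transformer": [0.05, 0.05, 0.30, 0.50, 0.10],
    "hybrid":      [0.02, 0.08, 0.70, 0.15, 0.05],
}
fused, weights = confidence_weighted_fusion(agent_probs)
fused_grade = max(range(len(fused)), key=fused.__getitem__)
print(fused_grade, majority_vote(agent_probs))
```

Because the weights are renormalized to sum to one, the fused vector remains a valid probability distribution.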
To ensure that the presented results can be replicated and independently checked, random seeds were fixed at every stage, and full logs, confusion matrices, and Grad-CAM explanations are retained.
All tests used the Kaggle Diabetic Retinopathy Detection (EyePACS) dataset, which contains retinal fundus images graded on a five-point severity scale (0–4). Patient-level stratified splitting ensured that no images from the same patient appeared in more than one subset, giving a reliable and independent evaluation. The dataset was split by patient identity into 70% training, 15% validation, and 15% testing, and the original class distribution was preserved in each split. For statistical reliability, a 5-fold cross-validation protocol was applied at the patient level, with an independent 70/15/15 split for each fold. All experiments were conducted five times, and the findings are reported as mean ± standard deviation. To ensure reproducibility, the exact patient IDs assigned to every split have been preserved and made publicly available in the supplementary repository. Table 2 shows the training hyperparameters and configurations.
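A patient-level stratified split of this kind could look like the sketch below. How patients with differently graded eyes are stratified is not stated in the paper; here each patient is assigned to the stratum of their highest grade, which is an assumption, and the records are synthetic.

```python
import random
from collections import defaultdict

def patient_level_split(records, fractions=(0.70, 0.15, 0.15), seed=42):
    """Split (patient_id, image_id, grade) records so each patient is in one subset.

    Patients are grouped by their highest grade and every group is split
    separately, which approximately preserves the class distribution.
    """
    by_patient = defaultdict(list)
    for patient_id, image_id, grade in records:
        by_patient[patient_id].append((image_id, grade))

    strata = defaultdict(list)
    for patient_id, images in by_patient.items():
        strata[max(g for _, g in images)].append(patient_id)

    rng = random.Random(seed)
    splits = {"train": [], "val": [], "test": []}
    for grade in sorted(strata):
        patients = sorted(strata[grade])
        rng.shuffle(patients)
        n_train = round(fractions[0] * len(patients))
        n_val = round(fractions[1] * len(patients))
        splits["train"] += patients[:n_train]
        splits["val"] += patients[n_train:n_train + n_val]
        splits["test"] += patients[n_train + n_val:]
    return splits

# Synthetic records: 100 patients with one left-eye and one right-eye image each.
records = [
    (f"p{i:03d}", f"p{i:03d}_{eye}", i % 5)
    for i in range(100)
    for eye in ("left", "right")
]
splits = patient_level_split(records)
print({name: len(ids) for name, ids in splits.items()})
```

Splitting on patient IDs rather than image IDs is what prevents the left and right eye of one person from landing in different subsets.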
All experiments were run on an NVIDIA RTX 3090 (24 GB VRAM) with an Intel i9 processor and 32 GB RAM. The models were implemented with JADE 4.5, PyTorch 2.1, OpenCV, and Python 3.10. Training used the AdamW optimizer with a learning rate of 1e-4, a batch size of 16, a weight decay of 1e-4, and 40 epochs. The early stopping patience was set to 6, and mixed-precision training (AMP) was enabled for computational efficiency. Random seeds were fixed for reproducibility.
Each model was trained with an initial learning rate of 1 × 10⁻⁴, a weight decay of 1 × 10⁻⁴, and 0.9 for β1 and β2. A cosine annealing scheduler gradually decreased the learning rate throughout training. The batch size was 16, and training ran for up to 40 epochs with early stopping, terminating after 6 consecutive epochs without improvement in the validation loss. Gradient clipping (max-norm = 5.0) stabilized training, and mixed-precision training (AMP) reduced memory use. Data augmentation used random rotations (±15°), flips, brightness-contrast jitter, and Gaussian noise. The objective function of all agents was the cross-entropy loss.
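The schedule and stopping rule described above can be written out in a few lines. This is a framework-free sketch: the minimum learning rate of 0 is an assumption (the paper does not state one), and the validation losses are invented to show when a patience of 6 triggers.

```python
import math

def cosine_annealing_lr(epoch, total_epochs=40, base_lr=1e-4, min_lr=0.0):
    """Learning rate at `epoch` under cosine annealing from base_lr to min_lr."""
    cos = math.cos(math.pi * epoch / total_epochs)
    return min_lr + 0.5 * (base_lr - min_lr) * (1.0 + cos)

class EarlyStopping:
    """Stop once the validation loss has not improved for `patience` epochs."""

    def __init__(self, patience=6):
        self.patience = patience
        self.best = float("inf")
        self.bad_epochs = 0

    def step(self, val_loss):
        """Record one epoch and return True if training should stop."""
        if val_loss < self.best:
            self.best = val_loss
            self.bad_epochs = 0
        else:
            self.bad_epochs += 1
        return self.bad_epochs >= self.patience

# Made-up validation losses: improvement ends after epoch 3.
val_losses = [0.90, 0.70, 0.60, 0.55] + [0.56] * 10
stopper = EarlyStopping(patience=6)
stopped_at = None
for epoch, loss in enumerate(val_losses):
    lr = cosine_annealing_lr(epoch)
    if stopper.step(loss):
        stopped_at = epoch
        break
print(stopped_at, lr)
```

In this toy run the last improvement is at epoch 3, so the sixth consecutive epoch without improvement is epoch 9, where training stops.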
All experiments were run on a workstation with an NVIDIA RTX 3090 (24 GB VRAM), an Intel i9-12900K CPU, and 32 GB of RAM under Ubuntu 20.04 LTS. Python 3.10, PyTorch 2.1, TensorFlow 2.15, OpenCV 4.8, scikit-learn 1.3, and Pandas 2.1 were used for model training and agentic simulations. The multi-agent system ran on JADE 4.5.0 (Java 11 environment) and Mesa 1.3.0. CUDA 12.1 and cuDNN 8.9 provided GPU acceleration. For reproducibility, all random seeds (NumPy, PyTorch, Python) were fixed, and training used deterministic mode.
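Seed fixing across the three libraries named above is commonly done with a helper like the one below. This is a generic sketch, not the authors' script: the `fix_seeds` name and the seed value 42 are assumptions, and NumPy and PyTorch are seeded only if they are installed.

```python
import os
import random

def fix_seeds(seed=42):
    """Seed Python's RNG and, when available, NumPy and PyTorch."""
    # PYTHONHASHSEED only affects Python processes started after this point.
    os.environ["PYTHONHASHSEED"] = str(seed)
    random.seed(seed)
    try:
        import numpy as np
        np.random.seed(seed)
    except ImportError:
        pass
    try:
        import torch
        torch.manual_seed(seed)
        torch.cuda.manual_seed_all(seed)
        # Ask cuDNN for deterministic kernels and disable autotuning.
        torch.backends.cudnn.deterministic = True
        torch.backends.cudnn.benchmark = False
    except ImportError:
        pass

fix_seeds(42)
first = [random.random() for _ in range(3)]
fix_seeds(42)
second = [random.random() for _ in range(3)]
print(first == second)
```

Re-seeding before each run makes the random draws repeat exactly, which is the property the fixed-seed protocol relies on.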
To confirm the robustness of the proposed AADR-AI architecture, all experiments were performed with 5-fold cross-validation, and the results are presented as mean and standard deviation. Statistical significance tests compared the proposed multi-agent model with the baseline CNN, Transformer, and hybrid models. Paired t-tests and Wilcoxon signed-rank tests were applied to the continuous performance measures (accuracy, F1-score, AUC), whereas the McNemar test was applied to differences in classification error rates. The proposed framework showed statistically significant improvements in accuracy, F1-score, and AUC over the baseline models (p < 0.05), supporting the reliability of the performance gains. These tests indicate that the observed improvements are unlikely to be coincidental and are consistent and repeatable across folds.
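As an illustration of the McNemar test on paired predictions, the sketch below implements the exact (binomial) two-sided variant. Whether the authors used the exact or the chi-square form is not stated, so the choice here is an assumption, and the predictions are invented.

```python
from math import comb

def mcnemar_exact(y_true, pred_a, pred_b):
    """Exact two-sided McNemar test on paired predictions.

    Returns (b, c, p_value): b counts cases only model A classifies correctly,
    c counts cases only model B classifies correctly.
    """
    b = sum(1 for t, a, m in zip(y_true, pred_a, pred_b) if a == t and m != t)
    c = sum(1 for t, a, m in zip(y_true, pred_a, pred_b) if a != t and m == t)
    n = b + c
    if n == 0:
        return b, c, 1.0
    k = min(b, c)
    # Under H0 the discordant pairs follow Binomial(n, 0.5).
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return b, c, min(1.0, 2.0 * tail)

# Made-up paired predictions on 40 images whose true grade is 0.
y_true = [0] * 40
pred_a = [0] * 35 + [1] * 5                        # model A: 5 errors
pred_b = [0] * 24 + [1] * 11 + [0] * 1 + [1] * 4   # model B: 15 errors
b, c, p_value = mcnemar_exact(y_true, pred_a, pred_b)
print(b, c, round(p_value, 4))
```

Only the discordant pairs (here 11 versus 1) enter the test, which is why it suits comparisons of two classifiers evaluated on the same test images.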
Classification accuracy analysis.
The AADR-AI architecture achieved a classification accuracy of 96.7% on the EyePACS dataset, owing to its multi-agent ensemble, which combines convolutional neural networks and transformers to detect DR stages (Fig. 7). This is roughly 3–5 percentage points higher than classic CNN systems, which averaged 91.5% to 93.7%. Importantly, the architecture is designed to capture both local and global retinal characteristics, which yields consistent diagnoses even under class imbalance.
Model interpretability analysis.
In 88% of test cases, the localized agent explanations and attention heatmaps provided by AADR-AI agreed with expert annotations, supporting interpretability. Exposing each agent’s decision pathway allows practitioners to check predictions, as shown in Fig. 8. This design is useful in healthcare environments where transparency and accountability are required: unlike black-box deep models, it provides confidence scores and attributes decisions to specific lesions.
AADR-AI maintained inference times of less than 150 milliseconds per image on standard GPU platforms, which highlights its real-time capability. By running its agents in parallel, the model detects and reacts to changes in image quality and noise in near real time while keeping round-trip times low, as shown in Table 3. These reactive and proactive behaviors allow AADR-AI to maintain reliable predictions across varied clinical imaging environments, whereas static DR classifiers can be limited by image quality, noise, or resolution.
Computational efficiency analysis.
Compared with full transformer-based approaches such as ViT, the agent architecture was lightweight: it used 26% less GPU memory and ran inference 18% faster than ViT models (Fig. 9). AADR-AI avoids redundant computation by using shallow CNN agents for the first few convolution layers and invoking the transformers only in coordination with them. This small footprint enables deployment on mobile fundus cameras and edge devices without compromising classification performance.
Data variability analysis.
With an accuracy variation of ±2% across 5-fold cross-validation, AADR-AI consistently performed well across subject demographics and levels of image quality. The model is resistant to blur, occlusion, and contrast shifts because it learns domain-invariant features and takes context into account, as shown in Fig. 10. The agentic ensemble adapts its strategy dynamically to the incoming visual input and the clinical context, whereas single-stream models, which lack such agentic functionality, tend to fail under these conditions.
The decision fusion layer improves the overall diagnostic accuracy of AADR-AI, raising the F1-score by 4.2% over the predictions of a single model. AADR-AI handles ambiguous cases better by fusing the outputs of the CNN and transformer agents according to confidence and domain alignment, as shown in Table 4. This increases the reliability of classification in clinical deployment and improves stability on basic failure cases (e.g., borderline DR grades).
Scalability analysis.
Thanks to its agentic architecture, AADR-AI can be deployed across platforms. It was verified on hardware ranging from ARM Cortex-A76-based devices to NVIDIA RTX 3090 GPUs, demonstrating interoperability with both high-end clinical servers and mobile diagnostic infrastructure (Fig. 11). The plug-and-play modular agent architecture allows agents to be added horizontally and compressed into lighter configurations vertically, to suit deployment locations such as hospitals, clinics, and rural settings.
AADR-AI generalized well across datasets, with only a 1.4% reduction in overall accuracy when tested on a dataset other than the one used for training. Training on multi-distribution data gives AADR-AI a robust ability to learn domain-invariant characteristics through the heterogeneous behavior of its agents, as shown in Table 5. Many deep models, by contrast, lose considerable performance once they are applied to new data. This generalizability suggests that AADR-AI can be deployed clinically across sites without further fine-tuning.
With an inference time below 150 milliseconds and an accuracy of 96.7%, AADR-AI was validated as an accurate and interpretable system. It showed a 26% reduction in GPU memory consumption, performed consistently across different datasets, and improved F1-scores through decision fusion. AADR-AI is a strong and adaptable framework for DR diagnostics, since it scales from desktop computers to mobile devices and handles different datasets well. The proposed AADR-AI system achieves an average inference latency of 48 ms per image with a peak memory consumption of 3.9 GB on an NVIDIA RTX 3090 GPU with batch size 1, compared with 21 ms/1.8 GB for ResNet-50 and 37 ms/3.1 GB for ViT. Although these figures are somewhat higher than those of the single-model baselines, the latency remains close to 50 ms per image. This indicates that the agentic fusion and explanation modules provide significantly improved diagnostic accuracy and interpretability with only a minor overhead.
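The latency and memory figures quoted above can be compared directly with a few lines of arithmetic. The numbers are those given in the text; the helper name `overhead_vs` is hypothetical.

```python
# Latency (ms/image) and peak GPU memory (GB) as quoted in the text, batch size 1.
reported = {
    "AADR-AI":   (48.0, 3.9),
    "ResNet-50": (21.0, 1.8),
    "ViT":       (37.0, 3.1),
}

def overhead_vs(baseline, system="AADR-AI"):
    """Relative latency and memory overhead of `system` over `baseline`, in percent."""
    sys_ms, sys_gb = reported[system]
    base_ms, base_gb = reported[baseline]
    return (
        round(100.0 * (sys_ms / base_ms - 1.0), 1),
        round(100.0 * (sys_gb / base_gb - 1.0), 1),
    )

throughput = round(1000.0 / reported["AADR-AI"][0], 1)  # images per second
print(overhead_vs("ViT"), overhead_vs("ResNet-50"), throughput)
```

On these figures the ensemble needs about 30% more time and 26% more memory per image than ViT, and a little over twice the resources of ResNet-50, while still processing roughly 20 images per second.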
Performance analysis based on MAE.
Figure 12 shows the performance of the proposed AADR-AI model under diverse operational settings using Mean Absolute Error (MAE) across different sample sizes. Each condition shows a drop in MAE as the number of samples grows, showing increased learning stability and generalization. The model’s baseline efficiency is shown by its continuous error drop under typical conditions. MAE increases slightly under poor illumination, noisy data, and low resolution, confirming the model’s data degradation resistance. Data augmentation performs best, with the lowest MAE across all sample sizes. Augmentation improves adaptation by increasing feature diversity and reducing overfitting. The progressive convergence trend across all conditions shows that the AADR-AI architecture is resilient and accurate even with poor inputs. The experimental results show that the proposed model performs optimally in ideal conditions and accurately and reliably in imperfect environments, demonstrating its superior generalization and robustness in adaptive AI-driven performance analysis.
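For reference, MAE on DR grades can be computed as in the sketch below. The paper does not say which quantity the error is measured on; this example assumes the ordinal grades 0–4, and both prediction lists are invented.

```python
def mean_absolute_error(y_true, y_pred):
    """Average absolute difference between true and predicted DR grades."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)

# Made-up grades for eight images under two hypothetical conditions.
y_true      = [0, 0, 1, 2, 2, 3, 4, 4]
pred_normal = [0, 0, 1, 2, 3, 3, 4, 4]   # one off-by-one error
pred_lowres = [0, 1, 1, 2, 3, 3, 2, 4]   # two off-by-one errors, one off-by-two

mae_normal = mean_absolute_error(y_true, pred_normal)
mae_lowres = mean_absolute_error(y_true, pred_lowres)
print(mae_normal, mae_lowres)
```

Unlike accuracy, MAE penalizes a grade-4 image predicted as grade 2 twice as much as a neighbouring-grade mistake, which matters for an ordinal severity scale.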
Table 6 compares the proposed AADR-AI system with single-model baselines, a comparison that may not be entirely fair because fusion approaches generally outperform individual models. Comparisons with other fusion-based approaches in the literature would help balance the evaluation. Pixel-level, channel-level, and decision-level fusion have improved fundus image processing in such models by integrating several modalities or classifiers at input or feature levels. Benchmarks of this kind would put the novelty of the proposed multi-agent fusion strategy, and its efficacy for retinal disease diagnosis, in context.
The proposed Agentic AADR-AI framework consistently outperformed all baseline models across several evaluation measures, as shown by the overall performance comparison in Table 6. Although computationally economical, the SVM baseline has the lowest accuracy and macro F1-score, which reflects its limited capacity to identify intricate patterns of retinal disease. ResNet-50 and VGG-16, the two deep CNN-based baselines, performed considerably better, with macro F1-scores of 0.874 and 0.859, respectively. However, both showed moderate variability among folds, especially for underrepresented classes. The ViT model’s stronger global feature representation was reflected in its macro F1-score of 0.889 and ROC-AUC of 0.938. The proposed Agentic AADR-AI, in turn, kept computational requirements comparable to the ViT baseline while achieving the highest accuracy (0.943), macro F1-score (0.927), and ROC-AUC (0.963). These improvements are especially significant in recall, a crucial component of clinical evaluation, where the ensemble reduced false negatives for moderate-to-severe diabetic retinopathy. They demonstrate that, compared with single-model baselines, the agentic fusion of CNN, Transformer, and Hybrid CNN-ViT agents, coordinated through proactive and reactive behaviours, provides a more balanced and trustworthy diagnostic system.
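Since the comparison leans on macro F1, a small sketch shows how it is derived from a confusion matrix and why it can sit well below accuracy on imbalanced data. The three-class matrix is invented for brevity and is not taken from Table 6.

```python
def per_class_f1(confusion):
    """Per-class F1 from a square confusion matrix (rows: true, columns: predicted)."""
    n = len(confusion)
    scores = []
    for k in range(n):
        tp = confusion[k][k]
        fp = sum(confusion[r][k] for r in range(n)) - tp
        fn = sum(confusion[k]) - tp
        denom = 2 * tp + fp + fn
        scores.append(2 * tp / denom if denom else 0.0)
    return scores

def macro_f1(confusion):
    """Unweighted mean of the per-class F1 scores."""
    scores = per_class_f1(confusion)
    return sum(scores) / len(scores)

def accuracy(confusion):
    """Fraction of samples on the diagonal of the confusion matrix."""
    total = sum(sum(row) for row in confusion)
    return sum(confusion[k][k] for k in range(len(confusion))) / total

# Made-up, imbalanced three-class confusion matrix.
confusion = [
    [90, 8, 2],
    [5, 12, 3],
    [1, 3, 6],
]
print(round(accuracy(confusion), 3), round(macro_f1(confusion), 3))
```

Here accuracy is about 0.83 while macro F1 is about 0.68, because the two small classes count as much as the large one in the macro average.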
A simple non-agentic fusion baseline was implemented by averaging the output probabilities of the CNN and ViT models using equal weights. This baseline serves as a traditional ensemble method without agentic coordination, adaptive weighting, or negotiation mechanisms.
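The equal-weight baseline amounts to a plain average of the two probability vectors, as in this minimal sketch. The function name and the example probabilities are hypothetical.

```python
def equal_weight_fusion(cnn_probs, vit_probs):
    """Average two probability vectors with equal weights; return (probs, grade)."""
    if len(cnn_probs) != len(vit_probs):
        raise ValueError("probability vectors must have the same length")
    fused = [(c + v) / 2.0 for c, v in zip(cnn_probs, vit_probs)]
    grade = max(range(len(fused)), key=fused.__getitem__)
    return fused, grade

# Made-up outputs for one image: the CNN favours grade 2, the ViT grade 3.
cnn_probs = [0.10, 0.15, 0.45, 0.25, 0.05]
vit_probs = [0.05, 0.10, 0.25, 0.55, 0.05]
fused, grade = equal_weight_fusion(cnn_probs, vit_probs)
print(grade, [round(p, 3) for p in fused])
```

The weights are fixed at 0.5 regardless of how confident or reliable each model is on a given image, which is the property the agentic variants relax.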
The EyePACS dataset is highly imbalanced, with far fewer moderate and severe DR cases. To address this, class-weighted cross-entropy loss, class-aware oversampling with augmentation for minority classes, and macro-averaged metrics were employed. These measures ensured that the model performed well across all DR severity levels rather than favoring the majority classes.
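A class-weighted cross-entropy can be sketched without any framework. The weighting rule below (total samples divided by the number of classes times the class count) is a common convention and an assumption here, as the paper does not give its weights; the class counts and probabilities are invented.

```python
import math

def inverse_frequency_class_weights(class_counts):
    """Weights n_samples / (n_classes * count), so rarer classes weigh more."""
    total = sum(class_counts)
    n_classes = len(class_counts)
    return [total / (n_classes * c) for c in class_counts]

def weighted_cross_entropy(probs, targets, class_weights):
    """Weighted mean of -log p(target), normalized by the sum of the weights used."""
    num = 0.0
    den = 0.0
    for p, t in zip(probs, targets):
        w = class_weights[t]
        num += -w * math.log(max(p[t], 1e-12))
        den += w
    return num / den

class_counts = [700, 70, 150, 50, 30]   # made-up counts for grades 0-4
class_weights = inverse_frequency_class_weights(class_counts)

# Two made-up predictions: a confident hit on grade 0, an uncertain grade 4.
probs = [[0.9, 0.025, 0.025, 0.025, 0.025], [0.2, 0.2, 0.2, 0.2, 0.2]]
targets = [0, 4]
weighted = weighted_cross_entropy(probs, targets, class_weights)
unweighted = weighted_cross_entropy(probs, targets, [1.0] * 5)
print(round(weighted, 3), round(unweighted, 3))
```

The weighted loss is dominated by the poorly predicted rare-grade sample, so the optimizer is pushed to fix minority-class errors first.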
An ablation analysis compared multiple variants to show the efficacy of the proposed fusion and coordination: (i) equal-weight fusion, (ii) static globally learned weights, (iii) adaptive confidence-only weighting, (iv) confidence + reliability weighting, (v) confidence + reliability + entropy penalty, (vi) adaptive fusion with reactive reprocessing, and (vii) the full agentic system with negotiation and explanation. The results reveal a clear performance hierarchy: adaptive confidence-only fusion performs better than static weighting, adding reliability further improves per-class F1 on rare classes, and entropy penalties reduce misclassifications in uncertain cases. Reprocessing lowers false negatives in moderate-to-severe DR, and the full agentic system performs best overall (accuracy = 96.7%, macro F1 = 0.93, ROC-AUC = 0.96). Each improvement produces statistically significant gains (p < 0.05), as confirmed by Wilcoxon signed-rank tests and paired t-tests. McNemar’s tests on test predictions show notable decreases in error overlap compared with ViT and ResNet-50. These results confirm that the proposed coordination and fusion approach is not only novel but also empirically better than conventional ensemble techniques.
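One way to picture variant (v) is the sketch below, in which each agent's weight is its confidence times a reliability score, reduced by a penalty proportional to the normalized entropy of its output. The paper gives no formula for this variant, so the whole functional form, the penalty value 0.5, and the reliability scores are assumptions made purely for illustration.

```python
import math

def normalized_entropy(probs):
    """Shannon entropy of a probability vector, scaled to the range [0, 1]."""
    h = -sum(p * math.log(p) for p in probs if p > 0.0)
    return h / math.log(len(probs))

def agent_weight(probs, reliability, entropy_penalty=0.5):
    """Unnormalized weight: confidence x reliability x (1 - penalty x entropy)."""
    confidence = max(probs)
    return confidence * reliability * (
        1.0 - entropy_penalty * normalized_entropy(probs)
    )

def fuse(agent_probs, reliabilities, entropy_penalty=0.5):
    """Weighted average of agent outputs; returns (fused probs, normalized weights)."""
    raw = {
        name: agent_weight(p, reliabilities[name], entropy_penalty)
        for name, p in agent_probs.items()
    }
    total = sum(raw.values())
    weights = {name: w / total for name, w in raw.items()}
    n_classes = len(next(iter(agent_probs.values())))
    fused = [
        sum(weights[name] * agent_probs[name][k] for name in agent_probs)
        for k in range(n_classes)
    ]
    return fused, weights

agent_probs = {
    "cnn":         [0.02, 0.03, 0.85, 0.08, 0.02],   # confident prediction
    "transformer": [0.15, 0.20, 0.25, 0.25, 0.15],   # near-uniform, uncertain
}
reliabilities = {"cnn": 0.9, "transformer": 0.9}     # made-up reliability scores

_, w_plain = fuse(agent_probs, reliabilities, entropy_penalty=0.0)
fused, w_penalized = fuse(agent_probs, reliabilities, entropy_penalty=0.5)
print(round(w_plain["transformer"], 3), round(w_penalized["transformer"], 3))
```

With the penalty switched on, the near-uniform agent loses influence relative to confidence-and-reliability weighting alone, which is the intended effect of an entropy term in uncertain cases.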
This research uses publicly available, fully anonymized diabetic retinopathy datasets (EyePACS and DIARETDB1). The dataset providers de-identified all patient images, and no personal or sensitive information is included. The datasets were originally collected under appropriate clinical and ethical guidelines, with informed consent obtained by the participating institutions. Because this study is a secondary analysis of anonymized public data, additional Institutional Review Board (IRB) approval was not required under our university’s ethics policy. All experiments complied with the dataset license agreements and respected patient privacy. This research follows the principles of responsible AI development, including transparency, fairness, and non-maleficence in medical image analysis.
This research introduced AADR-AI, which integrates agentic AI with deep learning models to improve detection accuracy, interpretability, and flexibility in diabetic retinopathy analysis. The combined approach delivers strong classification performance (as high as 96.7%), real-time adaptability, and computational efficiency through a multi-agent ensemble that uses a decision fusion process and combines encoder-decoder transformer-based networks with convolutional networks. Unlike traditional, statically defined systems, agentic AI allows the system to adapt its decision-making to patient-specific conditions and to the quality of the original images, embodying the behavioral principles of agentic AI: autonomy, proactivity, and responsiveness. The effectiveness, scalability, and generalizability of the hybrid framework were validated through extensive evaluation on benchmark datasets, and AADR-AI reduces the dependency on high-end computing infrastructure, which supports deployment in resource-constrained environments.
In future work, the framework will be extended to retinal analysis of additional diseases (e.g., glaucoma and AMD), using federated learning to protect patient privacy during training. Further work on explainable AI approaches will give clinical decision-makers greater model transparency. To better connect laboratory studies with real patient care, a clinical trial of AADR-AI will evaluate its performance across diverse demographics and clinical settings.
The data used in this research are available from Kaggle: https://www.kaggle.com/competitions/diabetic-retinopathy-detection/data
Rajarajeshwari, G. & Selvi, G. C. Application of artificial intelligence for classification, segmentation, early detection, early diagnosis, and grading of diabetic retinopathy from fundus retinal images: a comprehensive review. IEEE Access 12, 172499–172536 (2024).
Yao, J. et al. Novel artificial intelligence algorithms for diabetic retinopathy and diabetic macular edema. Eye Vis. 11 (1), 23 (2024).
Alsadoun, L. et al. Artificial intelligence (AI)-Enhanced detection of diabetic retinopathy from fundus images: the current landscape and future directions. Cureus 16 (8), 1–8 (2024).
Ahmed, H. B. & Alzuoubi, M. Designing accessible virtual reality interfaces using reinforcement learning for users with motor and sensory impairments. PatternIQ Min. 2 (1), 1–12. https://doi.org/10.70023/sahd/250201 (2025).
Ikram, A. & Imran, A. ResViT FusionNet model: an explainable AI-driven approach for automated grading of diabetic retinopathy in retinal images. Comput. Biol. Med. 186, 109656 (2025).
Peters, I. & Kamrul, G. Applications AI-driven solar energy management system for smart grids using predictive analytics and adaptive control. J. Quantum Nano-Green Environ. Syst. 1 (1), 14–24. https://doi.org/10.70023/qnges.251102 (2025).
Grzybowski, A. et al. Artificial intelligence for diabetic retinopathy screening: a review. Eye 34 (3), 451–460 (2020).
Chen, Q. et al. Towards accountable AI-assisted eye disease diagnosis: workflow design, external validation, and continual learning. arXiv preprint arXiv:2409.15087 (2024).
Vujosevic, S. et al. Screening for diabetic retinopathy: new perspectives and challenges. Lancet Diabetes Endocrinol. 8 (4), 337–347 (2020).
Hogg, J. A mixed methods evaluation of artificial intelligence-enabled macula services (Doctoral dissertation, Newcastle University, 2024).
Alyoubi, W. L., Shalash, W. M. & Abulkhair, M. F. Diabetic retinopathy detection through deep learning techniques: A review. Inf. Med. Unlocked. 20, 100377 (2020).
Das, D., Biswas, S. K. & Bandyopadhyay, S. A critical review on diagnosis of diabetic retinopathy using machine learning and deep learning. Multimedia Tools Appl. 81 (18), 25613–25655 (2022).
Oganov, A. C. et al. Artificial intelligence in retinal image analysis: development, advances, and challenges. Surv. Ophthalmol. 68 (5), 905–919 (2023).
Kaur, J., Mittal, D. & Singla, R. Diabetic retinopathy diagnosis through computer-aided fundus image analysis: a review. Arch. Comput. Methods Eng. 29 (3), 1673–1711 (2022).
Stranjak, A. & Campagna, S. Decentralised agent-based medical image reconstruction. Procedia Comput. Sci. 207, 2106–2115 (2022).
Sivakumar, N., Mura, C. & Peirce, S. M. Innovations in integrating machine learning and agent-based modeling of biomedical systems. Front. Syst. Biology. 2, 959665 (2022).
Khan, A. et al. A survey of the vision Transformers and their CNN-transformer based variants. Artif. Intell. Rev. 56 (Suppl 3), 2917–2970 (2023).
Long, H. Hybrid design of CNN and vision transformer: A review. In Proceedings of the 2024 7th International Conference on Computer Information Science and Artificial Intelligence, 121–127 (2024).
Annamalai, M. et al. Revolutionizing Medical Diagnostics with Transparent AI-Driven Decision Support Systems. In 2024 4th International Conference on Mobile Networks and Wireless Communications (ICMNWC), 1–7 (IEEE, 2024).
Soumya, M. A. A. K. AI-Driven Insights: Revolutionizing Health Diagnostics and Treatment (Budha publication, 2024).
Yamani, I. U. & Basari, B. Leveraging convolutional neural networks for automated detection and grading of diabetic retinopathy from fundus images. Jurnal Teknik Elektro. 15 (2), 68–73 (2023).
Niu, Y., Gu, L., Zhao, Y. & Lu, F. Explainable diabetic retinopathy detection and retinal image generation. IEEE J. Biomedical Health Inf. 26 (1), 44–55 (2021).
Khan, T. M., Soomro, T. A. & Razzak, I. The Role of AI in Early Detection of Life-Threatening Diseases: A Retinal Imaging Perspective. arXiv preprint arXiv:2505.20810 (2025).
Mohammad, N. K., Rajab, I. A., Al-Taie, R. H., Ismail, M. & Mohammad, N. Machine learning and vision: advancing the frontiers of diabetic cataract management. Cureus 16 (8), 1–11 (2024).
Jacoba, C. M. P. et al. Performance of automated machine learning for diabetic retinopathy image classification from multi-field handheld retinal images. Ophthalmol. Retina 7 (8), 703–712 (2023).
Rêgo, S. et al. Exploring the feasibility of opportunistic diabetic retinopathy screening with handheld fundus cameras in primary care: insights from doctors and nurses. Diabetology 5 (6), 566–583 (2024).
Chawla, R. et al. Artificial intelligence for advancing eye care in resource-poor settings: assessing the predictive accuracy of an AI-model for diabetic retinopathy screening in India. Glob. Epidemiol. 100209 (2025).
Son, J. et al. An interpretable and interactive deep learning algorithm for a clinically applicable retinal fundus diagnosis system by modelling finding-disease relationship. Sci. Rep. 13 (1), 5934 (2023).
https://www.kaggle.com/competitions/diabetic-retinopathy-detection/data
The authors received no specific funding for this study.
Department of Computer Science and Engineering, K.Ramakrishnan college of Technology, Tiruchirappalli, TamilNadu, India
R. Sathya
Department of Computer Applications, Anna university, BIT campus, Tiruchirappalli, TamilNadu, India
A. Valaramathi
The authors confirm their contributions to the paper as follows: Conceptualization, Methodology: RS; Formal analysis and investigation: RS & AV; Writing – original draft preparation: RS; Writing – review and editing: RS & AV; Supervision: AV. All authors reviewed the results and approved the final version of the manuscript.
Correspondence to R. Sathya.
The authors declare no competing interests.
The authors declare that there are no conflicts of interest regarding the publication of this paper. The authors have no financial or personal relationships that could influence the research outcomes or the interpretation of the data presented in this manuscript.
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Open Access This article is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License, which permits any non-commercial use, sharing, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if you modified the licensed material. You do not have permission under this licence to share adapted material derived from this article or parts of it. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by-nc-nd/4.0/.
Sathya, R., Valaramathi, A. Detection and diagnosis of diabetic retinopathy in retinal fundus images using agentic AI approaches. Sci Rep 16, 3780 (2026). https://doi.org/10.1038/s41598-025-34016-0
DOI: https://doi.org/10.1038/s41598-025-34016-0
Scientific Reports (Sci Rep)
ISSN 2045-2322 (online)
© 2026 Springer Nature Limited