Nature Medicine (2026)
The rising burden of endocrine and metabolic diseases demands scalable and accessible screening tools. Here we developed Reti-Pioneer, a multitask retinal imaging framework that integrates quality-aware modules with pre-trained foundation models for efficient, multidisease detection. In general, the framework was developed using 107,730 color fundus photographs from both community-based and hospital-based cohorts and achieved area under the receiver operating characteristic curve values on internal test data of 0.833 (95% confidence interval 0.810–0.856) for type 2 diabetes mellitus, 0.832 (0.799–0.866) for gout, 0.787 (0.742–0.833) for osteoporosis, 0.740 (0.726–0.755) for hypertension, 0.736 (0.721–0.751) for hyperlipidemia and 0.699 (0.667–0.730) for thyroid disease. The framework generalized well to six external cohorts from both resource-limited and high-resource settings, and showed biological interpretability via plasma proteomic correlations. In a primary care silent trial, it completed screening in 30.6 ± 6.0 s per case, notably faster than standard laboratory workflows. A subsequent clinical pilot for type 2 diabetes mellitus yielded an area under the receiver operating characteristic curve of 0.776 (0.710–0.842) and negative predictive value of 0.966 (0.946–0.983), surpassing the Finnish Diabetes Risk Score, with high acceptance from clinicians and patients. Overall, Reti-Pioneer could provide a translatable, low-cost pathway from oculomics to actionable clinical screening.
The global rise in endocrine and metabolic diseases amid aging populations poses mounting challenges for healthcare systems, underscoring the need for scalable early-detection strategies1,2. Current screening paradigms rely heavily on blood-based biomarkers, the collection of which, even via minimally invasive methods, involves logistical hurdles, patient discomfort and considerable cost; together, these factors limit the feasibility of frequent longitudinal monitoring and impede deployment in large-scale population screening programs3. Noninvasive and low-cost approaches, particularly those based on retinal imaging, offer a promising solution to address gaps in common disease screening, potentially mitigating healthcare inequities, especially in remote regions.
Oculomics, the use of artificial intelligence (AI) and ocular imaging for monitoring systemic health, has shown promise in characterizing aging and identifying preclinical cardiovascular, renal and neurodegenerative conditions4,5. For endocrine disorders, previous studies have demonstrated that retinal vascular and neural changes may precede clinical manifestations of diabetes, hypertension and thyroid dysfunction6,7,8,9. For instance, a large-scale Chinese cohort study reported the potential of deep learning combined with color fundus photographs (CFPs) to predict type 2 diabetes mellitus (T2DM)10. However, current oculomics research faces considerable bottlenecks: it relies on high-quality imaging, is largely confined to single-disease frameworks and depends on models trained from scratch. These limitations hinder the development of systems capable of detecting the complex multimorbidity patterns prevalent in remote and resource-limited populations4.
Recent advances in medical foundation models offer solutions to these limitations. Pre-trained models reduce data dependency and computational resources while improving generalizability across heterogeneous populations11,12,13. However, the application of such models to high-burden endocrine and metabolic diseases, despite their severe clinical consequences and unmet needs in risk stratification, remains underexplored. Key gaps persist, including the underrepresentation of multi-ethnic populations and the lack of sufficient validation for multidisease risk stratification. These challenges highlight the urgent need for comprehensive risk assessments of endocrine and metabolic diseases and the development of a unified framework capable of simultaneously identifying and characterizing multiple conditions.
In this study, we aim to introduce Reti-Pioneer, a multitask framework, and conduct a biology-linked, stepwise, multi-site clinical validation study to deliver unified screening for a spectrum of endocrine and metabolic diseases across diverse resource settings. The schematic overview of the Reti-Pioneer workflow is shown in Fig. 1. First, we curated a multimodal dataset comprising CFPs of varying image quality, paired with structured clinical metadata. A total of 107,730 CFPs (53,865 individuals), deliberately sourced from the community-based UK Biobank (UKB) and Chinese tertiary hospital registries, were used to construct a heterogeneous dataset spanning diverse imaging conditions and clinical contexts. Second, we developed a multimodal learning model that integrates CFPs of varying image quality with structured clinical metadata. This architecture ensembles large-scale vision foundation models, including Swin Transformer, Vision Mamba and RETFound, to leverage their complementary capabilities. The model subsequently underwent comprehensive external testing on multicenter datasets across different geographical regions. Third, we systematically assessed the framework’s generalizability in disease prediction and its biological interpretability by linking retinal latent features to proteomic and genetic markers. We further evaluated its capacity to enhance ophthalmologists’ diagnostic performance within a human–AI collaborative reader study. Finally, we deployed the model in a prospective silent trial, comparing its efficiency and time cost against traditional screening methods in routine primary care. After its successful validation, a clinical pilot study was conducted to evaluate the framework’s feasibility and practical impact in clinical workflows. 
By demonstrating a scalable and adaptable AI framework, our findings bridge a key gap in current screening paradigms and provide pioneering insights for the practical deployment of oculomics-driven systemic health assessment.
a, Data curation: CFPs with linked clinical metadata were acquired from population-based and hospital-based cohorts. b, Model architecture: a quality-aware module was integrated with three frozen pre-trained vision foundation models (Swin Transformer, Vision Mamba, RETFound) for the simultaneous screening of six endocrine and metabolic diseases. c, Validation and interpretation: external multi-ethnic validation assessed generalizability. Biological interpretability was examined by associating retinal latent features with plasma proteomic and PRS data. d, Prospective clinical evaluation: a silent trial measured ecological validity and workflow efficiency in primary care; a subsequent pilot study evaluated diagnostic accuracy, clinical utility and acceptance among clinicians and patients. Figure created in BioRender; Yu, H. https://biorender.com/q9qie85 (2026).
The demographic and clinical characteristics of the participants in the training dataset are summarized in Extended Data Table 1. The framework was fine-tuned on an internal dataset including 107,730 CFPs from 53,865 participants, and further validated on independent datasets across resource-limited and high-resource regions in China and a multi-ethnic cohort in Singapore, collectively including 23,232 CFPs from 11,616 participants. At the image level, image quality was evaluated using the quality-aware module, which leverages image quality metrics to guide model decisions during processing; three frozen pre-trained models were combined with three learnable prediction heads. At the individual level, these three trained heads were integrated via a weighted soft voting ensemble (Extended Data Fig. 1).
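The weighted soft voting step described above can be illustrated with a minimal sketch. It assumes each prediction head emits one probability per disease and that the ensemble output is a weight-normalized average of those probabilities. The function name, the example probabilities and the weights are hypothetical; the weights used in Reti-Pioneer are not given here.

```python
from typing import List, Sequence


def weighted_soft_vote(head_probs: Sequence[Sequence[float]],
                       weights: Sequence[float]) -> List[float]:
    """Average per-head probabilities, weighting each head.

    head_probs[h][d] is the probability head h assigns to disease d.
    weights[h] is a non-negative weight for head h (illustrative values only).
    """
    if len(head_probs) != len(weights):
        raise ValueError("one weight per head is required")
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    n_diseases = len(head_probs[0])
    return [
        sum(w * probs[d] for w, probs in zip(weights, head_probs)) / total
        for d in range(n_diseases)
    ]


# Hypothetical outputs of three heads for two diseases.
heads = [[0.80, 0.20], [0.60, 0.40], [0.70, 0.10]]
combined = weighted_soft_vote(heads, weights=[2.0, 1.0, 1.0])
```

Because the weights are normalized inside the function, only their ratios matter; equal weights reduce the scheme to plain soft voting.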
Overall, Reti-Pioneer achieved the following AUROC values in the internal test dataset: 0.833 (95% confidence interval (CI) 0.810–0.856) for T2DM, 0.832 (0.799–0.866) for gout, 0.787 (0.742–0.833) for osteoporosis, 0.740 (0.726–0.755) for hypertension, 0.736 (0.721–0.751) for hyperlipidemia and 0.699 (0.667–0.730) for thyroid disease (Supplementary Table 1). Sex-stratified analyses yielded consistent performance across male and female subgroups (Supplementary Table 2). We evaluated Reti-Pioneer on two independent external test sets representing distinct healthcare resource environments in China. In regions with limited resources (pooled data from Tibet, Xinjiang and Guangxi), the model demonstrated strong discriminative performance for six chronic diseases, with AUROC values (95% CI) as follows: T2DM, 0.821 (0.792–0.851); hypertension, 0.805 (0.776–0.833); hyperlipidemia, 0.628 (0.560–0.676); gout, 0.731 (0.662–0.799); osteoporosis, 0.904 (0.876–0.932); and thyroid disease, 0.821 (0.785–0.858) (Fig. 2). In a combined dataset encompassing both resource-limited and high-resource settings, Reti-Pioneer maintained robust performance, which was notably higher for T2DM, hyperlipidemia and osteoporosis in this mixed-resource setting compared to a resource-limited setting alone. A breakdown of results per external population is shown in Extended Data Tables 2 and 3. To assess generalizability across ethnicities, we further validated the model on the multi-ethnic Singapore Epidemiology of Eye Diseases study (SEED). The overall AUROCs were 0.686 (0.680–0.700) for T2DM, 0.749 (0.740–0.760) for hypertension and 0.615 (0.600–0.620) for hyperlipidemia. Stratified analysis according to ethnicity revealed consistent performance across groups. For hypertension, the corresponding AUROCs were 0.731, 0.746 and 0.769. For T2DM, the AUROCs were 0.646 for Indian, 0.674 for Chinese and 0.692 for Malay participants. 
These results indicate that Reti-Pioneer performs effectively across diverse ethnic populations, with particularly strong performance for hypertension detection in all groups. In external validation, calibration plots and the Brier score indicated that Reti-Pioneer has moderate-to-good calibration and overall performance (Extended Data Fig. 2a). Decision curve analysis of Reti-Pioneer in the external test datasets demonstrated that the model provided net clinical benefit across these diseases. This finding was consistent across multiple external validation settings, including both resource-limited and high-resource settings, as well as in SEED cohorts where applicable (Extended Data Fig. 2b).
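For readers less familiar with the two summary measures used here, the following sketch shows the standard textbook definitions of the Brier score (mean squared difference between predicted probability and outcome) and of net benefit at a single threshold probability, which is the quantity plotted in a decision curve. The function names and toy data are illustrative assumptions and do not reproduce the study's analysis code.

```python
from typing import Sequence


def brier_score(y_true: Sequence[int], y_prob: Sequence[float]) -> float:
    """Mean squared error between predicted probabilities and 0/1 outcomes."""
    return sum((p - y) ** 2 for y, p in zip(y_true, y_prob)) / len(y_true)


def net_benefit(y_true: Sequence[int], y_prob: Sequence[float],
                threshold: float) -> float:
    """Net benefit at one threshold probability (0 < threshold < 1).

    True positives are credited, and false positives are penalized by the
    odds of the threshold, threshold / (1 - threshold).
    """
    n = len(y_true)
    tp = sum(1 for y, p in zip(y_true, y_prob) if p >= threshold and y == 1)
    fp = sum(1 for y, p in zip(y_true, y_prob) if p >= threshold and y == 0)
    return tp / n - (fp / n) * threshold / (1 - threshold)


# Toy data: four participants with hypothetical predicted risks.
labels = [1, 0, 1, 0]
risks = [0.9, 0.2, 0.6, 0.7]
example_brier = brier_score(labels, risks)
example_nb = net_benefit(labels, risks, threshold=0.5)
```

A full decision curve repeats the net benefit calculation over a range of thresholds and compares the model against the "treat all" and "treat none" strategies.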
AUROC curves for six endocrine and metabolic diseases tested on independent external datasets. Datasets from China were stratified into three healthcare resource settings: resource-limited settings (pooled data from Tibet, Xinjiang and Guangxi), high-resource settings A (data from large tertiary hospitals in Guangdong Province) and high-resource settings B (data from physical examination centers in Guangdong Province). The dataset from Singapore was derived from the SEED cohort.
Both the integration of a quality-aware module and the strategic ensembling of multiple pre-trained foundation models were crucial to the performance of Reti-Pioneer. Ablation studies demonstrated that incorporating the quality-aware module significantly improved the AUROC for T2DM compared to using only good-quality images (P = 0.011) or using all-quality images without the module (P = 0.021). The module also yielded significant accuracy improvements for hypertension (P < 0.001) and hyperlipidemia (P = 0.037) compared to the good-quality image only (Supplementary Table 1). Furthermore, the multimodal combination of retinal images with clinical metadata yielded superior predictive accuracy compared to any single-modality input (Supplementary Table 1).
Given that retinal manifestations of endocrine and metabolic diseases may precede clinical diagnosis, we further evaluated the Reti-Pioneer framework on a fully withheld longitudinal subset from the UKB (January 2006–December 2021) to assess its performance in predicting six endocrine diseases over 5-year and 10-year intervals. The analysis included 15,704 participants (31,748 images; 57.4% female; mean age 56.6 ± 8.0 years), with the following self-reported ethnic distribution: 83.7% White, 6.7% Black, 4.3% Asian and 5.4% other. Participants with pre-existing disease at baseline were excluded from the analysis.
As anticipated, predicting disease onset over longer time intervals proved more challenging than cross-sectional screening (Extended Data Table 4). Reti-Pioneer achieved an AUROC of 0.755 (95% CI 0.702–0.809) for 5-year incident T2DM, which decreased to 0.736 (0.694–0.779) for the 10-year prediction. Similarly, for hypertension, the model attained AUROCs of 0.755 (0.704–0.805) and 0.719 (0.679–0.759) over the 5-year and 10-year intervals, respectively. Corresponding values for hyperlipidemia were 0.748 (0.695–0.801) at 5 years and 0.735 (0.694–0.775) at 10 years. Decision curves for the Reti-Pioneer framework, derived from the prospective validation test, are presented in Extended Data Fig. 3. In a sensitivity analysis excluding individuals diagnosed within the first year after fundus image acquisition, the model maintained comparable performance for both the 5-year and 10-year prediction intervals (Supplementary Table 3).
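Since each horizon is treated as a binary classification task, an AUROC with a 95% CI can be obtained as in the sketch below. The AUROC is computed as the probability that a randomly chosen case scores higher than a randomly chosen control (ties count one half). The percentile bootstrap used for the interval is an assumption made for illustration; the method used to derive the reported CIs is not stated in this section.

```python
import random
from typing import List, Sequence, Tuple


def auroc(y_true: Sequence[int], y_score: Sequence[float]) -> float:
    """AUROC via pairwise comparison of case and control scores."""
    pos = [s for y, s in zip(y_true, y_score) if y == 1]
    neg = [s for y, s in zip(y_true, y_score) if y == 0]
    if not pos or not neg:
        raise ValueError("both classes must be present")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def bootstrap_ci(y_true: Sequence[int], y_score: Sequence[float],
                 n_boot: int = 1000, alpha: float = 0.05,
                 seed: int = 0) -> Tuple[float, float]:
    """Percentile bootstrap CI for the AUROC (assumed method, see lead-in)."""
    rng = random.Random(seed)
    n = len(y_true)
    stats: List[float] = []
    while len(stats) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]
        ys = [y_true[i] for i in idx]
        if 0 < sum(ys) < n:  # skip resamples that lack one of the classes
            stats.append(auroc(ys, [y_score[i] for i in idx]))
    stats.sort()
    lower = stats[int((alpha / 2) * n_boot)]
    upper = stats[min(n_boot - 1, int((1 - alpha / 2) * n_boot))]
    return lower, upper


# Toy example: 1 = developed the disease within the horizon, 0 = did not.
incident = [0, 0, 1, 1]
scores = [0.1, 0.4, 0.35, 0.8]
point_estimate = auroc(incident, scores)
ci_low, ci_high = bootstrap_ci(incident, scores, n_boot=200)
```

The pairwise loop is quadratic in sample size, which is acceptable for a sketch but would be replaced by a rank-based computation for cohorts of the size analyzed here.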
To interpret the diagnostic decisions of the Reti-Pioneer framework, we generated pixel-level saliency maps that identify fundus regions critical for model predictions (Supplementary Fig. 1). We then investigated the biological basis linking retinal features to endocrine and metabolic pathophysiology by analyzing 256-dimensional latent embeddings extracted from the model’s penultimate layer. Using orthogonal projections to latent structures discriminant analysis (OPLS-DA), we derived the predictive component from these embeddings for each disease; after adjustment for age and sex, this component was significantly associated with every disease (all P < 0.001) except osteoporosis (Supplementary Table 4).
We then integrated 2,920 plasma proteomic profiles and applied elastic net regression to identify the top five disease-specific protein signatures for each of the six disorders (Supplementary Fig. 2). Several key plasma proteins, including SCARA5 (T2DM), PLA2G7, PTPRF and APOM (hyperlipidemia), showed significant associations with the OPLS-DA-derived predictive components of retinal latent features (Extended Data Fig. 4), even after adjustment for age, sex, ethnicity, body mass index, image quality and assessment center, with false discovery rate correction for multiple testing (Supplementary Table 5). In contrast, polygenic risk scores (PRS) for relevant traits showed limited association: only the PRS for low-density lipoprotein cholesterol was significantly associated with the predictive components of retinal latent features, and this was observed only in the unadjusted model (Supplementary Table 5). Collectively, these findings provide biological plausibility for Reti-Pioneer by demonstrating that the retinal features identified by the model are linked to disease-relevant protein signatures and, to a lesser extent, genetic risk factors.
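The false discovery rate correction mentioned above can be illustrated with the Benjamini-Hochberg step-up procedure. This is an assumption made for illustration: the text does not name the specific procedure applied, and the P values below are invented.

```python
from typing import List, Sequence


def benjamini_hochberg(p_values: Sequence[float]) -> List[float]:
    """Return Benjamini-Hochberg adjusted P values in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest P value to the smallest so that adjusted
    # values never decrease as the raw P values increase.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical raw P values for four protein associations.
raw_p = [0.01, 0.04, 0.03, 0.20]
q_values = benjamini_hochberg(raw_p)
```

An association would then be called significant when its adjusted value falls below the chosen false discovery rate, for example 0.05.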
To benchmark the diagnostic performance of Reti-Pioneer against clinical experts, we conducted a reader study involving retinal specialists from independent institutions, with 3–10 years of experience. Each specialist interpreted bilateral CFPs from 200 distinct patients per disease (1,200 total cases), drawn from an internal held-out test set, to screen for the six target diseases. In a subsequent session, conducted 1–2 weeks later to mitigate recall bias, the same ophthalmologists reevaluated the images with access to Reti-Pioneer’s diagnostic predictions and explanatory heatmaps. When assisted by the Reti-Pioneer copilot, retinal specialists achieved an average accuracy of 88.0% for T2DM, 79.0% for gout and 70.0% for thyroid disease, compared to 71.0%, 51.0% and 63.0% without copilot when applied to the same images (Supplementary Table 6). This trend of improved accuracy with the AI copilot was consistent across other disease screening tasks. As expected, the performance of human experts varied with their level of clinical experience; nevertheless, the Reti-Pioneer copilot surpassed the diagnostic accuracy of any individual specialist.
To validate the deployment feasibility of the Reti-Pioneer framework, we developed a web platform (http://retipioneer.cn) and conducted a prospective silent trial in a primary care setting. The trial evaluated the framework’s performance in screening for six systemic diseases, benchmarking it against comprehensive physical and blood examination (Fig. 3a). In the clinical workflow, bilateral fundus images from 1,017 participants were captured at the point of care and transmitted in real time to a centralized graphics processing unit (GPU) computing infrastructure for immediate Reti-Pioneer analysis, without disrupting routine physical examination procedures. The baseline characteristics of the 1,017 participants are summarized in Supplementary Table 7.
a, Clinical workflow integrating Reti-Pioneer screening alongside routine physical and laboratory examinations. b, Example of an AI-generated screening report. c, Comparison of diagnosis time between Reti-Pioneer, laboratory tests and the FINDRISC questionnaire (n = 1,017). The bars represent the mean diagnosis time; the error bars represent the s.d. Statistical significance was determined using a paired two-sided t-test; ***P < 0.001. d, Computational efficiency of Reti-Pioneer compared with models trained from scratch, shown in terms of the number of training parameters and total training time. Panel a created in BioRender; Yu, H. https://BioRender.com/7uxnsxh (2026).
The silent trial confirmed the high robustness of the integrated system, with an image acquisition success rate of 98.7% and an AI inference success rate of 100%. Operationally, the Reti-Pioneer workflow demonstrated superior throughput efficiency. The median total time from image capture to report generation was significantly shorter than that required for laboratory or physical examination reports (30.6 ± 6.0 s versus approximately 7.97–8.11 h; P < 0.001) and the Finnish Diabetes Risk Score (FINDRISC) for T2DM (30.6 ± 6.0 s versus 126.6 ± 48.1 s; P < 0.001, Fig. 3b,c). Furthermore, Reti-Pioneer exhibited substantial computational advantages over models trained from scratch (Fig. 3d). The detailed performance metrics of Reti-Pioneer from the silent trial are provided in Extended Data Fig. 5.
To assess the integration of the Reti-Pioneer system into routine care for multidisease screening, we conducted a prospective real-world study at a community health service center and a physical examination center. A total of 606 participants were enrolled and evaluated by primary care providers (PCPs) with the Reti-Pioneer copilot. At the baseline visit, retinal fundus images and clinical metadata were collected for all participants, and standard biochemical measurements were performed. Patients undergoing routine physical examinations were offered the Reti-Pioneer AI test by their PCPs, with follow-up conducted at 2 weeks to confirm the biochemical results. The baseline characteristics of the cohort are summarized in Supplementary Table 7.
Retinal images and metadata from all participants were analyzed by Reti-Pioneer to screen for six endocrine and metabolic diseases. To minimize missed cases, screening was performed at a preset decision threshold corresponding to 85% sensitivity, as determined in the internal validation. Against a composite diagnostic standard incorporating laboratory tests, physical examination and self-reported history, Reti-Pioneer demonstrated superior discrimination for T2DM compared with the FINDRISC questionnaire (AUROC 0.776, 95% CI 0.710–0.842 versus 0.565, 0.459–0.670; P < 0.001). Reti-Pioneer also showed balanced sensitivity and specificity, with a high negative predictive value (NPV) (Table 1). Corresponding AUROC values for the remaining five diseases were 0.843 (0.804–0.881) for hypertension, 0.699 (0.651–0.748) for hyperlipidemia, 0.804 (0.752–0.855) for gout, 0.877 (0.845–0.910) for osteoporosis and 0.646 (0.592–0.700) for thyroid disease (Fig. 4a,b). The integration of the quality-aware module yielded improved or comparable AUROCs across all six diseases relative to models using only high-quality images or all images without quality guidance (Table 1).
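The thresholding step can be sketched as follows: a cutoff is chosen so that at least the target fraction of known cases is flagged, and sensitivity, specificity and NPV are then computed at that cutoff. The function names and toy scores are hypothetical. For brevity the sketch evaluates the metrics on the same toy data used to pick the cutoff, whereas in the study the threshold was fixed on the internal validation data and then applied to the pilot cohort.

```python
import math
from typing import Dict, Sequence


def threshold_for_sensitivity(y_true: Sequence[int], y_score: Sequence[float],
                              target: float = 0.85) -> float:
    """Highest cutoff at which sensitivity is at least `target`.

    A participant is flagged when score >= cutoff.
    """
    pos_scores = sorted((s for y, s in zip(y_true, y_score) if y == 1),
                        reverse=True)
    if not pos_scores:
        raise ValueError("no positive cases available")
    # Number of cases that must be flagged; the small epsilon guards
    # against floating-point round-up before taking the ceiling.
    k = max(1, math.ceil(target * len(pos_scores) - 1e-9))
    return pos_scores[k - 1]


def screening_metrics(y_true: Sequence[int], y_score: Sequence[float],
                      cutoff: float) -> Dict[str, float]:
    """Sensitivity, specificity and NPV at a fixed cutoff.

    Assumes at least one case, one control and one screen-negative result.
    """
    tp = sum(1 for y, s in zip(y_true, y_score) if y == 1 and s >= cutoff)
    fn = sum(1 for y, s in zip(y_true, y_score) if y == 1 and s < cutoff)
    tn = sum(1 for y, s in zip(y_true, y_score) if y == 0 and s < cutoff)
    fp = sum(1 for y, s in zip(y_true, y_score) if y == 0 and s >= cutoff)
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "npv": tn / (tn + fn),
    }


# Toy data: seven cases followed by five controls, with hypothetical scores.
status = [1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0]
risk = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.2, 0.45, 0.3, 0.1, 0.15, 0.05]
cutoff = threshold_for_sensitivity(status, risk, target=0.85)
metrics = screening_metrics(status, risk, cutoff)
```

Fixing the cutoff on high sensitivity trades specificity for fewer missed cases, which is why NPV is the natural headline metric for a rule-out screening tool.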
a, Probability density plots of Reti-Pioneer-predicted risks for six diseases, stratified according to disease status based on a composite diagnostic standard. b, Corresponding AUROCs for the six diseases. c, Participant-reported satisfaction (5-point Likert scale) and willingness-to-pay responses after Reti-Pioneer screening. d, Clinician-rated potential deployment barriers for Reti-Pioneer multidisease screening, assessed using a structured questionnaire (n = 15 independent clinicians). The bars represent the mean score; the error bars represent the s.d.
Satisfaction surveys administered to both participants and clinicians demonstrated high acceptance and usability of the system. Over 80% of participants reported being ‘very satisfied’ across all evaluation dimensions, including ease of use, information clarity, and multidisease management efficiency. Regarding payment willingness, 52.0% of participants expressed willingness, while 4.9% were unwilling, primarily because of cost concerns (Fig. 4c). Clinicians rated key deployment factors, identifying system integration, patient acceptance and regulatory issues as the most important considerations, while also giving high scores (mean > 4.4/5) across all decision-support dimensions, including workflow compatibility and accuracy trust (Fig. 4d).
The development of a low-cost, clinically applicable AI framework for multimorbidity screening, using accessible, practical and cost-effective retinal imaging, holds potential to reduce disease burden, disability and mortality at the population level, particularly in remote and underserved regions. To address this pressing public health need, we developed Reti-Pioneer, a retinal imaging-based framework designed for the detection and prediction of six major endocrine and metabolic diseases: T2DM, hypertension, hyperlipidemia, gout, osteoporosis and thyroid disease. Using multi-country, multi-ethnic datasets from China, the UK and Singapore, we trained, validated and externally tested the framework. Its real-world applicability was further demonstrated through a prospective silent trial and a clinical pilot study conducted in routine care settings.
Our main findings demonstrate that Reti-Pioneer accurately detected six endocrine and metabolic diseases, with AUROCs of 0.833 for T2DM, 0.832 for gout, 0.787 for osteoporosis, 0.740 for hypertension, 0.736 for hyperlipidemia and 0.699 for thyroid disease. Performance remained largely consistent across external test datasets spanning ethnically and geographically diverse populations from Singapore and multiple regions of China, in both high-resource and resource-limited settings. In these cohorts, Reti-Pioneer also effectively identified multimorbidity, outperforming models based solely on clinical traits. Notably, the framework showed strong predictive capability for future disease onset, achieving AUROCs of 0.736 for 10-year incident T2DM, 0.719 for hypertension, 0.735 for hyperlipidemia, 0.753 for gout, 0.813 for osteoporosis and 0.662 for thyroid disease. While a performance gap was observed between certain cohorts, potentially attributable to ethnic variation, differences in image acquisition protocols (for example, dilated versus nondilated pupils), label noise from electronic health records or self-reporting, and systematic under-ascertainment of diseases in routine care14, the model maintained robust generalizability across all tested populations and settings.
In this study, we integrated a quality-aware module into the framework, given that fundus image quality has been associated with systemic health status15. Compared to models trained solely on good-quality images or on all-quality images without explicit quality integration, Reti-Pioneer achieved superior AUROC performance across the full spectrum of image quality, making it particularly suitable for deployment in primary care and resource-limited settings, where image quality may vary substantially16. Although clinicians often attempt to recapture suboptimal fundus images, recapture is frequently ineffective in patients with systemic comorbidities17. We propose that in real-world clinical environments, especially with automated imaging systems, low-quality images, while diagnostically limited, may still hold value for developing and refining foundation models. Embedding quality-aware assessment as an auxiliary module for multimodal data fusion could enhance model adaptability to realistic clinical conditions. Future studies should investigate whether specific imaging features, such as blurring patterns or artifacts, exhibit independent associations with conditions involving systemic physiological decline, such as Parkinson’s disease or diabetes.
Reti-Pioneer further enhances clinical interpretability by generating saliency maps that highlight regions of the fundus most relevant to model predictions. These visualizations help to prove the model’s capacity to capture disease-specific heterogeneity across both diagnostic and prognostic tasks. For example, accumulating evidence indicates that diabetes mellitus primarily affects the retinal microvasculature and peripheral retinal structures through endothelial dysfunction, oxidative stress and thickening of capillary basement membranes18,19,20. When provided with heatmaps and predictive outputs from Reti-Pioneer, clinicians can make more informed decisions, effectively augmenting the clinical decision-making process. To further elucidate the underlying biology, we analyzed the relationship between retinal latent features, derived from dimensionality reduction of retinal imaging parameters, and disease-specific plasma proteomic profiles. These proteins showed substantial associations with the retinal features, offering insights into the molecular pathways linking protein expression to retinal pathophysiology. By integrating oculomics with plasma proteomics, we established a multimodal biomarker framework that allows our AI system to decode spatially resolved disease mechanisms while maintaining clinical interpretability. While several disease-associated proteins did not retain independent associations with retinal latent features after adjustment for confounding factors, the observed associations in unadjusted models nonetheless support the biological interpretability of the model’s predictions. These findings suggest that the retinal features identified by Reti-Pioneer serve as proxies for underlying protein signatures, even in the absence of strict independence after multivariable adjustment.
To date, health economic analyses increasingly suggest that AI-driven retinal screening can generate substantial cost savings by reducing dependence on specialist diagnostics, particularly in resource-limited settings21,22,23. However, prospective real-world deployment studies remain scarce. Our prospective evaluation demonstrated that Reti-Pioneer notably improved operational efficiency, with median time from image acquisition to report generation being substantially shorter than that of laboratory testing, or the FINDRISC questionnaire for T2DM. The Reti-Pioneer framework demonstrated notable computational efficiency compared to conventional models, and its deployment in a primary care setting confirmed clinical utility, with consistent real-world performance and maintained diagnostic accuracy. Although our prospective silent trial and clinical pilot provide crucial preliminary evidence of the framework’s operational stability, we acknowledge that broader clinical adoption will require overcoming several key barriers. First, regulatory approval pathways for AI-based medical devices remain complex and geographically heterogeneous24. Achieving certification will require large-scale, multicenter trials explicitly designed to satisfy regional standards for safety and efficacy25. Second, while the current study establishes predictive accuracy, generating robust evidence of patient-level impact remains essential. Future work should include long-term randomized controlled trials evaluating whether Reti-Pioneer deployment shortens time to diagnosis for endocrine and metabolic disorders, optimizes allocation of specialist resources and ultimately enhances long-term risk stratification and clinical outcomes.
While Reti-Pioneer’s generalist capabilities are well validated, several limitations merit further consideration. First, while the training datasets incorporated demographic diversity, the performance disparity observed in the external multi-ethnic SEED cohort underscores that further improvements in dataset balance and representativeness are imperative to enhance model generalizability across all populations. Second, while Reti-Pioneer shows potential in identifying endocrine and metabolic diseases from CFPs, particularly among individuals who do not routinely undergo blood-based screening, its current diagnostic and predictive accuracy remains below the threshold required for broad clinical adoption. Third, the longitudinal assessment in this study was conducted as a binary classification task at fixed time horizons rather than as a formal time-to-event analysis. Future studies should use survival-analytical methods to more rigorously model risk. Additionally, predicting chronic disease incidence is inherently limited by inaccuracies in defining disease onset from electronic health records; potential misclassification between cases and controls may affect the validity of longitudinal predictions. Fourth, although Reti-Pioneer is designed for multitask analysis of common endocrine and metabolic diseases, its current scope does not include rare conditions or severe, life-threatening diseases such as coronary artery disease, stroke or mortality. However, the flexible architecture of Reti-Pioneer enables the incorporation of additional disease targets through the introduction of specialized encoders and sample-efficient fine-tuning of task-specific decoders. Fifth, while we used analytical methods such as elastic net regularization and OPLS-DA to mitigate confounding and focus on disease-specific signals, residual confounding cannot be fully ruled out. 
Our results primarily support the biological interpretability of the AI model by demonstrating that its predictions serve as proxies for protein signatures, rather than establishing independent associations. Finally, although our prospective pilot study indicated Reti-Pioneer’s potential to improve risk stratification, the effectiveness of Reti-Pioneer-guided prevention strategies as an alternative to conventional systemic screening needs validation in larger randomized trials.
In conclusion, Reti-Pioneer represents a scalable and clinically translatable AI framework that leverages retinal imaging and cost-effective pre-trained models to stratify patient risk for endocrine and metabolic diseases. This framework holds potential for integration into real-world clinical workflows, offering a pathway to more efficient resource allocation and enhanced, evidence-based decision-making in diverse healthcare settings.
For the model training, we used CFPs from the UKB and three hospital-based centers in China to maximize ethnic and demographic heterogeneity in the training dataset (Extended Data Fig. 6). The UKB is an ongoing prospective cohort study with extensive phenotypic data26. In brief, CFPs of the macula were obtained at enrollment using the Topcon 3D OCT-1000 Mark II system without pupil dilation at the baseline visit between 2009 and 2010. Participants were randomly split into two groups in a 2:1 ratio from six assessment centers for retrospective diagnostic modeling and predictive modeling while minimizing potential center-specific biases. Participants with CFPs and complete data on disease outcomes, and relevant covariates, were included in the analysis.
The hospital-based cohort was retrospectively established using data from three clinical centers: Guangdong Provincial People’s Hospital, the Third Affiliated Hospital of Sun Yat-sen University and Linyi People’s Hospital. CFPs of the macula were captured using a variety of standard fundus cameras, including the Topcon TRC-NW6, Canon CR6-45NM and Kowa Nonmyd α-DIII with pupil dilation.
For the external test, we used datasets from distinct healthcare settings in China and the multi-ethnic cohort SEED. The resource-limited dataset was retrospectively assembled from three clinical sites in China: The First Affiliated Hospital of Xinjiang Medical University, Linzhi People’s Hospital of Tibet and the People’s Hospital of Guangxi Zhuang Autonomous Region. This dataset included 1,373 participants (2,746 images) with dilated pupils. The high-resource dataset comprised retrospectively collected data from physical examination centers (2,003 participants and 4,006 images) and large tertiary hospitals (587 participants and 1,174 images), captured predominantly under nondilated conditions. Additionally, we evaluated the model on the SEED study, a population-based cohort encompassing Chinese, Malay and Indian adults in Singapore, as described previously27. A total of 15,306 CFPs from 7,653 Singaporean adults (2,800 Chinese, 2,385 Malay and 2,468 Indian) were included in the external test set.
This study was conducted with approval from the Institutional Review Board of Guangdong Provincial People’s Hospital, Guangdong Academy of Medical Sciences (no. KY-N-2022-134). For the UKB data component, ethical approval was obtained through the NHS National Research Ethics Service (ref. no. 11/NW/0382; project ID: 86091). All participants provided written informed consent before study participation. To ensure confidentiality, all data were de-identified by removing all personal identifiers before analysis.
To define osteoporosis, gout and thyroid disease at baseline, we used self-reported information and inpatient records coded according to the International Classification of Diseases, 10th Revision (ICD-10), down to the three-character category in the ICD hierarchy (Supplementary Table 8).
For the diagnosis of T2DM, hypertension and hyperlipidemia at baseline, we considered blood markers and medication use in addition to self-reported information and ICD-10 codes. T2DM was diagnosed in patients who had physician-diagnosed diabetes mellitus, were using antihyperglycemic medications or insulin, or had a glycated hemoglobin level of ≥48 mmol mol−1 (ref. 28). Hypertension was defined by the use of antihypertensive medications, or an average systolic blood pressure of at least 130 mmHg or an average diastolic blood pressure of at least 80 mmHg (ref. 29). Hyperlipidemia was defined by the use of medications for hyperlipidemia or statins, or a total blood cholesterol level of ≥6.22 mmol l−1 (ref. 30).
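As a minimal sketch of how such baseline case definitions might be encoded, the following Python applies the T2DM and hyperlipidemia rules to a single participant record. Only the two thresholds are taken from the text; the record layout and all field names are hypothetical and do not correspond to UKB field identifiers, and missing values are not handled.

```python
from dataclasses import dataclass

HBA1C_CUTOFF_MMOL_MOL = 48.0      # glycated hemoglobin threshold stated in the text
TOTAL_CHOL_CUTOFF_MMOL_L = 6.22   # total cholesterol threshold stated in the text


@dataclass
class BaselineRecord:
    # Field names are illustrative assumptions, not UKB field identifiers.
    physician_diagnosed_diabetes: bool
    on_antihyperglycemic_or_insulin: bool
    hba1c_mmol_mol: float
    on_lipid_lowering_or_statin: bool
    total_cholesterol_mmol_l: float


def has_t2dm(r: BaselineRecord) -> bool:
    """Any single criterion is sufficient for a baseline T2DM label."""
    return (
        r.physician_diagnosed_diabetes
        or r.on_antihyperglycemic_or_insulin
        or r.hba1c_mmol_mol >= HBA1C_CUTOFF_MMOL_MOL
    )


def has_hyperlipidemia(r: BaselineRecord) -> bool:
    """Medication use or an elevated total cholesterol level."""
    return (
        r.on_lipid_lowering_or_statin
        or r.total_cholesterol_mmol_l >= TOTAL_CHOL_CUTOFF_MMOL_L
    )


# Hypothetical participant: no diagnosis or medication, HbA1c exactly at the cut-off.
example = BaselineRecord(False, False, 48.0, False, 5.1)
print(has_t2dm(example), has_hyperlipidemia(example))
```

Because the thresholds are inclusive (≥), a participant whose HbA1c sits exactly at 48 mmol mol−1 is labeled as a case in this sketch.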
For the prediction tasks, incident cases of individual diseases were identified using inpatient hospital records and mortality registers. Follow-up began on the date of attendance at the assessment center and continued until the earliest recorded date of diagnosis, the date of death or the last available date provided by the hospital or general practitioner, whichever occurred first. Individuals with prevalent disease at baseline were excluded. A sensitivity analysis was performed excluding individuals who developed incident disease within the first year of follow-up.
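The follow-up rule described above can be sketched as follows. This is an illustration under stated assumptions rather than the study's code: the function names are hypothetical, a last available record date is assumed to exist for every participant, and "within the first year" is approximated as fewer than 365 days after baseline.

```python
from datetime import date
from typing import Optional


def follow_up_end(diagnosis: Optional[date], death: Optional[date],
                  last_available: date) -> date:
    """Earliest of diagnosis, death and last available record date."""
    candidates = [d for d in (diagnosis, death, last_available) if d is not None]
    return min(candidates)


def is_incident_case(diagnosis: Optional[date], death: Optional[date],
                     last_available: date) -> bool:
    """A case counts only if the diagnosis is the event that ends follow-up."""
    return (diagnosis is not None
            and follow_up_end(diagnosis, death, last_available) == diagnosis)


def diagnosed_within_first_year(baseline: date, diagnosis: Optional[date]) -> bool:
    """Flag for the sensitivity analysis (365-day window is an assumption)."""
    return diagnosis is not None and (diagnosis - baseline).days < 365


baseline = date(2010, 1, 1)
print(follow_up_end(date(2015, 6, 1), None, date(2021, 12, 31)))
print(diagnosed_within_first_year(baseline, date(2010, 6, 1)))
```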
The overall architecture of the proposed Reti-Pioneer framework is presented in Extended Data Fig. 1. For feature extraction, Reti-Pioneer uses three large-scale pre-trained models (Swin Transformer, Vision Mamba and RETFound), which remain frozen during training11,12,31. To enable multimodal integration, two bilinear modules were used for feature fusion, integrating image quality and clinical characteristics, including age, sex, ethnicity and weight32,33.
Specifically, the Reti-Pioneer framework integrates a quality-aware module into retinal image analysis, enabling the retention and informed use of low-quality CFPs, such as those affected by cataract-induced obscuration, that may nonetheless reflect systemic endocrine status34. Rather than excluding these images, the framework leverages image quality metrics to guide model decisions during the early processing stages35. The quality-aware and clinical characteristic fusion modules each use a bilinear operation followed by scaled exponential linear units (SELUs) and a linear layer. The bilinear operation combines quality metrics (probability scores for the good, usable and bad classes) with F-dimensional deep features, generating fused features of identical dimension (F). This fusion is followed by a linear layer for disease probabilities. The SELU activation function is applied between intermediate layers to enhance training stability and maintain self-normalizing properties36. Both the SELU activation and the linear transformation maintain this dimensionality, ensuring feature representation consistency.
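To make the fusion step concrete, here is a minimal dependency-free sketch of a bilinear operation followed by SELU and a linear prediction head. It is not the released implementation: the class name `BilinearFusion`, the random initialization, the toy dimensions and the sigmoid output are all assumptions made for illustration. In PyTorch the bilinear step would typically be expressed with `torch.nn.Bilinear`.

```python
import math
import random

# Standard SELU constants (Klambauer et al.).
SELU_ALPHA = 1.6732632423543772
SELU_SCALE = 1.0507009873554805


def selu(x: float) -> float:
    return SELU_SCALE * (x if x > 0.0 else SELU_ALPHA * (math.exp(x) - 1.0))


class BilinearFusion:
    """Fuses an auxiliary vector (e.g. 3 quality scores) with F deep features."""

    def __init__(self, aux_dim: int, feat_dim: int, n_tasks: int, seed: int = 0):
        rng = random.Random(seed)
        self.aux_dim, self.feat_dim = aux_dim, feat_dim
        # weight[k][i][j]: one (aux_dim x feat_dim) matrix per output feature k.
        self.weight = [[[rng.gauss(0.0, 0.1) for _ in range(feat_dim)]
                        for _ in range(aux_dim)] for _ in range(feat_dim)]
        self.bias = [0.0] * feat_dim
        # Hypothetical linear head mapping fused features to per-disease logits.
        self.head_weight = [[rng.gauss(0.0, 0.1) for _ in range(feat_dim)]
                            for _ in range(n_tasks)]
        self.head_bias = [0.0] * n_tasks

    def fuse(self, aux, feat):
        """out[k] = SELU(sum_i sum_j aux[i] * W[k][i][j] * feat[j] + b[k])."""
        fused = []
        for k in range(self.feat_dim):
            z = self.bias[k]
            for i in range(self.aux_dim):
                for j in range(self.feat_dim):
                    z += aux[i] * self.weight[k][i][j] * feat[j]
            fused.append(selu(z))
        return fused  # same dimension F as the input features

    def predict(self, aux, feat):
        fused = self.fuse(aux, feat)
        logits = [b + sum(w * h for w, h in zip(row, fused))
                  for row, b in zip(self.head_weight, self.head_bias)]
        # Sigmoid per disease is an assumption (independent multi-label outputs).
        return [1.0 / (1.0 + math.exp(-z)) for z in logits]


module = BilinearFusion(aux_dim=3, feat_dim=8, n_tasks=6)
quality = [0.7, 0.2, 0.1]   # toy P(good), P(usable), P(bad)
features = [0.5] * 8        # toy stand-in for frozen-backbone features
fused = module.fuse(quality, features)
probs = module.predict(quality, features)
print(len(fused), len(probs))
```

The point of the sketch is the shape contract: the fused vector keeps dimension F, so the same head can be used whether the auxiliary input carries quality scores or clinical characteristics.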
Fine-tuning was applied to a diverse set of downstream tasks, including screening and prediction tasks. The generalizability of the Swin Transformer, Vision Mamba and RETFound models was systematically evaluated through independent external testing using bilateral CFPs combined with clinical characteristics. The unit of analysis for all model development and evaluation was the individual participant. For each participant, fundus images from both eyes were used as input, and the outputs of the three trained heads were integrated via a weighted soft voting ensemble within the model architecture to generate a single, unified prediction.
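A weighted soft voting ensemble of this kind can be sketched in a few lines. The backbone keys, the weights and the choice to average the two eyes equally are assumptions for illustration; the text does not specify how the weights were set or how eyes were combined.

```python
def weighted_soft_vote(probs_by_head: dict, weights: dict) -> float:
    """Weighted mean of per-head probabilities for one image."""
    total_weight = sum(weights[h] for h in probs_by_head)
    return sum(weights[h] * p for h, p in probs_by_head.items()) / total_weight


def participant_score(eye_probs: dict, weights: dict) -> float:
    """Combine both eyes into one participant-level probability.

    Averaging the eyes equally is an assumption made for this sketch.
    """
    per_eye = [weighted_soft_vote(p, weights) for p in eye_probs.values()]
    return sum(per_eye) / len(per_eye)


# Hypothetical head weights and per-eye probabilities for one disease.
weights = {"swin": 1.0, "vmamba": 1.0, "retfound": 2.0}
eye_probs = {
    "left": {"swin": 0.20, "vmamba": 0.40, "retfound": 0.60},
    "right": {"swin": 0.40, "vmamba": 0.60, "retfound": 0.80},
}
print(round(participant_score(eye_probs, weights), 3))
```

Soft voting averages probabilities rather than hard labels, so a confident head can outweigh two uncertain ones, and the final score stays on the probability scale.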
To quantify the contribution of Reti-Pioneer’s core components, we conducted a series of ablation studies on the internal validation set. The full Reti-Pioneer framework was compared against (1) an ablated version without the quality-aware fusion module, trained on all images, (2) the same ablated version trained exclusively on high-quality images and (3) unimodal baselines using either fundus images or clinical metadata alone. These comparisons were designed to isolate the performance gain attributable to the model’s full multimodal architecture.
To understand the retinal features associated with endocrine and metabolic disorders, we used two complementary approaches. Saliency mapping, followed by Gaussian filtering, was performed to identify and visualize spatially resolved regions of interest within CFPs that contributed most to the model’s diagnostic and predictive decisions. We used the integrated gradients algorithm to produce visual explanations of model behavior, attributing prediction relevance to individual pixels37,38. The saliency maps highlight the relative contribution of each pixel to the final predictions, thereby enhancing model interpretability.
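For readers unfamiliar with integrated gradients, the sketch below computes attributions for a toy logistic model over four "pixels" using a midpoint Riemann sum along the straight path from a baseline to the input. The toy model, its weights, the all-zero baseline and the step count are assumptions; the Gaussian smoothing step and the actual network are not reproduced.

```python
import math

# Toy differentiable "model": logistic regression over four pixel values.
WEIGHTS = [2.0, -1.0, 0.5, 0.0]


def model(x):
    z = sum(w * xi for w, xi in zip(WEIGHTS, x))
    return 1.0 / (1.0 + math.exp(-z))


def model_gradient(x):
    s = model(x)
    return [s * (1.0 - s) * w for w in WEIGHTS]


def integrated_gradients(grad_fn, x, baseline, steps=64):
    """IG_i = (x_i - b_i) * mean over alpha of dF/dx_i at b + alpha * (x - b)."""
    n = len(x)
    grad_sum = [0.0] * n
    for s in range(steps):
        alpha = (s + 0.5) / steps  # midpoint rule on [0, 1]
        point = [b + alpha * (xi - b) for xi, b in zip(x, baseline)]
        for i, g in enumerate(grad_fn(point)):
            grad_sum[i] += g
    return [(xi - b) * g / steps for xi, b, g in zip(x, baseline, grad_sum)]


x = [1.0, 1.0, 1.0, 1.0]
baseline = [0.0, 0.0, 0.0, 0.0]
attributions = integrated_gradients(model_gradient, x, baseline)
# Completeness: attributions should sum to roughly F(x) - F(baseline).
print(sum(attributions), model(x) - model(baseline))
```

The completeness property checked in the last line is a convenient sanity test: if the two numbers diverge, the number of integration steps is usually too small.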
To investigate the biological basis of disease predictions, we leveraged plasma proteomic profiling data from UKB participants. Proteins were quantified using the Olink Explore 3072 platform, which integrates cardiometabolic, inflammation, neurology and oncology panels, capturing 2,923 unique proteins. Expression levels are reported as Normalized Protein eXpression on a log2 scale. After quality control, which excluded proteins with >30% missing data (GLIPR1, NPM1 and PCOLCE) and participants with >50% missing data, 2,920 proteins were retained for analysis. Further details on sample processing and quality control are described elsewhere39. To identify disease-specific protein signatures associated with metabolic and endocrine diseases, we used a two-step analytical approach. First, elastic net regularization was used for retinal latent feature selection, enabling dimensionality reduction and identification of key predictors from the Reti-Pioneer framework. Second, we prioritized candidate proteins based on their coefficients to refine the selection of biologically relevant biomarkers. OPLS-DA was performed on the variational autoencoder-derived retinal latent features to assess the separation between the control and case groups. The predictive component captures the systematic variance in the data that is directly correlated with inter-group differences. We used logistic regression to test the association of predictive components with the prevalence of the corresponding diseases. Pearson correlation analysis was used to quantify the linear relationships between individual protein levels (continuous) and the corresponding predictive components. The association between retinal latent features derived from OPLS-DA and the top-ranked plasma proteins across the six endocrine and metabolic diseases was further evaluated using logistic regression. Both unadjusted models and models adjusted for age, sex, ethnicity, body mass index, image quality and assessment center are presented.
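As a small illustration of the correlation step, the function below computes a Pearson coefficient between a protein's expression values and a predictive component score. The variable names and toy values are hypothetical; the elastic net and OPLS-DA steps are not reproduced here.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0.0 or syy == 0.0:
        raise ValueError("correlation is undefined for a constant input")
    return sxy / math.sqrt(sxx * syy)


# Toy values: protein expression (NPX, log2 scale) vs. predictive component score.
protein_npx = [0.1, 0.4, 0.9, 1.3, 1.8]
component_score = [-1.2, -0.5, 0.1, 0.9, 1.4]
print(round(pearson_r(protein_npx, component_score), 3))
```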
Similarly, we assessed the association between OPLS-DA-derived retinal latent features and disease-relevant PRS using logistic regression. Standard PRS were obtained from the UKB, including scores for T2DM, HbA1c, hypertension, high-density and low-density lipoprotein cholesterol, total cholesterol, triglycerides, osteoporosis and estimated bone mineral density T-score. Both unadjusted models and models adjusted for age, sex, ethnicity, body mass index, image quality, assessment center and the top four ancestry principal components are presented. For multiple comparisons between retinal latent features and top-ranked plasma proteins or PRS, a false discovery rate correction was applied.
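The text does not name the specific false discovery rate procedure. Assuming the commonly used Benjamini-Hochberg step-up method, a minimal sketch of the adjustment looks like this (the p values are invented for illustration):

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p values in the original order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])  # indices by ascending p
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


raw_p = [0.01, 0.04, 0.03, 0.005]
print(benjamini_hochberg(raw_p))
```

Each adjusted value is the smallest false discovery rate at which that comparison would be declared significant, so it can be compared directly against a chosen threshold such as 0.05.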
The prospective silent trial was conducted between September 2025 and October 2025 to evaluate the technical robustness and seamless integration of the Reti-Pioneer framework within real-world clinical workflows under blinded conditions. Figure 3a illustrates the deployed clinical pipeline for real-time multimorbidity detection. During the trial, the system routinely processed 30–50 cases per day from patients referred for endocrine, metabolic and hematological testing in a primary care setting. The screening performance of Reti-Pioneer for six endocrine and metabolic diseases was evaluated against a reference standard incorporating laboratory measurements, physical examination findings and self-reported history, and was compared with the FINDRISC questionnaire for T2DM. In the clinical workflow, bilateral fundus images from all eligible participants were systematically captured at the point of care, irrespective of operator-related or environmental variations in image quality, and transmitted in real time to a centralized GPU computing infrastructure for immediate analysis without disrupting routine clinical procedures. AI-generated predictions were not disclosed to clinicians or patients and did not influence clinical decision-making, maintaining the trial’s blinded nature.
Key operational metrics were automatically recorded throughout the trial, including image acquisition success rate, AI model inference success rate and system throughput efficiency (defined as total time from image capture to report generation). These data enabled direct comparison between the Reti-Pioneer-assisted screening pipeline and the current standard workflow based on laboratory testing.
A prospective pilot study was conducted between September 2025 and November 2025 at a community health service center and a physical examination center to evaluate the real-world impact of the Reti-Pioneer AI system on clinical practice, patient behavior and clinician-patient perceptions in an open-label setting. Participants undergoing routine physical examinations were recruited and offered the Reti-Pioneer AI test by their PCPs. CFPs and clinical metadata were collected from all participants, alongside comprehensive laboratory-based physical examinations. AI-generated predictions were provided to both clinicians and patients as part of the clinical information. Diagnostic consistency was assessed by calculating the sensitivity, specificity and accuracy of the model’s outputs against a composite diagnostic standard incorporating laboratory results and self-reported medical history.
Satisfaction and acceptance among participants and clinicians were evaluated using structured questionnaires. Participants rated their experience on a 5-point Likert scale across multiple domains, including ease of use, report comprehensibility, clarity of management recommendations, overall satisfaction, willingness to reuse and recommend the service, perceived efficiency of multidisease screening, information load, perceived convenience compared to traditional testing and willingness to pay (Supplementary Table 9).
Clinicians completed a separate acceptance questionnaire, also based on a 5-point Likert scale, assessing overall satisfaction, willingness to integrate the system into practice, motivation to conduct multidisease screening, workflow compatibility, perceived decision-support value, trust in AI predictions and suitability for resource-limited settings. The complete questionnaire items are detailed in Supplementary Table 10.
Descriptive statistics are reported as the mean ± s.d. for continuous variables and as frequency (%) for categorical variables. Discriminative performance was evaluated using AUROCs. Comparisons between AUROCs were conducted using DeLong’s test40. Calibration was assessed using calibration plots (observed versus predicted risk by decile). Overall performance was evaluated using the Brier score. Sensitivity, specificity, PPV and NPV for classification are reported with 95% CIs. CIs were obtained using the percentile bootstrap method with 1,000 bootstrap samples. To evaluate clinical utility across a range of risk thresholds, decision curve analysis was performed41,42.
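The metrics named above can be illustrated with a short dependency-free sketch: AUROC via pairwise ranking, the Brier score, net benefit at a single threshold (the quantity plotted in a decision curve) and a percentile bootstrap CI. The toy labels and scores, the function names, the fixed seed and the decision to redraw resamples that contain only one class are assumptions; DeLong’s test is not reproduced.

```python
import random


def auroc(y_true, y_score):
    """Probability that a random case scores higher than a random control."""
    pos = [s for y, s in zip(y_true, y_score) if y == 1]
    neg = [s for y, s in zip(y_true, y_score) if y == 0]
    if not pos or not neg:
        raise ValueError("both classes are required")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def brier_score(y_true, y_prob):
    return sum((p - y) ** 2 for y, p in zip(y_true, y_prob)) / len(y_true)


def net_benefit(y_true, y_prob, threshold):
    """Net benefit = TP/n - FP/n * pt / (1 - pt) at threshold probability pt."""
    n = len(y_true)
    tp = sum(1 for y, p in zip(y_true, y_prob) if p >= threshold and y == 1)
    fp = sum(1 for y, p in zip(y_true, y_prob) if p >= threshold and y == 0)
    return tp / n - fp / n * threshold / (1.0 - threshold)


def percentile_bootstrap_ci(metric, y_true, y_score, n_boot=1000, alpha=0.05, seed=0):
    """Percentile CI from n_boot resamples drawn with replacement."""
    if len(set(y_true)) < 2:
        raise ValueError("both classes are required")
    rng = random.Random(seed)
    n = len(y_true)
    stats = []
    while len(stats) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]
        yt = [y_true[i] for i in idx]
        if len(set(yt)) < 2:
            continue  # assumption: redraw resamples missing one class
        stats.append(metric(yt, [y_score[i] for i in idx]))
    stats.sort()
    lower_idx = int(n_boot * alpha / 2)
    return stats[lower_idx], stats[n_boot - 1 - lower_idx]


y = [0, 0, 1, 1, 0, 1, 0, 1]
scores = [0.10, 0.40, 0.35, 0.80, 0.20, 0.70, 0.30, 0.90]
print(auroc(y, scores), brier_score(y, scores))
print(percentile_bootstrap_ci(auroc, y, scores))
```

With samples this small the interval is very wide; the sketch is meant to show the mechanics, not to suggest a usable sample size.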
Data analyses and visualizations were conducted in Python (v.3.12) and R (v.4.2.0). Reti-Pioneer was implemented in PyTorch (v.2.8.0) and optimized using the AdamW optimizer with a warm-up strategy and a cosine annealing learning rate scheduler. The construction, fine-tuning and testing processes were performed on an RTX 3080 GPU (10 GB dedicated memory, CUDA v.12.8).
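The learning rate schedule is described only at a high level. As a hedged sketch, a linear warm-up followed by cosine annealing can be written as a pure function of the step index; the step counts, base learning rate and linear warm-up shape below are hypothetical, since the text does not report them. In PyTorch, a function of this shape (normalized by the base rate) could be passed to `torch.optim.lr_scheduler.LambdaLR` alongside `torch.optim.AdamW`.

```python
import math


def lr_at_step(step, total_steps, warmup_steps, base_lr, min_lr=0.0):
    """Linear warm-up to base_lr, then cosine annealing down to min_lr."""
    if step < warmup_steps:
        return base_lr * (step + 1) / warmup_steps
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return min_lr + 0.5 * (base_lr - min_lr) * (1.0 + math.cos(math.pi * progress))


# Hypothetical settings: 100 steps in total, 10 of them warm-up, peak LR 1e-3.
schedule = [lr_at_step(s, 100, 10, 1e-3) for s in range(101)]
print(schedule[0], schedule[10], schedule[100])
```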
Further information on research design is available in the Nature Portfolio Reporting Summary linked to this article.
Individual-level patient data are not publicly available but can be accessed with the approval of the Data Management Committee of each participating institution. Requests for access to de-identified individual-level data from China can be submitted via email to H. Yu (yuhonghua@stu.edu.cn) with detailed proposals for approval. SEED data can be made available to researchers who meet the criteria for access to confidential data and upon institutional review board approval; requests for access can be made to C.-Y. Cheng (chingyu.cheng@nus.edu.sg). Investigators who consent to the terms of the data transfer agreement, including, but not limited to, the use of these data only for academic purposes, and to protect the confidentiality of the data and limit the possibility of identification of participants, will be granted access. The data from the UKB can be obtained via controlled access on their web portal (www.ukbiobank.ac.uk/).
The code used in the current study to develop the algorithm is provided via GitHub at https://github.com/lyhyl/Reti-Pioneer.
Khosla, S., Farr, J. N., Tchkonia, T. & Kirkland, J. L. The role of cellular senescence in ageing and endocrine disease. Nat. Rev. Endocrinol. 16, 263–275 (2020).
Davidson, K. W. et al. Screening for prediabetes and type 2 diabetes: US Preventive Services Task Force recommendation statement. JAMA 326, 736–743 (2021).
Sacks, D. B. et al. Guidelines and recommendations for laboratory analysis in the diagnosis and management of diabetes mellitus. Diabetes Care 46, e151–e199 (2023).
Zhu, Z. et al. Oculomics: current concepts and evidence. Prog. Retin. Eye Res. 106, 101350 (2025).
Liu, S., Chen, R., Hu, W. & Zhu, Z. Ocular ageing biomarkers and their clinical utility: a review. Vis. Neurosci. 42, e009 (2025).
Kashani, A. H. et al. Past, present and future role of retinal imaging in neurodegenerative disease. Prog. Retin. Eye Res. 83, 100938 (2021).
Avery, C. L. et al. Impact of long-term measures of glucose and blood pressure on the retinal microvasculature. Atherosclerosis 225, 412–417 (2012).
Heydari, K., Enichen, E. J., Li, B. & Kvedar, J. C. Leveraging retinal vascular features in non-invasive, early diagnosis of preeclampsia. NPJ Digit. Med. 8, 422 (2025).
Elsamkary, M. A., El-Shazly, A. A. E., Badran, T. A. F., Fouad, Y. A. & Abdelgawad, R. H. A. Optical coherence tomography and electrophysiological analysis of proptotic eyes due to thyroid-associated ophthalmopathy. Int. Ophthalmol. 43, 2057–2064 (2023).
Zhang, K. et al. Deep-learning models for the detection and incidence prediction of chronic kidney disease and type 2 diabetes from retinal fundus images. Nat. Biomed. Eng. 5, 533–545 (2021).
Zhu, L. et al. Vision Mamba: efficient visual representation learning with bidirectional state space model. In Proc. 41st International Conference on Machine Learning 62429–62442 (2024).
Zhou, Y. et al. A foundation model for generalizable disease detection from retinal images. Nature 622, 156–163 (2023).
Wang, M. et al. A clinician-friendly platform for ophthalmic image analysis without technical barriers. Preprint at https://arxiv.org/abs/2504.15928 (2025).
Ghorbian, M., Ghorbian, S. & Ghobaei-Arani, M. AI-driven techniques for detection and mitigation of SARS-CoV-2 spread: a review, taxonomy, and trends. Clin. Exp. Med. 25, 204 (2025).
Chan, T.-Y., Wang, J.-H., Chen, N. & Chiu, C.-J. The assessment of retinal image quality using a non-mydriatic fundus camera in a teleophthalmologic platform. Diagnostics 14, 1543 (2024).
Van Der Vegt, A., Campbell, V. & Zuccon, G. Why clinical artificial intelligence is (almost) non-existent in Australian hospitals and how to fix it. Med. J. Aust. 220, 172–175 (2024).
Zhu, W. et al. Optimal transport guided unsupervised learning for enhancing low-quality retinal images. In Proc. IEEE International Symposium on Biomedical Imaging (IEEE, 2023); https://doi.org/10.1109/ISBI53787.2023.10230719.
Lu, X. et al. Type 2 diabetes mellitus in adults: pathogenesis, prevention and therapy. Signal Transduct. Target. Ther. 9, 262 (2024).
Wright, W. S., Eshaq, R. S., Lee, M., Kaur, G. & Harris, N. R. Retinal physiology and circulation: effect of diabetes. Compr. Physiol. 10, 933–974 (2020).
Kalaw, F. G. P., Sharma, P., Kako, R. N., Walker, E. & Borooah, S. Peripheral retinal vessel whitening in patients with diabetes mellitus. Sci. Rep. 13, 7981 (2023).
Wang, Y. et al. Economic evaluation for medical artificial intelligence: accuracy vs. cost-effectiveness in a diabetic retinopathy screening case. NPJ Digit. Med. 7, 43 (2024).
Lin, S. et al. Artificial intelligence in community-based diabetic retinopathy telemedicine screening in urban China: cost-effectiveness and cost-utility analyses with real-world data. JMIR Public Health Surveill. 9, e41624 (2023).
Xie, Y. et al. Artificial intelligence for teleophthalmology-based diabetic retinopathy screening in a national programme: an economic analysis modelling study. Lancet Digit. Health 2, e240–e249 (2020).
Twala, B. AI-driven precision diagnosis and treatment in Parkinson’s disease: a comprehensive review and experimental analysis. Front. Aging Neurosci. 17, 1638340 (2025).
Podină, N. et al. Artificial intelligence in pancreatic imaging: a systematic review. United European Gastroenterol. J. 13, 55–77 (2025).
Palmer, L. J. UK Biobank: bank on it. Lancet 369, 1980–1982 (2007).
Majithia, S. et al. Cohort profile: the Singapore Epidemiology of Eye Diseases study (SEED). Int. J. Epidemiol. 50, 41–52 (2021).
Eastwood, S. V. et al. Algorithms for the capture and adjudication of prevalent and incident diabetes in UK Biobank. PLoS ONE 11, e0162388 (2016).
Whelton, P. K. et al. 2017 ACC/AHA/AAPA/ABC/ACPM/AGS/APhA/ASH/ASPC/NMA/PCNA guideline for the prevention, detection, evaluation, and management of high blood pressure in adults: executive summary: a report of the American College of Cardiology/American Heart Association Task Force on Clinical Practice Guidelines. Hypertension 71, 1269–1324 (2018).
Opoku, S. et al. Awareness, treatment, control, and determinants of dyslipidemia among adults in China. Sci. Rep. 11, 10056 (2021).
Liu, Z. et al. Swin Transformer V2: scaling up capacity and resolution. In Proc. IEEE/CVF Conference on Computer Vision and Pattern Recognition https://doi.org/10.1109/CVPR52688.2022.01170 (2022).
Liang, Y. et al. HRadNet: a hierarchical radiomics-based network for multicenter breast cancer molecular subtypes prediction. IEEE Trans. Med. Imaging 43, 1225–1236 (2024).
Chen, R. J. et al. Pathomic Fusion: an integrated framework for fusing histopathology and genomic features for cancer diagnosis and prognosis. IEEE Trans. Med. Imaging 41, 757–770 (2022).
Esmaeilkhanian, H., Gutierrez, K. G., Myung, D. & Fisher, A. C. et al. Detection rate of diabetic retinopathy before and after implementation of autonomous AI-based fundus photograph analysis in a resource-limited area in Belize. Clin. Ophthalmol. 19, 993–1006 (2025).
Fu, H. et al. Evaluation of retinal image quality assessment networks in different color-spaces. In Proc. Medical Image Computing and Computer-Assisted Intervention https://doi.org/10.1007/978-3-030-32239-7_6 (2019).
Klambauer, G., Unterthiner, T., Mayr, A. & Hochreiter, S. Self-normalizing neural networks. In Proc. 31st International Conference on Neural Information Processing Systems 972–981 (2017).
Ikram, A. & Imran, A. ResViT FusionNet Model: an explainable AI-driven approach for automated grading of diabetic retinopathy in retinal images. Comput. Biol. Med. 186, 109656 (2025).
Selvaraju, R. R. et al. Grad-CAM: visual explanations from deep networks via gradient-based localization. Int. J. Comput. Vis. 128, 336–359 (2020).
Sun, B. B. et al. Plasma proteomic associations with genetics and health in the UK Biobank. Nature 622, 329–338 (2023).
DeLong, E. R., DeLong, D. M. & Clarke-Pearson, D. L. Comparing the areas under two or more correlated receiver operating characteristic curves: a nonparametric approach. Biometrics 44, 837–845 (1988).
Kerr, K. F. et al. Assessing the clinical impact of risk prediction models with decision curves: guidance for correct interpretation and appropriate use. J. Clin. Oncol. 34, 2534–2540 (2016).
Vickers, A. J. & Elkin, E. B. Decision curve analysis: a novel method for evaluating prediction models. Med. Decis. Making 26, 565–574 (2006).
We thank all the study investigators and participants. This research has been conducted using the UKB Resource under application no. 86091. The computational resources in this study are supported by the South China University of Technology. We thank Y. Qiu and S. Liu for web platform development and technical support, and W. Chen for community screening coordination. H.Y. is supported by the National Natural Science Foundation of China (no. U24A20707), Guangdong Basic and Applied Basic Research Foundation (no. 2023B1515120028) and the Brolucizumab Efficacy and Safety Single-Arm Descriptive Trial in Patients with Persistent Diabetic Macular Edema (2024-29). W.W.Y.N. is supported by the National Natural Science Foundation of China (nos. 62476100 and U24A20322). C.-Y.C. is supported by the National Medical Research Council of Singapore (nos. MOH-001283-00 and MOH-001477-00). Z.Z. is supported by a National Health and Medical Research Council Investigator Grant (nos. APP2010072 and APP2041559). B.S. is supported by the National Natural Science Foundation of China (no. T2525004). X.Y. is supported by the National Natural Science Foundation of China (no. 82271125) and Zhongshan Social Welfare Science and Technology Research Project (no. 2023B3009). X.Z. is supported by the National Natural Science Foundation of China (no. 82301260). J.C. is supported by the GDPH Supporting Fund (no. KY012026082) and GDPH Postdoctoral Supporting Fund (no. BY012025033). The funders/sponsors had no role in the design or conduct of the study.
These authors contributed equally: Xiayin Zhang, Qinyi Li, Yinhao Liang, Chunran Lai.
These authors jointly supervised this work: Yih Chung Tham, Yukun Zhou, Carol Y. Cheung, Xiaohong Yang, Bin Sheng, Zhuoting Zhu, Ching-Yu Cheng, Wing W. Y. Ng, Honghua Yu.
Guangdong Provincial People’s Hospital (Guangdong Academy of Medical Sciences), Southern Medical University, Guangzhou, China
Xiayin Zhang, Qinyi Li, Yinhao Liang, Chunran Lai, Jiahui Cao, Shan Wang, Ying Fang, Kaiyi Chi, Miao Lin & Xiaohong Yang
Singapore Eye Research Institute, Singapore National Eye Centre, Singapore, Singapore
Xiayin Zhang, Yangqin Feng, Yih Chung Tham & Ching-Yu Cheng
School of Computer Science and Engineering, South China University of Technology, Guangzhou, China
Yinhao Liang & Wing W. Y. Ng
Centre for Eye Research Australia, Melbourne, Victoria, Australia
Wenyi Hu, Li Li & Zhuoting Zhu
Department of Surgery (Ophthalmology), University of Melbourne, Melbourne, Victoria, Australia
Wenyi Hu, Li Li & Zhuoting Zhu
Department of Ophthalmology and Visual Sciences, The Chinese University of Hong Kong, Hong Kong, China
Hongyang Jiang & Carol Y. Cheung
Department of Neurology, The Third Affiliated Hospital, Sun Yat-sen University, Guangzhou, China
Chunxin Liu
Department of Ophthalmology, Linyi People’s Hospital, Linyi, China
Feng Zhang
Department of Ophthalmology, Linzhi People’s Hospital, Linzhi, China
Cuomu Duojie
Department of Ophthalmology, The First Affiliated Hospital of Xinjiang Medical University, Urumqi, China
Lumei Hu
Institute of Ophthalmic Diseases, Guangxi Academy of Medical Sciences; Department of Ophthalmology, The People’s Hospital of Guangxi Zhuang Autonomous Region; Guangxi Key Laboratory of Eye Health; Guangxi Health Commission Key Laboratory of Ophthalmology and Related Systemic Diseases Artificial Intelligence Screening Technology, Guangxi, China
Fan Xu
Department of Ophthalmology, Fuzhou University Affiliated Provincial Hospital, Fuzhou, China
Li Li
Department of Ophthalmology, Yong Loo Lin School of Medicine, National University of Singapore, Singapore, Singapore
Yih Chung Tham & Ching-Yu Cheng
Centre for Innovation and Precision Eye Health, Yong Loo Lin School of Medicine, National University of Singapore, Singapore, Singapore
Yih Chung Tham & Ching-Yu Cheng
Ophthalmology and Visual Science Academic Clinical Program (Eye ACP), Duke-NUS Medical School, Singapore, Singapore
Yih Chung Tham
Institute of Ophthalmology, University College London, London, UK
Yukun Zhou
NIHR Biomedical Research Centre at Moorfields Eye Hospital NHS Foundation Trust, London, UK
Yukun Zhou
School of Computer Science, Shanghai Jiao Tong University, Shanghai, China
Bin Sheng
MoE Key Lab of Artificial Intelligence, AI Institute, Shanghai Jiao Tong University, Shanghai, China
Bin Sheng
Joint Shantou International Eye Center of Shantou University and The Chinese University of Hong Kong, Shantou, Guangdong, China; Guangdong Provincial People’s Hospital (Guangdong Academy of Medical Sciences), Southern Medical University, Guangzhou, China
Honghua Yu
H.Y., X.Z., W.W.Y.N. and Q.L. conceived and supervised the project. Y.L. designed the deep learning algorithm and the computational framework. X.Z., Q.L., Y.L., C. Lai, J.C., Y.F. and C.Y.C. designed the study and contributed to the initial drafting of the paper. Q.L., C. Lai, Y.F., C. Liu, F.Z., S.W., Y.F., C.D., L.H., F.X. and K.C. collected and organized the data, and participated in the external tests. C. Lai and Q.L. conducted the data collection and analysis in the prospective trial. C.-Y.C., Z.Z., B.S., X.Y., C.Y.C., Y.Z., Y.C.T., W.H., H.J., M.L. and L.L. contributed to the collaboration and provided critical revision of the manuscript for important intellectual content. All authors provided critical comments and reviewed the manuscript. All authors discussed the results and approved the final version before submission.
Correspondence to Zhuoting Zhu, Ching-Yu Cheng, Wing W. Y. Ng or Honghua Yu.
C.-Y.C. is the co-founder of Eye.AI and a consultant of MediWhale. The other authors declare no competing interests.
Nature Medicine thanks Paul Franks, Alexandra Miere and the other, anonymous, reviewer(s) for their contribution to the peer review of this work. Primary Handling Editors: Mattia Andreoletti and Lorenzo Righetto in collaboration with the Nature Medicine team.
Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
a, The framework integrates bilateral fundus images with clinical metadata. Each image is processed through a quality-aware module, then fed into three frozen pre-trained backbones (Swin Transformer, Vision Mamba, and RETFound), each followed by a trainable prediction head. Individual-level predictions from both eyes and three architectures are aggregated via a weighted soft voting ensemble to generate final disease risk scores. Clinical metadata (age, sex, ethnicity, and weight) are fused via bilinear modules to enhance multimodal integration. b, The structural diagrams of quality-aware module and three frozen pre-trained backbones (Swin Transformer, Vision Mamba, and RETFound). Figure created in BioRender. Yu, H. https://biorender.com/vynpw0k (2026).
a,b, Calibration plots (a) and decision curve analysis (b) of the Reti-Pioneer framework across diverse external test datasets. Datasets are categorized by geographic region and healthcare resource level: resource-limited (pooled data from Tibet, Xinjiang, and Guangxi, China), high-resource A (physical examination centers in Guangdong, China), high-resource B (tertiary hospitals in Guangdong, China), and the multi-ethnic Singapore Epidemiology of Eye Diseases (SEED) cohort. In a, the Brier score and alignment with the 45-degree dashed line indicate model calibration. In b, decision curves illustrate the net clinical benefit of Reti-Pioneer across a range of threshold probabilities.
a–f, Receiver operating characteristic (ROC) curves and decision curve analysis (DCA) for the incident prediction of type 2 diabetes mellitus (T2DM; a), hypertension (b), hyperlipidemia (c), gout (d), osteoporosis (e), and thyroid disease (f). Performance is evaluated at 5-year and 10-year horizons within a prospective subset of the UK Biobank. The ROC curves (left) illustrate sensitivity and specificity, while DCA (right) demonstrates clinical net benefit for each disease.
Scatter plots showing Pearson correlations between orthogonal projections to latent structures-discriminant analysis (OPLS-DA)-derived predictive components of retinal latent features and expression levels of top-ranked disease-specific proteins. Each point represents one participant (n = 6,273). Solid lines represent linear regressions, and shaded areas indicate the 95% confidence bands. All statistical tests were two-sided. T2DM, type 2 diabetes mellitus.
Bars represent the area under the receiver operating characteristic curve (AUROC) values across three diagnostic standards; error bars represent the 95% confidence intervals (CIs). The sample size for each disease category was n = 1,017 independent participants. Total diagnoses were confirmed by a combination of self-report and blood tests, except for hypertension (based on self-report and blood pressure measurements) and osteoporosis (based on self-report and a 1-minute screening questionnaire).
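As a rough illustration of how an AUROC and its 95% CI can be obtained, the sketch below uses the pairwise (Mann-Whitney) definition of AUROC and a percentile bootstrap. The caption does not state which CI method the authors used, so the bootstrap is an assumption, and the data and function names are hypothetical.

```python
import random
from typing import List, Sequence, Tuple


def auroc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """Probability that a random positive outranks a random negative (ties count 0.5)."""
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for a in pos:
        for b in neg:
            if a > b:
                wins += 1.0
            elif a == b:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def bootstrap_ci(y_true: Sequence[int], scores: Sequence[float],
                 n_boot: int = 200, alpha: float = 0.05,
                 seed: int = 0) -> Tuple[float, float]:
    """Percentile bootstrap CI for AUROC (assumed method, not the authors')."""
    rng = random.Random(seed)
    n = len(y_true)
    stats: List[float] = []
    while len(stats) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]
        ys = [y_true[i] for i in idx]
        # Skip resamples that contain only one class; AUROC is undefined there.
        if 0 < sum(ys) < n:
            stats.append(auroc(ys, [scores[i] for i in idx]))
    stats.sort()
    lo = stats[int((alpha / 2) * n_boot)]
    hi = stats[int((1 - alpha / 2) * n_boot) - 1]
    return lo, hi


# Toy data (illustrative only).
y = [1, 0, 1, 0, 1, 0]
s = [0.9, 0.2, 0.6, 0.7, 0.8, 0.3]

auc = auroc(y, s)
lo, hi = bootstrap_ci(y, s)
```

The pairwise loop is quadratic in sample size; for cohorts of the size reported here a rank-based implementation would be preferable.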
Flowchart illustrating the selection of color fundus photographs (CFPs) used for model training and evaluation. To maximize ethnic and demographic heterogeneity, training data were pooled from the UK Biobank and three hospital-based centers in China. Out of the total UK Biobank cohort, a subset of 31,408 images was strictly held out as a prospective test set to evaluate the longitudinal predictive performance of Reti-Pioneer.
Supplementary Tables 1–10 and Figs. 1 and 2.
Open Access This article is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License, which permits any non-commercial use, sharing, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if you modified the licensed material. You do not have permission under this licence to share adapted material derived from this article or parts of it. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by-nc-nd/4.0/.
Zhang, X., Li, Q., Liang, Y. et al. AI framework for multidisease detection via retinal imaging. Nat Med (2026). https://doi.org/10.1038/s41591-026-04359-w
DOI: https://doi.org/10.1038/s41591-026-04359-w
Nature Medicine (Nat Med)
ISSN 1546-170X (online)
ISSN 1078-8956 (print)
© 2026 Springer Nature Limited