Scientific Reports volume 15, Article number: 19811 (2025)
This study assessed the efficacy of various diagnostic indicators and machine learning (ML) models in predicting childhood myopia. A total of 2,365 children aged 5–12 years were included. The participants underwent non-cycloplegic and cycloplegic refraction tests, along with ocular biometric assessments. Cycloplegia was induced using 0.5% tropicamide eye drops, followed by cycloplegic refraction testing. Myopia prevalence was 11.2% (95% confidence interval [CI]: 9.9–12.5%). The spherical equivalent (SE) before and after cycloplegia varied with age, differing significantly, by 0.5 D, in children < 10 years (P < 0.05). The most effective single-indicator screening methods were the axial length/corneal curvature radius ratio (AL/CCR) and screening myopia, with areas under the curve (AUC) of 0.919 (95% CI: 0.899 to 0.939) and 0.911 (95% CI: 0.890 to 0.932), respectively. Among the multi-indicator joint diagnostic models, the best model using non-cycloplegic SE, uncorrected distance visual acuity (UCDVA), AL, and age was the Extreme Gradient Boosting model, with an AUC of 0.983 and an accuracy of 0.970. The best model using non-cycloplegic SE, AL/CCR, UCDVA, and age was the Random Forest model, with an AUC of 0.981 and an accuracy of 0.975. The AL/CCR demonstrated superior performance in predicting childhood myopia. The ML-based multi-indicator joint diagnostic model enhances the accuracy of childhood myopia diagnosis, screening, and intervention.
Myopia is a major public health concern1. Its rapidly increasing global prevalence and incidence have garnered considerable attention from the international community2. Its prevalence rose from 28.3% in 2010 to 34% in 2020, an approximate 20% increase from the baseline. By 2050, an estimated 4.758 billion people will be affected, with 938 million at risk of high myopia, representing 49.8% and 9.8% of the global population, respectively3. Myopia adversely affects quality of life, leading to substantial medical expenses and productivity losses amounting to hundreds of billions of dollars annually. Without intervention, these costs will continue to rise4.
The earlier the onset of myopia, the higher the risk of developing high myopia and associated ocular complications5. Therefore, the early stages of myopia in children are critical for prevention and control. Effective management of myopia in young children can significantly delay its onset.
Traditional large-scale myopia surveys often rely on non-cycloplegic refractive data, which may misrepresent the refractive status of young children and obscure risk factors1,6. Cycloplegic refraction, though the diagnostic gold standard7, is costly, invasive, and difficult to administer in children, limiting its use in population-wide studies8.
Exploring non-invasive, high-efficacy screening methods for myopia in children is essential. Some experts consider axial length (AL) and corneal curvature radius (CCR) as indicators of the growth and development of children and adolescents9,10. Their measurements are not affected by the accommodative state and provide objective and accurate data without cycloplegia. Ocular biometric parameters are important indicators for assessing the hyperopia reserve in children and objectively evaluating the development of refraction11.
Traditional linear or logistic regression studies12,13 have predominantly focused on the screening efficacy of single indicators14 and cannot handle the nonlinear features of the axial length-to-corneal curvature ratio or interactions between features9. These approaches overlook the synergistic effects of combined indices, which limits the construction of highly reliable predictive models and hinders the development of effective myopia prevention strategies15. Machine learning (ML) studies16 have enhanced myopia prediction efficacy17 by capturing high-order nonlinear relationships between covariates and outcomes, achieving higher accuracy and constructing sophisticated models that better adapt to data characteristics18. However, traditional ML approaches still face critical limitations. These models often rely on single algorithms19 that struggle to accommodate diverse data features, while their lack of interpretability20 further restricts both screening applications and clinical utility.
Therefore, ML-based multi-indicator models may predict childhood myopia more accurately. This study develops an ML model integrating age, AL/CCR, and refractive parameters to identify age-specific thresholds for younger school-aged children, leveraging SHapley Additive exPlanations (SHAP) for risk assessment. This approach fills a research gap by providing interpretable, age-tailored screening, enabling early identification of at-risk children, timely referral, and potential delay of myopia progression to improve visual outcomes21.
This study was conducted from November 2023 to January 2024 and used stratified sampling by grade along with probability-proportionate-to-size sampling. The study covered 16 schools and kindergartens across two counties in Changsha City, including senior kindergarten classes and all grades of primary school students. Based on previous myopia survey data, the minimum sample size required for each grade was calculated using the estimation formula for simple random sampling. Children with strabismus, amblyopia, a history of ocular surgery, keratoconus, accommodative spasms, or severe systemic diseases were excluded, as were participants with contraindications to mydriasis. Ultimately, data from 2,368 children were collected. Informed consent was obtained from both participating students and their parents or legal guardians before study commencement. This study was approved by the Ethics Committee of the Changsha Municipal Center for Disease Control and Prevention and was conducted following the principles of the Declaration of Helsinki.
Uncorrected distance visual acuity (UCDVA) was assessed using a standard logarithmic visual acuity chart at a distance of 5 m, determining an individual’s visual clarity without corrective lenses. Additionally, autorefraction was performed before and after pupil dilation using an automated refractometer (KR-800; Topcon, Japan) to ensure continuity and accuracy of the measurements. All participants were examined by the same optometrist for both pre- and post-dilation autorefraction to maintain consistency in the results.
Ocular biometric parameters were measured by noncontact partial-coherence laser interferometry (IOLMaster 500, Zeiss, Germany), including AL, K1/K2 of CCR, and anterior chamber depth (ACD). Five measurements were taken for each parameter, and the average value was used to ensure data reliability. Before dilation, all children underwent noncontact intraocular pressure measurement and slit-lamp examination to rule out contraindications to dilation. Pupillary dilation was performed using 0.5% tropicamide eye drops administered once every 5 min for a total of four doses, followed by refraction 30 min after the last dose.
Spherical equivalent (SE) was calculated as the sphere power plus half the cylinder power. UCDVA < 5.0 was classified as having poor vision. Screening myopia was defined as UCDVA < 5.0 and non-cycloplegic SE < -0.50 diopters (D). Myopia was diagnosed when cycloplegic SE was ≤ -0.50D. To ensure the quality of the study, all team members received uniform training before the study began, and the equipment was calibrated with standard eyes before formal testing. Moreover, 5% of the ophthalmic examination items were re-tested and verified according to the number of examinees to identify and promptly correct any discrepancies.
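The definitions above can be written as a few small functions. This is a minimal sketch that only restates the stated thresholds; the function names are illustrative assumptions and do not come from the study's code.

```python
def spherical_equivalent(sphere: float, cylinder: float) -> float:
    """SE in diopters: sphere power plus half the cylinder power."""
    return sphere + cylinder / 2.0


def has_poor_vision(ucdva: float) -> bool:
    """Poor vision: UCDVA below 5.0 on the logarithmic chart."""
    return ucdva < 5.0


def is_screening_myopia(ucdva: float, noncycloplegic_se: float) -> bool:
    """Screening myopia: UCDVA < 5.0 AND non-cycloplegic SE < -0.50 D."""
    return ucdva < 5.0 and noncycloplegic_se < -0.50


def is_myopia(cycloplegic_se: float) -> bool:
    """Reference diagnosis: cycloplegic SE <= -0.50 D."""
    return cycloplegic_se <= -0.50


# Hypothetical right-eye reading: sphere -0.25 D, cylinder -0.50 D.
se = spherical_equivalent(-0.25, -0.50)
print(se, is_myopia(se), is_screening_myopia(4.9, se))
```

Note that the two thresholds differ at the boundary: an SE of exactly −0.50 D counts as myopia under the cycloplegic definition but not as screening myopia, which uses a strict inequality.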
The dataset included demographic information (age, gender, height, and weight) for each participant and ocular profile (UCDVA, SE, AL, CCR and ACD) for both eyes. The analysis was limited to measurements from the right eye to maintain data independence and reduce potential bias, ensuring a more accurate and impartial assessment.
Two participants wearing orthokeratology lenses and one record with an outlying value of the dependent variable were removed, resulting in a final sample size of 2,365 cases.
Missing data were handled with multiple imputation to minimise bias. All included variables had < 20% of their data missing: height (1 case, 0.04%), weight (1 case, 0.04%), UCDVA (10 cases, 0.42%), non-cycloplegic SE (7 cases, 0.3%), cycloplegic SE (81 cases, 3.42%), AL (221 cases, 9.34%), CCR (221 cases, 9.34%), and ACD (221 cases, 9.34%). Assuming the data were missing at random, the fully conditional specification method was used, generating five imputed datasets by chained equations. Missing values were imputed using the mice (Multivariate Imputation by Chained Equations) package (version 3.16.0) for the R programming language.
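As a quick arithmetic check, the reported missingness percentages can be reproduced from the case counts. This sketch assumes the denominator is the final sample of 2,365 cases, which the text does not state explicitly but which is consistent with the reported figures.

```python
# Assumption: percentages are computed against the final sample size.
N_TOTAL = 2365

# Missing-case counts as reported in the text.
missing_counts = {
    "height": 1,
    "weight": 1,
    "UCDVA": 10,
    "non-cycloplegic SE": 7,
    "cycloplegic SE": 81,
    "AL": 221,
    "CCR": 221,
    "ACD": 221,
}


def missing_percent(count: int, total: int) -> float:
    """Share of missing cases, in percent."""
    return 100.0 * count / total


percentages = {
    name: round(missing_percent(count, N_TOTAL), 2)
    for name, count in missing_counts.items()
}

for name, pct in percentages.items():
    print(f"{name}: {pct}%")

# Every variable stays below the 20% missingness limit named in the text.
all_below_limit = all(pct < 20.0 for pct in percentages.values())
```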
Correlation analysis was conducted to reduce model bias caused by the interdependence of variables. A heatmap of correlation coefficients was generated to visualize these relationships. To prevent predictive model instability caused by high collinearity, whenever two independent variables had a correlation coefficient > 0.7, the one with the lower correlation with the dependent variable was removed.
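The collinearity rule can be illustrated with a small sketch. The data below are invented toy values, and the helper names `pearson` and `drop_collinear` are assumptions made for illustration; the study itself worked in R.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def drop_collinear(features, outcome, threshold=0.7):
    """For each feature pair with |r| > threshold, drop the feature that
    correlates less strongly with the outcome. Returns the kept names."""
    names = list(features)
    kept = list(names)
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            if a not in kept or b not in kept:
                continue
            if abs(pearson(features[a], features[b])) > threshold:
                r_a = abs(pearson(features[a], outcome))
                r_b = abs(pearson(features[b], outcome))
                kept.remove(a if r_a < r_b else b)
    return kept


# Toy data (hypothetical): height tracks age almost perfectly.
features = {
    "age": [5, 6, 7, 8, 9, 10, 11, 12],
    "height": [110, 115, 122, 127, 133, 138, 145, 150],
    "ucdva": [5.0, 4.9, 5.0, 4.8, 5.0, 4.9, 5.0, 4.7],
}
myopia = [0, 0, 0, 0, 0, 1, 0, 1]

kept = drop_collinear(features, myopia)
print(kept)
```

In this toy example height is dropped because it is collinear with age and slightly less correlated with the outcome, which mirrors the paper's decision to keep age and omit height and weight.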
The diagnostic ability of single indicator variables was visually assessed using ROC curves, and the model’s effectiveness was measured via two key indicators: the area under curve (AUC) and the cutoff point determined by the Youden Index (Sensitivity + Specificity − 1), which was employed to identify the optimal threshold balancing specificity and sensitivity.
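The Youden Index cutoff search can be sketched as follows. The scores and labels are invented toy values, and the convention that a case is positive when its score is at or above the cutoff is an assumption that suits indicators such as AL/CCR, where higher values suggest myopia.

```python
def youden_cutoff(scores, labels):
    """Scan every observed score as a candidate cutoff and return
    (cutoff, sensitivity, specificity, J) for the largest
    J = sensitivity + specificity - 1.
    A case is predicted positive when score >= cutoff."""
    positives = sum(labels)
    negatives = len(labels) - positives
    best = None
    for cutoff in sorted(set(scores)):
        tp = sum(1 for s, y in zip(scores, labels) if s >= cutoff and y == 1)
        tn = sum(1 for s, y in zip(scores, labels) if s < cutoff and y == 0)
        sensitivity = tp / positives
        specificity = tn / negatives
        j = sensitivity + specificity - 1
        if best is None or j > best[3]:
            best = (cutoff, sensitivity, specificity, j)
    return best


# Toy AL/CCR values with reference labels (1 = myopia), hypothetical.
al_ccr = [2.85, 2.90, 2.95, 2.98, 3.00, 3.02, 3.05, 3.10]
myopia = [0, 0, 0, 0, 1, 0, 1, 1]

cutoff, sensitivity, specificity, j = youden_cutoff(al_ccr, myopia)
print(cutoff, sensitivity, specificity, round(j, 3))
```

For indicators where lower values indicate myopia (for example SE or UCDVA), the scores can be negated before calling the function.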
For ML, we adopted Tidymodels (version 1.1.1) for tidy modeling. The model's predictive accuracy was enhanced by extracting, selecting, and transforming features from the original data through data centering, scaling, removal of zero-variance predictors, and dummy-variable encoding of categorical variables. Five representative models were selected based on data characteristics: Extreme Gradient Boosting (XGBoost), Regularized Logistic Regression (LR_reg), K-Nearest Neighbor (KNN), Random Forest (RF), and Support Vector Machine (SVM). We divided the dataset into training (70%) and testing (30%) sets to reduce the risk of overfitting and bias. To ensure the robustness of the validation results, we used 5 × 10-fold cross-validation: the dataset was split into 10 subsets, training on 9 and validating on 1 in each fold. Repeating this 5 times with random splits generated 50 results, whose average served as the final performance metric to reduce partitioning bias. Our hyperparameter optimization strategy improved search efficiency through a competitive grid search and increased the possibility of finding the optimal hyperparameter combination. We comprehensively evaluated the performance of multiple variable combinations and predictive models, using cross-validation to assess the generalization performance of the models. The key indicator for evaluating model efficacy was the AUC of the ROC curve. Additionally, we used three other indicators (accuracy, Brier score, and F1 score) to evaluate the performance of the ML algorithms. Using these evaluation indicators, we selected the most effective predictive model for further research.
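The 5 × 10-fold resampling scheme can be sketched in a few lines. This illustration is not the Tidymodels implementation: it uses plain random shuffling without stratification, and the function name, seed, and sample size are assumptions.

```python
import random


def repeated_kfold_indices(n_samples, n_folds=10, n_repeats=5, seed=0):
    """Return a list of (training, validation) index lists.
    Each repeat reshuffles the data and cuts it into n_folds parts;
    every part serves once as the validation set."""
    rng = random.Random(seed)
    splits = []
    for _ in range(n_repeats):
        indices = list(range(n_samples))
        rng.shuffle(indices)
        folds = [indices[i::n_folds] for i in range(n_folds)]
        for k in range(n_folds):
            validation = folds[k]
            training = [
                idx
                for j, fold in enumerate(folds)
                if j != k
                for idx in fold
            ]
            splits.append((training, validation))
    return splits


# Hypothetical training set of 100 rows.
splits = repeated_kfold_indices(n_samples=100)
print(len(splits))  # 5 repeats x 10 folds = 50 resamples
```

A model would be fitted on each training list and scored on the matching validation list, and the 50 scores would then be averaged to give the final performance estimate.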
We used R software (version 4.3.3) to conduct all statistical analyses and build multi-indicator predictive models based on ML methods. We used SHAP values to interpret the models, enhancing their interpretability. Descriptive statistical analyses were performed for both continuous and categorical variables. T-tests were used to compare continuous data, and chi-square tests were used for categorical data. Statistical significance was set at a two-tailed P-value < 0.05. Figure 1 illustrates the workflow of the study.
Flowchart of screening.
The study involved 2,365 children (1,184 boys and 1,181 girls), aged 5–12 years (mean 7.8 ± 1.7 years). The myopia rate was 11.2% (95% confidence interval [CI]: 9.9–12.5%), the poor vision rate was 34.9% (95% CI: 33.0–36.8%), and the screening myopia rate was 13.4% (95% CI: 12.0–14.7%). The mean non-cycloplegic SE was −0.03 ± 1.12 D, mean cycloplegic SE was 0.64 ± 1.23 D, mean UCDVA was 4.9 ± 0.2, mean AL was 22.95 ± 0.86 mm, mean CCR was 7.82 ± 0.26 mm, and mean AL/CCR was 2.94 ± 0.09. Significant differences (P < 0.05) were observed in age, grade, height, weight, vision metrics, and ACD between children with myopia and those without. Children with myopia were generally older, taller, and heavier, with longer ALs, higher AL/CCR ratios, and deeper ACDs. Gender distribution showed no significant difference between the two groups (χ2 = 0.207, P = 0.207). Compared with children with poor vision, those identified through screening myopia had a higher proportion of myopia (72.5% vs. 28.1%) (Table 1).
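The reported interval for the myopia rate is consistent with a normal-approximation (Wald) confidence interval. The text does not say which interval method was used, so the sketch below is an assumption, computed from the rounded proportion and sample size given above.

```python
import math


def wald_ci(proportion, n, z=1.96):
    """Normal-approximation 95% CI for a proportion (assumed method)."""
    standard_error = math.sqrt(proportion * (1 - proportion) / n)
    return proportion - z * standard_error, proportion + z * standard_error


# Myopia rate of 11.2% among 2,365 children, as reported.
low, high = wald_ci(0.112, 2365)
print(f"{100 * low:.1f}% to {100 * high:.1f}%")  # 9.9% to 12.5%
```

Because the input proportion is itself rounded, the last digit of an interval recomputed this way can differ slightly from the published one for the other rates.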
We used a Pearson correlation matrix to analyse relationships among 10 variables (Fig. 2a). Non-cycloplegic SE strongly correlated with cycloplegic SE (r = 0.82) and AL/CCR (r = 0.75). It exhibited a moderate association with uncorrected distance visual acuity (r = 0.45) and a negligible correlation with CCR (r = 0.04). Furthermore, age exhibited pronounced collinearity with both height and weight, so we included only age in subsequent analyses and omitted height and weight.
Given the strong correlation (r > 0.8) between non-cycloplegic SE and cycloplegic SE, we segmented the age variable for a more nuanced subgroup analysis (Fig. 2b), which revealed that the effectiveness of SE testing varied significantly with age. The difference in SE before and after cycloplegic refraction decreased with age. This difference was statistically significant in the 5–10 years age group, with a mean difference exceeding 0.50 D. In children > 10 years, the difference was not statistically significant (P > 0.05). Figure 2c illustrates the prevalence of myopia among children of different age groups using data obtained through three distinct methods. The prevalence of poor vision followed a U-shaped pattern with age, decreasing from five to seven years and then gradually increasing beyond seven years. Screening myopia rates broadly tracked actual myopia rates but were higher at five years (8.3% vs. 2.5%) and lower at 12 years (51.1% vs. 57.4%).
Correlation heat map of visual screening variables (a). Non-cycloplegic spherical equivalent and cycloplegic spherical equivalent diopters (b). The prevalence of myopia according to three methods in different age groups (c). NS non-significant.
The ROC curves (Fig. 3) compared conventional myopia screening methods and ocular biometric measurements. The cutoff values, sensitivity, and specificity of the different screening methods were examined. The most effective single-indicator screening methods were AL/CCR and screening myopia, with AUCs of 0.919 (95% CI: 0.899 to 0.939) and 0.911 (95% CI: 0.890 to 0.932), respectively. For AL/CCR, the optimal cutoff value was 3.005, with a specificity of 0.826 and sensitivity of 0.895; screening myopia had a specificity of 0.864 and sensitivity of 0.959.
Comparison of receiver operating characteristic curves.
Additionally, we compared UCDVA, poor vision, non-cycloplegic SE, screening myopia, AL, AL/CCR, and ACD with the gold standard across different age groups (Table 2). Non-cycloplegic SE had the highest AUC values among the conventional screening methods across all age groups, while the AUCs of UCDVA and poor vision significantly increased with age. The UCDVA cutoff was below 5.0 for children ≤ 8 years, with younger ages correlating with lower cutoff values. Among the biometric measurements, AL/CCR consistently had higher AUC values than AL, CCR, and ACD, with a stable cutoff value of approximately 3 across all age groups. ACD exhibited lower diagnostic efficacy in all age groups.
We developed diagnostic models for four distinct variable combinations using five ML algorithms: XGBoost, LR_reg, KNN, RF, and SVM. The selection of the optimal model within the training set is depicted in Fig. 4. Subsequently, we assessed the final performance and generalization capacity of these models on the test set, as detailed in Table 3. The XGBoost model using age, UCDVA, non-cycloplegic SE, and AL achieved the highest AUC (0.983). The RF model using age, UCDVA, non-cycloplegic SE, and AL/CCR showed the highest accuracy (0.975), the highest F1 score (0.868), and the lowest Brier score (0.022).
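For readers less familiar with the secondary metrics, the following sketch computes accuracy, F1 score, and Brier score from predicted probabilities. The labels and probabilities are invented toy values, and the 0.5 classification threshold is an assumption rather than a setting reported in the study.

```python
def accuracy(y_true, y_pred):
    """Share of cases classified correctly."""
    return sum(1 for t, p in zip(y_true, y_pred) if t == p) / len(y_true)


def f1_score(y_true, y_pred):
    """Harmonic mean of precision and recall for the positive class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    denominator = 2 * tp + fp + fn
    return 2 * tp / denominator if denominator else 0.0


def brier_score(y_true, probabilities):
    """Mean squared difference between predicted probability and outcome
    (lower is better; 0 is perfect)."""
    return sum((p - t) ** 2 for t, p in zip(y_true, probabilities)) / len(y_true)


# Toy test set (hypothetical): 1 = myopia.
y_true = [1, 0, 0, 1, 0, 0, 1, 0]
probabilities = [0.9, 0.1, 0.2, 0.4, 0.6, 0.1, 0.8, 0.3]
y_pred = [1 if p >= 0.5 else 0 for p in probabilities]  # assumed threshold

print(accuracy(y_true, y_pred))
print(round(f1_score(y_true, y_pred), 3))
print(round(brier_score(y_true, probabilities), 3))
```

Accuracy and F1 depend on the chosen threshold, whereas the Brier score evaluates the probabilities directly, which is why a model can rank best on one metric and not on another.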
Optimization and selection of machine learning joint diagnostic models based on multi-indicator combinations.
We used SHAP values to interpret the top-performing XGBoost model (model 1) and RF model (model 2), illustrating how these indicators predict myopia. Figure 5a shows the important features in the XGBoost model, ranked by significance: non-cycloplegic SE, UCDVA, AL, and age. Figure 5b shows the most important features in the RF model, ranked by significance: non-cycloplegic SE, AL/CCR, UCDVA, and age. Figure 5c and d present two case examples (ID = 6), one classified as non-myopia and the other as myopia, to further highlight the models' interpretability.
Model variable importance ranking (top) and SHapley Additive exPlanations (SHAP) force plots for selected students (bottom). Panels a and c represent the XGBoost model (model 1), while panels b and d represent the RF model (model 2).
Our study confirms that age-specific characteristics are crucial for screening and diagnosing myopia. We found that visual acuity and screening myopia are less effective in young children, while non-cycloplegic refraction is more suitable for children > 10 years. Ocular biometric measurements, particularly the AL/CCR, show higher efficacy, with an optimal diagnostic threshold of approximately 3.0. The multi-indicator joint diagnostic model based on ML exhibited stronger predictive accuracy and generalization. The best model for diagnosing myopia using age, UCDVA, non-cycloplegic SE, and AL was XGBoost, with an AUC of 0.983 and an accuracy of 0.970. The RF model using age, UCDVA, non-cycloplegic SE, and the AL/CCR had an AUC of 0.981 and an accuracy of 0.975.
Previous studies have shown that age and years of education are highly correlated with refractive error in children22. Liu et al.14 noted that non-cycloplegic refractive assessments in children often overestimate myopia owing to accommodation, resulting in frequent misdiagnoses of myopia and hyperopia. Our study also observed differences in myopia rate and SE before and after cycloplegia, particularly in children < 10 years. The differences in SE among age groups are due to stronger accommodative abilities in younger children, which cause shifts in SE during non-cycloplegic refraction23. The Tehran Eye Study24 reported 99% sensitivity but only 80.4% specificity for non-cycloplegic self-assessment refraction for myopia. The variation in measurements with and without cycloplegia is influenced by age, refractive category, and individual differences.
In this study, myopia rates in children aged 5–6 ranged from 2.8 to 3.0%, aligning with the consensus that myopia prevalence in children under 6 is typically < 5%25,26. The diagnostic efficacy of UCDVA was low and unstable in younger age groups, particularly children ≤ 8 years, with a cutoff value below 5.0. This reflects the fact that visual development in children is a gradual process of emmetropisation27, during which some children may not achieve the standard distance visual acuity of 5.0 before the age of 6. Consequently, the level of screened myopia may be overestimated in younger children, making it difficult to achieve China's target of 3% for myopia prevention and control in this age group by 2030, based on non-cycloplegic diagnostic criteria for screened myopia. Current school-based myopia screening, relying on distance visual acuity measurement and non-cycloplegic autorefraction, has limited value in accurately determining refractive status28.
These findings highlight the need to continuously improve screening tools to ensure accuracy and effectiveness across various developmental stages. Previous studies have explored the use of ocular biometric measurement in predicting myopia29,30, particularly AL, which has demonstrated good predictive value28,31. Notably, the shift toward myopia and acceleration of AL elongation may be evident up to four years before myopia onset, with similar patterns observed across different ethnic groups2. Liu et al.14 suggested that the AL/CCR has a stronger correlation with myopia than AL or CCR alone. In our study, the AL/CCR excelled in single-indicator screening, surpassing screening myopia, UCDVA, AL, and ACD. The ROC curve cutoff values were consistently around 3 across age groups, indicating that the AL/CCR can serve as an alternative indicator for identifying preschool children with low hyperopia reserves and myopia, aiding in early detection.
Traditional predictive models, including linear regression12, logistic regression32, Cox proportional hazards regression33, and generalized estimating equations34, typically rely on statistically significant variables, with baseline SE dominating myopia prediction. For instance, Zadnik et al.35 reported that baseline SE achieved an AUC of 0.88, and even when combined with axial length/corneal curvature, the AUC improved only marginally to 0.893, reflecting overdependence on single parameters. In contrast, ML excels in high-dimensional data processing and longitudinal prediction. A recent study31 comparing five ML algorithms (RF, SVM, GBDT, CatBoost, logistic regression) in children aged 6–13 found that CatBoost achieved an AUC of 0.951 (vs. 0.739 for logistic regression), demonstrating ML's superiority in leveraging complex feature interactions.
Multi-model approaches outperform single models19 due to data complexity, algorithm heterogeneity, and diverse clinical needs. For example, in pediatric myopia datasets36, orthogonal matching pursuit (OMP) excelled in SE prediction, while kernel ridge (KR) and multilayer perceptron (MLP) dominated AL estimation. In this study, five representative algorithms (logistic regression, XGBoost, KNN, RF, and SVM) were selected to capture distinct data patterns and age-specific trends. This complementary integration enhanced adaptability across clinical scenarios. The results identified XGBoost and RF as top performers in multi-indicator models, supporting algorithmic diversity as a way to enhance robustness.
Ocular biometric measurement methods, while superior9 to traditional detection methods, have limitations in monitoring myopia progression due to inconsistent relationships between the AL/CCR and myopia severity. Multi-indicator joint diagnosis is crucial in addressing these limitations, and ML17 offers promising solutions, particularly through cross-validation and hyperparameter tuning, which can fully utilize data, reduce the risk of overfitting, and improve model adaptability37,38. Although non-cycloplegic measurements may not reliably determine individual refractive errors, proper data modelling can help classify and identify risk groups for cycloplegic refraction28. Liu et al.39 recommended using the AL/CCR or combining AL with non-cycloplegic autorefraction for higher accuracy in preschool children. A three-year retrospective study covering 13 cities40 found that age, uncorrected distance visual acuity, and SE were predictive factors for high myopia in school-age children, with the RF algorithm achieving an accuracy of 0.948 and an AUC of 0.975. Du et al.41 reported that the AdaBoost model predicted refractive status more accurately than direct non-cycloplegic SE estimation, with an 81.7% accuracy rate and 75.2% of SE prediction errors < 0.50 D. The reduced effectiveness of screening myopia compared to UCDVA and SE before cycloplegia may be due to the loss or simplification of information caused by converting continuous variables into categorical variables, which affects the precision of data analysis and the effectiveness of the test.
This study used SHAP values42 to demystify the decision-making processes of ML models, particularly XGBoost and RF. SHAP values for each feature variable in the test dataset revealed their contributions to the prediction outcomes. The overall feature importance plot provided an average assessment of each feature's contribution to the overall predictive results. For personalized risk prediction, the SHAP force plot demonstrated how various features influence individual risk predictions18. Although individual risk predictions aligned with overall feature importance, variations in specific indicators highlight the heterogeneity among individuals.
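The idea behind a per-child SHAP explanation can be illustrated with exact Shapley values on a toy model. This sketch is not the study's procedure: it enumerates feature coalitions directly, replaces absent features with a single baseline value (an assumption), and uses a hypothetical risk function `toy_risk` with made-up coefficients.

```python
from itertools import combinations
from math import factorial


def shapley_values(predict, instance, baseline):
    """Exact Shapley values by enumerating all feature coalitions.
    Features outside a coalition take their baseline value."""
    names = list(instance)
    n = len(names)

    def value(coalition):
        x = {
            name: instance[name] if name in coalition else baseline[name]
            for name in names
        }
        return predict(x)

    phi = {}
    for name in names:
        others = [m for m in names if m != name]
        total = 0.0
        for size in range(n):
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                with_feature = value(set(subset) | {name})
                without_feature = value(set(subset))
                total += weight * (with_feature - without_feature)
        phi[name] = total
    return phi


def toy_risk(x):
    """Hypothetical risk score with an SE-by-age interaction."""
    return (-0.5 * x["se"]
            + 4.0 * (x["al_ccr"] - 2.94)
            - 0.05 * x["se"] * (x["age"] - 7.8))


# Baseline set to the sample means reported in the Results; child is invented.
baseline = {"se": -0.03, "al_ccr": 2.94, "age": 7.8}
child = {"se": -1.0, "al_ccr": 3.05, "age": 10}

phi = shapley_values(toy_risk, child, baseline)
for name, contribution in phi.items():
    print(name, round(contribution, 3))
```

The contributions sum to the difference between the child's score and the baseline score, which is the additivity property that a force plot visualizes. Enumeration is only feasible for a handful of features; tree-based models are normally explained with dedicated, faster algorithms.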
Our study had some limitations. First, we used 0.5% tropicamide as the cycloplegic agent rather than cyclopentolate, which is considered the gold standard. Second, the endpoint of cycloplegia was based on examiner records without objective measurement standards. Therefore, complete cycloplegia cannot be confirmed in all cases. Third, the study was not cohort-based and lacked a temporal correlation between refractive error and the AL/CCR, leaving room for observational and inclusion bias. Fourth, the predictive model did not include influencing factors43. Baseline ocular biometrics and refractive error, while predictive of incident myopia, also reflect previous risky behaviours, which may have led to underestimating the potential benefits of behavioural changes. Finally, this study did not undergo rigorous external validation. Future studies should incorporate multicenter validation to enhance the reliability and generalizability of the model.
Our study highlights the limitations of traditional vision tests in screening for myopia in children, particularly their unsatisfactory detection efficiency. Age specificity is crucial when diagnosing and screening for myopia at different developmental stages. The AL/CCR demonstrated superior performance and can serve as a single indicator for identifying myopia in children, while non-cycloplegic SE is more suitable for older children. Our ML-based multi-indicator joint diagnostic model enhances diagnostic accuracy and practical applicability. The interpretability of SHAP values allows for more in-depth group predictions and individual myopia diagnoses.
The datasets generated and analyzed during the current study are not publicly available because they involve students' privacy, but are available from the corresponding author upon reasonable request.
Sankaridurg, P. et al. IMI impact of myopia. Investig. Ophthalmol. Vis. Sci. 62 https://doi.org/10.1167/iovs.62.5.2 (2021).
Wolffsohn, J. S. et al. IMI – Myopia control reports overview and introduction. Investig. Ophthalmol. Vis. Sci. 60, M1–M19. https://doi.org/10.1167/iovs.18-25980 (2019).
Holden, B. A. et al. Global prevalence of myopia and high myopia and temporal trends from 2000 through 2050. Ophthalmology 123, 1036–1042. https://doi.org/10.1016/j.ophtha.2016.01.006 (2016).
Kandel, H., Khadka, J., Goggin, M. & Pesudovs, K. Impact of refractive error on quality of life: a qualitative study. Clin. Exp. Ophthalmol. 45, 677–688. https://doi.org/10.1111/ceo.12954 (2017).
Huang, J., Ma, W., Li, R., Zhao, N. & Zhou, T. Myopia prediction for children and adolescents via time-aware deep learning. Sci. Rep. 13, 5430. https://doi.org/10.1038/s41598-023-32367-0 (2023).
Ma, Y. et al. Cohort study with 4-year follow-up of myopia and refractive parameters in primary schoolchildren in Baoshan district, Shanghai. Clin. Exp. Ophthalmol. 46, 861–872. https://doi.org/10.1111/ceo.13195 (2018).
Morgan, I. G. et al. IMI risk factors for myopia. Investig. Ophthalmol. Vis. Sci. 62 https://doi.org/10.1167/iovs.62.5.3 (2021).
Colpa, L. et al. Nonsurgical consecutive exotropia following childhood esotropia: A multicentered study. Am. J. Ophthalmol. 258, 130–138. https://doi.org/10.1016/j.ajo.2023.07.021 (2024).
Jong, M., Sankaridurg, P., Naduvilath, T. J., Li, W. & He, M. The relationship between progression in axial length/corneal radius of curvature ratio and spherical equivalent refractive error in myopia. Optom. Vis. Sci. 95, 921–929. https://doi.org/10.1097/opx.0000000000001281 (2018).
Liu, S. et al. Cutoff values of axial length/corneal radius ratio for determining myopia vary with age among 3–18 years old children and adolescents. Graefe's Arch. Clin. Exp. Ophthalmol. 262, 651–661. https://doi.org/10.1007/s00417-023-06176-0 (2024).
Jonas, J. B. et al. IMI prevention of myopia and its progression. Investig. Ophthalmol. Vis. Sci. 62 https://doi.org/10.1167/iovs.62.5.6 (2021).
Chua, S. Y. et al. Age of onset of myopia predicts risk of high myopia in later childhood in myopic Singapore children. Ophthalmic Physiol. Opt. 36, 388–394. https://doi.org/10.1111/opo.12305 (2016).
Ghorbani Mojarrad, N., Williams, C. & Guggenheim, J. A. A genetic risk score and number of myopic parents independently predict myopia. Ophthalmic Physiol. Opt. 38, 492–502. https://doi.org/10.1111/opo.12579 (2018).
Liu, L. et al. Prediction of premyopia and myopia in Chinese preschool children: a longitudinal cohort. BMC Ophthalmol. 21, 283. https://doi.org/10.1186/s12886-021-02045-8 (2021).
Guo, X. et al. Noncycloplegic compared with cycloplegic refraction in a Chicago School-Aged population. Ophthalmology 129, 813–820. https://doi.org/10.1016/j.ophtha.2022.02.027 (2022).
Beam, A. L. & Kohane, I. S. Big data and machine learning in health care. JAMA 319, 1317–1318. https://doi.org/10.1001/jama.2017.18391 (2018).
Zhang, J. & Zou, H. Artificial intelligence technology for myopia challenges: A review. Front. Cell Dev. Biol. 11, 1124005. https://doi.org/10.3389/fcell.2023.1124005 (2023).
Tong, H. J. et al. Machine learning to analyze the factors influencing myopia in students of different school periods. Front. Public Health 11, 1169128. https://doi.org/10.3389/fpubh.2023.1169128 (2023).
Yang, X. et al. Prediction of myopia in adolescents through machine learning methods. Int. J. Environ. Res. Public Health 17 https://doi.org/10.3390/ijerph17020463 (2020).
Lin, H. et al. Prediction of myopia development among Chinese school-aged children using refraction data from electronic medical records: A retrospective, multicentre machine learning study. PLoS Med. 15, e1002674. https://doi.org/10.1371/journal.pmed.1002674 (2018).
Han, X., Liu, C., Chen, Y. & He, M. Myopia prediction: a systematic review. Eye (London England). 36, 921–929. https://doi.org/10.1038/s41433-021-01805-6 (2022).
Morgan, I. G. & Jan, C. L. China turns to school reform to control the myopia epidemic: A narrative review. Asia-Pacific J. Ophthalmol. (Philadelphia Pa). 11, 27–35. https://doi.org/10.1097/apo.0000000000000489 (2022).
He, X. et al. Normative data and percentile curves for axial length and axial length/corneal curvature in Chinese children and adolescents aged 4–18 years. Br. J. Ophthalmol. 107, 167–175. https://doi.org/10.1136/bjophthalmol-2021-319431 (2023).
Fotouhi, A., Morgan, I. G., Iribarren, R., Khabazkhoob, M. & Hashemi, H. Validity of noncycloplegic refraction in the assessment of refractive errors: the Tehran eye study. Acta Ophthalmol. 90, 380–386. https://doi.org/10.1111/j.1755-3768.2010.01983.x (2012).
Dirani, M. et al. Prevalence and causes of decreased visual acuity in Singaporean Chinese preschoolers. Br. J. Ophthalmol. 94, 1561–1565. https://doi.org/10.1136/bjo.2009.173104 (2010).
Lan, W. et al. Refractive errors in 3–6 year-old Chinese children: a very low prevalence of myopia? PLoS One. 8, e78003. https://doi.org/10.1371/journal.pone.0078003 (2013).
Schein, Y., Yu, Y., Ying, G. S. & Binenbaum, G. Emmetropization during early childhood. Ophthalmology 129, 461–463. https://doi.org/10.1016/j.ophtha.2021.11.021 (2022).
Sankaridurg, P. et al. Comparison of noncycloplegic and cycloplegic autorefraction in categorizing refractive error data in children. Acta Ophthalmol. 95, e633–e640. https://doi.org/10.1111/aos.13569 (2017).
Sanz Diez, P., Yang, L. H., Lu, M. X., Wahl, S. & Ohlendorf, A. Growth curves of myopia-related parameters to clinically monitor the refractive development in Chinese schoolchildren. Graefe's Arch. Clin. Exp. Ophthalmol. 257, 1045–1053. https://doi.org/10.1007/s00417-019-04290-6 (2019).
Xiao, J. et al. Analysis and modeling of myopia-related factors based on questionnaire survey. Comput. Biol. Med. 150, 106162. https://doi.org/10.1016/j.compbiomed.2022.106162 (2022).
Peng, W. et al. Does multidimensional daily information predict the onset of myopia? A 1-year prospective cohort study. Biomed. Eng. Online. 22 https://doi.org/10.1186/s12938-023-01109-8 (2023).
French, A. N., Morgan, I. G., Mitchell, P. & Rose, K. A. Risk factors for incident myopia in Australian schoolchildren: the Sydney adolescent vascular and eye study. Ophthalmology 120, 2100–2108. https://doi.org/10.1016/j.ophtha.2013.02.035 (2013).
Wang, S. K. et al. Incidence of and factors associated with myopia and high myopia in Chinese children, based on refraction without cycloplegia. JAMA Ophthalmol. 136, 1017–1024. https://doi.org/10.1001/jamaophthalmol.2018.2658 (2018).
Zhang, M. et al. Validating the accuracy of a model to predict the onset of myopia in children. Investig. Ophthalmol. Vis. Sci. 52, 5836–5841. https://doi.org/10.1167/iovs.10-5592 (2011).
Zadnik, K. et al. Ocular predictors of the onset of juvenile myopia. Investig. Ophthalmol. Vis. Sci. 40, 1936–1943 (1999).
Zhu, S. et al. Prediction of spherical equivalent refraction and axial length in children based on machine learning. Indian J. Ophthalmol. 71, 2115–2131. https://doi.org/10.4103/ijo.Ijo_2989_22 (2023).
Wang, Y. et al. Machine learning models for predicting Long-Term visual acuity in highly myopic eyes. JAMA Ophthalmol. 141, 1117–1124. https://doi.org/10.1001/jamaophthalmol.2023.4786 (2023).
Li, W. et al. Study of myopia progression and risk factors in Hubei children aged 7–10 years using machine learning: a longitudinal cohort. BMC Ophthalmol. 24, 93. https://doi.org/10.1186/s12886-024-03331-x (2024).
Li, S. M. et al. Annual incidences and progressions of myopia and high myopia in Chinese schoolchildren based on a 5-Year cohort study. Investig. Ophthalmol. Vis. Sci. 63, 8. https://doi.org/10.1167/iovs.63.1.8 (2022).
Guan, J. et al. Prevalence patterns and onset prediction of high myopia for children and adolescents in Southern China via Real-World screening data: retrospective School-Based study. J. Med. Internet. Res. 25, e39507. https://doi.org/10.2196/39507 (2023).
Du, B. et al. Prediction of spherical equivalent difference before and after cycloplegia in school-age children with machine learning algorithms. Front. Public. Health. 11, 1096330. https://doi.org/10.3389/fpubh.2023.1096330 (2023).
Zhao, J. et al. Development and validation of predictive models for myopia onset and progression using extensive 15-year refractive data in children and adolescents. J. Translational Med. 22, 289. https://doi.org/10.1186/s12967-024-05075-0 (2024).
Tideman, J. W. L., Polling, J. R., Jaddoe, V. W. V., Vingerling, J. R. & Klaver, C. C. W. Environmental risk factors can reduce axial length elongation and myopia incidence in 6- to 9-Year-Old children. Ophthalmology 126, 127–136. https://doi.org/10.1016/j.ophtha.2018.06.029 (2019).
The authors thank Han Xun and Haixiang Zhou for their assistance with this study.
This work was supported by the Health Research Project of the Hunan Provincial Health Commission (grant number: W20243235).
Changsha Municipal Center for Disease Control and Prevention, No. 509, Wanjiali Second North Road, Kaifu District, Changsha, 410001, Hunan, China
Qi Feng, Xin Wu, Qianwen Liu, Yuanyuan Xiao, Xixing Zhang & Yan Chen
Q.F. designed the study, performed the statistical analysis, interpreted the results, and wrote the original draft. Q.W.L., X.W., and Y.Y.X. contributed data curation, software, and materials/analysis tools. X.X.Z. and Y.C. contributed to study design, administration, and supervision. All authors reviewed and approved the manuscript prior to submission.
Correspondence to Yan Chen.
The authors declare no competing interests.
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Open Access This article is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License, which permits any non-commercial use, sharing, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if you modified the licensed material. You do not have permission under this licence to share adapted material derived from this article or parts of it. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by-nc-nd/4.0/.
Feng, Q., Wu, X., Liu, Q. et al. Interpretable machine learning models for predicting childhood myopia from school-based screening data. Sci Rep 15, 19811 (2025). https://doi.org/10.1038/s41598-025-05021-0
DOI: https://doi.org/10.1038/s41598-025-05021-0
Scientific Reports (Sci Rep)
ISSN 2045-2322 (online)
© 2025 Springer Nature Limited