
A critical comparative study of the performance of three AI-assisted programs for bone age determination


To date, AI-supported programs that determine bone age (BA) according to Greulich and Pyle (G&P) for medical use in Europe have almost exclusively been validated individually. Therefore, the current study aimed to compare the performance of three programs, namely BoneXpert, PANDA, and BoneView, on a single Central European population.


For this retrospective study, hand radiographs of 306 children aged 1–18 years, stratified by gender and age, were included. A subgroup was formed from the age range that accounts for 90% of examinations in clinical practice. The G&P BA was estimated by three human experts, whose readings served as the ground truth, and by the three AI-supported programs. The mean absolute deviation, the root mean squared error (RMSE), and the AI dropout rate (the proportion of radiographs for which a program returned no estimate) were calculated.
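The evaluation metrics named above can be sketched as follows. This is a minimal illustration, not the study's analysis code. It assumes that the ground truth for each radiograph is the mean of the three expert readings (the abstract does not state how the readings were combined), treats a missing AI estimate (`None`) as a dropout, and computes the error metrics only over radiographs the program actually evaluated. The function names and the toy data are hypothetical.

```python
import math
from statistics import mean

def consensus_ground_truth(expert_readings):
    """Hypothetical helper: ground truth per radiograph as the mean of the
    expert G&P readings (an assumption; the abstract does not say how the
    three readings were combined)."""
    return [mean(case) for case in expert_readings]

def evaluate_program(ai_estimates, ground_truth):
    """Compare one AI program's BA estimates (years) with the ground truth.

    ai_estimates: floats, or None where the program gave no result (dropout).
    ground_truth: floats, same length and order as ai_estimates.
    """
    if len(ai_estimates) != len(ground_truth):
        raise ValueError("inputs must have the same length")
    # Errors are computed only on radiographs the program actually evaluated.
    pairs = [(a, g) for a, g in zip(ai_estimates, ground_truth) if a is not None]
    if not pairs:
        raise ValueError("program evaluated no radiographs")
    dropout_rate = 1 - len(pairs) / len(ground_truth)
    errors = [a - g for a, g in pairs]
    mad = mean(abs(e) for e in errors)             # mean absolute deviation
    rmse = math.sqrt(mean(e * e for e in errors))  # root mean squared error
    return {"mad": mad, "rmse": rmse, "dropout_rate": dropout_rate}

# Toy data for four radiographs (made up, not from the study).
experts = [[10.0, 10.5, 9.5], [12.0, 12.0, 12.0], [6.0, 6.5, 5.5], [15.0, 15.5, 14.5]]
truth = consensus_ground_truth(experts)   # [10.0, 12.0, 6.0, 15.0]
# The second radiograph is a dropout; errors on the rest are 0.5, -0.5, 0.0,
# so MAD = 1/3 year, RMSE = sqrt(1/6) year, dropout rate = 0.25.
result = evaluate_program([10.5, None, 5.5, 15.0], truth)
```

Running the same function once per program gives total-group figures; restricting both lists to the clinically common age range would give the corresponding subgroup figures. Excluding dropouts from the error metrics means a program with a high dropout rate is scored only on the cases it accepted, which is why the abstract reports dropouts alongside RMSE.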


The correlation between each program and the ground truth was strong (R² ≥ 0.98). In the total group, BoneXpert had a lower RMSE than BoneView and PANDA (0.62 vs. 0.65 and 0.75 years), with dropout rates of 2.3%, 20.3%, and 0%, respectively. In the subgroup, the RMSE values were closer together (0.66 vs. 0.68 and 0.65 years; at most 4% dropouts). The standard deviation among the AI readers was lower than that among the human readers (0.54 vs. 0.62 years, p < 0.01).
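The two agreement measures in these results can be illustrated with a short sketch. It assumes that R² is the squared Pearson correlation between a program's estimates and the ground truth (for a simple linear fit this equals the coefficient of determination), and that the standard deviation between readers is the per-radiograph sample SD across readers, averaged over radiographs. The abstract specifies neither definition, and the significance test behind p < 0.01 is not reproduced here. Function names and data are hypothetical.

```python
from statistics import mean, stdev

def r_squared(x, y):
    """Squared Pearson correlation between two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length sequences with at least 2 items")
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy * sxy / (sxx * syy)

def mean_interreader_sd(readings):
    """Assumed definition: sample SD across readers for each radiograph,
    averaged over all radiographs.

    readings: one list per radiograph, holding one BA value (years) per reader.
    """
    return mean(stdev(case) for case in readings)

# Toy data (made up): three readers, three radiographs.
ai_readers = [[10.0, 10.2, 9.8], [12.0, 12.1, 11.9], [6.0, 6.0, 6.0]]
human_readers = [[10.0, 10.5, 9.5], [12.0, 12.6, 11.4], [6.0, 6.3, 5.7]]
print(mean_interreader_sd(ai_readers), mean_interreader_sd(human_readers))
print(r_squared([8.0, 10.0, 12.0, 14.0], [8.2, 9.9, 12.1, 13.8]))
```

Calling `mean_interreader_sd` once on the AI readings and once on the human readings yields the kind of pair the abstract compares (0.54 vs. 0.62 years in the study). A high R² only shows that estimates track the ground truth linearly; it does not rule out a systematic offset, which is why RMSE is reported as well.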


All three AI programs estimate G&P BA in the main age range with similarly high reliability. Differences arise at the boundaries of childhood, that is, at the youngest and oldest ages.

