Artificial intelligence-based detection of paediatric appendicular skeletal fractures: performance and limitations for common fracture types and locations
Background
Research into artificial intelligence (AI)-based fracture detection in children is scarce and has overlooked the detection of indirect fracture signs and dislocations.
Objective
To assess the diagnostic accuracy of an existing AI tool for the detection of fractures, indirect fracture signs, and dislocations.
Materials and methods
The AI software BoneView (Gleamer, Paris, France) was assessed for diagnostic accuracy in fracture detection, with paediatric radiology consensus diagnoses as the reference standard. Radiographs from a single emergency department were enrolled retrospectively, counting back from December 2021, up to a maximum of 1,000 radiographs per body part. Enrolment criteria were: suspected fracture of the forearm, lower leg, or elbow; age 0–18 years; and radiographs in at least two projections.
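The enrolment procedure can be pictured as a filter-and-cap pass over a list of examinations, walking backwards in time. The sketch below is purely illustrative and is not the study's actual pipeline: the `Study` record, its field names, the `enrol` function, and the readings of "December 2021" as 31 December 2021 and of "0–18 years" as whole-year ages 0 to 18 inclusive are all assumptions.

```python
from dataclasses import dataclass
from datetime import date

# Hypothetical record layout; the study does not describe its data schema.
@dataclass
class Study:
    body_part: str            # "forearm", "lower leg", or "elbow"
    age_years: int            # assumption: age in completed years
    n_projections: int        # number of radiographic projections
    acquired: date
    fracture_suspected: bool

ELIGIBLE_PARTS = {"forearm", "lower leg", "elbow"}
CUTOFF = date(2021, 12, 31)   # assumption: "December 2021" read as end of month
CAP_PER_PART = 1000           # at most 1,000 radiographs per body part

def enrol(studies):
    """Hypothetical helper: apply the inclusion criteria, newest first, capped per body part."""
    selected = {part: [] for part in ELIGIBLE_PARTS}
    # Walk backwards in time so the most recent eligible studies fill each cap first.
    for st in sorted(studies, key=lambda x: x.acquired, reverse=True):
        if st.acquired > CUTOFF:
            continue
        if st.body_part not in ELIGIBLE_PARTS or not st.fracture_suspected:
            continue
        if not (0 <= st.age_years <= 18) or st.n_projections < 2:
            continue
        bucket = selected[st.body_part]
        if len(bucket) < CAP_PER_PART:
            bucket.append(st)
    return selected
```

Sorting newest-first before applying the cap mirrors the retrospective "counting back" design: once a body part reaches 1,000 studies, older examinations are simply not enrolled.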
Results
Lower leg radiographs showed 607 fractures. Sensitivity, specificity, positive predictive value (PPV), and negative predictive value (NPV) were high (87.5%, 87.5%, 98.3%, and 98.3%, respectively). Detection rates were low for toddler’s fractures, trampoline fractures, and proximal tibial Salter-Harris II fractures. Forearm radiographs showed 1,137 fractures. Sensitivity, specificity, PPV, and NPV were high (92.9%, 98.1%, 98.4%, and 91.7%, respectively). Radial and ulnar bowing fractures were not reliably detected: only one of 11 radial bowing fractures and none of seven ulnar bowing fractures were correctly identified. Detection rates were also low for styloid process avulsions, proximal radial buckle fractures, and complete olecranon fractures. Elbow radiographs showed 517 fractures. Sensitivity and NPV were moderate (80.5% and 84.7%, respectively), whereas specificity and PPV were high (94.9% and 93.3%, respectively). For joint effusion, sensitivity, specificity, PPV, and NPV were moderate (85.1%, 85.7%, 89.5%, and 80%, respectively). For elbow dislocations, sensitivity and PPV were low (65.8% and 50%, respectively), whereas specificity and NPV were high (97.7% and 98.8%, respectively).
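All four reported measures derive from a 2×2 table comparing the AI output with the reference diagnosis. The sketch below shows the standard definitions; the `Counts` class and `diagnostic_metrics` function are hypothetical names, and the counts in the usage example are invented for illustration, not taken from the study.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Counts:
    tp: int  # AI positive, reference positive
    fp: int  # AI positive, reference negative
    fn: int  # AI negative, reference positive
    tn: int  # AI negative, reference negative

def diagnostic_metrics(c: Counts) -> dict:
    """Standard 2x2 diagnostic accuracy measures, returned as proportions."""
    return {
        "sensitivity": c.tp / (c.tp + c.fn),  # share of true fractures the AI flags
        "specificity": c.tn / (c.tn + c.fp),  # share of fracture-free cases the AI clears
        "ppv": c.tp / (c.tp + c.fp),          # share of AI alerts that are true fractures
        "npv": c.tn / (c.tn + c.fn),          # share of AI negatives that are truly negative
    }

# Illustrative counts only (not study data).
example = diagnostic_metrics(Counts(tp=80, fp=5, fn=20, tn=95))
```

A per-subtype detection rate is simply sensitivity restricted to that subtype: one of 11 radial bowing fractures corresponds to about 9.1%. Note that PPV and NPV depend on how common the finding is in the sample, whereas sensitivity and specificity do not.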
Conclusions
The diagnostic performance of BoneView is promising for forearm and lower leg fractures. However, further improvement is required before clinicians can rely solely on this software for AI-based paediatric fracture detection.