We invite clinicians, geneticists, and anyone experienced in dysmorphology to participate in an open annotation challenge.
By contributing your expertise, you'll help improve AI models that support the diagnosis of rare diseases.
Click here to join the quiz and test your diagnostic intuition while advancing next-generation phenotyping.

Making Diagnoses Visible: Joint inference of disorders and facial features by AI

Next-generation phenotyping approaches have demonstrated high accuracy in suggesting diagnoses based on dysmorphic facial features. We propose a novel approach using convolutional neural networks (CNNs) to simultaneously infer Human Phenotype Ontology (HPO) terms and classify disorders based on syndromic faces. Because HPO extraction and disorder prediction are integrated, the two outputs can be checked against each other, which improves diagnostic robustness. We aim to enhance both diagnostic support and explainability in AI-driven rare disease classification.

Figure: Images are randomly selected from the GestaltMatcher Database (GMDB). Experts then use the Face2HPO annotation tool to mark the presence or absence of HPO terms considered frequent for some disorders.

Through the GestaltMatcher Database, we set up a large-scale effort to annotate the presence and absence of HPO terms. Because current annotation practices for HPO terms may reflect differing interpretations and definitions of phenotypic features, we invited multiple experts to annotate the same images to mitigate inconsistencies and enhance robustness. By training the model to predict disorders and characterize the observed facial phenotype at the same time, the resulting differential diagnoses can be assessed more robustly and, if necessary, dismissed. Further, since HPO terms inherently carry spatial information, we can gain new insights into the predicted disorders.
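One way to combine several experts' annotations of the same image is a per-term majority vote with an agreement score. The sketch below is only an illustration: the text does not state how annotations are merged, so the function name `aggregate_annotations`, the tie handling, and the example term labels are assumptions.

```python
from collections import Counter


def aggregate_annotations(votes):
    """Majority-vote each HPO term across annotators for one image.

    votes: list of dicts, one per annotator, mapping an HPO term label to
           True (present) or False (absent). Terms an annotator skipped
           are simply missing from that annotator's dict.
    Returns a dict mapping each term to (label, agreement), where label is
    True, False, or None on a tie, and agreement is the fraction of
    annotators who chose the majority answer.
    """
    result = {}
    terms = sorted({term for vote in votes for term in vote})
    for term in terms:
        counts = Counter(vote[term] for vote in votes if term in vote)
        present, absent = counts[True], counts[False]
        total = present + absent
        # A tie is left unresolved (None) rather than guessed.
        label = None if present == absent else present > absent
        result[term] = (label, max(present, absent) / total)
    return result


# Hypothetical annotations from three experts for a single image.
votes = [
    {"Synophrys": True, "Long philtrum": True, "Anteverted nares": False},
    {"Synophrys": True, "Long philtrum": False, "Anteverted nares": False},
    {"Synophrys": True, "Long philtrum": True},
]
consensus = aggregate_annotations(votes)
for term, (label, agreement) in consensus.items():
    print(f"{term}: label={label}, agreement={agreement:.2f}")
```

Terms with low agreement could be flagged for review or down-weighted during training; which of these is appropriate depends on the annotation protocol.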

First, as a proof of concept, we aim to annotate approximately 1,000 images across 10 disorders, focusing on the 72 HPO terms that are common for these disorders. Alongside these 72 terms, we also ask annotators to list any other term they consider important.

We trained several CNNs and compared their performance when predicting clinical features encoded as HPO terms, disorders, or both jointly. The joint setting allowed us to combine the two sets of predictions to further improve performance and explainability.
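A joint model of this kind is commonly trained with a combined objective: a cross-entropy term for the single disorder label plus a binary cross-entropy term over the HPO terms. The sketch below shows such an objective in plain Python for one image. It is an assumption, not the loss used in this work; `joint_loss`, `hpo_weight`, and the convention of skipping unannotated terms (`None`) are hypothetical.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def bce_with_logit(z, y):
    """Stable binary cross-entropy for one logit z and a label y in {0, 1}."""
    return max(z, 0.0) - z * y + math.log1p(math.exp(-abs(z)))


def joint_loss(disorder_logits, disorder_label, hpo_logits, hpo_labels,
               hpo_weight=1.0):
    """Cross-entropy over disorders plus weighted mean BCE over HPO terms.

    hpo_labels may contain None for terms that were not annotated on this
    image; those terms do not contribute to the loss.
    """
    ce = -math.log(softmax(disorder_logits)[disorder_label])
    annotated = [(z, y) for z, y in zip(hpo_logits, hpo_labels)
                 if y is not None]
    bce = 0.0
    if annotated:
        bce = sum(bce_with_logit(z, y) for z, y in annotated) / len(annotated)
    return ce + hpo_weight * bce


# Toy example: 3 disorders, 4 HPO terms, one of which is unannotated.
loss = joint_loss(
    disorder_logits=[2.0, 0.5, -1.0],
    disorder_label=0,
    hpo_logits=[1.5, -2.0, 0.3, 0.0],
    hpo_labels=[1, 0, None, 1],
)
print(f"joint loss: {loss:.4f}")
```

In a CNN, both sets of logits would come from two output heads on a shared backbone, so that gradients from the HPO term also shape the features used for disorder classification.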

Figure: Flow of our NGP pipeline: (1) our model gets an image as input, (2) the model outputs HPO terms and ranked disorders, and (3) the predicted HPO terms are used to refine the predicted disorders.

Our proposed approach makes additional analyses straightforward. For example, we can refine the disorder classification using the inferred HPO terms, such as when the terms inferred to be present are more commonly associated with a disorder other than the top-ranked one. The approach also inherently improves the explainability of our AI models, which makes them more useful in clinical settings. Additionally, it reduces the subjectivity of labeling HPO terms manually and allows clinicians to work more efficiently.
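The refinement step can be pictured as re-scoring each candidate disorder by how well its typical HPO terms match the terms the model predicts as present. The following is a minimal sketch under stated assumptions: the blending rule, the `alpha` and `threshold` parameters, the function name `refine_ranking`, and the placeholder disorder-to-term table are all hypothetical and not taken from this work.

```python
def refine_ranking(disorder_scores, hpo_probs, associations,
                   alpha=0.5, threshold=0.5):
    """Re-rank disorders using support from predicted HPO terms.

    disorder_scores: dict disorder -> classifier score in [0, 1]
    hpo_probs:       dict HPO term -> predicted probability of presence
    associations:    dict disorder -> set of HPO terms typical for it
    alpha:           weight given to HPO support (0 keeps the original order)
    """
    present = {term for term, p in hpo_probs.items() if p >= threshold}
    refined = {}
    for disorder, score in disorder_scores.items():
        expected = associations.get(disorder, set())
        # Fraction of the disorder's typical terms predicted as present.
        support = len(present & expected) / len(expected) if expected else 0.0
        refined[disorder] = (1 - alpha) * score + alpha * support
    return sorted(refined.items(), key=lambda item: item[1], reverse=True)


# Placeholder disorders and associations, for illustration only.
disorder_scores = {"disorder_A": 0.55, "disorder_B": 0.45}
hpo_probs = {"Synophrys": 0.9, "Long philtrum": 0.8, "Hypertelorism": 0.1}
associations = {
    "disorder_A": {"Hypertelorism"},
    "disorder_B": {"Synophrys", "Long philtrum"},
}

ranking = refine_ranking(disorder_scores, hpo_probs, associations)
for disorder, score in ranking:
    print(f"{disorder}: {score:.3f}")
```

Here the classifier ranks `disorder_A` first, but the predicted terms support `disorder_B`, so the refined ranking swaps them. Because the matching terms can be shown to the clinician, the same computation also explains why the ranking changed.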

Patient image source: Toker AS, Ay S, Yeler H, Sezgin I. Dental findings in Cornelia de Lange syndrome. Yonsei Med J. (2009).