How accurate is GestaltMatcher AI? How should I interpret the accuracy?

The accuracy of GestaltMatcher AI strongly depends on the syndrome. This is because syndromes differ in how typical and consistent their facial gestalt is:

Syndromes with a very distinct and characteristic facial gestalt, such as Kabuki syndrome, generally achieve higher accuracy.
Syndromes with more subtle or less consistent facial features, such as Turner syndrome, tend to have lower accuracy.

Accuracy is always reported for a specific number of ranks, so a figure only makes sense together with that number. For example, top-10 accuracy is the share of cases in which the correct diagnosis appears among the first 10 suggestions from the AI. Likewise, top-1 accuracy is the share of cases in which the AI ranked the correct diagnosis in first place.
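
The definition of top-k accuracy can be illustrated with a minimal Python sketch. This is not GestaltMatcher code: the function name `top_k_accuracy`, its parameters, and the toy rankings are hypothetical and chosen only to show how the metric is counted.

```python
from typing import Sequence


def top_k_accuracy(
    ranked_predictions: Sequence[Sequence[str]],
    true_labels: Sequence[str],
    k: int,
) -> float:
    """Share of cases whose true label appears among the first k suggestions.

    Hypothetical helper for illustration; each entry of ranked_predictions
    is one case's list of suggestions, ordered from most to least likely.
    """
    if k < 1:
        raise ValueError("k must be at least 1")
    if len(ranked_predictions) != len(true_labels):
        raise ValueError("need exactly one ranking per true label")
    if not true_labels:
        raise ValueError("need at least one case")

    # A case counts as a hit if the correct diagnosis is within the top k ranks.
    hits = sum(
        1
        for ranking, truth in zip(ranked_predictions, true_labels)
        if truth in ranking[:k]
    )
    return hits / len(true_labels)


# Made-up toy data: four cases, three candidate syndromes each.
rankings = [
    ["Kabuki syndrome", "Turner syndrome", "Noonan syndrome"],
    ["Noonan syndrome", "Turner syndrome", "Kabuki syndrome"],
    ["Kabuki syndrome", "Noonan syndrome", "Turner syndrome"],
    ["Noonan syndrome", "Kabuki syndrome", "Turner syndrome"],
]
truths = [
    "Kabuki syndrome",
    "Turner syndrome",
    "Turner syndrome",
    "Noonan syndrome",
]

top1 = top_k_accuracy(rankings, truths, k=1)
top2 = top_k_accuracy(rankings, truths, k=2)
```

In this toy data the correct diagnosis is ranked first in two of the four cases and within the first two ranks in three of them, so the same set of predictions yields a higher accuracy as k grows.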

For syndromes with a very characteristic gestalt, GestaltMatcher AI can reach a top-1 accuracy well above 90%.
For syndromes with a less consistent gestalt, top-1 accuracy may be around 50%.

A top-1 accuracy of 50% may seem modest at first glance, but in the context of rare disease diagnostics it represents a substantial improvement over chance. When the AI must distinguish among a thousand possible syndromes, random guessing would yield a correct top-1 prediction in only about 0.1% of cases. Achieving 50% in this setting is therefore roughly a 500-fold improvement over random guessing, suggesting the model can provide valuable support in a clinical setting.
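
The comparison with chance can be reproduced in a few lines of Python. The sketch below assumes that random guessing picks each of the candidate syndromes with equal probability; the function name `fold_over_chance` and its parameters are hypothetical and used only for this illustration.

```python
def fold_over_chance(top1_accuracy: float, n_syndromes: int) -> float:
    """How many times better than uniform random guessing a top-1 accuracy is.

    Assumes every syndrome is equally likely to be guessed, so the chance
    level is 1 / n_syndromes. Dividing by that chance level is the same as
    multiplying by n_syndromes.
    """
    if n_syndromes < 1:
        raise ValueError("n_syndromes must be at least 1")
    if not 0.0 <= top1_accuracy <= 1.0:
        raise ValueError("top1_accuracy must be between 0 and 1")
    return top1_accuracy * n_syndromes


# Figures from the text: 50% top-1 accuracy among a thousand syndromes.
chance_level = 1 / 1000  # 0.1% chance of a correct random guess
improvement = fold_over_chance(0.5, 1000)
```

With these inputs the chance level is 0.1% and the improvement factor is 500, matching the figures above.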