were assessed by a group of 65 dysmorphologists, who achieved
75% accuracy on the same task.
TABLE II
RESULTS ON THE BINARY PROBLEM OF DETECTING CORNELIA DE LANGE SYNDROME PATIENTS
Method                          Accuracy
Rohatgi et al. [55]             75%
Basel-Vanagaite et al. [4]      87%
DeepGestalt                     96.88%
2) Angelman Syndrome (AS): This binary experiment focuses on separating AS patients from
patients with other syndromes (e.g., Williams, Russell-Silver,
Fragile X, Moebius, DiGeorge, Mowat-Wilson, Aarskog,
1p36 microdeletion, Prader-Willi, Kleefstra, Phelan-McDermid, Proteus, Feingold, Coffin-Siris). The
model is trained on 766 AS images as the positive cohort
and 2,699 images as the negative cohort.
In a previous survey [56], 20 dysmorphologists were asked to examine a set of 25 patient images
and indicate which patients had AS and which did not. The
test set included 10 patients with AS (positive cohort) and
15 patients with other genetic syndromes (negative cohort).
The experts were not told how many patients belonged to
each cohort. The survey reported a recognition rate of
71% accuracy, 60% sensitivity and 78% specificity.
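Because the test set has a fixed composition (10 positives, 15 negatives), overall accuracy is the cohort-size-weighted average of sensitivity and specificity. The short check below is an illustrative sketch, not part of the original study; it shows that the survey's three figures are mutually consistent, assuming the reported rates were pooled over all expert answers (the survey summary above does not say how they were aggregated).

```python
def accuracy_from_rates(sensitivity: float, specificity: float,
                        n_pos: int, n_neg: int) -> float:
    """Overall accuracy implied by per-cohort rates on a fixed test set.

    accuracy = (sensitivity * n_pos + specificity * n_neg) / (n_pos + n_neg)
    """
    correct = sensitivity * n_pos + specificity * n_neg
    return correct / (n_pos + n_neg)


# Survey test set: 10 AS patients (positive), 15 other syndromes (negative).
acc = accuracy_from_rates(0.60, 0.78, n_pos=10, n_neg=15)
print(f"{acc:.3f}")  # 0.708, i.e. about 71%
```

The same formula applied to 80% sensitivity and 100% specificity gives (8 + 15) / 25 = 92%.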
TABLE III
RESULTS ON THE BINARY PROBLEM OF DETECTING ANGELMAN SYNDROME PATIENTS
Method              Accuracy    Sensitivity    Specificity
Bird et al. [56]    71%         60%            78%
DeepGestalt         92%         80%            100%
DeepGestalt was evaluated on the same test set and achieved
a recognition rate of 92% accuracy, 80% sensitivity and 100%
specificity (Table III), reducing the error rate from 29% to 8%,
a relative reduction of more than 72%.
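The relative error reduction can be checked directly. With 10 positive and 15 negative test images, 80% sensitivity and 100% specificity correspond exactly to 8 true positives, 2 false negatives, 15 true negatives and 0 false positives. The sketch below (illustrative helper names, not from the paper) recomputes the metrics from those counts and compares the error rate against the survey's 71% accuracy.

```python
def binary_metrics(tp: int, fn: int, tn: int, fp: int) -> dict:
    """Accuracy, sensitivity and specificity from confusion-matrix counts."""
    return {
        "accuracy": (tp + tn) / (tp + fn + tn + fp),
        "sensitivity": tp / (tp + fn),  # recall on the positive (AS) cohort
        "specificity": tn / (tn + fp),  # recall on the negative cohort
    }


def relative_error_reduction(acc_baseline: float, acc_new: float) -> float:
    """Fraction of the baseline error rate removed by the new method."""
    err_base, err_new = 1.0 - acc_baseline, 1.0 - acc_new
    return (err_base - err_new) / err_base


m = binary_metrics(tp=8, fn=2, tn=15, fp=0)
print(m)  # {'accuracy': 0.92, 'sensitivity': 0.8, 'specificity': 1.0}
print(f"{relative_error_reduction(0.71, 0.92):.1%}")  # 72.4%
```

The error rate drops from 29% to 8%, so 21 of every 29 baseline errors are removed, which is where the "more than 72%" figure comes from.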
B. Specialized Gestalt Model
In this section, we describe how DeepGestalt may be used
for a small-scale problem with only a small number of images
per cohort. We focus on distinguishing between the
molecular subtypes of a syndrome that is genetically heterogeneous and arises from mutations in genes of the same signaling
pathway.
We use this experiment as an example of a specialized
Gestalt model, aimed at predicting the correct genotype of
patients whose phenotypes differ only subtly.
In 2010, Allanson et al. published "The face of Noonan
syndrome: Does phenotype predict genotype?" [57]. They explored whether dysmorphology experts could predict the Noonan
syndrome-related genotype from the facial phenotype. They
presented a set of 81 images of Noonan syndrome patients
to two dysmorphologists. The patients' genotypes had been
confirmed molecularly, with mutations in PTPN11, SOS1, RAF1 and KRAS.
The task was to predict the correct genotype from a facial image.
They concluded that experts in the field could not succeed
in this task, as stated in the article's abstract: "Thus, the facial
phenotype, alone, is insufficient to predict the genotype, but
certain facial features may facilitate an educated guess in some
cases".
[Composite photo panels: KRAS, PTPN11, RAF1, SOS1, RIT1]
Fig. 3. Composite photos of Noonan syndrome patients with different genotypes show subtle differences, such as less prominent eyebrows in individuals
with an SOS1 mutation, which might reflect the previously recognized sparse
eyebrows as an expression of the more notable ectodermal findings associated
with mutations in this gene.
We examine whether the technology described in this paper
can perform better, and propose a novel way to harness the
Gestalt model to predict the correct genotype.
Fig. 4. Test set confusion matrix for the Specialized Gestalt Model
We collected images of patients diagnosed with Noonan syndrome and molecularly confirmed to carry a mutation in one of
the following genes: PTPN11, SOS1, RAF1, RIT1 and KRAS.
All images were annotated by experts and were taken either
from published articles or from the internal Face2Gene phenotype
database. A set of 25 images, five per gene, was
excluded from training and used as the test set. These images
were curated from [57], [58], [59], [60], [61], [62]. To illustrate
the general appearance of each cohort, we create composite
photos by averaging the images of each cohort (Fig. 3).
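The section does not describe how the composites are produced beyond averaging. A minimal sketch, assuming the face images are already aligned (for example, by facial landmarks) and cropped to a common size, is a plain pixel-wise mean; the function name below is hypothetical and this is not necessarily the authors' pipeline.

```python
import numpy as np


def composite_image(images: list[np.ndarray]) -> np.ndarray:
    """Pixel-wise mean of a cohort of face images.

    Assumes every image is already aligned and cropped to the same shape;
    without alignment the average is mostly blur.
    """
    if not images:
        raise ValueError("need at least one image")
    shapes = {img.shape for img in images}
    if len(shapes) != 1:
        raise ValueError(f"images must share one shape, got {shapes}")
    # Average in float to avoid uint8 overflow, then convert back to 8-bit.
    stack = np.stack([img.astype(np.float64) for img in images], axis=0)
    mean = stack.mean(axis=0)
    return np.clip(np.rint(mean), 0, 255).astype(np.uint8)


# Toy example: two 2x2 grayscale "images".
a = np.array([[0, 100], [200, 255]], dtype=np.uint8)
b = np.array([[50, 100], [0, 255]], dtype=np.uint8)
print(composite_image([a, b]))
```

Accumulating in floating point matters here: summing uint8 arrays directly would wrap around at 255 and corrupt bright regions.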
Using the framework described above, we train a full DeepGestalt model on this specialized dataset together with our internal dataset. The specialized Gestalt model is a
truncated version of the full model and predicts only the five
desired classes. The resulting model is then applied to the
test set and achieves a top-1 accuracy of 64% (16 of 25 images). This is more
than three times the 20% expected from random guessing among five classes. The
confusion matrix for this test set is presented in Fig. 4. A
similar study using our technology, with comparable results, is
reported in [63].
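The text does not spell out how the full model is truncated. One plausible reading, assumed here, is that the full model's output scores are restricted to the five target genes and the softmax is renormalized over them; the class positions and the 8-class toy model below are hypothetical. The sketch also computes a confusion matrix (rows are true classes, columns are predicted classes, as in a plot like Fig. 4) and top-1 accuracy, and reproduces the comparison with chance: 64% of 25 images is 16 correct, versus 5 expected from uniform guessing.

```python
import numpy as np

# Output order of the specialized model (the order itself is illustrative).
GENES = ["PTPN11", "SOS1", "RAF1", "RIT1", "KRAS"]


def truncated_predict(full_logits: np.ndarray, class_idx: list[int]) -> np.ndarray:
    """Keep only the target-class columns of the full model and renormalize.

    full_logits: (n_images, n_full_classes) scores from the full model.
    class_idx:   hypothetical column positions of the target classes.
    Returns softmax probabilities over the selected classes only.
    """
    sub = full_logits[:, class_idx]
    sub = sub - sub.max(axis=1, keepdims=True)  # stabilize the exponent
    exp = np.exp(sub)
    return exp / exp.sum(axis=1, keepdims=True)


def confusion_matrix(y_true, y_pred, n_classes: int) -> np.ndarray:
    """Counts with rows = true class and columns = predicted class."""
    cm = np.zeros((n_classes, n_classes), dtype=int)
    for t, p in zip(y_true, y_pred):
        cm[t, p] += 1
    return cm


def top1_accuracy(cm: np.ndarray) -> float:
    """Fraction of images whose highest-scoring class is the true class."""
    return float(np.trace(cm) / cm.sum())


# Toy input: one image scored by a hypothetical 8-class full model.
logits = np.array([[0.0, 2.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0]])
target_cols = [1, 3, 4, 6, 7]  # hypothetical positions of the five genes
probs = truncated_predict(logits, target_cols)
print(GENES[int(probs.argmax(axis=1)[0])])  # PTPN11

# Reported result: 64% top-1 accuracy on 25 test images, 5 per gene.
n_test = 25
correct = round(0.64 * n_test)            # 16 images
expected_by_chance = n_test / len(GENES)  # 5 images with uniform guessing
print(correct, correct / expected_by_chance)  # 16 3.2
```

Restricting the output this way lets the specialized model reuse everything the full model learned about facial appearance, while only ever choosing among the five genes of interest.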
Besides the phenotypes caused by mutations in
the MAP kinase pathway, DeepGestalt has also been used
to analyze two further groups of molecular pathway disorders that are
known for their high phenotypic similarity. In GPI-anchor
biosynthesis deficiencies, DeepGestalt was able to reproduce
the phenotypic substructure already delineated by
expert clinicians and, beyond that, to deduce significant gene-specific phenotypes [64]. For five metabo