J Autism Dev Disord (2016) 46:3006–3022 DOI 10.1007/s10803-016-2843-0
ORIGINAL PAPER
Using the PDD Behavior Inventory as a Level 2 Screener: A Classification and Regression Trees Analysis
Ira L. Cohen1 • Xudong Liu2 • Melissa Hudson2 • Jennifer Gillis3 • Rachel N. S. Cavalari3 • Raymond G. Romanczyk3 • Bernard Z. Karmel4 • Judith M. Gardner4
Published online: 18 June 2016 © Springer Science+Business Media New York 2016
Abstract In order to improve discrimination accuracy between Autism Spectrum Disorder (ASD) and similar neurodevelopmental disorders, a data mining procedure, Classification and Regression Trees (CART), was used on a large multi-site sample of PDD Behavior Inventory (PDDBI) forms on children with and without ASD. Discrimination accuracy exceeded 80 %, generalized to an independent validation set, and generalized across age groups and sites, and agreed well with ADOS classifications. Parent PDDBIs yielded better results than teacher PDDBIs but, when CART predictions agreed across informants, sensitivity increased. Results also revealed three subtypes of ASD: minimally verbal, verbal, and atypical; and two, relatively common subtypes of non-ASD children: social pragmatic problems and good social skills. These subgroups corresponded to differences in behavior profiles and associated bio-medical findings.
Keywords Level 2 screeners · Autism Spectrum Disorder · Decision trees · Data mining · Machine learning · Seizures · Monoamine Oxidase A · Genotype · Phenotype · Subgroups
Corresponding author: Ira L. Cohen, [email protected]
1 Department of Psychology, New York State Institute for Basic Research in Developmental Disabilities, 1050 Forest Hill Road, Staten Island, NY 10314, USA
2 Queen’s Genomics Lab at Ongwanada, Ongwanada Resource Center, Department of Psychiatry, Queen’s University, 191 Portsmouth Ave, Kingston, ON K7M 8A6, Canada
3 Department of Psychology, Binghamton University - State University of N.Y., Binghamton, NY 13902-6000, USA
4 Department of Infant Development, New York State Institute for Basic Research in Developmental Disabilities, 1050 Forest Hill Road, Staten Island, NY 10314, USA
Introduction
Autism Spectrum Disorder (ASD) is a behaviorally defined condition with both etiological and phenotypic heterogeneity (Benvenuto et al. 2009; Gillberg and Coleman 2000; Wenger et al. 2016). Numerous attempts have been made to identify specific behavioral subgroups within this heterogeneity (Wing and Gould 1979). A recent rigorous multiple measure attempt (Eagle et al. 2010), using finite mixture modeling analysis, produced two latent components or subgroups. The groups did not differ on autism symptom severity, but did differ significantly in IQ, receptive language, and social interactions. The two groups did not differ by age or proportion of males or females, but differed in terms of diagnoses (Autistic Disorder vs. Asperger’s Disorder and PDD-NOS). Given that these previous disorders were separate in DSM-IV TR and are now a single disorder, ASD, in DSM-5, screening for individuals who might fall under this rubric is challenging.
Screening is critically important to the process of identifying children with ASD since research demonstrates the importance of behavioral intervention in mitigating severity and improving outcome (Fein et al. 2013; Tonge et al. 2014; Estes et al. 2015; Mark and Civic 2015; Mohammadzaheri et al. 2015; Kasari et al. 2015; Schreibman et al. 2015). Such identification is usually a two-step process in which a concerned parent/caregiver, pediatrician, or other professional may suspect a problem (which may be confirmed by a Level 1 screening instrument), which then prompts diagnostic confirmation through a more thorough comprehensive evaluation that
may include a Level 2 screener [designed for such “at-risk” children (Zwaigenbaum et al. 2015)]. Typical metrics for such screeners include: (1) sensitivity (SE) (the percent of cases with ASD classified by the instrument as ASD); (2) specificity (SP) (the percent of cases without ASD classified as not having ASD); (3) positive predictive validity (PPV) (the percent of cases classified as ASD by the instrument that actually have ASD); and (4) negative predictive validity (NPV) (the percent of cases classified as not having ASD by the instrument that actually do not have ASD).
There have been a number of research and review articles examining the efficacy of both Level 1 and Level 2 screeners for ASD (Stephens et al. 2012; Dereu et al. 2012; Allison et al. 2012; Schanding et al. 2012; Yama et al. 2012; Kamio et al. 2014; Wiggins et al. 2014; Fein et al. 2016; Wetherby et al. 2008; Bolte et al. 2008; Hampton and Strand 2015; Zwaigenbaum et al. 2015). These reports indicate that efficacy can vary depending on a host of factors including the content of the instrument, the age at which the instrument is employed, whether the instrument is designed for Level 1 or Level 2 screening, and the type of metric to be maximized. Unlike SE and SP, for example, both PPV and NPV are dependent upon the base rate of the disorder in the screening sample. Since ASD is a relatively infrequent condition in the general population, PPV can be quite low even with good or even excellent SE and SP indices identified from small clinical samples. Since Level 1 screening is usually targeted at identification of ASD in the general population, it can therefore misclassify a high number of cases without ASD as having the disorder. Further complicating the issue is the fact that ASD is a moving target since signs may emerge at different ages, which would have an effect on SE. For example, Stenberg et al. 
(2014) found, in a population-based sample, that screening for ASD with the M-CHAT at 18 months picked up only about one-third of children later identified as having the disorder, these most likely the ones more severely affected. Therefore, from a practical point of view, any concern regarding a child’s development from parents or professionals, irrespective of whether or not a Level 1 screening was done, should prompt more formal assessment. Level 2 instruments are concerned with this much smaller pre-identified group and therefore the low base rate of ASD is less of an issue. Instead, specificity is important; hence the need for well-validated Level 2 screeners. However, there are additional concerns regarding accurate identification of ASD since results can vary with the characteristics of the contrasting samples who come to the attention of diagnosticians. For example, when children with ASD are contrasted with an unaffected comparison sample such as typically developing siblings (Schanding et al. 2012), both SE and SP can be quite high but when contrasted with a more clinically defined sample, both SE and SP often suffer (Eaves et al. 2006; Muratori et al. 2011; Havdahl et al. 2016). In a previous study, it was found that the PDD
Behavior Inventory (PDDBI), a rating instrument originally designed to assess intervention outcome (Cohen et al. 2003b), had an SE of 100 % and an SP of 79 % for the Autism Composite score when children with ASD were contrasted with Not-ASD cases who were typically developing (Cohen et al. 2010). However, results did not fare as well in a separate investigation where ASD cases were contrasted with a clinically defined sample, some of whom were positive for ASD on the ADOS (Reel et al. 2012) (SE = 74 % and SP = 62 %). These percentages were higher in their subset of cases with non-verbal IQs ≤70: SE = 92 % and SP = 67 %.
As others have noted (Gillberg 2014), there frequently exists overlap in behavioral characteristics and co-morbidity amongst DSM disorders, and this is especially so for ADHD (Reiersen 2011; Antshel et al. 2013; van der Meer et al. 2012) which, unlike in the past, can now be diagnosed as co-occurring with ASD. It is suggested, therefore, that attempts to discriminate disorders from each other that are highly similar on multiple dimensions based on a single additive summary cut-off score from a rating scale may not be the most fruitful approach in this endeavor. In an earlier publication, it was argued that the use of machine learning tools such as neural networks would be helpful in this regard (Cohen et al. 1993) and preliminary data were quite promising. Briefly, neural networks are computer simulations of simple nervous systems and are instantiated as neurons and synapses created through software. They are designed to learn by a variety of rules that alter the strength of the synaptic connections among neurons in the network and can learn very complex problems. However, the manner in which neural networks arrive at their “decisions” can be difficult to ascertain as the strengths of the synaptic connections exist in a complex, non-linear multidimensional space. 
So while neural networks are capable of complex forms of learning, ascertaining the rules or patterns they use to solve these problems can be quite opaque. Another machine learning approach which yields more transparent decisions is the Classification and Regression Trees (CART) algorithm (Breiman et al. 1984; Nisbet et al. 2009). Basically, CART is a non-linear, non-parametric algorithm that builds up a series of ‘‘if–then’’ logical partitions on user selected input variables to predict group membership (in the case of classification analysis) resulting in a ‘‘tree’’ in which the branches represent the partitions and the nodes (i.e., the ‘‘leaves’’) the resultant categories. Thus, the decision processes are much easier to see than those in a neural network. For each binary partition (e.g., ASD vs. Not-ASD), all predictor variables are assessed for which of them results in the greatest improvement in a goodness of fit measure [e.g., the Gini index (Breiman et al. 1984), a measure directly proportional to the area under the curve in an ROC analysis
(Dell Inc. 2015)]. The partitioning procedure is repeated recursively at each node resulting in a tree that could theoretically grow until all cases are perfectly classified. Unfortunately, due to various sources of error, often idiosyncratic to the data on which the CART is “trained”, such an approach is unrealistic and the resultant model will likely fail to generalize to other data sources. In order to prevent such “over-fitting”, various procedures have been developed to decide when to stop partitioning. These include setting a minimum fraction of cases at each node and examining the ability of the tree to generalize its predictions to a separate data (“test”) set.
The use of CART to discriminate ASD from ADHD-like cases has been explored (Cohen 2013) and this type of data mining has been used to help in predicting outcome in “baby sib” studies (Chawarska et al. 2014; Macari et al. 2012). These studies resulted in reasonably good group classification and yielded clinically relevant sub-groups. Based on the above findings, the present study was focused on using CART with parent and teacher PDDBI data from a large, multi-site sample of children. It was predicted that this methodology would result in a diagnostic algorithm that would: (1) maintain SE and improve SP; (2) validate the use of the PDDBI as a Level 2 screening tool; and (3) yield potentially meaningful subgroups of children with and without ASD in order to better help define these conditions. Meaningfulness here was determined by the extent to which subgroups identified by CART were uniquely associated with a variety of developmental-behavioral and bio-medical factors.
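The recursive partitioning idea described above can be illustrated with a minimal sketch of a single split search. It assumes a weighted Gini impurity as the goodness-of-fit measure and scans every observed value of one predictor as a candidate cut-off; the function names and the toy scores are hypothetical, and this is not the Statistica implementation used in the study.

```python
from collections import Counter


def gini(labels, weights):
    """Weighted Gini impurity: 1 minus the sum of squared class proportions."""
    total = sum(weights)
    if total == 0:
        return 0.0
    mass = Counter()
    for label, weight in zip(labels, weights):
        mass[label] += weight
    return 1.0 - sum((m / total) ** 2 for m in mass.values())


def best_split(values, labels, weights=None):
    """Return (cutoff, impurity_decrease) for the best 'value <= cutoff' partition."""
    if weights is None:
        weights = [1.0] * len(values)
    total = sum(weights)
    parent = gini(labels, weights)
    best = (None, 0.0)
    # The largest value is skipped because it would leave the right branch empty.
    for cutoff in sorted(set(values))[:-1]:
        left = [i for i, v in enumerate(values) if v <= cutoff]
        right = [i for i, v in enumerate(values) if v > cutoff]
        w_left = sum(weights[i] for i in left)
        w_right = sum(weights[i] for i in right)
        child = (w_left / total) * gini([labels[i] for i in left],
                                        [weights[i] for i in left])
        child += (w_right / total) * gini([labels[i] for i in right],
                                          [weights[i] for i in right])
        gain = parent - child
        if gain > best[1]:
            best = (cutoff, gain)
    return best


# Hypothetical T-scores on one domain, for illustration only.
scores = [40, 45, 50, 55, 70, 75, 80]
groups = ["ASD", "ASD", "ASD", "ASD", "Not-ASD", "Not-ASD", "Not-ASD"]
cutoff, gain = best_split(scores, groups)
```

A full tree would repeat this search over all predictors at each node and stop when a rule such as a minimum node size is met.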
Inclusion criteria were as follows:
1. ASD (n = 535) and OTHER (n = 76) cases had to have DSM-IV or DSM-IV TR clinical diagnoses that were also supported by ADI and/or ADOS data (Autism or Spectrum counted as ASD here). These cases were from the BAR, Q-GLO, and ICD sites. OTHER cases consisted of children from the BAR and ICD sites who had clinical diagnoses indicating problems with behavior, and/or language/intellectual development and who also scored negative for ASD on the ADOS-G and/or ADI-R. Cases that did not match the ADI or ADOS results or for whom such “gold standard” scores were not available or were not applicable because of the age and language age requirements of these instruments were excluded so as to maximize performance of the model that was developed (see below). There were 83 such cases and the ability of the model to predict their diagnoses is described in “Results”.
2. TYPICALS (n = 49) consisted of 18 unaffected siblings of children with ASD, one child who was reported to have had ASD in the past but no longer showed signs of ASD on diagnostic evaluation with the ADOS-G (all from the BAR site), and the rest, from the INFANT source, scored in the typically developing range on a variety of developmental scales. The unaffected siblings were older (mean age = 6.6 years) than the children from the INFANT source (mean age = 3.4 years) but their mean PDDBI T-scores did not statistically differ from the INFANT cases once age differences were statistically controlled (F = 0.21, p > 0.65), suggesting that the siblings were not perceived by their parents to have broader phenotype issues.
Methods
Participants
A total of 660 participants between 1.5 and 13 years of age were in the sample. All had complete PDDBI forms with no missing data (660 parent and 263 teacher forms) and consisted of cases with ASD, cases referred for diagnostic evaluation where ASD was ruled out but who presented with a non-ASD developmental disorder (designated as OTHER), and a typically developing group (designated as TYPICAL). Sources were as follows: New York State Institute for Basic Research in Developmental Disabilities (IBR) Behavioral Assessment and Research Laboratory (BAR); IBR Infant Development Laboratory (INFANT); Queen’s Genomics Lab at Ongwanada (Q-GLO); and SUNY Binghamton Institute for Child Development (ICD). All research studies were approved by the source’s Institutional Review Board.
Almost all parent PDDBI forms were completed by mothers. Educational levels were available for 369 of them and 99 % had at least a high school education with 75 % having at least some college. Of 574 parents, 78 % reported that they were Caucasian. The teacher forms were completed by a variety of different professionals including special education teachers, teaching assistants, behavior analysts, and speech and language pathologists. Males were more prevalent than females in this dataset, as expected. Eighty-four percent of the ASD, 78 % of the OTHER and 57 % of the TYPICAL groups were males (χ²(2) = 17.9, p < .001). When comparing the ASD and OTHER groups, the gender proportions no longer significantly differed (χ²(1) = 2.0, ns). The OTHER and TYPICAL groups were approximately one year younger [mean (SD) = 4.6 (2.3) and 4.6 (2.4), respectively] than the ASD group [mean (SD) = 5.7 (2.9)]. In the quantitative analyses described below examining the
Table 1 PDDBI domain and autism composite descriptions

| Abbreviation | Description and characteristics |
| --- | --- |
| AWP | Approach-withdrawal problems dimension. Higher domain T-scores indicate greater severity |
| SENSORY | Sensory/perceptual approach behaviors: staring at objects, pica, repetitive toy play, hand flapping, etc. |
| RITUAL | Ritualisms/resistance to change: carrying out rituals or indicating dissatisfaction with a change in the environment or routine |
| SOCPP | Social pragmatic problems: problems reacting to the approaches of others, understanding social conventions, or initiating social interactions |
| SEMPP | Semantic/pragmatic problems: echolalia, perseverative language, unusual voice quality, etc. |
| AROUSE | Arousal regulation problems: emotional constriction, hyperactivity, sleeping problems, etc. |
| FEARS | Specific fears: fears and anxieties associated with withdrawal from social or asocial stimuli |
| AGG | Aggressiveness: aggressiveness toward self or others and associated negative mood states |
| REXSCA (a) | Receptive/expressive social communication abilities dimension. Higher domain scores indicate increasing levels of competence |
| SOCAPP | Social approach behaviors: non-vocal social communication skills such as paying attention, joint attention, effective use of gesture, imaginative skills, social play skills, imitation skills, etc. |
| EXPRESS | Expressive language: ability to speak sounds associated with the English language as well as competence with grammar, tone of voice, and conversational pragmatics |
| LMRL | Learning, memory, and receptive language: memory for locations and movement sequences, understanding possessives, prepositions, adverbs, etc. |
| AUTISM/C | Autism composite: a measure of lack of appropriate social communication skills along with repetitive/ritualistic behaviors |

(a) Each of these domains in the REXSCA dimension is highly correlated with tested IQ [Pearson r (n = 76) ranging from 0.63 to 0.77] and with the Vineland Communication Domain score [Pearson r (n = 238) ranging from 0.52 to 0.69] (Cohen and Sudhalter 2005)
characteristics of sub-groups identified by the classification tree, age served as a covariate. For the classification tree analysis itself all of the PDDBI predictors were age standardized on an ASD sample.
Genotyping for the X-Linked MAOA-uVNTR Polymorphism
Previous research had indicated that the 3-repeat uVNTR allele of the X-linked MAOA gene, linked to low activity of the enzyme monoamine oxidase A, was associated with increased severity of ASD and decreased cognitive skills in boys with ASD from simplex families (i.e., with no ASD siblings) (Cohen et al. 2003a). Seventy-nine boys with ASD from simplex families (a subset of the larger ASD sample) were studied. Twenty-six males had the low activity 3-repeat allele (33 %) and 53 (67 %) carried the 4-repeat high-activity allele, percentages similar to that seen in the general male population (cf. Cohen et al. 2003a). These results were used to examine associations between the 3-repeat allele and the CART subgroups for the ASD sample. All genotyping was performed blind to the behavioral assessments using methods described previously (Cohen et al. 2003a).
Seizure History and Neurogenetic Disorders
For BAR and Q-GLO cases, this information was available and abstracted from the introductory section of the PDDBI, which asks questions about co-morbid seizure history and other health related conditions. The term neurogenetic disorders refers to reports of abnormal neural development documented by neuroimaging and/or neurological evaluation and to genetic syndromes. These reports were used to evaluate the external validity of the subgroups identified by CART to medical conditions often associated with ASD.
Behavioral Assessments
PDD Behavior Inventory (PDDBI)
The PDDBI has been described in detail elsewhere (Cohen et al. 2010, 2003b; Cohen and Sudhalter 2005). Briefly, it is an informant-based rating instrument constructed, a priori, in a hierarchical manner, and divided into two behavioral dimensions: (a) Approach Withdrawal Problems (AWP); and (b) Receptive/Expressive Social Communication Abilities (REXSCA). Each dimension is composed of behavioral domains best reflecting that dimension. The PDDBI generates age-normed (1.5–12.5 years) T-scores [mean (SD) = 50 (10)] for each domain. The T-scores are normally distributed within the reference sample. Scores less than or greater than 50 indicate greater deviations from a typical case of autism. Brief descriptions of the domains and other scores in the parent version, age standardized on 369 well-diagnosed children, are presented in Table 1.
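Because the T-scores have a mean of 50 and an SD of 10 in the ASD reference sample, any T-score can be read as a distance from the ASD mean in SD units. The helper names below are hypothetical and the sketch assumes only the linear T-score definition stated above.

```python
def t_to_z(t_score, mean=50.0, sd=10.0):
    """Convert a T-score to SD units relative to the reference-sample mean."""
    return (t_score - mean) / sd


def z_to_t(z, mean=50.0, sd=10.0):
    """Convert SD units back to a T-score."""
    return mean + sd * z


# A T-score above 70 lies more than 2 SDs above the ASD reference mean.
z_for_72 = t_to_z(72)
```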
Cognitive and Adaptive Assessments
Cognitive assessments with the Griffiths Mental Development Scales (GMDS; Griffiths 1984) were obtained in a subset of 80 cases. The GMDS assesses six aspects of cognitive functioning in young children including locomotor (gross motor) skills, personal-social skills, hearing and speech, eye-hand coordination (fine motor skills), performance (non-verbal skills) and reasoning. Each skill area yields a developmental quotient (based on MA/CA) as well as an overall General Quotient (GQ). Adaptive skills were assessed with the Vineland Adaptive Behavior Scales (Sparrow et al. 1984) or Vineland Adaptive Behavior Scales-II (Sparrow et al. 2005) in 353 cases.
Classification Tree Analysis
Model Development
The General Regression and Classification Trees module from the Statistica 12 package (Dell Inc. 2015) was used; more details about its methods can be found in that source. The outcome variable to be predicted was group membership (ASD vs. Not-ASD), i.e., the ASD cases were contrasted with the combined TYPICAL and OTHER groups (designated Not-ASD). This was done to maximize the sample size of the Not-ASD sample thereby allowing the classification tree algorithm to identify these two groups based on their profiles of PDDBI T-scores and isolate them from the ASD group. Also, no attempt was made to match these groups on cognitive or adaptive skills. Instead, we assumed the algorithm would be able to identify which patterns were associated with low or high levels of functioning since some of the PDDBI domains (see Table 1 note) are highly correlated with adaptive and cognitive ability within the ASD standardization sample. Since specificity is arguably more important than sensitivity for Level 2 screeners, the parameters of the model were adjusted so as to maximize the influence of the Not-ASD group on the cut-off scores adopted by the algorithm. First, prior probabilities for group membership were set to 0.5 thereby giving equal weight to the Not-ASD group even though the sample size was smaller. In addition, Not-ASD cases were weighted more heavily (approximately 2:1) than ASD cases. This resulted in an effective doubling of Not-ASD cases thereby increasing their contribution to the derivation of cut-off scores with the aim of improving specificity rates. Predictor variables were informant form (parent or teacher form) and all domain T-scores from the PDDBI except for the LMRL domain. The latter was not included since, when the intercorrelation matrix was examined, it was highly correlated (and hence redundant) with the SOCAPP (r = 0.75) and EXPRESS (r = 0.77) domains. All other domain T-score intercorrelations were less than 0.70. Nine
hundred twenty-three parent and teacher PDDBI forms were used in the CART analyses. The data used to train the classification tree algorithm constituted the training set. A separate test set was used to evaluate generalization during the model building process. Since both the training and test sets were used to develop the model, it was important to have an independent validation set (not used in the model building process) to evaluate generalization. In order to limit the size of the tree, thereby reducing the effects of idiosyncratic scores and improving generalization, the Fraction of Objects (FACT) stopping rule was applied such that the tree stopped growing when a node contained less than N forms and/ or contained less than 5 % of one of the two groups in the data (this defined the terminal node). A variety of models were built, varying the size of the stopping rule in terms of the minimum number of cases at which a node would be split, and the tree that resulted in the best generalization to the test set was selected. The optimal stopping size for this data set was 100 forms (weighted) as this stopping size maximized test set generalization. For the above analyses, the data were initially sorted by group, informant form, and age. Then, for each group and form, cases were sequentially assigned codes as follows: 1, 1, 2, 3, 1, 1, 2, 3 and so on, where 1 refers to the training set data, 2 to the test set data and 3 to the validation set data. This resulted in approximately 25 % of the cases assigned to the validation set. Of the remaining forms 67 % were thus assigned to the training set and 33 % to the test set.
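The sequential assignment scheme described above can be sketched as follows. This is a minimal illustration, assuming that the 1, 1, 2, 3 pattern restarts within each group-by-informant block after sorting by age; the field names and the toy records are hypothetical.

```python
from collections import Counter
from itertools import cycle, groupby

SPLIT_CODES = (1, 1, 2, 3)  # 1 = training, 2 = test, 3 = validation


def assign_splits(forms):
    """Sort forms by group, informant form, and age, then assign split codes.

    Each form is a dict with 'group', 'informant', and 'age' keys
    (hypothetical field names). Returns a list of (form, code) pairs.
    """
    ordered = sorted(forms, key=lambda f: (f["group"], f["informant"], f["age"]))
    assigned = []
    for _, block in groupby(ordered, key=lambda f: (f["group"], f["informant"])):
        codes = cycle(SPLIT_CODES)  # assumption: pattern restarts per block
        for form in block:
            assigned.append((form, next(codes)))
    return assigned


# Twelve hypothetical ASD parent forms with increasing ages.
forms = [{"group": "ASD", "informant": "parent", "age": 2.0 + 0.5 * i}
         for i in range(12)]
counts = Counter(code for _, code in assign_splits(forms))
validation_share = counts[3] / len(forms)
training_share_of_rest = counts[1] / (counts[1] + counts[2])
```

With this pattern one form in four goes to the validation set, and two of every three remaining forms go to the training set, which matches the approximate 25 %, 67 %, and 33 % proportions reported in the text.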
Results The results described below are divided into several sections. First, we describe the classification tree that emerged from the analyses in terms of its features; its ability to predict group membership in the validation set for each type of informant taken separately, and for overall ability to predict membership in the ASD, OTHER and TYPICAL groups across informants, age group, ADOS grouping, and data source. Next, the various subgroups that emerged from the analysis were examined for their association with PDDBI domain scores, Vineland scores, Griffiths DQ scores, parent reported seizure history, parent reported neuro-genetic comorbidity (i.e., known genetic syndromes or neurological conditions, described below) and, in ASD males from simplex families, their association with the X-linked 3-repeat monoamine oxidase-A VNTR allele which was found to be linked to ASD severity (Cohen et al. 2011, 2003a, b), as noted above. Finally, we present data on sex ratios and early development for the various subgroups.
Classification Tree Features and Accuracy Across Data Sets
Examination of the terminal nodes revealed three types of ASD subgroups and seven types of Not-ASD subgroups in the training set. The initial split was on non-verbal social skills (SOCAPP) which can be continued into the rightmost split identifying those with very high scores in this domain (>2 SDs from the ASD mean) and a separate group with scores that were somewhat lower (between 67 and 72), both linked to the Not-ASD group. The latter node was not split further because there were only 65 forms (weighted N) comprising it, less than the 100 forms cut-off in the stopping rule, and so it became a terminal node. At that node (ID = 19), the numbers of ASD cases were higher than Not-ASD cases but their proportion relative to all ASD cases was lower than Not-ASD cases in the training set and so CART identified that node as belonging to the Not-ASD group. The leftmost branches identified various subgroups which were characterized by much
Classification Tree
Figure 1 shows the final classification tree, which can be read as a flow chart. It consists of 10 terminal nodes and used 6 of the PDDBI domain T-scores: SENSORY, RITUAL, SEMPP, AGG, SOCAPP and EXPRESS. The bar graphs show the weighted frequency of forms in each group comprising that node. Informant form was not selected by the algorithm as a predictor. Ns refer to the weighted numbers of forms at a given node and, for the Not-ASD group, are inflated slightly due to the extra weight assigned to that group. ID numbers for the nodes refer to the successive splits that took place as the tree grew in size.
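Since the tree can be read as a flow chart, it can also be written as nested if-then rules. The sketch below is a reading of the cut-offs and node IDs that appear in the Fig. 1 node labels; the function name and the example profile are hypothetical, the branch-to-node mapping is an interpretation of the figure, and the code is an illustration rather than a clinical scoring tool.

```python
def classify_form(t):
    """Map a dict of PDDBI domain T-scores to (terminal_node_id, predicted_group).

    Cut-offs are as read from the Fig. 1 node labels (an interpretation).
    """
    if t["SOCAPP"] > 66:
        # High non-verbal social skills: both terminal nodes are Not-ASD.
        return (18, "Not-ASD") if t["SOCAPP"] <= 72 else (19, "Not-ASD")
    if t["EXPRESS"] <= 59:
        # Lower expressive language branch.
        if t["AGG"] > 77:
            return (7, "Not-ASD")
        return (8, "ASD") if t["SENSORY"] <= 44 else (9, "ASD")
    # Better expressive language branch.
    if t["AGG"] > 69:
        return (11, "Not-ASD")
    if t["SEMPP"] <= 41:
        return (12, "Not-ASD")
    if t["RITUAL"] > 76:
        return (15, "Not-ASD")
    return (16, "Not-ASD") if t["AGG"] <= 39 else (17, "ASD")


# Hypothetical T-score profile, for illustration only.
example = {"SOCAPP": 45, "EXPRESS": 40, "AGG": 55,
           "SENSORY": 60, "SEMPP": 50, "RITUAL": 50}
node_id, predicted = classify_form(example)
```

Written this way, each terminal node corresponds to exactly one path of conditions, which is what makes the CART decisions easier to inspect than those of a neural network.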
Fig. 1 This graph shows the classification tree that emerged using CART. It consists of 10 terminal nodes with multiple branches. It can be read as a flow chart starting at the top. Thus, the first classification used the SOCAPP T-score to separate ASD from Not-ASD cases using a cutoff score of 66. Those forms with SOCAPP T-scores ≤66 were then examined for their EXPRESS T-scores and those forms with T-scores ≤59 were assigned to the ASD group, those without to the Not-ASD group, and this partitioning continued as indicated. Ns at each node refer to the weighted Ns and ID numbers refer to the successive leaves as the tree grew in size during the learning process (see text)
Node structure recovered from the figure labels (number of non-terminal nodes: 9; number of terminal nodes: 10):
ID=1 (N=591, NOT-ASD), split on SOCAPP
  SOCAPP <= 66: ID=2 (N=469, ASD), split on EXPRESS
    EXPRESS <= 59: ID=4 (N=337, ASD), split on AGG
      AGG <= 77: ID=6 (N=325, ASD), split on SENSORY
        SENSORY <= 44: ID=8 (N=98, ASD), terminal
        SENSORY > 44: ID=9 (N=227, ASD), terminal
      AGG > 77: ID=7 (N=12, NOT-ASD), terminal
    EXPRESS > 59: ID=5 (N=132, NOT-ASD), split on AGG
      AGG <= 69: ID=10 (N=120, ASD), split on SEMPP
        SEMPP <= 41: ID=12 (N=14, NOT-ASD), terminal
        SEMPP > 41: ID=13 (N=106, ASD), split on RITUAL
          RITUAL <= 76: ID=14 (N=103, ASD), split on AGG
            AGG <= 39: ID=16 (N=15, NOT-ASD), terminal
            AGG > 39: ID=17 (N=88, ASD), terminal
          RITUAL > 76: ID=15 (N=3, NOT-ASD), terminal
      AGG > 69: ID=11 (N=12, NOT-ASD), terminal
  SOCAPP > 66: ID=3 (N=122, NOT-ASD), split on SOCAPP
    SOCAPP <= 72: ID=18 (N=65, NOT-ASD), terminal
    SOCAPP > 72: ID=19 (N=57, NOT-ASD), terminal
greater delays in non-verbal social (SOCAPP) and expressive (EXPRESS) language skills (more similar to T-scores characteristic of cases with ASD) along with problems with aggressiveness (AGG) and sensory behaviors (SENSORY). Nodes 8 and 9 identified two sub-types of ASD and node 8 was not split further, again because of the stopping rule (there were <100 forms comprising that node in the training set). The middle branches identified cases with better expressive language skills but with problems with aggressiveness (AGG), ritualistic behavior (RITUAL) and pragmatic language problems (SEMPP). The unweighted relative frequency of the terminal nodes and the mean (SE) age of the children in each of the terminal nodes in the entire data set are shown in Table 2 separated by informant form. The relative frequencies of each node were similar across informants indicating that these subgroups were prevalent to the same degree in both parent and teacher forms. Of the seven Not-ASD terminal nodes, 5 (Nodes 7, 11, 12, 15 and 16) represented rare subgroups, accounting for less than 4 % of the total parent or teacher sample. Three of these nodes (7, 11 and 15) were relative outliers in that the T-scores for the AGG and RITUAL domain cut points were >2 SDs from that expected for children with ASD. Most likely these represented idiosyncratic over-reporting parent/teacher rating styles but they were present in data from all sources. It is also possible, however, that these reflect a small subset of cases with severe behavior problems (Cohen 2013).
Overall Classification Results
In order to characterize the classification results, the following criteria were adopted: <70 (poor); 70–79 (fair); 80–89 (good); and 90–100 (excellent) (Cicchetti et al. 1995). This classification tree performed well in
classifying ASD and Not-ASD groups in all data sets, as shown in Table 3, which shows SE, SP, and their 95 % confidence intervals across training, test, and validation sets for the combined parent and teacher data and for each informant separately. Sensitivity and specificity results for the parent data ranged from good to excellent but SP scores were poor for the teacher forms and percentages became progressively worse in the test and validation sets. This may have been, in part, related to the fact that only two TYPICAL cases had teacher forms. However, as shown in Table 4, parent SP results remained good when the Not-ASD group was split into OTHER and TYPICAL groups, with the latter perfectly classified. It is more likely that the sample size of the Not-ASD teacher data (including typical cases) was not large enough for the algorithm to determine optimal domain cut-point scores for these informants. As shown in Table 5, for the parent forms, overall Positive Predictive Validity (PPV) for ASD versus OTHER was excellent (97 %) but Negative Predictive Validity (NPV) was poor (41 %), due to the differences in relative frequencies of the groups which affects predictive validity measures as noted above. For this reason, the table also shows the positive (LR+) and negative (LR-) likelihood ratios for these same comparisons (D. Robins, personal communication). LR+ is defined as the ratio of sensitivity to the false positive error rate (SE/(1 - SP)) (Jekel et al. 2007), in much the same way that an ROC curve is determined by plotting SE versus (1 - SP), while LR- is the ratio of the false negative error rate to the specificity ((1 - SE)/SP). The larger the LR+ and the smaller the LR- (i.e., the closer it is to zero), the better the test. Both of these measures are independent of the prevalence of the condition that the test is designed to predict.

Table 2 Percentage of all children with parent PDDBIs or teacher PDDBIs at each terminal node and mean (SE) ages of the children in years

| Terminal node | Parent (% of 660 cases) | Teacher (% of 263 cases) | Mean (SE) age |
| --- | --- | --- | --- |
| 7 | 2 | 2 | 3.9 (0.5) |
| 8 | 18 | 23 | 6.0 (0.3) |
| 9 | 36 | 40 | 5.7 (0.2) |
| 11 | 2 | 0 | 4.3 (0.5) |
| 12** | 2 | 2 | 7.0 (3.1) |
| 15 | <1 | <1 | 2.9 (0.5) |
| 16 | 3 | 4 | 5.3 (0.7) |
| 17 | 16 | 13 | 5.8 (0.3) |
| 18 | 12 | 9 | 5.3 (0.3) |
| 19* | 8 | 6 | 3.5 (0.2) |

Relative terminal node frequencies were similar across informants
* Significantly (Tukey HSD, p < .01) younger than nodes 8, 9, 12, 17, 18
** Significantly older than node 7
Table 3 Sensitivity (SE), specificity (SP) and 95 % confidence intervals for train, test, and validation sets for CART results by informant, along with positive (PPV) and negative (NPV) predictive validities

| Informant | Data set | SE (95 % CI) | N | SP (95 % CI) | N | PPV (%) | NPV (%) |
|---|---|---|---|---|---|---|---|
| Parent and teacher | Train | 82 (77–85) | 380 | 81 (71–87) | 93 | 95 | 52 |
| Parent and teacher | Test | 86 (80–90) | 185 | 79 (64–88) | 42 | 95 | 56 |
| Parent and teacher | Validation | 85 (79–90) | 183 | 78 (63–88) | 40 | 95 | 53 |
| Parent only | Train | 82 (77–86) | 272 | 88 (78–94) | 66 | 97 | 54 |
| Parent only | Test | 83 (76–89) | 132 | 87 (71–95) | 31 | 96 | 55 |
| Parent only | Validation | 86 (79–91) | 130 | 93 (78–98) | 29 | 98 | 60 |
| Teacher only | Train | 81 (72–87) | 108 | 63 (44–78) | 27 | 90 | 45 |
| Teacher only | Test | 92 (82–97) | 53 | 55 (28–79) | 11 | 91 | 60 |
| Teacher only | Validation | 83 (71–91) | 53 | 36 (15–65) | 11 | 86 | 31 |

Table 4 CART classification accuracy for parent PDDBI for train, test, and validation sets across diagnostic groups

| Informant | Data set | ASD % (95 % CI) | N | OTHER % (95 % CI) | N | TYPICAL % | N |
|---|---|---|---|---|---|---|---|
| Parent | Train | 82 (77–86) | 272 | 80 (66–90) | 41 | 100 | 25 |
| Parent | Test | 83 (76–89) | 132 | 79 (57–91) | 19 | 100 | 12 |
| Parent | Validation | 86 (79–91) | 130 | 88 (66–97) | 17 | 100 | 12 |

Teacher results from Table 3 apply here as well since there were only two TYPICAL cases with teacher forms

Table 5 CART classification accuracy collapsed across data sets for parents, teachers, and the 76 % of cases where CART predictions of ASD or Not-ASD agreed

| Informant | ASD (SE) % (95 % CI) | N | OTHER (SP) % (95 % CI) | N | LR+ (95 % CI) | LR- (95 % CI) |
|---|---|---|---|---|---|---|
| Parent* | 83 (80–86) | 534 | 82 (72–89) | 77 | 4.58 (2.85–7.37) | 0.20 (0.16–0.25) |
| Teacher** | 84 (79–88) | 214 | 53 (39–67) | 47 | 1.80 (1.32–2.45) | 0.30 (0.20–0.45) |
| P–T agree*** | 93 (87–96) | 168 | 81 (62–91) | 26 | 4.86 (2.21–10.69) | 0.08 (0.04–0.15) |

* PPV (ASD vs. OTHER) = 97 %, NPV (OTHER vs. ASD) = 41 %
** PPV (ASD vs. OTHER) = 89 %, NPV (OTHER vs. ASD) = 42 %
*** PPV (ASD vs. OTHER) = 97 %, NPV (OTHER vs. ASD) = 66 %
For the parent form, LR+ (95 % CI) was 4.58 (2.85–7.37). This corresponds to a small to moderate increase in likelihood that a child with ASD will have a predicted diagnosis of ASD. LR- (95 % CI) was 0.20 (0.16–0.25), or a small to moderate decrease in likelihood that a child with ASD will have a predicted diagnosis of Not-ASD (cf. http://omerad.msu.edu/ebm/diagnosis/diagnosis6.html). Teacher classifications did not fare as well.
Selecting only those cases in which informant CART classifications agreed yielded a sensitivity of 93 %, specificity of 81 %, PPV of 97 %, NPV of 66 %, LR+ of 4.86, and LR- of 0.08. Both LR+ and LR- now yielded moderate to large increases in likelihood of an ASD or Not-ASD diagnosis given the CART predictions.
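The likelihood ratios follow directly from the definitions given earlier, LR+ = SE/(1 - SP) and LR- = (1 - SE)/SP. As a rough check, the sketch below recomputes them in Python from the rounded SE and SP values in Table 5. The function name `likelihood_ratios` is an illustrative choice, and small differences from the published values (for example, 4.58 for the parent LR+) are to be expected, presumably because the published ratios were computed from unrounded counts.

```python
def likelihood_ratios(se, sp):
    """Return (LR+, LR-) for a test with sensitivity se and specificity sp."""
    if not (0 < sp < 1) or not (0 <= se <= 1):
        raise ValueError("se must be in [0, 1] and sp in (0, 1)")
    lr_pos = se / (1 - sp)      # sensitivity / false positive rate
    lr_neg = (1 - se) / sp      # false negative rate / specificity
    return lr_pos, lr_neg

# Rounded SE and SP values (proportions) taken from Table 5
reported = {
    "Parent": (0.83, 0.82),
    "Parent-teacher agreement": (0.93, 0.81),
}

results = {name: likelihood_ratios(se, sp) for name, (se, sp) in reported.items()}
for name, (lr_pos, lr_neg) in results.items():
    print(f"{name}: LR+ = {lr_pos:.2f}, LR- = {lr_neg:.2f}")
```

Neither ratio uses the group sizes, which is why these measures, unlike PPV and NPV, do not change with the relative frequency of ASD and OTHER cases in the sample.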
Combining Informants and Classification Accuracy

Although teacher forms did not yield impressive results, combining parent and teacher classifications that agreed with one another resulted in improvements in SE, PPV, NPV and LR- as shown in the last row of Table 5. Parent and teacher CART classifications agreed 76 % of the time.

Classification Results by Age Group

Children were grouped into three age cohorts: 1.5 to less than 3; 3–5; and 6–13 years. Overall classification accuracies for parents and teachers are shown in Table 6. For the parent forms, there was a significant improvement in classification accuracy for the 6–13 year cohort relative to age cohort 3–5 (χ²(1) = 11.16, p < .001) and relative to age cohort 1.5–<3 years (χ²(1) = 8.3, p < .004). Although SE and SP were similar for the youngest age groups, there was improvement in both LR+ and LR- as age increased. Interestingly, teacher results were similar to the parent results, but only for the youngest group.

Table 6 CART classification accuracy across age groups (years) for parent and teacher PDDBI forms

| Informant | Age group (years) | ASD (SE) % (95 % CI) | N | OTHER (SP) % (95 % CI) | N | LR+ (95 % CI) | LR- (95 % CI) |
|---|---|---|---|---|---|---|---|
| Parent | 1.5–<3 | 79 (74–89) | 108 | 81 (45–83) | 21 | 2.48 (1.35–4.56) | 0.26 (0.16–0.44) |
| Parent | 3.0–5.0 | 79 (79–88) | 221 | 80 (72–93) | 41 | 5.76 (2.74–12.08) | 0.19 (0.13–0.26) |
| Parent | 6.0–13.0 | 91 (84–93) | 204 | 87 (62–96) | 15 | 6.69 (1.84–24.33) | 0.12 (0.08–0.19) |
| Teacher | 1.5–<3 | 81 (82–93) | 47 | 89 (35–88) | 9 | 2.55 (1.00–6.48) | 0.22 (0.10–0.51) |
| Teacher | 3.0–5.0 | 87 (81–93) | 102 | 43 (46–79) | 28 | 2.48 (1.50–4.09) | 0.18 (0.10–0.33) |
| Teacher | 6.0–13.0 | 80 (73–92) | 59 | 63 (31–86) | 8 | 2.26 (0.92–5.67) | 0.24 (0.11–0.55) |
Classification Results by Data Source

SE and SP were examined across the two sites that specialized in diagnostic assessment of ASD, BAR and ICD, in order to assess generalization across different clinics, an important concern for the usefulness of the PDDBI as a screener. Predictions as a function of clinical diagnosis and ADOS classifications (in children 30 months of age and older) into Not Spectrum, Spectrum, and Autism are shown in Table 7. There were no significant differences across sites, and the CART classifications agreed reasonably well with the ADOS classifications, supporting the generalizability of the CART model.

Table 7 CART classification accuracy for parent and teacher PDDBI forms for the BAR and ICD sites

| Informant | ADOS group | Source | % ASD | % OTHER | N |
|---|---|---|---|---|---|
| Parent PDDBI | NOT SPECTRUM | BAR | 26 | 74 | 46 |
| Parent PDDBI | NOT SPECTRUM | ICD | 24 | 76 | 21 |
| Parent PDDBI | SPECTRUM | BAR | 69 | 31 | 35 |
| Parent PDDBI | SPECTRUM | ICD | 83 | 17 | 6 |
| Parent PDDBI | AUTISM | BAR | 86 | 14 | 188 |
| Parent PDDBI | AUTISM | ICD | 81 | 19 | 27 |
| Teacher PDDBI | NOT SPECTRUM | BAR | 53 | 47 | 36 |
| Teacher PDDBI | NOT SPECTRUM | ICD | 57 | 43 | 14 |
| Teacher PDDBI | SPECTRUM | BAR | 85 | 15 | 27 |
| Teacher PDDBI | SPECTRUM | ICD | 100 | 0 | 2 |
| Teacher PDDBI | AUTISM | BAR | 83 | 17 | 115 |
| Teacher PDDBI | AUTISM | ICD | 94 | 6 | 16 |

| Clinical diagnoses | Source | SE % | SP % | N |
|---|---|---|---|---|
| Parent PDDBI* | BAR | 82 | 81 | 338 |
| Parent PDDBI* | ICD | 80 | 81 | 61 |
| Teacher PDDBI** | BAR | 84 | 50 | 201 |
| Teacher PDDBI** | ICD | 92 | 62 | 37 |

Results are for ADOS classifications in children ≥30 months of age and SE/SP % for clinical classifications for all ages
* BAR: PPV (ASD vs. OTHER) = 96 %, NPV (OTHER vs. ASD) = 46 %; ICD: PPV (ASD vs. OTHER) = 89 %, NPV (OTHER vs. ASD) = 68 %
** BAR: PPV (ASD vs. OTHER) = 89 %, NPV (OTHER vs. ASD) = 61 %; ICD: PPV (ASD vs. OTHER) = 81 %, NPV (OTHER vs. ASD) = 80 %

Classification Results for Cases Not Having ADI or ADOS Assessments

As noted above, there were 83 children who did not have these evaluations for a variety of reasons: 69 with clinically defined ASD and 14 with other behavioral issues but without ASD. For this independent sample, both SE and SP were 80 %.

Classification Tree Subgroups: Primary Features

PDDBI Domain T-Score Profiles

As described above, the classification tree resulted in 5 relatively frequent subgroups: 3 ASD subgroups and 2 Not-ASD subgroups. To better describe their characteristics, these subgroups were examined in terms of their parent PDDBI profiles using repeated measures ANCOVA with age serving as a covariate (Fig. 2). The results are described below (teacher forms yielded similar findings).
ASD Nodes

Terminal node 8 participants had a profile characterized by relatively low AWP T-scores and REXSCA T-scores at the expected mean of 50, and these cases were designated as the Atypical ASD subgroup because of their below-ASD-average AWP T-scores in the presence of problems with both non-verbal and verbal social communication. Terminal node 9 participants were characterized by AWP domain T-scores that were about 1/2 SD above the expected mean, except for the SEMPP domain, a measure of repetitive language, and REXSCA domain T-scores that were correspondingly low, about 1/2 SD below the expected mean. These cases were designated as the Minimally Verbal ASD subgroup. Terminal node 17 participants had AWP domain T-scores that were at or slightly above the expected mean, except for the SENSORY domain score, which was 1/2 SD below the expected mean, and REXSCA domain T-scores that were relatively high, especially for expressive language. These cases were designated as the Verbal ASD subgroup.

Not-ASD Nodes

Terminal node 18 participants had relatively low AWP T-scores, but these T-scores were still significantly higher than those of terminal node 19 participants on the SENSORY, RITUAL, SOCPP, SEMPP, and AROUSE domains (all p values <0.015), indicating mild problems with arousal regulation, social and language pragmatics, and rigidity. Their REXSCA domain T-scores were about 1 SD above the expected ASD mean but were lower than those of terminal node 19 participants, whose T-scores ranged from 2 to 3 SDs above that expected for a child with ASD, along with low AWP T-scores averaging about 1 SD below the expected ASD mean. Therefore, terminal node 18 cases were designated as the Social Pragmatic Behavior Problems subgroup while terminal node 19 cases were designated as the Good Social Skills subgroup.

Fig. 2 The patterns of parent PDDBI domain and autism composite (AUTISM/C) T-scores across the five most common subgroups that emerged from the CART algorithm. All groups showed different patterns of behavior problems and social communication skills and these patterns were used to help define the subgroup names (see text). The F indicates the group by domain interaction. [Plot: PDDBI profiles across CART subgroups; T-score by domain/composite for the five subgroups; current effect F(40, 5800) = 113.88, p < 0.0001; vertical bars denote 0.95 confidence intervals]

DSM-IV Diagnoses

DSM-IV diagnoses that were provided for BAR cases in the 3 ASD subgroups and in the Social Pragmatic Behavior Problems and Good Social Skills subgroups are shown in Table 8, using the profiles based on parent report. Due to rounding, percentages may not sum to 100 %. About 77 % of the Atypical ASD subgroup were classified with Autistic Disorder and 18 % with PDD-NOS. Eighty-eight percent of the Minimally Verbal ASD subgroup was classified with Autistic Disorder. The Verbal ASD subgroup was equally likely to be classified with Autistic Disorder or PDD-NOS and more likely than the other ASD subgroups to have a label of Asperger’s Disorder. Those in the Social Pragmatic Behavior Problems subgroup in the BAR sample were more likely to be classified with an ASD (PDD-NOS the most common) as well as ADHD. Seventy-three percent of Asperger cases fell into the Verbal ASD subgroup. Finally, 71 % of the Good Social Skills subgroup was classified with a disorder other than ASD. This indicated that the CART was able to isolate a group based on the SOCAPP domain that was more likely to not have ASD and, indeed, 73 % of the 49 TYPICAL cases were in this Good Social Skills subgroup, with the remainder in the Social Pragmatic Behavior Problems subgroup.

Table 8 DSM-IV diagnoses of common CART subgroups clinically evaluated at the BAR lab (% of subgroup)

| CART subgroup | N | Autistic Disorder | PDD-NOS | Asperger’s | ADHD | Other |
|---|---|---|---|---|---|---|
| ATYPICAL ASD | 72 | 77 | 18 | 0 | 0 | 4 |
| MINIMALLY VERBAL ASD | 111 | 88 | 10 | 1 | 1 | 1 |
| VERBAL ASD | 54 | 35 | 41 | 15 | 4 | 6 |
| SOCIAL PRAGMATIC BEHAVIOR PROBLEMS | 44 | 23 | 39 | 5 | 7 | 27 |
| GOOD SOCIAL SKILLS | 14 | 14 | 14 | 0 | 7 | 64 |

Percentages may not sum to 100 % due to rounding errors

Vineland Adaptive Behavior Scales Scores

Vineland data were available for the BAR children clinically evaluated for behavior disorders, and their standard scores across the four domains are shown in Fig. 3 in order to examine the external clinical validity of these subgroups with respect to adaptive skills. The Minimally Verbal ASD subgroup had the lowest scores across all domains, followed by the Atypical ASD subgroup, indicating developmental delay in both. Their motor skills were delayed but were a relative asset. The Verbal ASD subgroup had Communication domain scores in the typical range, with Daily Living Skills and Socialization domain scores in the borderline range and motor skills scores in the low average range. The Social Pragmatic Behavior Problems subgroup scored in the low average range across domains, while the Good Social Skills subgroup had scores closer to the expected mean of 100 in Communication and Motor Skills. ANCOVA indicated a significant group by domain interaction (F(12, 900) = 7.54, p < .0001), indicating that the patterns differed across subgroups.

Fig. 3 The patterns of Vineland standard scores for children evaluated at the Behavioral Assessment and Research lab (BAR) across the five most common subgroups that emerged from the CART algorithm. COM = communication, DLS = daily living skills, SOC = socialization, MOT = motor skills. Level and pattern of adaptive skills scores varied with group (see text), supporting their external clinical validity. [Plot: Vineland profiles across CART subgroups; mean standard score by domain for the five subgroups; current effect F(12, 897) = 7.59, p < .00001; vertical bars denote 0.95 confidence intervals]

Griffiths DQ Scores

Mean DQ scores across domains and subgroups are shown in Fig. 4 in order to examine external clinical validity with tested cognitive skills, as opposed to scores based solely on informant report. There was a significant group by domain interaction (F(20,260) = 9.62, p < .0001) here as well, with a pattern similar to that for the Vineland. The Atypical and Minimally Verbal ASD subgroups showed a typical ASD profile, with gross motor skills and performance as relative assets, but the mean scores of the subgroups differed in level (p < 0.02) though not in pattern (p = 0.61), with the Minimally Verbal ASD subgroup showing relatively greater impairment in all domains. The Social Pragmatic Behavior Problems and Verbal ASD subgroups had relatively flat profiles differing in level (F(1, 12) = 3.81, p = .07), whereas the Good Social Skills subgroup had a different pattern and was relatively advanced in the Locomotor, Personal Social and Hearing and Speech domains, consistent with informant reports of very good social and language abilities in this group (Fig. 2).

Fig. 4 The patterns of Griffiths DQ scores for children evaluated at the BAR and Infant Development labs across the five most common subgroups that emerged from the CART algorithm. LOCOM = locomotor skills, PERSOC = personal-social skills, HEARSP = hearing and speech, EYEHAND = eye-hand coordination, PERF = performance, REASON = reasoning. Level and pattern of cognitive skills scores varied with group (see text), supporting their external clinical validity to tested cognitive skills. The F indicates the group by domain interaction. The Minimally Verbal ASD and Atypical ASD subgroups show profiles often seen in ASD with relative assets in motor and performance skills. [Plot: Griffiths profiles across CART subgroups; mean DQ by domain for the five subgroups; current effect F(20, 360) = 9.62, p < 0.0001; vertical bars denote 0.95 confidence intervals]

Seizure History, Neurogenetic Comorbidity, and MAOA-uVNTR Allele Frequency

For the ASD and Not-ASD groups from the BAR and Q-GLO sites, information on seizure history and neurogenetic comorbidity was available by parental report, and MAOA-uVNTR allele frequency was also available for the subset of boys with ASD from these same sites based on methods described elsewhere (Cohen et al. 2011, 2003a, b). Reported neurogenetic findings included: Down Syndrome (n = 3); Fragile X Syndrome (n = 1); Williams Syndrome (n = 1); short Y chromosome (n = 1); 16p13.11 microdeletion syndrome (n = 1); Rett Syndrome (n = 1); Duchenne’s Muscular Dystrophy (n = 1); Russell-Silver Syndrome (n = 1); Tuberous Sclerosis (n = 1); Charcot-Marie-Tooth disease (n = 1); mosaic marker chromosome 47, XX + mar/46, XX (n = 1); microcephaly (n = 1); and other CNS malformations (n = 4) including two cases of Arnold Chiari 1 malformation, one case with ventriculomegaly, and one case with periventricular leukomalacia and macrocephaly. Seizures were reported to have occurred only in the ASD sample whereas the neurogenetic conditions, as a whole, were equally present in about 4 % of the ASD and OTHER groups.

As shown in Table 9, the reported prevalence of seizures was significantly highest (13 %) in the Minimally Verbal ASD subgroup and lowest in the Verbal ASD, Social Pragmatic Behavior Problems and Good Social Skills subgroups (2, 0, and 0 %, respectively). Its additional association with sex was examined since it has been reported that females with ASD are more likely than males with ASD to have epilepsy (Amiet et al. 2008). Within the Minimally Verbal ASD subgroup, a history of seizures was present in 27 % (8/30) of females and in 11 % (19/172) of males (χ²(1) = 4.6, p < .035). Reported neurogenetic conditions were also highest (8 %) in the Minimally Verbal ASD group and lowest in the Verbal ASD, Social Pragmatic Behavior Problems and Good Social Skills subgroups (1, 1 and 0 %, respectively). Since these percentages were based on parental report, it is possible that they underestimate true prevalence.

Previous studies have shown that the X-linked low activity MAOA-uVNTR allele was associated with increased severity of ASD and lower cognitive ability, and this was also seen here, where the frequency of this allele was significantly highest (50 %) in the Minimally Verbal ASD subgroup. However, it was lowest in the Atypical ASD subgroup (10 %), which was also developmentally delayed, with the Verbal ASD and Social Pragmatic Behavior Problems subgroups in between, suggesting a more complex interaction that will require follow-up with larger samples.

Table 9 Biomedical associations of CART subgroups (% (N)) from the BAR and Q-GLO sources. MAOA-uVNTR results from subset of males with ASD from simplex families

| Tree subgroup | Seizure history | Neurogenetic disorder | Low activity MAOA-uVNTR allele |
|---|---|---|---|
| ATYPICAL ASD | 7 (116) | 3 (113) | 10 (20) |
| MINIMALLY VERBAL ASD | 13 (202) | 8 (196) | 50 (36) |
| VERBAL ASD | 2 (101) | 1 (96) | 27 (15) |
| SOCIAL PRAGMATIC BEHAVIOR PROBLEMS | 0 (75) | 1 (73) | 25 (8) |
| GOOD SOCIAL SKILLS | 0 (36) | 0 (32) | – |
| Max Likelihood χ² | 31.6 | 17.3 | 10.8 |
| P | <0.000 | <0.002 | <0.012 |

Classification Tree Subgroups: Sex Ratios, Developmental Milestones and Age of Onset

Sex Ratios

The proportion of males was similar across all 3 ASD subgroups (between 81 and 85 % of cases) and the Social Pragmatic Behavior Problems subgroup did not statistically differ, with 77 % males. There were relatively fewer males (64 %) in the Good Social Skills subgroup compared to the other groups (χ²(4) = 12.6, p < .015), similar to that observed in the TYPICAL group.
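The subgroup comparisons in this section are reported as chi-square statistics, and Table 9 labels them as maximum-likelihood chi-square. As an illustration only, the sketch below computes a maximum-likelihood (G) chi-square in pure Python for the one comparison whose cell counts are given in the text: seizure history by sex within the Minimally Verbal ASD subgroup (8 of 30 females, 19 of 172 males). That the authors used exactly this statistic, without a continuity correction, is an assumption, and the helper name `g_test_2x2` is hypothetical.

```python
import math

def g_test_2x2(a, b, c, d):
    """Maximum-likelihood (G) chi-square for a 2x2 table [[a, b], [c, d]].

    Returns (G, p) with 1 degree of freedom and no continuity correction.
    """
    observed = [[a, b], [c, d]]
    n = a + b + c + d
    row_totals = [a + b, c + d]
    col_totals = [a + c, b + d]
    g = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_totals[i] * col_totals[j] / n
            if observed[i][j] > 0:  # a zero cell contributes nothing
                g += observed[i][j] * math.log(observed[i][j] / expected)
    g *= 2
    # Chi-square survival function for df = 1
    p = math.erfc(math.sqrt(g / 2))
    return g, p

# Rows: females, males; columns: seizure history, no seizure history
g, p = g_test_2x2(8, 30 - 8, 19, 172 - 19)
print(f"G = {g:.2f}, p = {p:.3f}")
```

Under these assumptions the statistic comes out at roughly 4.56 with p of about 0.033, which is consistent with the reported χ²(1) = 4.6, p < .035.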
Developmental Milestones

Motor Milestones

Retrospective information on motor developmental milestones (age of sitting by self, age of walking) was abstracted from PDDBI parent forms for BAR children who were over 3 years of age at the time of assessment. Motor milestones did not statistically differ across clinical groups, with an average reported age of sitting of 6.6 months and age of walking of 14.5 months.

Language Milestones

Ages of first words and first phrases were also abstracted from PDDBI parent forms for BAR children who were over 3 years of age at the time of assessment. Using the language development cut-off age recommendations from the ADI-R, the percentage of children with first words at greater than 24 months was highest in the Minimally Verbal and Atypical ASD subgroups (53 and 43 %, respectively) whereas it was 16, 10 and 6 % in the Verbal ASD, Social Pragmatic Behavior Problems and Good Social Skills subgroups (χ²(4) = 48.3, p < .0001). Likewise, the percentage of those with first phrases at greater than 33 months was highest in the Minimally Verbal and Atypical ASD subgroups (74 and 65 %, respectively) whereas it was 39, 30 and 14 % in the Verbal ASD, Social Pragmatic Behavior Problems and Good Social Skills subgroups (χ²(4) = 46.2, p < .0001).

Age Problems First Noted

Retrospectively, the age at which problems were first noticed also differed across subgroups (F(3,312) = 8.51, p < .001) in children who were over 3 years of age at the time of assessment. Those in the Minimally Verbal ASD group had a reported mean (SE) onset at 15.7 (0.8) months and those in the Social Pragmatic Behavior Problems subgroup had an estimated onset of 17.8 (1.8) months. By contrast, those in the Atypical ASD and Verbal ASD subgroups were not noted to have an issue until 21.7 (1.3) and 24.9 (2.3) months of age, respectively.
Discussion

These results confirm that machine learning programs such as CART can be beneficial in improving differentiation between ASD and similar conditions. Both SE and SP were very good for the training set and generalized well across test and validation sets, clinical settings, and age groups for the parent version of the PDDBI, suggesting that the CART classification tree will make the PDDBI useful as a Level 2 screener and as an indicator of meaningful subgroups within the autism spectrum, in addition to its role as a measure of treatment outcome (Dawson et al. 2012). Agreement was good with clinical diagnoses as well as with the ADOS, in which diagnostic categorization was independent of the PDDBI and based on direct observation. Generalization also occurred to the small subset of children who did not have ADI or ADOS assessments.

The best agreement with clinical diagnosis occurred when the CART predictions from parent and teacher PDDBI data agreed. Thus it would seem that it is helpful to have input from people who see the child in different settings when making a diagnosis. A related notion in data mining and machine learning paradigms is that combining results from different algorithms (e.g., combining the results of CART with other decision tree algorithms such as “random forests” or machine learning programs such as neural networks), a concept referred to as “bagging”, often results in better prediction than a single model by itself (Nisbet et al. 2009).

The three ASD subgroups that were identified differed in their behavioral patterns, DSM-IV labels, cognitive and adaptive skills, age of onset and language development assessed retrospectively, MAOA genotype, and “syndromic” features such as seizure history and neurogenetic conditions. The research literature supports these observations in that ASD with co-occurring seizures, for example, tends to be associated with developmental delay (Danielsson et al. 2005), and we observed that seizures were reported to be more common in the Minimally Verbal ASD subgroup (especially in females) and relatively uncommon in the Verbal ASD group, suggesting that the Minimally Verbal subgroup has much greater CNS involvement and is, perhaps, more likely to have an identifiable genetic etiology. The age at which parents first became concerned was also earlier in this subgroup than in the other groups. Boys in the Minimally Verbal subgroup were also the ones most likely to carry the X-linked low activity MAOA allele, which we found to be associated with greater intellectual disability in ASD males (Cohen et al. 2003a). Thus, this subgroup likely will need intensive clinical and educational support.

It has been commonly observed that persons with ASD have varied verbal and cognitive abilities and that those who have better cognitive skills, such as the Verbal ASD subgroup identified in the current sample, tend to have better outcomes with time (Ben-Itzchak et al. 2014). Whether this is true for our cases is not known, of course.
In the current sample of boys who had MAOA assessments, 90 % of the 20 cases who were in the Atypical ASD subgroup carried the high activity MAOA allele, which we found to be associated with a less severe form of ASD, especially in boys whose mothers were heterozygous for the 3- and 4-repeat alleles (Cohen et al. 2011). Thus, these ASD subgroups differed in many ways; these differences could possibly affect outcome and intervention efficacy, and may have to be controlled in future biomedical and genetic/epigenetic investigations.

It is also of interest that a recent study of baby siblings that employed a CART analysis on ADOS observations at 18 months to predict outcome at 36 months also revealed three behavioral patterns that predicted ASD outcome (Chawarska et al. 2014). The most common group had severe symptoms and this group was highly likely to be later diagnosed with ASD. A second group was less affected early on but had both verbal and non-verbal cognitive delays. The third group had initially borderline ADOS scores that worsened over time. They had below-average verbal ability but average non-verbal ability. These three patterns bore some similarity to the three subgroups identified in the current sample.

Of the cases negative for ASD in our sample, the Social Pragmatic Behavior Problems subgroup is of interest as it resembles, to some degree, a newly defined group in the DSM 5, Social Communication Disorder (SCD). In the BAR sample, the Social Pragmatic Behavior Problems subgroup tended to be higher functioning and over 60 % of them were actually diagnosed on the autism spectrum at the BAR site, the most common diagnosis being PDD-NOS, even though there was a relatively higher percentage of this subgroup within the Not-ASD group than in the ASD group. This node was not split further by the CART algorithm because of the stopping rule. In the DSM 5, SCD defines children who have social pragmatic problems but without the sensory and ritualistic behaviors characteristic of autism (Swineford et al. 2014). Given that the definition of PDD-NOS in the DSM-IV could include persons who also do not have repetitive behaviors, it is possible that some of the cases we defined as PDD-NOS would now be identified as SCD in the DSM 5. In the Social Pragmatic Behavior Problems subgroup, average SENSORY and RITUAL domain T-scores were lower than in the Verbal ASD group, which was comparable in developmental level, but not as low as in the Good Social Skills subgroup. The percentage of children with both SENSORY and RITUAL domain T-scores less than 40 (one SD below the expected ASD mean) was 23 % in the Social Pragmatic Behavior Problems subgroup but only 3 % in the Verbal ASD subgroup.
Caution should be exercised in interpreting classification into this group for cases greater than 7 years since the SOCAPP T-score has a ceiling less than 73 from this age forward and therefore typically developing children with maximum SOCAPP T-scores will all fall into this group. In such cases, other scores from the PDDBI, such as the Autism Composite, may be helpful in clinical decision making.

Five of the subgroups identified by the CART algorithm were rare and all were identified as belonging to the Not-ASD group. Three of these were based on PDDBI T-scores that were unusually high, and so the most likely interpretation is that the algorithm identified these as outliers that should not be counted as belonging to the ASD group. They may represent over-reporting of aberrant behavior but, due to their relative rarity in these samples, it is likely that such cases will require even greater clinical scrutiny and, as noted, could reflect a small subgroup with very severe behavior problems. One of these cases from the BAR site, who was clinically diagnosed with ASD and who had extreme AGG T-scores from both parent and teacher, was later found to have inflammatory bowel disease and, after treatment by the family’s physician, was classified by the CART algorithm as belonging to the Minimally Verbal ASD subgroup when seen on follow-up.

Finally, our results also bear similarity to other attempts to define subgroups within the autism spectrum based on overall behavior patterns [such as Wing’s three subgroups: aloof, passive, and active but odd (Wing et al. 1987)], developmental progress (Stevens et al. 2000), language ability (Kjelgaard and Tager-Flusberg 2001), and syndromic versus non-syndromic features (Courchesne et al. 1994; Miles et al. 2008; Amiet et al. 2013). These studies all involved a specific focus on an autism sample and not on differentiating these subgroups from other disorders of childhood and identifying subgroups within a comparison sample. As noted, “nature does not draw a line without smudging it” (Wing et al. 1987), which can make differential diagnosis difficult. The use of CART methodology with the standardized and quantitative information from the PDDBI appears to help both in discriminating ASD from other disorders and in identifying meaningful subtypes that replicate, to some extent, behavioral subgroups observed in other studies.
Conclusions

In summary, our results are promising and support the use of a decision tree algorithm such as CART to improve discrimination of disorders with overlapping features, and to identify subgroups relevant to clinical practice and research. Prediction of an ASD or Not-ASD diagnosis was best when both parent and teacher CART results agreed. Limitations include not having a large enough Not-ASD sample for CART to predict such cases from the teacher forms, and not having enough non-Caucasian informants to examine whether culture influences the results which, if true, would require such information to be part of the classification tree. Unanswered questions include: (1) How stable are these subgroups over time, i.e., are these “trait” or “state” measures? (2) What accounts for the rare subgroups identified here? Are these related to rater or child characteristics such as ADHD (Cohen 2013) and/or co-occurring medical issues? (3) Do these subgroups predict response to intervention? (4) Do these subgroups change with intervention? (5) Are these subgroups more likely to be associated with other specific genetic or epigenetic factors? (6) Does the relative frequency of the ASD subgroups differ in different at-risk groups? (7) What is the association, if any, between SCD and the Social Pragmatic Behavior Problems subgroup? (8) How well do our findings generalize to other diagnostic centers or ethnic groups? and (9) Can the teacher predictions improve with a much larger sample of controls?

Acknowledgments The authors would like to thank the many families that participated in these studies, Deborah Fein for her advice on sensitivity and specificity levels, and Diana Robins for her recommendations on use of LR+ and LR-. This research was supported by funds from the New York State Office for People with Developmental Disabilities, the NYS Special Legislative Grant for Autism Research, by Grant #12-FY99-211 from the March of Dimes Birth Defects Foundation to Ira L. Cohen, by Ongwanada Research Fund to Xudong Liu, and by Grant #PO1-HD047281 to Judith M. Gardner. The PDDBI generates a royalty, 50 % of which is used to support research at the Institute with the other 50 % distributed to the authors of the PDDBI.

Author Contributions Dr. Cohen prepared the manuscript, taking comments of the co-authors into account, did the statistical analyses, assisted in diagnosing cases and sustained collaboration with the other authors. Drs. Liu and Hudson performed the genetic analyses, confirmed the clinical and ADI-R diagnoses, and supplied their PDDBI data. Drs. Romanczyk, Gillis and Cavalari performed the diagnostic work-ups on their sample and supplied the PDDBI data. Drs. Karmel and Gardner provided the clinical, ADOS and PDDBI data on the children participating in their studies. All authors reviewed the manuscript.

Funding This research was supported in part by funds from the New York State Office for People with Developmental Disabilities, the NYS Special Legislative Grant for Autism Research, by Grant #12-FY99-211 from the March of Dimes Birth Defects Foundation to Ira L. Cohen, by Ongwanada Research Fund to Xudong Liu, and by Grant #PO1-HD047281 to Judith M. Gardner.

Compliance with Ethical Standards

Conflict of interest The PDDBI generates a royalty, 50 % of which is used to support research at the Institute with the other 50 % distributed to the authors of the PDDBI, and Dr. Cohen is one of the authors. Other authors declare no conflicts of interest.

Ethical Approval All procedures performed in the research studies described above were in accordance with the ethical standards of the institutional and/or national research committee and with the 1964 Helsinki declaration and its later amendments or comparable ethical standards.

Informed Consent Informed consent was obtained from all caregivers participating in the various research projects included in this study.