ABSTRACT Alcohol misuse during adolescence (AAM) has been linked with disruptive structural development of the brain and alcohol use disorder. Using machine learning (ML), we analyze the link between AAM phenotypes and adolescent brain structure (T1-weighted imaging and DTI) at ages 14, 19, and 22 in the IMAGEN dataset ( n ∼ 1182). ML predicted AAM at age 22 from brain structure with a balanced accuracy of 78% on independent test data. Therefore, structural differences in adolescent brains could significantly predict AAM. Using brain structure at age 14 and 19, ML predicted AAM at age 22 with a balanced accuracy of 73% and 75%, respectively. These results showed that structural differences preceded alcohol misuse behavior in the dataset. The most informative features were located in the white matter tracts of the corpus callosum and internal capsule, brain stem, and ventricular CSF. In the cortex, they were spread across the occipital, frontal, and temporal lobes and in the cingulate cortex. Our study also demonstrates how the choice of the phenotype for AAM, the ML method, and the confound correction technique are all crucial decisions in an exploratory ML study analyzing psychiatric disorders with weak effect sizes such as AAM.
Abstract
Alcohol misuse during adolescence (AAM) has been associated with disruptive development of adolescent brains. In this longitudinal machine learning (ML) study, we could predict AAM significantly from brain structure (T1-weighted imaging and DTI) with accuracies of 73–78% in the IMAGEN dataset (n∼1182). Our results not only show that structural differences in the brain can predict AAM, but also suggest that such differences might precede AAM behavior in the data. We predicted 10 phenotypes of AAM at age 22 using brain MRI features at ages 14, 19, and 22. Binge drinking was found to be the most predictable phenotype. The most informative brain features were located in the ventricular CSF, and in white matter tracts of the corpus callosum, internal capsule, and brain stem. In the cortex, they were spread across the occipital, frontal, and temporal lobes and in the cingulate cortex. We also experimented with four different ML models and several confound control techniques. Support Vector Machine (SVM) with rbf kernel and Gradient Boosting consistently performed better than the linear models, linear SVM and Logistic Regression. Our study also demonstrates how the choice of the predicted phenotype, ML model, and confound correction technique are all crucial decisions in an explorative ML study analyzing psychiatric disorders with small effect sizes such as AAM.

Editor's evaluation
This study uses a large dataset on alcohol misuse in adolescents who have been followed up for several years. MRI data are used to test whether the structure and connectivity of the brains of adolescents can predict their alcohol misuse later in their early twenties.
The results show that binge drinking can be predicted out of multiple brain phenotypes with good accuracy, even after controlling for many confounding variables. This study can be impactful as it suggests a re-evaluation of studies of the effect of alcohol on the adolescent brain.
https://doi.org/10.7554/eLife.77545.sa0

Introduction
Many adolescents participate in risky and excessive alcohol consumption behaviors (Crews et al., 2007), especially in European and North American countries. Several studies have identified that such early and risky exposure to alcohol is a potential risk factor that can lead to the development of Alcohol Use Disorder (AUD) later in life (DeWit et al., 2000; Grant et al., 2006; Nixon and McClain, 2010). During adolescence and early adulthood (age 10–24), the human brain undergoes maturation characterized by an increase in white matter (WM) (Lebel and Beaulieu, 2011) and an initial thickening and later thinning of grey matter (GM) regions (Giedd, 2004). Researchers have suggested that excessive alcohol use during this period might disrupt normal brain maturation, causing lifelong effects (Crews et al., 2007; Monti et al., 2005; Chambers et al., 2003). Therefore, understanding how alcohol misuse during adolescence is related to the development of AUD later in life is crucial to understanding alcohol addiction. Furthermore, uncovering how adolescent alcohol misuse (AAM) is associated with the brain at different stages of adolescent brain development can help to implement a more informed public health policy surrounding alcohol use during this age.

Previous studies: Several studies in the last two decades have attempted to uncover how AAM and brain structure are related. These are summarised in Table 1.
Earlier studies collected data with small sample sizes of 30–100 subjects and compared specific brain regions (such as the hippocampus or the pre-frontal cortex (pFC)) between adolescent alcohol misusers (AAMs) and mild users or non-users (controls). They used structural features such as regional volume (De Bellis et al., 2000; Nagel et al., 2005; De Bellis et al., 2005), cortical thickness (Squeglia et al., 2012), or white matter tract volumes (McQueeny et al., 2009; Jones and Nagel, 2019). These studies found differences between the groups in regions such as the hippocampus (De Bellis et al., 2000; Nagel et al., 2005), cerebellum (De Bellis et al., 2005), and the frontal cortex (De Bellis et al., 2005). However, these findings are not always consistent across studies (Jones et al., 2018). This inconsistency is also evident from the findings in the last column of Table 1. Another group of studies investigated whether AAM disrupts the natural developmental trajectory of adolescent brains (Jacobus et al., 2013; Luciana et al., 2013; Pfefferbaum et al., 2018; Jones and Nagel, 2019; Sullivan et al., 2020; Robert et al., 2020). These studies reported that the brains of AAMs showed accelerated GM decline (Luciana et al., 2013; Pfefferbaum et al., 2018; Sullivan et al., 2020) and attenuated WM growth (Luciana et al., 2013; Sullivan et al., 2020) compared to controls. However, the brain regions reported were not consistent between these studies either and do not tell a coherent story (Jones et al., 2018) (see Table 1). These differences in findings could potentially be due to the following reasons:

Heterogeneous disease with a weak effect size: Alcohol misuse has a heterogeneous expression in the brain (Zahr and Pfefferbaum, 2017). This heterogeneity might be driven by alcohol misuse affecting diverse brain regions in different sub-populations depending on demographic, environmental, or genetic differences (Grant et al., 2015).
Furthermore, the effect of alcohol misuse on adolescent brain structure can be weak and hard to detect (especially with the mass-univariate methods used in previous studies). The possibility of several disease subtypes, exacerbated by the small signal-to-noise ratio, can generate incoherent findings regarding which brain regions are affected by alcohol.

Higher risk of false-positives: Most previous studies have small sample sizes that are prone to generating inflated effect sizes (Button et al., 2013). Furthermore, these studies employ mass-univariate analysis techniques that are vulnerable to the multiple comparisons problem (Lindquist and Mejia, 2015) and can produce false-positives if it is ignored. These factors, coupled with the possibility of publication bias towards positive results (Ioannidis, 2005), have a high likelihood of generating false-positive findings (Scheel et al., 2021).

Several metrics to measure alcohol misuse: There is no consensus on the best phenotype to measure AAM. Many studies use binge drinking or heavy episodic drinking as a measure of AAM (Squeglia et al., 2012; Whelan et al., 2014; Jones and Nagel, 2019; Robert et al., 2020), while a few others use a combination of binge drinking, frequency of alcohol use, amount of alcohol consumed, and the age of onset of alcohol misuse (Squeglia et al., 2015; Pfefferbaum et al., 2018; Kühn et al., 2019; Seo et al., 2019; Sullivan et al., 2020). These differences in analyses could potentially produce different findings.

Table 1. Literature review of studies that look into structural brain differences between adolescent alcohol misusers (AAMs) and control subjects. The studies are sorted by the year of publication. For each study, the sample size ‘n’, the main analysis technique, and the main structural differences found in AAMs are listed.
| Study (year) | n | Analysis / method | Structural differences in AAMs |
| --- | --- | --- | --- |
| De Bellis et al., 2000 | 36 | Statistically compare (univariate) regional brain volumes between groups | Lower hippocampal volume. |
| Nagel et al., 2005 | 31 | Statistically compare (univariate) regional brain volumes between groups | Lower volume only in left hippocampus after controlling for other psychiatric comorbidities. |
| De Bellis et al., 2005 | 42 | Statistically compare (univariate) regional brain volumes between groups | Lower pFC, cerebellum volumes in males but AAMs had comorbid mental disorders. |
| McQueeny et al., 2009 | 28 | Mass-univariate analysis of skeletonized FA voxels (DTI) | Binge drinkers had lower FA in 18 white matter areas. |
| Squeglia et al., 2012 | 59 | Statistically compare (univariate) regional brain volumes between groups | No effect of binge drinking on cortical thickness and sex-specific differences among AAMs in left frontal cortex. |
| Jacobus et al., 2013 | 54 | Mass-univariate analysis of skeletonized FA voxels (DTI) | No effect in AAM-only group, but lower FA in AAM and comorbid marijuana users. |
| Luciana et al., 2013 | 55 | Longitudinal mass-univariate analysis of cortical thickness, white matter extent, DTI-extracted FA and MD | Accelerated GM thinning in mid frontal gyrus, attenuated WM growth with lower FA in left caudate, thalamus. |
| Whelan et al., 2014 | 692 | Exploratory analysis using ML to find best predictors of AAM among demographic, psychosocial, genetic, cortical volumes, and fMRI variables | Current AAMs have lower GMVs in parts of frontal lobe and higher GMV in right putamen. Future AAMs have lower GMV in right parahippocampal gyrus and higher in left postcentral gyrus. |
| Squeglia et al., 2015 | 137 | Exploratory analysis using ML to find best predictors of AAM among demographic, neuropsychological, cortical thickness, and fMRI variables | Future AAMs have thinner GM in precuneus, lateral occipital, ACC, PCC, and frontal and temporal cortex. |
| Pfefferbaum et al., 2018 | 483 | Longitudinal mass-univariate analysis of GMV development | Accelerated GMV reduction in frontal brain regions. |
| Jones and Nagel, 2019 | 113 | Modeling the WM microstructure development (DTI) for each voxel | Altered frontostriatal WM microstructure is predictive of future AAM. |
| Kühn et al., 2019 | ≈1500 | Growth curve modeling of GM volumes | Higher GMV in caudate nucleus and left cerebellum predicts future AAMs. |
| Seo et al., 2019 | ≈1000 | ML analysis of cue-related brain region followed by mass-univariate analysis for identifying region importance | Current AAMs show reduced GMV in medial-pFC, oFC, thalamus, bilateral ACC, left amygdala and anterior insular. |
| Sullivan et al., 2020 | 548 | Longitudinal mass-univariate (GLM) analysis of cerebellar region volumes | Cerebellum: accelerated GM decline in 2 sub-regions and accelerated expansion of WM in one sub-region and CSF. |
| Robert et al., 2020 | 726 | Mass-univariate analyses of voxels, followed by analysis of the direction of causality using causal bayesian networks | Accelerated GM atrophy in parts of the temporal cortex and left prefrontal cortex. |
| Filippi et al., 2021 | 671 | ML analysis for predictors of resilience towards polysubstance use | Adolescents resilient to PSU show larger GMV in the bilateral cingulate gyrus. |

Acronyms: GM: grey matter; WM: white matter; CSF: cerebrospinal fluid; GMV: grey matter volume; pFC: prefrontal cortex; oFC: orbitofrontal cortex; ACC: anterior cingulate cortex; PCC: posterior cingulate cortex; GLM: generalized linear models; ML: machine learning; DTI: Diffusion Tensor Imaging; FA: Fractional Anisotropy; MD: mean diffusivity.
Multivariate exploratory analysis: In recent years, data collection drives such as IMAGEN (Mascarell Maričić et al., 2020), NCANDA (Brown et al., 2015), and UK Biobank (Sudlow et al., 2015) have made available large-sample multi-site data with n>1000 that are representative of the general population. This enabled researchers to use multivariate, data-driven, and exploratory analysis tools such as machine learning (ML) to detect effects of alcohol misuse on multiple brain regions (Whelan et al., 2014; Squeglia et al., 2017; Seo et al., 2019; Filippi et al., 2021; Jia et al., 2021; Yip et al., 2022). Such whole-brain multivariate methods are preferable over the previous mass-univariate methods as they have a higher sensitivity to detect true positives (Hebart and Baker, 2018). Furthermore, ML can be easily used for clinical applications such as computer-aided diagnosis, predicting future development of AUD, and future relapse of patients into AUD (Shiraishi et al., 2011). Due to these advantages, several exploratory studies using ML have been attempted in AUD research (Whelan et al., 2014; Seo et al., 2019; Squeglia et al., 2017). We further extend this line of work by analyzing the newly available longitudinal data from IMAGEN (n∼1182 at 4 time points of adolescence) (Mascarell Maričić et al., 2020) by designing a robust and reliable ML pipeline. The goal of this study is to explore the relationship between adolescent brain and AAM using ML and discover any brain features that can be associated with AAM. As shown in Figure 1, we predict AAM at age 22 using brain morphometrics derived from structural imaging captured at three stages of adolescence: ages 14, 19, and 22. The structural features of different brain regions are extracted from two modalities of structural MRI, that is, T1-weighted imaging (T1w) and Diffusion Tensor Imaging (DTI).
The most informative structural features for the ML model prediction are discovered using SHAP (Lundberg and Lee, 2017; Lundberg et al., 2020) to reveal the most distinct structural brain differences between AAMs and controls. Furthermore, we use multiple phenotypes of alcohol misuse such as the frequency of alcohol consumption, amount of consumption, onset of misuse, binge drinking, the AUDIT score, and other combinations, and systematically compare them. We also compare four different ML models and multiple methods of controlling for confounds in ML, and derive important methodological insights that are beneficial for reliably applying ML to psychiatric disorders such as AUD. To promote reproducibility and open science, the entire codebase used in this study, including the initial data analysis performed on the IMAGEN dataset, is made available at https://github.com/RoshanRane/ML_for_IMAGEN (Rane and Kim, 2022; copy archived at swh:1:rev:6c493672ed700ded73c2b77e8976a5551921e634).

Figure 1. An overview of the analysis performed. Morphometric features extracted from structural brain imaging are used to predict Adolescent Alcohol Misuse (AAM) developed by the age of 22 using machine learning. To understand the causal relationship between AAM and the brain, three separate analyses are performed by using imaging data collected at three stages of adolescence: age 14, age 19, and age 22.

Results
The results are reported in the following five subsections: In subsection 1, different confound-control techniques are compared and the most suitable technique for this study is determined. Subsection 2 shows the results of the ML exploration performed with ten AAM labels, four ML models, and using imaging data from three time points of adolescence. This stage helps to determine the best phenotype of AAM and the best ML model.
Subsection 3 reports the final results on the independent data_holdout for all three time point analyses and subsection 4 shows the most informative features found in each of the analyses. Subsection 5 reports the result from the additional leave-one-site-out experiment.

Confound correction techniques
The sex c_sex and recruitment site c_site of subjects confound this study (refer to subsection 5.1 in ‘Materials and methods’) and their influence on the study needs to be controlled. We test three confound correction techniques on data_explore: (a) confound regression, (b) counterbalancing with undersampling, and (c) counterbalancing with oversampling. To verify if these methods work as expected, the same analysis approach from Görgen et al., 2018 and the approach by Snoek et al., 2019 are employed. For the two confounds c_sex and c_site, this requires us to test five input-output combinations (X→y, X→c_sex, X→c_site, c_sex→y, and c_site→y) for a given X→y analysis. Figure 2 shows the results of comparing different confound correction techniques for the ‘Binge’ phenotype.

Figure 2. Comparing confound correction techniques. Five input-output settings are compared within each confound correction technique: X→y, X→c_sex, X→c_site, c_sex→y, and c_site→y. (a) shows the results before any correction is performed, (b) shows the results of performing confound regression, and (c) and (d) show the results from counterbalancing by undersampling the majority class and oversampling the minority class, respectively. Statistical significance is obtained from 1,000 permutation tests and is shown with ** if p<0.01, * if p<0.05, and ‘n.s’ if p≥0.05.

The following conclusions can be derived from this comparison:
1. Sex and site can confound the AAM analysis: As shown in subplot (a), all the input-output combinations involving the confounds (X→c_sex, X→c_site, c_sex→y, and c_site→y) produce significant prediction accuracies before any confound correction is performed.
This further adds to the evidence that both the confounds c_sex and c_site can strongly influence the accuracy of the main analysis X→y and confound the analysis.
2. Confound regression is not a good choice when followed by a non-linear ML method: Following confound regression, the results of X→c_sex and X→c_site should become non-significant as the confounding signal s_c has been removed from X. However, it is seen that in some cases the non-linear models SVM-rbf and GB are capable of detecting the confounding signal s_c from the imaging data. The red arrow in subplot (b) points out one such case in the example shown. This is not surprising as the standard confound regression removes linear components of the signal s_c but does not remove any non-linear components that might still be present in X (Görgen et al., 2018; Dinga et al., 2020). Furthermore, confound regression carries an additional risk of also regressing out the useful signal in X that does not confound the analysis X→y but is a covariate of both c and y (Dinga et al., 2020).
3. Counterbalancing with oversampling is the best choice for this study: As expected, counterbalancing forces the c_sex→y and c_site→y accuracies to chance level by removing the correlation c∼y (subplots c and d). It can be seen that after the undersampled counterbalancing, the results of the main analysis X→y also become non-significant, as indicated by the red arrow in (c). This drastic reduction in performance is likely due to the reduction in the sample size of the training data by n∼100-250 from undersampling. Therefore, counterbalancing with oversampling of the minority group is a better alternative compared to undersampling.
This comparison was also repeated for two other AAM phenotypes, ‘Combined-seo’ and ‘Binge-growth’, and the above findings were found to be consistent across all of them. Hence, counterbalancing with oversampling is used as the confound-control technique in the main analysis.
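The idea behind counterbalancing with oversampling can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the implementation used in the study: the function name `counterbalance_oversample` and the toy data are hypothetical, and the sketch handles a single categorical confound by resampling (with replacement) the smaller label class inside every confound group until the classes are equally frequent there, which makes the label independent of the confound.

```python
import random
from collections import defaultdict


def counterbalance_oversample(labels, confounds, seed=0):
    """Return sample indices (with repeats) in which every label class is
    equally frequent inside each confound group.

    Hypothetical helper for illustration only. Equal class proportions in
    every confound group mean the confound alone carries no information
    about the label (the c~y association is removed).
    """
    rng = random.Random(seed)

    # Group sample indices by (confound value, label) cell.
    cells = defaultdict(list)
    for i, (y, c) in enumerate(zip(labels, confounds)):
        cells[(c, y)].append(i)

    classes = sorted(set(labels))
    selected = []
    for c in sorted(set(confounds)):
        # Size of the largest class within this confound group.
        target = max(len(cells[(c, y)]) for y in classes)
        for y in classes:
            idx = cells[(c, y)]
            if not idx:
                continue  # an empty cell cannot be oversampled
            selected.extend(idx)  # keep every original sample once
            # Draw the missing samples with replacement from the same cell.
            selected.extend(rng.choices(idx, k=target - len(idx)))
    return selected


# Toy example: label 1 is over-represented among "m", label 0 among "f".
labels = [1, 1, 1, 0, 1, 0, 0, 0]
confounds = ["m", "m", "m", "m", "f", "f", "f", "f"]
train_idx = counterbalance_oversample(labels, confounds)
```

In this sketch the resampled indices would be used to build the training set only, so that duplicated samples never leak into the evaluation data.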
When performing over-sampled counterbalancing, it is ensured that the oversampling is done only for the training data.

ML exploration
The results from the ML exploration experiments are summarised in Figure 3. For the different AAM phenotypes, the balanced accuracies range between 45 and 73%. It must be noted that the results across different phenotypes are not directly comparable as each AAM phenotype classification task has a different sample size varying between ≈620-780 (refer to ‘Materials and methods’ Table 2 and Appendix 1-table 2 for the list of phenotypes and their respective sample size). These differences in the number of samples in the two classes AAM and controls could add additional variance in the accuracy.

Figure 3 (with 1 supplement). Results of the ML exploration experiments: The ten phenotypes of AAM tested are listed on the y-axis and the four ML models are represented with different color coding as shown in the legend of figure (a). For a given AAM label and ML model, the point represents the mean balanced accuracy across the 7-fold CV and the bars represent its standard deviation. Figure (a) shows the results when the imaging data from age 22 (FU3) is used, figure (b) shows results for age 19 (FU2), and figure (c) for age 14. Figure (d) shows the results from all three time point analyses in a single plot along with the interval of the balanced accuracy that was non-significant (p≥0.05) when tested with permutation tests.

Nevertheless, some useful observations can be made from the consistencies found across the three time point analyses, depicted in subplots (a), (b), and (c) of Figure 3: The most predictable phenotype from structural brain features for all three time point analyses is ‘Binge’, which measures the total lifetime experiences of being drunk from binge drinking.
Other individual phenotypes such as the amount of alcohol consumption (Amount), frequency of alcohol use (Frequency), and the age of AAM onset (Onset) are harder to predict from brain features compared to the binge drinking phenotype. The results on ‘Combined-seo’ and ‘Combined-ours’ show that using phenotypes measuring amount and frequency of drinking in combination with binge drinking also seems to be detrimental to model performance. All models perform poorly at predicting AAM phenotypes derived from AUDIT. This is surprising as AUDIT is considered a de facto screening test for measuring alcohol misuse (Kranzler and Soyka, 2018). Among the four ML models, the SVM with non-linear kernel (SVM-rbf) and the ensemble learning method GB perform better than the linear models LR and SVM-lin. This is further evident in the summary plot (d) in the figure.

Table 2. Ten phenotypes of Adolescent Alcohol Misuse (AAM) are derived and compared in this analysis. A description of each phenotype is provided here along with the IDs of the IMAGEN questionnaires used to generate the phenotype.
| No. | Phenotype | Description | Questionnaire |
| --- | --- | --- | --- |
| 1 | Frequency | Number of occasions drinking alcohol in last 12 months | ESPAD 8b. |
| 2 | Amount | Number of alcohol drinks consumed on a typical drinking occasion | ESPAD prev31, AUDIT q2. |
| 3 | Onset | Had one or more binge-drinking experiences by the age of 14 | ESPAD 29d |
| 4 | Binge | Total drunk episodes from binge-drinking in lifetime (by age 22) | ESPAD 19a, AUDIT q3. |
| 5 | Binge-growth | Longitudinal trajectory of binge-drinking experiences had per year | Growth curve of ESPAD 19b. |
| 6 | AUDIT | AUDIT screening test performed at the year of scan | AUDIT-total (q1-10). |
| 7 | AUDIT-quick | Only the first 3 questions of AUDIT screening test | AUDIT-freq (q1-3). |
| 8 | AUDIT-growth | Longitudinal changes in the AUDIT score measured over the years | Growth curve of AUDIT-total. |
| 9 | Combined-seo | A combined risky-drinking phenotype from Seo et al., 2019 generated using amount, frequency, and binge-drinking data | ESPAD 8b, 17b, 19b, and TLFB alcohol2 |
| 10 | Combined-ours | A combined risky-drinking phenotype developed by clustering amount, frequency, and binge-drinking trajectory | AUDIT q1, q2, ESPAD 19a, growth curve of ESPAD 19b. |

In summary, the non-linear ML models SVM-rbf and GB coupled with the ‘Binge’ phenotype consistently perform the best in all three time point analyses. This is more clearly visible in the summary figure (d) where the results from all three analyses are combined in a single plot. Similar general observations can be made when the AUC-ROC metric is used to measure model performance (see Figure 3-figure supplement 1).

Generalization
The generalization test is performed with the ‘Binge’ phenotype as the label and the two non-linear ML models, SVM-rbf and GB. The final results are shown in Figure 4. For the three analyses using imaging data from age 22, age 19, and age 14 as input, average balanced accuracies of 78%, 75.5%, and 73.5% are achieved, respectively. Their average ROC-AUC scores are 83.93%, 83.1%, and 81.5% for the respective analyses. The accuracies for all three time point analyses are significant with p<0.01.
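For readers unfamiliar with how a p-value is attached to a balanced accuracy, the following Python sketch shows one simplified label-permutation test. It is an assumption-laden illustration rather than the procedure used in the study: here the model predictions are held fixed and only the true labels are shuffled, whereas a full permutation test may retrain the model on every permuted label set. The function names `balanced_accuracy` and `permutation_p_value` are hypothetical.

```python
import random


def balanced_accuracy(y_true, y_pred):
    """Mean of the per-class recalls, so both classes count equally."""
    recalls = []
    for k in sorted(set(y_true)):
        n_k = sum(1 for t in y_true if t == k)
        hits = sum(1 for t, p in zip(y_true, y_pred) if t == k and p == k)
        recalls.append(hits / n_k)
    return sum(recalls) / len(recalls)


def permutation_p_value(y_true, y_pred, n_permutations=1000, seed=0):
    """Simplified permutation test (illustrative assumption).

    Shuffles the true labels to build a null distribution of balanced
    accuracies for the fixed predictions, and returns the observed score
    together with the fraction of permutations scoring at least as high.
    """
    rng = random.Random(seed)
    observed = balanced_accuracy(y_true, y_pred)
    shuffled = list(y_true)
    at_least_as_good = 0
    for _ in range(n_permutations):
        rng.shuffle(shuffled)
        if balanced_accuracy(shuffled, y_pred) >= observed:
            at_least_as_good += 1
    # The +1 terms count the observed labelling as one valid permutation,
    # which keeps the estimated p-value strictly above zero.
    p_value = (at_least_as_good + 1) / (n_permutations + 1)
    return observed, p_value
```

With 1,000 permutations, the smallest p-value this estimator can return is 1/1001, which is why results are typically reported against thresholds such as p<0.01 rather than as exact values.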
To get a better intuition, please refer to Figure 4-figure supplement 1, which shows the model accuracies against the accuracies obtained from permutation tests.

Figure 4 (with 1 supplement). Final results for the three time point analyses on the ‘Binge’ drinking AAM phenotype obtained with the two non-linear ML models, kernel-based support vector machine (SVM-rbf) and gradient boosting (GB). The figure shows the mean balanced accuracy achieved by each ML model within each analysis while the table lists the combined average scores for each analysis. The ML models are retrained seven times on data_explore with different random seeds and evaluated on data_holdout to obtain an estimate of the accuracy with a standard deviation. Statistical significance is obtained from 1,000 permutation tests and is shown with ** if p<0.01, * if p<0.05, and ‘n.s’ if p≥0.05.

To further assess the causality in the MRI_age14→AAM_age22 analysis, we repeated it by using only subjects who had no binge drinking experiences by age 14 (n=477) and also with subjects who had a maximum of one binge drinking experience (n=565) by age 14. The balanced accuracy obtained on the holdout set was 72.9±2% and 71.1±2.3%, respectively.

Important brain regions
Following the generalization test, the most informative structural brain features are determined for the SVM-rbf model, as it performs relatively better among the two non-linear models tested on data_holdout (see Figure 4). Figure 5 shows the list of the most important features for all three time point analyses and illustrates where they are located in the brain. It also shows whether these features have lower-than-average or higher-than-average values when the ML model predicts the subjects as AAMs.

Figure 5. Most informative structural features for SVM-rbf model’s predictions on data_holdout.
Most important features are listed and their locations are shown on a template brain for a better intuition for each of the three time point analyses. The features are color coded to also display whether these features have lower-than-average or higher-than-average values when the model predicts alcohol misusers. This figure is only illustrative and an exhaustive list of all informative features with their corresponding SHAP values is given in Appendix 1-table 3. (Acronyms: AAM: adolescent alcohol misuse, area: surface area, volume: gray matter volume, thickness: average thickness, thicknessstd: standard deviation of thickness, intensity: mean intensity, meancurv: integrated rectified mean curvature, gauscurv: integrated rectified gaussian curvature, curvind: intrinsic curvature index).

Several clusters of regions and feature values can be identified. Most of the important subcortical features are located around the lateral ventricles and the third ventricle and include CSF-related features such as the CSF mean-intensity, volume of left choroid plexus, and left corticospinal tract in the brain stem. Several white matter tracts are found to be informative such as parts of the corpus callosum, internal capsule, and posterior corona radiata. Furthermore, all of these white matter tracts, along with the brain stem, have lower-than-average intensities in AAM predictions. The prominent cortical features are spread across the occipital, temporal, and frontal lobes. In the MRI_age22→AAM_age22 analysis, important cortical features appear in the occipital lobe.
In contrast, for the future prediction analyses MRI_age19→AAM_age22 and MRI_age14→AAM_age22, clusters appear in the limbic system (parts of the cingulate cortex and right parahippocampal gyrus), frontal lobe (left-pars orbitalis, left-frontal pole, right-precentral gyrus, and left-rostral middle frontal gyrus), as well as in the temporal lobe (left-inferior temporal gyrus, left-temporal pole, and right-bank of the superior temporal sulcus). In the occipital lobe, AAM predictions have lower grey matter thickness in the right-cuneus, lateral occipital, and pericalcarine cortices, and higher curvature index in left-cuneus and left-pericalcarine cortex. The list of all the informative features is provided in Appendix 1-table 3 along with their feature type, modality, and respective SHAP values in each CV fold.

Cross-site experiment
The results from the leave-one-site-out CV experiment are shown in Figure 6. The ML models perform close to chance for all AAM labels in the ML exploration experiments and fail to produce a significant performance for any of the three time points in the generalization test. For the ‘Binge’ label in the ML exploration stage, the model accuracy displays very high variance compared to the main experiment (compare Figure 6 with Figure 3 (d)). This suggests that the performance of the ML models varies greatly across sites in this study.

Figure 6. Analysis repeated with leave-one-site-out cross validations (CV).

Discussion
For over two decades, researchers have tried to uncover the relationship that exists between adolescent alcohol misuse (AAM) and brain development. Many previous studies found that such a relationship exists (see Table 1) but with low-to-medium effect size (Nagel et al., 2005; Whelan et al., 2014; Squeglia et al., 2017; Seo et al., 2019; De Bellis et al., 2005; McQueeny et al., 2009; Luciana et al., 2013). The brain regions linked with AAM varied greatly across studies (see highlighted text in Table 1).
This inconsistency in findings and effect sizes could be due to methodological limitations, small sample studies, unavailability of long-term longitudinal data like IMAGEN (Mascarell Maričić et al., 2020), or simply due to the heterogeneous expression of AAM in the brain. In our study, ML models predicted AAM with significantly above-chance accuracies in the range 73.1%-78% (ROC-AUC in 81.5%-83.9%) from adolescent brain structure captured at ages 14, 19, and 22. Thus, our results