Alzheimer's disease and Alzheimer's disease-related dementias (AD/ADRD) are a group of brain disorders characterized by progressive cognitive impairments that severely impact daily functioning. Early prediction of AD/ADRD is crucial for potential disease modification through emerging treatments, but current methods are not sensitive enough to reliably detect the disease in its early or presymptomatic stages.
The PREPARE: Pioneering Research for Early Prediction of Alzheimer's and Related Dementias EUREKA Challenge was a multi-year innovation competition supported by the National Institute on Aging (NIA) that advanced methods and data for early prediction of AD/ADRD. Across three phases, more than 1,000 participants submitted open datasets and innovative methods:
- Phase 1 - Find IT!: Solvers from across academia and industry found, curated, or contributed representative, open datasets that can be used for early prediction of AD/ADRD across a range of modalities, including neuroimaging, synthetic electronic health records, surveys, speech, and mobile apps. Read more about the results of Phase 1 here.
- Phase 2 - Build IT!: Data science solvers advanced algorithms and analytic approaches for early prediction of AD/ADRD, with an emphasis on explainability of predictions, in two data tracks focused on acoustic voice data and social determinants of health survey data. Read more about the results of Phase 2 here.
- Phase 3 - Put IT All Together!: Ten top teams from Phase 2 refined their algorithmic approaches, working to make their solutions more rigorous and generalizable to real-world contexts. The phase included a public virtual pitch event and an in-person winners showcase at the NIH. Read more about the results of Phase 3 here.
This repository contains submissions from winning competitors in the qualitatively judged PREPARE Challenge, run on DrivenData.
In this challenge, participants submitted written reports for each phase rather than code solutions. Solution code was reviewed during verification, but earning prizes did not require open-source licensing. If a team has voluntarily shared its code in a public repository, the link is included in an .md file in the team's directory.
Winning submissions for other DrivenData competitions are available in the competition-winners repository.
| Place | Team or User | Data Summary |
|---|---|---|
| 1st Place | VBM_CSE_UB | Provided audio recordings, acoustic features, demographics, and clinical labels for 2,086 participants from DementiaBank. |
| 2nd Place | zedlab | Released 2M synthetic patient records generated from EHR-based models trained on the Truven MarketScan database and University of Chicago data. |
| 3rd Place + Disproportionate Impact Bonus | IGCPHARMA | Contributed survey data on 26,839 older adults from MHAS and Mex-Cog, aligned with the HCAP cognitive protocol. |
| 4th Place | gaganwig | Shared pre-processed resting-state fMRI scans for 1,491 subjects from the Alzheimer's Disease Neuroimaging Initiative (ADNI). |
| 5th Place | EngrDynamics | Provided survey data from ~1M respondents in the U.S. National Health Interview Survey (NHIS). |
| Data Idea | korinreidellisonlabs | Proposed a community-driven, de-identified AD-risk dataset combining EHR/claims with biomarker-rich data (e.g., ADNI). |
| Data Idea | msundman | Proposed use of dental radiographs as scalable early AD-risk biomarkers. |
| Data Idea | stephanieruth.young | Proposed open mobile cognitive screening data via the MyCog App. |
Additional submission details can be found inside the directory for each prize winner.
Winners Blog Post: Meet the winners for Phase 1 of the PREPARE Challenge
| Place | Team or User | Approach Highlights |
|---|---|---|
| 1st Place | ExplainableAD (sheep and cecelia) | Whisper encoder with two-stage training (full dataset, then language-specific fine-tuning) and CAM-based temporal interpretability. |
| 2nd Place | Harris and Kielo | Ensemble combining eGeMAPS-v2 acoustic features with XGBoost (a pattern sketched below) and two fine-tuned Whisper classifiers (one using triplet loss), with all-MiniLM-L6-v2 semantic clustering preprocessing and an XGBoost meta-learner for aggregation. |
| 3rd Place + Explainability Bonus | team SALEN | Ensemble combining Whisper encoder voiceprint analysis, BERT semantic embeddings, and EfficientNet mel-spectrogram processing with weighted softmax fusion, enhanced by SHAP and CAM interpretability for clinical decision support. |
| Special Recognition (data processing) + Explainability Bonus | SpeechCARE | Multi-lingual acoustic–linguistic pipeline combining eGeMAPS-v2 features, transformer embeddings from wav2vec2 and Whisper, extensive preprocessing (noise filtering, anomaly detection, task-ID classification), and multimodal SHAP-enhanced explanations. |
| Special Recognition (generalizability) | IGC Pharma | Multi-lingual acoustic baseline using fine-tuned Whisper encoder embeddings with comprehensive audio preprocessing and augmentation. |
| Special Recognition (generalizability) | BrainSignsLab | Developed a hybrid feature extraction pipeline combining traditional acoustic features (eGeMAPSv2), transformer embeddings (Wav2Vec2, Whisper), and demographic data in a feature-selected XGBoost classifier. |
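Several acoustic-track entries above pair eGeMAPS-v2 functionals with a gradient-boosted classifier. A minimal sketch of that shared pattern, assuming the `opensmile` and `xgboost` Python packages and hypothetical clip paths and labels (not any team's actual pipeline):

```python
import opensmile
import pandas as pd
from xgboost import XGBClassifier

# Configure openSMILE to emit the 88 eGeMAPS-v2 functionals per clip.
smile = opensmile.Smile(
    feature_set=opensmile.FeatureSet.eGeMAPSv02,
    feature_level=opensmile.FeatureLevel.Functionals,
)

# Hypothetical recordings and binary labels (0 = control, 1 = ADRD).
clips = ["clip_001.wav", "clip_002.wav"]  # placeholder paths
labels = [0, 1]                           # placeholder labels

# process_file returns a one-row DataFrame of functionals per clip.
features = pd.concat([smile.process_file(p) for p in clips])

# Gradient-boosted classifier over the pooled feature table.
clf = XGBClassifier(n_estimators=200, max_depth=4, learning_rate=0.1)
clf.fit(features.to_numpy(), labels)
print(clf.predict_proba(features.to_numpy()))
```

In the winning pipelines this feature table is only one branch of a larger ensemble; see each team's directory for the full approach.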
Additional submission details can be found inside the directory for each prize winner.
All solutions in the Acoustic Track were developed using data from DementiaBank.
Lanzi, A. M., Saylor, A. K., Fromm, D., Liu, H., MacWhinney, B., & Cohen, M. (2023). DementiaBank: Theoretical rationale, protocol, and illustrative analyses. American Journal of Speech-Language Pathology. doi.org/10.1044/2022_AJSLP-22-00281
| Place | Team or User | Approach Highlights |
|---|---|---|
| 1st Place | RASKA-Team | Used TabPFN-derived features, engineered demographic and temporal representations, and sequential predictions from three gradient-boosted tree models (CatBoost, LightGBM, XGBoost) followed by regularized regression. |
| 2nd Place | NxGTR | Built a minimal-complexity decision tree-based model (LightGBM) to predict both speed and acceleration of cognitive decline. |
| 3rd Place | Cassandre | Ensemble of gradient-boosted tree models (LightGBM) with controls for age and education. |
| Special Recognition (feature selection) | GiaPaoDawei | Applied large-scale feature selection using CatBoost with SHAP-guided recursive removal (sketched after the table), and trained a gradient-boosted tree ensemble (LightGBM, CatBoost). |
| Explainability Bonus | Nick and Ry | Prediction explainer report included predicted cognitive score and critical contextual information for non-technical audience understanding, with scores based on appropriate technical methods. |
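As a rough illustration of the SHAP-guided recursive removal noted above, here is a hedged sketch on synthetic data. The entry used CatBoost, but the loop is the same for any tree model supported by `shap.TreeExplainer`; LightGBM, the column names, and the stopping rule here are illustrative assumptions:

```python
import numpy as np
import pandas as pd
import shap
from lightgbm import LGBMRegressor

# Synthetic stand-in for survey features and a cognitive-score target.
rng = np.random.default_rng(0)
X = pd.DataFrame(rng.normal(size=(500, 30)),
                 columns=[f"q{i}" for i in range(30)])  # hypothetical items
y = 2 * X["q0"] + X["q1"] - X["q2"] + rng.normal(scale=0.5, size=500)

features = list(X.columns)
while len(features) > 10:  # illustrative stopping rule
    model = LGBMRegressor(n_estimators=200).fit(X[features], y)
    shap_values = shap.TreeExplainer(model).shap_values(X[features])
    importance = np.abs(shap_values).mean(axis=0)  # mean |SHAP| per feature
    features.remove(features[int(np.argmin(importance))])  # drop weakest, refit

print("retained features:", features)
```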
Additional submission details can be found inside the directory for each prize winner.
All solutions in the Social Determinants Track were developed using data from the MHAS (Mexican Health and Aging Study).
MHAS is partly sponsored by the National Institutes of Health/National Institute on Aging (grant number NIH R01AG018016) in the United States and the Instituto Nacional de Estadística y Geografía (INEGI) in Mexico. Data files and documentation are public use and available at www.MHASweb.org.
Winners Blog Post: Meet the winners for Phase 2 of the PREPARE Challenge
| Place | Team or User | Track | Approach Highlights |
|---|---|---|---|
| 1st Place and Clean Code Bonus | RASKA-Team | Social Determinants | Harmonized four additional international datasets from the HCAP network and trained an ensemble of LightGBM, CatBoost, and XGBoost with embedded fairness weighting (sketched below). |
| 2nd Place | ExplainableAD | Acoustic | Used non-native and corpus-diverse speech from the WLS, PITT, IVANOVA, VAS, and DELAWARE corpora in DementiaBank, plus Mispeech/Speechocean762 and the Qwen2.5-omni-3b model, to train a fluency-focused disfluency model and deploy an explainable clinical demo. |
| Runner-up | NxGTR | Social Determinants | Incorporated six independent datasets (including from the HCAP network and NACC) with a unified LightGBM ADRDModel adaptable to dataset-specific feature sets. |
| Runner-up | Harris and Kielo | Acoustic | Leveraged healthy-speech data from Mozilla Common Voice and synthetic ADRD-symptom text with a hierarchical Bayesian model combining a BART linguistic module and a CNN–Transformer encoder trained on GeMAPS features. |
- IGCPharma (Acoustic)
- SpeechCARE (Acoustic)
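The fairness weighting credited to RASKA-Team can take many forms; one common recipe is inverse-frequency sample weights that let under-represented subgroups contribute equally to the training loss. A minimal sketch under that assumption, with a hypothetical `group` column and synthetic data rather than the team's exact formulation:

```python
import numpy as np
import pandas as pd
from lightgbm import LGBMRegressor
from xgboost import XGBRegressor

# Synthetic cohort with an imbalanced demographic subgroup.
rng = np.random.default_rng(0)
df = pd.DataFrame({
    "age": rng.integers(50, 90, size=400),
    "education_years": rng.integers(0, 18, size=400),
    "group": rng.choice(["a", "b"], size=400, p=[0.85, 0.15]),
})
y = 0.1 * df["age"] - 0.2 * df["education_years"] + rng.normal(size=400)

# Weight each row inversely to its subgroup's frequency.
freq = df["group"].map(df["group"].value_counts(normalize=True))
weights = (1.0 / freq).to_numpy()

X = df[["age", "education_years"]].to_numpy()
models = [LGBMRegressor(n_estimators=200), XGBRegressor(n_estimators=200)]
for m in models:
    m.fit(X, y, sample_weight=weights)

# Simple ensemble: average the members' predictions.
print(np.mean([m.predict(X) for m in models], axis=0)[:5])
```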

