The Pioneering Research for Early Prediction of Alzheimer's and Related Dementias EUREKA Challenge

Goal of the Competition

Alzheimer's disease and Alzheimer's disease-related dementias (AD/ADRD) are a group of brain disorders characterized by progressive cognitive impairments that severely impact daily functioning. Early prediction of AD/ADRD is crucial for potential disease modification through emerging treatments, but current methods are not sensitive enough to reliably detect the disease in its early or presymptomatic stages.

The PREPARE: Pioneering Research for Early Prediction of Alzheimer's and Related Dementias EUREKA Challenge was a multi-year innovation competition supported by the National Institute on Aging (NIA) that advanced methods and data for early prediction of AD/ADRD. Through three phases, over 1,000 participants submitted solutions for open data and innovative methods:

Phase 1 - Find IT!: Solvers from across academia and industry found, curated, or contributed representative and open datasets that can be used for early prediction of AD/ADRD across a range of modalities including neuroimagery, synthetic electronic health records, survey, speech, and mobile apps. Read more about the results of Phase 1 here.
Phase 2 - Build IT!: Data science solvers advanced algorithms and analytic approaches for early prediction of AD/ADRD, with an emphasis on explainability of predictions, in two data tracks focused on acoustic voice data and social determinants of health survey data. Read more about the results of Phase 2 here.
Phase 3 - Put IT All Together!: Ten top teams from Phase 2 refined their algorithmic approaches, working to make their solutions more rigorous and generalizable to a real-world context. This phase included a public virtual pitch event and an in-person winner showcase at the NIH. Read more about the results of Phase 3 here.

What's in this Repository

This repository contains submissions from winning competitors in the qualitatively judged PREPARE Challenge on DrivenData.

In this challenge, participants submitted written reports for each phase rather than code solutions. Solution code was reviewed during verification, but earning prizes did not require open-source licensing. If a team has voluntarily shared its code in a public repository, the link is included in an .md file in the team's directory.

Winning submissions for other DrivenData competitions are available in the competition-winners repository.

Winning Submissions

Phase 1

Place	Team or User	Data Summary
1st Place	VBM_CSE_UB	Provided audio recordings, acoustic features, demographics, and clinical labels for 2,086 participants from DementiaBank.
2nd Place	zedlab	Released 2M synthetic patient records generated from EHR-based models trained on the Truven MarketScan database and University of Chicago data.
3rd Place + Disproportionate Impact Bonus	IGCPHARMA	Contributed survey data from 26,839 older adults from MHAS and Mex-Cog, aligned with the HCAP cognitive protocol.
4th Place	gaganwig	Shared pre-processed resting-state fMRI scans for 1,491 subjects from the ADNI neuroimaging initiative.
5th Place	EngrDynamics	Provided survey data from ~1M respondents in the U.S. NHIS.
Data Idea	korinreidellisonlabs	Proposed a community-driven, de-identified AD-risk dataset combining EHR/claims with biomarker-rich data (e.g., ADNI).
Data Idea	msundman	Proposed use of dental radiographs as scalable early AD-risk biomarkers.
Data Idea	stephanieruth.young	Proposed open mobile cognitive screening data via the MyCog App.

Additional submission details can be found inside the directory for each prize winner.

Winners Blog Post: Meet the winners for Phase 1 of the PREPARE Challenge

Phase 2

Acoustic Track

Place	Team or User	Approach Highlights
1st Place	ExplainableAD (sheep and cecelia)	Whisper encoder, using two-stage training (full dataset, then language-specific fine-tuning) with CAM-based temporal interpretability.
2nd Place	Harris and Kielo	Ensemble combining eGeMAPS-v2 acoustic features with XGBoost, and two fine-tuned Whisper classifiers (one using triplet loss) with all-MiniLM-L6-v2 semantic clustering preprocessing and XGBoost meta-learner aggregation.
3rd Place + Explainability Bonus	team SALEN	Ensemble combining Whisper encoder voiceprint analysis, BERT semantic embeddings, and EfficientNet mel-spectrogram processing with weighted softmax fusion, enhanced by SHAP and CAM interpretability for clinical decision support.
Special Recognition (data processing) + Explainability Bonus	SpeechCARE	Multi-lingual acoustic–linguistic pipeline combining eGeMAPS-v2 features, transformer embeddings from wav2vec2 and Whisper, extensive preprocessing (noise filtering, anomaly detection, task-ID classification), and multimodal SHAP-enhanced explanations.
Special Recognition (generalizability)	IGC Pharma	Multi-lingual acoustic baseline using fine-tuned Whisper encoder embeddings with comprehensive audio preprocessing and augmentation.
Special Recognition (generalizability)	BrainSignsLab	Developed a hybrid feature extraction pipeline combining traditional acoustic features (eGeMAPS-v2), transformer embeddings (Wav2Vec2, Whisper), and demographic data in a feature-selected XGBoost classifier.

Additional submission details can be found inside the directory for each prize winner.
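
Several of the entries above combine the outputs of multiple models. As an illustration only, here is a minimal sketch of one plausible reading of "weighted softmax fusion": each model's logits are passed through softmax, and the resulting class probabilities are averaged with fixed weights. The function names, the logits, and the weights are all hypothetical and are not taken from any team's submission; the winning reports describe the actual methods.

```python
import math


def softmax(logits):
    """Convert raw scores to probabilities (shifted by the max for numerical stability)."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def fuse_probabilities(model_logits, weights):
    """Weighted average of per-model softmax probabilities.

    model_logits: one list of class logits per model (all the same length).
    weights: one non-negative weight per model; normalized here to sum to 1.
    """
    if len(model_logits) != len(weights):
        raise ValueError("need exactly one weight per model")
    weight_sum = sum(weights)
    if weight_sum <= 0:
        raise ValueError("weights must sum to a positive value")

    n_classes = len(model_logits[0])
    fused = [0.0] * n_classes
    for logits, weight in zip(model_logits, weights):
        if len(logits) != n_classes:
            raise ValueError("all models must score the same classes")
        probs = softmax(logits)
        for i, p in enumerate(probs):
            fused[i] += (weight / weight_sum) * p
    return fused


# Hypothetical logits from three models over three hypothetical classes.
example_logits = [
    [2.0, 0.5, -1.0],  # e.g. an audio-encoder model
    [0.3, 1.2, 0.1],   # e.g. a text-embedding model
    [1.0, 1.0, 0.2],   # e.g. a spectrogram model
]
example_weights = [0.5, 0.3, 0.2]  # assumed values, not tuned
print(fuse_probabilities(example_logits, example_weights))
```

Averaging probabilities rather than raw logits keeps each model's contribution on the same scale, which matters when the models were trained separately.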

All solutions in the Acoustic Track were developed using data from DementiaBank.

Lanzi, A. M., Saylor, A. K., Fromm, D., Liu, H., MacWhinney, B., & Cohen, M. (2023). DementiaBank: Theoretical rationale, protocol, and illustrative analyses. American Journal of Speech-Language Pathology. doi.org/10.1044/2022_AJSLP-22-00281

Social Determinants Track

Place	Team or User	Approach Highlights
1st Place	RASKA-Team	Used TabPFN-derived features, engineered demographic and temporal representations, and sequential prediction with three decision tree-based models (CatBoost, LightGBM, XGBoost) followed by regularized regression.
2nd Place	NxGTR	Built a minimal-complexity decision tree-based model (LightGBM) to predict both speed and acceleration of cognitive decline.
3rd Place	Cassandre	Ensemble of decision tree-based models (LightGBM) with controls for age and education.
Special Recognition (feature selection)	GiaPaoDawei	Applied large-scale feature selection using CatBoost with SHAP-guided recursive removal, and trained a decision tree ensemble (LightGBM, CatBoost).
Explainability Bonus	Nick and Ry	Prediction explainer report included predicted cognitive score and critical contextual information for non-technical audience understanding, with scores based on appropriate technical methods.

Additional submission details can be found inside the directory for each prize winner.
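
Some of the approaches above pass the predictions of several tree-based models to a regularized regression. A common way to do this is stacking, sketched below with ridge regression in plain Python. This is an assumption about the general technique, not a description of any team's pipeline: the function names, the base-model predictions, the targets, and the `alpha` value are all hypothetical, and the sketch fits no intercept.

```python
def ridge_fit(X, y, alpha):
    """Solve (X^T X + alpha * I) w = X^T y with Gaussian elimination.

    X: list of rows, one column per base model's prediction.
    y: list of target values.
    alpha: ridge penalty; larger values shrink the weights toward zero.
    """
    n = len(X[0])
    # Build the regularized normal equations.
    A = [[sum(row[i] * row[j] for row in X) + (alpha if i == j else 0.0)
          for j in range(n)] for i in range(n)]
    b = [sum(row[i] * target for row, target in zip(X, y)) for i in range(n)]

    # Forward elimination with partial pivoting.
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        b[col], b[pivot] = b[pivot], b[col]
        for r in range(col + 1, n):
            factor = A[r][col] / A[col][col]
            for c in range(col, n):
                A[r][c] -= factor * A[col][c]
            b[r] -= factor * b[col]

    # Back substitution.
    w = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(A[i][j] * w[j] for j in range(i + 1, n))
        w[i] = (b[i] - tail) / A[i][i]
    return w


def ridge_predict(X, w):
    """Blend base-model predictions with the learned weights."""
    return [sum(x * wi for x, wi in zip(row, w)) for row in X]


# Hypothetical out-of-fold predictions from three base models (columns)
# for five samples (rows), with hypothetical true scores.
base_preds = [
    [10.2, 9.8, 10.5],
    [14.1, 13.7, 14.4],
    [7.9, 8.3, 8.0],
    [12.0, 12.4, 11.7],
    [9.1, 9.0, 9.6],
]
true_scores = [10.0, 14.0, 8.0, 12.0, 9.0]

weights = ridge_fit(base_preds, true_scores, alpha=1.0)
print(weights)
print(ridge_predict(base_preds, weights))
```

In practice the meta-learner is fit on out-of-fold predictions so that it does not reward base models for overfitting the training data.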

All solutions in the Social Determinants Track were developed using data from the MHAS (Mexican Health and Aging Study).

MHAS is partly sponsored by the National Institutes of Health/National Institute on Aging (grant number NIH R01AG018016) in the United States and the Instituto Nacional de Estadística y Geografía (INEGI) in Mexico. Data files and documentation are public use and available at www.MHASweb.org.

Winners Blog Post: Meet the winners for Phase 2 of the PREPARE Challenge

Phase 3

Place	Team or User	Track	Approach Highlights
1st Place and Clean Code Bonus	RASKA-Team	Social Determinants	Harmonized four additional international datasets from the HCAP network and trained an ensemble of LightGBM, CatBoost, and XGBoost with embedded fairness weighting.
2nd Place	ExplainableAD	Acoustic	Used non-native and corpus-diverse speech from WLS, PITT, IVANOVA, VAS, and DELAWARE in DementiaBank plus Mispeech/Speechocean762 and the Qwen2.5-omni-3b model to train a fluency-focused disfluency model and deploy an explainable clinical demo.
Runner-up	NxGTR	Social Determinants	Incorporated six independent datasets (including from HCAP network, NACC) with a unified LightGBM ADRDModel adaptable to dataset-specific feature sets.
Runner-up	Harris and Kielo	Acoustic	Leveraged healthy-speech data from Mozilla Common Voice and synthetic ADRD-symptom text with a hierarchical Bayesian model combining a BART linguistic module and a CNN–Transformer encoder trained on GeMAPS features.

Additional Clean Code Bonus Winners

IGCPharma (Acoustic)
SpeechCARE (Acoustic)
