Competition winners' code from PREPARE: Pioneering Research for Early Prediction of Alzheimer's and Related Dementias EUREKA Challenge

drivendataorg/prepare-adrd

The Pioneering Research for Early Prediction of Alzheimer's and Related Dementias EUREKA Challenge

Goal of the Competition

Alzheimer's disease and Alzheimer's disease-related dementias (AD/ADRD) are a group of brain disorders characterized by progressive cognitive impairments that severely impact daily functioning. Early prediction of AD/ADRD is crucial for potential disease modification through emerging treatments, but current methods are not sensitive enough to reliably detect the disease in its early or presymptomatic stages.

The PREPARE: Pioneering Research for Early Prediction of Alzheimer's and Related Dementias EUREKA Challenge was a multi-year innovation competition supported by the National Institute on Aging (NIA) that advanced methods and data for early prediction of AD/ADRD. Through three phases, over 1,000 participants submitted solutions for open data and innovative methods:

  • Phase 1 - Find IT!: Solvers from across academia and industry found, curated, or contributed representative and open datasets that can be used for early prediction of AD/ADRD across a range of modalities including neuroimagery, synthetic electronic health records, survey, speech, and mobile apps. Read more about the results of Phase 1 here.
  • Phase 2 - Build IT!: Data science solvers advanced algorithms and analytic approaches for early prediction of AD/ADRD, with an emphasis on explainability of predictions, in two data tracks focused on acoustic voice data and social determinants of health survey data. Read more about the results of Phase 2 here.
  • Phase 3 - Put IT All Together!: Ten top teams from Phase 2 refined their algorithmic approaches, working to make their solutions more rigorous and generalizable to a real-world context. This phase included a public virtual pitch event and an in-person winner showcase at the NIH. Read more about the results of Phase 3 here.

What's in this Repository

This repository contains submissions from winning competitors in the PREPARE Challenge, a qualitatively judged DrivenData challenge.

In this challenge, participants submitted written reports for each phase rather than code solutions. Solution code was reviewed during verification, but earning prizes did not require open-source licensing. If a team has voluntarily shared its code in a public repository, the link is included in an .md file in the team's directory.
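
To collect those links in one pass, a short script along the following lines may help. It is a minimal sketch that assumes only what is stated above (links live in .md files somewhere under the repository root); the function name `find_shared_code_links` and the URL pattern are illustrative and not part of this repository. Note that it will also report links found in this README.

```python
import re
from pathlib import Path

# Simple URL matcher (an assumption, not a full URL grammar): stops at
# whitespace and at the closing bracket of a markdown link.
URL_PATTERN = re.compile(r"https?://[^\s)>\]]+")


def find_shared_code_links(repo_root):
    """Map each .md file under repo_root to the list of URLs it contains."""
    root = Path(repo_root)
    links = {}
    for md_file in sorted(root.rglob("*.md")):
        text = md_file.read_text(encoding="utf-8", errors="ignore")
        # Trim trailing sentence punctuation that the pattern may pick up.
        urls = [url.rstrip(".,;") for url in URL_PATTERN.findall(text)]
        if urls:
            links[md_file.relative_to(root).as_posix()] = urls
    return links


if __name__ == "__main__":
    # Run from the root of a local clone of the repository.
    for path, urls in find_shared_code_links(".").items():
        print(path, *urls, sep="\n  ")
```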

Winning submissions for other DrivenData competitions are available in the competition-winners repository.

Winning Submissions

Phase 1

| Place | Team or User | Data Summary |
| --- | --- | --- |
| 1st Place | VBM_CSE_UB | Provided audio recordings, acoustic features, demographics, and clinical labels for 2,086 participants from DementiaBank. |
| 2nd Place | zedlab | Released 2M synthetic patient records generated from EHR-based models trained on the Truven MarketScan database and University of Chicago data. |
| 3rd Place + Disproportionate Impact Bonus | IGCPHARMA | Contributed survey data from 26,839 older adults from MHAS and Mex-Cog, aligned with the HCAP cognitive protocol. |
| 4th Place | gaganwig | Shared pre-processed resting-state fMRI scans for 1,491 subjects from the ADNI neuroimaging initiative. |
| 5th Place | EngrDynamics | Provided survey data from ~1M respondents in the U.S. NHIS. |
| Data Idea | korinreidellisonlabs | Proposed a community-driven, de-identified AD-risk dataset combining EHR/claims with biomarker-rich data (e.g., ADNI). |
| Data Idea | msundman | Proposed use of dental radiographs as scalable early AD-risk biomarkers. |
| Data Idea | stephanieruth.young | Proposed open mobile cognitive screening data via the MyCog App. |

Additional submission details can be found inside the directory for each prize winner.

Winners Blog Post: Meet the winners for Phase 1 of the PREPARE Challenge

Phase 2

Acoustic Track

| Place | Team or User | Approach Highlights |
| --- | --- | --- |
| 1st Place | ExplainableAD (sheep and cecelia) | Whisper encoder, using two-stage training (full dataset, then language-specific fine-tuning) with CAM-based temporal interpretability. |
| 2nd Place | Harris and Kielo | Ensemble combining eGeMAPS-v2 acoustic features with XGBoost, and two fine-tuned Whisper classifiers (one using triplet loss) with all-MiniLM-L6-v2 semantic clustering preprocessing and XGBoost meta-learner aggregation. |
| 3rd Place + Explainability Bonus | team SALEN | Ensemble combining Whisper encoder voiceprint analysis, BERT semantic embeddings, and EfficientNet mel-spectrogram processing with weighted softmax fusion, enhanced by SHAP and CAM interpretability for clinical decision support. |
| Special Recognition (data processing) + Explainability Bonus | SpeechCARE | Multi-lingual acoustic–linguistic pipeline combining eGeMAPS-v2 features, transformer embeddings from wav2vec2 and Whisper, extensive preprocessing (noise filtering, anomaly detection, task-ID classification), and multimodal SHAP-enhanced explanations. |
| Special Recognition (generalizability) | IGC Pharma | Multi-lingual acoustic baseline using fine-tuned Whisper encoder embeddings with comprehensive audio preprocessing and augmentation. |
| Special Recognition (generalizability) | BrainSignsLab | Developed a hybrid feature extraction pipeline combining traditional acoustic features (eGeMAPS-v2), transformer embeddings (Wav2Vec2, Whisper), and demographic data in a feature-selected XGBoost classifier. |
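
Several of the approaches above fuse the outputs of more than one model. As a rough illustration of one common reading of "weighted softmax fusion" (a weighted average of each model's softmax probabilities), the sketch below uses only the standard library. It is not taken from any team's code: the function names, weights, and logits are hypothetical, and the winning solutions may define their fusion differently.

```python
import math


def softmax(logits):
    """Convert raw scores to probabilities (shifted by the max for stability)."""
    shift = max(logits)
    exps = [math.exp(x - shift) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def weighted_softmax_fusion(model_logits, model_weights):
    """Average per-model softmax probabilities using normalized weights.

    model_logits: one list of class logits per model (same class order).
    model_weights: one non-negative weight per model (hypothetical values).
    """
    if len(model_logits) != len(model_weights):
        raise ValueError("need exactly one weight per model")
    weight_sum = sum(model_weights)
    if weight_sum <= 0:
        raise ValueError("weights must sum to a positive value")
    fused = [0.0] * len(model_logits[0])
    for logits, weight in zip(model_logits, model_weights):
        for i, prob in enumerate(softmax(logits)):
            fused[i] += (weight / weight_sum) * prob
    return fused


# Three hypothetical models scoring the same three classes.
fused = weighted_softmax_fusion(
    [[2.0, 0.0, 0.0], [0.0, 2.0, 0.0], [0.0, 0.0, 0.0]],
    [0.5, 0.3, 0.2],
)
predicted_class = fused.index(max(fused))
```

Averaging probabilities rather than raw logits keeps the fused output a valid probability distribution regardless of how each model scales its scores.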

Additional submission details can be found inside the directory for each prize winner.

All solutions in the Acoustic Track were developed using data from the DementiaBank.

Lanzi, A. M., Saylor, A. K., Fromm, D., Liu, H., MacWhinney, B., & Cohen, M. (2023). DementiaBank: Theoretical rationale, protocol, and illustrative analyses. American Journal of Speech-Language Pathology. doi.org/10.1044/2022_AJSLP-22-00281

Social Determinants Track

| Place | Team or User | Approach Highlights |
| --- | --- | --- |
| 1st Place | RASKA-Team | Used TabPFN-derived features, engineered demographic and temporal representations, and sequential prediction of three decision tree-based models (CatBoost, LightGBM, XGBoost) followed by regularized regression. |
| 2nd Place | NxGTR | Built a minimal-complexity decision tree-based model (LightGBM) to predict both speed and acceleration of cognitive decline. |
| 3rd Place | Cassandre | Built an ensemble of decision tree-based models (LightGBM) and included controls for age and education. |
| Special Recognition (feature selection) | GiaPaoDawei | Applied large-scale feature selection using CatBoost with SHAP-guided recursive removal, and trained a decision tree ensemble (LightGBM, CatBoost). |
| Explainability Bonus | Nick and Ry | Produced a prediction explainer report presenting the predicted cognitive score alongside critical contextual information for non-technical audiences, with scores based on appropriate technical methods. |
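
For readers unfamiliar with importance-guided recursive feature removal, the loop can be sketched as follows. This is a generic illustration rather than any team's implementation: `importance_fn` is a hypothetical callback that, in practice, would retrain a model on the current feature set and return a per-feature score such as mean absolute SHAP value, and the feature names and scores in the usage example are made up.

```python
def recursive_feature_removal(features, importance_fn, keep, drop_fraction=0.2):
    """Repeatedly drop the least important features until `keep` remain.

    features: list of feature names.
    importance_fn: callable taking the current feature list and returning a
        dict of feature -> importance (hypothetical; e.g. mean |SHAP| after
        retraining on those features).
    keep: number of features to retain.
    drop_fraction: share of remaining features removed per round (at least 1).
    """
    if keep < 1:
        raise ValueError("keep must be at least 1")
    selected = list(features)
    while len(selected) > keep:
        scores = importance_fn(selected)
        n_drop = max(1, int(len(selected) * drop_fraction))
        n_drop = min(n_drop, len(selected) - keep)  # never drop below `keep`
        weakest = set(sorted(selected, key=lambda name: scores[name])[:n_drop])
        selected = [name for name in selected if name not in weakest]
    return selected


# Toy usage with fixed, made-up importances standing in for a real model.
toy_scores = {"age": 0.9, "education": 0.7, "noise_1": 0.05,
              "noise_2": 0.01, "noise_3": 0.02}
kept = recursive_feature_removal(
    list(toy_scores),
    importance_fn=lambda current: {name: toy_scores[name] for name in current},
    keep=2,
)
```

Recomputing importances after every round matters because scores for correlated features can shift once one of them is removed.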

Additional submission details can be found inside the directory for each prize winner.

All solutions in the Social Determinants Track were developed using data from the MHAS (Mexican Health and Aging Study).

MHAS is partly sponsored by the National Institutes of Health/National Institute on Aging (grant number NIH R01AG018016) in the United States and the Instituto Nacional de Estadística y Geografía (INEGI) in Mexico. Data files and documentation are public use and available at www.MHASweb.org.

Winners Blog Post: Meet the winners for Phase 2 of the PREPARE Challenge

Phase 3

| Place | Team or User | Track | Approach Highlights |
| --- | --- | --- | --- |
| 1st Place and Clean Code Bonus | RASKA-Team | Social Determinants | Harmonized four additional international datasets from the HCAP network and trained an ensemble of LightGBM, CatBoost, and XGBoost with embedded fairness weighting. |
| 2nd Place | ExplainableAD | Acoustic | Used non-native and corpus-diverse speech from WLS, PITT, IVANOVA, VAS, and DELAWARE in DementiaBank plus Mispeech/Speechocean762 and the Qwen2.5-omni-3b model to train a fluency-focused disfluency model and deploy an explainable clinical demo. |
| Runner-up | NxGTR | Social Determinants | Incorporated six independent datasets (including from HCAP network, NACC) with a unified LightGBM ADRDModel adaptable to dataset-specific feature sets. |
| Runner-up | Harris and Kielo | Acoustic | Leveraged healthy-speech data from Mozilla Common Voice and synthetic ADRD-symptom text with a hierarchical Bayesian model combining a BART linguistic module and a CNN–Transformer encoder (CNN–Transformer overview) trained on GeMAPS features. |

Additional Clean Code Bonus Winners

  • IGCPharma (Acoustic)
  • SpeechCARE (Acoustic)
