
CSF autophagy/lysosome biomarker panel discovery pipeline - 13 datasets, ~9,500 proteins scored, translational mouse↔human


Sigray-Lab/CSF-Panel-Discovery


CSF Autophagy & Lysosome Biomarker Panel Discovery

A reproducible proteomics pipeline that identifies and ranks cerebrospinal fluid (CSF)-detectable markers of autophagy and lysosomal function with rodent-to-human translation, integrating 13 mass-spectrometry datasets across ~3,600 samples.

Overview

The pipeline scores ~9,500 proteins across five evidence axes:

| Component | Weight | Source |
| --- | --- | --- |
| Mouse CSF evidence | 0.25 | 7 mouse CSF datasets (D1-D5, D7-D8) |
| Human CSF evidence | 0.30 | Human Astral CSF discovery (D11) + validation (D10, D12) |
| EV support | 0.10 | Human cell-line EV secretome reference |
| Brain plausibility | 0.10 | Mouse brain lysate (D6) |
| Autophagy membership | 0.25 | Curated autophagy/lysosome gene list (R1, 599 genes) |
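The README does not state how the five axes are combined into one score. As a minimal sketch, assuming a linear weighted sum over per-axis scores that each lie in [0, 1], the weights above could be applied as follows. The axis key names and the `composite_score` function are hypothetical and are not taken from the pipeline code.

```python
# Weights copied from the table above; the axis key names are hypothetical.
WEIGHTS = {
    "mouse_csf": 0.25,
    "human_csf": 0.30,
    "ev_support": 0.10,
    "brain_plausibility": 0.10,
    "autophagy_membership": 0.25,
}


def composite_score(axis_scores, weights=WEIGHTS):
    """Return the weighted sum of per-axis scores.

    Assumes every axis score lies in [0, 1]; this is an illustrative
    assumption, not a documented property of the pipeline.
    """
    missing = set(weights) - set(axis_scores)
    if missing:
        raise KeyError(f"missing axis scores: {sorted(missing)}")
    for axis in weights:
        if not 0.0 <= axis_scores[axis] <= 1.0:
            raise ValueError(f"{axis} score must be in [0, 1]")
    return sum(weights[axis] * axis_scores[axis] for axis in weights)


# Placeholder axis scores for a single hypothetical protein.
example = {
    "mouse_csf": 1.0,
    "human_csf": 0.5,
    "ev_support": 0.0,
    "brain_plausibility": 1.0,
    "autophagy_membership": 1.0,
}
score = composite_score(example)
```

Because the weights sum to 1.00, a protein with full support on every axis would score 1.0 under this assumption.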

Hard gates (D11 tier A/B, mouse CSF tier A/B, R1 membership, plasma exclusion) filter candidates down to a 122-protein core panel and a top-80 shortlist.
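The gating logic can be pictured as a conjunction of four checks followed by a ranked cut. The sketch below is illustrative only: the `Candidate` fields, the tier label `"C"`, and the function names are assumptions, and the actual rules live in `04_autophagy_filter.py` and `config.yaml`.

```python
from dataclasses import dataclass

# Tiers that pass a gate, per the "tier A/B" wording above.
PASSING_TIERS = {"A", "B"}


@dataclass
class Candidate:
    # All field names are hypothetical.
    gene: str
    d11_tier: str         # tier in the human discovery dataset (D11)
    mouse_csf_tier: str   # tier across the mouse CSF datasets
    in_r1: bool           # member of the curated R1 gene list
    plasma_flagged: bool  # True if the plasma exclusion applies
    score: float          # composite evidence score


def passes_hard_gates(c):
    """A candidate must pass every gate to stay in the panel."""
    return (
        c.d11_tier in PASSING_TIERS
        and c.mouse_csf_tier in PASSING_TIERS
        and c.in_r1
        and not c.plasma_flagged
    )


def shortlist(candidates, top_n=80):
    """Keep gate-passing candidates, ranked by score, highest first."""
    passing = [c for c in candidates if passes_hard_gates(c)]
    return sorted(passing, key=lambda c: c.score, reverse=True)[:top_n]


# Placeholder candidates; gene names and scores are not real results.
demo = [
    Candidate("GENE_A", "A", "B", True, False, 0.81),
    Candidate("GENE_B", "C", "A", True, False, 0.95),  # fails D11 gate
    Candidate("GENE_C", "B", "A", True, True, 0.90),   # plasma-flagged
    Candidate("GENE_D", "A", "A", True, False, 0.88),
]
ranked = [c.gene for c in shortlist(demo)]
```

In this sketch the gates are applied before ranking, so a high score cannot rescue a candidate that fails any single gate.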

Pipeline steps

| Step | Script | Description |
| --- | --- | --- |
| 00 | 00_setup.py | Validate inputs, create directory structure |
| 01 | 01_extract_and_qc.py | Parse 13 datasets into standardised format |
| 02 | 02_orthology_mapping.py | Mouse-to-human orthology via g:Profiler |
| 03 | 03_evidence_scoring.py | Compute 5-axis evidence scores for all proteins |
| 04 | 04_autophagy_filter.py | Apply hard gates, rank candidates |
| 05 | 05_peptide_feasibility.py | Cross-species peptide conservation & assay feasibility |
| 06 | 06_module_validation.py | Co-abundance module analysis on Astral data |
| 07 | 07_validate_and_report.py | QC, sensitivity analyses, figures, methods draft |
| 08 | 08_ad_model_crosscheck.py | Cross-check against AppNL-G-F AD mouse model CSF |
| 09 | 09_ev_gate_analysis.py | EV hard-gate sensitivity analysis |
| 10 | 10_supplementary_gene_scoring.py | Score supplementary curated gene lists (R1b) |

Getting started

Prerequisites

Setup

# Clone the repository
git clone https://github.com/Sigray-Lab/CSF-Panel-Discovery.git
cd CSF-Panel-Discovery

# Create conda environment
conda env create -f DataProc/Scripts/environment.yml
conda activate csf_panel

# Place raw data files in raw/ (see raw/README.md)

# Run the pipeline
cd DataProc/Scripts
python 00_setup.py
python 01_extract_and_qc.py
python 02_orthology_mapping.py
python 03_evidence_scoring.py
python 04_autophagy_filter.py
python 05_peptide_feasibility.py
python 06_module_validation.py
python 07_validate_and_report.py
python 08_ad_model_crosscheck.py
python 09_ev_gate_analysis.py
python 10_supplementary_gene_scoring.py
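
The eleven commands above can also be driven from a small wrapper that stops at the first failing step. This is a sketch, not a script shipped with the repository; `run_pipeline` and its `runner` parameter are hypothetical names.

```python
import subprocess
import sys
from pathlib import Path

# Script names in execution order, as listed in the pipeline steps.
STEPS = [
    "00_setup.py",
    "01_extract_and_qc.py",
    "02_orthology_mapping.py",
    "03_evidence_scoring.py",
    "04_autophagy_filter.py",
    "05_peptide_feasibility.py",
    "06_module_validation.py",
    "07_validate_and_report.py",
    "08_ad_model_crosscheck.py",
    "09_ev_gate_analysis.py",
    "10_supplementary_gene_scoring.py",
]


def run_pipeline(scripts_dir, steps=STEPS, runner=subprocess.run):
    """Run each step in order from scripts_dir.

    With the default runner, check=True makes subprocess.run raise
    CalledProcessError on a non-zero exit code, so later steps never
    run on top of a failed one.
    """
    scripts_dir = Path(scripts_dir)
    for script in steps:
        print(f"Running {script}")
        runner([sys.executable, script], cwd=scripts_dir, check=True)
```

From the repository root, with the conda environment active, this would be called as `run_pipeline("DataProc/Scripts")`.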

Directory structure

CSF_panel_project/
├── DataProc/
│   ├── Scripts/              # Pipeline code (11 steps + utilities)
│   │   ├── config.yaml       # Central configuration (all paths, weights, thresholds)
│   │   ├── environment.yml   # Conda environment specification
│   │   ├── 00_setup.py ... 10_supplementary_gene_scoring.py
│   │   └── utils/            # Shared modules (parsers, scoring, orthology, QC, viz)
│   ├── Outputs/              # Final deliverables (ranked lists, figures, reports)
│   ├── DerivedData/          # Intermediate files (regenerated by pipeline)
│   ├── QC/                   # Quality control artifacts (regenerated)
│   ├── Log/                  # Timestamped execution logs (regenerated)
│   └── project_plan.md       # Full pipeline specification (v3)
└── raw/                      # Source data (not included; see raw/README.md)

Key outputs

  • Outputs/candidates_ranked.tsv: all ~9,500 scored proteins with full evidence breakdown
  • Outputs/core_panel_shortlist.tsv: top 80 shortlisted proteins (pass all hard gates)
  • Outputs/master_pipeline_list.tsv: 139-protein master list (122 core + 17 human-only additions)
  • Outputs/figures/: publication-ready PDF figures
  • Outputs/methods_draft.md: methods section draft

Configuration

All tuneable parameters (weights, gate thresholds, tier definitions, file paths) are centralised in DataProc/Scripts/config.yaml. The pipeline is fully config-driven with no hardcoded paths.
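
When editing weights in the config, a quick sanity check can catch accidental changes. The sketch below assumes a top-level `weights` mapping and assumes the weights are meant to sum to 1.0, as the values in the Overview do. Neither the key name nor the constraint is confirmed by the repository, and `validate_config` is a hypothetical helper.

```python
import math


def validate_config(config):
    """Return a list of problems found in a config mapping.

    An empty list means no problems were found. The "weights" key is
    an assumed layout, not the documented schema of config.yaml.
    """
    problems = []
    weights = config.get("weights", {})
    if not weights:
        problems.append("no weights defined")
        return problems
    for name, value in weights.items():
        if not 0.0 <= value <= 1.0:
            problems.append(f"weight {name!r} is outside [0, 1]")
    total = sum(weights.values())
    if not math.isclose(total, 1.0, abs_tol=1e-9):
        problems.append(f"weights sum to {total:.3f}, expected 1.0")
    return problems


# Example mapping using the weights from the Overview table.
config = {
    "weights": {
        "mouse_csf": 0.25,
        "human_csf": 0.30,
        "ev_support": 0.10,
        "brain_plausibility": 0.10,
        "autophagy_membership": 0.25,
    }
}
problems = validate_config(config)
```

The mapping itself could be read from the YAML file with PyYAML's `yaml.safe_load`, assuming that package is present in the environment.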

License

TBD

Citation

TBD
