CSF Autophagy & Lysosome Biomarker Panel Discovery

A reproducible proteomics pipeline that identifies and ranks markers of autophagy and lysosomal function detectable in cerebrospinal fluid (CSF), with rodent-to-human translation. It integrates 13 mass-spectrometry datasets covering ~3,600 samples.

Overview

The pipeline scores ~9,500 proteins across five evidence axes:

| Component | Weight | Source |
| --- | --- | --- |
| Mouse CSF evidence | 0.25 | 7 mouse CSF datasets (D1-D5, D7-D8) |
| Human CSF evidence | 0.30 | Human Astral CSF discovery (D11) + validation (D10, D12) |
| EV support | 0.10 | Human cell-line EV secretome reference |
| Brain plausibility | 0.10 | Mouse brain lysate (D6) |
| Autophagy membership | 0.25 | Curated autophagy/lysosome gene list (R1, 599 genes) |
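
The weights in the table sum to 1.00, so one natural reading is a weighted sum of per-axis scores. The sketch below illustrates that reading only. The axis key names, the assumption that each axis score lies in [0, 1], and the example values are hypothetical, and the actual combination rule lives in `03_evidence_scoring.py` and `config.yaml`.

```python
# Weights copied from the table above; the axis key names are hypothetical.
WEIGHTS = {
    "mouse_csf": 0.25,
    "human_csf": 0.30,
    "ev_support": 0.10,
    "brain_plausibility": 0.10,
    "autophagy_membership": 0.25,
}


def composite_score(axis_scores, weights=WEIGHTS):
    """Weighted sum of per-axis scores, each assumed to lie in [0, 1]."""
    missing = set(weights) - set(axis_scores)
    if missing:
        raise KeyError(f"missing axis scores: {sorted(missing)}")
    for axis in weights:
        if not 0.0 <= axis_scores[axis] <= 1.0:
            raise ValueError(f"{axis} score must be in [0, 1]")
    return sum(weights[axis] * axis_scores[axis] for axis in weights)


# Made-up protein: strong mouse evidence, moderate human evidence, no EV support.
example = {
    "mouse_csf": 1.0,
    "human_csf": 0.5,
    "ev_support": 0.0,
    "brain_plausibility": 1.0,
    "autophagy_membership": 1.0,
}
print(round(composite_score(example), 3))  # 0.75
```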

Hard gates (D11 tier A/B, mouse CSF tier A/B, R1 membership, plasma exclusion) filter candidates down to a 122-protein core panel and a top-80 shortlist.
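
A minimal sketch of how the four hard gates could be applied as a boolean filter before ranking. The `Candidate` fields, the placeholder gene names, and the choice to rank by a single composite score are assumptions made for illustration; the real gate definitions are in `04_autophagy_filter.py` and `config.yaml`.

```python
from dataclasses import dataclass

PASSING_TIERS = {"A", "B"}


@dataclass
class Candidate:
    # All field names are hypothetical stand-ins for the pipeline's columns.
    gene: str
    score: float
    d11_tier: str
    mouse_csf_tier: str
    in_r1: bool
    plasma_flagged: bool


def passes_hard_gates(c):
    """True only if the candidate clears every gate listed above."""
    return (
        c.d11_tier in PASSING_TIERS
        and c.mouse_csf_tier in PASSING_TIERS
        and c.in_r1
        and not c.plasma_flagged
    )


def core_and_shortlist(candidates, top_n=80):
    """Return (core panel, top-N shortlist), both sorted by descending score."""
    core = sorted(
        (c for c in candidates if passes_hard_gates(c)),
        key=lambda c: c.score,
        reverse=True,
    )
    return core, core[:top_n]


# Placeholder records, not real results.
demo = [
    Candidate("GENE_A", 0.82, "A", "B", True, False),
    Candidate("GENE_B", 0.90, "C", "A", True, False),   # fails the D11 tier gate
    Candidate("GENE_C", 0.77, "B", "A", True, True),    # fails plasma exclusion
    Candidate("GENE_D", 0.88, "A", "A", True, False),
]
core, short = core_and_shortlist(demo, top_n=1)
print([c.gene for c in core], [c.gene for c in short])
```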

Pipeline steps

| Step | Script | Description |
| --- | --- | --- |
| 00 | `00_setup.py` | Validate inputs, create directory structure |
| 01 | `01_extract_and_qc.py` | Parse 13 datasets into standardised format |
| 02 | `02_orthology_mapping.py` | Mouse-to-human orthology via g:Profiler |
| 03 | `03_evidence_scoring.py` | Compute 5-axis evidence scores for all proteins |
| 04 | `04_autophagy_filter.py` | Apply hard gates, rank candidates |
| 05 | `05_peptide_feasibility.py` | Cross-species peptide conservation & assay feasibility |
| 06 | `06_module_validation.py` | Co-abundance module analysis on Astral data |
| 07 | `07_validate_and_report.py` | QC, sensitivity analyses, figures, methods draft |
| 08 | `08_ad_model_crosscheck.py` | Cross-check against AppNL-G-F AD mouse model CSF |
| 09 | `09_ev_gate_analysis.py` | EV hard-gate sensitivity analysis |
| 10 | `10_supplementary_gene_scoring.py` | Score supplementary curated gene lists (R1b) |
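
To make the idea behind step 05 concrete, here is a hedged sketch of one common way to assess cross-species peptide conservation: digest the mouse and human sequences in silico with trypsin rules and keep the peptides found in both. This is not taken from `05_peptide_feasibility.py`. The cleavage rule (after K or R, not before P), the minimum length of 6, and the two toy sequences are all assumptions for illustration.

```python
def tryptic_peptides(seq, min_len=6):
    """In-silico tryptic digest: cleave after K/R unless the next residue is P."""
    peptides, start = [], 0
    for i, aa in enumerate(seq):
        at_end = i == len(seq) - 1
        if at_end or (aa in "KR" and seq[i + 1] != "P"):
            peptides.append(seq[start:i + 1])
            start = i + 1
    # Very short peptides are rarely useful for targeted assays (assumed cutoff).
    return {p for p in peptides if len(p) >= min_len}


def conserved_peptides(human_seq, mouse_seq, min_len=6):
    """Peptides produced identically from both orthologue sequences."""
    return tryptic_peptides(human_seq, min_len) & tryptic_peptides(mouse_seq, min_len)


# Toy sequences that differ at one residue (E vs D); not real proteins.
human = "MKTAYIAKQRPLVESGGKAAELLR"
mouse = "MKTAYIAKQRPLVDSGGKAAELLR"
print(sorted(conserved_peptides(human, mouse)))  # ['AAELLR', 'TAYIAK']
```

Peptides shared between species in this way are candidates for a single assay that works in both mouse and human CSF, which is the point of checking conservation before assay design.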

Getting started

Prerequisites

Conda or Mamba
Raw data files (see raw/README.md for sourcing instructions)

Setup

# Clone the repository
git clone https://github.com/Sigray-Lab/CSF-Panel-Discovery.git
cd CSF-Panel-Discovery

# Create conda environment
conda env create -f DataProc/Scripts/environment.yml
conda activate csf_panel

# Place raw data files in raw/ (see raw/README.md)

# Run the pipeline
cd DataProc/Scripts
python 00_setup.py
python 01_extract_and_qc.py
python 02_orthology_mapping.py
python 03_evidence_scoring.py
python 04_autophagy_filter.py
python 05_peptide_feasibility.py
python 06_module_validation.py
python 07_validate_and_report.py
python 08_ad_model_crosscheck.py
python 09_ev_gate_analysis.py
python 10_supplementary_gene_scoring.py
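
The step-by-step invocation above can also be driven from a single script. The following is a minimal sketch, not part of the repository: `run_pipeline` and its `runner` parameter are hypothetical names, and it assumes every step is meant to run from `DataProc/Scripts` and that a non-zero exit code should stop the run.

```python
import subprocess
import sys
from pathlib import Path

# Script names taken from the pipeline steps listed in this README.
STEPS = [
    "00_setup.py",
    "01_extract_and_qc.py",
    "02_orthology_mapping.py",
    "03_evidence_scoring.py",
    "04_autophagy_filter.py",
    "05_peptide_feasibility.py",
    "06_module_validation.py",
    "07_validate_and_report.py",
    "08_ad_model_crosscheck.py",
    "09_ev_gate_analysis.py",
    "10_supplementary_gene_scoring.py",
]


def run_pipeline(script_dir, steps=STEPS, runner=subprocess.run):
    """Run each step in order; check=True aborts on the first failing step."""
    script_dir = Path(script_dir)
    for script in steps:
        print(f"Running {script}")
        # sys.executable keeps the active (conda) interpreter for every step.
        runner([sys.executable, script], cwd=script_dir, check=True)


# Usage, from the repository root with the csf_panel environment active:
#     run_pipeline("DataProc/Scripts")
```

Passing a different `runner` allows a dry run that only records the commands without executing them.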

Directory structure

CSF_panel_project/
├── DataProc/
│   ├── Scripts/              # Pipeline code (11 steps + utilities)
│   │   ├── config.yaml       # Central configuration (all paths, weights, thresholds)
│   │   ├── environment.yml   # Conda environment specification
│   │   ├── 00_setup.py ... 10_supplementary_gene_scoring.py
│   │   └── utils/            # Shared modules (parsers, scoring, orthology, QC, viz)
│   ├── Outputs/              # Final deliverables (ranked lists, figures, reports)
│   ├── DerivedData/          # Intermediate files (regenerated by pipeline)
│   ├── QC/                   # Quality control artifacts (regenerated)
│   ├── Log/                  # Timestamped execution logs (regenerated)
│   └── project_plan.md       # Full pipeline specification (v3)
└── raw/                      # Source data (not included; see raw/README.md)
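
Step 00 is described as validating inputs and creating the directory structure. As a rough illustration of what that might involve for the layout above, the sketch below checks that `raw/` exists and creates the pipeline-managed directories. The function name `ensure_layout`, the `Outputs/figures` subdirectory being created up front, and the error behaviour are assumptions, not the contents of `00_setup.py`.

```python
import tempfile
from pathlib import Path

# Directories under DataProc/ that the pipeline writes to, per the tree above.
PIPELINE_DIRS = ["Outputs", "Outputs/figures", "DerivedData", "QC", "Log"]


def ensure_layout(project_root):
    """Fail fast if raw/ is missing, then create the output directories."""
    root = Path(project_root)
    if not (root / "raw").is_dir():
        raise FileNotFoundError(f"expected raw data directory at {root / 'raw'}")
    created = []
    for name in PIPELINE_DIRS:
        target = root / "DataProc" / name
        target.mkdir(parents=True, exist_ok=True)  # safe to re-run
        created.append(target)
    return created


# Demonstration in a throwaway directory so no real project is touched.
with tempfile.TemporaryDirectory() as tmp:
    (Path(tmp) / "raw").mkdir()
    made = ensure_layout(tmp)
    print([str(p.relative_to(tmp)) for p in made])
```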

Key outputs

Outputs/candidates_ranked.tsv: all ~9,500 scored proteins with full evidence breakdown
Outputs/core_panel_shortlist.tsv: top 80 shortlisted proteins (all pass every hard gate)
Outputs/master_pipeline_list.tsv: master list of 139 proteins (122 core + 17 human-only additions)
Outputs/figures/: publication-ready PDF figures
Outputs/methods_draft.md: draft of the methods section
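
The ranked table is a plain TSV, so it can be inspected with the standard library alone. A small sketch follows; the column names `gene` and `composite_score` are assumptions (check the file header for the real ones), and the in-memory sample stands in for `Outputs/candidates_ranked.tsv`.

```python
import csv
import io


def top_candidates(tsv_handle, n=10, score_col="composite_score"):
    """Return the n highest-scoring rows from a tab-separated file handle."""
    rows = list(csv.DictReader(tsv_handle, delimiter="\t"))
    rows.sort(key=lambda row: float(row[score_col]), reverse=True)
    return rows[:n]


# Placeholder rows with hypothetical column names, not real pipeline output.
sample = io.StringIO(
    "gene\tcomposite_score\n"
    "GENE_A\t0.62\n"
    "GENE_B\t0.91\n"
    "GENE_C\t0.40\n"
)
for row in top_candidates(sample, n=2):
    print(row["gene"], row["composite_score"])
```

With the real file, replace the sample with `open("DataProc/Outputs/candidates_ranked.tsv", newline="")`.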

Configuration

All tuneable parameters (weights, gate thresholds, tier definitions, file paths) are centralised in DataProc/Scripts/config.yaml. The pipeline is fully config-driven with no hardcoded paths.
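
Because everything is driven by one file, a quick sanity check on the parsed configuration can catch mistakes before a long run. The sketch below is illustrative only: the dictionary stands in for the parsed contents of `config.yaml`, and every key name in it (`weights`, `paths`, and the entries beneath them) is hypothetical. It checks that the evidence weights sum to 1 and resolves paths relative to the config file's location rather than the working directory.

```python
import math
from pathlib import Path

# Stand-in for the parsed config.yaml; all key names here are hypothetical.
EXAMPLE_CFG = {
    "weights": {
        "mouse_csf": 0.25,
        "human_csf": 0.30,
        "ev_support": 0.10,
        "brain_plausibility": 0.10,
        "autophagy_membership": 0.25,
    },
    "paths": {"outputs": "../Outputs", "raw": "../../raw"},
}


def validate_config(cfg, config_path):
    """Check that weights sum to 1 and anchor paths at the config file."""
    total = sum(cfg["weights"].values())
    if not math.isclose(total, 1.0, abs_tol=1e-9):
        raise ValueError(f"evidence weights sum to {total}, expected 1.0")
    base = Path(config_path).parent
    return {name: base / rel for name, rel in cfg["paths"].items()}


paths = validate_config(EXAMPLE_CFG, "DataProc/Scripts/config.yaml")
print(paths["outputs"])
```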

License

TBD

Citation

TBD
