Every paper your team shares — found and mapped.
PaperTrail automatically discovers papers shared across your Slack workspace, enriches them with metadata from OpenAlex and PubMed, computes semantic embeddings, and serves an interactive CellXGene-style dashboard with a canvas-based 2D embedding map, sortable table, and AI-powered chatbot search.
- Slack Scraping — Finds papers across all configured channels by detecting DOI, arXiv, bioRxiv, PubMed, Nature, Cell, Science, OpenReview, and 30+ other scholarly URL patterns. Tracks engagement (reactions + thread replies).
- Multi-Strategy Metadata Enrichment — Cascading resolution pipeline: extract identifiers from URLs, batch OpenAlex lookup, individual OpenAlex/PubMed fallback, web search, and URL-based fallbacks. Handles tricky Elsevier/Cell PIIs via PubMed E-utilities.
- Semantic Embeddings — Generates embeddings via OpenAI, HuggingFace Inference API, local ONNX models (fastembed), or TF-IDF + SVD fallback (no API key needed).
- Interactive Dashboard — Self-contained HTML file with canvas-based scatter plot (UMAP/t-SNE/PCA), lasso and rectangle selection, zoom/pan, color-by-cluster/channel/user/date/year/citations, sortable table view, AI chatbot with Claude API integration, and inline detail panel.
- CLI Pipeline — Four-step pipeline: scrape → enrich → embed → build.
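As a rough illustration of the Slack scraping step, the sketch below detects scholarly links in a single message and computes its engagement as reactions plus thread replies. The four regular expressions are an illustrative subset rather than PaperTrail's actual pattern list, and the function names `find_paper_links` and `engagement` are hypothetical. The message fields (`text`, `reactions`, `reply_count`) follow the shape Slack returns for channel history.

```python
import re

# Illustrative subset of scholarly URL patterns (assumption: the real list is longer).
SCHOLARLY_PATTERNS = {
    "doi": re.compile(r"https?://(?:dx\.)?doi\.org/10\.\d{4,9}/\S+", re.I),
    "arxiv": re.compile(r"https?://arxiv\.org/(?:abs|pdf)/\d{4}\.\d{4,5}", re.I),
    "biorxiv": re.compile(r"https?://(?:www\.)?biorxiv\.org/content/\S+", re.I),
    "pubmed": re.compile(r"https?://pubmed\.ncbi\.nlm\.nih\.gov/\d+", re.I),
}

# Slack wraps links as <url> or <url|label> inside the message text.
SLACK_LINK = re.compile(r"<(https?://[^|>\s]+)(?:\|[^>]*)?>")


def find_paper_links(message):
    """Return (source, url) pairs for scholarly links found in one message."""
    hits = []
    for url in SLACK_LINK.findall(message.get("text", "")):
        for source, pattern in SCHOLARLY_PATTERNS.items():
            if pattern.match(url):
                hits.append((source, url))
                break  # first matching pattern wins
    return hits


def engagement(message):
    """Engagement = total reaction count + number of thread replies."""
    reactions = sum(r.get("count", 0) for r in message.get("reactions", []))
    return reactions + message.get("reply_count", 0)


message = {
    "text": "Worth a read: <https://arxiv.org/abs/1706.03762|Attention paper> "
            "and <https://example.com>",
    "reactions": [{"name": "fire", "count": 3}, {"name": "eyes", "count": 1}],
    "reply_count": 2,
}
print(find_paper_links(message))  # [('arxiv', 'https://arxiv.org/abs/1706.03762')]
print(engagement(message))        # 6
```

Non-scholarly links such as `https://example.com` are simply skipped, so only messages that return at least one hit need to be stored.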
The dashboard is a single self-contained HTML file that works offline. It includes:
- Canvas scatter plot with hardware-accelerated rendering for 1,000+ papers
- Three projection methods: UMAP (default), t-SNE, PCA — toggle in real time
- Six color modes: Cluster, Channel, User, Date, Year, Citations
- Lasso & rectangle selection with inline paper list in the sidebar
- "Select Top N" slider for quick filtering by citation count or relevance
- Sortable/filterable table view with all metadata columns
- AI chatbot (optional) powered by Claude API with search_papers tool use
- Export to Excel with one click
- Dark theme optimized for readability
Data is base64-encoded and embedded directly in the HTML — no server needed.
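A minimal sketch of how data can be inlined this way, assuming the template contains the `{{DATA_B64}}` placeholder used by the dashboard template. The helper names `embed_data` and `extract_data` and the one-line template are hypothetical and are not taken from PaperTrail's code.

```python
import base64
import json

PLACEHOLDER = "{{DATA_B64}}"


def embed_data(template_html, papers):
    """Inline the papers as base64-encoded JSON in place of the placeholder."""
    payload = json.dumps(papers, ensure_ascii=False).encode("utf-8")
    encoded = base64.b64encode(payload).decode("ascii")
    return template_html.replace(PLACEHOLDER, encoded)


def extract_data(encoded):
    """Inverse step, mirroring what the page would do when it loads."""
    return json.loads(base64.b64decode(encoded).decode("utf-8"))


# Hypothetical one-line template, for illustration only.
template = '<script>const DATA_B64 = "{{DATA_B64}}";</script>'
papers = [{"title": "Example paper", "year": 2024, "citations": 12}]

html = embed_data(template, papers)
encoded = html.split('"')[1]           # pull the payload back out of the HTML
print(extract_data(encoded) == papers)  # True
```

Base64 output contains only letters, digits, `+`, `/`, and `=`, so titles with quotes or angle brackets cannot break the surrounding HTML or script tag.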
pip install "papertrail-lab[all]"

Or install with a specific embedding backend:

pip install "papertrail-lab[openai]" # OpenAI embeddings (recommended)
pip install "papertrail-lab[huggingface]" # HuggingFace Inference API
pip install "papertrail-lab[local]" # Local ONNX (no API key needed)

export SLACK_BOT_TOKEN="xoxb-your-token-here"
export OPENAI_API_KEY="sk-..." # for OpenAI embeddings (default)
# OR
export HF_TOKEN="hf_..." # for HuggingFace embeddings

# Step 1: Scrape papers from Slack
papertrail scrape -o papers_raw.json
# Step 2: Enrich with metadata
papertrail enrich papers_raw.json -o papers_enriched.json
# Step 3: Compute embeddings, projections, clusters
papertrail embed papers_enriched.json -o papers_final.json --backend openai
# Step 4: Build the interactive dashboard
papertrail build papers_final.json -o dashboard.html

papertrail search -q "transformer attention mechanisms" -k 5

Slack Workspace
      │
      ▼
┌──────────────┐    ┌──────────────┐    ┌──────────────┐    ┌──────────────┐
│   Scraper    │───▶│   Enricher   │───▶│  Embeddings  │───▶│   Preview    │
│              │    │              │    │              │    │              │
│ - Slack API  │    │ - OpenAlex   │    │ - OpenAI     │    │ - Canvas map │
│ - URL detect │    │ - PubMed     │    │ - HuggingFace│    │ - Table view │
│ - Engagement │    │ - Web search │    │ - Local ONNX │    │ - AI chatbot │
│   metrics    │    │ - Fallbacks  │    │ - TF-IDF/SVD │    │ - Selection  │
└──────────────┘    └──────────────┘    └──────────────┘    └──────────────┘
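The four stages in the diagram map onto the four CLI steps, each consuming the file written by the previous one. The sketch below chains them from Python using only the commands and flags shown in this README. The wrapper functions `pipeline_commands` and `run_pipeline` are hypothetical conveniences, not part of the package, and the default is a dry run that only prints the commands.

```python
import subprocess


def pipeline_commands(backend="openai", out="dashboard.html"):
    """Build the argv lists for the four CLI steps, in execution order."""
    return [
        ["papertrail", "scrape", "-o", "papers_raw.json"],
        ["papertrail", "enrich", "papers_raw.json", "-o", "papers_enriched.json"],
        ["papertrail", "embed", "papers_enriched.json", "-o", "papers_final.json",
         "--backend", backend],
        ["papertrail", "build", "papers_final.json", "-o", out],
    ]


def run_pipeline(dry_run=True, **kwargs):
    """Print each step; execute it only when dry_run is False."""
    for argv in pipeline_commands(**kwargs):
        print(" ".join(argv))
        if not dry_run:
            # check=True stops the chain at the first failing step
            subprocess.run(argv, check=True)


run_pipeline(dry_run=True)
```

Passing `dry_run=False` would execute the steps for real, which requires the package to be installed and the Slack and embedding tokens to be set.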
PaperTrail resolves paper metadata using a multi-strategy pipeline (see skills/paper-metadata-scraper/SKILL.md for full details):
- Extract identifiers from URL — DOIs, arXiv IDs, Elsevier PIIs, PMC IDs, OpenReview IDs
- Batch OpenAlex lookup — resolves up to 40 DOIs per request (fastest path)
- Individual OpenAlex lookup — DOI, arXiv DOI, or PMC ID
- PubMed E-utilities — best for Elsevier/Cell PIIs that other APIs can't handle
- Web search fallback — for OpenReview, conference proceedings, etc.
- URL-based fallback — generates readable titles from URL structure
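The first and last steps of this cascade can be sketched with the standard library alone. The regular expressions below are simplified assumptions and not the patterns PaperTrail ships. The names `extract_identifier`, `title_from_url`, and `resolve` are hypothetical, and the API lookups in between are represented by a stub that always fails.

```python
import re
from urllib.parse import unquote

# Ordered (kind, pattern) pairs; the first match wins. Simplified for illustration.
ID_PATTERNS = [
    ("doi", re.compile(r"\b(10\.\d{4,9}/[^\s?#]+)", re.I)),
    ("arxiv", re.compile(r"arxiv\.org/(?:abs|pdf)/(\d{4}\.\d{4,5})", re.I)),
    ("pii", re.compile(r"/(?:pii|fulltext)/(S[0-9X()-]+)", re.I)),
    ("pmcid", re.compile(r"\b(PMC\d+)", re.I)),
    ("openreview", re.compile(r"openreview\.net/(?:forum|pdf)\?id=([\w-]+)", re.I)),
]


def extract_identifier(url):
    """Return (kind, identifier) for the first pattern that matches, else None."""
    url = unquote(url)  # DOIs are sometimes percent-encoded inside URLs
    for kind, pattern in ID_PATTERNS:
        match = pattern.search(url)
        if match:
            return kind, match.group(1).rstrip(".,;")
    return None


def title_from_url(url):
    """Last-resort strategy: derive a readable title from the final path segment."""
    slug = url.rstrip("/").rsplit("/", 1)[-1]
    return {"title": re.sub(r"[-_]+", " ", slug).strip().title(), "source": "url"}


def resolve(url, strategies):
    """Try each strategy in order and return the first non-empty result."""
    for strategy in strategies:
        record = strategy(url)
        if record:
            return record
    return None


def failed_lookup(url):
    """Stub standing in for an API lookup that found nothing."""
    return None


print(extract_identifier("https://doi.org/10.1038/s41586-021-03819-2"))
# ('doi', '10.1038/s41586-021-03819-2')
print(extract_identifier("https://www.cell.com/cell/fulltext/S0092-8674(20)30820-6"))
# ('pii', 'S0092-8674(20)30820-6')
print(resolve("https://example.org/papers/deep-learning-for-cells",
              [failed_lookup, title_from_url]))
# {'title': 'Deep Learning For Cells', 'source': 'url'}
```

Ordering the strategies from cheapest and most reliable to least reliable means the URL-derived title is only used when every lookup before it has come back empty.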
All APIs are free and require no API keys. Adding a contact email to the User-Agent header gives access to OpenAlex's polite pool (10 req/s vs 1 req/s).
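A hedged sketch of the batching and polite-pool ideas: DOIs are split into groups of 40 and each group becomes one OpenAlex request URL with the DOIs joined by `|` in a `filter=doi:` parameter. The endpoint and parameter names reflect the public OpenAlex API as I understand it and are not taken from PaperTrail's source. The contact address and the DOIs are placeholders, and `openalex_batch_urls` is a hypothetical helper. No request is actually sent.

```python
from urllib.parse import urlencode

BATCH_SIZE = 40               # batch size quoted in the list above
CONTACT = "you@example.org"   # placeholder contact email

# Contact email in the User-Agent header, as described above.
HEADERS = {"User-Agent": f"PaperTrail (mailto:{CONTACT})"}


def chunked(items, size):
    """Yield consecutive slices of at most `size` items."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


def openalex_batch_urls(dois, batch_size=BATCH_SIZE):
    """Build one works-endpoint URL per batch of DOIs (OR-joined with '|')."""
    urls = []
    for batch in chunked(dois, batch_size):
        doi_filter = "doi:" + "|".join(f"https://doi.org/{d}" for d in batch)
        params = {"filter": doi_filter, "per-page": len(batch)}
        urls.append("https://api.openalex.org/works?" + urlencode(params))
    return urls


# Placeholder DOIs, for illustration only.
dois = [f"10.1234/example.{i}" for i in range(85)]
urls = openalex_batch_urls(dois)
print(len(urls))  # 3
```

With 85 DOIs this produces three requests instead of 85, which is why the batch path is tried before any individual lookup.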
| Backend | Model | Dimensions | Speed | Quality | API Key Required |
|---|---|---|---|---|---|
| OpenAI (default) | text-embedding-3-small | 1536 | Fast | Excellent | Yes (OPENAI_API_KEY) |
| HuggingFace | BAAI/bge-small-en-v1.5 | 384 | Fast | Very Good | Optional (HF_TOKEN) |
| Local | BAAI/bge-small-en-v1.5 | 384 | Medium | Very Good | No |
| TF-IDF + SVD | N/A | 128 | Fast | Good | No |
The embedding backend is auto-detected based on available API keys. Override with --backend.
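One plausible shape for this auto-detection is sketched below. The precedence order (explicit override, then OpenAI key, then HuggingFace token, then local ONNX, then TF-IDF) is an assumption based on the table, not a description of PaperTrail's actual logic. Only the `openai` backend name appears in this README, so the other names and the function `detect_backend` are hypothetical.

```python
import importlib.util
import os


def detect_backend(env=None, override=None, has_local=None):
    """Pick an embedding backend from an explicit override or available keys."""
    if override:                       # mirrors passing --backend on the CLI
        return override
    env = os.environ if env is None else env
    if env.get("OPENAI_API_KEY"):
        return "openai"
    if env.get("HF_TOKEN"):
        return "huggingface"
    if has_local is None:
        # Check for the fastembed package without importing it.
        has_local = importlib.util.find_spec("fastembed") is not None
    return "local" if has_local else "tfidf"


print(detect_backend({"OPENAI_API_KEY": "sk-test"}))              # openai
print(detect_backend({"HF_TOKEN": "hf_test"}))                    # huggingface
print(detect_backend({}, has_local=False))                        # tfidf
print(detect_backend({"OPENAI_API_KEY": "sk-test"}, "local"))     # local
```

Falling through to TF-IDF + SVD last means the pipeline can still produce a map when no key is set and no local model is installed.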
Embeddings are stored in a FAISS index for sub-millisecond similarity search:
from papertrail.embeddings import VectorStore
store = VectorStore()
store.load("faiss_index/")
results = store.search_text("single cell RNA sequencing", top_k=5)
for r in results:
    print(f"[{r['score']:.3f}] {r['title']}")

PaperTrail/
├── papertrail/                  # Python package
│   ├── __init__.py
│   ├── scraper.py               # Slack channel scraping + URL extraction
│   ├── enricher.py              # Metadata enrichment (OpenAlex + PubMed)
│   ├── embeddings.py            # Embedding backends (OpenAI, HF, fastembed, TF-IDF)
│   ├── projections.py           # PCA, t-SNE, UMAP projections + K-Means clustering
│   ├── preview.py               # Interactive HTML dashboard builder
│   ├── cli.py                   # Click CLI (papertrail scrape/enrich/embed/build/search)
│   └── templates/
│       └── dashboard.html       # Dashboard HTML template ({{DATA_B64}} placeholder)
├── skills/                      # Claude Code / Cowork skill files
│   ├── papertrail-pipeline/     # Full pipeline skill
│   │   └── SKILL.md
│   └── paper-metadata-scraper/  # Metadata resolution cascade skill
│       └── SKILL.md
├── docs/                        # MkDocs documentation
├── tests/                       # Unit tests
├── pyproject.toml               # Package config and dependencies
└── papertrail_dashboard.html    # Pre-built dashboard (Koo Lab, 1,072 papers)
git clone https://github.com/bschilder/PaperTrail.git
cd PaperTrail
pip install -e ".[dev]"
# Run tests
pytest
# Serve docs locally
mkdocs serve

MIT License. See LICENSE.