The Genotypes and Phenotypes in Families (GPF) system manages large databases of genetic variants and phenotypic measurements from family collections.
User documentation: see the GPF documentation at https://iossifovlab.com/gpfdocs/.
core/— GPF core library: genotype storage, studies, pedigrees, pheno, import tools, query API. Python package:gpf. Depends ongain.web_api/— Web application and REST API (Django 5.2). Python package:gpf_web. Depends ongpfandgain.impala_storage/,impala2_storage/,gcp_storage/— optional genotype storagesfederation/,rest_client/— federation and REST clientdocs/— documentation sources
The gain package (annotation engine, genomic resources,
effect annotation, task graph, gene scores/sets) lives in
its own repository at
https://github.com/iossifovlab/gain.
Primary stack: Python 3.12, Django 5.2, dask, pandas, pyarrow, duckdb, pysam, pytest, mypy, ruff.
Release notes live in docs/changes.rst.
uv is the primary supported workflow — it is what CI
and the production image build use (the four CI test
images and web_api/Dockerfile.production). The same path
works for local development. Conda/Mamba is supported
as an alternative for local development only; CI does
not consume conda.
The gain package lives in a separate repository
(https://github.com/iossifovlab/gain) and must be
checked out as a sibling before either workflow will
import it:
git clone https://github.com/iossifovlab/gain.git ../gainThis repo is a uv workspace (see root pyproject.toml).
Runtime dependencies are declared per sub-project; dev
tools live in each sub-project's own dev dependency
group. The root pyproject.toml is a virtual coordinator
([tool.uv] package = false) that defaults to installing
just gpf-core + gpf-web. The federation/ and
rest_client/ extensions are workspace members but
optional; the heavy storage backends (impala_storage/,
impala2_storage/, gcp_storage/) are deliberately not
workspace members and install standalone (see "Optional
genotype storages" below).
gain-core is consumed as a wheel produced by gain's CI
rather than as a sibling-repo path source. Build a fresh
wheel before the first sync (and again whenever you pull
new gain master, unless you opt into editable gain — see
"gain-core: wheel install vs editable" below):
cd ../gain
uv build --package gain-core --out-dir ../gpf/dist/gain
cd ../gpfCommon sync invocations (all require --find-links ./dist/gain so uv can resolve gain-core):
# Default: install gpf-core + gpf-web
uv sync --find-links ./dist/gain
# Everything: all workspace members + every dev group
uv sync --find-links ./dist/gain --all-packages --all-groups
# A single workspace member
uv sync --find-links ./dist/gain --package gpf-federation --group dev
# Activate the venv (optional; `uv run` works without it)
source .venv/bin/activateRun any command in the project's environment without
activation via uv run:
uv run pytest -v tests/small/
uv run ruff check --fix .
uv run mypy gpf --exclude core/docs/Manage dependencies via uv (don't edit pyproject.toml
deps directly — uv updates the lockfile in step):
# Add a runtime dep to a workspace member
cd core && uv add <dep>
# Add a dev dep
cd core && uv add --group dev <dep>
# Remove a dep
cd core && uv remove <dep>
# Refresh the entire lockfile
uv lock --upgrade
# Refresh just one dep
uv lock --upgrade-package <dep>The lockfile (uv.lock) is committed and managed by uv.
After git pull, re-run uv sync --find-links ./dist/gain
(plus --all-packages --all-groups if you've installed
the optional extensions) to pick up lockfile updates.
gain-core is not on PyPI and no longer pinned to
a sibling-repo path source in pyproject.toml (it used
to be — that pattern caused tb-eqh, where docker layer
caching of RUN git clone gain ... checkout master
served stale gain code across CI builds).
The committed model: gain-core is consumed as a wheel
that gets dropped into dist/gain/ and resolved via
uv sync --find-links ./dist/gain. CI provides the
wheel automatically; local devs have two choices.
The gpf Jenkinsfile's Fetch gain wheel stage uses
copyArtifacts to pull gain-core-*.whl from
iossifovlab/gain/master's last successful build into
dist/gain/ before any docker build runs. Each CI
Dockerfile then COPYs that wheel into the image and
syncs against it. Wheel filenames encode the gain SHA
(via hatch-vcs), so a moving gain master invalidates
the docker layer cache by content — no more stale-clone
class of bugs.
If you don't need editable gain (e.g. you're working on gpf only, gain is read-only):
# Build a fresh gain wheel into gpf/dist/gain/
cd ../gain
uv build --package gain-core --out-dir ../gpf/dist/gain
cd ../gpf
uv sync --find-links ./dist/gainRe-run the uv build step whenever you pull new gain
master to refresh the wheel.
If you're working across both repos and want gain edits to take effect without rebuilding a wheel, opt into the editable path source via the local override file:
# 1. Copy the template (once per checkout):
cp pyproject.local.toml.template pyproject.local.toml
# pyproject.local.toml is gitignored.
# 2. Tell git to ignore future local edits to pyproject.toml,
# then merge in the override block from your local file:
git update-index --skip-worktree pyproject.toml
python -c "
import pathlib, tomllib, tomlkit
base = tomlkit.parse(pathlib.Path('pyproject.toml').read_text())
local = tomllib.loads(pathlib.Path('pyproject.local.toml').read_text())
for pkg, src in local.get('tool', {}).get('uv', {}).get('sources', {}).items():
base['tool']['uv']['sources'][pkg] = src
pathlib.Path('pyproject.toml').write_text(tomlkit.dumps(base))
"
# 3. Sync — uv will now use the editable path source for gain-core.
uv syncTo revert to the canonical (CI-shaped) state — e.g. before opening a PR:
git update-index --no-skip-worktree pyproject.toml
git checkout -- pyproject.toml uv.lockThe skip-worktree bit prevents git status/git add
from picking up the local override; flipping it off
exposes the file again so a clean git checkout snaps
back to the committed state.
mamba env create --name gpf --file ./environment.yml
mamba env update --name gpf --file ./dev-environment.yml
conda activate gpf
pip install -e ../gain/core
pip install -e core
pip install -e web_api
# Optional extensions:
pip install -e federation
pip install -e rest_clientCI does not consume conda. Conda users can follow the
uv command shapes from inside an activated gpf env —
e.g. pip install -e ./impala_storage for the storage
backends, or running tests/lint without the uv run
prefix.
# Quick cycles
cd core
uv run pytest -v tests/small/test_file.py
uv run pytest -v tests/small/module/
# Full suites in parallel
cd core && uv run pytest -v -n 10 tests/
cd web_api && uv run pytest -v -n 5 gpf_web/Conda users: from inside an activated gpf env, drop the
uv run prefix (pytest -v -n 10 tests/ directly).
Test markers and configuration are defined in
core/pytest.ini (e.g., gs_inmemory, gs_duckdb,
gs_duckdb_parquet, grr_rw, grr_ro, grr_http).
uv run ruff check --fix .
uv run mypy gpf --exclude core/docs/
uv run mypy gpf_web --exclude web_api/docs/ \
--exclude web_api/conftest.pyThe optional storage backends (gpf_impala_storage,
gpf_impala2_storage, gpf_gcp_storage) bring in heavy
non-Python dependencies (Java/Hadoop, the Google Cloud
SDKs) that aren't on PyPI as wheels. They have their own
pyproject.toml files but are deliberately not
workspace members of the root coordinator — installing
all workspace members would otherwise pull these heavy
backend deps into the lockfile for everyone. Install
them standalone into the active venv:
uv pip install -e ./impala_storage
uv pip install -e ./impala2_storage
uv pip install -e ./gcp_storageConda users: use plain pip install -e ./impala_storage
(etc.) from inside the activated conda env — same shape.
GCP storage tests need application-default credentials for
the seqpipe-gcp-storage-testing project:
gcloud config list project
gcloud auth application-default loginThen, from the gcp_storage/ directory:
uv run pytest -v gcp_storage/tests/
uv run pytest -v ../core/tests/ \
--gsf gcp_storage/tests/gcp_storage.yamlA git pre-commit hook for lint checking with Ruff is included. Install it from the repository root:
cp pre-commit .git/hooks/To bypass the pre-commit hook when committing:
git commit --no-verify- Prefer
uv run <cmd>over activating the venv — works in any fresh shell without state. - After
git pull, re-runuv sync --find-links ./dist/gain(add--all-packages --all-groupsif you've installed the optional workspace members) to pick up lockfile updates. - After
git pullin../gain/, rebuild the gain wheel and re-sync:cd ../gain && uv build --package gain-core --out-dir ../gpf/dist/gain && cd ../gpf && uv sync --find-links ./dist/gain. Editable-gain users skip this — see "Editable gain for local development" above. - Conda users: always activate the
gpfenvironment before running commands (conda activate gpf), and re-runpip install -e coreif imports fail. - Some tests may be flaky with high parallelism; reduce
-nor run without it.
This project is licensed under the MIT License. See
LICENSE for details.