Add STPath: spatial transcriptomics gene expression prediction from histology#233
Open
simonschindler wants to merge 13 commits into
- Import STPath and STFM in tile_prediction/__init__.py so the @register decorator fires and the model appears in MODEL_REGISTRY
- Remove two scratch dev scripts (test_stpath.py, comp_predictions.py) that contained hardcoded local paths and Desktop output paths
- Remove debug print(self.config) from STFM.__init__
- Replace print() with warnings.warn() for missing gene symbols and torch_geometric fallback
- Add torch-geometric as the [stpath] optional extra in pyproject.toml
- Add stpath entry to MODEL_INPUT_ARGS in test_models_general.py
- Correct registry metadata: license and commercial set to None (no license file in upstream GitHub repo or HuggingFace model card; the previous CC BY-NC-ND 4.0 value was an incorrect assumption)
- Update paper_url to https://doi.org/10.64898/2026.03.17.711896
- Relax test_models_general assertions on license/commercial from `is not None` to hasattr(), so models with genuinely unspecified licenses do not cause a hard test failure
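The import fix above relies on decorator-based registration only taking effect once the defining module is executed. A minimal sketch of that pattern follows; the `register` function and `MODEL_REGISTRY` dict here are simplified, hypothetical stand-ins and not LazySlide's actual implementation.

```python
# Hypothetical stand-ins for illustration; not LazySlide's real registry code.
MODEL_REGISTRY = {}


def register(key):
    """Return a class decorator that records the class under `key`."""
    def decorator(cls):
        MODEL_REGISTRY[key] = cls
        return cls
    return decorator


# Before the class definition runs, the registry has no entry. In a package,
# this is the state you are in if __init__.py never imports the module.
registered_before = "stpath" in MODEL_REGISTRY


@register("stpath")
class STPath:  # placeholder body; the real class lives in stpath.py
    pass


# Executing the definition (i.e. importing the module) fires the decorator.
registered_after = "stpath" in MODEL_REGISTRY
```

In this sketch `registered_before` is `False` and `registered_after` is `True`, which mirrors why the model was missing from the registry until the package `__init__.py` imported it.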
FLOPs scale with spot count (N) rather than being fixed — the stored 305.9 GFLOPs was computed on the INT2 dev dataset and is not meaningful as a registry constant. param_size (~50M, verified at 49.2M) is kept.
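To illustrate why a single stored FLOPs number is not meaningful, here is a rough estimate for a generic transformer encoder as a function of token count. This is a sketch under stated assumptions: it assumes dense self-attention over all N spots and a standard MLP block, and the `d_model` and `n_layers` values used below are hypothetical, not the STFM configuration.

```python
def transformer_flops(n_tokens, d_model, n_layers, mlp_ratio=4):
    """Rough forward-pass FLOPs for a generic transformer encoder.

    Assumes dense self-attention; ignores norms, softmax, and embeddings.
    """
    # Multiply-accumulates in the linear layers per layer:
    # Q, K, V and output projections (4 * d^2 per token) plus
    # two MLP matrices (2 * mlp_ratio * d^2 per token).
    proj_macs = n_tokens * d_model * d_model * (4 + 2 * mlp_ratio)
    # Attention scores (Q @ K^T) and weighted sum (A @ V): N^2 * d each.
    attn_macs = 2 * n_tokens * n_tokens * d_model
    # Count 2 FLOPs per multiply-accumulate.
    return 2 * n_layers * (proj_macs + attn_macs)


# Hypothetical sizes, chosen only to show the trend.
small = transformer_flops(n_tokens=1000, d_model=512, n_layers=12)
large = transformer_flops(n_tokens=2000, d_model=512, n_layers=12)
```

Under these assumptions, doubling the number of spots more than doubles the estimate, because the attention term grows quadratically in N while the projection term grows linearly.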
test_stpath_equivalence.py clones the original STPath repo from GitHub and verifies that the LazySlide STFM reimplementation produces outputs that match the original within atol=1e-5 for the same weights and inputs. The clone is skipped gracefully when the network is unavailable; set STPATH_REPO to a local clone to skip the network step during development. Also removes the leftover test_stpath.ipynb dev notebook.
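The two ideas in this commit, an environment-variable override for the reference checkout and a tolerance-based comparison, can be sketched with the standard library alone. The helper names `resolve_reference_repo` and `all_close` are hypothetical and are not taken from the test file; the actual test presumably compares tensors rather than Python lists.

```python
import math
import os


def resolve_reference_repo(env=os.environ):
    """Return the local clone path from STPATH_REPO, or None if unset/empty.

    A None result would mean the caller has to clone from the network
    (or skip the test if that fails).
    """
    path = env.get("STPATH_REPO")
    return path if path else None


def all_close(a, b, atol=1e-5):
    """True if two equal-length sequences differ by at most atol elementwise."""
    return len(a) == len(b) and all(
        math.isclose(x, y, rel_tol=0.0, abs_tol=atol) for x, y in zip(a, b)
    )


reference = [1.0, 2.0]
within_tol = all_close(reference, [1.0 + 5e-6, 2.0])   # inside the tolerance
outside_tol = all_close(reference, [1.0 + 2e-5, 2.0])  # outside the tolerance
```

Note that a tolerance check of this kind accepts values that are close but not bitwise equal, so it establishes numerical equivalence rather than identical bits.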
Member
Thanks, @simonschindler, for adding the new model. Here are a few comments:
Summary
This PR integrates STPath (Huang et al., 2025) into LazySlide as a new model of type `spatial_transcriptomics`. STPath predicts gene expression across the full transcriptome (~38 000 genes) from tile-level image embeddings and spatial coordinates, using a spatial transformer foundation model (STFM).

- `src/lazyslide/models/tile_prediction/stpath.py` (~1 900 lines): all tokenizers, the STFM architecture, and the `STPath` wrapper class that returns a fully annotated `AnnData` object
- `"stpath"` in `MODEL_REGISTRY` via `@register`
- `docs/source/references.bib`
- `torch-geometric` as the `[stpath]` optional extra in `pyproject.toml` (needed for sparse gene expression encoding)
- `tests/models/test_stpath_equivalence.py`: clones the original STPath repository from GitHub at test time and asserts that the reimplementation and upstream produce matching outputs (within `atol=1e-5`) for the same weights and inputs. Set `STPATH_REPO` to a local clone to skip the network step.

Points needing reviewer input

1. License and commercial use are unspecified (`None`)

Neither the GitHub repository nor the HuggingFace model card carries a license file or declaration. The value previously present (`CC BY-NC-ND 4.0`) was an incorrect assumption and has been removed. Both fields are explicitly set to `None`. The `test_models_general.py` assertion was relaxed from `is not None` to `hasattr()` to accommodate models with genuinely unspecified licenses. If the authors clarify the license, this should be updated.

2. No LazySlide dispatch function for `spatial_transcriptomics` models

The contributing guide states that new models should be wired into a corresponding LazySlide function (e.g. `zs.tl.encode_tiles()` for vision models). There is currently no such function for spatial transcriptomics prediction, so `STPath` must be used by instantiating it directly. This is a known gap; opening a separate issue to add a `zs.tl.predict_gene_expression()` or similar function is suggested.

3. New `ModelTask.spatial_transcriptomics` enum value

`feature_prediction` was renamed to `spatial_transcriptomics` to be more descriptive and consistent with the established field name. The contributing guide currently only documents `vision`, `segmentation`, `multimodal`, and `tile_prediction` model types. Should a `spatial_transcriptomics` section be added to the guide?

Test plan

- `pytest tests/models/test_stpath_equivalence.py`: numerical equivalence against upstream (requires network or `STPATH_REPO`)
- `pytest tests/test_model_registry.py`: `stpath` appears in registry with correct metadata
- `pytest tests/models/test_models_general.py -m large_runner -k stpath`: model initialises, param count and FLOPs estimation work
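For reviewers weighing point 1, the difference between the old and the relaxed assertion can be sketched as follows. `ModelCard` is a hypothetical stand-in for a registry entry, not the actual LazySlide class.

```python
class ModelCard:
    """Hypothetical stand-in for a registry entry's metadata."""

    def __init__(self, license=None, commercial=None):
        self.license = license
        self.commercial = commercial


# An entry whose license is genuinely unspecified, as described for STPath.
card = ModelCard()

# Relaxed check: the fields must exist, but their values may be None.
relaxed_ok = hasattr(card, "license") and hasattr(card, "commercial")

# Previous strict check: fails when either field is None.
strict_ok = card.license is not None and card.commercial is not None
```

Here `relaxed_ok` is `True` and `strict_ok` is `False`. The trade-off is that the relaxed form no longer catches entries where a known license was simply forgotten, which may be worth considering when deciding whether to keep it.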