This workflow was developed with reproducibility in mind. Please refer to the following sections for more details.
Note: The pipeline we used in the paper is available with tag v1.0.
We use Snakemake (v9+) as the backend of this workflow. Thus, a conda environment for Snakemake must be created first. We recommend mamba as a replacement for conda for environment management.
Using reproducibility/environment.yml, we can create an environment for Snakemake:
# the current working directory is the root of this repo
# Alternative: `conda`
mamba env create --use-uv -y -f reproducibility/environment.yml
Note: If you are using mamba < 2.3.3 or conda, please drop --use-uv.
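The rule in the note above can be sketched as a small helper that prints the appropriate command instead of running it. The function names `version_ge` and `env_create_cmd` are hypothetical and not part of this repo, and the version comparison assumes a `sort` that supports `-V` (GNU coreutils).

```shell
#!/usr/bin/env bash
# Sketch: decide whether --use-uv can be passed when creating the environment.
# The threshold mirrors the mamba version named in the note; helper names are hypothetical.

MIN_UV_VERSION="2.3.3"

# Return 0 if version $1 >= version $2 (relies on `sort -V`).
version_ge() {
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

# Print the environment-creation command for a given tool and version.
# $1: "mamba" or "conda"; $2: the tool's version string, e.g. "2.3.3".
env_create_cmd() {
    local tool="$1" version="$2" uv_flag=""
    if [ "$tool" = "mamba" ] && version_ge "$version" "$MIN_UV_VERSION"; then
        uv_flag="--use-uv "
    fi
    printf '%s env create %s-y -f reproducibility/environment.yml\n' "$tool" "$uv_flag"
}

env_create_cmd mamba 2.3.3    # keeps --use-uv
env_create_cmd mamba 2.1.0    # drops --use-uv
env_create_cmd conda 25.1.0   # drops --use-uv
```

The printed command can then be copied and run from the root of this repo.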
We use multiple Singularity containers for different methods and/or environments. To ensure reproducibility, please build these containers before executing the workflow.
The R version we use for this workflow is 4.4.2, and renv is used to track specific versions of packages. Please find files related to renv in reproducibility/r/metadata, and use r.def in reproducibility/r to build the corresponding container:
# the current working directory is the root of this repo
cd reproducibility/r
singularity build --fakeroot --force /path/to/the/built/container r.def
The 10X Xenium Ranger version we use here is 4.0.0. A link is used to download the software from the 10X website. Since 10X regularly updates this link, users should replace it with the most recent one if the container fails to be built:
# the current working directory is the root of this repo
singularity build --fakeroot --force /path/to/the/built/container reproducibility/10x.def
The Baysor version we use here is 0.7.0.
# the current working directory is the root of this repo
singularity build --fakeroot --force /path/to/the/built/container reproducibility/baysor.def
The Proseg version we use here is 3.0.10.
# the current working directory is the root of this repo
singularity build --fakeroot --force /path/to/the/built/container reproducibility/proseg.def
The Segger version we use here is one that we have fixed ourselves.
# the current working directory is the root of this repo
cd reproducibility/segger
singularity build --fakeroot --force /path/to/the/built/container segger.def
Please edit config/config.yml to configure the workflow. Detailed guidelines can be found in config/README.md.
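As a recap of the five container builds described earlier, the following sketch prints all build commands in one pass without executing them. The definition-file paths are taken from the steps above; `CONTAINER_DIR`, the `.sif` file names, and `print_build_cmds` are assumptions made for illustration only.

```shell
#!/usr/bin/env bash
# Sketch: print (not run) the build commands for all five containers.
# CONTAINER_DIR and the .sif names are hypothetical; it should be an absolute path,
# because two of the builds change directory first.

CONTAINER_DIR="${CONTAINER_DIR:-/path/to/containers}"

# Each entry is name:directory to build from:definition file
BUILDS=(
    "r:reproducibility/r:r.def"
    "10x:.:reproducibility/10x.def"
    "baysor:.:reproducibility/baysor.def"
    "proseg:.:reproducibility/proseg.def"
    "segger:reproducibility/segger:segger.def"
)

print_build_cmds() {
    local entry name dir def
    for entry in "${BUILDS[@]}"; do
        IFS=':' read -r name dir def <<< "$entry"
        # Wrapping each command in a subshell keeps one `cd` from affecting the next build.
        printf '(cd %s && singularity build --fakeroot --force %s/%s.sif %s)\n' \
            "$dir" "$CONTAINER_DIR" "$name" "$def"
    done
}

print_build_cmds
```

Once the printed commands look right, they can be run from the root of this repo, for example with `print_build_cmds | bash`.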
We provide a bash script, run.sh, to make execution easy for users. Before execution, users need to configure a few entries in it under the USER SETUP section.
- MODULES: Modules to be loaded prior to execution on clusters.
- CONDA_BIN: The name of or path to either mamba or conda.
- ENV_NAME: The name of or path to the conda environment (by default xenium_analysis_pipeline).
- LOCAL_PROFILE and CLUSTER_PROFILE: The execution of this workflow is controlled by profiles. Please refer to the Snakemake manual for details.
  We provide two example profiles under profiles. One is for local execution and is located in profiles/local; the other is for cluster execution, specifically Slurm, and is located in profiles/slurm. Users can edit these profiles according to their specific needs.
  Users can also define their own profiles, e.g., when they use a cluster other than Slurm. They only need to set the paths to their customised profiles in the corresponding entries.
- SINGULARITY_BIND_DIRS: An array of directories to bind to containers. Each element should be in the following form: LOCAL_DIR:SINGULARITY_DIR. Nonexistent local directories are filtered out.
- SNAKEMAKE_CACHE_DIR: A directory used to store the Snakemake cache. Internally, it overwrites the XDG_CACHE_HOME environment variable if it is set by users.
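To make the entries above concrete, here is a minimal sketch of what a filled-in USER SETUP block could look like, together with one way nonexistent bind directories might be filtered. All values are placeholders, the exact format of each entry in run.sh may differ (for instance, whether MODULES is an array), and `filter_bind_dirs` is a hypothetical helper rather than the code used in run.sh.

```shell
#!/usr/bin/env bash
# Sketch of a USER SETUP block; every value below is a placeholder.

MODULES=()                               # assumed to be an array; check run.sh for the real format
CONDA_BIN="mamba"                        # or "conda", or a path to either
ENV_NAME="xenium_analysis_pipeline"      # the default name mentioned above
LOCAL_PROFILE="profiles/local"
CLUSTER_PROFILE="profiles/slurm"
SINGULARITY_BIND_DIRS=(
    "/tmp:/tmp"
    "/this/path/does/not/exist:/data"    # dropped by the filter below
)
SNAKEMAKE_CACHE_DIR="/path/to/snakemake/cache"

# Hypothetical helper: keep only entries whose LOCAL_DIR (left of the first colon) exists.
filter_bind_dirs() {
    local entry
    for entry in "$@"; do
        if [ -d "${entry%%:*}" ]; then
            printf '%s\n' "$entry"
        fi
    done
}

filter_bind_dirs "${SINGULARITY_BIND_DIRS[@]}"
```

On a system where /tmp exists, only the first entry is printed, which illustrates the filtering behaviour described for SINGULARITY_BIND_DIRS.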
The workflow should be executed from the root directory of this repo. To get a self-explanatory help message, type
# the current working directory is the root of this repo
./run.sh --help
which prints
Usage: [ -m | --mode MODE ] [ -c | --core CORE ] [ -j | --jobs JOBS ] [ --retries RETRIES ] [ -n | --dry-run ] [ -R | --forcerun RULE ] [ -U | --until RULE ] [ --dag OUTPUT ] [ --unlock ] [ -v | --verbose ] [ -h | --help ]
-m,--mode MODE: the pipeline will be run on 'local' (default) or on 'cluster'.
-c,--core CORE: the number of cores to be used when -m,--mode is unset or 'local' (default: 1); ignored when -m,--mode is 'cluster'.
-j,--jobs JOBS: the number of jobs submitted to the cluster at the same time when -m,--mode is 'cluster'. (default: 500).
--retries RETRIES: the number of retries for failed jobs. (default: 0).
-n,--dry-run: dry run.
-R,--forcerun RULE: force the re-execution or creation of the given rule or file. Repeat this option multiple times for multiple rules or files.
-U,--until RULE: runs the pipeline until it finishes the specified rule or generates the file. Repeat this option multiple times for multiple rules or files.
--dag OUTPUT: draw dag and save to OUTPUT.pdf.
--unlock: unlock the working directory.
-v,--verbose: print more information.
-h,--help: print this message.
Note: Execution logging via snkmt is always enabled. The log database is stored at ${output_path}/snkmt.db (where output_path is read from config/config.yml). To monitor the pipeline execution, run:
mamba run -n xenium_analysis_pipeline snkmt console --db-path ${output_path}/snkmt.db
- Draw the DAG
  ./run.sh --dag dag
- Dry-run
  ./run.sh -n
- Run on cluster
  ./run.sh -m cluster
- Run until a specific rule
  ./run.sh -m cluster -U rule_name
- Run on cluster with retries
  ./run.sh -m cluster --retries 2
There are some other files related to Xenium data analysis, residing in notebooks. Please refer to the documentation inside for more details.
- Snakemake fails to create conda environments due to the lack of write permission on /tmp/conda.
  This issue occurs most often on machines with multiple users, such as HPC systems and servers. In this case, the reason is simply that /tmp/conda is owned by another user. A possible solution is to first create a folder in /tmp, such as /tmp/your_id, and then add /tmp/your_id:/tmp to SINGULARITY_BIND_DIRS inside run.sh. After this, you can rerun the command, and the environments should be created correctly.
  Additionally, on HPC systems, users might need to remove /tmp/your_id:/tmp from SINGULARITY_BIND_DIRS after creating the environments, as the directory /tmp/your_id is unlikely to exist on compute nodes.
- For steps involving 10X xeniumranger, I sometimes get the following error: "PermissionError: [Errno 13] Permission denied".
  10X xeniumranger copies files from the raw data during processing. This error can occur when the user, as the owner of the raw data, has removed their own write permission on it. When 10X xeniumranger copies files, it also copies their modes, so the error arises when it later needs to write to the copied files. Although removing write permission is a safe way to prevent accidental changes to the raw data, users who own the raw data must keep write permission on it.
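The two workarounds above can be sketched as small shell helpers. The function names are hypothetical, and the raw-data path in the usage note is a placeholder; please adapt both to your setup before running anything that changes permissions.

```shell
#!/usr/bin/env bash
# Sketch for the two issues above; function names are hypothetical.

# Issue 1: create a private directory under a base directory (default: /tmp) and print
# the bind entry to add to SINGULARITY_BIND_DIRS in run.sh.
# $1: base directory (optional); $2: sub-directory name (optional, defaults to the user name).
private_tmp_bind() {
    local base="${1:-/tmp}" id="${2:-$(id -un)}"
    mkdir -p "${base}/${id}" && printf '%s/%s:/tmp\n' "$base" "$id"
}

# Issue 2: list files and directories under $1 that lack owner write permission.
list_unwritable() {
    find "$1" ! -perm -200 -print
}

# Issue 2: give the owner write permission on everything under $1.
restore_owner_write() {
    chmod -R u+w -- "$1"
}
```

For example, `private_tmp_bind` prints an entry of the form /tmp/your_id:/tmp, and `list_unwritable /path/to/raw/data` shows what `restore_owner_write /path/to/raw/data` would change.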
Code to reproduce analyses from the original manuscript can be found at https://github.com/bdsc-tds/Bilous2025