Add "grid" rules to segmentation by mariiabilous · Pull Request #104 · bdsc-tds/xenium_analysis_pipeline

mariiabilous · 2026-01-08T11:00:55Z

This PR introduces support for grid-based segmentations to simulate Visium and VisiumHD layouts from Xenium data:

Grid Generation:

Generates Visium-style (grid_visium_{diameter}um) and VisiumHD-style (grid_square_{bin_size}um) grids in .geojson format over Xenium samples.

These grids are used as synthetic cells via xeniumranger import-segmentation to produce pseudo-Visium outputs.

Registered Grid Handling:

If a registered grid segmentation exists at:

{output_path}/registration/{compact_segmentation_id}/{sample_id}/cell_boundaries.geojson

with ID format: grid_registered_{technology}_{diameter_or_bin_size}um, it will also be imported via xeniumranger import-segmentation to generate pseudo-Visium from registered data.
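The registered-grid lookup can be sketched in Python as follows: build the expected path from the wildcards, and import the grid only when that file exists. The function names below are hypothetical. This is not the PR's `get_registered_cells_geojson` helper, whose body is not shown here.

```python
import os
from types import SimpleNamespace


def registered_grid_path(config, wildcards):
    """Expected location of a registered grid segmentation (hypothetical sketch).

    Follows the layout quoted in the PR description:
    {output_path}/registration/{compact_segmentation_id}/{sample_id}/cell_boundaries.geojson
    """
    return os.path.join(
        config["output_path"],
        "registration",
        wildcards.compact_segmentation_id,
        wildcards.sample_id,
        "cell_boundaries.geojson",
    )


def has_registered_grid(config, wildcards):
    """True only if the registered GeoJSON is already on disk."""
    return os.path.isfile(registered_grid_path(config, wildcards))


# Snakemake passes a similar attribute-style wildcards object.
wc = SimpleNamespace(compact_segmentation_id="grid_registered_visium_55um", sample_id="S1")
print(registered_grid_path({"output_path": "/data/out"}, wc))
```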

coderabbitai · 2026-01-08T11:01:01Z

Summary by CodeRabbit

New Features
- Added Visium, Visium HD (square), and registered-cell grid generation with GeoJSON cell-boundary export.
- Added an end-to-end grid import and normalization workflow with structured logging and resource configuration.
- Added spatial grid utilities to generate grids, map cells to grid spots, and export cell boundaries.
Performance
- Increased memory allocation for coexpression analysis.

Walkthrough

Adds grid-based segmentation: new Snakemake rules for generating Visium, VisiumHD (square), and registered grids; a consolidated importGrid rule to normalize segmentation via xeniumranger; R utilities and scripts to build spatial grids and export GeoJSON; a switch of the segmentation include to grid.smk; and increased coexpression memory.

Changes

Cohort / File(s)	Summary
Grid rule — registered `workflow/rules/_segmentation/_grid/registered_grid.smk`	Adds `get_registered_cells_geojson` helper and `rule importRegisteredGrid` that symlinks registered cell boundaries into segmentation outputs with wildcard constraint `grid_registered_\w+um` and logging.
Grid rule — square `workflow/rules/_segmentation/_grid/square_grid.smk`	Adds `rule generateSquareGrid` using xenium bundle input, params `bin_size` (defaults supported), wildcard constraint `grid_square_\w+um`, container from `config["containers"]["r"]`, and runs `square_grid.R`.
Grid rule — visium `workflow/rules/_segmentation/_grid/visium_grid.smk`	Adds helpers `get_generated_cells_geojson`, `get_input2_or_params4generateVisiumGrid` and `rule generateVisiumGrid` with `diameter` param, wildcard constraint `grid_visium_\w+um`, R container, and runs `visium_grid.R`.
Grid orchestration `workflow/rules/_segmentation/grid.smk`	Adds `rule importGrid` that aggregates subrules, computes absolute paths/params, constrains `compact_segmentation_id` to `grid_(?:[a-zA-Z]+_)+\d+um`, sets threads/resources, and invokes `xeniumranger import-segmentation`. Includes the three `_grid` subrule files.
Segmentation include & coexpression `workflow/rules/segmentation.smk`, `workflow/rules/coexpression.smk`	Switches active include to `grid.smk` (visium include commented) and increases `computeCoexpression` mem multiplier from 512 to 2048.
Grid R utilities `workflow/scripts/_segmentation/_grid/grid_utils.R`	Adds sf-based utilities: `make_visium_grid_sf`, `make_visiumhd_grid_sf`, `map_cells_to_spots`, and `write_xenium_cells_geojson` for grid generation, mapping cells to spots, and GeoJSON export with validation and CRS handling.
Grid R scripts `workflow/scripts/_segmentation/_grid/visium_grid.R`, `workflow/scripts/_segmentation/_grid/square_grid.R`	Adds scripts that read Xenium parquet to compute bbox, call grid utilities (visium or visiumHD), and write `cell_boundaries.geojson`; includes logging and input validation.
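The wildcard constraints above decide which rule can produce a given segmentation ID. Below is a rough Python check of that routing. It uses `re.fullmatch` to approximate how Snakemake anchors a constraint to a wildcard value, which is an assumption: Snakemake actually embeds each constraint in a larger path regex.

```python
import re

# Constraints as quoted in the PR summary.
PRODUCER_CONSTRAINTS = {
    "importRegisteredGrid": r"grid_registered_\w+um",
    "generateSquareGrid": r"grid_square_\w+um",
    "generateVisiumGrid": r"grid_visium_\w+um",
}
IMPORT_GRID_CONSTRAINT = r"grid_(?:[a-zA-Z]+_)+\d+um"


def producers(seg_id):
    """Names of the grid rules whose constraint fully matches seg_id."""
    return [rule for rule, pattern in PRODUCER_CONSTRAINTS.items()
            if re.fullmatch(pattern, seg_id)]


def accepted_by_import_grid(seg_id):
    """True if importGrid's compact_segmentation_id constraint accepts seg_id."""
    return re.fullmatch(IMPORT_GRID_CONSTRAINT, seg_id) is not None


for seg_id in ["grid_visium_55um", "grid_square_8um", "grid_square_016um",
               "grid_registered_visium_55um", "grid_visium_registered_55um"]:
    print(f"{seg_id:30} importGrid={accepted_by_import_grid(seg_id)} "
          f"producers={producers(seg_id)}")
```

Under these patterns, an ID written as grid_visium_registered_55um is accepted by generateVisiumGrid rather than importRegisteredGrid, because `\w` also matches underscores. The `registered` prefix therefore has to come first.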

Sequence Diagram(s)

sequenceDiagram
    participant SM as Snakemake
    participant R as R_Script
    participant Utils as Grid_Utils
    participant Bundle as Xenium_Bundle
    participant XR as xeniumranger
    participant Out as Normalised_Output

    SM->>R: invoke generate*Grid with xenium_bundle input
    R->>Bundle: read parquet bundle (arrow)
    Bundle-->>R: bbox (xmin,xmax,ymin,ymax)
    R->>Utils: request grid (hex or square) with bbox & params
    Utils-->>R: return sf centers + polygons
    R->>R: write GeoJSON cell_boundaries.geojson
    R-->>SM: produce cells GeoJSON output
    SM->>XR: run xeniumranger import-segmentation (bundle + cells)
    XR-->>Out: write normalised_results
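The grid utilities' internals are not shown in this thread. For illustration only, a Visium-style hexagonal layout can be sketched in Python as below. The sketch assumes the standard Visium geometry of 55 um spots on a 100 um centre-to-centre hexagonal lattice. Spacing and diameter are parameters here, not values taken from `make_visium_grid_sf`.

```python
import math


def visium_spot_centers(bbox, spacing_um=100.0):
    """Hexagonal spot centres covering bbox = (xmin, ymin, xmax, ymax).

    Assumes the standard Visium layout: spots 100 um apart centre to centre,
    with every other row shifted by half a spacing.
    """
    xmin, ymin, xmax, ymax = bbox
    row_step = spacing_um * math.sqrt(3) / 2  # vertical distance between rows
    centers = []
    row = 0
    while ymin + row * row_step <= ymax:
        y = ymin + row * row_step  # recomputed per row to avoid float drift
        x = xmin + (spacing_um / 2 if row % 2 else 0.0)
        while x <= xmax:
            centers.append((x, y))
            x += spacing_um
        row += 1
    return centers


def spot_ring(center, diameter_um=55.0, n_vertices=32):
    """Closed polygon ring approximating one circular spot (GeoJSON-style)."""
    cx, cy = center
    r = diameter_um / 2
    ring = [
        (cx + r * math.cos(2 * math.pi * k / n_vertices),
         cy + r * math.sin(2 * math.pi * k / n_vertices))
        for k in range(n_vertices)
    ]
    ring.append(ring[0])  # GeoJSON linear rings repeat the first vertex
    return ring


centers = visium_spot_centers((0, 0, 300, 200))
print(len(centers))  # 11
```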

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~50 minutes

Poem

🐇 I nudge the grid, hex and square,
I stitch each spot with careful care,
R hums, Snakemake pulls tight the thread,
xeniumranger wakes and spreads,
Hops complete — the tiled world is read.

🚥 Pre-merge checks | ✅ 3

✅ Passed checks (3 passed)

Check name	Status	Explanation
Title check	✅ Passed	The PR title "Add "grid" rules to segmentation" accurately and concisely describes the main change: introducing grid-based segmentation rules (Visium and VisiumHD grids) to the segmentation workflow.
Description check	✅ Passed	The PR description is directly related to the changeset, providing clear context about grid generation (Visium and VisiumHD styles), registered grid handling, and the pseudo-Visium output use case.
Docstring Coverage	✅ Passed	No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check.


mariiabilous · 2026-01-08T11:10:13Z

These rules have been tested for grid_visium_55um, grid_registered_visium_55um, grid_square_8um and grid_square_16um.

To run the pipeline, I currently specify config.yml as follows:

segmentation:
  methods:
    - grid_visium_55um
    - grid_square_8um
    - grid_square_16um
    - grid_registered_visium_55um
...

  grid_visium_55um:
    diameter: 55
...
  grid_registered_visium_55um:
    diameter: 55
...
    
  grid_square_8um:
    bin_size: 8
...

but it should be something like:

segmentation:
  methods:
    - grid
    - grid_registered
...

  grid:
    visium:
      diameter: 55
    square:
      bin_size:
        - 8
        - 16 # 'grid_square_16um', or ideally 'grid_square_016um' to be consistent with the Visium HD structure

  grid_registered:
    visium:
      diameter: 55
...

Need improvements from @senbaikang :) Thanks!
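One way to support the proposed nested layout without changing the downstream rules is to expand it back into the flat IDs those rules already use. Below is a minimal Python sketch. `expand_grid_methods` and `SIZE_KEYS` are hypothetical names, and the optional zero-padding implements the grid_square_016um idea.

```python
SIZE_KEYS = {"visium": "diameter", "square": "bin_size"}


def expand_grid_methods(grid_cfg, prefix="grid", pad_square=False):
    """Expand a nested grid block into flat method IDs with their parameters.

    Hypothetical helper: returns {method_id: {size_key: size}} so that existing
    rules could keep looking up e.g. config["segmentation"][method_id]["diameter"].
    """
    methods = {}
    for tech, params in grid_cfg.items():
        key = SIZE_KEYS[tech]
        sizes = params[key]
        if not isinstance(sizes, (list, tuple)):
            sizes = [sizes]  # allow a scalar or a list of sizes
        for size in sizes:
            if pad_square and tech == "square":
                label = f"{int(size):03d}"  # 8 -> 008, 16 -> 016
            else:
                label = str(size)
            methods[f"{prefix}_{tech}_{label}um"] = {key: size}
    return methods


grid = {"visium": {"diameter": 55}, "square": {"bin_size": [8, 16]}}
grid_registered = {"visium": {"diameter": 55}}

print(list(expand_grid_methods(grid)))
# ['grid_visium_55um', 'grid_square_8um', 'grid_square_16um']
print(list(expand_grid_methods(grid_registered, prefix="grid_registered")))
# ['grid_registered_visium_55um']
```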

coderabbitai

Actionable comments posted: 15

🤖 Fix all issues with AI agents

In @workflow/rules/_segmentation/_grid/registered_grid.smk:
- Line 25: The rule contains an empty resources: directive (same issue as
square_grid.smk), which is invalid or at least misleading. Either remove the
bare "resources:" line from registered_grid.smk or populate it with the
resource keys the rule actually needs (e.g. mem_mb), mirroring whichever fix is
applied in square_grid.smk.

In @workflow/rules/_segmentation/_grid/square_grid.smk:
- Around line 27-28: The rule in square_grid.smk contains an empty resources:
directive, which can confuse parsing. Either remove the empty resources: block
entirely or replace it with the explicit resource declarations the rule needs
(e.g. resources: mem_mb=X, with threads: Y as its own directive), so that the
rule no longer has a bare resources: entry directly above container:.

In @workflow/rules/_segmentation/_grid/visium_grid.smk:
- Around line 5-12: The docstring for get_generated_cells_geojson is inaccurate:
it mentions a {output_path}/custom_segmentation/... convention while the
function actually builds a path under
{config["output_path"]}/segmentation/{wildcards.compact_segmentation_id}/{wildcards.sample_id}/processed_results/cell_boundaries.geojson;
update the docstring to reflect this exact path and naming (replace
"custom_segmentation" with "segmentation" and the final filename "cells.geojson"
with "processed_results/cell_boundaries.geojson") so the documentation matches
the implementation.
- Around line 58-60: The rule currently has the `container:` directive nested
under `resources:` which is invalid; move `container:` out to the same top-level
indentation as `input:`, `output:`, `params:`, `log:`, and `resources:` so it is
a top-level directive (or alternatively give `resources:` a proper value and
keep `container:` separate); update the block containing `resources:` and
`container:` so `container:` is not indented under `resources:` and references
config["containers"]["r"] at top level.
- Line 58: The inline comment on the "resources:" line in visium_grid.smk is
unclear/incomplete; either remove it or rewrite it to a clear statement (e.g.,
"resources:  # low resource requirements; this rule does not read or create
cell_boundaries.geojson from xenium bundles") so maintainers understand whether
resources are specified and whether the rule reads/creates
cell_boundaries.geojson; update the comment next to the "resources:" token
accordingly or delete the comment if no resources note is necessary.
- Around line 50-55: The diameter lambda currently calls get_dict_value(config,
"segmentation", wildcards.compact_segmentation_id, "diameter") but does not
supply a default, so if the key is missing a KeyError will be raised; update the
diameter lambda (the anonymous function passed for the rule parameter
"diameter") to call get_dict_value with replace_none=55 (or otherwise explicitly
handle a missing key and return 55) so the function returns 55 by default as the
comment states.
- Around line 15-35: The function get_input2_or_params4generateVisiumGrid
contains a dead code path for for_input=False; remove the alternate branches and
simplify the function to only return the direct ret result used when
for_input=True, and update the function signature to drop the unused for_input
parameter so it becomes get_input2_or_params4generateVisiumGrid(wildcards) ->
str; ensure references to get_raw_data_dir(wildcards.sample_id) and the returned
ret value remain intact and that callers (which already invoke the default)
continue to work unchanged.
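Below is a minimal sketch of the fallback that the diameter comment asks for. `get_nested` is a hypothetical stand-in, because the real signature and missing-key behaviour of `get_dict_value` are not shown here. It returns the default both when a key is absent and when the stored value is None.

```python
from types import SimpleNamespace


def get_nested(d, *keys, default=None):
    """Walk nested dicts; fall back to default if a key is missing or the value is None.

    Hypothetical stand-in for get_dict_value(..., replace_none=...).
    """
    cur = d
    for k in keys:
        if not isinstance(cur, dict) or k not in cur:
            return default
        cur = cur[k]
    return default if cur is None else cur


def diameter_param(config, wildcards, default_um=55):
    # Same lookup path as the rule's diameter lambda, plus the documented default.
    return get_nested(config, "segmentation", wildcards.compact_segmentation_id,
                      "diameter", default=default_um)


wc = SimpleNamespace(compact_segmentation_id="grid_visium_55um")
print(diameter_param({"segmentation": {}}, wc))  # 55
```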

In @workflow/rules/_segmentation/grid.smk:
- Around line 1-7: Fix the header comments: correct the typos "resuls" → "results"
and "adjuscent" → "adjacent", and update the grid name examples to match the
convention used in square_grid.smk (grid_square_\w+um). For example, replace
"grid_binned_8um" and "grid_registered_binned_8um" with "grid_square_8um" and
"grid_registered_square_8um" (or state the pattern as grid_square_\w+um) so the
naming in this comment matches the actual rule names.
- Around line 51-52: The inline comment on the compact_segmentation_id wildcard
constraint references grid_visiumhd_8um, which matches neither the header nor the
square-grid rule. Update it to read "can be grid_visium_55um or
grid_registered_visium_55um or grid_square_8um" so it is consistent with the grid
IDs the rules accept.

In @workflow/scripts/_segmentation/_grid/grid_utils.R:
- Around line 9-11: The assignment names(bbox) <- names(bbox) is a no-op, so an
unnamed bbox stays unnamed. Later lookups such as xmin <- bbox[["xmin"]] then
fail with a "subscript out of bounds" error. Replace the no-op with a conditional
that assigns default names when they are missing (e.g. if is.null(names(bbox))
or anyNA(names(bbox)), assign c("xmin", "ymin", "xmax", "ymax")) and otherwise
keeps the existing names. Apply this where bbox is validated in grid_utils.R.
- Around line 184-185: The no-op line names(bbox) <- names(bbox) should be
replaced with logic to handle unnamed bbox inputs: in the function using bbox
(same pattern as in make_visium_grid_sf), after stopifnot(is.numeric(bbox),
length(bbox) == 4) check if names(bbox) is NULL or contains empty strings and,
if so, assign a sensible default name vector (e.g. xmin, ymin, xmax, ymax);
otherwise leave existing names intact. Update the code that currently contains
names(bbox) <- names(bbox) to perform that conditional assignment so unnamed
bbox vectors are robustly named.
- Around line 208-226: The code currently calls match.arg(origin) which only
validates/returns the first element, so the y-axis origin is ignored; change the
API to accept separate x/y origin values (e.g., origin_x and origin_y) or treat
origin as a length-2 vector and validate both elements separately by calling
match.arg(origin[1], c("xmin","xmax")) and match.arg(origin[2],
c("ymin","ymax")); then use the validated origin_x/origin_y when computing
x_anchor, y_anchor and the x_edges/y_edges seq logic (the x_anchor/x_edges and
y_anchor/y_edges blocks) so both axes are properly handled and the +/-
bin_size_um branches work for xmax/xmin and ymax/ymin respectively.
- Around line 56-69: map_cells_to_spots ends with an assignment, so its result is
returned only invisibly. It also uses %>% without ensuring that a pipe operator is
available. Either load one explicitly (e.g. library(dplyr), or qualify it as
magrittr::`%>%`), or drop the pipe and call dplyr::select(visium_grid_sf$spots,
spot_id) directly. Then end the function with return(cell_to_spot) (or
cell_to_spot as the last expression) after the st_join() call, so that callers of
map_cells_to_spots receive the mapped sf object.
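The origin fix amounts to validating and applying the x and y anchors independently. Below is a Python sketch of the edge computation; it is not the R code in grid_utils.R, and `axis_edges` and `square_grid_edges` are hypothetical names. It treats a plain 4-tuple bbox as being in sf::st_bbox() order (xmin, ymin, xmax, ymax). Edges are computed from an index rather than by repeated addition, which avoids floating-point drift.

```python
import math


def axis_edges(lo, hi, bin_size, anchor_at_max):
    """Bin edges along one axis, anchored at lo (default) or at hi."""
    if bin_size <= 0:
        raise ValueError("bin_size must be positive")
    n = max(1, math.ceil((hi - lo) / bin_size))  # bins needed to cover [lo, hi]
    if anchor_at_max:
        # Start exactly at hi and step down; any partial bin ends up below lo.
        return [hi - (n - i) * bin_size for i in range(n + 1)]
    # Start exactly at lo and step up; any partial bin ends up above hi.
    return [lo + i * bin_size for i in range(n + 1)]


def square_grid_edges(bbox, bin_size_um, origin=("xmin", "ymin")):
    """x/y bin edges for a square grid, validating each origin axis separately."""
    if isinstance(bbox, dict):
        xmin, ymin, xmax, ymax = (bbox[k] for k in ("xmin", "ymin", "xmax", "ymax"))
    else:
        xmin, ymin, xmax, ymax = bbox  # unnamed input: assume st_bbox() order
    ox, oy = origin
    if ox not in ("xmin", "xmax"):
        raise ValueError(f"x origin must be 'xmin' or 'xmax', got {ox!r}")
    if oy not in ("ymin", "ymax"):
        raise ValueError(f"y origin must be 'ymin' or 'ymax', got {oy!r}")
    return (axis_edges(xmin, xmax, bin_size_um, ox == "xmax"),
            axis_edges(ymin, ymax, bin_size_um, oy == "ymax"))


print(square_grid_edges((0, 0, 20, 10), 8, origin=("xmax", "ymin")))
# ([-4, 4, 12, 20], [0, 8, 16])
```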

In @workflow/scripts/_segmentation/_grid/square_grid.R:
- Around line 12-16: The log and error messages use concatenation without spaces
resulting in cramped output; update the message() and stop() calls that
reference cellb_path so they build strings with proper spacing (e.g., include
separators or use sprintf/paste0 with explicit spaces) — specifically adjust the
message("Xenium bundle path:", cellb_path) and the stop("Xenium bundle",
cellb_path, "does not exist! \n") usages to produce readable output like "Xenium
bundle path: <path>" and "Xenium bundle <path> does not exist!".

In @workflow/scripts/_segmentation/_grid/visium_grid.R:
- Around line 12-16: The log calls print concatenated strings without spaces;
update the message and stop invocations around the cellb_path variable (the
message call using "Xenium bundle path:" and the stop call using "Xenium
bundle", cellb_path, ...) to produce readable output by joining the label and
cellb_path with a space (e.g., use a single string with a trailing space or use
a string-joining function like paste()/sprintf() to ensure a space between the
label and the path) so logs read "Xenium bundle path: /path" and stop errors
include the path separated by a space.

🧹 Nitpick comments (5)

workflow/scripts/_segmentation/_grid/grid_utils.R (1)
79-82: Avoid calling library() inside functions.

Loading packages inside a function can cause unintended side effects and slows repeated calls. Since sf is already loaded at the top of the file, only jsonlite needs to be ensured at file level.
♻️ Suggested refactor

Move library(jsonlite) to the top of the file alongside library(sf):
 library(sf)
+library(jsonlite)
Then remove the suppressPackageStartupMessages block from inside the function.
workflow/scripts/_segmentation/_grid/visium_grid.R (1)
34-34: Remove debug print statement.

This appears to be a debug artifact that should be removed or converted to a message() call for consistency with the rest of the logging.
♻️ Proposed fix
-print(snakemake@params[["diameter"]])
+message("Spot diameter: ", snakemake@params[["diameter"]], " um")
workflow/rules/_segmentation/_grid/square_grid.smk (1)

11-31: Consider adding threads for consistency with importGrid rule.

The importGrid rule in grid.smk specifies both threads and resources.mem_mb. For consistency and proper resource allocation, consider adding similar specifications here.

workflow/scripts/_segmentation/_grid/square_grid.R (1)

1-45: Consider extracting shared logic with visium_grid.R.

Lines 1-30 are nearly identical to visium_grid.R. The common pattern (logging setup, library loading, input validation, bbox computation) could be extracted to a shared helper function in grid_utils.R to reduce duplication.
workflow/rules/_segmentation/_grid/registered_grid.smk (1)
21-22: Log path inconsistency.

The log path uses /registration/ while the output is under /segmentation/. Other rules consistently log under the same base path as their outputs. Consider aligning the log path for consistency.
♻️ Proposed fix
     log:
-        f'{config["output_path"]}/registration/{{compact_segmentation_id}}/{{sample_id}}/logs/importRegisteredGrid.log'
+        f'{config["output_path"]}/segmentation/{{compact_segmentation_id}}/{{sample_id}}/logs/importRegisteredGrid.log'

📜 Review details

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 9cc2c60 and 605b6a8.

📒 Files selected for processing (9)

workflow/rules/_segmentation/_grid/registered_grid.smk
workflow/rules/_segmentation/_grid/square_grid.smk
workflow/rules/_segmentation/_grid/visium_grid.smk
workflow/rules/_segmentation/grid.smk
workflow/rules/coexpression.smk
workflow/rules/segmentation.smk
workflow/scripts/_segmentation/_grid/grid_utils.R
workflow/scripts/_segmentation/_grid/square_grid.R
workflow/scripts/_segmentation/_grid/visium_grid.R

🔇 Additional comments (5)

workflow/rules/coexpression.smk (1)

33-33: Clarify the 4x memory increase.

The memory allocation has increased from 512MB to 2048MB per attempt (4x). While the implementation is correct, this significant resource change could impact cluster scheduling and costs.

Please confirm:

Is this increase related to grid segmentation workloads?

Has this been tested and validated as necessary?

Were OOM errors encountered with the previous allocation?
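For context on the 4x figure, assume that computeCoexpression scales its request with Snakemake's retry counter via the common `mem_mb=lambda wildcards, attempt: attempt * N` pattern. The exact expression is not quoted in this thread. Under that assumption, the requested memory per attempt changes as follows:

```python
def mem_mb(wildcards, attempt, per_attempt_mb=2048):
    """Memory request that grows with Snakemake's retry counter (attempt starts at 1)."""
    return per_attempt_mb * attempt


for attempt in (1, 2, 3):
    old = mem_mb(None, attempt, per_attempt_mb=512)
    new = mem_mb(None, attempt)
    print(f"attempt {attempt}: {old} MB -> {new} MB")
```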

workflow/scripts/_segmentation/_grid/visium_grid.R (1)

20-28: LGTM!

The bbox computation pipeline using Arrow for efficient Parquet reading with lazy evaluation and dplyr summarization is well-structured. Using na.rm = TRUE handles potential missing values appropriately.

workflow/rules/segmentation.smk (1)

33-34: LGTM!

The consolidation of grid-related rules under grid.smk is a good refactoring choice, reducing complexity in the main segmentation include file. The commented line provides useful history.

workflow/rules/_segmentation/grid.smk (1)

29-69: LGTM!

The importGrid rule is well-structured with proper resource allocation, logging, and xeniumranger integration. The use of absolute paths for bundle and cells inputs ensures robustness when running from the work directory.

workflow/rules/_segmentation/_grid/registered_grid.smk (1)

26-31: LGTM!

The shell command correctly creates the output directory, uses ln -sf to handle existing symlinks gracefully, and logs the action. Using realpath ensures the symlink target is absolute, avoiding broken links if the working directory changes.
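Below is a rough Python equivalent of the mkdir / realpath / ln -sf sequence described above. `link_registered_cells` is a hypothetical name. Unlike ln, it does not handle a destination that is an existing directory.

```python
import os


def link_registered_cells(src, dst):
    """Rough equivalent of: mkdir -p "$(dirname dst)" && ln -sf "$(realpath src)" dst."""
    os.makedirs(os.path.dirname(dst), exist_ok=True)
    target = os.path.realpath(src)  # absolute target, so the link survives a cd
    if os.path.lexists(dst):        # like ln -f: replace an existing file or (broken) link
        os.remove(dst)
    os.symlink(target, dst)
    return target
```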

coderabbitai · 2026-01-08T11:16:19Z

+    resources: # not demanding reads cell_boundaries.geojson from the xenium bundles and create another one. 
+    container:


⚠️ Potential issue | 🟡 Minor

Empty resources block.

The resources: directive on line 27 has no values defined, which may cause Snakemake parsing issues or unexpected defaults. Either add resource specifications or remove the directive.

♻️ Proposed fix (option 1: remove empty block)

-    resources: # not demanding reads cell_boundaries.geojson from the xenium bundles and create another one.
     container:



coderabbitai

Actionable comments posted: 3

🤖 Fix all issues with AI agents

In `@workflow/rules/_segmentation/grid.smk`:
- Around line 59-69: Ensure the shell block removes any existing partial
normalised_results directory before calling xeniumranger: add a step to delete
{params.work_dir}/normalised_results (e.g. rm -rf
{params.work_dir}/normalised_results) immediately before invoking xeniumranger
import-segmentation so the tool cannot resume an incomplete pipestance; keep the
rest of the shell block and parameters (params.abs_bundle, params.abs_cells,
params.other_options, threads, params.localmem, params.abs_log) unchanged.

In `@workflow/scripts/_segmentation/_grid/grid_utils.R`:
- Around line 58-73: The function map_cells_to_spots is defined but never used;
either delete it or wire it into the pipeline where cell coordinates are
assigned to Visium spots: if removing, remove the entire map_cells_to_spots
definition and any related unused imports; if keeping, call
map_cells_to_spots(visium_grid_sf, cells_df) from the grid processing flow that
constructs visium_grid_sf and cells_df (e.g., immediately after cells_df and
visium_grid_sf are created/loaded), return or store its result (cell_to_spot)
for downstream steps, and ensure any callers import or reference the function
name correctly so it is exercised by tests.

In `@workflow/scripts/_segmentation/_grid/square_grid.R`:
- Around line 34-38: The call that creates visiumHD_grid passes
snakemake@params[["bin_size"]] as a string but make_visiumhd_grid_sf requires a
numeric bin_size_um; wrap the Snakemake parameter in as.numeric() when calling
make_visiumhd_grid_sf (i.e., set bin_size_um =
as.numeric(snakemake@params[["bin_size"]])) following the same pattern used for
numeric params in standard_qc.R so the is.numeric() check in grid_utils.R
passes.

🧹 Nitpick comments (1)

workflow/scripts/_segmentation/_grid/square_grid.R (1)

1-3: Log sink is opened but never closed.

If an error occurs mid-script, the sink remains open. Consider wrapping the body in tryCatch or adding on.exit(sink(type = "message")); on.exit(sink(type = "output"), add = TRUE) after the sink calls to ensure cleanup.

Also applies to: 45-48

coderabbitai · 2026-02-06T15:30:45Z

+    shell:
+        r"""
+        mkdir -p {params.work_dir}/logs
+        cd {params.work_dir}
+        xeniumranger import-segmentation --id=normalised_results \
+          --xenium-bundle={params.abs_bundle} \
+          --cells={params.abs_cells} \
+          --units=microns \
+          --localcores={threads} \
+          --localmem={params.localmem} \
+          {params.other_options} &> {params.abs_log}


⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🌐 Web query:

xeniumranger import-segmentation force overwrite flag documentation

💡 Result:

10x Genomics’ Xenium Ranger import-segmentation does not document any --force, --overwrite, or similar “force overwrite” flag. The documented behavior is:

If you want to re-run from scratch, use a new --id, or delete the existing pipestance directory (named after --id) and the invocation .mro file, then re-run. [1]

If the output directory exists and is treated as an in-progress/failed pipestance, Xenium Ranger will try to resume; for lock issues, the troubleshooting docs recommend either renaming the output directory or removing the _lock file (only if you’re sure nothing else is running). [2]

Sources:
[1] Xenium Ranger Command Line Arguments (latest), pipestance re-run guidance
[2] Troubleshooting Xenium Ranger Errors, locked pipestance / _lock handling

Shell block: remove partial normalised_results directory before xeniumranger call.

If a previous run failed and left a partial normalised_results directory, xeniumranger will attempt to resume from that incomplete pipestance rather than overwrite it. Add rm -rf {params.work_dir}/normalised_results before the xeniumranger call to ensure a fresh run, which is the documented approach for handling pre-existing output directories.
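Below is a Python sketch of the suggested clean-start behaviour. It removes any stale pipestance named by --id and then assembles the same xeniumranger arguments that the shell block uses. The helper name is hypothetical, and the helper only builds the argument list. It does not touch the invocation .mro file or any _lock file mentioned in the documentation above.

```python
import shutil
from pathlib import Path


def prepare_import_segmentation(work_dir, bundle, cells, localcores, localmem,
                                run_id="normalised_results", extra=()):
    """Remove a stale pipestance and build the xeniumranger argv (not executed here)."""
    stale = Path(work_dir) / run_id
    if stale.exists():
        shutil.rmtree(stale)  # avoid resuming an incomplete pipestance
    return [
        "xeniumranger", "import-segmentation",
        f"--id={run_id}",
        f"--xenium-bundle={Path(bundle).resolve()}",  # absolute, as in the rule
        f"--cells={Path(cells).resolve()}",
        "--units=microns",
        f"--localcores={localcores}",
        f"--localmem={localmem}",
        *extra,
    ]
```

The resulting list could then be run with `subprocess.run(cmd, cwd=work_dir, check=True)`. Only do this when no other run is using the same directory.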


coderabbitai · 2026-02-06T15:30:45Z

+map_cells_to_spots <- function(
+    visium_grid_sf,
+    cells_df
+){
+
+  cells_sf <- st_as_sf(cells_df, coords = c("x", "y"), crs = st_crs(visium_grid_sf$spots))
+
+  cell_to_spot <- st_join(
+    cells_sf,
+    visium_grid_sf$spots %>% select(spot_id),
+    join = st_within,
+    left = TRUE
+  )
+
+  return(cell_to_spot)
+}


⚠️ Potential issue | 🟡 Minor

🧩 Analysis chain

🏁 Script executed:

rg -n 'map_cells_to_spots' --type r

Repository: bdsc-tds/xenium_analysis_pipeline

Length of output: 161

Remove unused function map_cells_to_spots or add calls to it.

The prior issues (missing return statement and dplyr pipe operator dependency) are fixed. However, verification confirms the function is never called anywhere in the codebase—it is dead code. Either remove the function if it's no longer needed, or add the necessary calls to integrate it into the grid scripts.
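If map_cells_to_spots is kept, its job is a left spatial join of cell centroids onto spots. For circular Visium-style spots, this reduces to a point-in-circle test. Below is a brute-force Python sketch, offered as an illustrative stand-in for sf::st_join(join = st_within, left = TRUE) and not as the R function itself.

```python
import math


def map_cells_to_spots(cells, spots, diameter_um=55.0):
    """Assign each cell centroid to the spot whose disc contains it, else None.

    cells and spots map ids to (x, y) in microns. Brute force, so only
    suitable for small inputs; a spatial index would be needed at scale.
    """
    r = diameter_um / 2
    result = {}
    for cell_id, (x, y) in cells.items():
        result[cell_id] = None  # left join: unmatched cells keep a missing spot
        for spot_id, (sx, sy) in spots.items():
            if math.hypot(x - sx, y - sy) < r:  # strictly inside, like st_within
                result[cell_id] = spot_id
                break
    return result


spots = {"s1": (0.0, 0.0), "s2": (100.0, 0.0)}
cells = {"c1": (10.0, 0.0), "c2": (100.0, 20.0), "c3": (50.0, 0.0)}
print(map_cells_to_spots(cells, spots))
```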


coderabbitai · 2026-02-06T15:30:45Z

+visiumHD_grid <- make_visiumhd_grid_sf(
+    bbox,
+    bin_size_um = snakemake@params[["bin_size"]],
+    crs = NA
+)


⚠️ Potential issue | 🟡 Minor

🧩 Analysis chain

🏁 Script executed:

find . -type f \( -name "*.smk" -o -name "*.py" \) | head -20

Repository: bdsc-tds/xenium_analysis_pipeline

Length of output: 1355

🏁 Script executed:

rg -n 'bin_size' -t py -t txt --glob='*.smk' --glob='*.py' | head -50

Repository: bdsc-tds/xenium_analysis_pipeline

Length of output: 256

🏁 Script executed:

cat -n workflow/scripts/_segmentation/_grid/square_grid.R 2>/dev/null || find . -name "square_grid.R" -exec cat -n {} \;

Repository: bdsc-tds/xenium_analysis_pipeline

Length of output: 1653

🏁 Script executed:

cat -n workflow/rules/_segmentation/_grid/square_grid.smk

Repository: bdsc-tds/xenium_analysis_pipeline

Length of output: 1523

🏁 Script executed:

cat -n workflow/scripts/_segmentation/_grid/grid_utils.R

Repository: bdsc-tds/xenium_analysis_pipeline

Length of output: 10859

🏁 Script executed:

rg -n "get_dict_value" -A 20 -B 5 | head -100

Repository: bdsc-tds/xenium_analysis_pipeline

Length of output: 7331

🏁 Script executed:

find . -name "*.yaml" -o -name "*.yml" | xargs grep -l "bin_size" 2>/dev/null | head -10

Repository: bdsc-tds/xenium_analysis_pipeline

Length of output: 59

🏁 Script executed:

find . -name "*.yaml" -o -name "*.yml" | xargs grep -B 5 -A 5 "bin_size" 2>/dev/null | head -60

Repository: bdsc-tds/xenium_analysis_pipeline

Length of output: 59

🏁 Script executed:

find . -type f \( -name "config*" -o -name "*config*" \) | head -30

Repository: bdsc-tds/xenium_analysis_pipeline

Length of output: 262

🏁 Script executed:

rg -n "bin_size" --type yaml --type json -B 2 -A 2 | head -80

Repository: bdsc-tds/xenium_analysis_pipeline

Length of output: 59

🏁 Script executed:

rg -n "snakemake@params" workflow/scripts/ | head -20

Repository: bdsc-tds/xenium_analysis_pipeline

Length of output: 2503

🏁 Script executed:

cat -n ./config/config.yml | head -100

Repository: bdsc-tds/xenium_analysis_pipeline

Length of output: 2148

🏁 Script executed:

rg -n "convert2numeric" -B 5 -A 15 | head -60

Repository: bdsc-tds/xenium_analysis_pipeline

Length of output: 3824

Use as.numeric() to coerce bin_size parameter from Snakemake.

make_visiumhd_grid_sf() requires bin_size_um to be numeric: line 192 of grid_utils.R validates it with stopifnot(is.numeric(...)). Snakemake encodes Python int and float params as R numerics, but bin_size can still reach the script as a character string if the rule's params function returns one, so coercing it explicitly is a cheap safeguard. This also follows the established pattern in the codebase, where other numeric parameters (min_counts, min_features, max_counts, etc. in standard_qc.R) are converted with as.numeric().

Proposed fix

 visiumHD_grid <- make_visiumhd_grid_sf(
     bbox,
-    bin_size_um = snakemake@params[["bin_size"]],
+    bin_size_um = as.numeric(snakemake@params[["bin_size"]]),
     crs = NA
 )



coderabbitai

Actionable comments posted: 1

♻️ Duplicate comments (4)

workflow/rules/_segmentation/grid.smk (2)

59-69: ⚠️ Potential issue | 🟠 Major

Ensure a clean pipestance before running xeniumranger import-segmentation.

If normalised_results already exists from a partial/failed run, reusing --id=normalised_results can lead to resume behavior instead of a fresh import.

Does `xeniumranger import-segmentation` support an overwrite/force flag, and what is the recommended rerun procedure when the `--id` output directory already exists?

🔁 Suggested fix

     shell:
         r"""
         mkdir -p {params.work_dir}/logs
         cd {params.work_dir}
+        rm -rf {params.work_dir}/normalised_results
         xeniumranger import-segmentation --id=normalised_results \
           --xenium-bundle={params.abs_bundle} \
           --cells={params.abs_cells} \

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed.

In `@workflow/rules/_segmentation/grid.smk` around lines 59 - 69, The step
invoking xeniumranger import-segmentation with --id=normalised_results can
resume instead of starting fresh if that id directory already exists; update the
shell block around xeniumranger import-segmentation to ensure a clean pipestance
by either adding xeniumranger's documented overwrite/force flag (e.g.,
--overwrite or --force) to the import-segmentation command if supported, or by
deleting/renaming the existing {params.work_dir}/normalised_results (or the path
referenced by --id) before running; reference the invocation line "xeniumranger
import-segmentation --id=normalised_results" and constants {params.work_dir},
{params.abs_log}, {params.abs_bundle}, {params.abs_cells} when making the
change.
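The `rm -rf` suggestion can also be sketched in Python, closer to how a Snakemake `run:` block would express it. Assumptions: the helper name `prepare_import_segmentation` is hypothetical, and only the three flags visible in the excerpt (`--id`, `--xenium-bundle`, `--cells`) are included; the real rule's remaining arguments are not shown above and are omitted. Whether xeniumranger offers a dedicated overwrite flag is left open, as in the review.

```python
import shutil
from pathlib import Path
from typing import List


def prepare_import_segmentation(work_dir: str, bundle: str, cells: str,
                                run_id: str = "normalised_results") -> List[str]:
    """Delete any stale pipestance for `run_id` and return a partial command.

    Hypothetical helper mirroring `rm -rf {params.work_dir}/normalised_results`.
    """
    work = Path(work_dir)
    (work / "logs").mkdir(parents=True, exist_ok=True)
    stale = work / run_id
    if stale.exists():
        # A leftover directory from a failed run could otherwise be resumed.
        shutil.rmtree(stale)
    # Only the flags shown in the review excerpt; other arguments omitted.
    return [
        "xeniumranger", "import-segmentation",
        f"--id={run_id}",
        f"--xenium-bundle={bundle}",
        f"--cells={cells}",
    ]
```

The returned list could then be executed with `subprocess.run(cmd, cwd=work_dir, check=True)`, so the pipestance is created inside the freshly cleaned working directory.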

1-10: ⚠️ Potential issue | 🟡 Minor

Header comments still have stale naming/typos.

Line 5 has “adjucent”, and Line 9 still references grid_binned_8um, which does not match the square-grid naming used elsewhere.

📝 Suggested fix

-# - grid_registered_visium_55um - for the registered spots (visium registered to xenium adjucent slice)
+# - grid_registered_visium_55um - for the registered spots (visium registered to xenium adjacent slice)
@@
-# the grids like grid_visium_55um and grid_binned_8um are generated with the `_segmentation/_grid/` rules, 
+# the grids like grid_visium_55um and grid_square_8um are generated with the `_segmentation/_grid/` rules,

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed.

In `@workflow/rules/_segmentation/grid.smk` around lines 1 - 10, Header comments
in workflow/rules/_segmentation/grid.smk contain stale naming and a typo: change
“adjucent” to “adjacent” and replace the incorrect reference to
`grid_binned_8um` with the correct `grid_square_8um` (and/or remove any outdated
grid name) so the comment list of expected grids matches the actual names
(`grid_visium_55um`, `grid_registered_visium_55um`, `grid_square_8um`,
`grid_registered_square_8um`) and the explanatory sentence about where grids are
generated/stored is consistent.

workflow/rules/_segmentation/_grid/visium_grid.smk (2)

59-61: ⚠️ Potential issue | 🔴 Critical

resources: block is syntactically invalid as written.

Line 59 declares resources: with no assignments, and Line 60 starts container: at the same level. This can break Snakemake parsing.

#!/bin/bash
python - <<'PY'
from pathlib import Path

p = Path("workflow/rules/_segmentation/_grid/visium_grid.smk")
lines = p.read_text().splitlines()

for i, line in enumerate(lines, 1):
    if line.strip() == "resources:":
        # find next non-empty line
        j = i
        while j < len(lines) and lines[j].strip() == "":
            j += 1
        nxt = lines[j] if j < len(lines) else ""
        print(f"resources line: {i}")
        print(f"next non-empty line: {j+1 if j < len(lines) else 'EOF'} -> {nxt!r}")
PY

🔧 Suggested fix

-    resources: # not demanding reads cell_boundaries.geojson from the xenium bundles and create another one. 
     container:
         config["containers"]["r"]

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed.

In `@workflow/rules/_segmentation/_grid/visium_grid.smk` around lines 59 - 61, The
Snakemake rule has an empty/incomplete resources declaration ("resources:")
which is syntactically invalid and should either be removed or populated; update
the visium_grid.smk rule by either deleting the lone "resources:" line or
replacing it with a valid resources mapping (e.g., resources: { ... }) and
ensure the following "container: config[\"containers\"][\"r\"]" remains at the
same indentation level as other rule directives so the parser sees a valid
directive sequence; locate the orphan "resources:" token in the file and fix it
accordingly.

6-11: ⚠️ Potential issue | 🟡 Minor

Docstring path is out of sync with the actual return value.

Line 9 documents .../cells.geojson, but Line 11 returns .../processed_results/cell_boundaries.geojson.

📝 Suggested fix

-      {output_path}/segmentation/{compact_segmentation_id}/{sample_id}/cells.geojson
+      {output_path}/segmentation/{compact_segmentation_id}/{sample_id}/processed_results/cell_boundaries.geojson

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed.

In `@workflow/rules/_segmentation/_grid/visium_grid.smk` around lines 6 - 11, The
docstring describing the expected GeoJSON path is out of sync with the value
assigned to p: update the docstring to match the actual return/value used by the
rule (the variable p) or vice-versa; specifically make the comment reflect
"{output_path}/segmentation/{compact_segmentation_id}/{sample_id}/processed_results/cell_boundaries.geojson"
to match the p = f'.../processed_results/cell_boundaries.geojson' assignment (or
change the p assignment to match the documented
"{output_path}/segmentation/.../cells.geojson" if that was the intended
location).
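One way to keep the documented path and the returned value from drifting again is to build the path in a single expression and pin it with a doctest. This is a sketch only: the function name and plain-string signature are hypothetical, since the real helper at lines 6-11 of `visium_grid.smk` is not shown and presumably takes Snakemake wildcards.

```python
def get_grid_cells_geojson(output_path: str, compact_segmentation_id: str,
                           sample_id: str) -> str:
    """Return the GeoJSON cell boundaries written for a generated grid.

    Path layout:
      {output_path}/segmentation/{compact_segmentation_id}/{sample_id}/processed_results/cell_boundaries.geojson

    >>> get_grid_cells_geojson("out", "grid_visium_55um", "s1")
    'out/segmentation/grid_visium_55um/s1/processed_results/cell_boundaries.geojson'
    """
    # Build the path in one place so the docstring example and value agree.
    return (
        f"{output_path}/segmentation/{compact_segmentation_id}/"
        f"{sample_id}/processed_results/cell_boundaries.geojson"
    )
```

If the helper lives in an importable module, `python -m doctest` would flag any future mismatch between the docstring and the return value.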

🤖 Prompt for all review comments with AI agents

Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@workflow/rules/_segmentation/_grid/visium_grid.smk`:
- Around line 57-58: The wildcard constraint for compact_segmentation_id
currently uses \w+ which allows letters and underscores; update the pattern in
wildcard_constraints for compact_segmentation_id to require only digits for the
diameter by replacing \w+ with \d+ so it matches grid_visium_{numericDiameter}um
(e.g., grid_visium_55um) and rejects non-numeric values.


ℹ️ Review info

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between b9dc174 and babc2e1.

📒 Files selected for processing (9)

workflow/rules/_segmentation/_grid/registered_grid.smk
workflow/rules/_segmentation/_grid/square_grid.smk
workflow/rules/_segmentation/_grid/visium_grid.smk
workflow/rules/_segmentation/grid.smk
workflow/rules/coexpression.smk
workflow/rules/segmentation.smk
workflow/scripts/_segmentation/_grid/grid_utils.R
workflow/scripts/_segmentation/_grid/square_grid.R
workflow/scripts/_segmentation/_grid/visium_grid.R

🚧 Files skipped from review as they are similar to previous changes (5)

workflow/rules/_segmentation/_grid/registered_grid.smk
workflow/scripts/_segmentation/_grid/grid_utils.R
workflow/rules/coexpression.smk
workflow/rules/_segmentation/_grid/square_grid.smk
workflow/scripts/_segmentation/_grid/square_grid.R

coderabbitai · 2026-03-02T12:41:59Z

+    wildcard_constraints:
+        compact_segmentation_id=r"grid_visium_\w+um" # should be grid_visium_55um


⚠️ Potential issue | 🟡 Minor

🧩 Analysis chain

🏁 Script executed:

# First, locate and examine the file
fd -t f "visium_grid.smk" --exec wc -l {} \;

Repository: bdsc-tds/xenium_analysis_pipeline

Length of output: 132

🏁 Script executed:

# Read the specific file around the lines mentioned
cat -n workflow/rules/_segmentation/_grid/visium_grid.smk | sed -n '50,65p'

Repository: bdsc-tds/xenium_analysis_pipeline

Length of output: 738

🏁 Script executed:

# Search for usages of compact_segmentation_id to understand the impact
rg "compact_segmentation_id" -B 3 -A 3 workflow/

Repository: bdsc-tds/xenium_analysis_pipeline

Length of output: 15900

🏁 Script executed:

# Verify the regex pattern behavior as demonstrated in the review comment
python3 - <<'PY'
import re

# Current pattern
current_pat = re.compile(r"^grid_visium_\w+um$")
# Suggested pattern
suggested_pat = re.compile(r"^grid_visium_\d+um$")

test_cases = [
    "grid_visium_55um",        # expected valid
    "grid_visium_abcum",       # should be invalid
    "grid_visium_55_extraum",  # should be invalid
    "grid_visium_10um",        # expected valid
    "grid_visium_Xum",         # should be invalid (letter)
]

print("Test case analysis:")
print("-" * 70)
for test in test_cases:
    current_match = bool(current_pat.match(test))
    suggested_match = bool(suggested_pat.match(test))
    print(f"{test:<30} | current: {str(current_match):<5} | suggested: {str(suggested_match):<5}")
PY

Repository: bdsc-tds/xenium_analysis_pipeline

Length of output: 502

Change wildcard constraint to accept numeric diameters only.

The current pattern \w+ allows non-numeric characters (e.g., grid_visium_abcum, grid_visium_55_extraum), but the intended format is grid_visium_{diameter}um where diameter is numeric. Use \d+ instead.

Suggested fix

-        compact_segmentation_id=r"grid_visium_\w+um" # should be grid_visium_55um
+        compact_segmentation_id=r"grid_visium_\d+um" # e.g., grid_visium_55um


🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `@workflow/rules/_segmentation/_grid/visium_grid.smk` around lines 57 - 58, The wildcard constraint for compact_segmentation_id currently uses \w+ which allows letters and underscores; update the pattern in wildcard_constraints for compact_segmentation_id to require only digits for the diameter by replacing \w+ with \d+ so it matches grid_visium_{numericDiameter}um (e.g., grid_visium_55um) and rejects non-numeric values.
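Two details are worth checking when tightening the pattern. A constraint has to cover the whole wildcard value, which `re.fullmatch` approximates below, and in Python's `re` module `\d` matches any Unicode decimal digit (not just 0-9) for str patterns unless `re.ASCII` is set. A minimal sketch, assuming Snakemake evaluates the constraint with Python's `re` and default flags; `[0-9]+` is the stricter alternative if only ASCII diameters should be accepted.

```python
import re

# The tightened constraint from the review and an ASCII-only alternative.
# fullmatch() approximates how a constraint must cover the whole wildcard value.
UNICODE_DIGITS = re.compile(r"grid_visium_\d+um")
ASCII_DIGITS = re.compile(r"grid_visium_[0-9]+um")

candidates = [
    "grid_visium_55um",            # intended format
    "grid_visium_abcum",           # letters instead of a diameter
    "grid_visium_55_extraum",      # trailing junk that \w+ would accept
    "grid_visium_\u0665\u0665um",  # Arabic-Indic digits: still \d in Python
]

for cid in candidates:
    print(
        f"{cid!r:<30} \\d+: {bool(UNICODE_DIGITS.fullmatch(cid))!s:<5} "
        f"[0-9]+: {bool(ASCII_DIGITS.fullmatch(cid))}"
    )
```

Both patterns reject the malformed IDs from the review; they differ only on non-ASCII digits.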

coderabbitai

Actionable comments posted: 2

♻️ Duplicate comments (3)

workflow/scripts/_segmentation/_grid/visium_grid.R (1)

14-15: ⚠️ Potential issue | 🟡 Minor

Add the missing space in the error message.

Line 15 has the same stop() concatenation issue as the square-grid script, so the emitted path and "does not exist!" still run together.

Proposed fix

-if(!file.exists(cellb_path)){
-  stop("Xenium bundle ", cellb_path, "does not exist! \n")
+if(!file.exists(cellb_path)){
+  stop("Xenium bundle ", cellb_path, " does not exist!\n")
 }

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed.

In `@workflow/scripts/_segmentation/_grid/visium_grid.R` around lines 14 - 15, The
stop() call in visium_grid.R concatenates the path and message without a space
using stop("Xenium bundle ", cellb_path, "does not exist! \n"); update the error
string so the emitted message separates the path and text (e.g., insert a
leading space in " does not exist!" or use paste0/paste) while keeping the same
stop() invocation and referencing the cellb_path variable to produce "Xenium
bundle <path> does not exist!"; ensure you modify the stop() expression in the
grid script accordingly.

workflow/scripts/_segmentation/_grid/square_grid.R (1)

14-15: ⚠️ Potential issue | 🟡 Minor

Fix the missing separator in the stop() message.

stop() pastes its arguments without spaces, so Line 15 still renders the path and "does not exist!" as one token.

Proposed fix

-if(!file.exists(cellb_path)){
-  stop("Xenium bundle ", cellb_path, "does not exist! \n")
+if(!file.exists(cellb_path)){
+  stop("Xenium bundle ", cellb_path, " does not exist!\n")
 }

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed.

In `@workflow/scripts/_segmentation/_grid/square_grid.R` around lines 14 - 15, The
stop() call inside the if(!file.exists(cellb_path)) block concatenates arguments
without spaces; update the stop invocation (the stop(...) call referencing
cellb_path) to include proper spacing or use a formatting helper (e.g.,
paste()/sprintf()) so the path and the "does not exist!" message are separated
(for example, add a space before "does not exist!" or replace with paste("Xenium
bundle", cellb_path, "does not exist!\n") or sprintf("Xenium bundle %s does not
exist!\n", cellb_path)).

workflow/rules/_segmentation/_grid/visium_grid.smk (1)

57-58: ⚠️ Potential issue | 🟡 Minor

Tighten the wildcard constraint to numeric diameters only.

Line 58 currently accepts values like grid_visium_abcum, which do not match the grid_visium_{diameter}um contract.

Proposed fix

-        compact_segmentation_id=r"grid_visium_\w+um" # should be grid_visium_55um
+        compact_segmentation_id=r"grid_visium_\d+um" # e.g. grid_visium_55um

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed.

In `@workflow/rules/_segmentation/_grid/visium_grid.smk` around lines 57 - 58, The
current wildcard_constraints entry for compact_segmentation_id allows
non-numeric diameters; update the regex under wildcard_constraints
(compact_segmentation_id) to only match numeric diameters, e.g. use
r"grid_visium_\d+um" (or r"grid_visium_55um" if you intend to restrict to only
55um) so values like grid_visium_abcum are rejected.

🤖 Prompt for all review comments with AI agents

Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@workflow/rules/_segmentation/_grid/registered_grid.smk`:
- Around line 23-24: The current wildcard_constraints entry
compact_segmentation_id uses a too-broad regex (grid_registered_\w+um); update
compact_segmentation_id to enforce the documented format
grid_registered_{technology}_{size}um by using a regex that requires a
technology name, an underscore, then numeric size plus "um" (for example change
to r"grid_registered_[A-Za-z]+_\d+um"); ensure the pattern is assigned to
wildcard_constraints under compact_segmentation_id and retains the raw-string
notation.

In `@workflow/rules/_segmentation/_grid/visium_grid.smk`:
- Around line 15-19: The function get_input2_or_params4generateVisiumGrid
returns the raw directory as-is, but visium_grid.R expects input.xenium_bundle
to point to the directory containing cell_boundaries.parquet; update
get_input2_or_params4generateVisiumGrid to "normalize" ret before returning
(when for_input is True) by resolving cases where the sample layout nests data
under an "outs" directory or returns a path to a file rather than the bundle
directory—e.g., detect if ret refers to an outs subpath or does not contain
cell_boundaries.parquet and adjust to the parent directory that does; apply the
same normalization logic in the analogous function/branch around lines 42-44 so
both return values consistently point to the directory with
cell_boundaries.parquet for visium_grid.R to consume (referencing
get_input2_or_params4generateVisiumGrid and input.xenium_bundle).


ℹ️ Review info

⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: d8af48d6-64ab-4ec3-acf4-9b8f4d0a1e9c

📥 Commits

Reviewing files that changed from the base of the PR and between babc2e1 and c411a2a.

📒 Files selected for processing (9)

workflow/rules/_segmentation/_grid/registered_grid.smk
workflow/rules/_segmentation/_grid/square_grid.smk
workflow/rules/_segmentation/_grid/visium_grid.smk
workflow/rules/_segmentation/grid.smk
workflow/rules/coexpression.smk
workflow/rules/segmentation.smk
workflow/scripts/_segmentation/_grid/grid_utils.R
workflow/scripts/_segmentation/_grid/square_grid.R
workflow/scripts/_segmentation/_grid/visium_grid.R

🚧 Files skipped from review as they are similar to previous changes (3)

workflow/rules/_segmentation/_grid/square_grid.smk
workflow/scripts/_segmentation/_grid/grid_utils.R
workflow/rules/coexpression.smk

coderabbitai · 2026-03-09T11:17:52Z

+    wildcard_constraints:
+        compact_segmentation_id=r"grid_registered_\w+um" # should be grid_registered_visium_55um


⚠️ Potential issue | 🟡 Minor

🧩 Analysis chain

🏁 Script executed:

git ls-files | grep -E "(registered_grid|grid)" | head -20

Repository: bdsc-tds/xenium_analysis_pipeline

Length of output: 423

🏁 Script executed:

cat -n workflow/rules/_segmentation/_grid/registered_grid.smk | head -30

Repository: bdsc-tds/xenium_analysis_pipeline

Length of output: 1615

🏁 Script executed:

rg "grid_registered" --no-heading -t py -t smk --max-count 20

Repository: bdsc-tds/xenium_analysis_pipeline

Length of output: 104

🏁 Script executed:

rg "grid_registered" --no-heading -A 2 -B 2

Repository: bdsc-tds/xenium_analysis_pipeline

Length of output: 1924

🏁 Script executed:

rg "compact_segmentation_id" --no-heading -A 1 -B 1 | head -40

Repository: bdsc-tds/xenium_analysis_pipeline

Length of output: 3239

Restrict registered grid IDs to the documented format.

grid_registered_\w+um is too permissive and matches invalid IDs. The pattern should enforce the expected format grid_registered_{technology}_{size}um to prevent matches like grid_registered_123um or grid_registered_visium55um (missing underscore separator).

Proposed fix

-        compact_segmentation_id=r"grid_registered_\w+um" # should be grid_registered_visium_55um
+        compact_segmentation_id=r"grid_registered_[a-zA-Z]+_\d+um" # e.g. grid_registered_visium_55um


🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `@workflow/rules/_segmentation/_grid/registered_grid.smk` around lines 23 - 24, The current wildcard_constraints entry compact_segmentation_id uses a too-broad regex (grid_registered_\w+um); update compact_segmentation_id to enforce the documented format grid_registered_{technology}_{size}um by using a regex that requires a technology name, an underscore, then numeric size plus "um" (for example change to r"grid_registered_[A-Za-z]+_\d+um"); ensure the pattern is assigned to wildcard_constraints under compact_segmentation_id and retains the raw-string notation.
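Because the proposed pattern fixes the ID layout, the same regex can also recover the technology and size, keeping validation and parsing in one place. A sketch only: `split_registered_id` and the named groups are hypothetical and meant for helper code; the `wildcard_constraints` entry itself can stay exactly as proposed. Note that `[a-zA-Z]+` also admits `square`, so `grid_registered_square_8um` (listed in the `grid.smk` header) still matches.

```python
import re
from typing import Tuple

# The proposed constraint, with named groups added for helper code only.
REGISTERED_ID = re.compile(
    r"grid_registered_(?P<technology>[a-zA-Z]+)_(?P<size_um>\d+)um"
)


def split_registered_id(compact_segmentation_id: str) -> Tuple[str, int]:
    """Return (technology, size in microns) for a registered grid ID."""
    match = REGISTERED_ID.fullmatch(compact_segmentation_id)
    if match is None:
        raise ValueError(f"not a registered grid ID: {compact_segmentation_id!r}")
    return match.group("technology"), int(match.group("size_um"))


for cid in ["grid_registered_visium_55um", "grid_registered_square_8um",
            "grid_registered_123um", "grid_registered_visium55um"]:
    try:
        print(cid, "->", split_registered_id(cid))
    except ValueError as err:
        print(cid, "-> rejected:", err)
```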

coderabbitai · 2026-03-09T11:17:52Z

+def get_input2_or_params4generateVisiumGrid(wildcards, for_input: bool = True) -> str:
+    use_raw_data, ret = get_raw_data_dir(wildcards.sample_id)
+
+    if for_input:
+        return ret


⚠️ Potential issue | 🟠 Major

Normalize xenium_bundle before the R script consumes it.

Lines 18-19 return the raw directory for the default branch, but workflow/scripts/_segmentation/_grid/visium_grid.R Line 10 assumes input.xenium_bundle already points at the directory that contains cell_boundaries.parquet. On samples laid out under .../outs, this rule will probe the wrong path and stop.

Proposed fix

 rule generateVisiumGrid:
     input:
-        xenium_bundle=get_input2_or_params4generateVisiumGrid
+        xenium_bundle=lambda wc: get_input2_or_params4generateVisiumGrid(wc, for_input=False)

Also applies to: 42-44

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `@workflow/rules/_segmentation/_grid/visium_grid.smk` around lines 15 - 19, The function get_input2_or_params4generateVisiumGrid returns the raw directory as-is, but visium_grid.R expects input.xenium_bundle to point to the directory containing cell_boundaries.parquet; update get_input2_or_params4generateVisiumGrid to "normalize" ret before returning (when for_input is True) by resolving cases where the sample layout nests data under an "outs" directory or returns a path to a file rather than the bundle directory—e.g., detect if ret refers to an outs subpath or does not contain cell_boundaries.parquet and adjust to the parent directory that does; apply the same normalization logic in the analogous function/branch around lines 42-44 so both return values consistently point to the directory with cell_boundaries.parquet for visium_grid.R to consume (referencing get_input2_or_params4generateVisiumGrid and input.xenium_bundle).
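The normalisation the review asks for can be sketched as a small path resolver. Assumptions: `normalise_xenium_bundle` is a hypothetical helper, and the only layouts handled are the ones the review describes (bundle directory, a parent with data under `outs/`, or a file path inside the bundle). Since Snakemake evaluates input functions while building the DAG, this check is only safe when the directory already exists at that point, which may not hold if the bundle is produced by an upstream rule.

```python
from pathlib import Path


def normalise_xenium_bundle(raw_path: str,
                            marker: str = "cell_boundaries.parquet") -> str:
    """Return the directory that directly contains `marker`.

    Hypothetical helper: accepts the bundle directory itself, a parent whose
    data sit under `outs/`, or a path to a file inside the bundle.
    """
    root = Path(raw_path)
    if root.is_file():
        # A file path was given; search from its directory instead.
        root = root.parent
    for candidate in (root, root / "outs"):
        if (candidate / marker).is_file():
            return str(candidate)
    raise FileNotFoundError(f"{marker} not found in {root} or {root / 'outs'}")
```

Wired into the rule, this might look like `xenium_bundle=lambda wc: normalise_xenium_bundle(get_raw_data_dir(wc.sample_id)[1])`, reusing the `get_raw_data_dir` call shown above; failing early with `FileNotFoundError` surfaces layout problems before the R script runs.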

mariiabilous marked this pull request as ready for review January 8, 2026 11:10

coderabbitai Bot reviewed Jan 8, 2026


coderabbitai Bot reviewed Feb 6, 2026


senbaikang force-pushed the import-segmentation branch from b9dc174 to babc2e1 March 2, 2026 12:36

coderabbitai Bot reviewed Mar 2, 2026


mariiabilous added 11 commits March 9, 2026 12:07

81ab127 feat: generate grid and import grid as segmemtation
602186a chore: move custom segmentation files to segmentation
1a73aa7 chore: include rules
7019952 fix: input path
6c1c400 fix: use microns for visium grid in xenium ranger
07f9fa3 fix: redirect log
af48b47 fix: give more memory for coexpression
984572c feat: implement square grid (pseudo-visiumHD)
2837bea feat: import registered grid
a85c62c fix: abs_log path
c411a2a fix: coderabbitai review

senbaikang force-pushed the import-segmentation branch from babc2e1 to c411a2a March 9, 2026 11:07

coderabbitai Bot reviewed Mar 9, 2026


