Conversation

@chrishalcrow (Member) commented Jul 7, 2025

Allows create_sorting_analyzer to accept a dict of recordings and a dict of sortings. Now the following code...

raw_rec = si.read_openephys("recording_path")        # read the raw recording
grouped_rec = raw_rec.split_by("group")              # dict of recordings, one per channel group
pp_rec = si.bandpass_filter(grouped_rec)             # preprocessing is applied per group
sort = si.run_sorter("mountainsort5", pp_rec)        # dict of sortings, one per group
analyzer = si.create_sorting_analyzer(sort, pp_rec)  # a single, aggregated analyzer

...works for both grouped and un-grouped recordings! So the user doesn't have to include any grouping logic in their pipeline for, e.g., Neuropixels 2.0 recordings.

For now, I've implemented the simple solution: internally, the create_sorting_analyzer function aggregates the recordings and sortings and combines them into a single sorting analyzer. The actual code inside create_sorting_analyzer is super simple! I've also updated the units aggregation function (aggregate_units) to accept dicts.
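
To make that concrete, here's a minimal sketch of the internal logic, assuming dict inputs with matching keys (the function name is illustrative; in the PR this logic lives inside create_sorting_analyzer itself and may differ):

  import spikeinterface.full as si
  from spikeinterface.core import aggregate_channels, aggregate_units

  # Illustrative sketch, not the PR's actual code:
  # aggregate the dicts into single objects, then build one analyzer.
  def create_analyzer_from_dicts(sorting_dict, recording_dict, **kwargs):
      assert set(sorting_dict.keys()) == set(recording_dict.keys()), "dict keys must match"
      recording = aggregate_channels(recording_dict)  # one recording; channels keep their group key
      sorting = aggregate_units(sorting_dict)         # one sorting; units keep their group key
      return si.create_sorting_analyzer(sorting, recording, **kwargs)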

(We've also discussed allowing it to output a dictionary of analyzers. This is slightly more painful: the user would have to write awkward code whenever they want to compute or export, and we'd need to write and test new save and load options etc. I think we should add this in the future if there's a need for it.)

Internally, we need to keep track of the splitting and aggregation (e.g. which units in the analyzer belonged to which sorting?). I've put that logic in the aggregate_channels and aggregate_units functions, which attach new properties, called "aggregate_channels_key" and "aggregate_units_key", to the channels and units, recording which underlying group each came from. Happy to hear other name ideas! So you can get a per-group analyzer by doing something like:

  group_0_keys = analyzer.unit_ids[analyzer.get_sorting_property("aggregate_units_key") == 0]
  group_0_analyzer = analyzer.select_units(unit_ids=group_0_keys)
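
(Extending the same idea, a short sketch that recovers one analyzer per group by looping over the stored keys; analyzers_by_group is just an illustrative name:)

  import numpy as np

  # Build a dict mapping each group key to its per-group analyzer
  unit_keys = analyzer.get_sorting_property("aggregate_units_key")
  analyzers_by_group = {}
  for key in np.unique(unit_keys):
      unit_ids = analyzer.unit_ids[unit_keys == key]
      analyzers_by_group[key] = analyzer.select_units(unit_ids=unit_ids)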

Docs: https://spikeinterface--4037.org.readthedocs.build/en/4037/how_to/process_by_channel_group.html#
Plus a comment here: https://spikeinterface--4037.org.readthedocs.build/en/4037/modules/sorters.html#spike-sorting-by-group

@chrishalcrow marked this pull request as ready for review July 8, 2025 12:29
@chrishalcrow requested a review from samuelgarcia July 8, 2025 12:29
@chrishalcrow added the enhancement (New feature or request) label Jul 8, 2025
@chrishalcrow added this to the 0.103.0 milestone Jul 8, 2025
@samuelgarcia (Member) commented:

Thank you very much.
I will read this in detail.

@alejoe91 (Member) left a comment:

A few small suggestions :)

@samuelgarcia (Member) commented:

This is ultra smart and useful.
The only improvement I would see is the handling of sparsity, which could be group-dependent.
For instance, if you play with tetrodes you would expect the sparsity to be by group, which is super fast to compute (instead of taking many random waveforms).
This could be done at line 145 in sorting_analyzer.py, maybe controlled by an option. Because for a big group (a Neuropixels shank), I think the sparsity from waveforms is still wanted, no?

@zm711 (Member) left a comment:

Two small points: (1) a rendering issue, and (2) a discussion of whether we should explain the dict layout better for beginners trying to use this strategy (at the level of inputs to the analyzer creation).


The docs text under review:

> The code above generates a dictionary of recording objects and a dictionary of sorting objects.
> When making a :ref:`SortingAnalyzer <modules/core:SortingAnalyzer>`, we can pass these dictionaries and
> a single analyzer will be created, with the recordings and sortings appropriately aggregated.
Member:

How? Do the keys of the dicts need to match up? Will the user know how the sorting dict looks vs the recording dict?

Member:

Maybe we could check that the keys are the same, in the same order, no?

Member:

Yeah, we either do that for the users internally or need to explain it here, so I think doing it ourselves is fine. Chris is doing a dict key comparison below, but I haven't tested it to see whether it checks the order or just the presence of the same keys.

Member Author:

Updated to hopefully make things clearer.

The order of keys doesn't matter: when you aggregate, the link between the recording channels and unit ids is independent of this dict logic I'm adding. I've added a test to check this is true.
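
(A minimal sketch of an order-independent key check consistent with that behavior; the function name and error message are illustrative, not the PR's actual code:)

  def check_matching_dict_keys(recording_dict, sorting_dict):
      # compare as sets: the same keys must be present, order is deliberately ignored
      if set(recording_dict.keys()) != set(sorting_dict.keys()):
          raise ValueError(
              "Keys of the recording dict and sorting dict must match: "
              f"{sorted(recording_dict)} vs {sorted(sorting_dict)}"
          )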

@chrishalcrow (Member, Author) commented:

> The only improvement I would see is the handling of sparsity, which could be group-dependent.
> For instance, if you play with tetrodes you would expect the sparsity to be by group, which is super fast to compute (instead of taking many random waveforms).
> This could be done at line 145 in sorting_analyzer.py, maybe controlled by an option. Because for a big group (a Neuropixels shank), I think the sparsity from waveforms is still wanted, no?

Hello. Hmm, I'm not sure how to handle this. Like you say: you definitely want this for tetrodes. You probably don't want this for Neuropixels, since you want an extra sparsity calculation on each shank. You maybe want this for 32-channel silicon probes?? Ideally, you would compute sparsity per group, but this seems fairly complicated code-wise. Maybe we can add a note to the tetrode docs, and these docs, saying that if you're using tetrodes you can pass the sparsity explicitly? Maybe it's easiest to discuss "in person".
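
(For context, a hedged sketch of what passing sparsity explicitly could look like for tetrodes, built by hand from the group properties this PR attaches; it assumes aggregated recording/sorting objects and is not the PR's code:)

  import numpy as np
  from spikeinterface.core import ChannelSparsity

  # Give each unit all channels of the group it was sorted on (tetrode-style sparsity)
  unit_groups = sorting.get_property("aggregate_units_key")
  channel_groups = recording.get_property("aggregate_channels_key")
  mask = np.array([channel_groups == g for g in unit_groups])
  sparsity = ChannelSparsity(mask, unit_ids=sorting.unit_ids, channel_ids=recording.channel_ids)

  analyzer = si.create_sorting_analyzer(sorting, recording, sparsity=sparsity)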

The docstring under review:

> You can control `estimate_sparsity()`: all extra arguments are propagated to it (including job_kwargs).
> sparsity : ChannelSparsity or None, default: None
>     The sparsity used to compute extensions. If this is given, `sparse` is ignored.
> set_sparsity_by_dict_key : bool, default: False
Member:

I support this.
What about setting it to True?

@chrishalcrow (Member, Author) commented Jul 25, 2025:

If it's False by default, some tetrode people will have un-sparsified tetrode bundles (not great, but OK).
If True by default, some silicon-probe people will have badly-sparsified probes (worse!).

So I vote False!
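
(For reference, with False as the default, tetrode users would opt in explicitly; a one-line sketch using the objects from the PR description:)

  # Opt in to group-based (dict-key) sparsity for tetrode-style probes
  analyzer = si.create_sorting_analyzer(sort, pp_rec, set_sparsity_by_dict_key=True)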

@chrishalcrow (Member, Author) commented:

Updated the docstring of set_sparsity_by_dict_key - now ready!

@alejoe91 added the core (Changes to core module) label Jul 28, 2025
@alejoe91 merged commit f2d6373 into SpikeInterface:main Jul 28, 2025
15 checks passed