Commit e95622f

Merge pull request #4072 from huizizhang949/add-noisecutoff2
Add IBL's "noise cutoff" quality metric (again)
2 parents 705d194 + 1a4a1dd commit e95622f

6 files changed (+290 additions, -8 deletions)

Lines changed: 106 additions & 8 deletions
@@ -1,30 +1,128 @@
-Noise cutoff (not currently implemented)
-========================================
+Noise cutoff (:code:`noise_cutoff`)
+===================================
 
 Calculation
 -----------
 
 
-Metric describing whether an amplitude distribution is cut off, similar to _amp_cutoff :ref:`amplitude cutoff <amp_cutoff>` but without a Gaussian assumption.
-A histogram of amplitudes is created and quantifies the distance between the low tail, mean number of spikes and high tail in terms of standard deviations.
+Metric describing whether an amplitude distribution is cut off as it approaches zero, similar to :ref:`amplitude cutoff <amp_cutoff>` but without a Gaussian assumption.
 
-A SpikeInterface implementation is not yet available.
+The **noise cutoff** metric assesses whether a unit's spike-amplitude distribution is truncated
+at the low end, i.e. whether low-amplitude spikes were missed, which can happen when the amplitude
+detection threshold in the deconvolution step is too high. It does not assume a Gaussian shape;
+instead, it directly compares counts in the low-amplitude bins to counts in the high-amplitude bins.
+
+1. **Build a histogram**
+
+   For each unit, divide all amplitudes into ``n_bins`` equally spaced bins spanning the unit's amplitude range.
+   For units with many spikes, consider a larger ``n_bins``; for units with few spikes, a smaller one.
+   Let :math:`n_i` denote the count in the :math:`i`-th bin.
+
+2. **Identify the "low" region**
+
+   - Compute the amplitude value at the ``low_quantile`` quantile (for example, 0.10 = 10th percentile), denoted :math:`\text{amp}_{low}`.
+   - Find all histogram bins whose upper edge is at or below that value. These bins form the "low-quantile region".
+   - Compute
+
+     .. math::
+        L_{\mathrm{bins}} = \bigl\{i : \mathrm{upper\_edge}_i \le \text{amp}_{low}\bigr\}, \quad
+        \mu_{\mathrm{low}} = \frac{1}{|L_{\mathrm{bins}}|}\sum_{i\in L_{\mathrm{bins}}} n_i.
+
+3. **Identify the "high" region**
+
+   - Compute the amplitude value at the ``1 - high_quantile`` quantile (for example, ``high_quantile=0.25`` selects the top 25% of amplitudes, i.e. the 75th percentile), denoted :math:`\text{amp}_{high}`.
+   - Find all histogram bins whose lower edge is at or above that value. These bins form the "high-quantile region".
+   - Compute
+
+     .. math::
+        H_{\mathrm{bins}} &= \bigl\{i : \mathrm{lower\_edge}_i \ge \text{amp}_{high}\bigr\}, \\
+        \mu_{\mathrm{high}} &= \frac{1}{|H_{\mathrm{bins}}|}\sum_{i\in H_{\mathrm{bins}}} n_i, \quad
+        \sigma_{\mathrm{high}} = \sqrt{\frac{1}{|H_{\mathrm{bins}}|}\sum_{i\in H_{\mathrm{bins}}}\bigl(n_i-\mu_{\mathrm{high}} \bigr)^2}.
+
+4. **Compute the cutoff**
+
+   The *cutoff* is the number of standard deviations by which the mean count of the low-amplitude bins differs from that of the high-amplitude bins:
+
+   .. math::
+      \mathrm{cutoff} = \frac{\mu_{\mathrm{low}} - \mu_{\mathrm{high}}}{\sigma_{\mathrm{high}}}.
+
+   - If no low-quantile bins exist, a warning is issued and both ``cutoff`` and ``ratio`` are NaN.
+   - If fewer than two high-quantile bins exist or :math:`\sigma_{\mathrm{high}} = 0`, a warning is issued and ``cutoff = NaN``.
+
+5. **Compute the low-to-peak ratio**
+
+   - Let :math:`M = \max_i\,n_i` be the height of the largest bin in the histogram.
+   - Define
+
+     .. math::
+        \mathrm{ratio} = \frac{\mu_{\mathrm{low}}}{M}.
+
+
+   - If there are no low bins, ``ratio = NaN``.
+
+
+Together, ``(cutoff, ratio)`` quantify how abruptly the low end of the amplitude distribution ends, relative to the top quantile and to the peak: a distribution that tapers off toward zero has sparsely populated low-amplitude bins, whereas a truncated one still has well-populated bins at its lower edge.
 
 Expectation and use
 -------------------
 
 Noise cutoff attempts to describe whether an amplitude distribution is cut off.
-The metric is loosely based on [Hill]_'s amplitude cutoff, but is here adapted (originally by [IBL]_) to avoid making the Gaussianity assumption on spike distributions.
-Noise cutoff provides an estimate of false negative rate, so a lower value indicates fewer missed spikes (a more complete unit).
+Larger values of ``cutoff`` and ``ratio`` suggest that the distribution is cut off.
+IBL uses a single low-amplitude bin by default (equivalent to, e.g., ``low_quantile=0.01, n_bins=100``) and
+suggests a threshold of 5 on ``cutoff`` to decide whether a unit is cut off.
+In practice, the IBL threshold is quite conservative, and a lower threshold might work better for your data.
+When choosing a threshold, we suggest inspecting the amplitudes with the :py:func:`~spikeinterface.widgets.plot_amplitudes` widget.
+This metric is best suited to units whose amplitude histogram is **unimodal**.
+
+The metric is loosely based on [Hill]_'s amplitude cutoff, but is here adapted (originally by [IBL2024]_) to avoid assuming that spike amplitudes are Gaussian-distributed.
+
+Example code
+------------
+
+.. code-block:: python
+
+    import spikeinterface.full as si
+    from spikeinterface.qualitymetrics.misc_metrics import compute_noise_cutoffs
+
+    # Suppose `sorting_analyzer` was created (e.g. with si.create_sorting_analyzer) and has
+    # the "spike_amplitudes" extension, computed with peak_sign="neg" or peak_sign="pos".
+    # Select the units you are interested in (None uses all units)
+    unit_ids = ...
+
+    # Compute noise_cutoff; the result is a namedtuple of two dicts keyed by unit ID
+    noise_cutoff, noise_ratio = compute_noise_cutoffs(
+        sorting_analyzer=sorting_analyzer,
+        high_quantile=0.25,
+        low_quantile=0.10,
+        n_bins=100,
+        unit_ids=unit_ids,
+    )
+
+Reference
+---------
+
+.. autofunction:: spikeinterface.qualitymetrics.misc_metrics.compute_noise_cutoffs
+
+Examples with plots
+-------------------
+
+The histograms of two units are shown below, with vertical lines separating the low- and high-amplitude regions.
+
+- On the left, a unit with no truncation at the low end: its cutoff and ratio are small.
+- On the right, a unit truncated at -1: its cutoff and ratio are much larger.
 
+.. image:: example_cutoff.png
+   :width: 600
 
 Links to original implementations
 ---------------------------------
 
 * From `IBL implementation <https://github.com/int-brain-lab/ibllib/blob/2e1f91c622ba8dbd04fc53946c185c99451ce5d6/brainbox/metrics/single_units.py>`_
 
+Note: Compared to the original implementation, we added a comparison between the low-amplitude bins and the largest bin (``noise_ratio``).
+Low-amplitude bins are also selected with ``low_quantile`` rather than with a fixed number of bins.
 
 Literature
 ----------
 
-Metric introduced by [IBL]_ (adapted from [Hill]_'s amplitude cutoff metric).
+Metric introduced by [IBL2024]_.
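The five documented steps can be traced with plain NumPy. This is a minimal standalone sketch written for illustration, assuming only NumPy and synthetic Gaussian amplitudes; the helper name `noise_cutoff_sketch` is hypothetical and it is not the library's `compute_noise_cutoffs`. It follows the formulas above, including the population standard deviation (`ddof=0`) that `np.std` uses by default.

```python
import numpy as np


def noise_cutoff_sketch(amps, high_quantile=0.25, low_quantile=0.1, n_bins=100):
    """Hypothetical helper following the documented steps; returns (cutoff, ratio)."""
    amps = np.asarray(amps, dtype=float)

    # Step 1: histogram with n_bins equally spaced bins over the amplitude range
    counts, edges = np.histogram(amps, bins=n_bins)

    # Step 2: low region = bins whose upper edge is at or below amp_low
    amp_low = np.quantile(amps, low_quantile)
    low_counts = counts[edges[1:] <= amp_low]
    if low_counts.size == 0:
        return np.nan, np.nan
    mu_low = low_counts.mean()

    # Step 5 only needs mu_low and the tallest bin M
    ratio = mu_low / counts.max()

    # Step 3: high region = bins whose lower edge is at or above amp_high
    amp_high = np.quantile(amps, 1 - high_quantile)
    high_counts = counts[edges[:-1] >= amp_high]
    if high_counts.size < 2:
        return np.nan, ratio
    mu_high = high_counts.mean()
    sigma_high = high_counts.std()  # population std (ddof=0), the np.std default
    if sigma_high == 0:
        return np.nan, ratio

    # Step 4: distance of the low region from the high region, in units of sigma_high
    cutoff = (mu_low - mu_high) / sigma_high
    return cutoff, ratio


rng = np.random.default_rng(0)
amps = rng.normal(0.0, 1.0, 5000)
amps_truncated = amps[amps > -1.0]  # simulate missed low-amplitude spikes

cutoff_full, ratio_full = noise_cutoff_sketch(amps)
cutoff_trunc, ratio_trunc = noise_cutoff_sketch(amps_truncated)
print(f"full:      cutoff={cutoff_full:.2f}, ratio={ratio_full:.2f}")
print(f"truncated: cutoff={cutoff_trunc:.2f}, ratio={ratio_trunc:.2f}")
```

On this synthetic data the truncated sample should give the larger cutoff and ratio, the same ordering that `test_noise_cutoff` checks.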

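In the spirit of the "Examples with plots" figure and the advice to inspect amplitudes before choosing a threshold, the sketch below draws histograms of an untruncated and a truncated synthetic unit, with dashed lines at the quantiles that approximately bound the low and high regions. It assumes NumPy and Matplotlib are installed and uses synthetic Gaussian amplitudes; it is not the code that produced `example_cutoff.png`.

```python
import matplotlib

matplotlib.use("Agg")  # render off-screen; drop this line in an interactive session
import matplotlib.pyplot as plt
import numpy as np

low_quantile, high_quantile, n_bins = 0.1, 0.25, 100

rng = np.random.default_rng(0)
amps = rng.normal(0.0, 1.0, 5000)
examples = {"no truncation": amps, "truncated at -1": amps[amps > -1.0]}

fig, axes = plt.subplots(1, 2, figsize=(10, 4))
for ax, (title, data) in zip(axes, examples.items()):
    ax.hist(data, bins=n_bins, color="gray")
    # Dashed lines at the quantiles that bound the low (test) and high (reference) regions
    ax.axvline(np.quantile(data, low_quantile), color="tab:blue", linestyle="--", label="amp_low")
    ax.axvline(np.quantile(data, 1 - high_quantile), color="tab:red", linestyle="--", label="amp_high")
    ax.set_title(title)
    ax.set_xlabel("amplitude")
    ax.legend()
axes[0].set_ylabel("spike count")
fig.tight_layout()
```

Call `fig.savefig(...)` to write the figure to disk.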
doc/references.rst

Lines changed: 2 additions & 0 deletions
@@ -123,6 +123,8 @@ References
 
 .. [IBL] `Spike sorting pipeline for the International Brain Laboratory. 2022. <https://figshare.com/articles/online_resource/Spike_sorting_pipeline_for_the_International_Brain_Laboratory/19705522/3>`_
 
+.. [IBL2024] `Spike sorting pipeline for the International Brain Laboratory - Version 2. 2024. <https://figshare.com/articles/online_resource/Spike_sorting_pipeline_for_the_International_Brain_Laboratory/19705522?file=49783080>`_
+
 .. [Jackson] `Quantitative assessment of extracellular multichannel recording quality using measures of cluster separation. Society of Neuroscience Abstract. 2005. <https://www.sciencedirect.com/science/article/abs/pii/S0306452204008425>`_
 
 .. [Jain] `UnitRefine: A Community Toolbox for Automated Spike Sorting Curation. 2025 <https://www.biorxiv.org/content/10.1101/2025.03.30.645770v1>`_

src/spikeinterface/qualitymetrics/misc_metrics.py

Lines changed: 161 additions & 0 deletions
@@ -37,6 +37,167 @@
 _default_params = dict()
 
 
+def compute_noise_cutoffs(sorting_analyzer, high_quantile=0.25, low_quantile=0.1, n_bins=100, unit_ids=None):
+    """
+    A metric to determine if a unit's amplitude distribution is cut off as it approaches zero, without assuming a Gaussian distribution.
+
+    Based on the histogram of the (transformed) amplitudes:
+
+    1. This method compares counts in the lower-amplitude bins to counts in the top ``high_quantile`` of the amplitudes.
+    It computes the mean and std of an upper quantile of the distribution, and calculates how many standard deviations away
+    from that mean the lower-quantile bins lie.
+
+    2. The method also compares the counts in the lower-amplitude bins to the count in the highest bin and returns their ratio.
+
+    Parameters
+    ----------
+    sorting_analyzer : SortingAnalyzer
+        A SortingAnalyzer object.
+    high_quantile : float, default: 0.25
+        Fraction of the highest amplitudes treated as "high" (e.g. 0.25 = top 25%), the reference region.
+    low_quantile : float, default: 0.1
+        Fraction of the lowest amplitudes treated as "low" (e.g. 0.1 = lowest 10%), the test region.
+    n_bins : int, default: 100
+        The number of bins to use to compute the amplitude histogram.
+    unit_ids : list or None, default: None
+        List of unit ids to compute the noise cutoffs for. If None, all units are used.
+
+    Returns
+    -------
+    cutoff_metrics : namedtuple
+        ``(noise_cutoff, noise_ratio)``, two dicts mapping each unit ID to a float (NaN where the metric cannot be computed).
+
+    References
+    ----------
+    Inspired by the metric described in [IBL2024]_
+
+    """
+    res = namedtuple("cutoff_metrics", ["noise_cutoff", "noise_ratio"])
+    if unit_ids is None:
+        unit_ids = sorting_analyzer.unit_ids
+
+    noise_cutoff_dict = {}
+    noise_ratio_dict = {}
+    if not sorting_analyzer.has_extension("spike_amplitudes"):
+        warnings.warn(
+            "`compute_noise_cutoffs` requires the `spike_amplitudes` extension. Please run sorting_analyzer.compute('spike_amplitudes') to be able to compute `noise_cutoff`"
+        )
+        for unit_id in unit_ids:
+            noise_cutoff_dict[unit_id] = np.nan
+            noise_ratio_dict[unit_id] = np.nan
+        return res(noise_cutoff_dict, noise_ratio_dict)
+
+    amplitude_extension = sorting_analyzer.get_extension("spike_amplitudes")
+    peak_sign = amplitude_extension.params["peak_sign"]
+    if peak_sign == "both":
+        raise TypeError(
+            '`peak_sign` should be either "pos" or "neg". You can set `peak_sign` as an argument when you compute spike_amplitudes.'
+        )
+
+    amplitudes_by_units = _get_amplitudes_by_units(sorting_analyzer, unit_ids, peak_sign)
+
+    for unit_id in unit_ids:
+        amplitudes = amplitudes_by_units[unit_id]
+
+        # We assume the noise (near-zero values) is on the lower tail of the amplitude distribution.
+        # But if peak_sign == 'neg', the noise will be on the higher tail, so we flip the distribution.
+        if peak_sign == "neg":
+            amplitudes = -amplitudes
+
+        cutoff, ratio = _noise_cutoff(amplitudes, high_quantile=high_quantile, low_quantile=low_quantile, n_bins=n_bins)
+        noise_cutoff_dict[unit_id] = cutoff
+        noise_ratio_dict[unit_id] = ratio
+
+    return res(noise_cutoff_dict, noise_ratio_dict)
+
+
+_default_params["noise_cutoff"] = dict(high_quantile=0.25, low_quantile=0.1, n_bins=100)
+
+
+def _noise_cutoff(amps, high_quantile=0.25, low_quantile=0.1, n_bins=100):
+    """
+    A metric to determine if a unit's amplitude distribution is cut off as it approaches zero, without assuming a Gaussian distribution.
+
+    Based on the histogram of the (transformed) amplitudes:
+
+    1. This method compares counts in the lower-amplitude bins to counts in the higher-amplitude bins.
+    It computes the mean and std of an upper quantile of the distribution, and calculates how many standard deviations away
+    from that mean the lower-quantile bins lie.
+
+    2. The method also compares the counts in the lower-amplitude bins to the count in the highest bin and returns their ratio.
+
+    Parameters
+    ----------
+    amps : array-like
+        Spike amplitudes.
+    high_quantile : float, default: 0.25
+        Fraction of the highest amplitudes treated as "high" (e.g. 0.25 = top 25%), the reference region.
+    low_quantile : float, default: 0.1
+        Fraction of the lowest amplitudes treated as "low" (e.g. 0.1 = lowest 10%), the test region.
+    n_bins : int, default: 100
+        The number of bins to use to compute the amplitude histogram.
+
+    Returns
+    -------
+    cutoff : float
+        (mean(lower_bins_count) - mean(high_bins_count)) / std(high_bins_count)
+    ratio : float
+        mean(lower_bins_count) / highest_bin_count
+
+    """
+    n_per_bin, bin_edges = np.histogram(amps, bins=n_bins)
+
+    maximum_bin_height = np.max(n_per_bin)
+
+    low_quantile_value = np.quantile(amps, q=low_quantile)
+
+    # the indices for low-amplitude bins (upper edge at or below the low quantile)
+    low_indices = np.where(bin_edges[1:] <= low_quantile_value)[0]
+
+    high_quantile_value = np.quantile(amps, q=1 - high_quantile)
+
+    # the indices for high-amplitude bins (lower edge at or above the high quantile)
+    high_indices = np.where(bin_edges[:-1] >= high_quantile_value)[0]
+
+    if len(low_indices) == 0:
+        warnings.warn(
+            "No bin is selected to test cutoff. Please increase low_quantile. Setting noise cutoff and ratio to NaN"
+        )
+        return np.nan, np.nan
+
+    # compute ratio between low-amplitude bins and the largest bin
+    low_counts = n_per_bin[low_indices]
+    mean_low_counts = np.mean(low_counts)
+    ratio = mean_low_counts / maximum_bin_height
+
+    if len(high_indices) == 0:
+        warnings.warn(
+            "No bin is selected as the reference region. Please increase high_quantile. Setting noise cutoff to NaN"
+        )
+        return np.nan, ratio
+
+    if len(high_indices) == 1:
+        warnings.warn(
+            "Only one bin is selected as the reference region, and thus the standard deviation cannot be computed. "
+            "Please increase high_quantile. Setting noise cutoff to NaN"
+        )
+        return np.nan, ratio
+
+    # compute cutoff from low-amplitude and high-amplitude bins
+    high_counts = n_per_bin[high_indices]
+    mean_high_counts = np.mean(high_counts)
+    std_high_counts = np.std(high_counts)
+    if std_high_counts == 0:
+        warnings.warn(
+            "All the high-amplitude bins have the same count. Please consider changing n_bins. "
+            "Setting noise cutoff to NaN"
+        )
+        return np.nan, ratio
+
+    cutoff = (mean_low_counts - mean_high_counts) / std_high_counts
+    return cutoff, ratio
+
+
 def compute_num_spikes(sorting_analyzer, unit_ids=None, **kwargs):
     """
     Compute the number of spike across segments.

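`compute_noise_cutoffs` negates the amplitudes when `peak_sign` is `"neg"` because `_noise_cutoff` looks for truncation in the lower tail. The NumPy sketch below uses hypothetical amplitudes of a negative-peak unit with a simulated detection threshold at -50 (both are assumptions for illustration). After the flip, the truncated edge becomes the minimum, and the lowest histogram bin holds more spikes than the highest one.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical negative-peak unit: amplitudes around -80 with a detection threshold at -50,
# so the cut-off edge (closest to zero) sits in the upper tail of the raw values.
amps_neg = -np.abs(rng.normal(80.0, 20.0, 2000))
amps_neg = amps_neg[amps_neg < -50.0]

# Same transformation that compute_noise_cutoffs applies when peak_sign == "neg"
flipped = -amps_neg

counts, edges = np.histogram(flipped, bins=100)
print("raw max:", amps_neg.max(), "flipped min:", flipped.min())
print("lowest bin:", counts[0], "highest bin:", counts[-1])
```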
src/spikeinterface/qualitymetrics/quality_metric_list.py

Lines changed: 5 additions & 0 deletions
@@ -18,6 +18,7 @@
     compute_firing_ranges,
     compute_amplitude_cv_metrics,
     compute_sd_ratio,
+    compute_noise_cutoffs,
 )
 
 from .pca_metrics import (
@@ -51,6 +52,7 @@
     "firing_range": compute_firing_ranges,
     "drift": compute_drift_metrics,
     "sd_ratio": compute_sd_ratio,
+    "noise_cutoff": compute_noise_cutoffs,
 }
 
 # a dict converting the name of the metric for computation to the output of that computation
@@ -81,6 +83,7 @@
     "nn_noise_overlap": ["nn_noise_overlap"],
     "silhouette": ["silhouette"],
     "silhouette_full": ["silhouette_full"],
+    "noise_cutoff": ["noise_cutoff", "noise_ratio"],
 }
 
 # this dict allows us to ensure the appropriate dtype of metrics rather than allow Pandas to infer them
@@ -116,4 +119,6 @@
     "nn_noise_overlap": float,
     "silhouette": float,
     "silhouette_full": float,
+    "noise_cutoff": float,
+    "noise_ratio": float,
 }

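The entries added for `"noise_cutoff"` tie one metric name to two output columns and declare both as `float`. As a simplified, self-contained sketch in plain Python (the dict names and unit values are hypothetical, and this is not the quality-metric calculator's actual code), the namedtuple returned by `compute_noise_cutoffs` can be unpacked into one column per listed name:

```python
from collections import namedtuple

# Local copies of the entries added in quality_metric_list.py (variable names are illustrative)
metric_columns = {"noise_cutoff": ["noise_cutoff", "noise_ratio"]}
column_dtypes = {"noise_cutoff": float, "noise_ratio": float}

# Hypothetical result for two units, shaped like the output of compute_noise_cutoffs
cutoff_metrics = namedtuple("cutoff_metrics", ["noise_cutoff", "noise_ratio"])
result = cutoff_metrics({"u0": 0.4, "u1": 6}, {"u0": 0.1, "u1": 0.7})

# One column per listed name, filled from the namedtuple field of the same name
# and cast to the declared dtype (so the int 6 becomes 6.0)
columns = {}
for column_name in metric_columns["noise_cutoff"]:
    cast = column_dtypes[column_name]
    columns[column_name] = {unit_id: cast(value) for unit_id, value in getattr(result, column_name).items()}

print(columns)
```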
src/spikeinterface/qualitymetrics/tests/test_metrics_functions.py

Lines changed: 16 additions & 0 deletions
@@ -43,13 +43,29 @@
     compute_quality_metrics,
 )
 
+from spikeinterface.qualitymetrics.misc_metrics import _noise_cutoff
 
 from spikeinterface.core.basesorting import minimum_spike_dtype
 
 
 job_kwargs = dict(n_jobs=2, progress_bar=True, chunk_duration="1s")
 
 
+def test_noise_cutoff():
+    """
+    Generate two artificial Gaussian samples, one truncated and one not, and check that both metrics are higher for the truncated one.
+    """
+    np.random.seed(1)
+    amps = np.random.normal(0, 1, 1000)
+    amps_trunc = amps[amps > -1]
+
+    cutoff1, ratio1 = _noise_cutoff(amps=amps)
+    cutoff2, ratio2 = _noise_cutoff(amps=amps_trunc)
+
+    assert cutoff1 <= cutoff2
+    assert ratio1 <= ratio2
+
+
 def test_compute_new_quality_metrics(small_sorting_analyzer):
     """
     Computes quality metrics then computes a subset of quality metrics, and checks
