Merge pull request #4072 from huizizhang949/add-noisecutoff2

chrishalcrow · web-flow · commit e95622f3a313 · 2025-07-28T10:06:50.000+01:00
Add IBL's "noise cutoff" quality metric (again)
diff --git a/doc/modules/qualitymetrics/example_cutoff.png b/doc/modules/qualitymetrics/example_cutoff.png
diff --git a/doc/modules/qualitymetrics/noise_cutoff.rst b/doc/modules/qualitymetrics/noise_cutoff.rst
@@ -1,30 +1,128 @@
-Noise cutoff (not currently implemented)
-========================================
+Noise cutoff (:code:`noise_cutoff`)
+===================================
 
 Calculation
 -----------
 
 
-Metric describing whether an amplitude distribution is cut off, similar to _amp_cutoff  :ref:`amplitude cutoff <amp_cutoff>` but without a Gaussian assumption.
-A histogram of amplitudes is created and quantifies the distance between the low tail, mean number of spikes and high tail in terms of standard deviations.
+Metric describing whether an amplitude distribution is cut off as it approaches zero, similar to  :ref:`amplitude cutoff <amp_cutoff>` but without a Gaussian assumption.
 
-A SpikeInterface implementation is not yet available.
+The **noise cutoff** metric assesses whether a unit’s spike‐amplitude distribution is truncated
+at the low-end, which may be due to the high amplitude detection threshold in the deconvolution step,
+i.e., if low‐amplitude spikes were missed.  It does not assume a Gaussian shape;
+instead, it directly compares counts in the low‐amplitude bins to counts in high‐amplitude bins.
+
+1. **Build a histogram**
+
+   For each unit, divide all amplitudes into ``n_bins`` equally spaced bins over the range of the amplitude.
+   If the number of spikes is large, you may consider using a larger ``n_bins``. For a small number of spikes, consider a smaller ``n_bins``.
+   Let :math:`n_i` denote the count in the :math:`i`-th bin.
+
+2. **Identify the “low” region**
+    - Compute the amplitude value at the specified ``low_quantile`` (for example, 0.10 = 10th percentile), denoted as :math:`\text{amp}_{low}`.
+    - Find all histogram bins whose upper edge is below that quantile value.  These bins form the "low‐quantile region".
+    - Compute
+
+    .. math::
+        L_{\mathrm{bins}} = \bigl\{i : \text{upper_edge}_i \le \text{amp}_{low}\bigr\}, \quad
+        \mu_{\mathrm{low}} = \frac{1}{|L_{\mathrm{bins}}|}\sum_{i\in L_{\mathrm{bins}}} n_i.
+
+3. **Identify the “high” region**
+
+   - Compute the amplitude value at the specified ``high_quantile`` (for example, 0.25 = top 25th percentile), denoted as :math:`\text{amp}_{high}`.
+   - Find all histogram bins whose lower edge is greater than that quantile value.  These bins form the "high‐quantile region".
+   - Compute
+
+    .. math::
+        H_{\mathrm{bins}} &= \bigl\{i : \text{lower_edge}_i \ge \text{amp}_{high}\bigr\}, \\
+        \mu_{\mathrm{high}} &= \frac{1}{|H_{\mathrm{bins}}|}\sum_{i\in H_{\mathrm{bins}}} n_i, \quad
+        \sigma_{\mathrm{high}} = \sqrt{\frac{1}{|H_{\mathrm{bins}}|-1}\sum_{i\in H_{\mathrm{bins}}}\bigl(n_i-\mu_{\mathrm{high}} \bigr)^2}.
+
+4. **Compute cutoff**
+
+   The *cutoff* is given by how many standard deviations away the low-amplitude bins are from the high-amplitude bins, defined as
+
+    .. math::
+     \mathrm{cutoff} = \frac{\mu_{\mathrm{low}} - \mu_{\mathrm{high}}}{\sigma_{\mathrm{high}}}.
+
+
+   - If no low‐quantile bins exist, a warning is issued and ``cutoff = NaN``.
+   - If no high‐quantile bins exist or :math:`\sigma_{\mathrm{high}} = 0`, a warning is issued and ``cutoff = NaN``.
+
+5. **Compute the low-to-peak ratio**
+
+   - Let :math:`M = \max_i\,n_i` be the height of the largest bin in the histogram.
+   - Define
+
+   .. math::
+        \mathrm{ratio} = \frac{\mu_{\mathrm{low}}}{M}.
+
+
+   - If there are no low bins, :math:`\mathrm{ratio} = NaN`.
+
+
+Together, ``(cutoff, ratio)`` quantify how suppressed the low‐end of the amplitude distribution is relative to the top quantile and to the peak.
 
 Expectation and use
 -------------------
 
 Noise cutoff attempts to describe whether an amplitude distribution is cut off.
-The metric is loosely based on [Hill]_'s amplitude cutoff, but is here adapted (originally by [IBL]_) to avoid making the Gaussianity assumption on spike distributions.
-Noise cutoff provides an estimate of false negative rate, so a lower value indicates fewer missed spikes (a more complete unit).
+Larger values of ``cutoff`` and ``ratio`` suggest that the distribution is cut-off.
+IBL uses the default value of 1 (equivalent to e.g. ``low_quantile=0.01, n_bins=100``) to choose the number of
+lower bins, with a suggested threshold of 5 for ``cutoff`` to determine whether a unit is cut off or not.
+In practice, the IBL threshold is quite conservative, and a lower threshold might work better for your data.
+We suggest plotting the data using the  :py:func:`~spikeinterface.widgets.plot_amplitudes` widget to view your data when choosing your threshold.
+It is suggested to use this metric when the amplitude histogram is **unimodal**.
+
+The metric is loosely based on [Hill]_'s amplitude cutoff, but is here adapted (originally by [IBL2024]_) to avoid making the Gaussian assumption on spike distributions.
+
+Example code
+------------
+
+.. code-block:: python
+
+    import numpy as np
+    import matplotlib.pyplot as plt
+    from spikeinterface.full as si
+
+    # Suppose `sorting_analyzer` has been computed with spike amplitudes:
+    # Select units you are interested in visualizing
+    unit_ids = ...
+
+    # Compute noise_cutoff:
+    summary_dict = compute_noise_cutoff(
+        sorting_analyzer=sorting_analyzer
+        high_quantile=0.25,
+        low_quantile=0.10,
+        n_bins=100,
+        unit_ids=unit_ids
+    )
+
+Reference
+---------
+
+.. autofunction:: spikeinterface.qualitymetrics.misc_metrics.compute_noise_cutoff
+
+Examples with plots
+-------------------
+
+Here is shown the histogram of two units, with the vertical lines separating low- and high-amplitude regions.
+
+- On the left, we have a unit with no truncation at the left end, and the cutoff and ratio are small.
+- On the right, we have a unit with truncation at -1, and the cutoff and ratio are much larger.
 
+.. image:: example_cutoff.png
+    :width: 600
 
 Links to original implementations
 ---------------------------------
 
 * From `IBL implementation <https://github.com/int-brain-lab/ibllib/blob/2e1f91c622ba8dbd04fc53946c185c99451ce5d6/brainbox/metrics/single_units.py>`_
 
+Note: Compared to the original implementation, we have added a comparison between the low-amplitude bins to the largest bin (``noise_ratio``).
+The selection of low-amplitude bins is based on the ``low_quantile`` rather than the number of bins.
 
 Literature
 ----------
 
-Metric introduced by [IBL]_ (adapted from [Hill]_'s amplitude cutoff metric).
+Metric introduced by [IBL2024]_.
diff --git a/doc/references.rst b/doc/references.rst
@@ -123,6 +123,8 @@ References
 
 .. [IBL] `Spike sorting pipeline for the International Brain Laboratory. 2022. <https://figshare.com/articles/online_resource/Spike_sorting_pipeline_for_the_International_Brain_Laboratory/19705522/3>`_
 
+.. [IBL2024] `Spike sorting pipeline for the International Brain Laboratory - Version 2. 2024. <https://figshare.com/articles/online_resource/Spike_sorting_pipeline_for_the_International_Brain_Laboratory/19705522?file=49783080>`_
+
 .. [Jackson] `Quantitative assessment of extracellular multichannel recording quality using measures of cluster separation. Society of Neuroscience Abstract. 2005. <https://www.sciencedirect.com/science/article/abs/pii/S0306452204008425>`_
 
 .. [Jain] `UnitRefine: A Community Toolbox for Automated Spike Sorting Curation. 2025 <https://www.biorxiv.org/content/10.1101/2025.03.30.645770v1>`_
diff --git a/src/spikeinterface/qualitymetrics/misc_metrics.py b/src/spikeinterface/qualitymetrics/misc_metrics.py
@@ -37,6 +37,167 @@
 _default_params = dict()
 
 
+def compute_noise_cutoffs(sorting_analyzer, high_quantile=0.25, low_quantile=0.1, n_bins=100, unit_ids=None):
+    """
+    A metric to determine if a unit's amplitude distribution is cut off as it approaches zero, without assuming a Gaussian distribution.
+
+    Based on the histogram of the (transformed) amplitude:
+
+    1. This method compares counts in the lower-amplitude bins to counts in the top 'high_quantile' of the amplitude range.
+    It computes the mean and std of an upper quantile of the distribution, and calculates how many standard deviations away
+    from that mean the lower-quantile bins lie.
+
+    2. The method also compares the counts in the lower-amplitude bins to the count in the highest bin and return their ratio.
+
+    Parameters
+    ----------
+    sorting_analyzer : SortingAnalyzer
+        A SortingAnalyzer object.
+    high_quantile : float, default: 0.25
+        Quantile of the amplitude range above which values are treated as "high" (e.g. 0.25 = top 25%), the reference region.
+    low_quantile : int, default: 0.1
+        Quantile of the amplitude range below which values are treated as "low" (e.g. 0.1 = lower 10%), the test region.
+    n_bins: int, default: 100
+        The number of bins to use to compute the amplitude histogram.
+    unit_ids : list or None
+        List of unit ids to compute the amplitude cutoffs. If None, all units are used.
+
+    Returns
+    -------
+    noise_cutoff_dict : dict of floats
+        Estimated metrics based on the amplitude distribution, for each unit ID.
+
+    References
+    ----------
+    Inspired by metric described in [IBL2024]_
+
+    """
+    res = namedtuple("cutoff_metrics", ["noise_cutoff", "noise_ratio"])
+    if unit_ids is None:
+        unit_ids = sorting_analyzer.unit_ids
+
+    noise_cutoff_dict = {}
+    noise_ratio_dict = {}
+    if not sorting_analyzer.has_extension("spike_amplitudes"):
+        warnings.warn(
+            "`compute_noise_cutoffs` requires the 'spike_amplitudes` extension. Please run sorting_analyzer.compute('spike_amplitudes') to be able to compute `noise_cutoff`"
+        )
+        for unit_id in unit_ids:
+            noise_cutoff_dict[unit_id] = np.nan
+            noise_ratio_dict[unit_id] = np.nan
+        return res(noise_cutoff_dict, noise_ratio_dict)
+
+    amplitude_extension = sorting_analyzer.get_extension("spike_amplitudes")
+    peak_sign = amplitude_extension.params["peak_sign"]
+    if peak_sign == "both":
+        raise TypeError(
+            '`peak_sign` should either be "pos" or "neg". You can set `peak_sign` as an argument when you compute spike_amplitudes.'
+        )
+
+    amplitudes_by_units = _get_amplitudes_by_units(sorting_analyzer, unit_ids, peak_sign)
+
+    for unit_id in unit_ids:
+        amplitudes = amplitudes_by_units[unit_id]
+
+        # We assume the noise (zero values) is on the lower tail of the amplitude distribution.
+        # But if peak_sign == 'neg', the noise will be on the higher tail, so we flip the distribution.
+        if peak_sign == "neg":
+            amplitudes = -amplitudes
+
+        cutoff, ratio = _noise_cutoff(amplitudes, high_quantile=high_quantile, low_quantile=low_quantile, n_bins=n_bins)
+        noise_cutoff_dict[unit_id] = cutoff
+        noise_ratio_dict[unit_id] = ratio
+
+    return res(noise_cutoff_dict, noise_ratio_dict)
+
+
+_default_params["noise_cutoff"] = dict(high_quantile=0.25, low_quantile=0.1, n_bins=100)
+
+
+def _noise_cutoff(amps, high_quantile=0.25, low_quantile=0.1, n_bins=100):
+    """
+    A metric to determine if a unit's amplitude distribution is cut off as it approaches zero, without assuming a Gaussian distribution.
+
+    Based on the histogram of the (transformed) amplitude:
+
+    1. This method compares counts in the lower-amplitude bins to counts in the higher_amplitude bins.
+    It computes the mean and std of an upper quantile of the distribution, and calculates how many standard deviations away
+    from that mean the lower-quantile bins lie.
+
+    2. The method also compares the counts in the lower-amplitude bins to the count in the highest bin and return their ratio.
+
+    Parameters
+    ----------
+    amps : array-like
+        Spike amplitudes.
+    high_quantile : float, default: 0.25
+        Quantile of the amplitude range above which values are treated as "high" (e.g. 0.25 = top 25%), the reference region.
+    low_quantile : int, default: 0.1
+        Quantile of the amplitude range below which values are treated as "low" (e.g. 0.1 = lower 10%), the test region.
+    n_bins: int, default: 100
+        The number of bins to use to compute the amplitude histogram.
+
+    Returns
+    -------
+    cutoff : float
+        (mean(lower_bins_count) - mean(high_bins_count)) / std(high_bins_count)
+    ratio: float
+        mean(lower_bins_count) / highest_bin_count
+
+    """
+    n_per_bin, bin_edges = np.histogram(amps, bins=n_bins)
+
+    maximum_bin_height = np.max(n_per_bin)
+
+    low_quantile_value = np.quantile(amps, q=low_quantile)
+
+    # the indices for low-amplitude bins
+    low_indices = np.where(bin_edges[1:] <= low_quantile_value)[0]
+
+    high_quantile_value = np.quantile(amps, q=1 - high_quantile)
+
+    # the indices for high-amplitude bins
+    high_indices = np.where(bin_edges[:-1] >= high_quantile_value)[0]
+
+    if len(low_indices) == 0:
+        warnings.warn(
+            "No bin is selected to test cutoff. Please increase low_quantile. Setting noise cutoff and ratio to NaN"
+        )
+        return np.nan, np.nan
+
+    # compute ratio between low-amplitude bins and the largest bin
+    low_counts = n_per_bin[low_indices]
+    mean_low_counts = np.mean(low_counts)
+    ratio = mean_low_counts / maximum_bin_height
+
+    if len(high_indices) == 0:
+        warnings.warn(
+            "No bin is selected as the reference region. Please increase high_quantile. Setting noise cutoff to NaN"
+        )
+        return np.nan, ratio
+
+    if len(high_indices) == 1:
+        warnings.warn(
+            "Only one bin is selected as the reference region, and thus the standard deviation cannot be computed. "
+            "Please increase high_quantile. Setting noise cutoff to NaN"
+        )
+        return np.nan, ratio
+
+    # compute cutoff from low-amplitude and high-amplitude bins
+    high_counts = n_per_bin[high_indices]
+    mean_high_counts = np.mean(high_counts)
+    std_high_counts = np.std(high_counts)
+    if std_high_counts == 0:
+        warnings.warn(
+            "All the high-amplitude bins have the same size. Please consider changing n_bins. "
+            "Setting noise cutoff to NaN"
+        )
+        return np.nan, ratio
+
+    cutoff = (mean_low_counts - mean_high_counts) / std_high_counts
+    return cutoff, ratio
+
+
 def compute_num_spikes(sorting_analyzer, unit_ids=None, **kwargs):
     """
     Compute the number of spike across segments.
diff --git a/src/spikeinterface/qualitymetrics/quality_metric_list.py b/src/spikeinterface/qualitymetrics/quality_metric_list.py
@@ -18,6 +18,7 @@
     compute_firing_ranges,
     compute_amplitude_cv_metrics,
     compute_sd_ratio,
+    compute_noise_cutoffs,
 )
 
 from .pca_metrics import (
@@ -51,6 +52,7 @@
     "firing_range": compute_firing_ranges,
     "drift": compute_drift_metrics,
     "sd_ratio": compute_sd_ratio,
+    "noise_cutoff": compute_noise_cutoffs,
 }
 
 # a dict converting the name of the metric for computation to the output of that computation
@@ -81,6 +83,7 @@
     "nn_noise_overlap": ["nn_noise_overlap"],
     "silhouette": ["silhouette"],
     "silhouette_full": ["silhouette_full"],
+    "noise_cutoff": ["noise_cutoff", "noise_ratio"],
 }
 
 # this dict allows us to ensure the appropriate dtype of metrics rather than allow Pandas to infer them
@@ -116,4 +119,6 @@
     "nn_noise_overlap": float,
     "silhouette": float,
     "silhouette_full": float,
+    "noise_cutoff": float,
+    "noise_ratio": float,
 }
diff --git a/src/spikeinterface/qualitymetrics/tests/test_metrics_functions.py b/src/spikeinterface/qualitymetrics/tests/test_metrics_functions.py
@@ -43,13 +43,29 @@
     compute_quality_metrics,
 )
 
+from spikeinterface.qualitymetrics.misc_metrics import _noise_cutoff
 
 from spikeinterface.core.basesorting import minimum_spike_dtype
 
 
 job_kwargs = dict(n_jobs=2, progress_bar=True, chunk_duration="1s")
 
 
+def test_noise_cutoff():
+    """
+    Generate two artifical gaussian, one truncated and one not. Check the metrics are higher for the truncated one.
+    """
+    np.random.seed(1)
+    amps = np.random.normal(0, 1, 1000)
+    amps_trunc = amps[amps > -1]
+
+    cutoff1, ratio1 = _noise_cutoff(amps=amps)
+    cutoff2, ratio2 = _noise_cutoff(amps=amps_trunc)
+
+    assert cutoff1 <= cutoff2
+    assert ratio1 <= ratio2
+
+
 def test_compute_new_quality_metrics(small_sorting_analyzer):
     """
     Computes quality metrics then computes a subset of quality metrics, and checks