Skip to content
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
1 change: 1 addition & 0 deletions doc/how_to/index.rst
Original file line number Diff line number Diff line change
Expand Up @@ -18,4 +18,5 @@ Guides on how to solve specific, short problems in SpikeInterface. Learn how to.
auto_curation_training
auto_curation_prediction
physical_units
unsigned_to_signed
customize_a_plot
2 changes: 2 additions & 0 deletions doc/how_to/physical_units.rst
Original file line number Diff line number Diff line change
@@ -1,3 +1,5 @@
.. _physical_units:

Working with physical units in SpikeInterface recordings
========================================================

Expand Down
102 changes: 102 additions & 0 deletions doc/how_to/unsigned_to_signed.rst
Original file line number Diff line number Diff line change
@@ -0,0 +1,102 @@
Unsigned to Signed Data types
=============================

As of version 0.103.0 SpikeInterface has changed one of its defaults for interacting with
:code:`Recording` objects. We no longer autocast unsigned dtypes to signed implicitly. This
means that some users of SpikeInterface will need to add one additional line of code to their scripts
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe mention the formats here that suffer from it?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this is the perfect being the enemy of good. The only way I can think of doing this is:

  1. looking through the code in neo for each file format to see how it is stored
  2. testing this myself to see how each is stored

both of these are too time-intensive for me. Are you interested in making this list for us and then I'll add it in?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No. I don't have time either.

We don't have the resources to be exhaustive but do we know at least or or two examples. Didn't this cause troubles for you and that's how you noticed?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes it is in the example. I know Intan does this. But that's the only one I know for sure off the top of my head. (I could do a e.g. Intan) but I also didn't want to preference one system.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it is fine to add only one to have at least is one example. We add more if we discover them to be more. I guess we can just do a search for uint16 in neo but for this PR at least let's have an example.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let me look for at least a second example if that exists.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thansk guys for looking this up. TO my knowledge, it's indeed Intan, Maxwell, and Biocam. Let's start adding these and add more if they pop up

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can add the list in another PR if you want @zm711. You have already done a lot with this : )

to explicitly handle this conversion.


Why this matters?
-----------------

For those that want a deeper understanding of dtypes `NumPy provides a great explanation <https://numpy.org/doc/stable/reference/arrays.dtypes.html>`_.
For our purposes it is important to know that many pieces of recording equipment opt to store their electrophysiological data as unsigned integers
(e.g., Intan, Maxwell Biosystems, 3Brain Biocam).
Similarly to signed integers, in order to convert to real units these file formats only need to store a :code:`gain`
and an :code:`offset`. Our :code:`RecordingExtractor`'s maintain the dtype that the file format utilizes, which means that some of our
:code:`RecordingExtractor`'s will have unsigned dtypes.

The problem with using unsigned dtypes is that many types of functions (including the ones we use from :code:`SciPy`) perform poorly with unsigned integers.
This is made worse by the fact that these failures are silent (i.e. no error is triggered but the operation leads to nonsensical data). So the
solution required is to convernt unsigned integers into signed integers. Previously we did this under the hood, automatically for users that had
a :code:`Recording` object with an unsigned dtype.

We decided, however, that implicitly performing this action was not the best course of action, since:
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great. Thanks.


1) *explicit* is always better than *implicit*
2) some functions would *magically* change the dtype of the :code:`Recording` object, which can cause confusion

So from version 0.103.0, users will now explicitly have to perform this transformation of their data. This will help users better understand how they are
processing their data during an analysis pipeline as well as better understand the provenance of their pipeline.


Using :code:`unsigned_to_signed`
--------------------------------

For users that receive an error because their :code:`Recording` is unsigned, their is one additional step that must be done:

.. code:: python

import spikeinterface.extractors as se
import spikeinterface.preprocessing as spre

# Intan is an example of unsigned data
recording = se.read_intan('path/to/my/file.rhd', stream_id='0')
# to get a signed version of our Recording we use the following function
recording_signed = spre.unsigned_to_signed(recording)
# we can now apply any preprocessing functions like normal, e.g.
recording_filtered = spre.bandpass_filter(recording_signed)


Now with the signed dtype of the :code:`Recording` one can use a SpikeInterface pipeline as usual.


If you are curious if your :code:`Recording` is unsigned you can simply check the repr or use :code:`get_dtype()`

.. code:: python

# the repr automatically displays the dtype
print(recording)
# use method on the Recording object
print(recording.get_dtype())

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would be good to mention what you'd expect here. E.g. "If this returns a dtype starting with a u, such as uint16, the recording if using unsigned int. While if this returns ... "

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you think I should just write up a parsed-literal block or you think it is sufficient to just say uint16?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Up to you. I think sufficient just to say: the possible things it can return are pretty simple.

In either case, if the dtype displayed has a :code:`u` at the beginning (e.g. :code:`uint16`) then your recording is
unsigned. If it doesn't have the :code:`u` (e.g. :code:`int16`) then it is signed and would not need this preprocessing step.


Bit depth
---------

One final important piece of information for some users is the concept of bit depth, which is the number of bits used to
sample the data. The :code:`bit_depth` argument that can be fed into the :code:`unsigned_to_signed` function.
This should be used in cases where the ADC bit depth does not match the bit depth of the data type (e.g., if the data is
stored as :code:`uint16` but the ADC is 12 bits).
Let's make a concrete example: the Biocam acquisition system from 3Brain uses a 12-bit ADC and stores the data as
:code:`uint16`. This means that the data is stored in a 16-bit unsigned integer format, but the actual data
only covers a 12-bit range. Therefore, that the "zero" of the data is not at 0, nor at half of the :code:`uint16` range (i.e. 2^15),
but rather at 2048 (i.e., 2^12).
In this case, setting the :code:`bit_depth` argument to 12 will allow the :code:`unsigned_to_signed` function to
correctly convert the unsigned data to signed data and offset the data to be centered around 0, by subtracting 2048
while converting the data from unsigned to signed.

.. code:: python

recording_unsigned = se.read_biocam('path/to/my/file.brw')
# we can now convert to signed with the correct bit depth
recording_signed = spre.unsigned_to_signed(recording_unsigned, bit_depth=12)


Additional Notes
----------------

1) Some sorters make use of SpikeInterface preprocessing either
within their wrappers or within their own code base. So remember to use the "signed" version of
your recording for the rest of your pipeline.

2) Using :code:`unsigned_to_signed` in versions less than 0.103.0 does not hurt your scripts. This
option was available previously along with the implicit option. Adding this into scripts with old
versions of SpikeInterface will still work and will "future-proof" your scripts for when you
update to a version greater than or equal to 0.103.0.

3) For additional information on units and scaling in SpikeInterface see :ref:`physical_units`.
4 changes: 3 additions & 1 deletion src/spikeinterface/preprocessing/filter.py
Original file line number Diff line number Diff line change
Expand Up @@ -423,9 +423,11 @@ def fix_dtype(recording, dtype):
# if uint --> force int
if dtype.kind == "u":
raise ValueError(
"Unsigned types are not supported, since they don't ineract well with "
"Unsigned types are not supported, since they don't interact well with "
"various preprocessing steps. You can use "
"`spikeinterface.preprocessing.unsigned_to_signed` to convert the recording to a signed type."
"For more information, please see "
"https://spikeinterface.readthedocs.io/en/stable/how_to/unsigned_to_signed.html"
)

return dtype
Loading