Add additional unsigned_to_signed doc
#4076
Changes from all commits
99e6d50
76755ee
f5f0bfa
56abb70
ca4db72
9eee046
6ea0c3c
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,102 @@ | ||
| Unsigned to Signed Data types | ||
| ============================= | ||
|
|
||
| As of version 0.103.0 SpikeInterface has changed one of its defaults for interacting with | ||
| :code:`Recording` objects. We no longer autocast unsigned dtypes to signed implicitly. This | ||
| means that some users of SpikeInterface will need to add one additional line of code to their scripts | ||
| to explicitly handle this conversion. | ||
|
|
||
|
|
||
| Why does this matter? | ||
| --------------------- | ||
|
|
||
| For those that want a deeper understanding of dtypes `NumPy provides a great explanation <https://numpy.org/doc/stable/reference/arrays.dtypes.html>`_. | ||
| For our purposes it is important to know that many pieces of recording equipment opt to store their electrophysiological data as unsigned integers | ||
| (e.g., Intan, Maxwell Biosystems, 3Brain Biocam). | ||
| As with signed integers, converting to real units only requires that these file formats store a :code:`gain` | ||
| and an :code:`offset`. Our :code:`RecordingExtractor` objects maintain the dtype that the file format utilizes, which means that some of our | ||
|
||
| :code:`RecordingExtractor` objects will have unsigned dtypes. | ||
|
|
||
| The problem with using unsigned dtypes is that many functions (including the ones we use from :code:`SciPy`) perform poorly with unsigned integers. | ||
| This is made worse by the fact that these failures are silent (i.e., no error is raised but the operation produces nonsensical data). So the | ||
| solution is to convert unsigned integers into signed integers. Previously we did this under the hood, automatically, for users that had | ||
| a :code:`Recording` object with an unsigned dtype. | ||
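To make the "silent failure" concrete, here is a minimal NumPy-only sketch. It is an illustration of unsigned wrap-around in general, not a reproduction of any particular SciPy function; the sample values are made up for the example.

```python
import numpy as np

# Two consecutive samples stored as unsigned 16-bit integers (hypothetical values)
samples = np.array([100, 90], dtype=np.uint16)

# The true difference is -10, but uint16 cannot represent negative numbers,
# so the result wraps around modulo 2**16 without raising any error
unsigned_diff = np.diff(samples)

# Casting to a signed dtype first gives the expected result
signed_diff = np.diff(samples.astype(np.int32))

print(unsigned_diff)  # wrapped value
print(signed_diff)    # expected value
```

The wrapped value is far from the true difference, yet no exception or warning is raised, which is why this class of error is easy to miss.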
|
|
||
| We decided, however, that implicitly performing this action was not the best course of action, since: | ||
|
Collaborator
Great. Thanks. |
||
|
|
||
| 1) *explicit* is always better than *implicit* | ||
| 2) some functions would *magically* change the dtype of the :code:`Recording` object, which can cause confusion | ||
|
|
||
| So from version 0.103.0, users will now explicitly have to perform this transformation of their data. This will help users better understand how they are | ||
| processing their data during an analysis pipeline as well as better understand the provenance of their pipeline. | ||
|
|
||
|
|
||
| Using :code:`unsigned_to_signed` | ||
| -------------------------------- | ||
|
|
||
| For users that receive an error because their :code:`Recording` is unsigned, there is one additional step that must be done: | ||
|
|
||
| .. code:: python | ||
|
|
||
| import spikeinterface.extractors as se | ||
| import spikeinterface.preprocessing as spre | ||
|
|
||
| # Intan is an example of unsigned data | ||
| recording = se.read_intan('path/to/my/file.rhd', stream_id='0') | ||
| # to get a signed version of our Recording we use the following function | ||
| recording_signed = spre.unsigned_to_signed(recording) | ||
| # we can now apply any preprocessing functions like normal, e.g. | ||
| recording_filtered = spre.bandpass_filter(recording_signed) | ||
|
|
||
|
|
||
| Now with the signed dtype of the :code:`Recording` one can use a SpikeInterface pipeline as usual. | ||
|
|
||
|
|
||
| To find out whether your :code:`Recording` is unsigned, you can check the repr or use :code:`get_dtype()`: | ||
|
|
||
| .. code:: python | ||
|
|
||
| # the repr automatically displays the dtype | ||
| print(recording) | ||
| # use method on the Recording object | ||
| print(recording.get_dtype()) | ||
|
|
||
|
Member
Would be good to mention what you'd expect here. E.g. "If this returns a dtype starting with a u, such as
Member
Author
Do you think I should just write up a parsed-literal block or you think it is sufficient to just say
Member
Up to you. I think sufficient just to say: the possible things it can return are pretty simple. |
||
| In either case, if the dtype displayed has a :code:`u` at the beginning (e.g. :code:`uint16`) then your recording is | ||
| unsigned. If it doesn't have the :code:`u` (e.g. :code:`int16`) then it is signed and would not need this preprocessing step. | ||
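The same check can be done programmatically instead of by eye. The sketch below uses NumPy's dtype ``kind`` attribute, which is ``"u"`` for unsigned integers; the helper name ``is_unsigned`` is hypothetical and not part of SpikeInterface.

```python
import numpy as np


def is_unsigned(dtype):
    """Return True if `dtype` is an unsigned integer dtype (hypothetical helper)."""
    # np.dtype(...) accepts strings, NumPy types, and existing dtype objects
    return np.dtype(dtype).kind == "u"


print(is_unsigned("uint16"))    # unsigned integer
print(is_unsigned(np.int16))    # signed integer
print(is_unsigned("float32"))   # floating point
```

With a :code:`Recording` object, this could be called as ``is_unsigned(recording.get_dtype())`` to decide whether the :code:`unsigned_to_signed` step is needed.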
|
|
||
|
|
||
| Bit depth | ||
| --------- | ||
|
|
||
| One final important piece of information for some users is the concept of bit depth, which is the number of bits used to | ||
| sample the data. The :code:`unsigned_to_signed` function accepts a :code:`bit_depth` argument. | ||
| This should be used in cases where the ADC bit depth does not match the bit depth of the data type (e.g., if the data is | ||
| stored as :code:`uint16` but the ADC is 12 bits). | ||
| As a concrete example, the Biocam acquisition system from 3Brain uses a 12-bit ADC and stores the data as | ||
| :code:`uint16`. This means that the data is stored in a 16-bit unsigned integer format, but the actual data | ||
| only covers a 12-bit range. Therefore, the "zero" of the data is not at 0, nor at half of the :code:`uint16` range (i.e., 2^15), | ||
| but rather at 2048 (i.e., 2^11, half of the 12-bit range). | ||
| In this case, setting the :code:`bit_depth` argument to 12 allows the :code:`unsigned_to_signed` function to | ||
| correctly convert the unsigned data to signed data and center it around 0 by subtracting 2048 | ||
| during the conversion. | ||
|
|
||
| .. code:: python | ||
|
|
||
| recording_unsigned = se.read_biocam('path/to/my/file.brw') | ||
| # we can now convert to signed with the correct bit depth | ||
| recording_signed = spre.unsigned_to_signed(recording_unsigned, bit_depth=12) | ||
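The arithmetic behind the bit depth offset can be checked with plain NumPy. This is a sketch of the idea only, assuming a 12-bit ADC stored in :code:`uint16`; it is not SpikeInterface's actual implementation, and the sample values are chosen to be the minimum, midpoint, and maximum of the 12-bit range.

```python
import numpy as np

bit_depth = 12

# Minimum, midpoint, and maximum of a 12-bit ADC, stored as uint16
raw = np.array([0, 2048, 4095], dtype=np.uint16)

# Centering on half of the ADC range: 2**(12 - 1) == 2048
offset = 2 ** (bit_depth - 1)
signed = raw.astype(np.int16) - offset

# Centering on half of the uint16 range instead (2**15) leaves the data
# far from zero, because the samples never use the upper bits
naive = raw.astype(np.int32) - 2**15

print(signed)
print(naive)
```

With the 12-bit offset the midpoint sample maps to 0 and the range is symmetric around it, whereas the 16-bit offset shifts every sample to a large negative value.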
|
|
||
|
|
||
| Additional Notes | ||
| ---------------- | ||
|
|
||
| 1) Some sorters make use of SpikeInterface preprocessing either | ||
| within their wrappers or within their own code base. So remember to use the "signed" version of | ||
| your recording for the rest of your pipeline. | ||
|
|
||
| 2) Using :code:`unsigned_to_signed` in versions earlier than 0.103.0 does not hurt your scripts. This | ||
| function was available previously alongside the implicit conversion. Adding it to scripts that use older | ||
| versions of SpikeInterface will still work and will "future-proof" your scripts for when you | ||
| update to version 0.103.0 or later. | ||
|
|
||
| 3) For additional information on units and scaling in SpikeInterface see :ref:`physical_units`. | ||
Maybe mention the formats here that suffer from it?
I think this is the perfect being the enemy of good. The only way I can think of doing this is:
both of these are too time-intensive for me. Are you interested in making this list for us and then I'll add it in?
No. I don't have time either.
We don't have the resources to be exhaustive, but do we know at least one or two examples? Didn't this cause trouble for you, and that's how you noticed?
Yes it is in the example. I know Intan does this. But that's the only one I know for sure off the top of my head. (I could do a
e.g. Intan) but I also didn't want to preference one system.
I think it is fine to add only one, so that we have at least one example. We can add more if we discover them. I guess we can just do a search for uint16 in neo, but for this PR at least let's have an example.
Let me look for at least a second example if that exists.
Maxwell:
https://github.com/h-mayorquin/python-neo/blob/2240a29c4366fb3ed5dacadb54837004651ee35e/neo/rawio/maxwellrawio.py#L184-L192
Biocam as well it seems:
https://github.com/h-mayorquin/python-neo/blob/2240a29c4366fb3ed5dacadb54837004651ee35e/neo/rawio/biocamrawio.py#L87-L100
Thanks guys for looking this up. To my knowledge, it's indeed Intan, Maxwell, and Biocam. Let's start by adding these and add more if they pop up.
I can add the list in another PR if you want @zm711. You have already done a lot with this : )