feat: optional codec and data type
#33
|
Looks good. The only issue I see is that the fill value representation does not allow null as the base data type's fill value, e.g. for a base data type of json or a nested optional. Instead, you could specify the base data type fill value in a single-element array: null -> missing |
|
While I see the simplicity here of toggling between |
I suppose it is a limitation, but I'd note that we don't have any data types that permit a
Also, for a multiply nested optional type like |
I think |
|
I've implemented this in
I am open to explicit suggestions that satisfy both, or only the first. The latter is a bit burdensome to support for something I suspect nobody would use. What I currently do:
|
I previously suggested wrapping any non-None fill value in a one-element array. That solves both issues and is syntactically pretty minimal. |
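
To make the suggestion concrete, here is a minimal Python sketch of the one-element-array convention, assuming `null` means a missing element and `[x]` means a present element whose base fill value is `x`. The function names are hypothetical and are not part of the draft spec or any Zarr library.

```python
import json


def encode_optional_fill(present, inner=None):
    """Return a JSON-ready fill value: None if missing, [inner] if present.

    Hypothetical helper; `inner` is the base data type's own fill value.
    """
    return [inner] if present else None


def decode_optional_fill(value):
    """Return (present, inner) for a fill value in the wrapped convention."""
    if value is None:
        return (False, None)
    if isinstance(value, list) and len(value) == 1:
        return (True, value[0])
    raise ValueError("expected null or a single-element array")


# A present element whose base fill value is itself null (e.g. a json base type).
wrapped = json.loads(json.dumps(encode_optional_fill(True, None)))
print(decode_optional_fill(wrapped))
```

Under this convention `null`, `[null]`, and `[[null]]` are all distinct, so a base data type whose own fill value is null, or a nested optional, stays unambiguous.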
|
I'm still not quite following why this current implementation is not just a special case of a more general sum type, an enum type in Rust. Currently, Optional has a bit that toggles between
Why build this infrastructure for |
|
Interesting... masked data has come up in a few discussions I've had, but never a more general sum type. Will people use this? It seems like not many people are complaining about the lack of struct support in Zarr V3. A more general enum type would probably need to encode each variant through separate codec chains. E.g.

enum EnumType {
    U8(u8),
    String(String),
}

{
"data_type": {
"name": "enum",
"configuration": {
"data_types": [
{
"name": "uint8"
},
{
"name": "string"
}
]
}
},
"fill_value": "?",
"codecs": [
{
"name": "enum",
"configuration": {
"discriminator_data_type": {
"name": "uint8"
},
"discriminator_codecs": [
{
"name": "bytes"
}
],
"variant_codecs": [
[
{
"name": "bytes"
}
],
[
{
"name": "vlen-utf8"
}
]
]
}
}
]
}

Specialising the above for an optional would need a new |
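
As a rough illustration of "each variant through separate codec chains", the Python sketch below splits a mixed array of `u8` and string values into a discriminator stream plus one payload stream per variant. It is only a sketch under stated assumptions: the function names are hypothetical, and the string payload uses a simple 4-byte little-endian length prefix as a stand-in, which is not the actual `vlen-utf8` layout.

```python
import struct


def encode_enum(values):
    """Split values into (discriminators, u8 payload, string payload)."""
    discriminators = bytearray()
    u8_payload = bytearray()
    str_payload = bytearray()
    for v in values:
        if isinstance(v, int):
            discriminators.append(0)  # variant 0: U8
            u8_payload.append(v)      # raises ValueError outside 0..255
        elif isinstance(v, str):
            discriminators.append(1)  # variant 1: String
            data = v.encode("utf-8")
            # Assumed framing: length prefix followed by the UTF-8 bytes.
            str_payload += struct.pack("<I", len(data)) + data
        else:
            raise TypeError(f"unsupported variant: {type(v).__name__}")
    return bytes(discriminators), bytes(u8_payload), bytes(str_payload)


def decode_enum(discriminators, u8_payload, str_payload):
    """Rebuild the original sequence by walking the discriminator stream."""
    out = []
    u8_pos = 0
    str_pos = 0
    for d in discriminators:
        if d == 0:
            out.append(u8_payload[u8_pos])
            u8_pos += 1
        elif d == 1:
            (n,) = struct.unpack_from("<I", str_payload, str_pos)
            str_pos += 4
            out.append(str_payload[str_pos:str_pos + n].decode("utf-8"))
            str_pos += n
        else:
            raise ValueError(f"unknown discriminator: {d}")
    return out


values = [7, "hi", 255, "zarr"]
encoded = encode_enum(values)
print(decode_enum(*encoded))
```

Each of the three streams could then be passed through its own codec chain, which is what the `discriminator_codecs` and `variant_codecs` entries in the metadata above would configure.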
|
The other two null-like types I would immediately like to use this for are:
I do not think these are well represented by

julia> NaN == true
false
julia> missing == true
missing

If we could generalize this to any two types and introduce a |
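
The difference shown in the Julia session above can be mimicked in Python with a hypothetical `missing` sentinel whose comparisons propagate instead of returning a boolean. This is only an illustrative sketch; `MissingType` is not a standard library or Zarr type.

```python
class MissingType:
    """Hypothetical sentinel: comparisons return the sentinel itself."""

    def __eq__(self, other):
        return self

    def __ne__(self, other):
        return self

    def __hash__(self):
        # Defining __eq__ disables the default hash, so restore one.
        return 0

    def __repr__(self):
        return "missing"


missing = MissingType()
nan = float("nan")

print(nan == True)      # NaN compares as an ordinary (unequal) number
print(missing == True)  # missing propagates through the comparison
```

NaN answers the comparison with a definite boolean, while the sentinel carries "unknown" through it, which is why the two behave differently as null-like values.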
|
A recent application is in the tracking standard GEFF, which introduced a
While they use Python's |
I'm still finalising an implementation, but here is a draft spec.