Skip to content

Bugfix for error describing CSV with semi-colon delimeter#129

Merged
phargogh merged 3 commits intonatcap:mainfrom
davemfish:bugfix/128-table-resource-init
Apr 8, 2026
Merged

Bugfix for error describing CSV with semi-colon delimeter#129
phargogh merged 3 commits intonatcap:mainfrom
davemfish:bugfix/128-table-resource-init

Conversation

@davemfish
Copy link
Copy Markdown
Collaborator

Such a CSV would be described by frictionless with an extra dialect attribute. Passing that through to our TableResource model raised a ValidationError because we do not allow extra attributes on our models.

Fixes #128

I also addressed the issue of non-sensical encodings. A longer term solution is probably to stop using frictionless to describe raster and vector datasets, but I'll save that for later.
Fixes #121

Text encoding does make sense as an attribute of binary files.
Also, frictionless will assign a default encoding when it fails to
detect an encoding, so the value it was giving us was spurious.
@davemfish davemfish requested a review from phargogh April 7, 2026 20:25
Copy link
Copy Markdown
Member

@phargogh phargogh left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @davemfish ! I had one question about how it might make sense to approach the other attributes returned by frictionless, but then I second-guessed myself. I suppose we can revisit my question in the future if/when frictionless decides to change their attributes around or adds attributes as the case may be.

Comment on lines +290 to +299
# These are other attributes sometimes returned by frictionless.
# We don't have a use for them in our metadata and we do not permit
# arbitrary extra attributes in our models.
description.pop('mediatype', None)
description.pop('name', None)
description.pop('profile', None)
description.pop('dialect', None)
description.pop('hash', None)
description.pop('sources', None)
description.pop('licenses', None)
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In case frictionless adds some additional attributes in future releases, maybe it'd be more practical to have a list of attributes we want to keep and then delete the rest?


"""
description = describe_file(source_dataset_path, scheme)
description.pop('encoding', None) # does not make sense for binary data
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh maybe my above comment doesn't make sense if the attributes to keep depends on the datatype.

@phargogh phargogh merged commit 30f533e into natcap:main Apr 8, 2026
16 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

ValidationError on initiating a TableResource Encoding attribute doesn't make sense for binary files like GeoTiffs

2 participants