-
Notifications
You must be signed in to change notification settings - Fork 17
[DO NOT MERGE] feat: remove generated protobuf code from substrait-python #145
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
|
ACTION NEEDED Substrait follows the Conventional Commits The PR title and description are used as the merge commit message. Please update your PR title and description to match the specification. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I moved all of the extension files to a different dedicated directory, to avoid a conflict with the substrait.extensions module which is where the extension.proto code gets generated into.
| from substrait.parameterized_types_pb2 import * | ||
| from substrait.plan_pb2 import * | ||
| from substrait.type_expressions_pb2 import * | ||
| from substrait.type_pb2 import * |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not clever enough to recreate whatever was happening in proto.py, but I am smart enough to re-export it all by hand. It's not as flashy, but it does the job.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
tbh, pretty sure that module was there to avoid writing out gen.proto in imports (which I was doing anyway 😆) if protos are moved to top-level substrait namespace, I don't think we need to keep this around.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So without this we actually get test failures like
AttributeError: module 'substrait.proto' has no attribute 'ExtendedExpression
in https://github.com/substrait-io/substrait-python/blob/main/tests/sql/test_sql_to_substrait.py, which seem to be coming from the following line in pyarrow:
import substrait.proto # no-cython-lintWe should probably update the code there to not rely on that import, but I think we want to do that after we have the new path available in this library.
src/substrait/version.py
Outdated
| @@ -0,0 +1,3 @@ | |||
| from importlib.metadata import version | |||
|
|
|||
| substrait_version = version("vbarua-substrait-protobuf") | |||
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Instead of baking in the substrait version into substrait-python, we can just use the version of substrait-protobuf which should correspond to a spec version.
|
|
||
| __substrait_version__ = "0.79.0" | ||
| __substrait_hash__ = "92d2e75" | ||
| __minimum_substrait_version__ = "0.30.0" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Of these 3 versions, the only thing I've preserved is substrait_version which can be found in version.py now.
substrait_hash is optional, and we don't need to be setting it or making it available if we're making the substrait_version available.
I'm not sure what __minimum_substrait_version__ is for, but it hasn't been update in over 2 years.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this made me think if we might have a problem if someone inadvertently installs different versions of substrait-protobuf and substrait-antlr for example... do you think substrait_version should refer exclusively to protobuf? or maybe we need some sort of an eager check to make sure all those substrait package versions line up
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
or maybe we need some sort of an eager check to make sure all those substrait package versions line up
I think we should definitely have a build / release check that verifies that all of the substrait-* libraries are on the same version. Otherwise you start to run into weird scenarios like "The library handles protobufs at version X but extensions at version Y" which would be quite confusing IMO.
|
This looks like a lot of changes, but the vast majority of changes are just import path updates of to |
Avoiding conflicts with generated substrait.extension protos
1db7948 to
c93ec8e
Compare
|
Following the merge of substrait-io/substrait-packaging#2, I was able to run the release workflow to publish substrait-protobuf v0.79.0 to TestPyPI. I've updated this PR to depend on that artifact, which allows us to check that it will integrate into substrait-python when published for real. If these changes look reasonable, the next steps would be to:
After that, I can work on extracting the other artifacts (antlr, extensions, etc) 1-by-1 and integrate them in the same way. The other option would be to extract everything first, and then do one mega-update. I have a preference for the piecemeal option, as I think it makes the reviews more tractable. While we're performing this migration, we'll have take a bit of care to avoid bumping the substrait submodule version without updating Let me know what you think of this plan @nielspardon @tokoko. |
Generated code is packaged and available in the substrait-protobuf package