Skip to content

Conversation

@vbarua
Copy link
Member

@vbarua vbarua commented Jan 14, 2026

Generated code is packaged and available in the substrait-protobuf package

@CLAassistant
Copy link

CLAassistant commented Jan 14, 2026

CLA assistant check
All committers have signed the CLA.

@vbarua vbarua changed the title feat: remove generated protobuf code from substrait-python [DO NOT MERGE] feat: remove generated protobuf code from substrait-python Jan 14, 2026
@github-actions
Copy link

ACTION NEEDED

Substrait follows the Conventional Commits
specification
for
release automation.

The PR title and description are used as the merge commit message. Please update your PR title and description to match the specification.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I moved all of the extension files to a different dedicated directory, to avoid a conflict with the substrait.extensions module which is where the extension.proto code gets generated into.

from substrait.parameterized_types_pb2 import *
from substrait.plan_pb2 import *
from substrait.type_expressions_pb2 import *
from substrait.type_pb2 import *
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not clever enough to recreate whatever was happening in proto.py, but I am smart enough to re-export it all by hand. It's not as flashy, but it does the job.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

tbh, pretty sure that module was there to avoid writing out gen.proto in imports (which I was doing anyway 😆) if protos are moved to top-level substrait namespace, I don't think we need to keep this around.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So without this we actually get test failures like

AttributeError: module 'substrait.proto' has no attribute 'ExtendedExpression

in https://github.com/substrait-io/substrait-python/blob/main/tests/sql/test_sql_to_substrait.py, which seem to be coming from the following line in pyarrow:

    import substrait.proto  # no-cython-lint

We should probably update the code there to not rely on that import, but I think we want to do that after we have the new path available in this library.

@@ -0,0 +1,3 @@
from importlib.metadata import version

substrait_version = version("vbarua-substrait-protobuf")
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Instead of baking in the substrait version into substrait-python, we can just use the version of substrait-protobuf which should correspond to a spec version.


__substrait_version__ = "0.79.0"
__substrait_hash__ = "92d2e75"
__minimum_substrait_version__ = "0.30.0"
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Of these 3 versions, the only thing I've preserved is substrait_version which can be found in version.py now.

substrait_hash is optional, and we don't need to be setting it or making it available if we're making the substrait_version available.

I'm not sure what __minimum_substrait_version__ is for, but it hasn't been update in over 2 years.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this made me think if we might have a problem if someone inadvertently installs different versions of substrait-protobuf and substrait-antlr for example... do you think substrait_version should refer exclusively to protobuf? or maybe we need some sort of an eager check to make sure all those substrait package versions line up

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

or maybe we need some sort of an eager check to make sure all those substrait package versions line up

I think we should definitely have a build / release check that verifies that all of the substrait-* libraries are on the same version. Otherwise you start to run into weird scenarios like "The library handles protobufs at version X but extensions at version Y" which would be quite confusing IMO.

@vbarua
Copy link
Member Author

vbarua commented Jan 14, 2026

This looks like a lot of changes, but the vast majority of changes are just import path updates of

substrait.gen.proto

to

substrait

@vbarua vbarua force-pushed the vbarua/remove-protobuf-codegen branch from 1db7948 to c93ec8e Compare January 22, 2026 22:49
@vbarua
Copy link
Member Author

vbarua commented Jan 22, 2026

Following the merge of substrait-io/substrait-packaging#2, I was able to run the release workflow to publish substrait-protobuf v0.79.0 to TestPyPI. I've updated this PR to depend on that artifact, which allows us to check that it will integrate into substrait-python when published for real.

If these changes look reasonable, the next steps would be to:

  1. Update substrait-packaging to publish substrait-protobuf for real.
  2. Update this PR to depend on the real package.
  3. Review and merge this PR.

After that, I can work on extracting the other artifacts (antlr, extensions, etc) 1-by-1 and integrate them in the same way. The other option would be to extract everything first, and then do one mega-update. I have a preference for the piecemeal option, as I think it makes the reviews more tractable. While we're performing this migration, we'll have take a bit of care to avoid bumping the substrait submodule version without updating substrait-* libraries as well.

Let me know what you think of this plan @nielspardon @tokoko.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants