Skip to content

Conversation

@timsaucer
Copy link
Member

@timsaucer timsaucer commented Dec 11, 2025

Which issue does this PR close?

Addresses part of #18671 but does not close it.

Rationale for this change

This is the major change to address the requirements of #18671. This PR combines all of the previous PRs in the issue and uses them in FFI_TableProvider and FFI_ExecutionPlan. With this change the only remaining thing to close the issue is to remove the core crate. That is a large PR that mostly just changes import paths and will be a follow up.

What changes are included in this PR?

  • Update all structs in the FFI crate to use the FFI_PhysicalExpr, FFI_Session, FFI_TaskContext, and FFI_LogicalExtensionCodec.
  • Remove creation of SessionContext within the FFI crate
  • Updates unit tests

Are these changes tested?

Unit tests are added. Coverage report:
Screenshot 2025-12-11 at 10 42 21 AM

Are there any user-facing changes?

Yes.

There is one major change to using the FFI crate that downstream users will need to implement. Now when creating a table provider, catalog provider, etc you need to provide a TaskContextProvider and an optional LogicalExtensionCodec. The upgrade guide has been updated.

@timsaucer
Copy link
Member Author

@paleolimbot @comphead @milenkovicm @renato2099 Almost done with the FFI epic! There is only one PR after these and it's just an import cleanup job. I'm leaving this in draft until I write up the upgrade guide, but the code here is I believe in decent shape. This pulls together all of the previous work.

@github-actions github-actions bot added the documentation Improvements or additions to documentation label Dec 12, 2025
@timsaucer timsaucer added the api change Changes the API exposed to users of the crate label Dec 12, 2025
Comment on lines -198 to -200
let default_ctx = SessionContext::new();
let codec = DefaultLogicalExtensionCodec {};

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here is the main thrust of this work. In this place and others we no longer rely on creating a default SessionContext and codec.

Comment on lines -96 to -103
let codec = DefaultPhysicalExtensionCodec {};
let partitioning_data = rresult_return!(serialize_partitioning(
properties.inner().output_partitioning(),
&codec
));
let output_partitioning = partitioning_data.encode_to_vec();

ROk(output_partitioning.into())
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here is one of the big wins on the physical side - no longer using the protobuf serialization.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is cool, although my reading of this means that any ExecutionPlan passed via FFI will not benefit from any physical optimization in the same way it would have if serialized/unserialized via protobuf. It seems like these are mostly for TableProviders where the plan being passed around would normally not participate in that anyway but I thought I would point that out.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thank you for noting that @paleolimbot ! that was not obvious to me

Copy link
Member Author

@timsaucer timsaucer left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  • Address comments from in person review

supports_filters_pushdown_internal(provider.inner(), &filters_serialized)
.map_err(|e| e.to_string().into())
.into()
) -> RResult<RVec<FFI_TableProviderFilterPushDown>, RString> {
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Switch to FFI_Result

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does this need updating?

.with_config(config)
.build();
let ctx = SessionContext::new_with_state(session);
let foreign_session = rresult_return!(ForeignSession::try_from(&session));
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ForeignSession is not cheap to create, so use a different work around to avoid creating if not neccessary.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You could possibly push this type of optimization into the ForeignSession by only doing expensive initialization if one of the methods is actually called using a OnceLock or something. If I'm remembering that PR correctly, the main issue there is that some of the session trait items don't return a Result so there's no opportunity for failure.

.with_config(config)
.build();
let ctx = SessionContext::new_with_state(session);
let foreign_session = rresult_return!(ForeignSession::try_from(&session));
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same as other comment

Copy link
Member

@paleolimbot paleolimbot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Such a satisfying amount of hack removal! This looks reasonable to me.

It seems like we are leaning on existing tests to cover the functionality here...I imagine it is tricky to test the situation where the structs you've listed in the upgrade guide are actually provided by a separate build and are actually passing user-defined functions that aren't in the default SessionContext as filters (perhaps datafusion-python is where those high level tests will live?)

Comment on lines -96 to -103
let codec = DefaultPhysicalExtensionCodec {};
let partitioning_data = rresult_return!(serialize_partitioning(
properties.inner().output_partitioning(),
&codec
));
let output_partitioning = partitioning_data.encode_to_vec();

ROk(output_partitioning.into())
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is cool, although my reading of this means that any ExecutionPlan passed via FFI will not benefit from any physical optimization in the same way it would have if serialized/unserialized via protobuf. It seems like these are mostly for TableProviders where the plan being passed around would normally not participate in that anyway but I thought I would point that out.

supports_filters_pushdown_internal(provider.inner(), &filters_serialized)
.map_err(|e| e.to_string().into())
.into()
) -> RResult<RVec<FFI_TableProviderFilterPushDown>, RString> {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does this need updating?

.with_config(config)
.build();
let ctx = SessionContext::new_with_state(session);
let foreign_session = rresult_return!(ForeignSession::try_from(&session));
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You could possibly push this type of optimization into the ForeignSession by only doing expensive initialization if one of the methods is actually called using a OnceLock or something. If I'm remembering that PR correctly, the main issue there is that some of the session trait items don't return a Result so there's no opportunity for failure.

Comment on lines -96 to -103
let codec = DefaultPhysicalExtensionCodec {};
let partitioning_data = rresult_return!(serialize_partitioning(
properties.inner().output_partitioning(),
&codec
));
let output_partitioning = partitioning_data.encode_to_vec();

ROk(output_partitioning.into())
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thank you for noting that @paleolimbot ! that was not obvious to me

// TODO Extend FFI to get the registry and codex
let default_ctx = SessionContext::new();
let task_context = default_ctx.task_ctx();
let codex = DefaultPhysicalExtensionCodec {};
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

just for my own learning, before we were setting a specific codex here, do we need to worry about it? should we be using the one that is coming with from ffi ?

@timsaucer timsaucer marked this pull request as ready for review December 16, 2025 19:29
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

api change Changes the API exposed to users of the crate documentation Improvements or additions to documentation ffi Changes to the ffi crate

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants