-
Notifications
You must be signed in to change notification settings - Fork 118
Granular updates for model serving endpoints #3988
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
8f26a22 to
fd244e3
Compare
|
Commit: dbdbdf5
9 failing tests:
Top 20 slowest tests (at least 2 minutes):
|
- Merged latest main branch which renamed DoUpdateWithChanges to DoUpdate - DoUpdate now has same signature with changes parameter - Updated all resource implementations to use DoUpdate - All model serving endpoint tests still passing
- Added 'type Changes = deployplan.Changes' alias in alert.go (same as main) - Updated all DoUpdate signatures to use *Changes instead of *deployplan.Changes - Removed unused deployplan imports where only the alias is used - Reduces diff with main branch
| trace update_file.py databricks.yml 'catalog_name: "first-inference-catalog"' 'catalog_name: "second-inference-catalog"' | ||
|
|
||
| trace $CLI bundle plan -o json > out.plan.$DATABRICKS_BUNDLE_ENGINE.json | ||
| trace $CLI bundle deploy |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please add 'trace $CLI bundle plan' after deploys to check that there is no drift.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These tests are local server only, because of the slow speed to spin up an endpoint. Would you prefer if we make one of these cloud or is there value in still having bundle plan anyways?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
it's valuable locally as well but more so on cloud.
Perhaps we can enable one of them as CloudSlow?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah sounds good, can do for one.
| var err error | ||
|
|
||
| if r.hasFieldChange(changes, "tags") { | ||
| err = r.updateTags(ctx, id, config.Tags) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It looks like this updates no longer run in parallel?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes. It's not clear that these methods are safe to run in parallel so I'd rather be more conservative and do what terraform does.
Changes
This PR implements granular updates for model serving endpoints. Instead of always updating the entire endpoint configuration, the CLI now only sends updates for the specific fields that have changed. This are:
Why
Model serving endpoint updates are expensive operations. By sending only the changed fields, we reduce the scope of updates and improve deployment performance.
We also don't have guarentees that these API calls are safe to do in parallel. This matches the TF implementation: https://github.com/databricks/terraform-provider-databricks/blob/b0a2a1c6a1688498fd6a00c64003ef4948da21e8/serving/resource_model_serving.go#L366
Tests
Added comprehensive acceptance tests for various update scenarios:
Note: This PR depends on #3995 and should be merged after that PR is merged and this branch is rebased.
Note: These tests are local only because model serving endpoints take a long time (~30 minutes) to spin up and can be flaky. We can confirm though that the TF and DABs behavior matches.