feat(ci): automate release process#3148

Open
milinddethe15 wants to merge 19 commits into kubeflow:master from milinddethe15:feat/automate-release

@milinddethe15
Member

@milinddethe15 milinddethe15 commented Jan 28, 2026

What this PR does / why we need it:
To release a new version of Trainer, a user runs make release VERSION=1.0.0 GITHUB_TOKEN=<token> and opens a PR with the generated commit.

  • Release PR check: validate semver, ensure tag doesn’t exist and verify manifests, chart version, and Python API version match VERSION.
  • Release workflow: create release branch/tag, build Python API dist, publish it to PyPI (requires PYPI_API_TOKEN secret in repo) and create a GitHub release using git-cliff-generated changelog.

This method ensures that anyone can create a release PR and that multiple maintainers can approve a release by adding LGTM on the PR.

More detail in: #3148 (comment)

Which issue(s) this PR fixes (optional, in Fixes #<issue number>, #<issue number>, ... format, will close the issue(s) when PR gets merged):
Fixes #2155

Checklist:

  • Docs included if any changes are user facing

@google-oss-prow

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:
Once this PR has been reviewed and has the lgtm label, please assign electronic-waste for approval. For more information see the Kubernetes Code Review Process.

The full list of commands accepted by this bot can be found here.

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@coveralls

coveralls commented Jan 28, 2026

Pull Request Test Coverage Report for Build 21821785755

Warning: This coverage report may be inaccurate.

This pull request's base commit is no longer the HEAD commit of its target branch. This means it includes changes from outside the original pull request, including, potentially, unrelated coverage changes.

Details

  • 0 of 0 changed or added relevant lines in 0 files are covered.
  • 2 unchanged lines in 1 file lost coverage.
  • Overall coverage increased (+0.8%) to 51.998%

| Files with Coverage Reduction | New Missed Lines | % |
| --- | --- | --- |
| pkg/runtime/framework/plugins/registry.go | 2 | 0.0% |

| Totals | Coverage Status |
| --- | --- |
| Change from base Build 21715897523 | 0.8% |
| Covered Lines | 1288 |
| Relevant Lines | 2477 |

💛 - Coveralls

@jaiakash
Member

jaiakash commented Feb 9, 2026

/retest

Contributor

@Krishna-kg732 Krishna-kg732 left a comment

Curious: the SDK release workflow uses OIDC trusted publishing for PyPI (no secrets needed), but this PR uses PYPI_API_TOKEN. Was there a specific reason for choosing the API token approach over trusted publishing? I just want to understand the tradeoff: both work fine for our release cadence, but OIDC avoids managing secrets.

Comment thread .github/workflows/release.yaml Outdated
exit 1
fi

BRANCH="release-${VERSION}"
Contributor

Existing release branches use the release-X.Y format (release-1.9, release-2.0, release-2.1), per issue #2155. This will create release-2.2.0 instead. It should be:

Suggested change
BRANCH="release-${VERSION}"
MAJOR_MINOR=$(echo "$VERSION" | cut -d. -f1,2)
BRANCH="release-${MAJOR_MINOR}"
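
The major.minor derivation suggested above can be sketched as a small function. This is only an illustration: the name release_branch is hypothetical, and it assumes the input looks like X.Y.Z or X.Y.Z-rc.N with an optional leading v.

```shell
# Hypothetical helper: derive the release branch name from a version string.
# Assumes the input looks like X.Y.Z or X.Y.Z-rc.N (an optional leading "v" is stripped).
release_branch() {
  version="${1#v}"
  # Keep only the first two dot-separated fields (major and minor).
  major_minor=$(printf '%s' "$version" | cut -d. -f1,2)
  printf 'release-%s\n' "$major_minor"
}
```

Under these assumptions, release_branch 2.2.0 prints release-2.2, which matches the existing release-X.Y branches.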

Comment thread hack/release.sh
echo "Running make generate"
make -C "$REPO_ROOT" generate
echo "Completed make generate"

Contributor

$PYTHON_API_VERSION_FILE is git-added but never modified by the script. The __init__.py version won't be updated, and check-release.yaml will fail on the mismatch. Add before git add:

Suggested change
sed -i "s/__version__ = \".*\"/__version__ = \"$NEW_VERSION\"/" "$PYTHON_API_VERSION_FILE"
echo "Updated Python API version to $NEW_VERSION"

@milinddethe15
Member Author

@Krishna-kg732 It's a draft PR, and if you'd like to work on it, please feel free to take it over since I won’t be able to work on it this month.

@andreyvelich
Member

@Krishna-kg732 Given that the Trainer v2.2 release is coming, it would be great if you could finalize this work!
Feel free to open a separate PR to automate the release process.

@jaiakash
Member

Hi @Krishna-kg732, feel free to take this up.

We already do changelog generation with git-cliff in the https://github.com/kubeflow/sdk repo; see kubeflow/sdk#99.

You can try replicating this.

Let me know if you need more help with this.

@milinddethe15
Member Author

@Krishna-kg732 If you haven’t started yet, please wait until next week. I will try to work on it over the weekend.

For this PR, the only remaining task is testing a release on the forked repo.

@Krishna-kg732
Contributor

> @Krishna-kg732 If you haven’t started yet, please wait until next week. I will try to work on it over the weekend.
>
> For this PR, the only remaining task is testing a release on the forked repo.

I’ve already implemented the release workflow and will be opening a separate PR shortly.
Please feel free to review it when you have time.

@jaiakash
Member

Hi @Krishna-kg732, we actually need this feature. I have already raised a PR for it; could you please help review it?
See #3231

@Krishna-kg732
Contributor

Thanks Akash, I’ll take a look at #3231 shortly and review it.
If there’s overlap with the release workflow changes I’ve implemented, we can consolidate into a single approach.

@google-oss-prow google-oss-prow Bot added size/L and removed size/XL labels Feb 22, 2026
@google-oss-prow google-oss-prow Bot added size/XL and removed size/L labels Feb 22, 2026
@milinddethe15 milinddethe15 marked this pull request as ready for review February 22, 2026 20:32
@milinddethe15
Member Author

I have tested this automation in my fork repo for release version v4.0.0.

Steps:

  1. Created the release commit using the make target:
     make release VERSION=4.0.0 GITHUB_TOKEN=<token>
  2. Opened a PR with the above commit against the master branch, where the check release action matches all the tags against the version in the VERSION file.
     (image)
  3. Once the release PR is merged to master, the [release](https://github.com/milinddethe15/kf-trainer/actions/runs/22284743190/job/64461002955) action is triggered (image), which performs:
     • python_api build
     • branch creation (release-X.Y)
     • publish PyPI package (for testing/verifying the upload, I have used my personal account)
     • create tag and GitHub release with changelog
     • trigger Docker image build and publish, and publish the Helm chart with appropriate tags (I haven't tested the actual upload of images and chart, but the GitHub Actions logs confirm that it fails only because of a permission error; see below)

     Chart: (image)

     Image: (image)
  4. Finally, the GitHub release is published: https://github.com/milinddethe15/kf-trainer/releases/tag/v4.0.0

Also added a release doc for users to understand the release flow: https://github.com/milinddethe15/kf-trainer/blob/feat/automate-release/docs/release/README.md

@milinddethe15 milinddethe15 changed the title from "feat(release): Automated trainer release process" to "feat(ci): automate release process" Feb 22, 2026
…n checks

Signed-off-by: milinddethe15 <milinddethe15@gmail.com>
…eration

Signed-off-by: milinddethe15 <milinddethe15@gmail.com>
Signed-off-by: milinddethe15 <milinddethe15@gmail.com>
Signed-off-by: milinddethe15 <milinddethe15@gmail.com>
…kflow and upgrade git-cliff-action version

Signed-off-by: milinddethe15 <milinddethe15@gmail.com>
…ption

Signed-off-by: milinddethe15 <milinddethe15@gmail.com>
…tHub release

Signed-off-by: milinddethe15 <milinddethe15@gmail.com>
…nding and simplify release name

Signed-off-by: milinddethe15 <milinddethe15@gmail.com>
…eration and simplify workflow

Signed-off-by: milinddethe15 <milinddethe15@gmail.com>
…ine tagging process

Signed-off-by: milinddethe15 <milinddethe15@gmail.com>
…ration script

Signed-off-by: milinddethe15 <milinddethe15@gmail.com>
Signed-off-by: milinddethe15 <milinddethe15@gmail.com>
Signed-off-by: milinddethe15 <milinddethe15@gmail.com>
Signed-off-by: milinddethe15 <milinddethe15@gmail.com>
@milinddethe15 milinddethe15 force-pushed the feat/automate-release branch from c98aad3 to 5334b82 Compare March 3, 2026 05:03
Copilot AI review requested due to automatic review settings March 3, 2026 05:03
Contributor

Copilot AI left a comment

Pull request overview

This PR introduces an automated release flow for Kubeflow Trainer driven by VERSION updates: a local make release target prepares a release PR, CI validates the release PR, and a post-merge workflow performs tagging/branching, PyPI publishing, and GitHub release creation.

Changes:

  • Add hack/release.sh + make release to generate a release commit (VERSION/manifests/chart/changelog) and run make generate.
  • Add CI workflows to validate release PRs (check-release.yaml) and to automate releases after merge (release.yaml), plus supporting workflow_dispatch triggers.
  • Replace the old changelog generation script with git-cliff configuration (cliff.toml) and update release documentation.

Reviewed changes

Copilot reviewed 11 out of 11 changed files in this pull request and generated 7 comments.

| File | Description |
| --- | --- |
| hack/release.sh | New release-prep script that bumps versions, updates manifests/chart, generates changelog, and commits. |
| Makefile | Adds release target to invoke hack/release.sh. |
| docs/release/README.md | Updates release documentation to the new PR-driven automated workflow. |
| docs/release/changelog.py | Removes legacy PyGithub-based changelog generator. |
| cliff.toml | Adds git-cliff config/template for changelog generation. |
| .github/workflows/check-release.yaml | New PR-time validation for release consistency (VERSION/tag/manifests/chart/python). |
| .github/workflows/release.yaml | New post-merge release automation (branch/tag, build+publish PyPI, GitHub release, dispatch image/chart publish). |
| .github/workflows/template-publish-image/action.yaml | Adds support for tagging images correctly when invoked via workflow_dispatch on tags. |
| .github/workflows/build-and-push-images.yaml | Allows manual dispatch publishing and updates publish gating logic. |
| .github/workflows/publish-helm-charts.yaml | Adds manual dispatch and concurrency settings for release-driven dispatch. |
| .github/workflows/check-pr-title.yaml | Adds area/release to ignored labels for PR title checks. |

Comment on lines +32 to +38
VERSION=${RAW_VERSION#v}
if [[ ${VERSION} =~ ${{ env.SEMVER_PATTERN }} ]]; then
echo "Version '${RAW_VERSION}' matches semver pattern."
else
echo "Version '${RAW_VERSION}' does not match semver pattern."
exit 1
fi
Copilot AI Mar 3, 2026

The semver validation strips a leading v and then matches a pattern that still allows an optional v, so an invalid VERSION like vv1.2.3 would incorrectly pass; validate RAW_VERSION against the pattern (as release.yaml does) or make the post-strip pattern disallow v.

Suggested change
VERSION=${RAW_VERSION#v}
if [[ ${VERSION} =~ ${{ env.SEMVER_PATTERN }} ]]; then
echo "Version '${RAW_VERSION}' matches semver pattern."
else
echo "Version '${RAW_VERSION}' does not match semver pattern."
exit 1
fi
if [[ ${RAW_VERSION} =~ ${{ env.SEMVER_PATTERN }} ]]; then
echo "Version '${RAW_VERSION}' matches semver pattern."
else
echo "Version '${RAW_VERSION}' does not match semver pattern."
exit 1
fi
VERSION=${RAW_VERSION#v}
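
The validate-then-strip ordering can be sketched outside the workflow as follows. The workflow's real SEMVER_PATTERN is not shown in this thread, so the pattern below is an assumption, and validate_version is a hypothetical name.

```shell
# Assumed pattern: the workflow's actual SEMVER_PATTERN is not shown in this thread.
SEMVER_PATTERN='^v?[0-9]+\.[0-9]+\.[0-9]+(-rc\.[0-9]+)?$'

# Hypothetical helper: validate the raw value, then print it without the "v" prefix.
validate_version() {
  raw_version="$1"
  # Match the raw value first so a doubled prefix such as "vv1.2.3" is rejected.
  if printf '%s\n' "$raw_version" | grep -Eq "$SEMVER_PATTERN"; then
    # Strip the single optional "v" only after validation succeeds.
    printf '%s\n' "${raw_version#v}"
  else
    echo "Version '${raw_version}' does not match semver pattern." >&2
    return 1
  fi
}
```

With this ordering, v1.2.3 is accepted and normalized to 1.2.3, while vv1.2.3 fails because the pattern permits at most one leading v.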

Comment thread hack/release.sh
Comment on lines +23 to +27
if [ -z "$1" ]; then
echo "Usage: $0 <version>"
echo "You must follow this format: X.Y.Z or X.Y.Z-rc.N"
exit 1
fi
Copilot AI Mar 3, 2026

With set -o nounset, referencing $1 when no args are passed will error before this usage check runs; use an argument count check (e.g., $# -lt 1) instead so the script prints the intended usage message.
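
A minimal sketch of the argument-count check under nounset. The function name require_version_arg is hypothetical, and the body runs in a subshell only so that the nounset setting stays local to this example.

```shell
# Hypothetical helper: print the version argument, or a usage message when it is missing.
# The parenthesized body runs in a subshell, so nounset and exit do not leak to the caller.
require_version_arg() (
  set -o nounset
  # "$#" is always defined, so this test is safe even when no arguments are passed.
  if [ "$#" -lt 1 ]; then
    echo "Usage: $0 <version>" >&2
    echo "You must follow this format: X.Y.Z or X.Y.Z-rc.N" >&2
    exit 1
  fi
  printf '%s\n' "$1"
)
```

Called with no arguments, the function reaches the usage message instead of aborting on an unbound $1.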

Comment thread hack/release.sh
# Generate and prepend new changelog section
TEMP_FILE=$(mktemp)
docker run --rm -u "$(id -u):$(id -g)" -v "$ABSOLUTE_REPO_ROOT:/app" \
-e "GITHUB_TOKEN=$GITHUB_TOKEN" -w /app \
Copilot AI Mar 3, 2026

GITHUB_TOKEN is optional per the warning, but the docker command expands $GITHUB_TOKEN under set -o nounset, which will abort when the variable is unset; pass it as ${GITHUB_TOKEN:-} or only include the -e flag when the token is present.

Suggested change
-e "GITHUB_TOKEN=$GITHUB_TOKEN" -w /app \
-e "GITHUB_TOKEN=${GITHUB_TOKEN:-}" -w /app \

Comment thread hack/release.sh
Comment on lines +58 to +60
# Update image tags in manifests
find "$MANIFESTS_DIR" -type f -name '*.yaml' -exec sed -i "s/newTag: .*/newTag: $TAG/" {} +
echo "Updated image tags in manifests to $TAG"
Copilot AI Mar 3, 2026

This script uses sed -i (also later for changelog insertion), which is GNU-sed-specific; other repo scripts (e.g., hack/python-api/gen-api.sh) branch on uname == Darwin to keep macOS support, so this should do the same or use a portable alternative.
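
One portable alternative is to avoid sed -i entirely and go through a temporary file. The sketch below is an assumption about how that could look, not what gen-api.sh does; sed_in_place is a hypothetical name.

```shell
# Hypothetical helper: apply a sed script to a file in place without relying on "sed -i".
sed_in_place() {
  script="$1"
  file="$2"
  tmp=$(mktemp) || return 1
  # Write the edited output to a temp file, then copy it back over the original.
  # Using "cat >" keeps the original file's permissions.
  if sed "$script" "$file" > "$tmp"; then
    cat "$tmp" > "$file"
    rm -f "$tmp"
  else
    rm -f "$tmp"
    return 1
  fi
}
```

For example, sed_in_place "s/newTag: .*/newTag: $TAG/" path/to/kustomization.yaml would behave the same with GNU and BSD sed (the path here is a placeholder).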

Comment thread docs/release/README.md
1. Re-validates version and manifest tags.
2. Builds and validates Python package artifacts.
3. Publishes the package to PyPI (`kubeflow-trainer-api`).
4. Creates release branch `release-<version-without-v>` if it does not exist.
Copilot AI Mar 3, 2026

The workflow creates branches named release-<major>.<minor> (e.g., release-2.1), but this doc says release-<version-without-v> which reads like release-2.1.0; update the wording to match the actual branch naming logic.

Suggested change
4. Creates release branch `release-<version-without-v>` if it does not exist.
4. Creates release branch `release-<major>.<minor>` (for example, `release-2.1`) if it does not exist.

Comment thread cliff.toml
Comment on lines +76 to +78
# Only stable release tags
tag_pattern = "^v?[0-9]+\\.[0-9]+\\.[0-9]+$"
ignore_tags = ".*-(alpha|beta|rc).*"
Copilot AI Mar 3, 2026

tag_pattern + ignore_tags currently exclude -rc.* tags, so generating changelogs for successive RCs will likely diff against the last stable tag (and repeat entries) instead of the previous RC; include RC tags in tag discovery (or use a separate RC config) so RC-to-RC changelogs are incremental.

Suggested change
# Only stable release tags
tag_pattern = "^v?[0-9]+\\.[0-9]+\\.[0-9]+$"
ignore_tags = ".*-(alpha|beta|rc).*"
# Stable and RC release tags (ignore alpha/beta)
tag_pattern = "^v?[0-9]+\\.[0-9]+\\.[0-9]+(-[0-9A-Za-z.]+)?$"
ignore_tags = ".*-(alpha|beta).*"
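
To see which tag names the suggested patterns would admit, the two regexes can be tried with grep -E. This sketch only checks the regexes against sample names; it does not reproduce how git-cliff itself applies tag_pattern and ignore_tags, and tag_is_considered is a hypothetical name.

```shell
# The two regexes from the suggested change, with the TOML escaping removed.
TAG_PATTERN='^v?[0-9]+\.[0-9]+\.[0-9]+(-[0-9A-Za-z.]+)?$'
IGNORE_PATTERN='.*-(alpha|beta).*'

# Hypothetical helper: succeed when a tag matches TAG_PATTERN and not IGNORE_PATTERN.
tag_is_considered() {
  printf '%s\n' "$1" | grep -Eq "$TAG_PATTERN" || return 1
  if printf '%s\n' "$1" | grep -Eq "$IGNORE_PATTERN"; then
    return 1
  fi
  return 0
}
```

Under these regexes, stable tags such as v2.1.0 and RC tags such as v2.1.0-rc.1 pass, while v2.1.0-alpha.1 is filtered out.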

Comment thread hack/release.sh
Comment on lines +106 to +108
docker run --rm -u "$(id -u):$(id -g)" -v "$ABSOLUTE_REPO_ROOT:/app" \
-e "GITHUB_TOKEN=$GITHUB_TOKEN" -w /app \
"ghcr.io/orhun/git-cliff/git-cliff:latest" --unreleased --tag "$TAG" -o - > "$TEMP_FILE"
Copilot AI Mar 3, 2026

The docker run invocation uses the third-party image ghcr.io/orhun/git-cliff/git-cliff:latest in the release script with access to the repository workspace and GITHUB_TOKEN, but the image is only pinned to the mutable latest tag. If this external image is ever compromised or replaced, an attacker controlling it can exfiltrate GITHUB_TOKEN and tamper with release artifacts or tags when maintainers run the release tooling. Prefer pinning this dependency to an immutable reference (e.g., a specific version tag plus digest) or hosting a vetted image/binary under the Kubeflow project to reduce supply chain compromise risk.

@andreyvelich
Member

@milinddethe15 @Krishna-kg732 Please can we finalize this PR to automate Trainer releases?

@Krishna-kg732
Contributor

> @milinddethe15 @Krishna-kg732 Please can we finalize this PR to automate Trainer releases?

Hey @andreyvelich, apologies for the late reply; I was busy with uni tests over the previous weeks.

I'll get these addressed and set up a test release to validate the full flow. If it's faster to pick up @milinddethe15's PR instead, I'm totally fine with that too; just let me know how we'd like to proceed.

@andreyvelich
Member

> I'll get these addressed and set up a test release to validate the full flow. If it's faster to pick up @milinddethe15's PR instead, I'm totally fine with that too; just let me know how we'd like to proceed.

If you can commit to the @milinddethe15 branch directly, that might be easier to move forward.

@milinddethe15
Member Author

This PR was ready for review as I remember.

  • Rebasing it to master and testing it e2e is pending.

Krishna, let me know if you want to help; otherwise I am happy to continue with this.

@Krishna-kg732
Contributor

> This PR was ready for review as I remember.
>
>   • Rebasing it to master and testing it e2e is pending.
>
> Krishna let me know if you want to help, else I am happy to continue on this.

Yup, sounds good. Let's continue with this PR.

@andreyvelich
Member

@milinddethe15 @Krishna-kg732 Is this PR ready?

@milinddethe15
Member Author

@andreyvelich I will test the entire release flow once and update you next week.

@Krishna-kg732
Contributor

Krishna-kg732 commented Apr 30, 2026

Hey @milinddethe15, this looks great. Could you please link the test release here so we can move ahead with this PR?

@andreyvelich
Member

@Krishna-kg732 @milinddethe15 Did you test the release in your local branch?

We would like to release 2.1.1 with a hot fix soon: #3489, and having automation would be nice to test.
cc @tenzen-y @mimowo @kaisoz

@milinddethe15
Member Author

I need to test it against the latest master branch. I’ll try to do that next week.

@andreyvelich
Member

> I need to test it against the latest master branch. I’ll try to do that next week.

Sure, sounds good! That PR has been open for quite some time, so if @Krishna-kg732 could help you to test it, that would be great!

@Krishna-kg732
Contributor

Hey @andreyvelich, yup, I will help with the test release for this.

Development

Successfully merging this pull request may close these issues.

Improve Kubeflow Trainer release process

6 participants