Conversation
There was a problem hiding this comment.
Pull request overview
This PR bumps the prebuilt tokenizers native library version from 1.23.0 to 1.26.0 in build scripts and release CI so it matches the github.com/daulet/tokenizers v1.26.0 version in go.mod.
Changes:
- Updated
TOKENIZERS_VERSIONin Linux and macOS DMG build scripts to1.26.0. - Updated GitHub Actions release workflow tokenizers cache keys to
v1.26.0to ensure correct cache invalidation/use.
Reviewed changes
Copilot reviewed 3 out of 3 changed files in this pull request and generated 4 comments.
| File | Description |
|---|---|
src/scripts/build_linux.sh |
Updates the tokenizers download version for Linux builds. |
src/scripts/build_dmg.sh |
Updates the tokenizers download version for macOS DMG builds. |
.github/workflows/release.yml |
Updates tokenizers cache keys to align CI caching with the bumped version. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
src/scripts/build_linux.sh
Outdated
| echo "-------------------------------------------------------" | ||
|
|
||
| TOKENIZERS_VERSION="1.23.0" | ||
| TOKENIZERS_VERSION="1.26.0" |
There was a problem hiding this comment.
TOKENIZERS_VERSION is hardcoded here, which requires manual updates to stay in sync with go.mod and CI workflows. Consider allowing an environment override (e.g., TOKENIZERS_VERSION=${TOKENIZERS_VERSION:-...}) and/or deriving the version from go.mod to prevent future drift.
src/scripts/build_dmg.sh
Outdated
| echo "--------------------------------------------------------" | ||
|
|
||
| TOKENIZERS_VERSION="1.23.0" | ||
| TOKENIZERS_VERSION="1.26.0" |
There was a problem hiding this comment.
TOKENIZERS_VERSION is hardcoded in the script, so it can drift from go.mod/CI unless every place is updated together. Consider supporting an env override and/or sourcing the version from a single canonical place (e.g., parse go.mod) to reduce maintenance overhead.
| - name: Cache tokenizers library | ||
| uses: actions/cache@v5 | ||
| with: | ||
| path: build/tokenizers/libtokenizers.a | ||
| key: ${{ runner.os }}-tokenizers-v1.23.0-prebuilt-v2 | ||
| key: ${{ runner.os }}-tokenizers-v1.26.0-prebuilt-v1 | ||
|
|
There was a problem hiding this comment.
Same as above: the cache key hardcodes the tokenizers version in multiple places. Consider reusing a single env var for the version to keep cache keys consistent and avoid missing one occurrence in future bumps.
a82278f to
e89e3bf
Compare
There was a problem hiding this comment.
Pull request overview
Copilot reviewed 3 out of 3 changed files in this pull request and generated 2 comments.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
src/scripts/build_linux.sh
Outdated
| TOKENIZERS_VERSION=$(grep 'github.com/daulet/tokenizers' "$PROJECT_ROOT/go.mod" | awk '{print $2}' | sed 's/^v//') | ||
| TOKENIZERS_PLATFORM="linux-amd64" | ||
| TOKENIZERS_FILE="libtokenizers.${TOKENIZERS_PLATFORM}.tar.gz" | ||
| TOKENIZERS_URL="https://github.com/daulet/tokenizers/releases/download/v${TOKENIZERS_VERSION}/${TOKENIZERS_FILE}" |
There was a problem hiding this comment.
TOKENIZERS_VERSION is derived via a grep | awk | sed pipeline, but the script only uses set -e (no pipefail). If grep ever fails to match (or go.mod is missing), the pipeline can still exit 0 and set TOKENIZERS_VERSION to an empty string, leading to an invalid download URL and a harder-to-debug failure later. Consider enabling set -o pipefail (or set -euo pipefail) and/or explicitly validating that TOKENIZERS_VERSION is non-empty right after extraction (and failing with a clear error if it is empty).
Use a single awk to parse go.mod, add empty-value checks to fail early, and enable set -euo pipefail in scripts and workflows
There was a problem hiding this comment.
Pull request overview
Copilot reviewed 3 out of 3 changed files in this pull request and generated no new comments.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
hanneshapke
left a comment
There was a problem hiding this comment.
This the build working for you? I think there was a breaking change in a newer tokenizer but it might have gotten fixed. That is why I hard coded the version.
|
Yes it does, the breaking part for me was actually on the frontend which was merged in #314 |
This pull request improves the way the
tokenizersdependency version is determined and used across both CI workflows and local build scripts, ensuring consistency and reducing manual maintenance. The scripts now automatically extract thetokenizersversion fromgo.modrather than relying on hardcoded values, and error handling is added to catch missing or invalid versions.