Skip to content

feat: parse tokenizers version from go.mod and let all the flows and shell scripts use that version#313

Merged
Davidnet merged 3 commits intomainfrom
fix/tokenizers-version
Apr 9, 2026
Merged

feat: parse tokenizers version from go.mod and let all the flows and shell scripts use that version#313
Davidnet merged 3 commits intomainfrom
fix/tokenizers-version

Conversation

@Davidnet
Copy link
Copy Markdown
Member

@Davidnet Davidnet commented Apr 9, 2026

This pull request improves the way the tokenizers dependency version is determined and used across both CI workflows and local build scripts, ensuring consistency and reducing manual maintenance. The scripts now automatically extract the tokenizers version from go.mod rather than relying on hardcoded values, and error handling is added to catch missing or invalid versions.

Copilot AI review requested due to automatic review settings April 9, 2026 16:29
Copy link
Copy Markdown

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR bumps the prebuilt tokenizers native library version from 1.23.0 to 1.26.0 in build scripts and release CI so it matches the github.com/daulet/tokenizers v1.26.0 version in go.mod.

Changes:

  • Updated TOKENIZERS_VERSION in Linux and macOS DMG build scripts to 1.26.0.
  • Updated GitHub Actions release workflow tokenizers cache keys to v1.26.0 to ensure correct cache invalidation/use.

Reviewed changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated 4 comments.

File Description
src/scripts/build_linux.sh Updates the tokenizers download version for Linux builds.
src/scripts/build_dmg.sh Updates the tokenizers download version for macOS DMG builds.
.github/workflows/release.yml Updates tokenizers cache keys to align CI caching with the bumped version.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

echo "-------------------------------------------------------"

TOKENIZERS_VERSION="1.23.0"
TOKENIZERS_VERSION="1.26.0"
Copy link

Copilot AI Apr 9, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

TOKENIZERS_VERSION is hardcoded here, which requires manual updates to stay in sync with go.mod and CI workflows. Consider allowing an environment override (e.g., TOKENIZERS_VERSION=${TOKENIZERS_VERSION:-...}) and/or deriving the version from go.mod to prevent future drift.

Copilot uses AI. Check for mistakes.
echo "--------------------------------------------------------"

TOKENIZERS_VERSION="1.23.0"
TOKENIZERS_VERSION="1.26.0"
Copy link

Copilot AI Apr 9, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

TOKENIZERS_VERSION is hardcoded in the script, so it can drift from go.mod/CI unless every place is updated together. Consider supporting an env override and/or sourcing the version from a single canonical place (e.g., parse go.mod) to reduce maintenance overhead.

Copilot uses AI. Check for mistakes.
Comment on lines 239 to 244
- name: Cache tokenizers library
uses: actions/cache@v5
with:
path: build/tokenizers/libtokenizers.a
key: ${{ runner.os }}-tokenizers-v1.23.0-prebuilt-v2
key: ${{ runner.os }}-tokenizers-v1.26.0-prebuilt-v1

Copy link

Copilot AI Apr 9, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same as above: the cache key hardcodes the tokenizers version in multiple places. Consider reusing a single env var for the version to keep cache keys consistent and avoid missing one occurrence in future bumps.

Copilot uses AI. Check for mistakes.
@Davidnet Davidnet changed the title misc: Bump tokenizers version to 1.26.0 so it matches go.mod fix: Bump tokenizers version to 1.26.0 so it matches go.mod Apr 9, 2026
@Davidnet Davidnet changed the title fix: Bump tokenizers version to 1.26.0 so it matches go.mod fix: bump tokenizers version to 1.26.0 so it matches go.mod Apr 9, 2026
Copy link
Copy Markdown

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 3 out of 3 changed files in this pull request and generated 2 comments.


💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment on lines 35 to 38
TOKENIZERS_VERSION=$(grep 'github.com/daulet/tokenizers' "$PROJECT_ROOT/go.mod" | awk '{print $2}' | sed 's/^v//')
TOKENIZERS_PLATFORM="linux-amd64"
TOKENIZERS_FILE="libtokenizers.${TOKENIZERS_PLATFORM}.tar.gz"
TOKENIZERS_URL="https://github.com/daulet/tokenizers/releases/download/v${TOKENIZERS_VERSION}/${TOKENIZERS_FILE}"
Copy link

Copilot AI Apr 9, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

TOKENIZERS_VERSION is derived via a grep | awk | sed pipeline, but the script only uses set -e (no pipefail). If grep ever fails to match (or go.mod is missing), the pipeline can still exit 0 and set TOKENIZERS_VERSION to an empty string, leading to an invalid download URL and a harder-to-debug failure later. Consider enabling set -o pipefail (or set -euo pipefail) and/or explicitly validating that TOKENIZERS_VERSION is non-empty right after extraction (and failing with a clear error if it is empty).

Copilot uses AI. Check for mistakes.
Use a single awk to parse go.mod, add empty-value checks to fail
early, and enable set -euo pipefail in scripts and workflows
@Davidnet Davidnet requested a review from Copilot April 9, 2026 17:12
@Davidnet Davidnet requested a review from hanneshapke April 9, 2026 17:14
@Davidnet Davidnet changed the title fix: bump tokenizers version to 1.26.0 so it matches go.mod feat: parse tokenizers version from go.mod and let all the flows and shell scripts use that version Apr 9, 2026
Copy link
Copy Markdown

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 3 out of 3 changed files in this pull request and generated no new comments.


💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Copy link
Copy Markdown
Collaborator

@hanneshapke hanneshapke left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This the build working for you? I think there was a breaking change in a newer tokenizer but it might have gotten fixed. That is why I hard coded the version.

@Davidnet
Copy link
Copy Markdown
Member Author

Davidnet commented Apr 9, 2026

Yes it does, the breaking part for me was actually on the frontend which was merged in #314

@Davidnet Davidnet merged commit 82cff2a into main Apr 9, 2026
14 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants