Skip to content

Split optional hf-hub loading feature#36

Merged
Pringled merged 4 commits intoMinishLab:mainfrom
jolestar:feature-split-hf-hub
Apr 1, 2026
Merged

Split optional hf-hub loading feature#36
Pringled merged 4 commits intoMinishLab:mainfrom
jolestar:feature-split-hf-hub

Conversation

@jolestar
Copy link
Copy Markdown
Contributor

@jolestar jolestar commented Apr 1, 2026

This keeps the default behavior unchanged while separating remote model downloads from the tokenizer backend features.

Changes:

  • add an explicit optional hf-hub feature and keep it in the default feature set
  • stop pulling tokenizers/http through onig and fancy-regex
  • gate Hugging Face Hub imports and remote loading behind hf-hub
  • return a clear error when from_pretrained() is given a non-local path without hf-hub enabled

Validation:

  • cargo +stable build --verbose
  • cargo +stable test --verbose
  • cargo +stable build --no-default-features --features fancy-regex --verbose
  • cargo +stable test --no-default-features --features fancy-regex --verbose
  • cargo +stable clippy --all-targets --all-features -- -D warnings
  • cargo +stable fmt --all -- --check
  • cargo +stable build --no-default-features --features onig --verbose

Copilot AI review requested due to automatic review settings April 1, 2026 09:03
Copy link
Copy Markdown

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR splits Hugging Face Hub remote model downloading into an explicit hf-hub feature while keeping current default behavior intact, and removes unintended coupling that pulled HTTP support via tokenizer backend features.

Changes:

  • Add an explicit optional hf-hub feature (kept enabled by default) and make the hf-hub dependency optional.
  • Remove tokenizers/http from the onig and fancy-regex feature sets so HTTP isn’t implicitly enabled through those backends.
  • Gate hf_hub imports and remote from_pretrained() downloads behind the hf-hub feature, returning a clear error when disabled.

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 2 comments.

File Description
src/model.rs Conditionalize hf_hub usage and remote loading; add a feature-gated error for non-local paths without hf-hub.
Cargo.toml Introduce hf-hub feature + make dependency optional; adjust feature sets to stop pulling tokenizers/http.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

src/model.rs Outdated
Comment on lines 385 to 388
#[cfg(feature = "hf-hub")]
if let Some(tok) = token {
env::set_var("HF_HUB_TOKEN", tok);
}
Copy link

Copilot AI Apr 1, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

env::set_var("HF_HUB_TOKEN", ...) mutates the process-wide environment and leaves the token set after from_pretrained() returns. This can leak credentials to other parts of the process (or child processes) and can create surprising behavior/races if multiple threads load models with different tokens. Consider scoping this to the remote-loading branch and restoring the previous value after the hub downloads (or using an hf-hub API that accepts a token directly, if available).

Copilot uses AI. Check for mistakes.
Comment on lines +416 to +420
#[cfg(not(feature = "hf-hub"))]
{
return Err(anyhow!(
"remote model downloads require the `hf-hub` feature; pass a local model directory instead"
));
Copy link

Copilot AI Apr 1, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The new #[cfg(not(feature = "hf-hub"))] remote-path error is a user-facing behavior change, but there doesn't appear to be a test asserting this failure mode/message when building without hf-hub. Add an integration test that runs under --no-default-features and verifies from_pretrained("some/repo", ..) returns this error so the gating behavior stays stable.

Copilot uses AI. Check for mistakes.
@Pringled
Copy link
Copy Markdown
Member

Pringled commented Apr 1, 2026

Hey @jolestar, thanks for adding this! This is very useful, LGTM

@Pringled Pringled merged commit d6cdc4c into MinishLab:main Apr 1, 2026
2 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants