Split optional hf-hub loading feature#36
Conversation
There was a problem hiding this comment.
Pull request overview
This PR splits Hugging Face Hub remote model downloading into an explicit hf-hub feature while keeping current default behavior intact, and removes unintended coupling that pulled HTTP support via tokenizer backend features.
Changes:
- Add an explicit optional
hf-hubfeature (kept enabled by default) and make thehf-hubdependency optional. - Remove
tokenizers/httpfrom theonigandfancy-regexfeature sets so HTTP isn’t implicitly enabled through those backends. - Gate
hf_hubimports and remotefrom_pretrained()downloads behind thehf-hubfeature, returning a clear error when disabled.
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 2 comments.
| File | Description |
|---|---|
src/model.rs |
Conditionalize hf_hub usage and remote loading; add a feature-gated error for non-local paths without hf-hub. |
Cargo.toml |
Introduce hf-hub feature + make dependency optional; adjust feature sets to stop pulling tokenizers/http. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
src/model.rs
Outdated
| #[cfg(feature = "hf-hub")] | ||
| if let Some(tok) = token { | ||
| env::set_var("HF_HUB_TOKEN", tok); | ||
| } |
There was a problem hiding this comment.
env::set_var("HF_HUB_TOKEN", ...) mutates the process-wide environment and leaves the token set after from_pretrained() returns. This can leak credentials to other parts of the process (or child processes) and can create surprising behavior/races if multiple threads load models with different tokens. Consider scoping this to the remote-loading branch and restoring the previous value after the hub downloads (or using an hf-hub API that accepts a token directly, if available).
| #[cfg(not(feature = "hf-hub"))] | ||
| { | ||
| return Err(anyhow!( | ||
| "remote model downloads require the `hf-hub` feature; pass a local model directory instead" | ||
| )); |
There was a problem hiding this comment.
The new #[cfg(not(feature = "hf-hub"))] remote-path error is a user-facing behavior change, but there doesn't appear to be a test asserting this failure mode/message when building without hf-hub. Add an integration test that runs under --no-default-features and verifies from_pretrained("some/repo", ..) returns this error so the gating behavior stays stable.
|
Hey @jolestar, thanks for adding this! This is very useful, LGTM |
This keeps the default behavior unchanged while separating remote model downloads from the tokenizer backend features.
Changes:
hf-hubfeature and keep it in the default feature settokenizers/httpthroughonigandfancy-regexhf-hubfrom_pretrained()is given a non-local path withouthf-hubenabledValidation: