Add minimal wasm and local-only feature support #37
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -145,6 +145,36 @@ cargo build --release | |
| * **Batch Processing:** Encodes multiple sentences in batches. | ||
| * **Configurable Encoding:** Allows customization of maximum sequence length and batch size during encoding. | ||
|
|
||
| ### Feature flags | ||
|
|
||
| The crate exposes a few feature combinations for different runtimes: | ||
|
|
||
| * `default`: native build with `onig` tokenization and optional Hugging Face Hub downloads | ||
| * `fancy-regex`: alternative tokenizer backend for native builds | ||
| * `local-only`: disable remote model downloads and restrict loading to local paths or `from_bytes(...)` | ||
| * `wasm`: minimal WebAssembly-oriented feature set for in-memory loading via `from_bytes(...)` | ||
|
|
||
|
Comment on lines +150 to +156
|
||
| Typical invocations are: | ||
|
|
||
| * native local-only build: | ||
| `cargo build --no-default-features --features onig,local-only` | ||
| * wasm check: | ||
| `RUSTFLAGS='--cfg getrandom_backend="wasm_js"' cargo check --no-default-features --features wasm --target wasm32-unknown-unknown` | ||
|
|
||
| The `wasm` feature is intended for `wasm32-unknown-unknown` builds that load models | ||
| from in-memory bytes, for example after fetching assets over HTTP or embedding them | ||
| into the binary. Direct filesystem access is usually not available in browser-style | ||
| WebAssembly environments, so callers should pass file contents through `from_bytes(...)`. | ||
| Remote Hugging Face downloads are not available in this mode. | ||
|
|
||
| For `wasm32-unknown-unknown`, `getrandom` also requires a target-specific backend | ||
| configuration. The minimal check command is: | ||
|
|
||
| ```bash | ||
| RUSTFLAGS='--cfg getrandom_backend="wasm_js"' \ | ||
| cargo check --no-default-features --features wasm --target wasm32-unknown-unknown | ||
| ``` | ||
|
|
||
| ## What is Model2Vec? | ||
|
|
||
| Model2Vec is a technique to distill large sentence transformer models into highly efficient static embedding models. This process significantly reduces model size and computational requirements for inference. For a detailed understanding of how Model2Vec works, including the distillation process and model training, please refer to the [main Model2Vec Python repository](https://github.com/MinishLab/model2vec) and its [documentation](https://github.com/MinishLab/model2vec/blob/main/docs/what_is_model2vec.md). | ||
|
|
||
`local-only` is currently a marker feature with no code-level effect: the remote-download path is gated only by `feature = "hf-hub"` (see `src/model.rs:380+`). As a result, enabling `local-only` alongside `hf-hub` still allows remote model downloads, contradicting the README wording. Consider enforcing the restriction in code (e.g., disable/compile-error the remote branch when `local-only` is enabled) and adding a test that verifies remote `from_pretrained` fails under `local-only` even if `hf-hub` is on.
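A minimal sketch of what that enforcement could look like. Only the feature names `local-only` and `hf-hub` come from this PR. `ModelSource`, `remote_allowed`, `REMOTE_ALLOWED`, and `resolve` are hypothetical names chosen for illustration and are not taken from `src/model.rs`. Keeping the rule in a plain function of two booleans lets a test cover the `local-only` + `hf-hub` combination without rebuilding under different feature sets.

```rust
// Hypothetical sketch: none of these items are taken from the crate's source.

// Stricter alternative: reject the feature combination at compile time.
// This item compiles away unless both features are enabled.
#[cfg(all(feature = "local-only", feature = "hf-hub"))]
compile_error!("`local-only` and `hf-hub` cannot be enabled together");

/// Where a model would be loaded from.
#[derive(Debug, PartialEq)]
pub enum ModelSource {
    LocalPath(String),
    Remote(String),
}

/// Pure policy rule: remote downloads require `hf-hub` and are vetoed by `local-only`.
pub const fn remote_allowed(hf_hub: bool, local_only: bool) -> bool {
    hf_hub && !local_only
}

/// Policy for the current build, derived from the enabled Cargo features.
pub const REMOTE_ALLOWED: bool =
    remote_allowed(cfg!(feature = "hf-hub"), cfg!(feature = "local-only"));

/// Resolve `spec` to a source: prefer a local path, and fall back to a
/// remote repository id only when `remote_ok` is true.
pub fn resolve(spec: &str, exists_locally: bool, remote_ok: bool) -> Result<ModelSource, String> {
    if exists_locally {
        Ok(ModelSource::LocalPath(spec.to_string()))
    } else if remote_ok {
        Ok(ModelSource::Remote(spec.to_string()))
    } else {
        Err(format!(
            "`{spec}` is not a local path and remote downloads are disabled"
        ))
    }
}
```

In this sketch a loader would pass `REMOTE_ALLOWED` as `remote_ok`, so the remote branch is rejected whenever `local-only` is on, regardless of `hf-hub`. The `compile_error!` variant is the blunter option. Because Cargo unifies features across the dependency graph, a hard error on the combination can break downstream builds, so the runtime veto may be the safer default.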