WebGPU + ONNX provider: Qwen 3 0.6B in-browser (7.4.0) by esokullu · Pull Request #66 · webbrain-one/webbrain

esokullu · 2026-05-22T12:55:51Z

Summary

Adds a new `webgpu` provider type: a fourth "local" provider that runs Qwen 3 0.6B entirely in the browser via WebGPU + ONNX (`@huggingface/transformers`). Unlike llama.cpp / Ollama / LM Studio, it needs nothing installed: model weights download from HuggingFace on first use (~500MB q4), are cached in IndexedDB, and inference runs in the extension's offscreen document on the user's GPU.

Version bump 7.3.1 → 7.4.0.

Why

The other "local" providers all require the user to install + run a separate server. That's a real onboarding cliff. WebGPU + ONNX gives us a zero-install local option — useful for trying out webbrain without committing to a heavier setup, and as a privacy-preserving fallback when a user just wants to ask quick questions about a page without their data hitting any third party.

Architecture

```
service worker (background)
  └─ WebGPUProvider.chat() ──→ chrome.runtime.sendMessage
                                          │
offscreen document ←──────────────────────┘
  └─ @huggingface/transformers
       pipeline('text-generation', 'onnx-community/Qwen3-0.6B-ONNX',
                { device: 'webgpu', dtype: 'q4' })
```

Service workers have no WebGPU; the offscreen document does. We reuse the existing offscreen doc (already hosting the local-network fetch proxy) and add two new message handlers: `webgpu-chat` and `webgpu-probe`.
Pipeline is loaded lazily on the first chat call and cached for the offscreen doc's lifetime. Previous pipeline is disposed before loading a new model to avoid OOM on integrated GPUs.

Tool use

Enabled. Qwen 3's chat template knows how to render `tools=[...]` into the system prompt; the model emits `<tool_call>{...}</tool_call>` blocks. Offscreen.js parses them back into OpenAI-format `tool_calls` so webbrain's loop detector + dispatch treat WebGPU exactly like any other provider.

Reliability at 0.6B is mixed — this is small-model territory. A follow-up UI hint should nudge users toward Ask mode (similar to how we handle the existing small-model warnings).

Streaming

Deferred. v1 returns the full response in one shot. The 0.6B model finishes a normal turn in seconds, so this is acceptable; the background↔offscreen chunked-message router is the kind of plumbing that wants its own PR. A comment in `chatStream()` marks the upgrade target.

Library vendoring

`@huggingface/transformers` is ~5MB JS + ~30MB ONNX-runtime-web WASM. Too big to commit. `src/chrome/vendor/transformers/README.md` documents the one-command vendoring flow:

```bash
npm install @huggingface/transformers
cp node_modules/@huggingface/transformers/dist/transformers.min.js \
   src/chrome/vendor/transformers/
# ...plus the matching ort-wasm-simd-threaded.* files
```

The provider fails fast with a clear "library not vendored" message if the file is missing, so the failure mode is obvious to anyone testing.

`.gitignore` excludes `*.js`/`*.wasm`/`*.mjs` inside the vendor dirs so an accidental `git add .` doesn't commit 30MB of WASM. The README stays tracked.

Reviewer notes

Default `enabled: false` because the first-run download is ~500MB. We don't want to auto-burn bandwidth on extension install.
Firefox: stub. Firefox has no `browser.offscreen` and its extension-context WebGPU exposure is its own can of worms (gated, prefs-only on release at the time of writing). Stub fails fast; config stays so the categorization parity test passes. Real Firefox implementation is its own future PR.
CSP unchanged. Existing `script-src 'self' 'wasm-unsafe-eval'; connect-src *` already allows everything the library needs: same-origin imports, WASM eval, and HuggingFace Hub fetches.
Manifest unchanged. Vendor dir is same-origin with the offscreen doc, no `web_accessible_resources` needed.
Tool-call parsing is text-format, not structured JSON output. Small Qwen 3 models can produce slightly malformed blocks; `extractToolCalls()` runs `JSON.parse` on each block inside a try/catch and silently drops the malformed ones rather than crashing the turn.
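The parsing described in the last note can be sketched roughly as follows. This is a minimal illustration, not the code in offscreen.js: it assumes each block carries a JSON object with `name` and `arguments` keys, and the `call_N` id scheme is hypothetical.

```javascript
// Sketch only: the real extractToolCalls() in offscreen.js may differ.
// Assumes each block looks like {"name": "...", "arguments": {...}}.
const TOOL_CALL_RE = /<tool_call>([\s\S]*?)<\/tool_call>/g;

function extractToolCalls(text) {
  const toolCalls = [];
  for (const match of text.matchAll(TOOL_CALL_RE)) {
    try {
      const parsed = JSON.parse(match[1].trim());
      if (typeof parsed.name !== 'string') continue; // not a usable call
      toolCalls.push({
        id: `call_${toolCalls.length}`, // hypothetical id scheme
        type: 'function',
        function: {
          name: parsed.name,
          // OpenAI-format arguments are a JSON string, not an object.
          arguments: JSON.stringify(parsed.arguments ?? {}),
        },
      });
    } catch {
      // Malformed block from the small model: drop it, keep the turn alive.
    }
  }
  // Whatever remains after removing the blocks is the plain-text reply.
  const content = text.replace(TOOL_CALL_RE, '').trim();
  return { content, toolCalls: toolCalls.length ? toolCalls : undefined };
}
```

Returning `undefined` when no call parses lets a caller use a plain truthiness check on `toolCalls`, which matches the `result.toolCalls ? 'tool_call' : 'text'` pattern visible in the diff.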

Test plan

Tests pass (`node test/run.js` → 130/130, 4 new for the webgpu provider).
Open Chrome → Settings → Providers. Filter to "Local" → webgpu_qwen3 card visible alongside llama.cpp / Ollama / LM Studio.
Vendor `@huggingface/transformers` per the README (one-time setup).
Click webgpu card → "Test Connection" → reports library version + WebGPU availability without downloading model weights.
Set webgpu_qwen3 active → ask a simple Ask-mode question on any page. First run downloads the model (~500MB, about a minute on a fast connection); subsequent runs reuse the cached weights and skip the download.
Switch to Act mode → try a simple click task. Verify tool-call parsing produces a sensible `click_ax` or similar.
Switch active provider away from webgpu → memory freed (pipeline disposed on next webgpu chat for a different model, or after page close).
Firefox: install Firefox build → webgpu_qwen3 card visible, "Test Connection" reports "not yet supported on Firefox".

🤖 Generated with Claude Code

Adds a fourth "local" provider alongside llama.cpp / Ollama / LM Studio. Unlike those, this one needs nothing installed: model weights download from HuggingFace on first use (~500MB for q4 Qwen 3 0.6B), are cached in IndexedDB by @huggingface/transformers, and inference runs on the user's GPU in the extension's offscreen document.

Architecture:

    service worker
      └─ WebGPUProvider.chat() ──┐
                                 ▼ chrome.runtime.sendMessage
    offscreen document
      └─ @huggingface/transformers
           pipeline('text-generation', 'onnx-community/Qwen3-0.6B-ONNX',
                    { device: 'webgpu', dtype: 'q4' })

Service workers have no WebGPU; the offscreen document does. We reuse the existing offscreen doc (already hosting the local-network fetch proxy) and add new message handlers `webgpu-chat` and `webgpu-probe`.

Tool use: enabled. Qwen 3's chat template renders `tools=[...]` into the system prompt and the model emits `<tool_call>{...}</tool_call>` blocks; offscreen.js parses them into OpenAI-format tool_calls so the agent's loop detector / dispatch see WebGPU exactly like any other provider. Reliability at 0.6B is mixed; the settings card will nudge users toward Ask mode in a follow-up.

Streaming: v1 returns the full response (no per-token streaming yet). The 0.6B model finishes a normal turn in seconds; the round-trip-and-yield simplification let us ship the provider without first solving the background↔offscreen chunked-message router. Comment in webgpu.js's chatStream() flags the upgrade target.

Default-disabled (`enabled:false`) because the first-run download is substantial and the library has to be vendored locally; see src/chrome/vendor/transformers/README.md for the one-command vendoring flow. The provider returns a clear "library not vendored" error when the file is missing, so the failure mode is obvious.

Firefox: stub that fails fast with "not yet supported on Firefox". Firefox doesn't have browser.offscreen and its extension-context WebGPU exposure is its own can of worms; wiring those is its own future PR. Stub stays so the categorization parity test stays green.

Tests: 4 new (130 total, all passing). webgpu provider present + local + disabled by default; no network fields (truly in-browser); _createProvider wires the right class; chrome/firefox provider sets stay in sync. The actual chat() path can't be exercised in Node (no chrome.offscreen, no WebGPU), but the wiring + classification can.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

vercel · 2026-05-22T12:55:57Z

The latest updates on your projects. Learn more about Vercel for GitHub.

Project	Deployment	Actions	Updated (UTC)
webbrain	Ready	Preview, Comment	May 26, 2026 3:20pm

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: c7ed0192e5

chatgpt-codex-connector · 2026-05-22T12:59:30Z

+      type: result.toolCalls ? 'tool_call' : 'text',
+      content: result.content,
+      toolCalls: result.toolCalls,


Emit tool calls in streaming chunk content

When chatStream() returns a tool call, this chunk sets content to result.content (text) instead of the tool-call array. The streaming agent path (processMessageStream) iterates chunk.content as tool-call deltas, so WebGPU tool calls are dropped/misparsed and Act-mode tool execution fails whenever the chat_stream workflow is used.
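One way to address this, sketched under assumptions: the chunk shape (`type`, `content`) is inferred from this review comment only, and `toStreamChunk` / `chatStreamOneShot` are hypothetical names rather than functions in the PR.

```javascript
// Hypothetical helper: the chunk shape is inferred from the review comment,
// not taken from processMessageStream itself.
function toStreamChunk(result) {
  const hasToolCalls =
    Array.isArray(result.toolCalls) && result.toolCalls.length > 0;
  return hasToolCalls
    ? { type: 'tool_call', content: result.toolCalls } // array the agent iterates
    : { type: 'text', content: result.content ?? '' };
}

// v1 has no per-token streaming: run the whole turn, then yield one chunk.
// `chat` is any async function returning { content, toolCalls }.
async function* chatStreamOneShot(chat, messages) {
  yield toStreamChunk(await chat(messages));
}
```

With this shape a consumer that iterates `chunk.content` on `tool_call` chunks receives the call array itself instead of the reply text.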

chatgpt-codex-connector · 2026-05-22T12:59:30Z

+}
+
+async function getPipeline(modelId, dtype, device) {
+  if (_activePipeline && _activeModelId === modelId) return _activePipeline;


Rebuild cached pipeline when dtype or device changes

The pipeline cache key only checks modelId, but users can edit dtype and device in provider settings. After one successful load, changing quantization/backend (for example q4→q8 or webgpu→wasm) will silently keep using the old pipeline, so configuration changes do not take effect until the offscreen document is recreated.
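A possible shape for the fix, as a sketch rather than the PR's actual change: key the cache on all three settings and dispose the old pipeline before loading a new one. `createPipelineCache` and the injected `loadPipeline` are hypothetical; in the extension the loader would presumably wrap the `pipeline('text-generation', modelId, { device, dtype })` call.

```javascript
// Sketch: cache one pipeline, keyed on modelId + dtype + device.
// `loadPipeline` is injected so the logic can run without WebGPU.
function createPipelineCache(loadPipeline) {
  let activeKey = null;
  let activePipeline = null;

  return async function getPipeline(modelId, dtype, device) {
    const key = JSON.stringify([modelId, dtype, device]);
    if (activePipeline && activeKey === key) return activePipeline;

    // Dispose first so two models never sit in GPU memory at the same time.
    if (activePipeline && typeof activePipeline.dispose === 'function') {
      await activePipeline.dispose();
    }
    activePipeline = null;
    activeKey = null;

    activePipeline = await loadPipeline(modelId, dtype, device);
    activeKey = key;
    return activePipeline;
  };
}
```

Clearing the cached reference before the new load means a failed load leaves the cache empty instead of pointing at a disposed pipeline.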

The first time a user picks the WebGPU provider, ~500MB of Qwen 3 weights pull from HF Hub: a ~30-60s wait that the existing UI gives no hint of. It renders as a frozen "thinking…" spinner, which is indistinguishable from a hang.

Add a progress card at the top of the messages container:

    ┌──────────────────────────────────────────────────┐
    │ Downloading onnx-community/Qwen3-0.6B-ONNX —     │
    │ 142 / 487 MB                                     │
    │ █████████░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░  │
    │ onnx/model_q4.onnx                               │
    └──────────────────────────────────────────────────┘

- Aggregates loaded/total across files (the model has ~8 parallel downloads: weights, tokenizer, config, etc.).
- Bar fills to 100% + flips green on the 'ready' event, then auto-dismisses ~1.8s later so the user sees confirmation.
- Throttled to one progress update per file per 200ms so the message channel doesn't drown in callbacks.
- Fire-and-forget broadcast from offscreen → sidepanel (.catch swallows "no listener" errors when no panel is open).

Firefox side has the same listener + renderer for parity, even though the Firefox WebGPU provider itself is still stubbed; once the Firefox path is wired up, the progress UI is ready.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
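The aggregation and throttling rules above could look roughly like this. It is a sketch: `createProgressAggregator` is a hypothetical name, and the event shape (`file`, `loaded`, `total`) is assumed to match what the library's progress callback reports.

```javascript
// Sketch: sum loaded/total across files, emit at most once per file per interval.
// `emit` receives the aggregate; `now` is injectable so the throttle is testable.
function createProgressAggregator(emit, { intervalMs = 200, now = Date.now } = {}) {
  const files = new Map(); // file name -> { loaded, total, lastEmit }

  return function onProgress({ file, loaded, total }) {
    const entry = files.get(file) ?? { loaded: 0, total: 0, lastEmit: -Infinity };
    entry.loaded = loaded;
    entry.total = total;
    files.set(file, entry); // always record, even when the emit is throttled

    const t = now();
    if (t - entry.lastEmit < intervalMs) return; // per-file throttle
    entry.lastEmit = t;

    let sumLoaded = 0;
    let sumTotal = 0;
    for (const f of files.values()) {
      sumLoaded += f.loaded;
      sumTotal += f.total;
    }
    emit({
      file, // most recently updated file, shown under the bar
      loaded: sumLoaded,
      total: sumTotal,
      percent: sumTotal ? Math.round((100 * sumLoaded) / sumTotal) : 0,
    });
  };
}
```

Recording the byte counts before the throttle check keeps the totals accurate even when most callbacks are suppressed.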

Commits the three files the WebGPU provider needs:

    src/chrome/vendor/transformers/
    ├─ transformers.web.min.js           (422 KB, browser ESM bundle)
    ├─ ort-wasm-simd-threaded.jsep.mjs   ( 46 KB, WASM loader shim)
    └─ ort-wasm-simd-threaded.jsep.wasm  ( 25 MB, WebGPU ONNX runtime)

Yes, the .wasm is 25MB. It's the cost of shipping a real local LLM runtime; there's no smaller variant that does WebGPU. The trade-off: the extension grows from ~2MB to ~28MB on disk, in exchange for a provider that works straight from `git clone` with zero per-developer setup. Previous behaviour was "vendor the library yourself per the README", which is realistic for a 1-person team and friction for any larger group.

Implementation details:

- We vendor the .web.min.js variant (not the dual ESM/CJS transformers.min.js or the Node builds). Smaller, browser-only, matches our actual import path.
- env.backends.onnx.wasm.wasmPaths is pinned to the vendor dir's chrome-extension:// URL. Without this the loader resolves the WASM path relative to transformers.web.min.js's URL, which happens to work today because they're siblings, but only by accident. Setting it explicitly makes the wiring obvious and survives future re-vendoring at different paths. Wrapped in try/catch so library shape changes between versions fall back to default resolution.
- The CPU-fallback WASM variants (.wasm / .asyncify.wasm / .jspi.wasm) are intentionally NOT vendored; a system without WebGPU gets a clear "WebGPU not available" error instead. Saves ~40MB of WASM we don't use. Add them later if CPU fallback becomes a real ask.
- Firefox vendor dir stays empty (gitignored): the Firefox WebGPU provider is still a stub, so there's no point shipping 25MB of WASM it doesn't reach. Comment in .gitignore flags this for whoever wires the Firefox path next.
- package.json now lists @huggingface/transformers as a regular dep (not devDep). Semantically wrong for an ESM file we commit, but useful: `npm install` keeps the version pinned for whoever needs to update the vendored files later. The README documents the update flow.

The README in the vendor dir reflects the new "it's checked in" reality and explains the update procedure for next time.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

specifier)

Real bug shipped with the previous "vendor the library" commit: transformers.web.min.js contains a dynamic `import("onnxruntime-web/webgpu")`, a bare module specifier. The browser can't resolve bare specifiers without a build step or an import map, and the WebGPU provider failed on first chat with:

    Failed to resolve module specifier "onnxruntime-web/webgpu". Relative references must start with either "/", "./", or "../".

Two-line fix:

1. Vendor onnxruntime-web/dist/ort.webgpu.bundle.min.mjs (111KB, fully self-bundled, no further bare imports inside it).
2. Rewrite the bare specifier in our vendored transformers.web.min.js to "./ort.webgpu.bundle.min.mjs" so it resolves as a relative URL against the patched file's own location. One sed replace, verified the count goes 1→0.

Why not an import map: MV3's CSP `script-src 'self'` can block inline `<script type="importmap">` on some Chrome versions. Patching the specifier sidesteps the CSP question entirely.

The webgpu bundle is self-contained (the bundled variant inlines all ONNX-runtime dependencies it needs at WebGPU-init time), so no external WASM fetch happens during normal WebGPU inference. The existing jsep.wasm + jsep.mjs files stay vendored as a defensive fallback path in case env.backends.onnx.wasm.wasmPaths ever gets hit at runtime; they're never loaded for WebGPU, but cost nothing since they're already there.

Vendor README updated with the sed step + verification command so re-vendoring a future library version doesn't reintroduce the bug.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Two fixes folded into one:

1. CHROME WEB STORE / AMO compatibility. The previous commit vendored .min.js builds; both stores want readable source for review and can reject or stall reviews of minified blobs. Switch to the unminified variants:

       transformers.web.min.js   → transformers.web.js  (~422K → 1.1M)
       ort.webgpu.bundle.min.mjs → ort.webgpu.mjs       (~111K → 662K)

   Total vendor dir grows from ~26MB to ~27MB. Negligible at runtime (the JS still parses in milliseconds), worth a lot for the review process. The 25MB WASM stays where it was; it's already not text-readable by nature.

2. THE BARE-SPECIFIER FIX, BUT AGAINST THE RIGHT FILE. The previous commit sed-patched transformers.web.MIN.js, but offscreen.js actually loads transformers.web.js after this commit. The minified sibling never loaded, so the fix never ran. Reported as "still the same error" by the user. In the unminified .web.js the bare import is a STATIC import (not the dynamic form the minifier emits):

       import * as ONNX_WEB from "onnxruntime-web/webgpu"; // line 7547

       sed -i 's|"onnxruntime-web/webgpu"|"./ort.webgpu.mjs"|' \
         src/chrome/vendor/transformers/transformers.web.js

   One occurrence, replaced, verified count goes 1→0 with grep.

Why not the "bundle" variant of onnxruntime-web/webgpu (the .bundle.min.mjs that inlines everything)? It's only available minified. The plain ort.webgpu.mjs is unminified and has no bare imports of its own (only Node-specific `node:fs` / `node:os` requires that never fire in browsers).

Vendor README updated end-to-end:

- "What's here" table reflects the new file names + sizes
- Adds a "Why unminified" callout pointing at store policy
- Update procedure has the new cp + sed lines
- "Files NOT vendored" explains why we skip the .bundle variants

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

The user reported the same error message but with a different specifier: "Failed to resolve module specifier 'onnxruntime-common'".

Root cause: transformers.web.js has TWO bare specifiers, not one. The first fix (`onnxruntime-web/webgpu` → relative path) resolved one, but at line 7605 there's a second:

    import { Tensor } from "onnxruntime-common";

onnxruntime-common is a separate npm package providing Tensor + session types. It's a transitive dep of @huggingface/transformers (via onnxruntime-web).

Fix: wholesale-vendor its ESM tree, sed-patch the import.

- Copy node_modules/onnxruntime-common/dist/esm/*.js (21 small files, ~85KB total) into vendor/transformers/onnxruntime-common/. The ESM tree is self-contained: all inter-file imports are already relative, no further patches needed.
- sed: "onnxruntime-common" → "./onnxruntime-common/index.js" in transformers.web.js. One occurrence, replaced, verified.

Also added a defensive whole-tree bare-specifier sweep to the vendor README's verification step; it catches future versions that introduce a THIRD bare import without needing a debug-runtime round-trip. The remaining "@huggingface/transformers" hit at line ~10667 is a JSDoc example string inside a comment block, not a real import. README documents this so future maintainers don't get spooked.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
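A bare-specifier sweep of this kind can be approximated with a small script. The sketch below is an illustration, not the command in the vendor README: `findBareSpecifiers` is a hypothetical helper, and its regex scans raw text, so it also matches specifiers inside comments and strings (like the JSDoc hit mentioned above).

```javascript
// Sketch: list module specifiers a browser cannot resolve without an import map.
// Text-based, so hits inside comments or strings show up as false positives.
function findBareSpecifiers(source) {
  // Covers: import ... from "x", export ... from "x", import "x", import("x")
  const re = /(?:\bfrom\s*|\bimport\s*\(\s*|\bimport\s+)(["'])([^"'\n]+)\1/g;
  const bare = new Set();
  for (const m of source.matchAll(re)) {
    const spec = m[2];
    const isRelativeOrAbsolute = /^(\.{1,2}\/|\/)/.test(spec);
    const hasScheme = /^[a-z][a-z0-9+.-]*:/i.test(spec); // https:, node:, chrome-extension:
    if (!isRelativeOrAbsolute && !hasScheme) bare.add(spec);
  }
  return [...bare];
}
```

In Node, running it over `fs.readFileSync(path, 'utf8')` for each vendored file and failing when the result is non-empty would give the same early warning; scheme-prefixed specifiers such as `node:fs` are deliberately not reported.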

ONNX Runtime Web dynamically picks a WASM variant at load time. For Qwen 3 on WebGPU, ops that can't run on the GPU fall back to CPU, which needs ort-wasm-simd-threaded.asyncify.{mjs,wasm} — without this pair the runtime errors with "no available backend found, Failed to fetch dynamically imported module .../asyncify.mjs". Add both files (~23MB wasm + 47KB loader) and document why .jspi and the plain variant are still skipped. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

'q4' uses 4-bit weights with fp32 activations. The activation buffers for Qwen 3 0.6B mid-inference overrun the WASM 2GB heap, producing 'std::bad_alloc' out of OrtRun on most laptops. 'q4f16' (4-bit weights + fp16 activations) cuts the activation footprint in half and is the dtype the transformers.js team recommends for Qwen on WebGPU. Update the default in WebGPUProvider, the seed config in providers/manager.js, and the placeholder text in the dtype settings field — both chrome and firefox builds. NOTE: existing users with a stored dtype:'q4' need to either remove and re-add the WebGPU provider, or edit the dtype field in Settings. The first run after switching will re-download ~500MB (q4f16 weights); the old q4 weights stay in IndexedDB but go unused. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Some Chrome/GPU combos hit 'Integer overflow' from safeint.h during OrtRun on Qwen 3 + q4f16. The mixed-precision quantization kernels take an int32-shape code path that overflows for the model's attention buffer math. fp16 uses single-precision kernels throughout and sidesteps the issue at the cost of ~1.2GB download (vs ~500MB). - Note the workaround in offscreen.js's pipeline-load comment. - Add a Troubleshooting table to the vendor README covering the full error cascade we've walked through: bare-specifier, asyncify-mjs, bad_alloc, integer-overflow, no-backend. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
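The error cascade lends itself to a small lookup that turns a raw error message into a hint. The sketch below is hypothetical (`classifyWebGPUError` is not in the PR), and the hints paraphrase the commit messages in this thread, not the README's Troubleshooting table.

```javascript
// Sketch: map known error substrings to the fixes described in the commits.
// Order matters: the "no available backend" error also mentions a failed
// dynamic import, so the more specific pattern is checked first.
const ERROR_HINTS = [
  { id: 'asyncify-mjs', test: /Failed to fetch dynamically imported module/i,
    hint: 'Vendor the ort-wasm-simd-threaded.asyncify.{mjs,wasm} pair.' },
  { id: 'no-backend', test: /no available backend found/i,
    hint: 'Check which WASM variants are vendored and where wasmPaths points.' },
  { id: 'bare-specifier', test: /Failed to resolve module specifier/i,
    hint: 'Patch the bare import in the vendored file to a relative path.' },
  { id: 'bad_alloc', test: /std::bad_alloc/,
    hint: 'Try dtype q4f16 instead of q4.' },
  { id: 'integer-overflow', test: /Integer overflow/i,
    hint: 'Try dtype fp16 (larger download, about 1.2GB).' },
];

function classifyWebGPUError(message) {
  const text = String(message ?? '');
  const match = ERROR_HINTS.find((entry) => entry.test.test(text));
  return match ? { id: match.id, hint: match.hint } : { id: 'unknown', hint: null };
}
```

A caller could append the hint to the error shown in the settings card, leaving unknown errors untouched.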

If WebGPU silently falls back to a software adapter (SwiftShader on Windows when discrete GPU is power-saved, Lavapipe on Linux without a Vulkan driver, etc.), inference burns 500MB on a download then OOMs the WASM heap with std::bad_alloc on first token. From the user's side this looks like dtype/model bugs. Make the offscreen probe call requestAdapter() and report isFallbackAdapter. webgpu.js#testConnection turns that into a specific error message naming chrome://flags. The pipeline loader also logs adapter info + onnx backend keys to the offscreen DevTools console so we can diagnose future "all dtypes OOM" reports without another round-trip. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
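A probe along these lines might look like the sketch below. `probeAdapter` is a hypothetical name and takes the `gpu` object as a parameter (in the offscreen document that would be `navigator.gpu`). Checking both `adapter.info` and the adapter itself for `isFallbackAdapter` is an assumption made to cover different browser versions.

```javascript
// Sketch: report whether WebGPU hands out a real adapter or a software fallback.
// Pass navigator.gpu in the browser; any object with requestAdapter() works.
async function probeAdapter(gpu) {
  if (!gpu) {
    return { available: false, reason: 'navigator.gpu is undefined' };
  }
  const adapter = await gpu.requestAdapter();
  if (!adapter) {
    return { available: false, reason: 'requestAdapter() returned null' };
  }
  // Assumption: the flag may live on adapter.info or on the adapter itself,
  // depending on the browser version.
  const isFallbackAdapter = Boolean(
    adapter.info?.isFallbackAdapter ?? adapter.isFallbackAdapter
  );
  return { available: true, isFallbackAdapter };
}
```

`testConnection` could then treat `isFallbackAdapter: true` as a failure before any weights are downloaded.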

transformers.js's init code auto-sets wasmPaths to the .asyncify variant for non-Safari browsers (line ~7786 of transformers.web.js). The asyncify wasm has Asyncify stack-switching support but NO JSEP (JavaScript Execution Provider) exports — and the WebGPU EP is plumbed THROUGH JSEP. ort.webgpu.mjs calls things like `wasm2.jsepOnCreateSession?.()` with optional chaining; when those exports are undefined, WebGPU initialization SILENTLY no-ops. The runtime then runs the entire model on the WASM CPU backend, blowing the 2GB heap on any sub-1B model. From the user's side this looks like 'std::bad_alloc on every dtype' even though chrome://gpu shows WebGPU is hardware-accelerated. Fix: set wasmPaths to the {mjs, wasm} object form pointing at the .jsep files. The urlOverride path in ort.webgpu.mjs uses them directly, bypassing the asyncify default. .jsep.wasm exports the jsep* functions the WebGPU EP needs. Add hasWebgpuBackend + wasmPaths to the diagnostic log so a future regression is one line to spot. Update the troubleshooting table. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

The asyncify wasm (which is the WebGPU-capable build in onnxruntime-web 1.20+ — webgpuInit / webgpuRegisterDevice live there, NOT in the jsep wasm) uses threading for its heap allocator. Threading needs SharedArrayBuffer. SharedArrayBuffer needs crossOriginIsolated. That needs cross_origin_embedder_policy + cross_origin_opener_policy in the manifest. Without isolation, the wasm falls back to a plain ArrayBuffer heap that's tiny — and inference std::bad_allocs on any 100MB+ allocation even when chrome://gpu shows WebGPU is hardware-accelerated and navigator.gpu hands out a real adapter. Confusing because the surface error looks like model-too-big rather than configuration. Add COOP/COEP to the chrome manifest. Also revert the wasmPaths override to point at .asyncify.{mjs,wasm} (the previous commit mistakenly pointed at .jsep, which lacks the webgpu* exports and gave us 'webgpuInit is not a function' instead). Add crossOriginIsolated + SharedArrayBuffer presence to the diagnostic log so the manifest change is verifiable without DevTools spelunking. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
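The two pieces this commit touches, the `{mjs, wasm}` override and the isolation diagnostics, can be sketched as plain helpers. Both function names are hypothetical, and the file names are assumed to follow the `ort-wasm-simd-threaded.<variant>.*` pattern used elsewhere in this PR.

```javascript
// Sketch: build the object-form wasmPaths override for one WASM variant.
function buildWasmPaths(vendorBaseUrl, variant = 'asyncify') {
  const base = vendorBaseUrl.endsWith('/') ? vendorBaseUrl : `${vendorBaseUrl}/`;
  return {
    mjs: `${base}ort-wasm-simd-threaded.${variant}.mjs`,
    wasm: `${base}ort-wasm-simd-threaded.${variant}.wasm`,
  };
}

// Sketch: the two facts the diagnostic log needs to verify the manifest change.
function isolationDiagnostics(scope = globalThis) {
  return {
    crossOriginIsolated: scope.crossOriginIsolated === true,
    hasSharedArrayBuffer: typeof scope.SharedArrayBuffer === 'function',
  };
}
```

In the offscreen document the result of `buildWasmPaths(...)` would be assigned to `env.backends.onnx.wasm.wasmPaths`, with the base URL coming from `chrome.runtime.getURL()` for wherever the vendor directory ends up in the packaged extension.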

Past the WASM-heap saga — WebGPU is now actually running the model (input log shows past_conv.0 / past_recurrent.0 state tensors, confirming Qwen 3.5 0.8B's hybrid Mamba+Transformer architecture is executing). The new error is a Dawn-side "Failed to allocate memory for buffer mapping" during mapAsync, which reads GPU buffers back to CPU. Hybrid/vision models like Qwen 3.5 have past_conv, past_recurrent, AND transformer KV cache — that's a lot of buffers to map back and forth between GPU and CPU on every forward pass. Setting preferredOutputLocation: 'gpu-buffer' keeps the outputs as GPU buffers, so the next forward pass can feed them directly without the round-trip and Dawn doesn't run out of mapping staging memory. transformers.js attempts this automatically for kv-cache outputs when the model config provides cache_config, but the wiring doesn't always populate the right names for hybrid/VL models. Setting it globally is the safe override. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

# Conflicts: # manifest.json # package-lock.json # package.json # src/chrome/ARCHITECTURE.md # src/chrome/manifest.json # src/chrome/src/ui/settings.js # src/firefox/ARCHITECTURE.md # src/firefox/manifest.json # src/firefox/src/ui/settings.js # test/run.js

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

esokullu · 2026-05-26T02:03:41Z

@codex what's wrong here? i'm getting:

Error: webgpu: The data is not on CPU. Use getData() to download GPU data to CPU, or use texture or gpuBuffer property to access the GPU data directly.

chatgpt-codex-connector · 2026-05-26T02:05:23Z

Summary

Fixed the WebGPU inference worker to stop forcing preferredOutputLocation: 'gpu-buffer' on all generation outputs, which can trigger the exact runtime error you reported (The data is not on CPU...) when transformers.js post-processing expects CPU-readable tensors. (src/chrome/src/offscreen/inference-worker.js, L122-L130)
Added an inline explanatory comment so future changes preserve this constraint in the chat-generation path. (src/chrome/src/offscreen/inference-worker.js, L125-L129)
Committed the change on the current branch (ed8e6ce) and created a follow-up PR message with the rationale and validation steps.
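The constraint can be pictured as an opt-in flag. This is a sketch, not the code in inference-worker.js: `buildPipelineOptions` and `keepOutputsOnGpu` are hypothetical, and passing the setting through `session_options` is an assumption about how the worker wires it.

```javascript
// Sketch: only request GPU-resident outputs when the caller explicitly opts in.
// Chat generation leaves the flag off so post-processing can read tensors on CPU.
function buildPipelineOptions({
  device = 'webgpu',
  dtype = 'q4f16',
  keepOutputsOnGpu = false, // hypothetical flag
} = {}) {
  const options = { device, dtype };
  if (keepOutputsOnGpu) {
    options.session_options = { preferredOutputLocation: 'gpu-buffer' };
  }
  return options;
}
```

The object would be passed as the third argument to `pipeline('text-generation', modelId, options)`.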

Testing

✅ node test/run.js (197 passed, 0 failed)

esokullu and others added 2 commits May 22, 2026 15:52

chore: bump version 7.3.1 → 7.4.0

c7ed019

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

chatgpt-codex-connector Bot reviewed May 22, 2026

Merge branch 'main' into webgpu

351971f

Update offscreen.html

ac615d0

esokullu and others added 2 commits May 22, 2026 16:27

version up

007488c

title change

9db2be7

esokullu and others added 2 commits May 26, 2026 03:59

9.0.0

ffecb66

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

9.0.2

51c4128

Fix WebGPU CPU tensor access error in worker pipeline

69c7f5d

Handle WebGPU OrtRun buffer download/CPU data failures with retry mode

f991fc3

Add robust fallback for WebGPU buffer map failures

1aac015

Improve WebGPU fallback errors and switch default ONNX model to Gemma

9b46bc7

Reset WebGPU mode when WASM kernel init fails

05461e1

Retry WebGPU unaligned-access failures with fp16 dtype

9c6d130

Skip WASM fallback for quantized WebGPU models

c4209b9

Retry quantized WebGPU map failures with fp16 before aborting

901fef1

improvements

28fd949

esokullu wants to merge 33 commits into main from webgpu
