For the browser extension, WebGPU is more interesting. WebLLM gives in-browser WebGPU LLM inference with an OpenAI-style API, and Transformers.js / ONNX Runtime WebGPU are good fits for smaller summarization/classification pipelines. Browser support is also much better now, though still uneven by OS/browser; see web.dev’s WebGPU availability summary.
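Since support is still uneven, the extension would probably want to probe for WebGPU at runtime before choosing a local path. A minimal sketch, assuming the standard `navigator.gpu.requestAdapter()` entry point; the `GpuLike` type and the `hasWebGpu` name are made up here for illustration so the snippet compiles without extra type packages:

```typescript
// Minimal structural type (hypothetical) so this compiles without @webgpu/types.
interface GpuLike {
  requestAdapter(): Promise<unknown>;
}

// Returns true only if the WebGPU entry point exists AND an adapter is actually
// handed back. Some browsers expose navigator.gpu but resolve to null on
// unsupported hardware or drivers, so checking for the property alone is not enough.
async function hasWebGpu(nav: { gpu?: GpuLike } | undefined): Promise<boolean> {
  if (!nav || !nav.gpu) return false;
  try {
    const adapter = await nav.gpu.requestAdapter();
    return adapter !== null && adapter !== undefined;
  } catch {
    // Treat any failure during the probe as "not available".
    return false;
  }
}
```

In the extension this would be called as something like `hasWebGpu(navigator)`, with the result cached for the session.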
transcription
summarization (context compaction)
vision
safesocial
planning
These are all separate components, and each one may or may not be offloaded to WebGPU.
We may need a routing engine that weighs available resources and decides which components run on local models and which are offloaded to stronger cloud models.
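One way the routing idea could look, as a rough sketch only. Everything here is an assumption for illustration: the `routeTask` function, the `DeviceProfile` and `TaskProfile` shapes, and especially the memory numbers and the choice of which components need a stronger model are placeholders, not measurements or decisions.

```typescript
type Task = "transcription" | "summarization" | "vision" | "safesocial" | "planning";
type Target = "local-webgpu" | "cloud";

// Hypothetical snapshot of what the device can offer right now.
interface DeviceProfile {
  webgpuAvailable: boolean;
  freeMemoryMb: number; // rough estimate; how to measure this is left open
}

// Hypothetical per-component requirements.
interface TaskProfile {
  minMemoryMb: number;       // placeholder budget for a small local model
  needsStrongModel: boolean; // true if a small local model is assumed too weak
}

// Placeholder values only; real numbers would come from profiling.
const TASK_PROFILES: Record<Task, TaskProfile> = {
  transcription: { minMemoryMb: 1024, needsStrongModel: false },
  summarization: { minMemoryMb: 2048, needsStrongModel: false },
  vision: { minMemoryMb: 4096, needsStrongModel: false },
  safesocial: { minMemoryMb: 1024, needsStrongModel: false },
  planning: { minMemoryMb: 4096, needsStrongModel: true },
};

// Decide where a component runs: local WebGPU if the device can handle it,
// otherwise fall back to a cloud model.
function routeTask(task: Task, device: DeviceProfile): Target {
  const profile = TASK_PROFILES[task];
  if (!device.webgpuAvailable) return "cloud";
  if (profile.needsStrongModel) return "cloud";
  return device.freeMemoryMb >= profile.minMemoryMb ? "local-webgpu" : "cloud";
}
```

A static table like this is the simplest version; a real engine would likely also factor in latency, battery, privacy constraints, and cost per cloud call.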