Releases: TadMSTR/ollama-queue-proxy
v0.2.0 — client injection, model-aware routing, embedding cache
Highlights
- Client injection: port-based auth bypass for clients that can't send Bearer headers; loopback by default, non-loopback bind requires `allow_public_injection: true`.
- Model-aware routing: weighted round-robin across Ollama hosts that already have the requested model loaded, with fast-path invalidation on miss.
- Embedding response cache: SHA256-keyed Valkey/Dragonfly cache for `/api/embed` and `/api/embeddings`; hits bypass the queue and upstream entirely.
- keep_alive defaulting: proxy-level body injection so Ollama doesn't unload models between bursty requests.
- Per-client concurrency caps: `max_concurrent` on `auth.keys[]`, enforced via per-client async semaphore with fairness bound.
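As an illustration of the SHA256-keyed embedding cache, the sketch below derives a deterministic key from the endpoint path and the request body. The function name `embedding_cache_key`, the `embed:` prefix, and the set of hashed fields are assumptions made for this example; they are not taken from the project's source.

```python
import hashlib
import json


def embedding_cache_key(path: str, body: dict) -> str:
    """Return a deterministic cache key for an embedding request (hypothetical helper)."""
    # Assumption: only these fields change the embedding result, so only they are hashed.
    # Fields such as keep_alive are deliberately left out of the key.
    relevant = {k: body[k] for k in ("model", "input", "prompt", "options") if k in body}
    # sort_keys plus fixed separators make the JSON byte-identical for equal payloads.
    canonical = json.dumps(relevant, sort_keys=True, separators=(",", ":"))
    digest = hashlib.sha256(f"{path}\n{canonical}".encode("utf-8")).hexdigest()
    return f"embed:{digest}"
```

Because the path is part of the hashed material, `/api/embed` and `/api/embeddings` requests never share an entry even when model and text match.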
Security
Pre-release audit clean (0 critical / high / medium). Two low findings and one informational note remediated before tag:
- L1: `allow_public_injection` now fails config validation on non-loopback bind; warning also fires when auth is enabled (injection bypasses Bearer auth).
- L2: `/metrics` escapes Prometheus label values, preventing label-injection via client-supplied model names.
- N1: `Dockerfile` `CMD` switched to `ollama-queue-proxy` console script so `main:run()` orchestrates the N+1 server gather in containerized deployments.
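For context on the L2 finding, the Prometheus text exposition format requires backslash, double quote, and newline to be escaped inside label values. The sketch below shows that escaping; `escape_label_value`, `format_sample`, and the metric name in the test are hypothetical and not the proxy's actual code.

```python
def escape_label_value(value: str) -> str:
    """Escape a label value for the Prometheus text exposition format."""
    # Backslash is replaced first so the backslashes added by the later
    # replacements are not escaped a second time.
    return value.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")


def format_sample(name: str, labels: dict[str, str], value: float) -> str:
    """Render one metric sample line with escaped label values."""
    rendered = ",".join(
        f'{key}="{escape_label_value(val)}"' for key, val in sorted(labels.items())
    )
    return f"{name}{{{rendered}}} {value}"
```

Without the escaping, a model name containing `"}` could close the label set early and smuggle extra labels or samples into the scrape output.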
Compatibility
All v0.1.x configs continue to work unchanged. New fields default to v0.1.x-equivalent behavior.
Full changelog in CHANGELOG.md.
v0.1.2
What's Changed
Patch release fixing two bugs discovered during claudebox deployment.
Bug fixes
- Streaming response detection now handles `application/x-ndjson` content-type: Ollama uses this for `/api/generate` and `/api/chat` streaming responses; the previous check only matched `text/event-stream` and chunked `application/json`.
- Webhook SSRF check now supports an `allowed_hosts` list in config: enables webhook delivery to internal hostnames (e.g., ntfy on a LAN IP) without disabling the SSRF guard entirely.
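The streaming fix can be pictured with a small content-type check along these lines. This is a minimal sketch, assuming the response headers are available as a plain dict; `is_streaming_response` and `STREAMING_CONTENT_TYPES` are illustrative names, not the proxy's own.

```python
STREAMING_CONTENT_TYPES = {"application/x-ndjson", "text/event-stream"}


def is_streaming_response(headers: dict[str, str]) -> bool:
    """Guess whether an upstream response is a stream, based on its headers."""
    lowered = {key.lower(): value for key, value in headers.items()}
    # Drop parameters such as "; charset=utf-8" before comparing media types.
    media_type = lowered.get("content-type", "").split(";", 1)[0].strip().lower()
    if media_type in STREAMING_CONTENT_TYPES:
        return True
    # Plain JSON counts as a stream only when it is sent with chunked encoding.
    chunked = "chunked" in lowered.get("transfer-encoding", "").lower()
    return media_type == "application/json" and chunked
```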
Full Changelog: v0.1.1...v0.1.2
v0.1.1
What's Changed
Security fixes
- SSRF validation bypass via hostnames: `validate_webhook_url()` previously only checked raw IP literals; hostnames (e.g., `http://localhost/hook`) bypassed the blocklist. Now resolves hostnames to IP via `socket.getaddrinfo()` before blocklist comparison. Added `169.254.0.0/16` (link-local / cloud metadata) and `fe80::/10` to `_PRIVATE_NETWORKS`.
- Dockerfile missing USER instruction: container now runs as `appuser` (non-root) by default, consistent with the compose `user: 1000:1000` override.
- Queue management tier parameter now validated: `?tier=bogus` returns HTTP 400 instead of unhandled `KeyError` → 500. Accepts `high`, `normal`, `low`.
- CI action versions updated: `actions/checkout` → v6.0.2, `actions/setup-python` → v6.2.0 with correct SHA pins.
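The hostname fix follows a common pattern: resolve first, then compare every resolved address against the blocklist. The sketch below illustrates that pattern under stated assumptions. `resolves_to_blocked_address` and `BLOCKED_NETWORKS` are hypothetical stand-ins, and the network list is an assumed example rather than the actual contents of `_PRIVATE_NETWORKS`.

```python
import ipaddress
import socket
from urllib.parse import urlparse

# Assumed example blocklist; the two ranges named in this release are included.
BLOCKED_NETWORKS = [ipaddress.ip_network(net) for net in (
    "127.0.0.0/8", "10.0.0.0/8", "172.16.0.0/12", "192.168.0.0/16",
    "169.254.0.0/16", "::1/128", "fc00::/7", "fe80::/10",
)]


def resolves_to_blocked_address(url: str) -> bool:
    """Return True if the URL's host resolves to any blocked address."""
    host = urlparse(url).hostname
    if not host:
        return True  # fail closed when no host can be parsed
    try:
        infos = socket.getaddrinfo(host, None)
    except socket.gaierror:
        return True  # fail closed when the name does not resolve
    for info in infos:
        # info[4][0] is the address string; strip any IPv6 zone suffix.
        addr = ipaddress.ip_address(info[4][0].split("%")[0])
        if any(addr in net for net in BLOCKED_NETWORKS):
            return True
    return False
```

Checking every address returned by `socket.getaddrinfo()` matters because a hostname can resolve to several addresses, and a single private one is enough to reach an internal service.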
Full Changelog: https://github.com/TadMSTR/ollama-queue-proxy/blob/main/CHANGELOG.md