Releases: TadMSTR/ollama-queue-proxy

v0.2.0 — client injection, model-aware routing, embedding cache

23 Apr 01:37

Highlights

  • Client injection — port-based auth bypass for clients that can't send Bearer headers; loopback by default, non-loopback bind requires allow_public_injection: true.
  • Model-aware routing — weighted round-robin across Ollama hosts that already have the requested model loaded, with fast-path invalidation on miss.
  • Embedding response cache — SHA256-keyed Valkey/Dragonfly cache for /api/embed and /api/embeddings; hits bypass the queue and upstream entirely.
  • keep_alive defaulting — proxy-level body injection so Ollama doesn't unload models between bursty requests.
  • Per-client concurrency caps: max_concurrent on auth.keys[], enforced via a per-client async semaphore with a fairness bound.
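
As an illustration of the SHA256-keyed embedding cache, here is a minimal sketch of how a deterministic cache key could be derived from a request. The function name, the `oqp:embed:` prefix, and the choice of fields that feed the hash are assumptions made for this example; the proxy's actual key layout is not described in these notes.

```python
import hashlib
import json


def embedding_cache_key(path: str, body: dict) -> str:
    """Build a deterministic cache key for an embedding request.

    Hypothetical scheme: the key prefix and field selection are
    assumptions, not the proxy's documented behavior.
    """
    relevant = {
        "path": path,
        "model": body.get("model"),
        # /api/embed sends "input"; the older /api/embeddings sends "prompt".
        "input": body.get("input", body.get("prompt")),
    }
    # Sorted keys and fixed separators make the JSON byte-for-byte stable,
    # so equal requests always hash to the same digest.
    canonical = json.dumps(
        relevant, sort_keys=True, separators=(",", ":"), ensure_ascii=False
    )
    digest = hashlib.sha256(canonical.encode("utf-8")).hexdigest()
    return f"oqp:embed:{digest}"


key = embedding_cache_key(
    "/api/embed", {"model": "nomic-embed-text", "input": "hello world"}
)
```

Hashing a canonical serialization rather than the raw request body means that key order and whitespace differences between clients do not cause avoidable cache misses.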

Security

Pre-release audit clean (0 critical / high / medium). Two low findings and one informational note remediated before tag:

  • L1: allow_public_injection now fails config validation on non-loopback bind; a warning also fires when auth is enabled (injection bypasses Bearer auth).
  • L2: /metrics escapes Prometheus label values, preventing label injection via client-supplied model names.
  • N1: Dockerfile CMD switched to the ollama-queue-proxy console script so main:run() orchestrates the N+1 server gather in containerized deployments.
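
To show what escaping label values involves, the sketch below applies the escaping rules of the Prometheus text exposition format (backslash, double quote, and newline) before rendering a sample line. The helper names and the metric name `oqp_requests_total` are hypothetical and are not taken from the proxy's code.

```python
def escape_label_value(value: str) -> str:
    """Escape a string for use as a Prometheus label value."""
    # Backslash must be handled first so the escapes added afterwards
    # are not escaped a second time.
    return (
        value.replace("\\", "\\\\")
        .replace('"', '\\"')
        .replace("\n", "\\n")
    )


def render_sample(name: str, labels: dict, value: float) -> str:
    """Render one exposition-format line with escaped label values."""
    pairs = ",".join(
        f'{key}="{escape_label_value(val)}"' for key, val in sorted(labels.items())
    )
    return f"{name}{{{pairs}}} {value}"


# A hostile model name that tries to close the label and add its own.
line = render_sample("oqp_requests_total", {"model": 'x",evil="1'}, 3)
```

Without the escaping step, the embedded double quote would end the `model` label early and the remainder of the client-supplied string would be parsed as an extra label.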

Compatibility

All v0.1.x configs continue to work unchanged. New fields default to v0.1.x-equivalent behavior.

Full changelog in CHANGELOG.md.

v0.1.2

21 Apr 18:39

What's Changed

Patch release fixing two bugs discovered during claudebox deployment.

Bug fixes

  • Streaming response detection now handles application/x-ndjson content-type — Ollama uses this for /api/generate and /api/chat streaming responses; the previous check only matched text/event-stream and chunked application/json
  • Webhook SSRF check now supports an allowed_hosts list in config — enables webhook delivery to internal hostnames (e.g., ntfy on a LAN IP) without disabling the SSRF guard entirely
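
The streaming fix can be pictured as a content-type check along the following lines. This is a sketch under assumptions: the function name and the header-dict interface are invented for illustration, and only the three cases named above (text/event-stream, application/x-ndjson, chunked application/json) come from the release notes.

```python
STREAMING_MEDIA_TYPES = {"text/event-stream", "application/x-ndjson"}


def is_streaming_response(headers: dict) -> bool:
    """Guess whether an upstream response should be relayed as a stream."""
    # Header names are case-insensitive, so normalize before lookup.
    normalized = {name.lower(): value for name, value in headers.items()}
    # Drop parameters such as "; charset=utf-8" before comparing.
    media_type = normalized.get("content-type", "").split(";", 1)[0].strip().lower()
    if media_type in STREAMING_MEDIA_TYPES:
        return True
    # Plain JSON only counts as streaming when it is sent chunked.
    chunked = "chunked" in normalized.get("transfer-encoding", "").lower()
    return media_type == "application/json" and chunked


ndjson = is_streaming_response({"Content-Type": "application/x-ndjson"})
```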

Full Changelog: v0.1.1...v0.1.2

v0.1.1

21 Apr 12:11

What's Changed

Security fixes

  • SSRF validation bypass via hostnames: validate_webhook_url() previously only checked raw IP literals, so hostnames (e.g., http://localhost/hook) bypassed the blocklist. It now resolves hostnames to IP addresses via socket.getaddrinfo() before the blocklist comparison. Added 169.254.0.0/16 (link-local / cloud metadata) and fe80::/10 to _PRIVATE_NETWORKS.
  • Dockerfile missing USER instruction — container now runs as appuser (non-root) by default, consistent with the compose user: 1000:1000 override.
  • Queue management tier parameter now validated: ?tier=bogus returns HTTP 400 instead of an unhandled KeyError → 500. Accepts high, normal, low.
  • CI action versions updated: actions/checkout → v6.0.2, actions/setup-python → v6.2.0 with correct SHA pins.
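
The hostname-resolution fix for the SSRF check can be sketched roughly as follows. This is not the project's validate_webhook_url() implementation: the function name, the boolean return value, and the exact network list are assumptions for illustration. Only the use of socket.getaddrinfo() and the 169.254.0.0/16 and fe80::/10 ranges come from the notes above.

```python
import ipaddress
import socket
from urllib.parse import urlsplit

# Illustrative blocklist; the project's _PRIVATE_NETWORKS may differ.
BLOCKED_NETWORKS = [
    ipaddress.ip_network(cidr)
    for cidr in (
        "127.0.0.0/8", "10.0.0.0/8", "172.16.0.0/12", "192.168.0.0/16",
        "169.254.0.0/16", "::1/128", "fc00::/7", "fe80::/10",
    )
]


def is_webhook_url_allowed(url: str) -> bool:
    """Return False if the URL's host resolves to any blocked address."""
    parts = urlsplit(url)
    if parts.scheme not in ("http", "https") or not parts.hostname:
        return False
    try:
        # Resolve first so names like "localhost" are checked by address.
        infos = socket.getaddrinfo(
            parts.hostname, parts.port or 80, proto=socket.IPPROTO_TCP
        )
    except socket.gaierror:
        return False  # unresolvable hosts are rejected
    for info in infos:
        # Strip any IPv6 scope suffix such as "%eth0" before parsing.
        addr = ipaddress.ip_address(info[4][0].split("%", 1)[0])
        # Treat IPv4-mapped IPv6 addresses as their IPv4 form.
        mapped = getattr(addr, "ipv4_mapped", None)
        if mapped is not None:
            addr = mapped
        if any(addr in network for network in BLOCKED_NETWORKS):
            return False
    return True


metadata_ok = is_webhook_url_allowed("http://169.254.169.254/latest/meta-data")
```

A check of this kind validates the address at lookup time only; a hostname whose DNS answer changes between validation and delivery would need additional handling, which this sketch does not attempt.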

Full Changelog: https://github.com/TadMSTR/ollama-queue-proxy/blob/main/CHANGELOG.md