Skip to content

fix(gateway): re-apply tool rate limiter after system_configs overlay#1111

Open
nguyennguyenit wants to merge 1 commit intodevfrom
fix/rate-limiter-db-overlay-order
Open

fix(gateway): re-apply tool rate limiter after system_configs overlay#1111
nguyennguyenit wants to merge 1 commit intodevfrom
fix/rate-limiter-db-overlay-order

Conversation

@nguyennguyenit
Copy link
Copy Markdown
Contributor

Summary

  • setupToolRegistry initialises the tool rate limiter from cfg.Tools.RateLimitPerHour early in startup (gateway.go line 137), but system_configs DB overlay (cfg.ApplySystemConfigs) runs ~50 lines later. DB overrides of tools.rate_limit_per_hour were silently lost - the limiter object was already built from the JSON5 default.
  • Re-apply the limiter after ApplySystemConfigs so the DB value wins. Server has not started yet, no in-flight tool calls. nil case handled too so DB writes can disable rate limiting without restart.

Why this matters

Production observation: a Zalo sales-bot session burst past the 150/h default and operators tried to bump it via system_configs UPDATE + container restart. No effect. Workaround was to inject /app/data/config.json with {"tools":{"rate_limit_per_hour":800}}, which contradicts the DB-as-source-of-truth pattern other tunables rely on.

Tests

  • New: TestRegistry_SetRateLimiter_ReplacesPriorLimiter covers the replace-with-higher-limit path and the nil-disables path.
  • All existing rate limiter tests still pass.
  • go build ./..., go build -tags sqliteonly ./..., go vet ./... clean.

Out of scope

The same ordering pattern likely affects other config consumed inside setupToolRegistry (tools.scrub_credentials, MCP server wiring from DB). Those need a broader restructure - kept out of this hotfix.

Test plan

  • Merge to dev, deploy, set UPDATE system_configs SET value='800' WHERE key='tools.rate_limit_per_hour', restart container, verify log line tool rate limiting reapplied from system_configs per_hour=800.
  • Confirm the existing tool rate limiting enabled per_hour=<json5_default> log still fires earlier - the reapply log is additive.
  • Set DB value to 0, restart, verify limiter is disabled (no tool rate limit exceeded errors regardless of call volume).

setupToolRegistry creates the rate limiter from cfg.Tools.RateLimitPerHour
during early bootstrap (gateway.go line 137). System_configs DB overlay
runs ~50 lines later via cfg.ApplySystemConfigs (line 191), so any DB
override of tools.rate_limit_per_hour was silently lost - the limiter
object was already initialised from the JSON5 default.

Symptom in production: editing tools.rate_limit_per_hour via system_configs
table or the config HTTP API had no effect on running gateways. Operators
had to inject a config.json file to change the value, defeating the
DB-as-source-of-truth pattern that other tunables rely on.

Re-apply the limiter after ApplySystemConfigs runs. Safe ordering: server
has not started yet, no in-flight tool calls, and SetRateLimiter is a
plain field assignment with no shutdown cost on the discarded limiter.
nil case (rate_limit_per_hour <= 0) also handled so DB writes can disable
the limiter without restart.

Tests: new TestRegistry_SetRateLimiter_ReplacesPriorLimiter covers both
the replace-with-higher-limit path and the nil-disables path. All
existing rate limiter tests still pass.

Note: the same ordering pattern likely affects other config consumed
inside setupToolRegistry (tools.scrub_credentials, MCP server wiring).
Out of scope for this hotfix - they need a deeper restructure.
@clark-cant
Copy link
Copy Markdown

🔍 Code Review — fix(gateway): re-apply tool rate limiter after system_configs overlay

🎯 Tổng quan

Hotfix: rate limiter init từ config trước khi DB overlay chạy → DB value bị silent loss. Production: Zalo bot burst past limit, operator bump DB → no effect.

Scope: 2 files, +48/-0, 1 commit


🤖 CI + Merge

CI: go: PENDING, web: PASSED (52s)
Merge: UNSTABLE (MERGEABLE) — cần rebase


✅ Điểm Tốt

  1. Root cause đúng. setupToolRegistry (line ~137) dùng cfg.Tools.RateLimitPerHour, ApplySystemConfigs (line ~191) overlay sau → limiter đã built từ default.

  2. Fix gọn, đúng chỗ. Re-apply limiter sau ApplySystemConfigs — 12 dòng, clear comment, safe ordering.

  3. nil case handled. RateLimitPerHour <= 0 → SetRateLimiter(nil).

  4. Test đủ. Replace-with-higher (1→5) + nil-disables path.


📊 Summary

No CRITICAL, no MEDIUM. Clean hotfix.


💡 Recommendation

🟢 APPROVED

Wait for CI go:complete before merging. 🚀

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants