Skip to content

Commit b4d18db

Browse files
claudebokelley
authored andcommitted
fix(server): fix register_lazy+resolve disabled-state inconsistencies
Two correctness issues found in pre-PR review: 1. register_lazy(await_first_validation=True) + validator returns False: the platform was being written to _platforms even though the tenant is disabled, and the factory was left in _factories. Mirrors the resolve() cold-path: on validator failure, discard the platform and clear the factory so the disabled tenant behaves consistently regardless of how it reached that state. 2. resolve() docstring claimed it returns None when validator returns False — only true for the lazy cold-path. The fast-path (eager or previously-resolved lazy) returns TenantResolution(health="disabled"). Rewritten to make the contract unambiguous and to warn against gating solely on result is None. Also adds: - Class docstring: do-not-use-as-SubdomainTenantRouter warning (same resolve(host) signature, incompatible return type) - recheck() docstring: lazy-pending and lazy-disabled caveats (recheck on pending-lazy advances health without building the platform; recheck on factory-disabled is insufficient — re-register is required) - Test: register_lazy + await_first_validation + validator=False must not cache platform and must not retry factory on subsequent resolve() https://claude.ai/code/session_01DRv6qahN7Jjt3Q4oxGBXkd
1 parent d02ee30 commit b4d18db

2 files changed

Lines changed: 67 additions & 8 deletions

File tree

src/adcp/server/tenant_registry.py

Lines changed: 42 additions & 8 deletions
Original file line numberDiff line numberDiff line change
@@ -115,6 +115,12 @@ class TenantRegistry:
115115
before ``unregister()`` was called completes safely — zombie-entry
116116
guards in both methods prevent stale writes after removal.
117117
118+
**Do not pass a TenantRegistry as a SubdomainTenantRouter.**
119+
Both classes expose ``async def resolve(host)``, but the return types
120+
are incompatible (:class:`TenantResolution` vs :class:`Tenant`).
121+
Mypy will flag the mismatch; duck-typing and ``isinstance`` checks
122+
will not.
123+
118124
:param validator: Optional validation callable (sync or async).
119125
``None`` → principal-token mode; validation always succeeds.
120126
:param default_serve_options: Optional dict of defaults to store for
@@ -364,9 +370,18 @@ async def register_lazy(
364370
exc_info=True,
365371
)
366372
self._health[tenant_id] = "disabled"
373+
self._factories.pop(tenant_id, None)
367374
return
368-
self._platforms[tenant_id] = platform
369-
self._health[tenant_id] = "healthy" if ok else "disabled"
375+
if ok:
376+
self._platforms[tenant_id] = platform
377+
self._factories.pop(tenant_id, None)
378+
self._health[tenant_id] = "healthy"
379+
else:
380+
# Validator rejected the platform — discard it and clear the
381+
# factory to mirror resolve() cold-path behavior: a disabled
382+
# lazy tenant needs register_lazy() + recheck() to recover.
383+
self._health[tenant_id] = "disabled"
384+
self._factories.pop(tenant_id, None)
370385

371386
def unregister(self, tenant_id: str) -> None:
372387
"""Remove a tenant from the registry.
@@ -402,6 +417,21 @@ async def recheck(self, tenant_id: str) -> None:
402417
The health state is updated before any exception propagates, so
403418
the state is always consistent even when the validator raises.
404419
420+
**Lazy-tenant caveats:**
421+
422+
* For a lazy tenant in ``pending`` state (factory never invoked),
423+
``recheck()`` runs the validator against the registered
424+
``agent_url`` only. If it succeeds, health advances to
425+
``healthy`` — but the platform has not been built yet.
426+
:meth:`resolve_by_host` still returns ``None``; use the async
427+
:meth:`resolve` which triggers the factory on first call.
428+
* For a lazy tenant that reached ``disabled`` via factory failure,
429+
the factory has been cleared. Calling ``recheck()`` alone is
430+
insufficient to recover — the validator may succeed but there
431+
is no platform to serve. To retry platform construction, call
432+
:meth:`register_lazy` again with the same factory, then call
433+
:meth:`recheck` if you also need to re-run the validator.
434+
405435
:raises KeyError: when ``tenant_id`` is not registered.
406436
:raises Exception: re-raises any exception from the validator
407437
after updating the health state.
@@ -492,14 +522,18 @@ async def resolve(self, host: str) -> TenantResolution | None:
492522
Concurrent first-hit resolves for the same tenant serialize on
493523
the per-tenant lock — only one factory invocation occurs.
494524
495-
Returns ``None`` when:
525+
Returns ``None`` when no tenant is registered for this host, or when
526+
a lazy tenant's factory/validator fails on this call (health set to
527+
``disabled`` in both cases).
496528
497-
* No tenant is registered for this host.
498-
* The factory raised (health is set to ``disabled``).
499-
* The validator returned ``False`` (health is set to ``disabled``).
529+
Returns a :class:`TenantResolution` — which may have
530+
``health="disabled"`` — when the platform was already built (eager
531+
registration, lazy + ``await_first_validation=True``, or a previous
532+
:meth:`resolve` call). **Always check ``result.health`` before
533+
serving; never gate solely on ``result is None``.**
500534
501-
The caller is responsible for checking ``result.health`` and
502-
gating traffic — the registry does not 503 automatically.
535+
The caller is responsible for gating traffic — the registry does
536+
not 503 automatically.
503537
504538
:param host: Raw ``Host`` header value. Port suffixes are stripped;
505539
full URLs are also accepted. See :meth:`_normalize_host` for

tests/test_tenant_registry.py

Lines changed: 25 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -694,6 +694,31 @@ async def factory(tid: str) -> Any:
694694
assert result.platform is new_platform
695695

696696

697+
@pytest.mark.asyncio
698+
async def test_register_lazy_await_first_validation_validator_false_does_not_cache() -> None:
699+
"""When validator returns False in register_lazy(await_first_validation=True),
700+
platform must NOT be cached and factory must be cleared — mirrors resolve()
701+
cold-path behavior so disabled tenants are consistent regardless of how they
702+
were registered."""
703+
platform = _mock_platform()
704+
705+
async def factory(tid: str) -> Any:
706+
return platform
707+
708+
registry = TenantRegistry(validator=lambda tid, url: False)
709+
await registry.register_lazy(
710+
"acme",
711+
agent_url="https://acme.example.com",
712+
factory=factory,
713+
await_first_validation=True,
714+
)
715+
assert registry.health("acme") == "disabled"
716+
# Sync path must return None (platform not cached).
717+
assert registry.resolve_by_host("acme.example.com") is None
718+
# Async path must also return None (factory was cleared, no retry).
719+
assert await registry.resolve("acme.example.com") is None
720+
721+
697722
@pytest.mark.asyncio
698723
async def test_resolve_factory_failure_does_not_retry_on_subsequent_calls() -> None:
699724
"""After factory failure sets health=disabled, subsequent resolve() calls

0 commit comments

Comments
 (0)