Expired wildcard cert keeps being served for newly-added specific subdomains #384

@mbardelmeijer

What version of the package are you using?

v0.25.3

What are you trying to do?

Run two *certmagic.Config instances sharing one *certmagic.Cache:

  • Config A issues and serves wildcard certs (*.example.com) via DNS-01.
  • Config B issues and serves only exact-SAN certs via on-demand HTTP-01.

The TLS server picks A or B per handshake based on whether the SNI is currently administratively configured as a wildcard or not. The shared cache is required so a single CacheOptions.GetConfigForCert can route renewals to the right Config.

What steps did you take?

  1. Through Config A, obtain *.example.com. It is cached and written to storage.
  2. Wildcard configuration is removed administratively. The cert remains in the cache (and in storage).
  3. A new specific hostname test.example.com is added; it now routes to Config B.
  4. A TLS handshake arrives for test.example.com. The server dispatches it to Config B's GetCertificate.

Over time, the wildcard domain and its DNS challenge configuration are removed, and the wildcard cert expires.

What did you expect to happen, and what actually happened instead?

Expected: Config B refuses the cached wildcard cert (different issuance policy, expired) and falls through to obtainOnDemandCertificate to issue a fresh cert for test.example.com.

Actual: getCertificateFromCache's wildcard-name walk in handshake.go:142-152 finds *.example.com in the shared cacheIndex and returns it as matched = true to Config B. The follow-up renewal in optionalMaintenance runs against the SNI (test.example.com), which has no resource in storage; it errors. Because the cert is expired, handshake.go:463-465 still returns it, and it is served on every subsequent handshake. On-demand issuance for test.example.com is never attempted.

Logs repeatedly show:

on_demand  renewing certificate on-demand failed
  subjects=["*.example.com"]  error="context canceled"

How do you think this should be fixed?

A combination of:

  1. Per-Config opt-out of the wildcard cache walk. Add Config.DisableWildcardCacheFallback bool. When set, skip the label-replacement loop at handshake.go:142-152. Configs that should only serve exact-SAN matches enable it. Minimal change, fixes this report.

  2. Optional ScopedCertificateSelector interface. A CertificateSelector that also implements ScopedToIndexedNames() bool (returning true) tells selectCert it only needs index hits for the looked-up name. Two effects:

    • selectCert skips the getAllCerts() fallback at handshake.go:195-197, so deployments with thousands of cached certs avoid a per-handshake []Certificate allocation proportional to cache size.
    • Combined with (1) the selector is reached only for true exact-name hits, allowing it to reject wildcard SANs trivially.

    It is a non-breaking addition: existing CertificateSelector implementations keep their current behavior.

(1) alone resolves the user-visible bug; (2) additionally fixes the allocation hot path.
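A rough sketch of how (1) and (2) could behave, using stand-in types rather than the real certmagic ones. The field DisableWildcardCacheFallback and the method ScopedToIndexedNames are the names proposed above and do not exist in the library today; `config`, `candidates`, and `needsAllCerts` are hypothetical helpers for illustration only.

```go
package main

import (
	"fmt"
	"strings"
)

// config models only the proposed field; it is not certmagic.Config.
type config struct {
	DisableWildcardCacheFallback bool
}

// candidates lists the index keys a lookup for name would try.
// With the proposed flag set, only the exact name is tried.
func (cfg config) candidates(name string) []string {
	keys := []string{name}
	if cfg.DisableWildcardCacheFallback {
		return keys
	}
	labels := strings.Split(name, ".")
	for i := range labels {
		labels[i] = "*"
		keys = append(keys, strings.Join(labels, "."))
	}
	return keys
}

// scopedSelector models the proposed optional interface from (2).
type scopedSelector interface {
	ScopedToIndexedNames() bool
}

// needsAllCerts reports whether selection would fall back to every cached
// cert. Selectors that do not implement the interface keep that fallback.
func needsAllCerts(selector any) bool {
	s, ok := selector.(scopedSelector)
	return !(ok && s.ScopedToIndexedNames())
}

// legacySelector stands in for an existing selector implementation.
type legacySelector struct{}

// exactSelector opts in to the proposed scoped behavior.
type exactSelector struct{}

func (exactSelector) ScopedToIndexedNames() bool { return true }

func main() {
	fmt.Println(config{}.candidates("test.example.com"))
	fmt.Println(config{DisableWildcardCacheFallback: true}.candidates("test.example.com"))
	fmt.Println(needsAllCerts(legacySelector{}), needsAllCerts(exactSelector{}))
}
```

Detecting the optional interface with a type assertion is what would keep the change non-breaking: a selector that does not implement it is treated exactly as before.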

Please link to any related issues, pull requests, and/or discussion

None found in a quick search.

Bonus: What do you use CertMagic for, and do you find it useful?

For our SaaS redirect.pizza 🍕

Bug report written by Claude Opus 4.7.
