Skip to content

Tolerate clang SUBSTPACK placeholders#313

Open
kristjanvalur wants to merge 9 commits intogimli-rs:masterfrom
kristjanvalur:pr1-substpack-clean
Open

Tolerate clang SUBSTPACK placeholders#313
kristjanvalur wants to merge 9 commits intogimli-rs:masterfrom
kristjanvalur:pr1-substpack-clean

Conversation

@kristjanvalur
Copy link
Copy Markdown

@kristjanvalur kristjanvalur commented Apr 15, 2026

Summary

  • tolerate clang SUBSTPACK and SUBSTBUILTINPACK placeholders in parser/demangler paths
  • keep this split isolated from diagnostics and broader parser refactors
  • include focused compatibility tests

Motivation and Context

The SUBSTPACK / SUBSTBUILTINPACK markers are clang-emitted compatibility artifacts (from fallback code paths in clang's mangler), not standard Itanium grammar productions. They still appear in production symbol streams, so strict parsing causes avoidable failures.

Behavior Contract

This PR treats those markers as recovery-only noise tokens for parsing purposes, while rendering a visible placeholder in demangled output:

  • {clang-subst-pack-noise}
  • placeholder tokens are inserted into substitutions so clang-emitted substitution references (for example S_) resolve correctly

Rationale:

  • avoids malformed/ambiguous output forms (for example empty template arguments)
  • keeps data-loss explicit to callers
  • preserves compatibility of the public AST API surface (no new public enum variants)

Additional guardrail:

  • a legitimate length-prefixed source-name identifier with that spelling remains literal and is not rewritten to the noise placeholder.

Scope

  • narrow compatibility handling only for these placeholder markers

Validation

  • direct token parse coverage for SUBSTPACK and SUBSTBUILTINPACK
  • real-world mangled symbol probe that previously hit this path
  • regression probe for length-prefixed SUBSTPACK identifier

@kristjanvalur kristjanvalur marked this pull request as ready for review April 15, 2026 20:41
@khuey
Copy link
Copy Markdown
Collaborator

khuey commented Apr 16, 2026

I think the goal here is fine but there's at least one big issue with the implementation. In your PR a comment says

"They are intentionally non-substitutable so that synthetic recovery tokens do not perturb real S... indices"

But they clearly are substitutable in at least one case (the break on line 2425 will go to line 2431). (This is probably an oversight by the clang developers but that's the price of admission when trying to demangle symbols they didn't expect anyone to try to demangle).

https://github.com/llvm/llvm-project/blob/925e2156c71da60a61f1199960f5fff82de35f57/clang/lib/AST/ItaniumMangle.cpp#L2419

@kristjanvalur
Copy link
Copy Markdown
Author

Great catch. I was assuming this was just ignorable noise, but you are right.
I updated the code so that they now participate in substitution-table insertion, and adjusted the comment.
I also added regression coverage to assert substitution reuse (S_) after parsing each placeholder token.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants