Skip to content

fix(idl-v2): cycle/duplicate guards and FFI hardening#1345

Open
ukint-vs wants to merge 6 commits intomasterfrom
worktree-sharded-gathering-hejlsberg
Open

fix(idl-v2): cycle/duplicate guards and FFI hardening#1345
ukint-vs wants to merge 6 commits intomasterfrom
worktree-sharded-gathering-hejlsberg

Conversation

@ukint-vs
Copy link
Copy Markdown
Member

@ukint-vs ukint-vs commented May 7, 2026

Closes #1319 (with documentation, not a new API — see thread for the discussion).

Background

The original report asked for a computeInterfaceIds API to avoid the "submit IDL with @0x0000..., paste hex from the validation error, repeat" loop when authoring v2 IDL fixtures. Per @vobradovich's review, the existing parse_idl already returns canonical ids in the AST when callers omit explicit @0x... suffixes on non-@partial services. So this PR closes the issue by documenting that pattern rather than adding a parallel API.

What this PR ships

A handful of correctness fixes that surfaced while exploring the original API design — each one is a real bug or UB on the existing parse_idl_to_json path:

File Fix
rs/idl-parser-v2/src/post_process.rs Cycle detection on extends graphs (service A { extends { A } } and longer cycles A → B → A previously stack-overflowed in compute_service_id with no validation guard)
rs/idl-parser-v2/src/post_process.rs Duplicate-service-name guard at index-build time (BTreeMap previously collapsed the second entry, then trapped later via expect(interface_id must be set))
rs/idl-parser-wasm/src/lib.rs Replace String::from_utf8_unchecked with safe str::from_utf8 (was UB on any non-UTF-8 input from a raw FFI caller)
rs/idl-parser-wasm/src/lib.rs Sanitize NUL bytes before CString::new in create_parse_result (validation errors can quote user input containing \0, e.g. @entry_id: 1\0, which previously panicked the WASM module)
rs/idl-parser-wasm/src/lib.rs Accept zero-length input with a non-null pointer so empty/whitespace-only IDLs do not surface as NullPtr errors

Documentation

docs/idl-v2-spec.md now spells out the strip-and-parse pattern for tooling that needs ids deterministically up front:

  1. Generate or load IDL.
  2. Strip explicit @0x... suffixes from non-@partial service declarations.
  3. Call sails_idl_parser_v2::parse_idl (Rust) or SailsIdlParser.parse(idl) (sails-js).
  4. Read each service's interface_id from the returned AST.
  5. Optionally re-render the IDL with the computed ids filled in.

@partial services must keep their explicit id — the partial signature is a subset, so the parser cannot recompute the original service's id from it.

Test plan

  • cargo test -p sails-idl-parser-v2 --locked — 35 tests pass (3 new regression tests exercise cycle / self-cycle / duplicate-name detection via parse_idl)
  • cargo clippy -p sails-idl-parser-v2 -p sails-idl-parser-wasm --all-targets --locked -- -D warnings — clean
  • cargo fmt --all --check — clean
  • pnpm --filter sails-js-parser-idl-v2 build — clean rebuild
  • Jest in js/parser-idl-v2 — 8 tests pass (3 new regression tests for cycle / self-cycle / duplicate names via parse())
  • Diff against master: 5 files, +194/-23

🤖 Generated with Claude Code

ukint-vs and others added 3 commits May 7, 2026 17:12
Tooling that emits v2 IDL from another source (codegen, schema-first
generators, IDL linters) needs deterministic interface_ids up front.
Today the only path is round-tripping through parse_idl with placeholder
@0x0000... and copy-pasting the hex from the validation error one
service at a time.

Add compute_interface_ids in sails-idl-parser-v2 and a matching
compute_interface_ids_to_json C ABI export in sails-idl-parser-wasm.
Both run the full validation pipeline but clear explicit ids on
non-@partial services first, so placeholder mismatches no longer
error out. @partial services pass through verbatim — explicit id is
preserved, missing id surfaces the same error as parse_idl.

Closes part of #1319.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
SailsIdlParser.computeInterfaceIds(idl) returns a Record<string, string>
mapping each service name to its canonical interface_id. Wraps the new
WASM export added in the previous commit. Refactor the WASM call
plumbing into a private invoke<T>() helper shared with parse() so both
entry points use the same memory-fill / error-decode / cleanup path.

Tests cover: not-initialized error, basic shape, placeholder mismatches
ignored, cross-check that computeInterfaceIds() agrees with parse() on
the same IDL (regression pin against algorithm drift), @partial
pass-through with explicit id, @partial without id errors,
whitespace-only IDL returns {}, parse error propagation.

Update docs/idl-v2-spec.md to clarify that @<interface_id> is optional
on non-@partial services and point tooling at computeInterfaceIds.

Closes #1319.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Codex review found six reachable bugs in the parser, four pre-existing
and exposed by the new compute_interface_ids API. All cause WASM traps
or panics that surface in JS as RuntimeError.

Rust:
- post_process: detect cyclic `extends` graphs (self-extends and
  longer cycles A→B→A previously stack-overflowed in compute_service_id).
  Track a `visiting` set and return a Validation error.
- post_process: reject duplicate service names at index-build time
  instead of letting the BTreeMap silently collapse them and trapping
  later via `expect(interface_id must be set)`.
- idl-parser-wasm: replace `String::from_utf8_unchecked` with safe
  `str::from_utf8`. The previous code was UB on any non-UTF-8 input
  from a raw FFI caller.
- idl-parser-wasm: sanitize NUL bytes before `CString::new` in
  create_parse_result. Validation errors can quote user input
  (e.g. `@entry_id: 1\0`), which previously panicked the Wasm module.
- idl-parser-wasm: accept zero-length input with a non-null pointer so
  the FFI matches the safe-Rust API (compute_interface_ids("") → {}).

JS:
- SailsIdlParser.computeInterfaceIds returns an Object.create(null)
  record. Service names like `__proto__`, `constructor`, or `toString`
  are valid IDL idents (per the Pest grammar's ASCII_ALPHA | "_"),
  and would otherwise pollute downstream dictionary consumers.

Tests: 3 new Rust tests and 5 new JS tests cover each bug.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Copy link
Copy Markdown
Contributor

@gemini-code-assist gemini-code-assist Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request introduces a new computeInterfaceIds API to the Sails IDL parser, enabling deterministic computation of service identifiers. Key changes include updates to the IDL specification to support an optional @interface_id suffix, implementation of the new logic in both Rust and TypeScript, and enhanced validation to detect duplicate service definitions and cyclic extension graphs. Review feedback highlights a potential fragility in manual JSON construction within the WASM FFI and suggests an optimization to reduce redundant string allocations in the Rust post-processing phase.

Comment thread rs/idl-parser-wasm/src/lib.rs Outdated
Comment on lines +62 to +63
// Service idents are ASCII per the IDL grammar; no escaping needed.
json.push_str(name);
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

Manual JSON construction without escaping assumes that service identifiers are strictly restricted to characters that do not require JSON escaping (like double quotes or backslashes). While the IDL grammar likely enforces this, it's a fragile assumption for an FFI boundary. If the grammar is ever expanded to allow more characters in identifiers, this will produce invalid JSON.

Copy link
Copy Markdown
Member Author

@ukint-vs ukint-vs May 7, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch — switched to serde_json::to_string over a BTreeMap<&str, String> in 54e7055. serde_json is already in the WASM blob via sails-idl-ast's serde feature (used by parse_idl_to_json), so the ~13KB cost is for the new monomorphization, not a new dep. Future-proof against any grammar evolution.

Comment thread rs/idl-parser-v2/src/post_process.rs Outdated
Comment on lines +253 to +254
let name = s.name.name.to_string();
if service_index.insert(name.clone(), idx).is_some() {
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

The service name is being cloned twice here: once via to_string() and then again via clone() for the insert call. Since s.name.name is already a String, you can avoid one allocation by cloning it once and reusing it for both the index insertion and the potential error message.

Suggested change
let name = s.name.name.to_string();
if service_index.insert(name.clone(), idx).is_some() {
let name = s.name.name.clone();
if service_index.insert(name.clone(), idx).is_some() {

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed in the same commit. Used the borrow form in format!() so we keep the single clone BTreeMap::insert actually needs and drop the redundant one.

ukint-vs and others added 2 commits May 7, 2026 17:48
Two suggestions from the review bot, both legitimate:

- compute_interface_ids_to_json: replace the hand-rolled JSON map
  emitter with serde_json::to_string. Service idents are ASCII-safe
  per the current Pest grammar, but serde_json is already in the WASM
  blob (sails-idl-ast pulls it in via the `serde` feature for
  parse_idl_to_json), so using it here removes the "what if grammar
  changes" footgun for a ~13KB blob increase.

- ServiceInterfaceId::new: the duplicate-name guard cloned the service
  name twice — once via to_string() for the index key, and again via
  .clone() into the error message. Use the borrow in format! and keep
  the single clone BTreeMap::insert needs.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
CI's stable rustfmt (1.95.0) wanted CString::new placed on one line,
not split with the `.expect(...)` on its own line. Local toolchain had
formatted the original layout as acceptable but stable disagreed.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@ukint-vs ukint-vs requested a review from vobradovich May 7, 2026 14:57
@ukint-vs ukint-vs self-assigned this May 7, 2026
Copy link
Copy Markdown
Member

@vobradovich vobradovich left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Remove computeInterfaceIds

parse_idl_to_json already returns the post-processed AST JSON with computed interface_ids: parse_idl runs validate_and_post_process, and the JS parse() path normalizes those ids into InterfaceId values. So SailsIdlParser.computeInterfaceIds() mostly duplicates existing surface area. Its only semantic difference is that it ignores explicit non-@partial placeholder/mismatched ids, which also bypasses the normal parse_idl validation behavior. I’d remove the new exported WASM function and JS method, and have callers read parse(idl).services[*].interface_id instead.

IDL text that already contains placeholder or stale explicit ids

parse_idl already computes the canonical id when the suffix is absent. So tooling can do:

  1. generate or load IDL,
  2. remove explicit service ids from non-@partial service declarations,
  3. call parse_idl / parse(),
  4. read service.interface_id from the returned AST,
  5. optionally render the IDL again with the computed ids filled in.

The only exception is @partial: those must keep an explicit id, because their subset cannot safely recompute the original service id from the partial signature.

Keep cycle detection

The extends cycle detection is still worth keeping. I checked the merge base: old ServiceInterfaceId::compute_service_id only looked in computed and then recursively computed bases, so service A { extends { A } } and A -> B -> A would recurse indefinitely. There was no previous cyclic / visiting validation in the parser path.

Per @vobradovich's review: parse_idl already returns canonical
interface_ids in the AST when callers omit explicit @0x... suffixes
on non-@partial services, so the new computeInterfaceIds API mostly
duplicates surface area while bypassing the standard validation
mismatch. Tooling that needs ids deterministically can:

  1. strip explicit @0x... from non-@partial service decls,
  2. call parse_idl,
  3. read service.interface_id from the returned AST.

Remove:
- compute_interface_ids in sails-idl-parser-v2
- compute_interface_ids_to_json WASM FFI export
- SailsIdlParser.computeInterfaceIds() and its IIdlParser entry
- the dedicated test file

Keep (these benefit parse_idl too — they are real bug fixes the
reviewer explicitly asked to retain):
- post_process: cycle detection on `extends` graphs (`service A
  { extends { A } }` and longer cycles previously stack-overflowed
  in compute_service_id with no validation guard)
- post_process: duplicate-service-name guard at index-build time
  (BTreeMap previously collapsed the second entry, then trapped
  later via `expect`)
- idl-parser-wasm: replace `String::from_utf8_unchecked` with safe
  `str::from_utf8` (was UB on any non-UTF-8 input)
- idl-parser-wasm: sanitize NUL bytes before `CString::new`
  (validation errors can quote user input containing \0)
- idl-parser-wasm: accept zero-length input with a non-null pointer
  so empty/whitespace-only IDLs do not surface as NullPtr errors

Tests: 3 Rust regression tests via parse_idl (cycle, self-cycle,
duplicate names) and 3 matching JS regression tests via parse().

Update docs/idl-v2-spec.md to document the strip-and-parse pattern
in place of the removed API.

Closes #1319.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@ukint-vs
Copy link
Copy Markdown
Member Author

ukint-vs commented May 8, 2026

@vobradovich Done in 704032f — you were right to push back on the duplicate surface.

Removed: compute_interface_ids (Rust), compute_interface_ids_to_json (WASM FFI), SailsIdlParser.computeInterfaceIds (JS), the dedicated test file, and the direct serde-json dep that the WASM export had pulled in.

Kept (per your request): cycle detection on extends graphs. Also kept the duplicate-name guard, str::from_utf8/NUL-sanitization in the FFI, and zero-length input acceptance — each fixes a real bug or UB on the existing parse_idl_to_json path. Three Rust + three JS regression tests now exercise these via parse_idl / parse().

Docs: rewrote the docs/idl-v2-spec.md paragraph to document the strip-and-parse pattern (omit @0x... from non-@partial services, call parse_idl, read service.interface_id from the AST) in place of the removed API.

Diff against master is now 5 files, +194/-23.

@ukint-vs ukint-vs changed the title feat(js,idl-v2): expose computeInterfaceIds for tooling fix(idl-v2): cycle/duplicate guards and FFI hardening May 8, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

sails-js: export computeInterfaceId so tooling can author v2 IDLs without trial-and-error

2 participants