You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
When the agent fails an action with Invalid element reference 'EN'. The element does not exist on the current page., there are two distinct root causes that the current code can't distinguish:
LLM hallucination — the model invented a ref that was never in any snapshot.
DOM mutation between snapshot and click — the snapshot promised a ref, set data-pilo-ref on the DOM, but a subsequent re-render (commonly React reconciliation) stripped the attribute before the click executed.
For (1), failing loudly is correct. For (2), we have the information to recover — ariaSnapshot.ts:55-57 already maintains globalThis.__piloRefMap (a Map<string, Element>) precisely because of "DOM re-renders that strip data-pilo-ref attributes, e.g. React reconciliation" — but the current validateElementRef path doesn't consult it.
This was identified during the investigation of #447 (closed). Per-task event traces from React-heavy sites (Booking, Google Flights, Apple) repeatedly show valid refs failing lookup because the attribute was stripped between snapshot and click; the Element reference held in __piloRefMap is still valid in those cases.
The proposed fallback is independent of the ref scheme — it would equally improve the current E1/E2/... behavior. No PR #447 dependency.
privateasyncvalidateElementRef(ref: string): Promise<Locator>{if(!this.page)thrownewError("Browser not started");constlocator=this.page.locator(`[data-pilo-ref="${ref}"]`);constcount=awaitlocator.count();if(count===0){thrownewInvalidRefException(ref);}// ...}
If count === 0, the call fails immediately with InvalidRefException. The Element reference stored in __piloRefMap (which Playwright cannot see directly, since it lives in the page context) is unused.
Equivalent code in packages/extension/src/background/ExtensionBrowser.ts:356-358 does a single document.querySelector and has no fallback either.
Proposed fix
When the [data-pilo-ref="..."] selector returns 0, fall through to evaluating in the page context and checking globalThis.__piloRefMap.get(ref). If the map has a live Element, re-attach the data-pilo-ref attribute and return a locator scoped to that element.
el.isConnected guards against the element having been removed from the DOM entirely (genuine DOM mutation, not just attribute strip — in that case the ref really is dead).
ExtensionBrowser.ts gets an analogous patch but simpler because it's already running in the page context — direct __piloRefMap.get(ref) + setAttribute + re-query.
Booking task: agent's click E_6b28 succeeded via locator resolution but timed out at click time (<input id=":R9qmr5:" value="flight" type="...">). That suggests the attribute was still there for the lookup. But other Booking actions failed with "ref does not exist" — likely the same React-reconciliation stripping attributes between snapshot rendering and the action 1-2 seconds later.
Google Flights similarly: refs that were in the snapshot the LLM saw came back as "does not exist" at click time.
Some of these failures the trace shows were hallucinations (good — should fail loud). But some were genuine attribute-strip cases where __piloRefMap would have recovered the element.
Acceptance criteria
validateElementRef (playwright) and the extension's equivalent fall back to __piloRefMap when the attribute-selector lookup fails.
Fallback respects Element.isConnected so detached elements still fail loudly.
New unit test simulates the React-strip scenario: snapshot sets attribute → test code strips it → click succeeds via fallback.
Existing tests still pass.
Manual eval spot-check on a known React-heavy site (Booking or Google Flights) shows fewer stale_element_refs-classified failures from attribute-strip cases.
Effort estimate
~half a day. Two surface-level patches, two unit tests, one spot-check.
Out of scope
Distinguishing hallucination vs attribute-strip in error messages or failure classification (separate concern, would let the eval judge bucket these correctly).
Problem
When the agent fails an action with
Invalid element reference 'EN'. The element does not exist on the current page., there are two distinct root causes that the current code can't distinguish:data-pilo-refon the DOM, but a subsequent re-render (commonly React reconciliation) stripped the attribute before the click executed.For (1), failing loudly is correct. For (2), we have the information to recover —
ariaSnapshot.ts:55-57already maintainsglobalThis.__piloRefMap(aMap<string, Element>) precisely because of "DOM re-renders that strip data-pilo-ref attributes, e.g. React reconciliation" — but the currentvalidateElementRefpath doesn't consult it.This was identified during the investigation of #447 (closed). Per-task event traces from React-heavy sites (Booking, Google Flights, Apple) repeatedly show valid refs failing lookup because the attribute was stripped between snapshot and click; the
Elementreference held in__piloRefMapis still valid in those cases.The proposed fallback is independent of the ref scheme — it would equally improve the current
E1/E2/...behavior. No PR #447 dependency.Current behavior
packages/core/src/browser/playwrightBrowser.ts:770-789(validateElementRef):If
count === 0, the call fails immediately withInvalidRefException. The Element reference stored in__piloRefMap(which Playwright cannot see directly, since it lives in the page context) is unused.Equivalent code in
packages/extension/src/background/ExtensionBrowser.ts:356-358does a singledocument.querySelectorand has no fallback either.Proposed fix
When the
[data-pilo-ref="..."]selector returns 0, fall through to evaluating in the page context and checkingglobalThis.__piloRefMap.get(ref). If the map has a liveElement, re-attach thedata-pilo-refattribute and return a locator scoped to that element.Sketch for
playwrightBrowser.ts:el.isConnectedguards against the element having been removed from the DOM entirely (genuine DOM mutation, not just attribute strip — in that case the ref really is dead).ExtensionBrowser.tsgets an analogous patch but simpler because it's already running in the page context — direct__piloRefMap.get(ref)+setAttribute+ re-query.Why this matters
From PR #447's eval traces:
click E_6b28succeeded via locator resolution but timed out at click time (<input id=":R9qmr5:" value="flight" type="...">). That suggests the attribute was still there for the lookup. But other Booking actions failed with "ref does not exist" — likely the same React-reconciliation stripping attributes between snapshot rendering and the action 1-2 seconds later.Some of these failures the trace shows were hallucinations (good — should fail loud). But some were genuine attribute-strip cases where
__piloRefMapwould have recovered the element.Acceptance criteria
validateElementRef(playwright) and the extension's equivalent fall back to__piloRefMapwhen the attribute-selector lookup fails.Element.isConnectedso detached elements still fail loudly.stale_element_refs-classified failures from attribute-strip cases.Effort estimate
~half a day. Two surface-level patches, two unit tests, one spot-check.
Out of scope