docs(sitemap-promote): seed twitter + hackernews PoC at global path #1823

Merged
jackwener merged 3 commits into main from docs/sitemap-promote-hackernews on Jun 1, 2026

@jackwener
Owner

Summary

Promote Sitemap Hub PoC v1.1 from local-only to global seed for two complementary sites:

  • twitter (12 files, dense UI + testid-heavy + COOKIE_API) — by @opencli-user
  • hackernews (10 files, SSR + no testid + structural selector) — by @opencli-质量官

Both seeds land at skills/opencli-sitemap-author/references/site-memory/{twitter,hackernews}/sitemap/ and bump frontmatter source: local → global. The two PoCs together close the cross-validation loop on the v1.1 schema (#1822) — complex SPA vs simple SSR, write-heavy vs read-heavy, modern testid vs structural sibling-traversal — both encode cleanly under the same schema and tooling.
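The promote step described above (copy each file to the global seed path and bump the frontmatter source field) could be sketched roughly as follows. This is a minimal illustration, not the tooling used in this PR: the function names `bump_source` and `promote` are hypothetical, and it assumes each file opens with a frontmatter block fenced by `---` lines that contains a plain `source: local` line.

```python
import re
from pathlib import Path

# Assumption: frontmatter is a leading block fenced by '---' lines.
SOURCE_LOCAL = re.compile(r"^source:\s*local\s*$", re.MULTILINE)


def bump_source(text: str) -> str:
    """Rewrite 'source: local' to 'source: global' inside the frontmatter only."""
    if not text.startswith("---\n"):
        return text  # no frontmatter: leave the file untouched
    end = text.find("\n---", 4)
    if end == -1:
        return text  # unterminated frontmatter: leave the file untouched
    head, rest = text[:end], text[end:]
    return SOURCE_LOCAL.sub("source: global", head, count=1) + rest


def promote(local_dir: Path, global_dir: Path) -> int:
    """Copy every .md file, keeping relative paths; return the number of files written."""
    count = 0
    for src in sorted(local_dir.rglob("*.md")):
        dst = global_dir / src.relative_to(local_dir)
        dst.parent.mkdir(parents=True, exist_ok=True)
        dst.write_text(bump_source(src.read_text(encoding="utf-8")), encoding="utf-8")
        count += 1
    return count
```

Restricting the rewrite to the frontmatter block means a literal `source: local` that appears in a page body (for example inside a documented snippet) is left alone.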

v1.1 schema application checklist (per #1822)

Both PoC sets apply the 12-patch schema delta as a unit:

  1. Form B compact YAML actions (pre / do / post / fail / recover / evidence, ~80 tokens) used as default; Form A markdown reserved for actions that genuinely need prose explanation.
  2. adapter_health_update: <adapter> -> suspect|broken write directive on every adapter-primary action and workflow Fallback path. This closes the read↔write loop so that the opencli-browser-sitemap consumer can mutate the local overlay.
  3. Regularized delimiters: | for failure-signal enums, || for fallback priority in do:, ; for sequential recovery steps.
  4. selector_pattern first-class anchor: forms used across both PoCs include id-anchored, sibling-traversal (HN tr.athing[id="<id>"] + tr a[href^="item?id="]), data-testid (Twitter [data-testid="like"]), form-name, ARIA role. Discouraged forms (nth-child(<rank>) / single class / text-only) avoided.
  5. Partial pages (_ prefix + empty url_patterns: []) used for cross-page UI primitives, e.g. Twitter _tweet_card.md is referenced from home / profile / status.
  6. 800-token hard limit respected per file (see size table below); both PoCs fit naturally without forced splitting.
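As an illustration of items 2 and 3, a consumer could split the regularized delimiters and parse the health directive along these lines. This is a sketch under assumptions: the helper names are hypothetical, the sample values in the usage note are invented, and the real consumer may tokenize differently (for instance, a CSS selector that itself contains `|` or `;` would need smarter handling than shown here).

```python
import re

# A single '|' that is not part of a '||' pair.
SINGLE_PIPE = re.compile(r"(?<!\|)\|(?!\|)")
# '<adapter> -> suspect' or '<adapter> -> broken'
HEALTH = re.compile(r"^(?P<adapter>\S+)\s*->\s*(?P<state>suspect|broken)$")


def split_fallbacks(do: str) -> list[str]:
    """Split a do: value on '||' into options, highest priority first."""
    return [part.strip() for part in do.split("||") if part.strip()]


def split_signals(fail: str) -> list[str]:
    """Split a fail: value on single '|' into failure-signal names."""
    return [part.strip() for part in SINGLE_PIPE.split(fail) if part.strip()]


def split_steps(recover: str) -> list[str]:
    """Split a recover: value on ';' into sequential steps."""
    return [part.strip() for part in recover.split(";") if part.strip()]


def parse_health_update(value: str) -> tuple[str, str]:
    """Parse an adapter_health_update value into (adapter, state)."""
    match = HEALTH.match(value.strip())
    if match is None:
        raise ValueError(f"not a valid adapter_health_update value: {value!r}")
    return match["adapter"], match["state"]
```

For example, with an invented value, `parse_health_update("twitter/like -> suspect")` yields the adapter name and the state to record in the local overlay, while any state other than suspect or broken is rejected.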

Cross-PoC token sizes (per file, bytes ≈ tokens × 1.5 for English-heavy content)

| Site | Min | Median | Max | Total |
|---|---|---|---|---|
| twitter (12 files) | 1.7 KB | 2.4 KB | 3.1 KB | 29.7 KB |
| hackernews (10 files) | 1.0 KB | 1.7 KB | 2.5 KB | 17.5 KB |

Every file is under the 800-token hard limit. Insight from 质量官: simple sites with 1-2 actions per page naturally sit at 800-2000 tokens, so don't force-split when the size is natural. Author-side empirical guidance (3 tiers: <1500 / 1500-3000 / >3000) will land in a follow-up SKILL.md patch.
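A reviewer who wants to reproduce the Min / Median / Max / Total columns for a seed directory could use something like the sketch below. The function names are hypothetical, and it assumes 1 KB = 1000 bytes and that only `.md` files count; the PR does not state either.

```python
from pathlib import Path
from statistics import median


def size_summary(sizes_bytes: list[int]) -> dict[str, float]:
    """Summarize file sizes in KB (assumption: 1 KB = 1000 bytes), rounded to 0.1."""
    kb = [size / 1000 for size in sizes_bytes]
    return {
        "min": round(min(kb), 1),
        "median": round(median(kb), 1),
        "max": round(max(kb), 1),
        "total": round(sum(kb), 1),
    }


def sitemap_sizes(root: Path) -> list[int]:
    """Collect the byte size of every .md file under a sitemap directory."""
    return [path.stat().st_size for path in sorted(root.rglob("*.md"))]
```

Calling `size_summary(sitemap_sizes(Path(...)))` on each site's sitemap directory gives one table row per site.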

Trust-reality verification

Both seeds were verified end-to-end against live sites before promotion:

  • twitter: 4 action chains tested (like / repost / bookmark via adapter; reply via DOM fallback); pitfalls.md records 6 verified site-specific traps (login wall / mobile UA redirect / queryId rotation / reply composer quirk / RTL caret / non-English locale).
  • hackernews: 3 workflows (read-story / submit-story / upvote) tested across news / newest / item routes; pitfalls.md records SSR-specific traps (login-required-for-vote / nth-child anti-pattern / age-gated submit).

What's NOT in scope (deferred)

  • SKILL.md size guidance 3-tier patch: follow-up PR (per thread agreement). Rationale: this keeps the PR seed-only for clean diff review; size guidance is an author-side empirical complement to the schema's hard 800-token limit and deserves its own PR description.
  • More sites: the PoC stops at 2 cross-validating cases; expansion (e.g., toutiao / xhs / linkedin) waits for the opencli-browser-sitemap skill to land Phase 2 features.

Test plan

  • CI green
  • LGTM from @opencli-质量官 (review by author of hackernews seed)
  • Reviewer spot-checks frontmatter source: global on all 22 files
  • Reviewer spot-checks at least one Form B action in each PoC for delimiter regularity
  • Reviewer spot-checks at least one workflow Fallback path for adapter_health_update directive presence
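The frontmatter spot-check in the list above could be automated with a short script like this one. It is a sketch only: `frontmatter_fields` and `files_not_global` are hypothetical helpers, and the parser assumes flat `key: value` lines inside a leading `---` fenced block rather than full YAML.

```python
from pathlib import Path


def frontmatter_fields(text: str) -> dict[str, str]:
    """Parse top-level 'key: value' lines from a leading '---' fenced block."""
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        return {}
    fields: dict[str, str] = {}
    for line in lines[1:]:
        if line.strip() == "---":
            return fields
        key, sep, value = line.partition(":")
        # Skip nested or list lines; only flat top-level keys are collected.
        if sep and not line.startswith((" ", "\t", "-")):
            fields[key.strip()] = value.strip()
    return {}  # the closing '---' was never found


def files_not_global(root: Path) -> list[Path]:
    """Return every .md file under root whose frontmatter source is not 'global'."""
    return [
        path
        for path in sorted(root.rglob("*.md"))
        if frontmatter_fields(path.read_text(encoding="utf-8")).get("source") != "global"
    ]
```

An empty result from `files_not_global` on both seed directories would confirm the source: global bump on all 22 files.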

cc @opencli-质量官 (review owner per thread)

jackwener added 3 commits June 2, 2026 02:51
Promote-prep commit — hackernews PoC v1.1 files copied from local overlay
(~/.opencli/sites/hackernews/sitemap/) to the global seed path
(skills/opencli-sitemap-author/references/site-memory/hackernews/sitemap/),
with frontmatter source bumped local → global on all 10 files.

@opencli-user's twitter v1.1 normalize will land on top of this branch,
after which the joint promote PR opens against main.

Files:
- SITE.md, apis.md, pitfalls.md (site-level)
- pages/front.md, feed.md, item.md, user.md
- workflows/read-story.md, submit-story.md, upvote.md

All Form B YAML actions, drop action-level verified_at/source, delimiter
form `|`/`||`/`;`, adapter_health_update directives on adapter-primary
actions, selector_pattern 5 types declared. Per v1.1 schema (#1822).

- Add 12 twitter sitemap files at skills/opencli-sitemap-author/references/site-memory/twitter/sitemap/
- frontmatter source bumped local -> global
- v1.1 schema applied: Form B compact YAML actions, adapter_health_update directives on write workflows, regularized | / || / ; delimiters, selector_pattern anchors where applicable
- pairs with hackernews seed in prior commit; both become global seed at promote

Align with twitter sitemap files in this PR and with the v1.1 schema applied
(Form B YAML actions, adapter_health_update directives, etc.). Catch from
review: twitter files already use schema_version: 1.1 throughout.
jackwener merged commit 3dcddb6 into main on Jun 1, 2026
11 checks passed
jackwener deleted the docs/sitemap-promote-hackernews branch on June 1, 2026 19:10