Write an eval suite for the pf-token-auditor skill (design-audit plugin). This skill audits designs against PF6 token architecture and bridges Figma styles to PF semantic tokens.
Discriminating signal: PF6 token architecture knowledge — semantic vs base vs chart token tiers, correct mapping hierarchy, knowing when to use --pf-t--global vs --pf-t--chart vs deprecated patterns.
MCP dependency: Partial — code-scanning cases work standalone, Figma bridge cases need MCP. Tag Figma-dependent cases with requires_mcp: figma.
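A minimal sketch of how the `requires_mcp: figma` tag could gate execution, so code-scanning cases still run when no Figma MCP server is available. The case records, the field names `id` and `requires_mcp`, and the helper `runnable_cases` are hypothetical; the actual schema comes from the pf-unit-test-generator template.

```python
# Hypothetical case records; the field names `id` and `requires_mcp` are
# assumptions for illustration, not the actual eval.yaml schema.
CASES = [
    {"id": "scan-deprecated-css", "requires_mcp": None},
    {"id": "bridge-figma-styles", "requires_mcp": "figma"},
    {"id": "negative-control-v6", "requires_mcp": None},
]


def runnable_cases(cases, available_mcp):
    """Return the cases whose MCP requirement (if any) is satisfied."""
    return [
        case for case in cases
        if case.get("requires_mcp") is None
        or case["requires_mcp"] in available_mcp
    ]
```

Calling `runnable_cases(CASES, set())` keeps only the standalone cases, while `runnable_cases(CASES, {"figma"})` keeps all three.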
Acceptance criteria:
- eval/pf-token-auditor/eval.yaml exists following the pf-unit-test-generator template
- 5+ test cases covering explicit invocation, implicit/contextual prompts, and at least 1 negative control (e.g., correct v6 token usage)
- Inline Python judges test discriminating behavior (PF6 token hierarchy decisions, not general CSS variable knowledge)
- With-skill/without-skill A/B delta documented
- All judges pass at defined thresholds
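A rough sketch of what an inline judge targeting the discriminating signal might look like. It checks token-tier decisions rather than general CSS variable knowledge. The `judge(source_css, response)` signature, the returned dict shape, and the deprecated prefixes (`--pf-v5-`, `--pf-global--`) are assumptions for illustration; the real judge interface is whatever the pf-unit-test-generator template defines.

```python
import re

# Prefixes named in this issue.
GLOBAL_PREFIX = "--pf-t--global--"
CHART_PREFIX = "--pf-t--chart--"
# Assumption: pre-v6 variables use these prefixes; adjust to the real list.
DEPRECATED_PREFIXES = ("--pf-v5-", "--pf-global--")

TOKEN_RE = re.compile(r"--pf-[A-Za-z0-9-]+")


def classify(token: str) -> str:
    """Bucket a CSS custom property name by its prefix."""
    if token.startswith(DEPRECATED_PREFIXES):
        return "deprecated"
    if token.startswith(CHART_PREFIX):
        return "chart"
    if token.startswith(GLOBAL_PREFIX):
        return "global"
    return "unknown"


def judge(source_css: str, response: str) -> dict:
    """Hypothetical judge: pass only on correct token-tier decisions."""
    source_tokens = set(TOKEN_RE.findall(source_css))
    response_tokens = set(TOKEN_RE.findall(response))
    deprecated = {t for t in source_tokens if classify(t) == "deprecated"}
    proposed = response_tokens - source_tokens

    if not deprecated:
        # Negative control heuristic: correct v6 usage should not
        # trigger any replacement suggestions.
        ok = not proposed
        return {"passed": ok, "reason": "no changes expected"}

    missed = deprecated - response_tokens
    has_v6_fix = any(classify(t) in ("global", "chart") for t in proposed)
    ok = not missed and has_v6_fix
    return {"passed": ok, "reason": f"missed={sorted(missed)}"}
```

A response that only says the CSS "uses variables correctly" fails the positive case, because it neither names the deprecated token nor proposes a `--pf-t--` replacement. This is the gap the with-skill/without-skill delta is meant to surface.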
Jira Issue: PF-4340