CLI-7: Add evals.json files to skills for quality assurance and testing #486

Merged
dwash96 merged 4 commits into cecli-dev:v0.99.2 from szmania:cli-7-add-evals
Apr 15, 2026

@szmania commented Apr 15, 2026

Summary

This PR adds evals.json files to skills for quality assurance and testing, following the pattern established in the marketingskills repository.

Changes Made

1. Added evals directory structure

  • Created evals/ directory in cecli/skills/web/playwright-cli/
  • Added evals.json file with sample test cases

2. evals.json structure

The evals.json file follows the standard format with:

  • skill_name: Identifies which skill is being evaluated
  • evals: Array of test cases, each containing:
    • id: Unique identifier for the test
    • prompt: The user query to test
    • expected_output: Description of expected behavior
    • assertions: List of specific checks to verify
    • files: Expected output files
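As an illustration of the structure listed above, here is a minimal sketch that builds one evals document in Python and serializes it to JSON. Only the field names come from this PR description. The value types (for example, an integer `id` and an empty `files` list), the prompt wording, and the `to_json` helper are assumptions for illustration, not the contents of the committed file.

```python
import json

# Hypothetical evals document. Field names follow the PR description;
# every value below is an illustrative assumption.
sample_evals = {
    "skill_name": "playwright-cli",
    "evals": [
        {
            "id": 1,  # assumed to be an integer; a string id would also fit
            "prompt": "Open the Playwright documentation and take a screenshot.",
            "expected_output": (
                "The skill opens the documentation page and saves a screenshot."
            ),
            "assertions": [
                "Opens the Playwright documentation page",
                "Saves a screenshot",
            ],
            "files": [],  # assumed empty here; real entries are not shown in the PR
        }
    ],
}


def to_json(doc: dict) -> str:
    """Serialize an evals document with stable, readable indentation."""
    return json.dumps(doc, indent=2)


if __name__ == "__main__":
    print(to_json(sample_evals))
```

Running the script prints the document in the same shape an evals.json file would take, which makes it easy to compare against the field list above.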

3. Sample test case added

  • Test case for opening playwright documentation and taking a screenshot
  • Demonstrates the expected format for skill evaluations

Problem Solved

Before this change, cecli skills lacked structured evaluation tests to verify their effectiveness and consistency. This PR establishes the foundation for:

  • Quality assurance for skill performance
  • Regression testing when skills are modified
  • Clear documentation of expected behavior
  • Automated validation of skill outputs
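One way the regression-testing goal could be supported is a structural check that runs whenever an evals.json file changes. The sketch below is an assumption about how such a check might look, not part of this PR: `validate_evals` and the two constant tuples are hypothetical names, and the required fields are taken from the structure described in this PR.

```python
from typing import Any

# Field names taken from the PR description; treating all of them as
# required is an assumption of this sketch.
REQUIRED_TOP_LEVEL = ("skill_name", "evals")
REQUIRED_CASE_FIELDS = ("id", "prompt", "expected_output", "assertions", "files")


def validate_evals(doc: dict[str, Any]) -> list[str]:
    """Return a list of structural problems; an empty list means none found."""
    problems: list[str] = []

    for key in REQUIRED_TOP_LEVEL:
        if key not in doc:
            problems.append(f"missing top-level field: {key}")

    evals = doc.get("evals", [])
    if not isinstance(evals, list):
        problems.append("evals must be a list")
        return problems

    seen_ids: set[str] = set()
    for index, case in enumerate(evals):
        if not isinstance(case, dict):
            problems.append(f"evals[{index}] must be an object")
            continue
        for field in REQUIRED_CASE_FIELDS:
            if field not in case:
                problems.append(f"evals[{index}] missing field: {field}")
        if "id" in case:
            # repr() keeps the duplicate check safe for unhashable ids.
            key = repr(case["id"])
            if key in seen_ids:
                problems.append(f"evals[{index}] duplicate id: {case['id']}")
            seen_ids.add(key)

    return problems
```

A caller could load a file with `json.load` and fail the build when the returned list is non-empty. This checks only the shape of the file, not whether a skill actually satisfies its assertions.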

Testing

The evals.json structure has been validated against the marketingskills repository pattern.

Related Issue

Implements CLI-7: Add evals.json files to all skills for quality assurance and testing.

@dwash96 changed the base branch from main to v0.99.2 April 15, 2026 12:10
@dwash96 merged commit 881f92e into cecli-dev:v0.99.2 Apr 15, 2026
12 checks passed
@dwash96 mentioned this pull request Apr 15, 2026