Major refactoring #107

Merged
xuafeng merged 6 commits into sys-intelligence:main from bastoica:main on Feb 4, 2026

@bastoica (Collaborator) commented Jan 31, 2026

Description

This PR implements shared base classes and reusable primitives for agent validation oracles. It also refactors the artifact-specific agent validation oracles for Acto, Anvil, EqWalker, and Wasabi to use the same standardized code structure. The goal is to reduce duplicated logic, make the oracle code easier to read and maintain, and ensure agent validation runs consistently across artifacts in ArtEvalBench.

Changes

  • Implemented 4 base oracle classes corresponding to the 4 canonical stages of the artifact evaluation process (environment setup, benchmark preparation, artifact build, and experiment runs), along with shared requirement/check primitives.
  • Refactored the Acto, Anvil, EqWalker, and Wasabi artifact-specific agent validation oracles to use the standardized primitives and the base orchestrator flow.
  • Created a README that documents the base classes and primitives, explains how to derive new oracle classes, and shows how to use the helper functions. It serves as a step-by-step guide for implementing the agent validation oracle for any new artifact added to ArtEvalBench.
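
For readers unfamiliar with the pattern, here is a rough sketch of what a base oracle class with shared requirement/check primitives could look like. It is an illustration only, not the code in this PR: every name below (`Requirement`, `CheckResult`, `BaseOracle`, `EnvSetupOracle`, `ExampleEnvSetupOracle`) is hypothetical, and only one of the four stages is shown.

```python
import sys
from abc import ABC, abstractmethod
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class CheckResult:
    """Outcome of one requirement check (hypothetical type)."""
    name: str
    passed: bool
    message: str = ""


@dataclass
class Requirement:
    """A named, reusable check primitive (hypothetical type)."""
    name: str
    check: Callable[[], bool]

    def evaluate(self) -> CheckResult:
        # A check that raises is reported as a failure instead of
        # aborting the whole oracle run.
        try:
            ok = bool(self.check())
        except Exception as exc:
            return CheckResult(self.name, False, f"{type(exc).__name__}: {exc}")
        return CheckResult(self.name, ok, "" if ok else "check returned False")


class BaseOracle(ABC):
    """Shared flow: collect requirements, evaluate each, summarize."""
    stage: str = "unspecified"

    @abstractmethod
    def requirements(self) -> List[Requirement]:
        """Return the checks this oracle enforces."""

    def run(self) -> List[CheckResult]:
        return [req.evaluate() for req in self.requirements()]

    def passed(self) -> bool:
        return all(result.passed for result in self.run())


class EnvSetupOracle(BaseOracle):
    """Stage-level base class; artifact-specific oracles derive from it."""
    stage = "environment setup"


class ExampleEnvSetupOracle(EnvSetupOracle):
    """Hypothetical artifact-specific oracle with two toy checks."""

    def requirements(self) -> List[Requirement]:
        return [
            Requirement("python 3 interpreter", lambda: sys.version_info.major >= 3),
            # Deliberately broken check, to show how errors are reported.
            Requirement("broken check", lambda: 1 / 0 > 0),
        ]


if __name__ == "__main__":
    oracle = ExampleEnvSetupOracle()
    for result in oracle.run():
        status = "PASS" if result.passed else "FAIL"
        print(f"[{oracle.stage}] {result.name}: {status} {result.message}")
```

In this sketch an artifact-specific oracle only declares its requirements, while evaluation and error handling live in the base class, which is the kind of duplication the refactoring aims to remove.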

Testing

Installed the refactored artifacts locally and ran the agent validation oracles standalone using a main.py runner.
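
A standalone runner of this kind might look roughly like the sketch below. This is an assumption about its general shape, not the actual main.py: the stage keys, the `run_stages` function, and the demo checks are all hypothetical.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical stage keys, ordered like the four stages named in this PR.
STAGES = ["env_setup", "benchmark_prep", "artifact_build", "experiment_runs"]

Check = Callable[[], bool]


def run_stages(checks: Dict[str, List[Tuple[str, Check]]]) -> int:
    """Run all checks in stage order; return 0 if every check passes, else 1."""
    failures = 0
    for stage in STAGES:
        for name, check in checks.get(stage, []):
            try:
                ok = bool(check())
            except Exception:
                ok = False  # a crashing check counts as a failure
            print(f"[{stage}] {name}: {'PASS' if ok else 'FAIL'}")
            if not ok:
                failures += 1
    return 0 if failures == 0 else 1


# Toy checks standing in for real artifact-specific oracles.
demo_checks: Dict[str, List[Tuple[str, Check]]] = {
    "env_setup": [("always true", lambda: True)],
    "artifact_build": [("arithmetic sanity", lambda: 2 + 2 == 4)],
}

exit_code = run_stages(demo_checks)
```

The return value is suitable for passing to `sys.exit` so that a failing oracle surfaces as a non-zero exit status.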

Checklist

  • [x] Tests pass locally
  • [x] Code follows project style guidelines
  • [x] Documentation updated (if needed)

@bastoica self-assigned this Jan 31, 2026
@bastoica requested review from Couen and xuafeng January 31, 2026 07:24
@bastoica added the enhancement (New feature or request) and feature (new feature required) labels Jan 31, 2026
@bastoica linked an issue that may be closed by this pull request Jan 31, 2026
@bastoica marked this pull request as ready for review February 3, 2026 21:52
@xuafeng merged commit 7ef9288 into sys-intelligence:main Feb 4, 2026
4 checks passed
tareknaser pushed a commit that referenced this pull request Feb 5, 2026
Development

Successfully merging this pull request may close these issues.

Fix the eval scripts issue of existing artifacts

2 participants