feat(autoresearch): standalone experiment runner + result store (#2597) #2615
Merged
mrveiss merged 3 commits into Dev_new_gui on Mar 28, 2026
| Metric | Count |
|---|---|
| Total Violations | 2 |
| SSOT Violations (high priority) | 1 |
| Other Violations | 1 |
⚠️ 1 value has an SSOT config equivalent!
These should be replaced with SSOT config imports:
Python:
from src.config.ssot_config import config
# Use: config.vm.main, config.port.backend, config.backend_url

TypeScript:
import config from '@/config/ssot-config'
// Use: config.vm.main, config.port.backend, config.backendUrl

📖 See SSOT_CONFIG_GUIDE.md for documentation.
b0f08af to 11e68f4
2 tasks
…#2597)

Milestone 1 of #1440 (AutoResearch integration).

Implements:
- Data models (Experiment, ExperimentResult, HyperParams, ExperimentStats)
- Config module with env-var overrides
- Output parser for autoresearch training metrics (val_bpb, loss, tokens/sec)
- Dual persistence store (Redis for timeline, ChromaDB for semantic search)
- Subprocess-isolated experiment runner with timeout and auto-evaluation
- REST API: list/get/create experiments, stats, baseline, cancel
- 28 unit tests covering parser, models, config, and store
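To illustrate what the output parser in this commit is described as doing, here is a minimal sketch. The log line format, the regular expressions, and the name `parse_metrics` are assumptions made for illustration; the PR page does not show the actual autoresearch output format or the parser's real interface.

```python
import re
from typing import Dict

# Hypothetical log format such as "loss: 2.10 tokens/sec: 1250" and "val_bpb: 1.234".
# The real autoresearch training output may look different.
_METRIC_PATTERNS = {
    "val_bpb": re.compile(r"val_bpb[:=]\s*(\d+(?:\.\d+)?)"),
    # The lookbehind avoids matching the tail of names such as "val_loss".
    "loss": re.compile(r"(?<![a-z_])loss[:=]\s*(\d+(?:\.\d+)?)"),
    "tokens_per_sec": re.compile(r"tokens/sec[:=]\s*(\d+(?:\.\d+)?)"),
}


def parse_metrics(output: str) -> Dict[str, float]:
    """Return the last reported value of each metric found in the output."""
    metrics: Dict[str, float] = {}
    for name, pattern in _METRIC_PATTERNS.items():
        matches = pattern.findall(output)
        if matches:
            # Keep the final occurrence so the result reflects the end of training.
            metrics[name] = float(matches[-1])
    return metrics
```

Metrics that never appear in the output are simply absent from the returned dictionary, which lets a caller distinguish "not reported" from a value of zero.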
Critical fixes from code review:
- Remove erroneous await on sync get_redis_client() (would crash at runtime)
- Add input validation for hp.extra keys (prevent flag injection via allowlist)
- Add check_admin_permission auth to all routes

High-priority fixes:
- Make POST /experiments non-blocking via BackgroundTasks
- Fix state index inconsistency: pass old_state to save_experiment for cleanup
- Add asyncio.Lock for runner concurrency safety

Medium fix:
- Guard improvement_pct None in _build_document to prevent TypeError
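The allowlist fix for hp.extra keys can be sketched roughly as follows. The permitted key names and the function `build_extra_flags` are hypothetical; the commit message does not list the real allowlist or show how flags are assembled.

```python
from typing import Any, Dict, List

# Hypothetical allowlist; the PR does not enumerate the permitted keys.
ALLOWED_EXTRA_KEYS = frozenset({"warmup_steps", "weight_decay"})


def build_extra_flags(extra: Dict[str, Any]) -> List[str]:
    """Turn allowlisted extra hyperparameters into command-line flags."""
    unknown = sorted(set(extra) - ALLOWED_EXTRA_KEYS)
    if unknown:
        # Rejecting unknown keys stops a caller from smuggling arbitrary flags
        # into the training command.
        raise ValueError(f"unsupported extra params: {unknown}")
    flags: List[str] = []
    for key in sorted(extra):
        flags.extend([f"--{key}", str(extra[key])])
    return flags
```

Passing the result as separate list elements to a subprocess call, rather than joining it into a shell string, keeps each value in a single argument.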
11e68f4 to 5be03eb
- Fix state-tracking race: remove duplicate update_experiment_state calls from _evaluate_result, rely on single save in finally block
- Add string value sanitization in _validate_extra_params: reject strings >256 chars or containing '--' to prevent flag injection
- Add Pydantic request models (CreateExperimentRequest, SetBaselineRequest) with field length constraints for POST endpoints
- Fix config.py env-var timing: move os.getenv calls into field(default_factory=...) for testability
- Fix list_experiments state-filtered ordering: use timeline sorted set scores for chronological order instead of lexicographic UUID sort
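The env-var timing fix is easiest to see in a small example. A default written as `int(os.getenv(...))` directly in the class body is evaluated once, when the module is imported, so a test that sets the variable afterwards has no effect. Wrapping the lookup in `field(default_factory=...)` defers it to each instantiation. The class name, field name, default value, and the variable `AUTOBOT_AUTORESEARCH_TIMEOUT` below are assumptions for illustration; only the `AUTOBOT_AUTORESEARCH_*` prefix appears in the PR.

```python
import os
from dataclasses import dataclass, field


def _env_int(name: str, default: int) -> int:
    """Read an integer from the environment at call time."""
    return int(os.getenv(name, str(default)))


@dataclass
class AutoResearchConfigSketch:
    # Import-time form (the pattern the commit moves away from):
    #   timeout_s: int = int(os.getenv("AUTOBOT_AUTORESEARCH_TIMEOUT", "300"))
    # Instantiation-time form: the environment is read for every new instance.
    timeout_s: int = field(
        default_factory=lambda: _env_int("AUTOBOT_AUTORESEARCH_TIMEOUT", 300)
    )
```

With this shape, a test can set or clear the environment variable and then construct a fresh config without reloading the module.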
6 tasks
This was referenced Mar 28, 2026
Bug: CI code-quality workflow never passes — isort Python 3.10 vs 3.12 categorization mismatch
#2683
Open
Summary
Milestone 1 of #1440 (AutoResearch integration). Implements the standalone experiment runner and result store foundation.
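As a rough picture of what a subprocess-isolated run with a timeout involves, here is a minimal sketch built on the standard library. It is not the runner from this PR: `RunOutcome` and `run_isolated` are hypothetical names, and the real runner may be asynchronous and capture more than stdout.

```python
import subprocess
import sys
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class RunOutcome:
    returncode: Optional[int]  # None when the run was stopped by the timeout
    stdout: str
    timed_out: bool


def run_isolated(cmd: List[str], timeout_s: float) -> RunOutcome:
    """Run cmd in a child process and stop it if it exceeds timeout_s."""
    try:
        proc = subprocess.run(
            cmd, capture_output=True, text=True, timeout=timeout_s
        )
    except subprocess.TimeoutExpired as exc:
        # subprocess.run kills the child before re-raising; partial output
        # may be None or bytes depending on the platform.
        out = exc.stdout or ""
        if isinstance(out, bytes):
            out = out.decode(errors="replace")
        return RunOutcome(returncode=None, stdout=out, timed_out=True)
    return RunOutcome(returncode=proc.returncode, stdout=proc.stdout, timed_out=False)
```

For example, `run_isolated([sys.executable, "-c", "print('val_bpb: 1.5')"], timeout_s=30)` returns an outcome whose stdout can then be handed to a metrics parser.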
Components
- Data models: Experiment, ExperimentResult, HyperParams, ExperimentStats dataclasses with full serialization
- Config: AutoResearchConfig with env-var overrides (AUTOBOT_AUTORESEARCH_*)
- Output parser: val_bpb, train/val loss, tokens/sec from autoresearch training output
- REST API: GET /experiments, GET /experiments/{id}, GET /experiments/stats, POST /experiments, POST /experiments/baseline, GET /status, POST /cancel
- feature_routers.py at /api/autoresearch

Architecture
- get_redis_client() and get_async_chromadb_client()

Test plan
- curl -sk https://localhost:8443/api/autoresearch/status after deploy

Related issues