feat(autoresearch): standalone experiment runner + result store (#2597)#2615

Merged
mrveiss merged 3 commits into Dev_new_gui from feature/autoresearch-m1 on Mar 28, 2026

mrveiss commented Mar 27, 2026

Summary

Milestone 1 of #1440 (AutoResearch integration). Implements the standalone experiment runner and result store foundation.

Components

  • models.py — Experiment, ExperimentResult, HyperParams, ExperimentStats dataclasses with full serialization
  • config.py — AutoResearchConfig with env-var overrides (AUTOBOT_AUTORESEARCH_*)
  • parser.py — Extracts val_bpb, train/val loss, tokens/sec from autoresearch training output
  • store.py — Dual persistence: Redis (timeline queries, state indices) + ChromaDB (semantic search over findings)
  • runner.py — Subprocess-isolated experiment execution with timeout, auto-evaluation (keep/discard based on improvement threshold)
  • routes.py — REST API: GET /experiments, GET /experiments/{id}, GET /experiments/stats, POST /experiments, POST /experiments/baseline, GET /status, POST /cancel
  • Router registered in feature_routers.py at /api/autoresearch
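The parsing and auto-evaluation steps listed above can be sketched roughly as follows. This is a minimal illustration, not the code in parser.py or runner.py: the log line format, the function names (parse_metrics, improvement_pct, should_keep), and the default threshold are all assumptions, and the sketch assumes that a lower val_bpb is better.

```python
import re
from typing import Dict

# Assumed (hypothetical) log format, e.g.:
#   "step 200 | train_loss 3.10 | val_loss 3.25 | val_bpb 0.98 | tok/s 51234"
_NUMBER = r"([0-9]*\.?[0-9]+)"
_METRIC_PATTERNS = {
    "val_bpb": re.compile(r"val_bpb[\s:=]+" + _NUMBER),
    "train_loss": re.compile(r"train_loss[\s:=]+" + _NUMBER),
    "val_loss": re.compile(r"val_loss[\s:=]+" + _NUMBER),
    "tokens_per_sec": re.compile(r"tok/s[\s:=]+" + _NUMBER),
}


def parse_metrics(output: str) -> Dict[str, float]:
    """Return the last reported value of each metric found in the output."""
    metrics: Dict[str, float] = {}
    for name, pattern in _METRIC_PATTERNS.items():
        matches = pattern.findall(output)
        if matches:
            metrics[name] = float(matches[-1])
    return metrics


def improvement_pct(baseline_bpb: float, candidate_bpb: float) -> float:
    """Relative improvement in percent, assuming lower val_bpb is better."""
    return (baseline_bpb - candidate_bpb) / baseline_bpb * 100.0


def should_keep(baseline_bpb: float, candidate_bpb: float,
                threshold_pct: float = 1.0) -> bool:
    """Keep the experiment only if it beats the baseline by the threshold."""
    return improvement_pct(baseline_bpb, candidate_bpb) >= threshold_pct


sample_output = (
    "step 100 | train_loss 3.21 | val_loss 3.30 | val_bpb 1.045 | tok/s 50000\n"
    "step 200 | train_loss 3.10 | val_loss 3.25 | val_bpb 0.98 | tok/s 51234\n"
)
parsed = parse_metrics(sample_output)
decision = "keep" if should_keep(1.0, parsed["val_bpb"]) else "discard"
```

Taking the last match per metric means the final evaluation line wins over earlier progress lines, which seems like the natural choice for a training log but is a design assumption here.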

Architecture

  • Follows existing service patterns (composition, lazy init, dependency injection)
  • Uses canonical get_redis_client() and get_async_chromadb_client()
  • No hardcoded IPs or values — all config via env vars + SSOT
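As a rough illustration of the "all config via env vars" point, a dataclass can read its overrides from the environment when it is instantiated. The class name, field names, variable suffixes, and default values below are hypothetical; only the AUTOBOT_AUTORESEARCH_ prefix comes from this PR.

```python
import os
from dataclasses import dataclass, field

ENV_PREFIX = "AUTOBOT_AUTORESEARCH_"  # prefix taken from the PR description


def _env_int(suffix: str, default: int) -> int:
    """Read an integer override from the environment, falling back to default."""
    return int(os.getenv(ENV_PREFIX + suffix, str(default)))


def _env_float(suffix: str, default: float) -> float:
    """Read a float override from the environment, falling back to default."""
    return float(os.getenv(ENV_PREFIX + suffix, str(default)))


@dataclass
class AutoResearchConfigSketch:
    # Hypothetical fields. default_factory defers the os.getenv call until
    # an instance is created, so tests can set variables before constructing.
    timeout_seconds: int = field(
        default_factory=lambda: _env_int("TIMEOUT_SECONDS", 600)
    )
    improvement_threshold_pct: float = field(
        default_factory=lambda: _env_float("IMPROVEMENT_THRESHOLD_PCT", 1.0)
    )


# Usage: an override set after import time is still picked up.
os.environ[ENV_PREFIX + "TIMEOUT_SECONDS"] = "120"
overridden = AutoResearchConfigSketch()
del os.environ[ENV_PREFIX + "TIMEOUT_SECONDS"]
defaults = AutoResearchConfigSketch()
```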

Test plan

  • 28 unit tests passing (parser, models, config, store)
  • Manual: curl -sk https://localhost:8443/api/autoresearch/status after deploy
  • Integration: Run actual training on .20 GPU node (requires autoresearch repo clone)

Related issues


github-actions bot commented Mar 27, 2026

⚠️ SSOT Configuration Compliance: Violations Found

| Metric | Count |
| --- | --- |
| Total Violations | 2 |
| SSOT Violations (high priority) | 1 |
| Other Violations | 1 |

⚠️ 1 value has an SSOT config equivalent!

It should be replaced with an SSOT config import:

Python:

from src.config.ssot_config import config
# Use: config.vm.main, config.port.backend, config.backend_url

TypeScript:

import config from '@/config/ssot-config'
// Use: config.vm.main, config.port.backend, config.backendUrl

📖 See SSOT_CONFIG_GUIDE.md for documentation.

mrveiss added 2 commits March 28, 2026 19:55
…#2597)

Milestone 1 of #1440 — AutoResearch integration. Implements:
- Data models (Experiment, ExperimentResult, HyperParams, ExperimentStats)
- Config module with env-var overrides
- Output parser for autoresearch training metrics (val_bpb, loss, tokens/sec)
- Dual persistence store (Redis for timeline, ChromaDB for semantic search)
- Subprocess-isolated experiment runner with timeout and auto-evaluation
- REST API: list/get/create experiments, stats, baseline, cancel
- 28 unit tests covering parser, models, config, and store

Critical fixes from code review:
- Remove erroneous await on sync get_redis_client() (would crash at runtime)
- Add input validation for hp.extra keys (prevent flag injection via allowlist)
- Add check_admin_permission auth to all routes

High-priority fixes:
- Make POST /experiments non-blocking via BackgroundTasks
- Fix state index inconsistency: pass old_state to save_experiment for cleanup
- Add asyncio.Lock for runner concurrency safety

Medium fix:
- Guard improvement_pct None in _build_document to prevent TypeError
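The asyncio.Lock fix listed above can be illustrated with a small sketch. The class and method names are hypothetical, and asyncio.sleep stands in for the real subprocess call; the point is only that concurrent callers are serialized so at most one experiment runs at a time.

```python
import asyncio
from typing import List, Tuple


class RunnerSketch:
    """Hypothetical runner that allows only one experiment at a time."""

    def __init__(self) -> None:
        self._lock = asyncio.Lock()
        self.active = 0       # experiments currently inside the critical section
        self.max_active = 0   # highest concurrency observed (should stay at 1)

    async def run(self, experiment_id: str) -> str:
        async with self._lock:
            self.active += 1
            self.max_active = max(self.max_active, self.active)
            try:
                await asyncio.sleep(0.01)  # stand-in for the training subprocess
                return f"{experiment_id}:done"
            finally:
                self.active -= 1


async def main() -> Tuple[RunnerSketch, List[str]]:
    # Create the runner inside the running event loop.
    runner = RunnerSketch()
    results = await asyncio.gather(
        runner.run("a"), runner.run("b"), runner.run("c")
    )
    return runner, list(results)


runner, results = asyncio.run(main())
```

An alternative would be to reject a second request outright instead of queueing it behind the lock; which behavior the actual runner uses is not stated here.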
mrveiss force-pushed the feature/autoresearch-m1 branch from 11e68f4 to 5be03eb March 28, 2026 17:55
- Fix state-tracking race: remove duplicate update_experiment_state calls
  from _evaluate_result, rely on single save in finally block
- Add string value sanitization in _validate_extra_params: reject strings
  >256 chars or containing '--' to prevent flag injection
- Add Pydantic request models (CreateExperimentRequest, SetBaselineRequest)
  with field length constraints for POST endpoints
- Fix config.py env-var timing: move os.getenv calls into
  field(default_factory=...) for testability
- Fix list_experiments state-filtered ordering: use timeline sorted set
  scores for chronological order instead of lexicographic UUID sort
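The extra-parameter checks described in these commits (key allowlist, 256-character limit, rejection of '--') might look roughly like the sketch below. It is not the actual _validate_extra_params: the allowlisted key names are invented for illustration, and only the length limit and the '--' rule come from the commit message.

```python
from typing import Any, Dict

# Hypothetical allowlist; the real set of permitted keys is not shown in the PR.
ALLOWED_EXTRA_KEYS = frozenset({"warmup_steps", "weight_decay", "optimizer"})
MAX_STRING_LENGTH = 256  # limit stated in the commit message


def validate_extra_params(extra: Dict[str, Any]) -> Dict[str, Any]:
    """Return a copy of extra if every key and string value is acceptable.

    Raises ValueError for unknown keys, overlong strings, or strings that
    contain '--' (which could smuggle an additional CLI flag).
    """
    for key, value in extra.items():
        if key not in ALLOWED_EXTRA_KEYS:
            raise ValueError(f"unsupported extra parameter: {key!r}")
        if isinstance(value, str):
            if len(value) > MAX_STRING_LENGTH:
                raise ValueError(f"value for {key!r} exceeds {MAX_STRING_LENGTH} chars")
            if "--" in value:
                raise ValueError(f"value for {key!r} must not contain '--'")
    return dict(extra)


def is_valid(extra: Dict[str, Any]) -> bool:
    """Convenience wrapper that reports validity instead of raising."""
    try:
        validate_extra_params(extra)
    except ValueError:
        return False
    return True
```

For example, is_valid({"optimizer": "adamw"}) passes, while is_valid({"optimizer": "adamw --lr 10"}) is rejected because of the embedded flag.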
mrveiss merged commit 6fc44af into Dev_new_gui Mar 28, 2026
3 of 4 checks passed
mrveiss deleted the feature/autoresearch-m1 branch March 28, 2026 18:28