
Meta: Fully portable agents — memory + pipelines + flows as a round-trip bundle #1137

@chubes4

Definition

An agent in Data Machine = memory files + pipelines + flows. All portable.

The datamachine_agents row is the handle. The agent itself is the complete bundle:

  • Memory files — agent-layer SOUL.md, MEMORY.md, daily memory, any custom agent-layer file registered via MemoryFileRegistry
  • Pipelines — every pipeline with agent_id = this agent, with their full pipeline_config (step types, labels, system prompts, provider, model, step-specific settings)
  • Flows — every flow whose pipeline belongs to this agent, with flow_config including handler_slugs and handler_configs

"Portable" means an agent can be exported from one install as a bundle artifact, moved (via filesystem copy, email, git, whatever), and imported into another install with byte-for-byte identical behavior — modulo install-specific concerns like auth credentials and owner_id resolution.

Why this matters

The direct motivation is distributable domain agents — a ciab-agent, a woocommerce-agent, a matt-agent, each shipping as a portable artifact that carries everything needed to generate and maintain its specific domain's wiki. Plugins don't ship these agents; users and orgs create them, export them, share them via git repos, and pull updates over time.

The generalization is much broader:

  • Backup / migration — export everything, move host, import, it still works
  • Templates — clone an agent as the starting point for a new one
  • Sharing — GitHub-hosted agent definitions that any install can clone
  • Canonical upstreams — Automattic/woocommerce-agent becomes the community-maintained WC expert everyone can subscribe to

Today, none of this works. Pipeline export is partial (config captured but import drops it). Flow config is exported but never read on import. Memory files aren't part of export at all. There's no agent-level export ability. The CSV format itself isn't fit for purpose.

Current state

Already works

What's broken or missing

  1. Pipeline import is lossy — step_config (system prompts, provider, model, label) is parsed out of CSV on import and thrown away. (fix(import-export): pipeline import is lossy — step_config dropped, prompts/models/handlers lost #1133, step 1 in flight.)
  2. Flow import is non-existent — the flow rows of CSV export (flow_id, flow_name, handler, settings columns) are never read on import at all. Flows + handler configs simply don't restore.
  3. No agent-level export — nothing bundles memory files + pipelines + flows into a single artifact. You have to manually export pipelines, separately ship memory files, separately reconstruct flows.
  4. No agent-level import — symmetric gap.
  5. CSV format is a dead end — cramped for the data model, secrets in plaintext, no JSON structure for nested config, flow rows and step rows share a schema that should be separate.
  6. No manifest filter — no way to declare "this export includes SOUL but not MEMORY" or "this export strips tokens." One-size-fits-all doesn't fit the share-vs-backup-vs-fork spectrum.
  7. No auth reference model — handler_configs commonly contains tokens, keys, bearers. Exporting them verbatim leaks secrets; stripping them breaks flows. Need a handle-based reference convention that resolves against the target install's auth providers at import time.

Sub-issues and work sequence

Each item is generic, standalone, composable. Dependency order shown:

Phase 1 — Pipeline round-trip (foundation)

| # | Title | Status | Owner |
| --- | --- | --- | --- |
| 1a | Pipeline import restores step_config (system_prompt, provider, model, label) | 🍳 in flight (#1133 step 1, PR open via agent session) | DM |
| 1b | Flow import restores flow_config including handler_slugs + handler_configs | 📋 filed as #1133 step 2 | DM |

Phase 2 — Agent bundle format + export/import abilities

| # | Title | Status | Owner |
| --- | --- | --- | --- |
| 2a | Agent bundle format — directory layout + manifest.json schema, replaces CSV | 📋 #1303 | DM |
| 2b | datamachine_agent_export_manifest filter — selective portability (share/backup/fork profiles) | 📋 #1304 | DM |
| 2c | datamachine/export-agent ability — takes agent_id, honors manifest filter, emits bundle | 📋 #1305 | DM |
| 2d | datamachine/import-agent ability — takes bundle, materializes agent + memory + pipelines + flows | 📋 #1306 | DM |

Phase 3 — Auth handling

| # | Title | Status | Owner |
| --- | --- | --- | --- |
| 3a | Auth reference convention — auth_ref: "slack:default" handles in handler_configs, resolved at import via existing auth providers | 📋 #1307 | DM |
| 3b | Secret stripping policy on export — default to refs mode, full opt-in with encryption story, omit for fork profile | ❌ not filed | DM |

Phase 4 — Distribution integration

| # | Title | Status | Owner |
| --- | --- | --- | --- |
| 4a | Agent bundle ↔ GitSync binding convention — an agent bundle directory binds to an upstream git repo via DMC GitSync | ❌ not filed | DMC / Intelligence |

Proposed bundle format (Phase 2a)

Directory layout, zip-or-dir distribution, human-readable, git-friendly:

```
<agent-slug>/
├── manifest.json           identity, included-sections, schema_version
├── memory/
│   ├── SOUL.md
│   ├── MEMORY.md
│   ├── daily/
│   │   ├── 2026-04-15.md
│   │   └── 2026-04-16.md
│   └── <custom>.md         any registered agent-layer file
├── pipelines/
│   ├── <pipeline-slug>.json    full pipeline_config keyed by fresh step IDs
│   └── …
└── flows/
    ├── <flow-slug>.json        flow_config with handler_slugs + handler_configs (auth refs, not tokens)
    └── …
```

Manifest schema sketch:

```jsonc
{
  "schema_version": 1,
  "exported_at": "2026-04-20T15:30:00Z",
  "agent": {
    "slug": "woocommerce-agent",
    "label": "WooCommerce Knowledge Keeper",
    "description": "Maintains the WooCommerce wiki.",
    "agent_config": { /* agent_config JSON */ }
  },
  "included": {
    "memory":       ["SOUL.md", "MEMORY.md"],
    "pipelines":    ["wc-daily-ingest", "wc-weekly-lint"],
    "flows":        ["wc-daily-ingest-flow", "wc-weekly-lint-flow"],
    "handler_auth": "refs"
  }
}
```
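As a sketch of how an importer might gate on the manifest, the helper below decodes manifest.json and refuses anything it can't vouch for. The function name and error handling are illustrative assumptions, not the actual import-agent ability API; the field names follow the schema sketch above.

```php
<?php
// Hypothetical importer-side check (not the real ability API):
// decode manifest.json and refuse unknown or future schema versions.
function dm_read_manifest( string $json ): ?array {
    $m = json_decode( $json, true );
    if ( ! is_array( $m ) || 1 !== ( $m['schema_version'] ?? 0 ) ) {
        return null; // missing, malformed, or future schema — do not import
    }
    if ( empty( $m['agent']['slug'] ) ) {
        return null; // an agent bundle without a slug is unusable
    }
    return $m;
}
```

Refusing future schema_version values up front keeps "imported but half-materialized" states out of the failure modes.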

Rationale for directory + JSON rather than a single JSON blob:

  • Memory files stay as real markdown files — readable, PR-reviewable, editable in place
  • Each pipeline/flow is its own file — git diffs are scoped, merges are sane
  • Composes with MDI's wiki directory (separate sibling wiki/ subtree if agent + wiki ship in one repo)
  • Zipping the directory is a trivial distribution shortcut when you want a single file

Proposed manifest filter (Phase 2b)

```php
apply_filters(
    'datamachine_agent_export_manifest',
    array(
        'soul'         => true,   // bool or filename
        'memory'       => false,  // MEMORY accumulates per-install; default excluded
        'user'         => false,  // USER.md is per-install personal; excluded
        'daily_memory' => false,  // typically excluded
        'agent_config' => true,
        'pipelines'    => true,
        'flows'        => true,
        'handler_auth' => 'refs', // 'refs' | 'full' | 'omit'
    ),
    $agent_id
);
```
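For example, a site plugin could hook the filter to tweak what a given agent exports. The callback below is a hypothetical consumer of the filter (function name, agent ID, and policy choices are made up for illustration):

```php
<?php
// Hypothetical consumer of the datamachine_agent_export_manifest filter.
// Keys mirror the default array in the filter signature above.
function my_export_manifest( array $manifest, int $agent_id ): array {
    if ( 7 === $agent_id ) {
        $manifest['daily_memory'] = true; // this one agent ships its daily notes
    }
    $manifest['handler_auth'] = 'omit';   // never export credentials from this site
    return $manifest;
}

// Wired up inside WordPress:
// add_filter( 'datamachine_agent_export_manifest', 'my_export_manifest', 10, 2 );
```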

Built-in profiles emerge as presets of filter values:

  • share (default) — handler_auth: 'refs'
  • backup — handler_auth: 'full' (encrypted)
  • fork — shapes only; handler_auth: 'omit'

Profiles are opt-in via ability input (profile: 'backup'); users who need something bespoke supply the filter values directly.
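A minimal sketch of how a profile name could resolve to filter defaults. The function name is hypothetical, the base values mirror the filter signature, and the per-profile overrides encode only the distinctions called out in this issue (treating "backup" as a full export is an assumption):

```php
<?php
// Hypothetical resolver: profile name -> manifest filter defaults.
function dm_profile_manifest( string $profile ): array {
    $base = array(
        'soul'         => true,
        'memory'       => false,
        'user'         => false,
        'daily_memory' => false,
        'agent_config' => true,
        'pipelines'    => true,
        'flows'        => true,
        'handler_auth' => 'refs',
    );
    $overrides = array(
        'share'  => array(), // the default profile
        'backup' => array(   // assumption: backup carries accumulated memory too
            'memory'       => true,
            'daily_memory' => true,
            'handler_auth' => 'full',
        ),
        'fork'   => array( 'handler_auth' => 'omit' ),
    );
    return array_merge( $base, $overrides[ $profile ] ?? array() );
}
```

Unknown profile names fall through to the share defaults, which keeps the safest behavior (refs, no per-install memory) as the fallback.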

Proposed auth reference model (Phase 3a)

Every handler_config that carries credentials replaces the raw secret with a handle reference at export time:

```jsonc
// Before (live in DB):
{
  "handler_slugs": ["slack"],
  "handler_configs": {
    "slack": { "token": "xoxb-real-token-value", "channel": "ceo" }
  }
}

// After (exported with handler_auth='refs'):
{
  "handler_slugs": ["slack"],
  "handler_configs": {
    "slack": { "auth_ref": "slack:default", "channel": "ceo" }
  }
}
```

At import, the target install resolves auth_ref against its own configured auth providers. Missing refs surface as "agent imported but flows require auth setup" rather than silent breakage or leaked secrets.

The auth_ref convention is also useful beyond export/import — it cleanly expresses "this flow uses the site's default Slack creds" regardless of which install the flow runs on, making handler configs genuinely portable across environments.
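A sketch of the import-time resolution step, assuming refs take the provider:handle shape shown above. The resolver shape is hypothetical; the real implementation would consult the install's registered auth providers rather than a flat array:

```php
<?php
// Hypothetical import-time resolution of an auth_ref handle.
// $providers stands in for the install's auth providers,
// mapped as "provider:handle" => live credential fields.
function dm_resolve_handler_config( array $config, array $providers ): array {
    if ( ! isset( $config['auth_ref'] ) ) {
        return $config; // nothing to resolve
    }
    $ref = $config['auth_ref'];
    if ( ! isset( $providers[ $ref ] ) ) {
        // Missing ref: keep it in place so the UI can surface
        // "agent imported but flows require auth setup".
        return $config;
    }
    unset( $config['auth_ref'] );
    return array_merge( $config, $providers[ $ref ] );
}
```

Leaving an unresolvable auth_ref in the config (rather than dropping it) is what makes the failure visible instead of silent.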

Relationship to adjacent work

#1131 — datamachine_register_agents hook

Complementary, not overlapping. The registration hook is for plugin-bundled static agents (DM's default admin, hypothetical helper agents). The portable agents feature is for user-created domain agents that travel as file artifacts. Both coexist:

  • A plugin can ship a bundled helper agent via register_agents with memory_seeds
  • A user can export a domain agent they built to a git repo
  • Another user can import that bundle on a different install
  • Neither path needs the other

DMC #42 — GitSync primitive

Phase 4 dependency. Once the bundle format is stable, an agent bundle directory becomes a natural GitSync binding target. The binding points at a canonical upstream repo (e.g. Automattic/woocommerce-agent); GitSync handles pull/submit. MEMORY.md stays excluded from sync via allowed_paths. SOUL, pipelines, and flows propagate from upstream to every bound install.

Automattic/markdown-database-integration — wiki content

Orthogonal. Agents are the maker; wikis are the made. MDI already handles wiki portability via markdown-on-disk + git. An agent bundle and a wiki directory can ride together in one repo (e.g. Automattic/woocommerce-agent-and-wiki/{agent,wiki}/) but through different export mechanisms. Mixing them would conflate concerns.

Automattic/intelligence#78 / #89 — wiki-generator vision

The direct use case this unblocks. Per-domain agents (ciab-agent, woocommerce-agent) built manually first, then exported as portable bundles once the primitives in this meta exist. Intelligence ships reusable mechanics (wiki-generation pipeline templates, lint routines) as toolchain, not bundled domain agents.

Success criteria

An agent can be exported on install A and imported on install B such that:

  • wp datamachine agent list on B shows the agent with identical label, description, agent_config
  • Agent-layer memory files are present on B at the same paths with the same content (per manifest inclusion rules)
  • Every pipeline on A with agent_id = this agent exists on B with byte-identical pipeline_config
  • Every flow on A for those pipelines exists on B with byte-identical flow_config, modulo handler auth (which resolves via B's auth providers)
  • Running the same pipeline/flow on B produces the same behavior it would on A, once B's auth is configured
  • Round-tripping (export B → import C) preserves state indefinitely

Non-goals for this meta

  • Wiki content portability — MDI already handles this. Agents move toolchain, not artifacts.
  • Job queue state, processed-items history, scheduled runs — these are runtime state, not pipeline definition. Out of scope.
  • Cross-install agent identity merging — if you import an agent slug that already exists on the target install, the right behavior is collision error (with --replace opt-in). Not smart merging.
  • Secret escrow / encrypted full backups — handler_auth: 'full' needs a secret model. Deferred to phase 3b; not blocking core round-trip.

Tracking

Related
