feat: add Git resources and publish verification for DAGs #2091

@yohamta0

Problem

Authoring DAGs that safely publish repository changes currently requires too much hand-rolled shell. A DAG can pass schema validation while still missing critical product-level guarantees, such as:

  • verifying the exact repository, remote URL, branch, and expected remote SHA before mutation
  • refusing to push when the remote branch moved after analysis
  • recording the commit SHA and pushed remote SHA as structured artifacts
  • enforcing a repository-specific commit message policy
  • making it obvious which DAG definition revision a historical run used

This makes publishing workflows fragile. The YAML is structurally valid, but the run can still be semantically wrong.

Proposal

Add first-class Git publishing primitives and stricter run validation so DAG authors can express repository mutation workflows declaratively.

1. Repository resources

Introduce typed repository resources at the DAG level:

resources:
  repos:
    source:
      path: /path/to/repo
      remote: origin
      url: git@github.com:dagucloud/dagu.git
      branch: main
      require_clean_tracked: true

Dagu should validate:

  • the path is a Git checkout
  • the current branch matches the declared branch, when that check is required
  • the configured remote URL matches url
  • there are no uncommitted changes to tracked files when require_clean_tracked is true
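
To make the checks above concrete, here is a rough sketch of the validation as a pure function. It is not Dagu code: RepoState and validate_repo are hypothetical names, the observed state is assumed to be gathered elsewhere (for example by shelling out to git), and the branch check is always applied here because the proposal does not yet say what "when required" means.

```python
from dataclasses import dataclass, field


@dataclass
class RepoResource:
    """Declared fields, mirroring the YAML keys in the proposal."""
    path: str
    remote: str
    url: str
    branch: str
    require_clean_tracked: bool = False


@dataclass
class RepoState:
    """Hypothetical snapshot of what was observed on disk."""
    is_git_checkout: bool
    current_branch: str = ""
    remote_urls: dict = field(default_factory=dict)      # remote name -> URL
    tracked_changes: list = field(default_factory=list)  # modified tracked paths


def validate_repo(declared: RepoResource, observed: RepoState) -> list:
    """Return human-readable violations; an empty list means the repo is valid."""
    if not observed.is_git_checkout:
        # Nothing else can be checked if the path is not a checkout at all.
        return [f"{declared.path} is not a Git checkout"]

    errors = []
    if observed.current_branch != declared.branch:
        errors.append(
            f"current branch is {observed.current_branch!r}, expected {declared.branch!r}"
        )
    actual_url = observed.remote_urls.get(declared.remote)
    if actual_url != declared.url:
        errors.append(
            f"remote {declared.remote!r} points to {actual_url!r}, expected {declared.url!r}"
        )
    if declared.require_clean_tracked and observed.tracked_changes:
        errors.append(f"tracked files have local changes: {observed.tracked_changes}")
    return errors
```

Returning every violation at once, rather than stopping at the first, would let a run report all identity mismatches in a single failure.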

2. Git executor

Add a type: git executor with safe built-in actions:

steps:
  - id: pull_source
    type: git
    with:
      repo: source
      action: pull_ff_only

  - id: create_worktree
    type: git
    with:
      repo: source
      action: worktree_from_remote
      ref: origin/main
      path: /tmp/worktrees/source-${DAG_RUN_ID}

  - id: commit
    type: git
    with:
      repo: source
      action: commit
      worktree: ${create_worktree.output.path}
      message: ${COMMIT_MESSAGE}
      message_policy: source_docs

  - id: push
    type: git
    with:
      repo: source
      action: push
      worktree: ${create_worktree.output.path}
      push_ref: HEAD:refs/heads/main
      expected_remote_sha: ${create_worktree.output.base_remote_sha}
      verify_remote: true

The push action should:

  • fetch before push
  • fail if the remote branch no longer matches expected_remote_sha
  • push only the exact push_ref
  • fetch after push
  • fail unless the remote branch now points to the pushed commit
  • emit structured output with commit_sha, remote_before_push, remote_after_push, remote, branch, and push_ref
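
The ordering of those checks is the important part, so here is a minimal sketch of the sequence. The names verified_push, PushRejected, and FakeRemote are made up for illustration, and the remote operations are passed in as an object so the flow can be read (and exercised) without a real repository. A real executor would back fetch, remote_sha, and push with git commands.

```python
class PushRejected(Exception):
    """Raised when one of the fail-closed checks does not hold."""


class FakeRemote:
    """In-memory stand-in for a remote branch; purely illustrative."""

    def __init__(self, sha, head_sha):
        self.sha = sha            # what the remote branch currently points to
        self.head_sha = head_sha  # the local commit a push would publish

    def fetch(self):
        pass  # a real implementation would refresh remote-tracking refs here

    def remote_sha(self):
        return self.sha

    def push(self, push_ref):
        self.sha = self.head_sha


def verified_push(ops, *, remote, branch, push_ref, commit_sha, expected_remote_sha):
    # 1. Fetch, then refuse to push if the branch moved since analysis.
    ops.fetch()
    before = ops.remote_sha()
    if before != expected_remote_sha:
        raise PushRejected(
            f"{remote}/{branch} moved: expected {expected_remote_sha}, found {before}"
        )

    # 2. Push only the exact ref that was requested.
    ops.push(push_ref)

    # 3. Fetch again and confirm the branch now points at the pushed commit.
    ops.fetch()
    after = ops.remote_sha()
    if after != commit_sha:
        raise PushRejected(
            f"{remote}/{branch} is at {after} after push, expected {commit_sha}"
        )

    # 4. Structured output, using the field names listed in the proposal.
    return {
        "commit_sha": commit_sha,
        "remote_before_push": before,
        "remote_after_push": after,
        "remote": remote,
        "branch": branch,
        "push_ref": push_ref,
    }
```

Calling verified_push against FakeRemote("aaa", "bbb") with expected_remote_sha="aaa" and commit_sha="bbb" returns the output record, while any other expected_remote_sha raises PushRejected before the push is attempted.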

3. Commit message policy

Add reusable commit message policies:

commit_message_policies:
  source_docs:
    allowed_prefixes: [doc, fix]
    max_length: 72
    forbidden_words: [daily, sweep, automation, generated]
    forbid_square_brackets: true

The Git commit action should reject invalid messages before creating the commit.
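
As a sketch of how such a policy might be evaluated, the function below checks a message against the four fields in the example. Several details are assumptions rather than part of the proposal: a prefix is taken to be the text before the first colon in the subject line, max_length applies to the subject line only, and forbidden words match case-insensitively as whole words. CommitMessagePolicy and check_message are hypothetical names.

```python
import re
from dataclasses import dataclass, field


@dataclass
class CommitMessagePolicy:
    """Mirrors the YAML keys under commit_message_policies."""
    allowed_prefixes: list = field(default_factory=list)
    max_length: int = 72
    forbidden_words: list = field(default_factory=list)
    forbid_square_brackets: bool = False


def check_message(policy: CommitMessagePolicy, message: str) -> list:
    """Return policy violations; an empty list means the message is accepted."""
    lines = message.splitlines()
    subject = lines[0] if lines else ""
    errors = []

    # Assumption: the prefix is whatever precedes the first colon.
    if policy.allowed_prefixes:
        prefix, sep, _ = subject.partition(":")
        if not sep or prefix not in policy.allowed_prefixes:
            errors.append(f"subject must start with one of {policy.allowed_prefixes}")

    # Assumption: the limit applies to the subject line, not the body.
    if len(subject) > policy.max_length:
        errors.append(f"subject is {len(subject)} chars, limit is {policy.max_length}")

    # Assumption: whole-word, case-insensitive matching.
    words = set(re.findall(r"[a-z0-9]+", subject.lower()))
    for word in sorted(words & {w.lower() for w in policy.forbidden_words}):
        errors.append(f"forbidden word: {word}")

    if policy.forbid_square_brackets and ("[" in subject or "]" in subject):
        errors.append("square brackets are not allowed")

    return errors
```

With the source_docs policy from the example, a subject such as "doc: clarify retry semantics" would pass, while "[bot] daily sweep" would be rejected for its missing prefix, its forbidden words, and its brackets.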

4. Semantic linting

Add dagu lint --strict checks that catch common publishing workflow bugs that schema validation cannot catch:

  • a DAG description says it pushes or publishes, but no push/publish step exists
  • a Git push step does not verify the remote after mutation
  • a mutation step follows approval but lacks structured output
  • a commit step uses a static, vague message
  • multiple repository paths are referenced in shell but not declared as repository resources
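
To illustrate what a semantic check could look like, here is a small sketch covering the first two rules in the list. It assumes the DAG has already been parsed into a plain dict, and it only recognizes pushes made through the proposed type: git executor. Detecting pushes inside shell steps would need a different heuristic. lint_strict is a hypothetical name, not an existing Dagu function.

```python
import re


def lint_strict(dag: dict) -> list:
    """Return semantic findings for a parsed DAG; empty means nothing was flagged."""
    findings = []
    steps = dag.get("steps", [])

    # Steps that use the proposed git executor with action: push.
    push_steps = [
        s for s in steps
        if s.get("type") == "git" and s.get("with", {}).get("action") == "push"
    ]

    # Rule: the description promises a push/publish, but no step performs one.
    description = dag.get("description", "")
    if re.search(r"\b(push|publish)", description, re.IGNORECASE) and not push_steps:
        findings.append("description mentions push/publish but no push step exists")

    # Rule: every push step should verify the remote after mutation.
    for step in push_steps:
        if not step.get("with", {}).get("verify_remote", False):
            findings.append(
                f"step {step.get('id', '?')}: push does not verify the remote"
            )

    return findings
```

Each finding is a plain string here. A real linter would probably attach a rule ID and a source location so findings can be suppressed or mapped back to the YAML.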

5. Run spec identity

Record and show the DAG definition identity used by each run:

  • DAG file path
  • DAG spec hash
  • resolved base config hash, if applicable
  • captured DAG spec artifact

The UI and CLI should make it clear when a historical run used an older DAG definition than the one currently on disk.
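
A minimal sketch of the staleness check follows. The hashing scheme is an assumption: the proposal does not specify an algorithm or any normalization, so this simply takes SHA-256 over the raw spec text. spec_hash and is_stale are hypothetical names.

```python
import hashlib


def spec_hash(spec_text: str) -> str:
    """Hash the DAG definition exactly as written (no normalization)."""
    return hashlib.sha256(spec_text.encode("utf-8")).hexdigest()


def is_stale(recorded_hash: str, current_spec_text: str) -> bool:
    """True when a run used a definition that differs from the one on disk."""
    return recorded_hash != spec_hash(current_spec_text)
```

Hashing the raw text means that even a whitespace-only edit marks older runs as stale. Hashing a canonicalized form of the spec would be less noisy, at the cost of defining what canonical means.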

Acceptance criteria

  • A DAG can declare Git repositories as typed resources.
  • A Git executor can pull, create worktrees, commit, push, and verify the remote branch without custom shell.
  • A push fails if the remote branch moved after the worktree was created.
  • A push succeeds only when post-push fetch confirms the remote branch points to the pushed commit.
  • Commit message policies reject vague or disallowed messages before commit creation.
  • dagu lint --strict reports semantic risks for publishing DAGs.
  • Run history exposes the DAG spec identity used by that run.

Non-goals

  • Replacing arbitrary shell steps.
  • Implementing a full GitHub issue or pull request workflow.
  • Supporting force-push as a default behavior.

Why this matters

Publishing DAGs are high-impact workflows. They need fail-closed primitives for repository identity, commit creation, push safety, and remote verification. Moving these concerns into Dagu would make complex automation DAGs substantially easier to author correctly on the first attempt.
