feat(cli,portability): attestix import — consume M6 cloud bundles (closes the portability round-trip)#93

Merged
ascender1729 merged 1 commit into main from feature/cli-bundle-import
May 28, 2026

Conversation

@ascender1729
Member

@ascender1729 ascender1729 commented May 28, 2026

What

OSS attestix import <bundle.tar.gz>: consumes the cloud M6 P1 bundle wire format and hydrates a local SQLite store via the existing OSS service layer. Closes the portability round-trip for the cloud bundle format identified by https://attestix.io/spec/bundle/v1.

Wire-format compatibility

  • Reuses the existing JCS canonicaliser at attestix/auth/crypto.py::canonicalize_json (the same helper the OSS signer + AuditEvent chain depend on). No second canonicaliser introduced.
  • Re-verifies each per-table sha256 in the manifest against the tar contents, then recomputes the manifest's own SHA (JCS-canonical form with the manifest.sha256 field stripped) and compares it against the side-car manifest.sha256.
  • Schema-version guard: BundleSchemaTooNewError if manifest.schemas.db_migration_max > SUPPORTED_DB_MIGRATION_MAX (0010).
  • Tarball safety: regular files only (no symlinks, directories, ".." path components, or absolute paths); 128 MiB member cap; 256 MiB total bundle cap.
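
The tarball safety rules above can be sketched with the stdlib tarfile module. This is a minimal illustration, not the code in bundle_reader.py: the function name check_members and the error class UnsafeMemberError are hypothetical, and treating the 256 MiB cap as a limit on the summed uncompressed member sizes is an assumption.

```python
import tarfile
from pathlib import PurePosixPath

MAX_MEMBER_BYTES = 128 * 1024 * 1024  # per-member cap quoted in the PR text
MAX_BUNDLE_BYTES = 256 * 1024 * 1024  # total cap; applied to summed member sizes (assumption)


class UnsafeMemberError(ValueError):
    """Hypothetical error type; the real reader raises BundleError-derived exceptions."""


def check_members(tar: tarfile.TarFile) -> list[tarfile.TarInfo]:
    """Return all members if every one passes the safety rules, else raise."""
    total = 0
    members = tar.getmembers()
    for member in members:
        path = PurePosixPath(member.name)
        # Regular files only: this rejects symlinks, hard links, directories, devices.
        if not member.isfile():
            raise UnsafeMemberError(f"not a regular file: {member.name}")
        # No absolute paths and no parent-directory traversal.
        if path.is_absolute() or ".." in path.parts:
            raise UnsafeMemberError(f"unsafe path: {member.name}")
        if member.size > MAX_MEMBER_BYTES:
            raise UnsafeMemberError(f"member too large: {member.name}")
        total += member.size
        if total > MAX_BUNDLE_BYTES:
            raise UnsafeMemberError("bundle exceeds total size cap")
    return members
```

Checking the declared sizes before reading any member body means an oversized or malformed bundle is rejected without extracting anything to disk.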

Import sequence

Dependency-ordered. OSS-relevant tables (skipping cloud-only workspaces/users/memberships/subscriptions/etc):

  1. identities (OSS identities)
  2. credentials (OSS credentials)
  3. compliance_profiles (OSS compliance)
  4. conformity_assessments (nested in OSS compliance doc)
  5. audit_events: chain-verified end-to-end via attestix.audit.events.verify_chain BEFORE any commit; failure aborts the whole import atomically
  6. anchors (OSS anchors)

Audit rows stay under the bundle's original tenant slug because chain_hash includes tenant_id in the canonicalised body — the CLI prints a note when --workspace overrides storage tenant for the other tables.
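
Why re-tenanting audit rows would break the chain can be shown with a toy hash chain. This is a sketch under stated assumptions, not the real AuditEvent schema or verify_chain: the row layout, the genesis value, and the json.dumps stand-in for JCS are all hypothetical.

```python
import hashlib
import json

GENESIS = "0" * 64  # hypothetical genesis value for the first link


def canonical(obj) -> bytes:
    # Stand-in for JCS canonicalisation: sorted keys, compact separators.
    return json.dumps(obj, sort_keys=True, separators=(",", ":"),
                      ensure_ascii=False).encode("utf-8")


def chain_hash(prev_hash: str, body: dict) -> str:
    # tenant_id lives inside body, so it is covered by the hash.
    return hashlib.sha256(prev_hash.encode("ascii") + canonical(body)).hexdigest()


def build_chain(bodies: list[dict]) -> list[dict]:
    prev, rows = GENESIS, []
    for body in bodies:
        digest = chain_hash(prev, body)
        rows.append({"body": body, "prev_hash": prev, "chain_hash": digest})
        prev = digest
    return rows


def verify(rows: list[dict]) -> bool:
    prev = GENESIS
    for row in rows:
        if row["prev_hash"] != prev or chain_hash(prev, row["body"]) != row["chain_hash"]:
            return False
        prev = row["chain_hash"]
    return True


rows = build_chain([
    {"tenant_id": "acme", "action": "identity.created"},
    {"tenant_id": "acme", "action": "credential.issued"},
])
print(verify(rows))                      # True: untouched chain
rows[0]["body"]["tenant_id"] = "local"   # simulate re-tenanting an audit row
print(verify(rows))                      # False: stored hash no longer matches
```

Because the stored hashes were computed over the original tenant_id, the only way to keep them valid is to keep the audit rows under the original tenant, which matches the behaviour described above.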

CLI

  • attestix import <bundle.tar.gz> — default refuses on non-empty store (friendly hint to --force).
  • --force — proceeds despite existing data.
  • --workspace <name> — re-tenant non-audit tables to a chosen local label.
  • --verify-only — parses + verifies bundle integrity end-to-end (manifest sha, per-table shas, JCS round-trip, chain consistency); writes nothing.
  • One progress line per table: [✓] audit_events 1342 rows sha256:abc12345….
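
As a rough illustration of the flag surface and the progress line, here is a minimal argparse sketch. It is not the code in attestix/cli.py, which may use a different CLI framework: build_parser and progress_line are hypothetical names, and truncating the digest to 8 hex characters is inferred from the example line above.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(prog="attestix")
    sub = parser.add_subparsers(dest="command", required=True)
    imp = sub.add_parser("import", help="import a portability bundle")
    imp.add_argument("bundle_path", help="path to <bundle.tar.gz>")
    imp.add_argument("--force", action="store_true",
                     help="proceed despite existing data")
    imp.add_argument("--workspace", metavar="NAME", default=None,
                     help="re-tenant non-audit tables to a local label")
    imp.add_argument("--verify-only", action="store_true",
                     help="verify bundle integrity; write nothing")
    return parser


def progress_line(table: str, rows: int, sha256_hex: str) -> str:
    # One line per table, with the digest shortened for display.
    return f"[✓] {table} {rows} rows sha256:{sha256_hex[:8]}…"


args = build_parser().parse_args(["import", "bundle.tar.gz", "--verify-only"])
print(args.bundle_path, args.force, args.workspace, args.verify_only)
print(progress_line("audit_events", 1342, "abc12345" + "0" * 56))
```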

Test plan

  • 507 tests pass / 2 skipped (the same 2 historical skips, including the POSIX chmod test on Windows). +26 new: 10 in tests/portability/test_bundle_reader.py + 9 in tests/portability/test_importer.py + 7 in tests/cli/test_import_command.py.
  • Targeted: pytest tests/portability tests/cli -v → 26 pass.
  • Fixture bundle tests/fixtures/bundles/sample-v1.tar.gz = 2,768 bytes; generator at tests/fixtures/bundles/generate_sample_bundle.py is byte-deterministic (mtime=0).
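
A byte-deterministic USTAR + gzip archive can be produced with the stdlib alone. The sketch below is not generate_sample_bundle.py; build_bundle is a hypothetical name, and the sorted member order and zeroed owner fields are assumptions chosen so that identical inputs give identical bytes.

```python
import gzip
import io
import tarfile


def build_bundle(files: dict[str, bytes]) -> bytes:
    """Pack files into a USTAR + gzip archive with no time-dependent bytes."""
    raw = io.BytesIO()
    # mtime=0 zeroes the timestamp in the gzip header; filename="" keeps any
    # file name out of the header as well.
    with gzip.GzipFile(filename="", fileobj=raw, mode="wb", mtime=0) as gz:
        with tarfile.open(fileobj=gz, mode="w", format=tarfile.USTAR_FORMAT) as tar:
            for name in sorted(files):  # fixed member order (assumption)
                info = tarfile.TarInfo(name)
                info.size = len(files[name])
                info.mtime = 0          # zero the per-member timestamp too
                info.mode = 0o644
                info.uid = info.gid = 0
                info.uname = info.gname = ""
                tar.addfile(info, io.BytesIO(files[name]))
    return raw.getvalue()


first = build_bundle({"manifest.json": b"{}", "identities.jsonl": b'{"id":"a"}\n'})
second = build_bundle({"identities.jsonl": b'{"id":"a"}\n', "manifest.json": b"{}"})
print(first == second)  # True: same content, same bytes
```

Output bytes can still differ across zlib or Python versions, so a committed fixture is only reproducible when regenerated with a comparable toolchain.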

Deferred (P2)

  • Cross-repo CI hook proving byte-parity vs a cloud-worker-produced bundle.
  • Parquet decode path (cloud M6 P2 will emit Parquet — OSS reader hard-codes JSONL today).
  • --merge mode for idempotent re-import.
  • --resolve-did-web for live did:web resolution (today: structural-only, offline-safe).

Summary by CodeRabbit

Release Notes

  • New Features
    • Added attestix import CLI command to ingest portability bundles into local storage.
    • Supports bundle integrity verification with --verify-only flag.
    • Includes --force flag to allow imports alongside existing data.
    • Added --workspace option to target specific tenants during import.
    • Displays bundle metadata and per-table import status with rows written and skipped table information.

…oses the portability round-trip)

Closes the OSS side of the M6 P1 cloud exporter handshake. Bundles ship as
USTAR + gzip tarballs of JCS-canonical JSONL files plus a manifest, and the
OSS now consumes them through:

  attestix import <bundle.tar.gz> [--force] [--workspace <name>] [--verify-only]

* attestix/portability/bundle_reader.py — tarfile-based reader that verifies
  the manifest's own SHA (re-canonicalising through the existing JCS helper
  in attestix.auth.crypto.canonicalize_json — no second canonicaliser) and
  every per-table SHA, with a schema-version guard refusing bundles stamped
  with a db_migration_max newer than this build supports.
* attestix/portability/importer.py — applies a verified bundle through the
  v0.4.0 Repository boundary in dependency order (identities -> credentials
  -> compliance_profiles -> conformity_assessments -> audit_events -> anchors).
  audit_events are chain-verified end-to-end via attestix.audit.events.verify_chain
  BEFORE any write commits, so a broken chain aborts the entire import; the
  chain rows stay under the bundle's original tenant slug because chain_hash
  includes tenant_id. did:key identities round-trip through DIDService; did:web
  identities are structurally validated against the embedded did_document.
* attestix/cli.py — new 'import' subcommand with --force / --workspace /
  --verify-only flags, a non-empty-storage guard, per-table sha progress line,
  and verifier next-step hints.
* Fixture: tests/fixtures/bundles/sample-v1.tar.gz (2,768 bytes) plus the
  generator script that produces it. The generator mirrors the cloud exporter's
  USTAR + gzip + JCS-JSONL format byte-for-byte and is deterministic
  (mtime=0 in the gzip header), so the committed bundle is reproducible.
* Tests: 10 bundle-reader cases (valid parse, manifest tamper, per-table
  tamper, schema-too-new, schema-older-accepted, unknown format URL, member
  path safety), 9 importer cases (round-trip counts, chain verification,
  tenant isolation, --force semantics, cloud-only skip, conformity-assessment
  routing into compliance.assessments), 7 CLI cases (--verify-only, default
  refusal on non-empty, --force override, --workspace remap, help listing).

Zero new dependencies: stdlib tarfile + hashlib + gzip only. All pre-existing
tests still pass; the 26 new tests bring the suite to 509 (507 passing, 2 skipped).
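
The manifest self-check and the schema-version guard described for bundle_reader.py can be sketched as follows. This is an illustration under assumptions, not the shipped code: the digest is assumed to live in a top-level "sha256" key of the manifest, db_migration_max is assumed to parse as an integer ("0010" becomes 10), the side-car is assumed to hold a bare hex digest, and json.dumps stands in for the real canonicalize_json helper (it matches JCS only for simple values).

```python
import hashlib
import json

SUPPORTED_DB_MIGRATION_MAX = 10  # "0010" in the PR text; integer form is an assumption


class BundleSchemaTooNewError(Exception):
    """Name taken from the PR text; this body is a stand-in."""


def canonical(obj) -> bytes:
    # Stand-in for attestix.auth.crypto.canonicalize_json.
    return json.dumps(obj, sort_keys=True, separators=(",", ":"),
                      ensure_ascii=False).encode("utf-8")


def manifest_digest(manifest: dict) -> str:
    # Hash the manifest with its own digest field removed (key name assumed).
    stripped = {k: v for k, v in manifest.items() if k != "sha256"}
    return hashlib.sha256(canonical(stripped)).hexdigest()


def check_manifest(manifest: dict, sidecar_text: str) -> None:
    if manifest_digest(manifest) != sidecar_text.strip():
        raise ValueError("manifest sha256 does not match side-car")
    found = int(manifest["schemas"]["db_migration_max"])
    if found > SUPPORTED_DB_MIGRATION_MAX:
        raise BundleSchemaTooNewError(
            f"bundle migration {found} > supported {SUPPORTED_DB_MIGRATION_MAX}")


manifest = {"schemas": {"db_migration_max": "0010"}, "tables": []}
manifest["sha256"] = manifest_digest(manifest)
check_manifest(manifest, manifest["sha256"] + "\n")  # passes silently
```

Stripping the digest field before hashing is what lets the manifest carry its own checksum without the checksum depending on itself.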
@coderabbitai

coderabbitai Bot commented May 28, 2026

Caution

Review failed

Pull request was closed or merged during review

Note

.coderabbit.yaml has unrecognized properties

CodeRabbit is using all valid settings from your configuration. Unrecognized properties (listed below) have been ignored and may indicate typos or deprecated fields that can be removed.

⚠️ Parsing warnings (1)
Validation error: Unrecognized key: "ignore"
Walkthrough

This PR introduces a complete portability data import feature for Attestix: bundle wire-format parsing with integrity verification, dependency-ordered row staging with data projection, audit-chain reconciliation and DID validation, CLI integration with force/workspace/verify-only options, and comprehensive end-to-end testing with deterministic fixture generation.

Changes

Portability Bundle Import

| Layer / File(s) | Summary |
| --- | --- |
| Bundle format, schema, and integrity verification (attestix/portability/bundle_reader.py) | Bundle wire-format constants and exception hierarchy; Bundle dataclass with manifest metadata/payloads; row iteration via iter_rows() and rows(); integrity verification via verify_table_sha(), verify_manifest(), and verify(); schema compatibility guard rejecting bundles newer than OSS-supported db_migration_max. |
| Bundle tarball parsing and validation (attestix/portability/bundle_reader.py, attestix/portability/__init__.py) | read_bundle() function opens gzip tarballs, enforces member/size caps and safe naming, parses manifest.json and manifest.sha256 side-car, validates structure and version/format fields, builds per-table metadata, and returns fully hydrated Bundle or raises BundleError-derived exceptions. Portability package entrypoint re-exports public API symbols. |
| Data projection and import plan (attestix/portability/importer.py) | Row-mapping helpers transform cloud bundle shapes to OSS collection shapes (identity, credential, profile, conformity, anchor, audit-event), with audit-event re-tagging to preserve workspace slug for chain verification. IMPORT_PLAN documents dependency-ordered bundle-to-OSS table mapping with per-table id-fields and projection functions; cloud-only tables marked with oss_collection=None. |
| Importer execution and staging logic (attestix/portability/importer.py) | Importer.run() validates bundle schema compatibility and integrity, stages rows per IMPORT_PLAN tracking counts, performs pre-commit audit-chain reconciliation (aborts import on verification failure), validates identity DID documents (did:key resolution round-trip; did:web structure/id checks), supports verify_only early exit, commits staged rows via Repository with tenant routing (audit under bundle workspace slug; other collections under configured tenant), handles conformity assessments via compliance document API. Produces ImportResult with per-table summaries and chain verification status. |
| CLI import subcommand with options (attestix/cli.py) | New attestix import command accepts required bundle_path, --force (import into non-empty stores), --workspace (override target tenant), --verify-only (validation-only). Lazily imports portability types, parses bundle, determines target tenant, blocks on non-empty local data unless forced, outputs bundle metadata (format/versions/export info), runs importer with error handling, reports per-table status and audit-chain reconciliation results, exits early in verify-only mode, prints success + suggested follow-ups. |
| Bundle fixture generator and test suites (tests/fixtures/bundles/generate_sample_bundle.py, tests/portability/test_bundle_reader.py, tests/portability/test_importer.py, tests/cli/test_import_command.py) | Deterministic fixture generator creates sample-v1.tar.gz (USTAR + gzip, mtime=0) with canonical JSONL and manifest; supports tampering modes (SHA corruption, schema bump) for negative coverage. Reader tests validate committed fixture loads/verifies, parses tables, detects tampering in manifest/bodies, rejects schema-too-new bundles and unrecognised formats, confirms snake_case JSON output. Importer tests validate round-trip writes, audit-chain re-verification, verify_only no-write semantics, cross-tenant isolation, force override, workspace override targeting, compliance document persistence, cloud-only table skipping. CLI tests cover verify-only success/failure, import on empty/non-empty stores, force override, workspace selection, help text. |
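
The dependency-ordered plan with cloud-only tables marked by oss_collection=None can be pictured roughly as below. This is a hedged sketch, not the contents of importer.py: PlanEntry, identity_projection, and stage are hypothetical names, the "audit_events" collection name is assumed, and the real plan also carries per-table id-fields that are omitted here.

```python
from dataclasses import dataclass
from typing import Callable, Optional


def identity_projection(row: dict) -> dict:
    # Placeholder projection: the real helpers reshape cloud rows to OSS shapes.
    return dict(row)


@dataclass(frozen=True)
class PlanEntry:
    bundle_table: str
    oss_collection: Optional[str]  # None marks a cloud-only table that is skipped
    project: Callable[[dict], dict] = identity_projection


# Order follows the import sequence in the PR description.
IMPORT_PLAN = [
    PlanEntry("workspaces", None),
    PlanEntry("identities", "identities"),
    PlanEntry("credentials", "credentials"),
    PlanEntry("compliance_profiles", "compliance"),
    PlanEntry("conformity_assessments", "compliance"),
    PlanEntry("audit_events", "audit_events"),
    PlanEntry("anchors", "anchors"),
]


def stage(tables: dict[str, list[dict]]):
    """Project rows in plan order without writing anything yet."""
    staged, skipped = [], []
    for entry in IMPORT_PLAN:
        if entry.oss_collection is None:
            skipped.append(entry.bundle_table)
            continue
        rows = [entry.project(r) for r in tables.get(entry.bundle_table, [])]
        staged.append((entry.oss_collection, rows))
    return staged, skipped
```

Staging everything in memory first is what makes the "verify the audit chain before any commit" rule workable: nothing reaches the Repository until every check has passed.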

Sequence Diagram

sequenceDiagram
    participant CLI as CLI import command
    participant Reader as read_bundle()
    participant Importer as Importer.run()
    participant Chain as verify_chain()
    participant DID as DID validation
    participant Repo as Repository
    CLI->>Reader: parse bundle from tarball
    Reader->>Reader: verify manifest/table SHAs
    Reader->>Reader: check schema compatibility
    Reader-->>CLI: return Bundle
    CLI->>Importer: run(bundle, force, verify_only)
    Importer->>Importer: stage rows per IMPORT_PLAN
    Importer->>Chain: reconstruct & verify audit chain
    Importer->>DID: validate identity DIDs (did:key, did:web)
    alt verify_only mode
        Importer-->>CLI: ImportResult (no writes)
    else normal import
        Importer->>Repo: commit staged rows
        Repo->>Repo: write collections under configured tenant
        Repo->>Repo: write audit under bundle workspace slug
        Importer-->>CLI: ImportResult (with counts & chain_verified)
    end
    CLI->>CLI: print summary & next steps

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

Possibly related PRs

  • VibeTensor/attestix#83: Portability importer audit-chain reconciliation and tenant-scoped audit persistence depend directly on the v0.4.0 structured audit event API (AuditEvent/verify_chain, tenant-context handling).
🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
| --- | --- | --- | --- |
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 42.67% which is insufficient. The required threshold is 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |
✅ Passed checks (4 passed)
| Check name | Status | Explanation |
| --- | --- | --- |
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title check | ✅ Passed | The title accurately and specifically describes the main change: adding a CLI command to consume M6 cloud bundles and complete the portability round-trip. |
| Linked Issues check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Out of Scope Changes check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |


Comment thread tests/portability/test_bundle_reader.py Dismissed
from __future__ import annotations

import hashlib
import io
import tarfile
from dataclasses import dataclass, field
from pathlib import Path
from typing import Dict, Iterator, List, Optional, Tuple, Union
import gzip
import hashlib
import io
import os
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional, Tuple

from attestix.audit.events import AuditEvent, verify_chain

from __future__ import annotations

import shutil
@ascender1729 ascender1729 merged commit f9715c8 into main May 28, 2026
17 of 19 checks passed
@ascender1729 ascender1729 deleted the feature/cli-bundle-import branch May 28, 2026 08:12
ascender1729 added a commit that referenced this pull request May 28, 2026
…s (closes symmetric portability)

Adds the OSS-side exporter symmetric to the importer that landed in PR #93.
Closes the persona-review critical: OSS users could import but not export,
violating the constitutional zero-lock-in promise.

- New attestix export <output.tar.gz> CLI subcommand with --workspace,
  --include-anchors/--include-audit, --force, --no-pretty options.
- New attestix/portability/bundle_writer.py implements write_bundle() as the
  inverse of importer.py - reads through Repository, projects each OSS row to
  the cloud wire-format shape, JCS-canonicalises via the shared
  auth.crypto.canonicalize_json (no second canonicaliser), and emits a USTAR
  + gzip tarball that bundle_reader + Importer accept byte-cleanly.
- Deterministic output: alphabetical member order, USTAR mtime=0, gzip
  header mtime=0; synthetic cloud UUIDs derived via UUID5 over row content
  so two consecutive exports of the same OSS state are byte-identical.
- Audit-chain integrity: workspace.slug is inferred from the audit_events
  collection's tenant (or --workspace override) so the importer's chain
  re-verification round-trips cleanly.
- EXPORT_PLAN mirrors the cloud worker's EXPORT_TABLE_SPECS order so the
  manifest tables array and on-disk member order are byte-comparable across
  cloud and OSS implementations.
- 12 new tests (6 round-trip + 6 CLI smoke). Suite goes 507 -> 519 passing.
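
Deriving stable identifiers with UUID5 over row content, as the commit message describes, can be sketched like this. The namespace URL, the synthetic_id name, and the "table:canonical-json" payload layout are all hypothetical; bundle_writer.py may derive its IDs differently.

```python
import json
import uuid

# Hypothetical namespace; any fixed UUID works as long as it never changes.
EXPORT_NAMESPACE = uuid.uuid5(uuid.NAMESPACE_URL, "https://example.invalid/attestix-export")


def synthetic_id(table: str, row: dict) -> str:
    """Return the same UUID for the same table and row content on every run."""
    body = json.dumps(row, sort_keys=True, separators=(",", ":"), ensure_ascii=False)
    return str(uuid.uuid5(EXPORT_NAMESPACE, f"{table}:{body}"))


a = synthetic_id("identities", {"did": "did:key:z6Mk", "name": "agent"})
b = synthetic_id("identities", {"name": "agent", "did": "did:key:z6Mk"})
print(a == b)  # True: key order does not affect the derived ID
```

Unlike uuid4, uuid5 is a pure function of its inputs, which is why two consecutive exports of the same state can stay byte-identical.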
ascender1729 added a commit that referenced this pull request May 28, 2026
…s (closes symmetric portability) (#95)
