[FEATURE]: Pipeline init hooks and extensions/libraries layout #38

@rederik76

Description

Is there an existing issue for this?

  • I have searched the existing issues

Problem statement

Pipeline bundles need a supported way to run custom Python before SDP table/view declarations (e.g. Spark conf, factory registration) and after declarations (e.g. @dp.on_event_hook), without mixing that code with importable extension modules.

Today, extensions/ is only added to sys.path as a flat directory, which:

  • Encourages flat imports (import utility) that collide with framework modules (e.g. utility) and confuse unit tests.
  • Does not distinguish “run once as a script” from “import as a library” used by Data Flow Specs (pythonModule, etc.).

There is no first-class hook lifecycle integrated with DLTPipelineBuilder.initialize_pipeline().

Proposed Solution

  1. Reorganise extensions under a single tree:
    • extensions/libraries/ — importable modules; only this path (or the bundle/framework root if using package imports) is added to sys.path for spec resolution.
    • extensions/pre_init/ — executable *.py files, run before the create_dataflow loop (after builder init: specs, substitutions, secrets, Spark config applied).
    • extensions/post_init/ — executable *.py files, run after all dataflows are created.
  2. Execution model: runpy.run_path(..., run_name="__main__") per file; files run in sorted order; _*.py files are skipped; framework hooks run before bundle hooks; exceptions propagate and fail the pipeline.
  3. Backward compatibility: If extensions/libraries/ is missing but legacy flat extensions/*.py files exist, add the legacy path to sys.path and emit a deprecation warning; remove the legacy behavior in a later minor release.
  4. Documentation & samples: Update Sphinx (feature_python_extensions.rst), bundle samples, scaffold script, and Cursor/skill snippets for the new paths.
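
To make points 1 to 3 concrete, here is a minimal sketch of how the library path resolution and hook execution could look. It is not the framework's implementation: the function names (resolve_library_path, discover_hooks, run_hooks) and the framework_root/bundle_root parameters are hypothetical, it assumes the run name is "__main__", and the real wiring into DLTPipelineBuilder.initialize_pipeline() is left out.

```python
import runpy
import warnings
from pathlib import Path
from typing import List, Optional


def resolve_library_path(extensions_dir: Path) -> Optional[Path]:
    """Return the directory to add to sys.path (hypothetical helper)."""
    libraries = extensions_dir / "libraries"
    if libraries.is_dir():
        return libraries
    # Legacy layout: flat extensions/*.py without a libraries/ subdirectory.
    if extensions_dir.is_dir() and any(extensions_dir.glob("*.py")):
        warnings.warn(
            "Flat extensions/*.py is deprecated; "
            "move modules to extensions/libraries/.",
            DeprecationWarning,
            stacklevel=2,
        )
        return extensions_dir
    return None


def discover_hooks(hook_dir: Path) -> List[Path]:
    """List *.py files in sorted order, skipping names that start with '_'."""
    if not hook_dir.is_dir():
        return []
    return sorted(
        p for p in hook_dir.glob("*.py") if not p.name.startswith("_")
    )


def run_hooks(stage: str, framework_root: Path, bundle_root: Path) -> List[Path]:
    """Run framework hooks, then bundle hooks, for 'pre_init' or 'post_init'."""
    executed = []
    for root in (framework_root, bundle_root):
        for hook in discover_hooks(root / "extensions" / stage):
            # No try/except here: an exception raised by a hook propagates
            # to the caller and fails the pipeline.
            runpy.run_path(str(hook), run_name="__main__")
            executed.append(hook)
    return executed
```

Under these assumptions, the builder would call run_hooks("pre_init", ...) once its own initialisation is done and before the create_dataflow loop, then run_hooks("post_init", ...) after the loop. Each hook file runs in a fresh namespace, so hooks would share state only through the libraries they import or through Spark itself.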

Additional Context

Optional follow-ups (separate issues): adding the bundle src directory to sys.path so that from extensions.libraries... works without flat imports; richer error context when a hook raises; manifest-driven hook order (explicitly deferred in the original design).

Labels: enhancement