Is there an existing issue for this?
Problem statement
Pipeline bundles need a supported way to run custom Python before SDP table/view declarations (e.g. Spark conf, factory registration) and after declarations (e.g. @dp.on_event_hook), without mixing that code with importable extension modules.
Today, extensions/ is only added to sys.path as a flat directory, which:
- Encourages flat imports (e.g. import utility) that collide with same-named framework modules and confuse unit tests.
- Does not distinguish "run once as a script" from "import as a library" used by Data Flow Specs (pythonModule, etc.).
- Offers no first-class hook lifecycle integrated with DLTPipelineBuilder.initialize_pipeline().
Proposed Solution
- Reorganise extensions under a single tree:
- extensions/libraries/ — importable modules; only this path (or the bundle/framework root if using package imports) is added to sys.path for spec resolution.
- extensions/pre_init/ — executable *.py files, run before the create_dataflow loop (after builder init: specs, substitutions, secrets, Spark config applied).
- extensions/post_init/ — executable *.py files, run after all dataflows are created.
- Execution model: runpy.run_path(..., run_name="__main__") per file; files run in sorted order; _*.py files are skipped; framework hooks run before bundle hooks; exceptions propagate and fail the pipeline.
- Backward compatibility: If extensions/libraries/ is missing but legacy flat extensions/*.py files exist, add the legacy path and emit a deprecation warning; remove the legacy behavior in a later minor release.
- Documentation & samples: Update Sphinx (feature_python_extensions.rst), bundle samples, scaffold script, and Cursor/skill snippets for the new paths.
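The discovery and execution model above could be sketched roughly as follows. This is a hypothetical illustration, not the shipped implementation: the function names (`run_hooks`, `add_library_path`, `_hook_files`) and the exact deprecation message are assumptions; only the behavior (sorted order, `_*.py` skipped, `runpy.run_path` with `run_name="__main__"`, propagating exceptions, libraries-only `sys.path` with a legacy fallback) comes from the proposal.

```python
import runpy
import sys
import warnings
from pathlib import Path


def _hook_files(hook_dir: Path) -> list[Path]:
    """Discover hook scripts in sorted order, skipping _*.py files."""
    if not hook_dir.is_dir():
        return []
    return sorted(p for p in hook_dir.glob("*.py") if not p.name.startswith("_"))


def run_hooks(extensions_root: Path, phase: str) -> None:
    """Run each hook file as a standalone script (phase is "pre_init" or
    "post_init"); any exception propagates and fails the pipeline."""
    for hook in _hook_files(extensions_root / phase):
        runpy.run_path(str(hook), run_name="__main__")


def add_library_path(extensions_root: Path) -> None:
    """Put only extensions/libraries/ on sys.path for spec resolution,
    falling back to the deprecated flat layout if it is missing."""
    libraries = extensions_root / "libraries"
    if libraries.is_dir():
        sys.path.insert(0, str(libraries))
    elif any(extensions_root.glob("*.py")):
        warnings.warn(
            "Flat extensions/*.py layout is deprecated; move importable "
            "modules to extensions/libraries/",
            DeprecationWarning,
        )
        sys.path.insert(0, str(extensions_root))
```

In this sketch, `DLTPipelineBuilder.initialize_pipeline()` would call `add_library_path(...)` and `run_hooks(root, "pre_init")` before the create_dataflow loop, then `run_hooks(root, "post_init")` after all dataflows are created.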
Additional Context
Optional follow-ups (separate issues): adding the bundle src to sys.path so from extensions.libraries... imports work without flat imports; richer error context when a hook raises; manifest-driven hook ordering (explicitly deferred in the original design).
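The first follow-up could amount to little more than putting the bundle root on sys.path so package-style imports resolve. A minimal sketch, assuming a hypothetical `add_bundle_root` helper and that extensions/ is laid out as an importable package:

```python
import sys
from pathlib import Path


def add_bundle_root(bundle_root: Path) -> None:
    """Hypothetical follow-up: put the bundle root (the parent of
    extensions/) on sys.path so package-style imports such as
    `from extensions.libraries import mod` replace flat imports."""
    root = str(bundle_root)
    if root not in sys.path:
        sys.path.insert(0, root)
```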