Pull request overview
Adds a new Dataset Selector source operator that lets users pick a dataset version in the property panel and emits one tuple per file path in that version, using Texera’s dataset path format.
Changes:
- Backend: introduce DatasetSelectorSourceOpDesc + DatasetSelectorSourceOpExec and register the operator type.
- Frontend: add a custom Formly field (datasetversionselector) to select dataset + version and bind it to datasetVersionPath.
- Tests/assets: add a basic descriptor/schema unit test and an operator icon.
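The frontend binding described above can be pictured roughly as follows. This is a minimal sketch, not the PR's component: the `DatasetVersionSelection` shape, the `toDatasetVersionPath` helper, and the `/<owner>/<dataset>/<version>` layout are assumptions made for illustration. Only the `datasetVersionPath` property name comes from the PR.

```typescript
// Hypothetical shape of what the selector holds once the user has picked
// a dataset and one of its versions (not taken from the PR).
interface DatasetVersionSelection {
  ownerEmail: string;
  datasetName: string;
  versionName: string;
}

// Assumed path layout: /<owner>/<dataset>/<version>.
function toDatasetVersionPath(sel: DatasetVersionSelection): string {
  return "/" + [sel.ownerEmail, sel.datasetName, sel.versionName].join("/");
}

// Plain object standing in for the form model that backs the property panel.
const model: { datasetVersionPath?: string } = {};

// Writing the selection back into the bound property.
model.datasetVersionPath = toDatasetVersionPath({
  ownerEmail: "alice@example.com",
  datasetName: "sales",
  versionName: "v1",
});
```

Keeping the path construction in one small pure function makes the write-back easy to unit test without rendering the component.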
Reviewed changes
Copilot reviewed 9 out of 10 changed files in this pull request and generated 5 comments.
| File | Description |
|---|---|
| frontend/src/assets/operator_images/DatasetSelector.png | Adds an icon for the new operator. |
| frontend/src/app/workspace/component/property-editor/operator-property-edit-frame/operator-property-edit-frame.component.ts | Maps datasetVersionPath to a custom Formly control type. |
| frontend/src/app/workspace/component/dataset-version-selector/dataset-version-selector.component.ts | Implements dataset+version selector logic and writes back datasetVersionPath. |
| frontend/src/app/workspace/component/dataset-version-selector/dataset-version-selector.component.html | UI for dataset and version dropdowns. |
| frontend/src/app/common/formly/formly-config.ts | Registers the new Formly field type datasetversionselector. |
| frontend/src/app/app.module.ts | Declares the new selector component. |
| common/workflow-operator/src/test/scala/.../DatasetSelectorSourceOpDescSpec.scala | Adds unit tests for descriptor metadata and output schema. |
| common/workflow-operator/src/main/scala/.../DatasetSelectorSourceOpExec.scala | Implements tuple production by resolving dataset version and listing objects. |
| common/workflow-operator/src/main/scala/.../DatasetSelectorSourceOpDesc.scala | Defines operator metadata + output schema (filename). |
| common/workflow-operator/src/main/scala/.../LogicalOp.scala | Registers DatasetSelector in the operator type list. |
kunwp1 left a comment
LGTM! One minor suggestion for the property panel: could we align the UI with the CSV file scan operator for consistency? Specifically, could we reuse the UI of opening a separate window to select the dataset and version followed by a "Select Dataset" button?
Updated following the suggestion. The screenshots are also updated.
What changes were proposed in this PR?
This PR adds a new File Lister operator that lets users select a dataset version from the property panel and outputs one tuple per file path in that version. The emitted values follow Texera’s existing dataset file path format, so they can be consumed directly by downstream operators. On the frontend, this PR adds a dedicated dataset-version selector field in the property panel and wires datasetVersionPath to that custom UI.
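To make the "one tuple per file path" behavior concrete, here is a minimal sketch in TypeScript. It is not the operator's Scala implementation: the `listFileTuples` helper, the in-memory list of relative paths, and the `/<owner>/<dataset>/<version>/<relative path>` layout are assumptions for illustration. The single `filename` column follows the output schema mentioned in the review summary.

```typescript
// One output tuple, with the single `filename` column.
interface FileTuple {
  filename: string;
}

// Hypothetical helper: given a dataset version path and the relative paths of
// the files stored in that version, build one tuple per file.
function listFileTuples(datasetVersionPath: string, relativePaths: string[]): FileTuple[] {
  // Drop trailing slashes so joining never produces "//".
  const base = datasetVersionPath.replace(/\/+$/, "");
  return relativePaths.map(rel => ({
    // Drop leading slashes on the relative part for the same reason.
    filename: base + "/" + rel.replace(/^\/+/, ""),
  }));
}

const tuples = listFileTuples("/alice@example.com/sales/v1/", ["2024/q1.csv", "/readme.txt"]);
```

Under these assumptions, each emitted `filename` is already a full dataset file path, which is what would let a downstream operator consume it without further rewriting.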
Any related issues, documentation, discussions?
Closes #4363.
How was this PR tested?
Tested manually, and a test case was added.
The test covers the dataset selector descriptor metadata and output schema.
Was this PR authored or co-authored using generative AI tooling?
No.