I think drevalpy could use a more hierarchical CLI structure. typer makes it quite easy to build beautiful CLIs like this. Something like this:
drevalpy: (if no further command, runs the full pipeline)
    load: Loads raw dataset data (e.g. CTRP) from the official sources and stores it in the standard format
    curation: (if no further command, takes the standard format and runs curvecurator, incl. pre- and postprocessing)
        split: Takes the standard format and creates groups, mostly for the nextflow pipeline
        fit: Processes a single group
    split: Takes a curated dataset and creates a fold split
    train: Takes a fold split and a model and performs training
    predict: Takes a fold split and a trained model and creates predictions
    evaluate: Takes predictions and computes performance metrics
    report: Takes predictions and performance metrics and creates a MultiQC report
So far I have mostly thought about the load and curation subcommands. I am aware that some of the currently available CLI commands are not reflected in the list above. Ideally, I think these could be bundled under the matching umbrella (e.g. as subcommands of train or split).
As the discussion proceeds and I get deeper into the train/predict structure of the package, I will update the list above with more precise information.
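To make the proposal a bit more concrete, here is a minimal sketch of how the hierarchy above could be wired up with typer. It is only an illustration: the function bodies are placeholders that print a message, and the arguments `dataset` and `group` are hypothetical names I made up, not existing drevalpy parameters. The "runs if no further command" behaviour uses typer callbacks with `invoke_without_command=True`.

```python
import typer

# Top-level app; the curation commands live in their own sub-app.
app = typer.Typer(help="drevalpy CLI (sketch)")
curation_app = typer.Typer(help="Curation subcommands (sketch)")


@app.callback(invoke_without_command=True)
def main(ctx: typer.Context) -> None:
    """Run the full pipeline if no subcommand is given."""
    if ctx.invoked_subcommand is None:
        typer.echo("Running full pipeline (placeholder)")


@app.command()
def load(dataset: str) -> None:
    """Load a raw dataset and store it in the standard format."""
    # `dataset` is a hypothetical argument, e.g. "CTRP".
    typer.echo(f"Loading {dataset} (placeholder)")


@curation_app.callback(invoke_without_command=True)
def curation(ctx: typer.Context) -> None:
    """Run the whole curation if no curation subcommand is given."""
    if ctx.invoked_subcommand is None:
        typer.echo("Running full curation (placeholder)")


@curation_app.command("split")
def curation_split() -> None:
    """Create groups from the standard format."""
    typer.echo("Creating curation groups (placeholder)")


@curation_app.command()
def fit(group: str) -> None:
    """Process a single group."""
    # `group` is a hypothetical argument identifying one group.
    typer.echo(f"Fitting group {group} (placeholder)")


@app.command()
def split() -> None:
    """Create a fold split from a curated dataset."""
    typer.echo("Creating fold split (placeholder)")


@app.command()
def train() -> None:
    """Train a model on a fold split."""
    typer.echo("Training (placeholder)")


@app.command()
def predict() -> None:
    """Create predictions with a trained model."""
    typer.echo("Predicting (placeholder)")


@app.command()
def evaluate() -> None:
    """Compute performance metrics from predictions."""
    typer.echo("Evaluating (placeholder)")


@app.command()
def report() -> None:
    """Create a MultiQC report."""
    typer.echo("Reporting (placeholder)")


# Mount the curation sub-app under the name "curation".
app.add_typer(curation_app, name="curation")

if __name__ == "__main__":
    app()
```

With this layout, `drevalpy` alone would run everything, `drevalpy curation` would run the whole curation step, and `drevalpy curation fit <group>` would process a single group. Existing commands that do not fit the list could be mounted the same way via additional sub-apps.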