…regression format
…es on develop to feature/forecasting-task
…rics and tests for multi-output support
…shAI - Implement ForecastingTask for time series prediction - Add ProphetModel as a wrapper around Facebook Prophet - Validate datasets with a temporal column (ds) and a frequency - UI shows the requirements specific to ForecastingTask - Forecasting metrics: sMAPE, MAPE, MASE - Temporal splits to avoid data leakage - Predict only timestamps present in the selected dataset - Support for exogenous variables (external regressors) Fixes: linting errors corrected (PD901, E501, N803, F841)
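The temporal-split behaviour mentioned in this commit can be sketched roughly as follows. This is illustrative only; `temporal_split` and its signature are not DashAI's actual API.

```python
# Hypothetical sketch of a chronological split with an optional gap of
# skipped rows between partitions, which prevents leakage from adjacent
# observations. Not DashAI's actual implementation.
def temporal_split(n_rows, train_frac=0.7, val_frac=0.15, gap=0):
    train_end = int(n_rows * train_frac)
    val_end = train_end + gap + int(n_rows * val_frac)
    train = list(range(0, train_end))
    val = list(range(train_end + gap, val_end))
    test = list(range(val_end + gap, n_rows))
    return train, val, test
```

Unlike a random split, every training index precedes every validation index, and the gap keeps observations adjacent to the boundary out of both partitions.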
- Rename all sMAPE references to SMAPE - Fix imports in the metrics and forecasting __init__.py files - Update initial_components.py with the new name
- Add ExtendTimeSeriesConverter to extend time series datasets with future timestamps - Supports automatic frequency inference (daily, hourly, monthly, etc.) - Auto-detects datetime columns or allows manual specification - Handles edge cases: irregular intervals, duplicate timestamps, empty datasets - Implements safety limit (MAX_N_STEPS=100,000) to prevent memory issues - Includes comprehensive validation and descriptive error messages - Register converter in __init__.py and initial_components.py - Ready for use in forecasting prediction workflows
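The extension behaviour described in this commit can be approximated with pandas frequency inference. This is a sketch under assumed behaviour; `extend_time_series` is an illustrative name, not the converter's real API.

```python
import pandas as pd

def extend_time_series(df, n_steps, datetime_col="ds"):
    """Append n_steps future rows; unobserved columns become NaN."""
    freq = pd.infer_freq(df[datetime_col])  # e.g. 'D' for daily data
    last = df[datetime_col].iloc[-1]
    # date_range includes the start, so generate one extra and drop it
    future = pd.date_range(start=last, periods=n_steps + 1, freq=freq)[1:]
    extra = pd.DataFrame({datetime_col: future})
    # concat fills columns missing from `extra` with NaN
    return pd.concat([df, extra], ignore_index=True)
```

The real converter additionally handles irregular intervals, duplicates, and the MAX_N_STEPS safety limit mentioned above.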
…oint
Add the following lines to get_dataset_file() at line ~727:
# Use jsonable_encoder to handle Timestamp and other
# non-JSON-serializable types
row = jsonable_encoder(row)
This fixes TypeError: 'Object of type Timestamp is not JSON serializable'
that occurs when the /api/v1/dataset/file/ endpoint tries to return
datasets containing pandas.Timestamp objects (from datetime columns).
Why needed:
- Arrow stores timestamps as timestamp[ns] type
- PyArrow's .as_py() converts these to pandas.Timestamp objects
- pandas.Timestamp is not JSON serializable by Python's json module
- FastAPI's jsonable_encoder converts Timestamp to ISO 8601 strings
Impact:
- Only affects HTTP response serialization for frontend display
- Does not modify underlying Arrow data storage
- Aligns with existing behavior in /sample endpoints
- Enables proper visualization of time series datasets after using
converters like ExtendTimeSeriesConverter
- Implemented ForecastUncertainty explainer for analyzing prediction uncertainty in forecasting models. - Created base classes for global and local explainers specialized for forecasting tasks, providing common functionality for timestamp handling, exogenous variable management, and data preparation. - Developed an abstract ForecastingModel class to standardize the interface for all forecasting models, ensuring model-agnostic handling of time series data and exogenous variables. - Included methods for fitting models, generating predictions, and retrieving column names in their original format.
- Implement StatsmodelsARIMAModel wrapper for statsmodels ARIMA - Implement StatsmodelsSARIMAXModel wrapper for statsmodels SARIMAX - Both models support: - Exogenous variables - Temporal metadata from ForecastingTask - In-sample and out-of-sample predictions - Model save/load functionality - Add complete schemas with all model parameters - Export new models in forecasting __init__.py
- Implement SklearnMultiStepForecaster: sklearn-based multi-step forecasting model - Inherits from ForecastingModel (compatible with ForecastingTask) - Automatically creates lag features from time series - Supports multiple base estimators (linear, ridge, random_forest) - Two forecasting strategies: direct (separate model per horizon) and recursive - Handles exogenous variables - Full save/load functionality - Remove MultiOutputRegressionTask to avoid confusion - Eliminated from tasks/__init__.py - Removed from initial_components.py - Updated MultiOutputRegression to only be compatible with RegressionTask - Removed from metrics (MAPE, SMAPE, regression_metric) - Removed from model_job.py This change provides a cleaner, more intuitive approach to multi-step forecasting by using the ForecastingTask infrastructure instead of the confusing multi-output regression approach.
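The lag-feature creation and recursive strategy described above can be sketched as follows. This is an illustrative toy, not `SklearnMultiStepForecaster`'s real implementation.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

def make_lag_matrix(series, window):
    """Build rows [y_{t-window}, ..., y_{t-1}] with target y_t."""
    X = np.array([series[i:i + window] for i in range(len(series) - window)])
    y = np.array(series[window:], dtype=float)
    return X, y

series = list(range(20))  # toy arithmetic series: y_t = y_{t-1} + 1
X, y = make_lag_matrix(series, window=3)
model = LinearRegression().fit(X, y)

# Recursive strategy: feed each prediction back in as the newest lag.
# (The direct strategy instead trains a separate model per horizon step.)
history = [float(v) for v in series[-3:]]
preds = []
for _ in range(5):
    nxt = float(model.predict(np.asarray(history[-3:]).reshape(1, -1))[0])
    preds.append(nxt)
    history.append(nxt)
```

On this perfectly linear series the recursive forecast continues the pattern (20, 21, …); on real data, recursive forecasts accumulate error over the horizon, which is the trade-off against the direct strategy.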
The default trend='c' (constant) caused errors when d>0 (differencing is applied). Statsmodels doesn't allow constant trends with integration because they would be eliminated by the differencing operation. Changes: - Set default trend='n' (no trend) for both ARIMA and SARIMAX - Update schema descriptions to explain trend restrictions with integration - This prevents the ValueError: 'constant cannot be included in ARIMA(p,d,q) model when d>0' Users can still manually select 't' (linear trend) which is valid with d=1, or 'n' for no trend.
- Get temporal_metadata from task.get_temporal_metadata() after prepare_for_task - Pass temporal_metadata to model.fit() for ForecastingTask models - Fixes SklearnMultiStepForecaster requiring temporal_metadata parameter - Prophet, ARIMA, SARIMAX now receive metadata consistently
- Implement in-sample prediction mode for metrics calculation - Use 1-step ahead model for in-sample predictions (standard practice) - Handle NaN values in first window_size predictions due to lag creation - Improve base_forecasting_model.py documentation: * Emphasize MUST support both in-sample and out-of-sample modes * Add warning about NotImplementedError for in-sample predictions * Add _validate_predict_implementation() helper for testing - Fixes 'Metrics calculation failed' error during model evaluation
- Store complete training series (training_full_series) during fit() - In-sample predictions now use historical data instead of requiring target in input - Fixes 'Target column not found' error during metrics calculation - Matches Prophet/ARIMA behavior: only needs timestamps for prediction - Update save/load to persist full training history
- Add NaN filtering in prepare_to_metric() for forecasting models - Handles single-output and multi-output regression cases - Prevents 'Out of range float values are not JSON compliant' error - Essential for lag-based forecasting models that produce NaN in first window_size predictions
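The filtering can be sketched as follows; this is illustrative, and `prepare_to_metric()`'s real logic lives in DashAI.

```python
import numpy as np

def filter_nan_pairs(y_true, y_pred):
    """Drop positions where either value is NaN (e.g. the first
    window_size lag-based predictions) before computing metrics."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    mask = ~(np.isnan(y_true) | np.isnan(y_pred))
    return y_true[mask], y_pred[mask]
```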
- Prophet: Handle time series with missing dates (gaps) * Return NaN for missing timestamps instead of raising error * prepare_to_metric() filters NaN values before computing metrics * Prevents failures when data has irregular timestamps - Metrics: Add HIGHER_IS_BETTER attribute to BaseMetric * True for Accuracy, F1, Precision, Recall (maximize) * False for MAE, RMSE, MAPE, SMAPE (minimize) * Enables systematic optimization direction detection - OptunaOptimizer: Use metric.HIGHER_IS_BETTER instead of hardcoded list * Removes hardcoded metric names * Correctly detects optimization direction from metric class * Works with any metric, including custom ones - HyperOptOptimizer: Fix optimization direction * fmin always minimizes, so multiply by -1 for maximize metrics * Previously multiplied by 1 (incorrect for Accuracy/F1) * Now correctly handles both minimize and maximize metrics Fixes: - Prophet failing with 'Unable to obtain predictions for requested timestamps' - SMAPE optimization increasing instead of decreasing - Optimizer direction based on metric name instead of metric property
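The optimization-direction mechanism can be sketched as follows; the class and function names are illustrative, mirroring the commit message rather than DashAI's exact code.

```python
# Each metric class declares its own optimization direction instead of
# the optimizer keeping a hardcoded list of metric names.
class BaseMetric:
    HIGHER_IS_BETTER = False

class Accuracy(BaseMetric):
    HIGHER_IS_BETTER = True   # maximize

class SMAPE(BaseMetric):
    HIGHER_IS_BETTER = False  # minimize

def study_direction(metric_cls):
    """Direction an Optuna-style study should use for this metric."""
    return "maximize" if metric_cls.HIGHER_IS_BETTER else "minimize"

def hyperopt_loss(score, metric_cls):
    """hyperopt's fmin always minimizes, so negate maximize metrics."""
    return -score if metric_cls.HIGHER_IS_BETTER else score
```

This works for any metric, including custom ones, as long as it subclasses the base and sets the attribute.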
- Add forecast_periods optional parameter to prediction schema - Implement automatic timestamp generation in PredictJob when forecast_periods is provided - Add UI input field for forecast periods (ForecastingTask only) - Make dataset_id optional when using auto-generated timestamps - Generate future dates from last_training_date + frequency - Block auto-generation for models with exogenous variables - Update frontend components to pass forecast_periods through the flow - Maintain backward compatibility with existing dataset upload flow
- Add ForecastingTask with Prophet, ARIMA, SARIMAX, SklearnMultiStepForecaster - Add ForecastDecomposition and ForecastUncertainty explainers - Add prediction job support for forecasting tasks - Fix ruff linting issues
…tion steps - Added temporal information fetching for forecasting tasks in NewGlobalExplainerModal and NewLocalExplainerModal. - Integrated temporal info display in SelectDatasetStep, including frequency validation and mismatch alerts. - Updated PredictionModal to fetch temporal info for preselected and manually selected models. - Introduced ForecastingExplainerInfo component to display detailed temporal properties and explainer-specific information. - Enhanced user experience with loading indicators and success/error messages related to temporal frequency matching.
…l and row partitioning components
- Added methods for obtaining forecast uncertainty and components in various forecasting models (e.g., Prophet, ARIMA, SARIMAX, SklearnMultiStepForecaster). - Improved timestamp handling in forecasting tasks to accommodate numeric time-step indices. - Updated prediction status handling in the frontend to utilize a new utility function for better localization. - Refactored prediction modal and results table components to streamline status display and loading indicators. - Enhanced error handling and logging for dataset reading and model predictions.
…smodelsARIMAModel
Resolves all 17 merge conflicts by keeping forecasting changes while adopting upstream's direct import style and structural refactors. Key resolutions: - Migrated package-level __init__.py imports to direct imports in initial_components.py - Added forecasting models/metrics/job/explainers/task as direct imports - Fixed forecasting files to use direct BaseModel/BaseTask/BaseMetric imports - Kept forecasting-specific predict_job.py logic (if/else forecasting vs standard) - Retained numpy/pyarrow/math imports required by forecasting code - Adopted upstream's useMemo/state patterns in React components - SplitDatasetTemporal.jsx moved to correct modelSession/ directory - Added prophet, statsmodels, and pywebview to requirements.txt Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- datasets.py: quote 'sessionmaker' type annotation in temporal-info endpoint (TYPE_CHECKING guard — must be string literal on Python 3.14) - forecasting_job.py: use direct import for BaseOptimizer since back/optimizers/__init__.py was emptied by upstream - LiveMetricsChart.jsx: remove duplicate filteredMetrics declaration left over from merge conflict resolution
TYPE_CHECKING-only imports are not available at runtime in Python 3.10-3.13. Method parameter annotations in class bodies are evaluated eagerly in those versions (unlike Python 3.14 which uses deferred evaluation via PEP 649).
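The fix pattern looks like this (a sketch; the actual endpoint lives in datasets.py):

```python
from typing import TYPE_CHECKING

if TYPE_CHECKING:
    # Imported only for static type checkers; never executed at runtime,
    # so the name is unavailable for eager annotation evaluation.
    from sqlalchemy.orm import sessionmaker

def get_db(session_factory: "sessionmaker"):
    # Quoting the annotation keeps it a plain string, so Python
    # 3.10-3.13 never try to resolve the missing runtime name.
    return session_factory
```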
Add Native Time Series Forecasting Support
Summary
Adds native time series forecasting as a first-class task type in DashAI. This includes a new `ForecastingTask`, four model families (Prophet, ARIMA, SARIMAX, SklearnMultiStep), a dedicated `ForecastingJob` with causal splitting, three explainers (decomposition, feature importance, uncertainty), forecasting-specific metrics (MAPE, sMAPE), automatic temporal column detection on CSV upload, and all required frontend modals and panels to support the full workflow.
Type of Change
Changes (by file)
Backend — Task & Models
- `DashAI/back/tasks/forecasting_task.py`: new `ForecastingTask` class with timestamp/target/exogenous column validation and temporal metadata exposure.
- `DashAI/back/models/forecasting/base_forecasting_model.py`: abstract base class defining `fit`, `predict`, exogenous helpers, and frequency utilities.
- `DashAI/back/models/forecasting/prophet_model.py`: Prophet-based model with seasonality, holidays, and component decomposition.
- `DashAI/back/models/forecasting/statsmodels_arima_model.py`: ARIMA model with configurable (p, d, q) and trend options.
- `DashAI/back/models/forecasting/statsmodels_sarimax_model.py`: SARIMAX model extending ARIMA with seasonal (P, D, Q, m) components.
- `DashAI/back/models/forecasting/sklearn_multistep_forecaster.py`: scikit-learn-based model with lag features and direct/recursive strategies.
Backend — Job & Factory
- `DashAI/back/jobs/forecasting_job.py`: new job class with temporal splitting, `fit` orchestration, and Huey integration.
- `DashAI/back/job_queues/model_factory.py`: updated to handle forecasting evaluation edge cases (small splits, NaN/Inf metric sanitization).
Backend — Explainers
- `DashAI/back/explainability/forecasting/base_forecasting_explainer.py`: shared base for all forecasting explainers.
- `DashAI/back/explainability/forecasting/forecast_decomposition.py`: STL trend/seasonal/residual decomposition.
- `DashAI/back/explainability/forecasting/forecast_feature_importance.py`: permutation-based exogenous variable importance.
- `DashAI/back/explainability/forecasting/forecast_uncertainty.py`: prediction interval and confidence estimation.
- `DashAI/back/routers/explainers_router.py`: added name auto-deduplication logic.
Backend — Predict
- `DashAI/back/jobs/predict_job.py`: added forecasting-specific validation (`_validate_forecasting_dataset`), auto-generate mode, temporal metadata propagation, and a `sanitize_for_json()` utility.
Backend — Dataset / Converters
- `DashAI/back/routers/datasets_router.py`: new `GET /datasets/{id}/temporal-info` endpoint.
- `DashAI/back/converters/time_series_window_converter.py`: supervised lag-feature transformation.
- `DashAI/back/converters/extend_time_series_converter.py`: future-row appender for out-of-sample prediction.
Backend — Metrics
- `DashAI/back/metrics/mape.py`: MAPE metric with zero-value guard.
- `DashAI/back/metrics/smape.py`: sMAPE metric, bounded 0–200%.
Backend — Registration
- `DashAI/back/initial_components.py`: registered all new task, models, converters, metrics, explainers, and job.
- `requirements.txt`: added `prophet` and `statsmodels`.
Frontend
- `DashAI/front/src/components/splits/SplitDatasetTemporal.tsx`: temporal split panel with train/val/test sliders and gap parameter.
- `DashAI/front/src/components/predictions/ForecastingOptions.tsx`: prediction modal panel for auto-generate vs. dataset mode, with frequency validation.
- `DashAI/front/src/components/explainers/NewGlobalExplainerModal.tsx`: extended to fetch and display temporal metadata for forecasting explainers.
- `DashAI/front/src/components/explainers/ForecastingExplainerInfo.tsx`: new component showing model, horizon, and frequency context.
- `DashAI/front/src/components/runs/RunnerDialog.tsx`: passes `task_name` when enqueuing jobs.
Tests
- `DashAI/tests/back/models/test_forecasting_models.py`: 473-line suite covering all four model families.
- `DashAI/tests/back/tasks/test_forecasting_task.py`: task validation tests.
- `DashAI/tests/back/jobs/test_forecasting_job.py`: job splitting and training tests.
Testing (optional)
- Run `pytest DashAI/tests/back/models/test_forecasting_models.py` to verify all model families.
- Upload a CSV with date and value columns (`date`, `value`) to a `ForecastingTask` experiment and confirm automatic frequency detection.
- Train a `ProphetModel`, then run a prediction using auto-generate mode (e.g., 30 future periods) and verify the output in the UI.
- Apply the `ForecastDecomposition` explainer to a trained run and confirm trend/seasonal/residual components are displayed.
Notes (optional)
`prophet` and `statsmodels` are new dependencies; ensure they are installed before running forecasting experiments.
💡 Reviewer Guide: Key Architectural Adaptations for Forecasting
Forecasting introduces specific domain requirements that differ from classical ML (Regression/Classification). Here is a quick guide to the 6 key architectural adaptations implemented in this PR to integrate Forecasting seamlessly into the DashAI ecosystem:
1. Temporal Splitting — Strict Chronological Order
Files: `dashai_dataset.py`. Temporal splits preserve strict chronological order, with a `gap` parameter to prevent data leakage.
2. Frequency Detection and Metadata Propagation
Files: `forecasting_task.py`, `predict_job.py`. The frequency detected from the timestamp column is exposed as temporal metadata and propagated downstream to training and prediction.
3. Prediction Logic — Auto-generate vs. Exogenous Dataset
Files: `predict_job.py`. Prediction either auto-generates future timestamps (`forecast_periods`) or validates a user-provided CSV containing future values for exogenous variables.
4. Richer Interface — BaseForecastingModel
Files: `base_forecasting_model.py`. Forecasting models extend the standard `BaseModel` interface with a richer one that stores temporal metadata (frequency, last observed date) so that the downstream infrastructure (Explainers, PredictJob) can consume it uniformly.
5. Supervised Transformation — Windowing Converters
Files: `time_series_window_converter.py`. A windowing converter turns the series into a supervised format, enabling `sklearn` algorithms for multi-step forecasting without modifying their core logic.
6. Metric Sanitization — JSON Safety
Files: `mape.py`, `smape.py`. Edge cases can yield `inf` or `NaN` metric results, which invalidate JSON serialization for the frontend. Masks and semantic fallbacks were added to ensure robust API stability.
Overview
This PR introduces native time series forecasting capabilities to DashAI, enabling users to build, train, evaluate, and explain forecasting models directly within the platform. The implementation covers the full ML pipeline: data ingestion with automatic temporal column detection, task/job orchestration, four model families, specialized metrics, explainability tools, and a dedicated frontend experience.
1. Forecasting Task
A new `ForecastingTask` class defines the contract for time series problems. It expects one timestamp column (required), one target column (output), and optionally exogenous/regressor columns as additional inputs. The task validates column types at configuration time, provides informative error messages when columns are missing or mistyped, and exposes temporal metadata that flows downstream to training, prediction, and explainability components.
2. Forecasting Models
Four model families are available out of the box:
- `ProphetModel`
- `StatsmodelsARIMAModel`
- `StatsmodelsSARIMAXModel`
- `SklearnMultiStepForecaster`: `linear`, `ridge`, and `random_forest` base estimators with `direct` or `recursive` multi-step strategies

All models support both in-sample (validation) and out-of-sample (future) prediction modes, and all share a unified interface for handling exogenous variables.
3. Base Forecasting Model
`BaseForecastingModel` is the abstract base class that all forecasting models extend. It defines:
- `fit(X, y, temporal_metadata)` — trains the model while storing timestamp column, target column, and frequency info.
- `predict(X)` — dispatches to in-sample or out-of-sample logic depending on whether future timestamps are present.
- `get_exogenous_columns()` / `has_exogenous_variables()` — unified exogenous variable management across implementations.
- Frequency utilities (e.g., mapping codes such as `D`, `W`, `M` to integer periods).

This design ensures that any new forecasting model added to DashAI only needs to implement the core logic while inheriting consistent behaviour for metadata management, column tracking, and prediction dispatch.
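A stripped-down sketch of this contract, with signatures assumed from the description above (the real class has more functionality):

```python
from abc import ABC, abstractmethod

class BaseForecastingModel(ABC):
    """Minimal illustrative version of the abstract forecasting base."""

    def __init__(self):
        self.temporal_metadata = None

    @abstractmethod
    def fit(self, X, y, temporal_metadata):
        """Train; store timestamp column, target column, frequency."""

    @abstractmethod
    def predict(self, X):
        """Dispatch to in-sample or out-of-sample prediction."""

    def get_exogenous_columns(self):
        md = self.temporal_metadata or {}
        return md.get("exogenous_columns", [])

    def has_exogenous_variables(self):
        return len(self.get_exogenous_columns()) > 0
```

A concrete model only implements `fit`/`predict` and inherits the exogenous-variable helpers.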
4. Forecasting Job
`ForecastingJob` orchestrates the full training lifecycle for forecasting experiments: it applies chronological splits with a configurable `gap` between splits to prevent leakage, and passes `temporal_metadata` to `model.fit()`.
The model factory was also updated to handle forecasting-specific edge cases during evaluation: small splits that fail prediction are gracefully skipped, NaN/Inf metric values are sanitized to `null` before JSON serialization, and per-metric try/except blocks prevent a single failing metric from aborting the entire evaluation.
5. Explainers
Three new forecasting explainers provide interpretability at different levels:
- `ForecastDecomposition` — breaks down any forecast into trend, seasonal, and residual components using STL (Seasonal-Trend decomposition via Loess). Works across all four model families. Configurable `horizon` (1–365 periods) and an `include_historical` flag.
- `ForecastFeatureImportance` — permutation-based importance for exogenous variables. Measures the degradation in forecast quality (MAE, RMSE, or MAPE) when each external regressor is randomly shuffled. Useful for understanding which external signals actually help the model. Configurable `n_repeats` (1–50).
- `ForecastUncertainty` — quantifies prediction intervals and confidence around each forecast point, helping users assess how reliable a forecast is over the horizon.

All three share a `ForecastingGlobalExplainer` base class that handles timestamp detection, frequency inference, and dataset preparation, reducing duplication across implementations.
Additionally, the explainers API endpoint now auto-deduplicates names (e.g., `"Explainer"` → `"Explainer_1"`) to avoid conflicts.
6. Predict — Forecasting Particularities
Forecasting prediction is fundamentally different from classification/regression, so it required dedicated handling in the predict job:
Two prediction modes:
- Auto-generate: the user specifies a number of future periods (`forecast_periods`); DashAI automatically generates the future timestamp sequence using the detected frequency, so no prediction dataset is needed.
- Dataset: the user provides a dataset containing the future timestamps (and any exogenous variable values) to predict on.

Validation (`_validate_forecasting_dataset`): when a dataset is provided, the job validates it thoroughly — strict chronological ordering, no duplicate timestamps, consistent frequency, no backcasting (no predictions before the training window ends), and presence of all required exogenous columns without NaN values.
Temporal metadata propagation: the training run's temporal config (timestamp column, frequency, training date range) is passed to the prediction job so it can validate compatibility without re-reading the training data.
A new `sanitize_for_json()` utility converts any `NaN`/`Infinity` float values in prediction outputs to `null` before returning them via the API.
7. CSV Upload with Automatic Temporal Column Detection
When a user uploads a CSV for a `ForecastingTask`, DashAI automatically scans column types to detect the timestamp column. The detection logic checks for a column named `ds` first (Prophet convention), then falls back to auto-detecting datetime-typed columns.
A new API endpoint `GET /datasets/{dataset_id}/temporal-info?timestamp_column={col}` returns the detected `frequency_code`, `frequency_label`, `frequency_description`, `start_date`, `end_date`, `total_periods`, and any `detected_gaps`. The frontend uses this to populate the split configuration and to validate that a prediction dataset matches the training frequency before allowing the user to proceed.
Two new converters support the data transformation pipeline:
- `TimeSeriesWindowConverter`: transforms the series into a supervised format with configurable lag features (`lag_1` … `lag_w`) and a shifted target (`y_target_h`).
- `ExtendTimeSeriesConverter`: appends `n_steps` future rows with inferred timestamps and `NaN` for unobserved columns, used to create the prediction input for out-of-sample forecasting.
8. Forecasting Metrics
MAPE (Mean Absolute Percentage Error)
Scale-independent and returns a percentage (lower is better). Zero values in `y_true` are masked with a `1e-8` threshold to avoid division by zero. Compatible with both `ForecastingTask` and `RegressionTask`.
sMAPE (Symmetric Mean Absolute Percentage Error)
Bounded between 0–200%, more numerically stable than MAPE, and symmetric with respect to over/under-forecasting. Also handles zero values gracefully.
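Both metrics can be sketched in a few lines of NumPy, assuming the `1e-8` zero guard described above (the real classes also integrate with DashAI's metric registry):

```python
import numpy as np

def mape(y_true, y_pred, eps=1e-8):
    """Mean Absolute Percentage Error, in percent (lower is better)."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    denom = np.where(np.abs(y_true) < eps, eps, np.abs(y_true))
    return float(np.mean(np.abs((y_pred - y_true) / denom)) * 100.0)

def smape(y_true, y_pred, eps=1e-8):
    """Symmetric MAPE, bounded to [0, 200] percent."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    denom = (np.abs(y_true) + np.abs(y_pred)) / 2.0
    denom = np.where(denom < eps, eps, denom)  # zero-value guard
    return float(np.mean(np.abs(y_pred - y_true) / denom) * 100.0)
```

The symmetric denominator is what bounds sMAPE: even when prediction and truth have opposite signs, each term is at most 200%.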
Both metrics are registered in `initial_components.py` and are selectable from the UI when configuring a forecasting experiment.
9. Frontend Changes
Temporal Dataset Splitting (`SplitDatasetTemporal`)
A new split configuration panel replaces the standard random split for `ForecastingTask`. It provides train/validation/test percentage sliders, a gap parameter (rows to skip between splits to prevent leakage), floating-point tolerance validation, and auto-scaled minimum sizes based on dataset length. Validation errors are shown inline with clear explanations.
Prediction Modal — Forecasting Options (`ForecastingOptions`)
When running a prediction for a forecasting model, the prediction modal opens a new `ForecastingOptions` panel that lets the user choose between the two prediction modes (auto-generate vs. dataset). It displays a summary of the training data's temporal context (frequency, date range, total periods) and — when a dataset is selected — fetches its temporal info and shows a frequency match/mismatch indicator with visual feedback before allowing the user to submit.
New Explainer Modal
The `NewGlobalExplainerModal` was extended to fetch and display temporal metadata for forecasting models. A new `ForecastingExplainerInfo` component shows the model name, configured horizon, and frequency context inline within the `ConfigureExplainerStep`, giving users clarity on what the explainer will operate over.
Runner Dialog
The `RunnerDialog` now passes `task_name` when enqueuing jobs, enabling the backend to route to the correct job class (including `ForecastingJob`).
10. Additional Changes
- `prophet` and `statsmodels` added to `requirements.txt`.
- `initial_components.py`: all new task, models, converters, metrics, explainers, and job are registered.