Reproducible research code accompanying the paper:
[Paper title] [Authors] [Journal, Year]
This repository implements an Ensemble Kalman Filter (EnKF) for dual state and parameter estimation in Chinese Hamster Ovary (CHO) cell culture bioprocesses, with cross-condition knowledge transfer.
The EnKF is applied to six CHO cell culture datasets covering two cell lines (T127 and GS46), multiple bioreactor scales, and different feeding strategies. The workflow:
- Integrates the bioreactor volume ODE
- Runs a nominal forward simulation with literature parameters (Kotidis et al. 2019)
- Tunes ensemble size by sweeping RMSE across sizes [10, 25, 50, 75]
- Performs long-term state prediction using the best ensemble size
- Tests robustness with an irregular 48/72 h measurement schedule
- Quantifies prior width sensitivity and ±20% mean parameter sensitivity
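The core of the workflow above is the EnKF analysis step. The following is a minimal sketch of one stochastic (perturbed-observation) update on an augmented vector of states and parameters. It is not taken from `cho_enkf/enkf.py`: the function name `enkf_update`, the augmented-vector layout, the linear observation matrix `H`, and the toy numbers are all assumptions made for illustration.

```python
import numpy as np


def enkf_update(ensemble, y_obs, H, R, rng):
    """One stochastic EnKF analysis step (illustrative sketch).

    ensemble : (n_members, n_aug) array, each row = [states, parameters]
    y_obs    : (n_obs,) measurement vector
    H        : (n_obs, n_aug) linear observation matrix (assumed linear here)
    R        : (n_obs, n_obs) observation noise covariance
    """
    n_members = ensemble.shape[0]
    anomalies = ensemble - ensemble.mean(axis=0)
    # Sample covariance of the augmented vector
    P = anomalies.T @ anomalies / (n_members - 1)
    # Kalman gain K = P H^T (H P H^T + R)^-1, computed via a linear solve
    S = H @ P @ H.T + R
    K = np.linalg.solve(S, H @ P).T
    # Each member assimilates its own noisy copy of the measurement
    noise = rng.multivariate_normal(np.zeros(len(y_obs)), R, size=n_members)
    innovations = (y_obs + noise) - ensemble @ H.T
    return ensemble + innovations @ K.T


# Toy example: one state (observed) and one parameter (unobserved)
rng = np.random.default_rng(0)
prior = rng.normal(loc=[1.0, 0.5], scale=[0.5, 0.1], size=(200, 2))
H = np.array([[1.0, 0.0]])
R = np.array([[0.01]])
posterior = enkf_update(prior, np.array([2.0]), H, R, rng)
```

In this setup the unobserved parameter is corrected only through its sample covariance with the observed state, which is the mechanism that makes dual estimation possible with an augmented vector.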
BiotechBioeng/
│
├── cho_enkf/ # Python package
│ ├── config.py # All constants: RUN_NAME, paths, parameters, noise
│ ├── data_loader.py # Load Excel datasets
│ ├── model.py # Volume integration, kinetic model (ODE step)
│ ├── enkf.py # EnKF classes + all runner functions
│ ├── analysis.py # R², convergence tables, correlation matrix
│ ├── plotting.py # All publication-quality figure functions
│ └── io_utils.py # Pickle save/load, path construction
│
├── scripts/ # Numbered execution pipeline
│ ├── 01_ensemble_tuning.py # Load data → run EnKF sweep → save pkl
│ ├── 02_longterm_pred.py # Long-term forecasting with best ensemble size
│ ├── 03_irregular.py # EnKF with 48/72 h irregular measurements
│ ├── 04_sensitivity.py # Prior width + ±% parameter sensitivity
│ └── 05_comparisons.py # EnKF vs reparametrised model comparison
│
├── data/ # CHO experimental datasets (Excel)
│ ├── CHO_T127_flask_PMJ.xlsx
│ ├── CHO_T127_SNS_36.5.xlsx
│ ├── CHO_T127_SNS_32.xlsx
│ ├── CHO_GS46_F_C_Inv.xlsx
│ ├── CHO_GS46_F_all.xlsx
│ └── CHO_GS46_F_all_pl40.xlsx
│
├── results/ # Generated outputs (gitignored)
│ └── {RUN_NAME}/
│ ├── run_notes.txt # Auto-created on first run — fill in manually
│ ├── 01_ensemble_tuning/
│ │ ├── pkl/
│ │ └── figures/
│ ├── 02_longterm_pred/
│ │ ├── pkl/
│ │ └── figures/
│ ├── 03_irregular/
│ │ ├── pkl/
│ │ └── figures/
│ ├── 04_sensitivity/
│ │ ├── pkl/
│ │ └── figures/
│ └── 05_comparisons/
│ ├── pkl/
│ └── figures/
│
├── original.ipynb # Original monolithic notebook (reference only)
├── pyproject.toml
├── poetry.lock
└── README.md
# Install Poetry (if not already installed)
pip install poetry
# Create virtual environment and install all dependencies
poetry install
# Activate the environment
poetry shell
# OR: source .venv/Scripts/activate (Windows)
# source .venv/bin/activate (Linux/macOS)
Edit cho_enkf/config.py:
RUN_NAME = "run_v1" # all outputs go to results/run_v1/
Change this string to version a new experiment. Old results are preserved.
Each script saves results to its own subfolder. Set LOAD_FROM_PKL = True (default) to skip recomputation and regenerate figures only.
# Step 1 (~2–4 h): ensemble tuning
poetry run python scripts/01_ensemble_tuning.py
# Step 2 (~30–60 min): long-term prediction with best ensemble size
poetry run python scripts/02_longterm_pred.py
# Step 3 (~30–60 min): irregular measurement schedule
poetry run python scripts/03_irregular.py
# Step 4 (~2–4 h): sensitivity analyses (prior width + ±% params)
poetry run python scripts/04_sensitivity.py
# Step 5 (< 5 min): EnKF vs reparametrised model comparison
poetry run python scripts/05_comparisons.py
Tip: Each script has LOAD_FROM_PKL = True at the top. With this set, scripts load saved pkl files and skip all computation, which is useful for regenerating figures without re-running the EnKF.
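The LOAD_FROM_PKL behaviour can be pictured as a load-or-compute pattern. The sketch below is an assumption about how such a switch might work, not the code in the scripts: `load_or_compute` is a hypothetical helper, and falling back to computation when no pkl file exists is a design choice made here for illustration.

```python
import pickle
from pathlib import Path


def load_or_compute(pkl_path, compute, load_from_pkl=True):
    """Return cached results if allowed and present; otherwise compute and save."""
    pkl_path = Path(pkl_path)
    if load_from_pkl and pkl_path.exists():
        with pkl_path.open("rb") as fh:
            return pickle.load(fh)
    # No usable cache: run the expensive step and store its result
    result = compute()
    pkl_path.parent.mkdir(parents=True, exist_ok=True)
    with pkl_path.open("wb") as fh:
        pickle.dump(result, fh)
    return result
```

A script would pass its expensive EnKF run as `compute` and then build figures from the returned object, so the plotting code is identical whether the results were freshly computed or loaded.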
results/{RUN_NAME}/
├── run_notes.txt # Fill in to document what changed in this run
├── 01_ensemble_tuning/pkl|figures/
├── 02_longterm_pred/pkl|figures/
├── 03_irregular/pkl|figures/
├── 04_sensitivity/pkl|figures/
└── 05_comparisons/pkl|figures/
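The layout above follows a fixed pattern, results/{RUN_NAME}/{step}/pkl and results/{RUN_NAME}/{step}/figures. A small sketch of how those paths could be derived is shown here; `step_dirs` is a hypothetical name and is not claimed to match the helpers in `cho_enkf/io_utils.py`.

```python
from pathlib import Path

RUN_NAME = "run_v1"  # mirrors the example value shown for cho_enkf/config.py


def step_dirs(step, run_name=RUN_NAME, root="results"):
    """Return the (pkl, figures) directories for one pipeline step."""
    base = Path(root) / run_name / step
    return base / "pkl", base / "figures"


pkl_dir, fig_dir = step_dirs("01_ensemble_tuning")
```

Because the run name appears only once in each path, changing RUN_NAME redirects every step to a fresh folder and leaves earlier runs untouched.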
| Dataset | Cell Line | Condition |
|---|---|---|
| CHO_T127_flask_PMJ | T127 (Cell Line A) | Shake flask, 36.5°C |
| CHO_T127_SNS_36.5 | T127 (Cell Line A) | Bioreactor, 36.5°C |
| CHO_T127_SNS_32 | T127 (Cell Line A) | Bioreactor, 32°C |
| CHO_GS46_F_C_Inv | GS46 (Cell Line B) | Feed C |
| CHO_GS46_F_all | GS46 (Cell Line B) | Feed U |
| CHO_GS46_F_all_pl40 | GS46 (Cell Line B) | Feed U +40% |
Each Excel file contains three sheets: schedule (feed schedule), feed (feed concentrations), exp_meas (measured state variables ± std).
| Symbol | Description | Unit |
|---|---|---|
| Xv | Viable cell density | cell L⁻¹ |
| mAb | Monoclonal antibody titre | mg L⁻¹ |
| Glc | Glucose | mM |
| Amm | Ammonia | mM |
| Gln | Glutamine | mM |
| Lac | Lactate | mM |
| Glu | Glutamate | mM |
| Asn | Asparagine | mM |
| Variable | Description |
|---|---|
| RUN_NAME | Experiment version tag; controls output folder |
| TUNING_ENSEMBLE_SIZES | Sizes swept in Step 1 (default [10, 25, 50, 75]) |
| BEST_ENSEMBLE_SIZES | Best size per dataset (set after reviewing Step 1 results) |
| MEAN_PARAMETERS | Nominal model parameters from Kotidis et al. 2019 |
| PARAMETERS_ENSEMBLE_COVARIANCE | Prior width for each parameter |
| DATASET_NOISE_VARIANCES | Process (Q) and observation (R) noise per dataset |
| KQ_DICT / KR_DICT | Scaling factors for Q and R matrices |
| PRIOR_WIDTH_SCALES | Scales tested in sensitivity analysis |
| PARAM_SENS_PERTURBATIONS | Fractions for ±% sensitivity (default [0.10, 0.20, 0.30]) |
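To illustrate how the PARAM_SENS_PERTURBATIONS fractions could translate into perturbed parameter sets, here is a hedged sketch. It assumes that all nominal parameters are scaled together by (1 ± fraction); the repository may instead perturb parameters individually. The function name, the parameter names `mu_max` and `K_glc`, and their values are placeholders, not values from Kotidis et al.

```python
def perturbed_parameter_sets(mean_parameters, fractions):
    """Map each signed fraction to a copy of the parameters scaled by (1 + fraction)."""
    sets = {}
    for frac in fractions:
        for sign in (-1.0, 1.0):
            delta = sign * frac
            sets[delta] = {
                name: value * (1.0 + delta)
                for name, value in mean_parameters.items()
            }
    return sets


# Placeholder parameters for illustration only
nominal = {"mu_max": 0.05, "K_glc": 1.0}
param_sets = perturbed_parameter_sets(nominal, [0.10, 0.20, 0.30])
```

Each entry of `param_sets` would then serve as the mean parameter vector for one sensitivity run, with the resulting RMSE compared against the unperturbed case.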
- run_enkf_with_tuning(...): sweep ensemble sizes
- enkf_long_pred_best_ensemble_size(...): long-term prediction
- run_pipeline_irregular_48_72(...): irregular measurement schedule
- run_enkf_with_mean_params(...): parameter sensitivity runs

- compute_r2_table(...): R² for all datasets
- compute_overall_convergence_table(...): parameter convergence %
- get_posterior_param_matrix(...): posterior ensemble for correlation

- plot_rmse_variance_and_computation_time_all(...): tuning figure
- overlay_T127_subplots_with_errorbars(...): T127 comparison
- overlay_gs46_subplots_with_errorbars(...): GS46 comparison
- plot_longterm_pred_ensemble_simulation_errorbar(...): long-term pred
- plot_parameter_comparison_across_datasets(...): cross-dataset params
- plot_posterior_param_correlation(...): correlation heatmap
- plot_prior_width_sensitivity_rmse(...): prior width RMSE bar
- plot_param_sensitivity_comparison(...): ±20% sensitivity
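The tuning and analysis functions listed above report RMSE and R². For reference, the two metrics can be sketched as below using their standard definitions. The function names `rmse` and `r_squared` are illustrative and do not reflect the signatures used in `cho_enkf/analysis.py`.

```python
import numpy as np


def rmse(measured, predicted):
    """Root-mean-square error between measurements and predictions."""
    measured = np.asarray(measured, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    return float(np.sqrt(np.mean((measured - predicted) ** 2)))


def r_squared(measured, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    measured = np.asarray(measured, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    ss_res = np.sum((measured - predicted) ** 2)
    ss_tot = np.sum((measured - measured.mean()) ** 2)
    return float(1.0 - ss_res / ss_tot)
```

A prediction equal to the mean of the measurements gives an R² of 0, and a perfect prediction gives 1, which is a quick sanity check when reading the per-dataset tables.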
| Package | Purpose |
|---|---|
| numpy, scipy | Numerical computation, ODE integration |
| pandas | Data loading and tabular analysis |
| matplotlib, seaborn | Plotting |
| openpyxl | Reading Excel files |
| tqdm | Progress bars |
| jupyter, ipykernel | Interactive exploration (optional) |
The nominal model parameters are taken from Kotidis et al. (2019) for CHO-T127 shake flask cultures. The model describes growth, death, and metabolite dynamics via Monod-type kinetics with ammonia and lactate inhibition, together with a full yield-based metabolic network.
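As a rough illustration of Monod-type kinetics with inhibition, the sketch below uses a generic textbook form: one Monod limitation term for glucose multiplied by non-competitive inhibition terms for ammonia and lactate. This is an assumed form, not the rate expression from Kotidis et al. (2019) or from `cho_enkf/model.py`, and all names and numbers are placeholders.

```python
def specific_growth_rate(glc, amm, lac, mu_max, k_glc, ki_amm, ki_lac):
    """Generic Monod growth rate with ammonia and lactate inhibition (illustrative)."""
    # Substrate limitation: approaches 1 when glucose is abundant
    limitation = glc / (k_glc + glc)
    # Inhibition: each factor falls from 1 toward 0 as the by-product accumulates
    inhibition = (ki_amm / (ki_amm + amm)) * (ki_lac / (ki_lac + lac))
    return mu_max * limitation * inhibition


# Placeholder values: glucose at its half-saturation constant, no inhibitors
mu = specific_growth_rate(glc=5.0, amm=0.0, lac=0.0,
                          mu_max=0.04, k_glc=5.0, ki_amm=10.0, ki_lac=50.0)
```

With glucose equal to `k_glc` and no inhibitors present, the rate is half of `mu_max`, which is the defining property of the half-saturation constant.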
If you use this code, please cite:
@article{[key],
title = {[Title]},
author = {[Authors]},
journal = {[Journal]},
year = {[Year]},
doi = {[DOI]}
}
[License]