Integrating Nowcasts into an Ensemble of Data-Driven Forecasting Models for SARI Hospitalizations in Germany
Daniel Wolffram, Johannes Bracher
- code/: Python project (primary codebase)
  - src/: reusable Python modules
  - *.ipynb: Jupyter notebooks for tuning, training, evaluation, and plotting
  - pyproject.toml, uv.lock: Python environment
- r/: R project (separate renv environment)
  - hhh4/, tscount/, persistence/: model-specific scripts
  - nowcasting/: nowcast computation
  - illustrations/: visualizations
  - renv.lock, .Rprofile: R environment
- data/: input datasets
- figures/: generated plots
- forecasts/: generated forecasts
- nowcasts/: generated nowcasts
- results/: generated results
- scores/: evaluation metrics
- tuning/: hyperparameter tuning results
Python code lives in code/. R code lives in r/, with its own environment. Shared inputs and outputs (data/, forecasts/, nowcasts/, figures/, results/) live at the repo root and are accessible from both Python and R.
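On the Python side, the same "anchor everything at the repository root" idea can be sketched as follows. This is a minimal illustration, not code from the repository: `find_repo_root` and `shared_dir` are hypothetical helpers, and using `run_pipeline.py` as the marker file is an assumption based on that script being run from the repository root.

```python
from pathlib import Path


# Hypothetical helper (not part of the repository): walk upward from a starting
# directory until a marker file is found, then treat that directory as the root.
def find_repo_root(start: Path, marker: str = "run_pipeline.py") -> Path:
    start = start.resolve()
    for candidate in (start, *start.parents):
        if (candidate / marker).exists():
            return candidate
    raise FileNotFoundError(f"No directory containing {marker!r} at or above {start}")


def shared_dir(root: Path, name: str) -> Path:
    """Return the path of a shared top-level folder such as data/ or forecasts/."""
    return root / name
```

A notebook in code/ could then call `find_repo_root(Path.cwd())` once and build all input and output paths from the result, regardless of the directory it was launched from.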
The project uses uv to manage the Python environment.
Install uv on your system as follows:
- Linux / macOS:

    curl -LsSf https://astral.sh/uv/install.sh | sh

- Windows:

    irm https://astral.sh/uv/install.ps1 | iex

(In case of problems, please refer to the official installation guide.)
Once uv is installed, set up the environment from the repository root:
    uv sync

This will create a local .venv/ and install all dependencies specified in pyproject.toml and uv.lock. It will also automatically install the required Python version if it is not already available on your system.
To run the notebooks with this environment, you must first register it as a Jupyter kernel:
    uv run -m ipykernel install --user --name=replication-sari

For interactive use, you can start JupyterLab inside the managed environment:
    uv run jupyter lab

This provides a browser-based interface, useful if you don't have a preferred IDE installed. After launching, select the kernel replication-sari when opening notebooks.
To ensure reproducibility, please use R 4.5.1. Dependencies are managed with renv. From the r/ folder, restore the environment with:
    R -e "install.packages('renv'); renv::restore()"

This will restore all R package dependencies as specified in renv.lock.
Note: Unlike uv, renv does not install R itself; you must install R 4.5.1 manually.
Note: The repository includes .Rprofile files (at both the root and in r/) that automatically activate the correct renv environment and anchor the here package to the repository root. This ensures that paths like here("data", ...) always work consistently, whether you open the whole repo or just the R subproject.
The repository contains a helper script run_pipeline.py that orchestrates the execution of all notebooks and R scripts in a defined order. This ensures reproducibility of results and allows running the full pipeline or just selected parts of it.
(If preferred, you can also open and run the individual notebooks or R scripts manually.)
The pipeline runs through the following stages:
- exploration: Exploratory data analysis and visualization.
  - plot_sari.ipynb: visualize SARI data
  - plot_ari.ipynb: visualize ARI data
  - plot_delays.ipynb: analyze reporting delays
  - autocorrelation.ipynb: investigate correlation structure of time series
- nowcasts: Real-time estimation of current case counts.
  - nowcasting/compute_nowcasts.R
- tuning: Hyperparameter tuning for machine learning models (⚠️ may take several days).
  - tuning_lightgbm.ipynb
  - tuning_tsmixer.ipynb
- forecasts: Generate forecasts with different model variants.
  - baseline_historical.ipynb: historical baseline model
  - compute_forecasts.ipynb: compute ML-based forecasts
  - persistence/persistence.R: persistence baseline
  - hhh4/hhh4_default.R, hhh4/hhh4_exclude_covid.R, hhh4/hhh4_naive.R, hhh4/hhh4_oracle.R, hhh4/hhh4_shuffle.R, hhh4/hhh4_skip.R, hhh4/hhh4_vincentization.R: hhh4 model variants
  - tscount/tscount_extended.R, tscount/tscount_simple.R: tscount models
- ensemble: Combine forecasts into an ensemble.
  - compute_ensemble.R
- scores: Compute forecast evaluation scores.
  - compute_scores.ipynb
- evaluation: Final visualization and evaluation of forecasts.
  - plot_nowcasts.ipynb
  - plot_forecasts.ipynb
  - evaluation.ipynb
  - evaluation_quantiles.ipynb
  - diebold_mariano.ipynb
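To make the idea of a mixed notebook and R pipeline concrete, here is a minimal sketch of how a stage could be turned into shell commands. It is not the actual content of run_pipeline.py: the `STAGES` mapping, the file locations under code/ and r/, and the choice of `jupyter nbconvert` as the notebook runner are all assumptions made for illustration.

```python
import subprocess
from pathlib import Path

# Illustrative subset of the stages listed above. The directory prefixes
# (code/, r/) are assumptions, not confirmed file locations.
STAGES = {
    "nowcasts": ["r/nowcasting/compute_nowcasts.R"],
    "scores": ["code/compute_scores.ipynb"],
}


def build_command(script: str) -> list[str]:
    """Choose a runner based on the file extension."""
    suffix = Path(script).suffix
    if suffix == ".ipynb":
        # Execute the notebook and write the outputs back into the same file.
        return ["jupyter", "nbconvert", "--to", "notebook",
                "--execute", "--inplace", script]
    if suffix == ".R":
        return ["Rscript", script]
    raise ValueError(f"Unsupported script type: {script}")


def run_stage(name: str, dry_run: bool = True) -> list[list[str]]:
    """Build the commands for one stage; execute them unless dry_run is set."""
    commands = [build_command(script) for script in STAGES[name]]
    if not dry_run:
        for command in commands:
            # check=True stops the stage at the first failing script.
            subprocess.run(command, check=True)
    return commands
```

With `dry_run=True` the function only returns the commands, which is a convenient way to inspect what a stage would execute before committing to a long run.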
The pipeline can be executed with different options from the repository root.
(We use uv run instead of python to ensure the script is executed inside the correct environment managed by uv.)
- Run the entire pipeline:

    uv run run_pipeline.py

- Run a single stage:

    uv run run_pipeline.py --stage evaluation

- Run a contiguous range of stages:

    uv run run_pipeline.py --start forecasts --end scores

- Run everything except selected stages:

    uv run run_pipeline.py --skip tuning
Note: The tuning stage can take a very long time (several days). If you do not want to run it, use --skip tuning.
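The selection options can be thought of as filters over an ordered list of stages. The sketch below shows one way such logic could look. It assumes the seven stage names and their order from this README, and `select_stages` and `parse_args` are hypothetical names. How run_pipeline.py actually parses its arguments, including whether --skip accepts several stages at once, is not confirmed here.

```python
import argparse

# Stage names in the order given in the pipeline description.
STAGE_ORDER = ["exploration", "nowcasts", "tuning", "forecasts",
               "ensemble", "scores", "evaluation"]


def select_stages(stage=None, start=None, end=None, skip=()):
    """Return the stages to run, preserving pipeline order."""
    if stage is not None:
        selected = [stage]
    else:
        first = STAGE_ORDER.index(start) if start else 0
        last = STAGE_ORDER.index(end) if end else len(STAGE_ORDER) - 1
        if first > last:
            raise ValueError("--start must not come after --end")
        selected = STAGE_ORDER[first:last + 1]
    return [name for name in selected if name not in skip]


def parse_args(argv=None):
    parser = argparse.ArgumentParser(description="Illustrative stage selection")
    parser.add_argument("--stage", choices=STAGE_ORDER)
    parser.add_argument("--start", choices=STAGE_ORDER)
    parser.add_argument("--end", choices=STAGE_ORDER)
    # Accepting several names after --skip is an assumption of this sketch.
    parser.add_argument("--skip", nargs="+", choices=STAGE_ORDER, default=[])
    return parser.parse_args(argv)
```

For example, `select_stages(start="forecasts", end="scores")` returns `["forecasts", "ensemble", "scores"]`, and `select_stages(skip=["tuning"])` returns every stage except tuning.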
When running the pipeline, make sure that the Rscript command points to the correct R version (4.5.1).
On some systems, the default Rscript may refer to an older version of R.
You can check this with:
Rscript --version
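The same check can be automated before a long run. The following is a minimal sketch, not part of the repository: `parse_r_version` and `detect_rscript_version` are hypothetical helpers, and the regular expression assumes the version banner contains the word "version" followed by an x.y.z number.

```python
import re
import shutil
import subprocess

REQUIRED_R = "4.5.1"  # version named in this README


def parse_r_version(text: str):
    """Extract an 'x.y.z' version number from `Rscript --version` output."""
    match = re.search(r"version\s+(\d+\.\d+\.\d+)", text)
    return match.group(1) if match else None


def detect_rscript_version():
    """Return the version of the Rscript found on PATH, or None if unavailable."""
    if shutil.which("Rscript") is None:
        return None
    result = subprocess.run(["Rscript", "--version"],
                            capture_output=True, text=True)
    # Some R releases print the version banner to stderr, so check both streams.
    return parse_r_version(result.stdout + result.stderr)


if __name__ == "__main__":
    found = detect_rscript_version()
    if found == REQUIRED_R:
        print(f"Rscript reports R {found}, as required.")
    else:
        print(f"Expected R {REQUIRED_R}, but Rscript reports: {found}")
```

If the reported version differs, adjusting PATH so that the R 4.5.1 installation comes first is usually the simplest fix.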