feat(benchmark): add run_benchmark() with dataframe-first pipeline, metrics, and tests#3892

Open
anshul23102 wants to merge 13 commits into PecanProject:develop from anshul23102:feat/benchmark-runner
Conversation

@anshul23102
Contributor

@anshul23102 anshul23102 commented Mar 24, 2026

Description

Adds a new run_benchmark() function as a database-free entry point for the
benchmark module. Unlike calc_benchmark(), it requires no BETYdb
connection and takes validated dataframes directly as input.
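A minimal quickstart sketch of the new entry point, using the bundled sample data. The package name `PEcAn.benchmark` and the `metrics`/`plot` fields of the returned list are assumptions for illustration, not confirmed by this PR:

```r
# Package name and return-value fields are assumed here
library(PEcAn.benchmark)

model_df <- read.csv(system.file("testdata", "sample_model.csv",
                                 package = "PEcAn.benchmark"))
obs_df   <- read.csv(system.file("testdata", "sample_obs.csv",
                                 package = "PEcAn.benchmark"))

# No database connection needed: dataframes in, results out
result <- run_benchmark(model_df, obs_df)
result$metrics  # assumed: a summary of model-vs-observation agreement
result$plot     # assumed: a time-series overlay plot
```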

Files added:

  • R/run_benchmark.R — dataframe-first pipeline with bm_validate(),
    align_by_time(), compute_metrics(), and plot_time_series()
  • inst/testdata/sample_model.csv — sample model output
  • inst/testdata/sample_obs.csv — sample observations
  • tests/testthat/test-run_benchmark.R — unit tests for all four pipeline stages
  • README.md — updated with quickstart example

Motivation and Context

The existing calc_benchmark() requires a full database connection, which
makes it hard to test and to use standalone. This PR adds a lightweight,
dataframe-first entry point, run_benchmark(model_df, obs_df), for users
who want to quickly benchmark model output against observations without any
database setup. The pipeline follows a four-stage design: validate → align →
compute metrics → plot.
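The four-stage design above can be sketched as an orchestrator that chains the helper functions named in this PR (bm_validate(), align_by_time(), compute_metrics(), plot_time_series()). The exact signatures and the shape of the returned list are assumptions; only the stage names come from the PR:

```r
# Sketch only: helper signatures and return structure are assumed
run_benchmark <- function(model_df, obs_df) {
  # 1. Validate: check required columns and types in each input
  model_df <- bm_validate(model_df)
  obs_df   <- bm_validate(obs_df)

  # 2. Align: pair model and observation rows on a shared timestamp
  aligned <- align_by_time(model_df, obs_df)

  # 3. Compute metrics: summarize agreement between the paired series
  metrics <- compute_metrics(aligned)

  # 4. Plot: overlay model and observations through time
  p <- plot_time_series(aligned)

  list(aligned = aligned, metrics = metrics, plot = p)
}
```

Keeping each stage as a separate exported function lets tests target one stage at a time, which is how tests/testthat/test-run_benchmark.R is described as being organized.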

Review Time Estimate

  • Immediately
  • Within one week
  • When possible

Types of changes

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to change)

Checklist:

  • My change requires a change to the documentation.
  • My name is in the list of CITATION.cff
  • I agree that PEcAn Project may distribute my contribution under any or all of
    • the same license as the existing code,
    • and/or the BSD 3-clause license.
  • I have updated the CHANGELOG.md.
  • I have updated the documentation accordingly.
  • I have read the CONTRIBUTING document.
  • I have added tests to cover my changes.
  • All new and existing tests passed.

@anshul23102 anshul23102 changed the title feat(benchmark): add run_benchmark MVP with IO, alignment, metrics, and tests feat(benchmark): add run_benchmark() with dataframe-first pipeline, metrics, and tests Apr 1, 2026
@anshul23102
Contributor Author

Hi @dlebauer, just a quick update on PR #3892: all CI checks are now passing, and I've refactored the API to be dataframe-first (run_benchmark(model_df, obs_df)), which aligns directly with the GSoC proposal I submitted.
The pipeline now has four clean stages (validate, align, compute metrics, plot), each usable independently or through the orchestrator. Would love your feedback whenever you get a chance!
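For instance, the stages can be run independently when only the numbers are needed and no plot is wanted. This is a hypothetical usage sketch; the helper signatures are assumed from the function names in the PR description:

```r
# Assumed signatures: align_by_time(model_df, obs_df), compute_metrics(aligned)
aligned <- align_by_time(bm_validate(model_df), bm_validate(obs_df))
metrics <- compute_metrics(aligned)  # skip plot_time_series() entirely
```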
