Fix pandas 3.0 / xarray 2025+ compat across groupby, _xarray, run, netcdf #321
Open
benmsanderson wants to merge 5 commits into
…itional indexing)

pandas 3.0 introduced two changes that scmdata 0.18 trips on for any multi-scenario ScmRun:

1. Default StringDtype inference. String columns now come back as pd.StringDtype rather than object. RunGroupBy.__init__ called numpy.issubdtype(col.dtype, numpy.number) to detect numeric meta columns; on StringDtype this raises 'TypeError: Cannot interpret <StringDtype(...)> as a data type'. Route the check through pd.api.types.is_numeric_dtype instead, which returns False for StringDtype and True for numeric dtypes.
2. Removal of Series positional integer indexing. _xarray._many_to_one ended with checker.groupby(col2).count().max()[0]. max() on a DataFrame returns a label-indexed Series and pandas 3.0 removed positional integer indexing on those, so [0] raises 'KeyError: 0'. Use .iloc[0]: same semantics, explicit positional.

Both calls are exercised by every multi-scenario ScmRun. The second in particular blocks ScmRun.to_nc entirely on pandas 3.0, so any downstream that streams scenarios to disk (e.g. openscm-runner's NetCDFChunkWriter) currently cannot run.

The fixes are backward-compatible: pd.api.types.is_numeric_dtype and Series.iloc[0] have been canonical pandas APIs since well before pandas 2.0.
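The two replacements described in this commit message can be sketched in isolation. This is a minimal illustration, not the scmdata code itself: the helper names `is_numeric_column` and `max_group_count` and the `meta` table are hypothetical, invented here to exercise `pd.api.types.is_numeric_dtype` and `Series.iloc[0]` on a column with an explicit string dtype.

```python
import pandas as pd


def is_numeric_column(col: pd.Series) -> bool:
    """Return True when the column has a numeric dtype.

    pd.api.types.is_numeric_dtype accepts pandas extension dtypes such as
    StringDtype, which numpy.issubdtype cannot interpret.
    """
    return bool(pd.api.types.is_numeric_dtype(col.dtype))


def max_group_count(checker: pd.DataFrame, col: str) -> int:
    """Largest per-group count, read from the first entry by position."""
    counts = checker.groupby(col).count().max()  # label-indexed Series
    return int(counts.iloc[0])  # explicit positional access instead of [0]


# Hypothetical meta table, invented for illustration only.
meta = pd.DataFrame(
    {
        "scenario": pd.array(["ssp119", "ssp245", "ssp245"], dtype="string"),
        "run_id": [0, 1, 2],
    }
)

print(is_numeric_column(meta["scenario"]))  # string column
print(is_numeric_column(meta["run_id"]))    # integer column
print(max_group_count(meta, "scenario"))    # rows in the largest scenario group
```

Requesting the `"string"` dtype explicitly means the sketch exercises the extension-dtype path whether or not the installed pandas infers it by default.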
benmsanderson added a commit to benmsanderson/openscm-runner that referenced this pull request May 23, 2026
Mirror of scripts/run_rcmip_fair2.py for the CICEROSCMPY2 adapter: runs every SSP in the RCMIP fixture (ssp119, ssp126, ssp245, ssp370 and the two lowNTCF variants, ssp434, ssp460, ssp534-over, ssp585) against N posterior members of a CICERO-SCM v2.x distribution and prints a per-scenario 2100 GSAT / CO2 / ERF summary.

Defaults to splice mode (user emissions + bundled ssp245 historical), which is the path the demo uses. Pass --cicero-bundle-dir to switch to bundle mode (Marit RCMIP-aligned setup) where gaspam and conc files are resolved per-scenario from inside the bundle directory.

Smoke-tested end-to-end against draw_samples_500.json with 20 members: ~44 s for 10 scenarios x 20 members on a single thread. 2100 GSAT medians are systematically warmer than the FaIRv2 numbers on the same protocol (e.g. ssp245 3.77 K vs FaIR 2.63 K, ssp585 6.72 K vs FaIR 4.82 K). The CICEROSCM bundle's ECS distribution is wider than FaIR's, and the 20-member subset is small relative to the full 500-member posterior, so the offset is consistent with the expected inter-model spread.

Results are kept in memory: the NetCDFChunkWriter path currently trips a scmdata-pandas-3 incompatibility (fixed in PR #11 / upstream openscm/scmdata#321), so writer support stays out of this script until those land in main.
pandas 3.0 makes DataFrame.values return a read-only ndarray, so the existing self._df.values[:] = ... idiom in convert_unit, _binary_op and _unary_op raises ValueError: assignment destination is read-only. Switch to .iloc[:, :] = ..., which goes through pandas' indexer rather than the underlying ndarray and so isn't affected by the read-only change.

_binary_op and _unary_op additionally wrap the right-hand side in np.asarray(..., dtype=float) to preserve the prior silent bool-to-float cast that comparison ops (lt, eq, ne, etc.) relied on; .iloc is dtype-strict where the old .values write was not, so the explicit cast keeps the historical semantics of the result frame.
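A minimal sketch of the write-back pattern described above, assuming a plain float-valued DataFrame. The helper `write_back` and the `frame` data are hypothetical and stand in for the scmdata internals; the sketch only shows a boolean comparison result being cast to float before it is assigned through `.iloc`.

```python
import numpy as np
import pandas as pd


def write_back(df: pd.DataFrame, result) -> pd.DataFrame:
    """Return a copy of ``df`` with every cell overwritten by ``result``.

    The explicit float cast mirrors the bool-to-float conversion that the
    old ``.values[:] = ...`` write performed silently.
    """
    out = df.copy()
    out.iloc[:, :] = np.asarray(result, dtype=float)  # goes through the indexer
    return out


# Hypothetical two-column frame, invented for illustration only.
frame = pd.DataFrame({"x": [1.0, 2.0], "y": [3.0, 4.0]})

# A comparison yields a boolean array ...
mask = frame.to_numpy() < 2.5

# ... which lands in the result frame as float64 zeros and ones.
compared = write_back(frame, mask)
print(compared)
print(compared.dtypes)
```

Writing into a copy keeps the input frame untouched, which makes the effect of the cast easy to inspect side by side.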
…ation xarray 2025+ deprecates the bare use_cftime kwarg on `xr.load_dataset`, recommending instead that callers pass an `xr.coders.CFDatetimeCoder` via `decode_times`. The deprecation emits a FutureWarning on every `ScmRun.from_nc()` / `nc_to_run` call, which floods downstream notebook output. Prefer the new API when `xr.coders.CFDatetimeCoder` is available and fall back to the bare kwarg on older xarray (< 2024.09) where the new coder does not exist.
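The prefer-new-API-with-fallback logic could look roughly like the sketch below. The function names `build_decode_kwargs` and `load_dataset_with_cftime` are hypothetical and not taken from scmdata; the sketch assumes that `xr.coders.CFDatetimeCoder` accepts a `use_cftime` argument, and it defers the xarray import so the selection logic can be exercised on its own.

```python
def build_decode_kwargs(xr_module) -> dict:
    """Choose keyword arguments for ``load_dataset`` based on what is available.

    ``xr_module`` is the imported xarray module (or a stand-in with the same
    attributes). Hypothetical helper, named here for illustration only.
    """
    coders = getattr(xr_module, "coders", None)
    coder_cls = getattr(coders, "CFDatetimeCoder", None)
    if coder_cls is not None:
        # Newer xarray: pass a coder object through decode_times.
        return {"decode_times": coder_cls(use_cftime=True)}
    # Older xarray: the coder does not exist, so use the bare kwarg.
    return {"use_cftime": True}


def load_dataset_with_cftime(fname):
    """Load a netCDF file with cftime decoding via whichever API exists."""
    import xarray as xr  # deferred so build_decode_kwargs has no hard dependency

    return xr.load_dataset(fname, **build_decode_kwargs(xr))
```

Feature detection via `getattr` avoids parsing version strings, so the fallback keeps working if the coder is backported or the version scheme changes.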
Author
Hey @znicholls - this is ready when maintainers have a minute. Four targeted fixes. The openscm-runner PR needs this first if we want to avoid shims; happy to address anything you flag. -b
Summary
Restores `scmdata` compatibility with pandas 3.0 and xarray 2025+. Four targeted patches, each scoped to a single source site, with regression tests and one shared changelog fragment.

Patch 1 - `groupby.py`: numeric-column detection on `StringDtype`

`RunGroupBy.__init__` called `numpy.issubdtype(col.dtype, numpy.number)` to detect numeric meta columns. pandas 3.0 ships `pd.StringDtype` as the default for inferred string columns, and `numpy.issubdtype` rejects `StringDtype` with `TypeError: Cannot interpret <StringDtype(...)> as a data type`.

Routed through `pandas.api.types.is_numeric_dtype`, which returns `False` for `StringDtype` and `True` for numeric dtypes.

Patch 2 - `_xarray.py`: `Series` positional indexing

`_many_to_one` ended with `checker.groupby(col2).count().max()[0]`. The chained `.max()` returns a label-indexed Series, and pandas 3.0 removed positional integer indexing on those, so `[0]` raises `KeyError: 0`.

Replaced with `.iloc[0]`: explicit positional, same semantics.

Patch 3 - `run.py`: read-only `DataFrame.values`

`ScmRun.convert_unit`, `_binary_op`, and `_unary_op` all wrote results back via `self._df.values[:] = ...`. pandas 3.0 makes `DataFrame.values` return a read-only ndarray, so the in-place write raises `ValueError: assignment destination is read-only`.

Switched all of these sites to `self._df.iloc[:, :] = ...`, which goes through pandas' indexer rather than the underlying ndarray. The `_binary_op` / `_unary_op` sites additionally wrap the RHS in `np.asarray(..., dtype=float)`, because `.iloc` is dtype-strict where `.values[:]` was not, and the boolean arrays returned by comparison ops (`lt`, `eq`, `ne`, ...) need to keep landing as `float64` to preserve the prior behaviour.

Patch 4 - `netcdf.py`: deprecated `use_cftime` kwarg

`_read_nc` called `xr.load_dataset(fname, use_cftime=True)`. xarray 2025+ deprecates the bare kwarg in favour of passing an `xr.coders.CFDatetimeCoder` via `decode_times`, firing a `FutureWarning` on every `ScmRun.from_nc()` call and flooding downstream notebook output.

Routes through the new API when `xr.coders.CFDatetimeCoder` is available, falling back to the bare `use_cftime` on xarray older than 2024.09.

Impact

Patches 1 to 3 each unblock a different category of `ScmRun` usage on pandas 3.0:

- `ScmRun.groupby` (and everything built on it).
- `ScmRun.to_xarray` / `ScmRun.to_nc` with extras.
- `ScmRun.convert_unit`, all arithmetic operators, all comparison operators.

After the four fixes the relevant code paths run cleanly on both pandas 2.x and pandas 3.x stacks; xarray 2025+ no longer emits the `use_cftime` deprecation.

Tests

Added regression tests at the patch sites:

- `tests/unit/test_xarray.py`: direct unit tests for `_many_to_one` on one-to-one and many-to-one inputs.
- `tests/unit/test_netcdf.py`: a `from_nc` round-trip that asserts no `use_cftime` `FutureWarning` is emitted.

The existing `convert_unit`, `_binary_op`, and `_unary_op` test suites (~80 tests) all pass under pandas 3.0 with the patch, and were the diagnostic that surfaced the bool-array dtype edge case during porting.

Backward compatibility: `pandas.api.types.is_numeric_dtype`, `Series.iloc[0]`, `DataFrame.iloc[:, :] = ...`, and `xr.coders.CFDatetimeCoder` (with the fallback) have all been available since well before the lower bounds in `pyproject.toml`.

Driver

This PR is the dependency-side fix that unblocks the AR7-cycle modernisation work on openscm-runner. Once it's released, the runner deletes its in-tree `_scmdata_patches` shim and pins `scmdata>=<the-release>`.