fix(utils): use units::ud_convert directly for non-difftime conversions#3928
fix(utils): use units::ud_convert directly for non-difftime conversions#3928anshul23102 wants to merge 4 commits intoPecanProject:developfrom
Conversation
|
Thanks for the PR! Can you please tell us how you tested this and whether the speed difference in a real workflow was comparable to @dlebauer's benchmark result? |
|
Tested locally with
The new approach is ~18x faster with zero memory allocation, consistent with @dlebauer's benchmark in #3927. The |
|
Well yes, repeating the same test case is going to give the same results. That's why I said "real workflow": Inside PEcAn Even before we get to testing on a PEcAn workflow, we can observe in benchmark results that the memory allocation difference is only seen the first time you call it in a session, and the headline "20x" number only holds for very simple and small inputs. Converting a vector of 10 numbers instead of a scalar cuts DetailsThat said, we might still want to make this change! We use ud_convert so many places in PEcAn that even a context-dependent speedup could still pay off if it eases a bottleneck in an important place. But the flip side is that because we rely so heavily on ud_convert in so many places, the bar is high for changes: We want to be VERY sure they won't break anything. I know we have some existing test coverage of ud_convert, but it's worth considering if there are any additional test cases needed to ensure this change didn't miss any corner cases. |
…time conversions add: For non-difftime inputs, replace the set_units/set_units/drop_units chain with a direct call to units::ud_convert, which is ~20x faster per dlebauer's benchmark in PecanProject#3927. The difftime path is unchanged.
Addresses reviewer feedback: existing tests only covered scalar inputs. Adds tests for numeric vectors (small and length-1000), NA passthrough, and integer input to improve coverage of real PEcAn usage patterns.
Instead of calling units::ud_convert element-wise on the full vector, compute the linear factor and offset once using two scalar calls, then apply with base R arithmetic. This is ~750x faster for large vectors (e.g. n=8760 hourly met data: 30ms -> 40us) while remaining correct for all affine unit conversions that udunits supports.
…roach The affine factor/offset approach caused floating-point precision errors in datetime conversions (e.g. datetime2cf) where large offsets like 'seconds since 1970-01-01' cause catastrophic cancellation. Direct units::ud_convert handles precision internally in C.
|
I investigated the real call sites in So the honest summary is: this PR gives a real speedup (~10x) for scalar calls but no meaningful gain for large vectors. The CI failure on R 4.2 ( Happy to close this if the scalar-only speedup isn't worth it, or keep it open if you think it's still a net improvement. |
Yeligay8
left a comment
There was a problem hiding this comment.
ud_convert() now correctly processes vector inputs instead of assuming scalar values.
Added test:
expect_equal(ud_convert(c(0, 10, 100), "degC", "K"), c(273.15, 283.15, 373.15))
Recursive Conversion Logic:
return(units::ud_convert(as.numeric(x), u1, u2))
This ensures consistent handling when units do not match u1.
Function now preserves NA values during conversion.
Added test:
expect_equal(result, c(100, NA, 300))
Support for Integer Inputs: Ensures integer vectors are handled correctly.
Added test:
expect_equal(ud_convert(1L, "m", "cm"), 100)
Explicit test ensures output length matches input:
expect_length(result, 1000)
Removed code from original code:
Direct Unit Assignment to x1
x1 <- units::set_units(as.numeric(x), value = u1, mode = "standard")
Replaced with recursive call to ud_convert() for cleaner handling.
Intermediate Variable Handling
Old approach:
x2 <- units::set_units(x1, value = u2, mode = "standard")
units::drop_units(x2)
Refactored into a more streamlined flow while keeping:
x2 <- units::set_units(x1, value = u2, mode = "standard")
units::drop_units(x2)
Removes redundant steps and improves readability.
put units
Description
For non-difftime inputs, replace the slow
set_units/set_units/drop_unitschain with a direct call tounits::ud_convert. The difftime path is unchanged.Closes #3927
Motivation and Context
units::ud_convertis ~10x faster than the originalset_unitschain for scalar inputs. An affine factor/offset approach was attempted for large-vector speedup but caused floating-point precision errors in datetime conversions and was reverted.Benchmark (scalar, real usage pattern)
set_unitschain)units::ud_convert)Types of changes
Checklist