Skip to content

Stoner stochastic graphing overhaul#37

Open
weshinsley wants to merge 14 commits into
masterfrom
VIMC-9252
Open

Stoner stochastic graphing overhaul#37
weshinsley wants to merge 14 commits into
masterfrom
VIMC-9252

Conversation

@weshinsley
Copy link
Copy Markdown
Contributor

@weshinsley weshinsley commented Apr 30, 2026

A pretty horrible PR really with a lot of churn to achieve some specific things. It might be best to just review the finished stochastic_file and stochastic_graph files (and maybe read the vignette) - and not treat this as a diff, as it's so messy,

Anywyay - it achieves the following, which kind of had to come in one go, or something else would break...

  • stochastic_file has a new command stone_stochastic_make_meta which makes a meta.csv in the root of the stochastic files (\wpia-hn2.hpc.dide.ic.ac.uk\vimc_stochastics), giving meta-data about the groups, diseases, touchstones, scenarios, outcomes and countries, so we know what we've got at a glance, rather than having to do a nasty trawl.

  • stochastic_explorer (the shiny app) uses the meta file instead of having to do exactly that nasty trawl, which saves about 20 seconds on startup, and makes everything a bit less sluggish generally. Some churn in utils.R with how it finds the info.

  • stochastic_graph can now produce two graphs at once with the same y-axis scale, in order to compare estimates in different touchstones, or by different modellin groups (keeping all other selections the same) - with the additional possibility of being able to plot the difference (often called impact) between two scenarios. This also was quite a churny update.

  • stochastic_graph can also plot with age as the x-axis, and filter which years to reduce - as well as plotting by time on the x-axis, and choosing which ages to reduce. Basically this is all flexed up.

  • All the above is available from the shiny app....

  • And written up in the vignette.

@codecov
Copy link
Copy Markdown

codecov Bot commented May 1, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 100.00%. Comparing base (4d10598) to head (ac7cbb7).
⚠️ Report is 7 commits behind head on master.

Additional details and impacted files
@@             Coverage Diff             @@
##           master       #37      +/-   ##
===========================================
+ Coverage   98.01%   100.00%   +1.98%     
===========================================
  Files          23        22       -1     
  Lines        2568      2640      +72     
===========================================
+ Hits         2517      2640     +123     
+ Misses         51         0      -51     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

@weshinsley weshinsley marked this pull request as ready for review May 5, 2026 16:20
@weshinsley weshinsley changed the title Vimc 9252 Stoner stochastic graphing overhaul May 6, 2026
Copy link
Copy Markdown

@pratikunterwegs pratikunterwegs left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've put down some comments. I wasn't able to run the tests due to an error in a script somewhere. I won't hold up merging this as GHA checks seem to pass, and this seems to work for @kgaythorpe overall - or at least she's approved it.

Some design notes: I would try going for a separation of concerns and split off data processing from plotting in stone_stochastic_graph. It might be good to break this fn up into smaller ones too.

I wonder whether it might be worth shifting to {ggplot2}-based plotting as well.

Comment thread R/stochastic_files.R
##' @importFrom utils write.csv
##' @param path The root folder of the stochastic data.

stone_stochastic_make_meta <- function(path) {
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could we check that path exists and is write-able? checkmate::test_path_for_output() or something similar would work.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

Comment thread R/stochastic_files.R
Comment on lines +305 to +310
scenarios <- unique(unlist(lapply(files, `[[`, 2)))
df <- data.frame()
for (scenario in scenarios) {
matches <- files[unlist(lapply(files, `[[`, 2)) == scenario]
countries <- unique(unlist(lapply(matches, `[[`, 3)))
countries <- gsub(".pq", "", countries)
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could we get a comment or a section in the fn docs with what's intended here?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added some comments throughout that section...

Comment thread R/stochastic_files.R

stone_stochastic_make_meta <- function(path) {

explore_files <- function(touchstone, folder, disease, group) {
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could we get docs on the expectations for these args? How are these injected into this scope? Is the expectation that they're present in the global scope?

It might be worth adding defaults to the fn signature - perhaps drawing from package constants?

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

And perhaps some input checking so users know how and why their fn calls fail would be great.

Copy link
Copy Markdown
Contributor Author

@weshinsley weshinsley May 11, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

explore_files and touchstone_meta are only called from the code further down in stone_stochastic_make_meta - the user won't ever call them.

stone_stochastic_make_meta assumes the stochastic file share is perfectly curated (which is an interesting assumption), so it can work out all of these things automatically from the folder and file names.

I'll do some more commenting to clarify

Comment thread R/stochastic_files.R
first <- file.path(path, touchstone, folder, files[1])
ds <- arrow::open_dataset(first)
outcomes <- ds$schema$names
outcomes <- outcomes[!outcomes %in% c("run_id", "disease", "year", "age",
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see a candidate for a package constant...

Comment thread R/stochastic_files.R
}

touchstones <- basename(list.dirs(paste0(path, "/"), recursive = FALSE))
res <- data.table::rbindlist(lapply(touchstones, touchstone_meta))
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could we apply this pattern - rbindlist() applied to a functional over a list, to the loop above too? No real reason other than it's nice.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, why not. It's tidier...

Comment thread R/stochastic_graphs.R
Comment on lines +68 to +78
stone_stochastic_graph <- function(base,
touchstones, disease, groups, country,
scenarios, outcome,
xaxis = "time",
filter = NULL,
by_cohort = FALSE, log = FALSE,
packit_id = NULL, packit_file = NULL,
include_stochastics = TRUE,
include_quantiles = TRUE,
include_mean = TRUE,
include_median = TRUE,
scenario2 = NULL) {
include_median = TRUE) {
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could we get input checks on these args, esp. touchstones, disease etc.?

Copy link
Copy Markdown
Contributor Author

@weshinsley weshinsley May 11, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Semi-meaningful errors come out a bit later at the "file-not-found" level - but yes, I'll see if I can add something a bit better...

Comment thread R/stochastic_graphs.R
Comment on lines +9 to +12
##' A feature-rich-ish graph plotting function, which can show the stochastic
##' line for an outcome for different runs, the mean, median and quantiles.
##' If two scenarios are specified, then the burden difference between
##' scenarios is plotted. Additionally, if multiple groups or multiple
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would it be better to split the single scenario and scenario comparison functions into two? Might make it more modular.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The differences in code between the single/comparison graphs ends up very slight - just y-axis/title labelling, a different line in get_burden_difference, so... not sure...

Comment thread R/stochastic_graphs.R
Comment on lines +67 to +68

stone_stochastic_graph <- function(base,
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could we get a @return tag saying what this fn returns? Seems to be a list with base graph elements. I'm not against base plotting per se, but if the list elements are going to be saved or re-used it seems like a good use case for {ggplot2} and gg-class elements.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yep - @return tag done - it is indeed a list of exactly 1 or 2 base plots, which is a bit odd. ggplot2 and combining the graphs in a panel would probably be the nicer way, but ggplot isn't quite in my hitting zone yet, so I may resort to some more skilled ggplotters for some snazzy plots.

In the meantime, up to two base plots, stochastic_explorer can then replay the plots to put them separately in the place it wants. RStudio plots the last (or only) one, so the user has to call plot again, and can only see them one at a time, which is a bit non-ideal.

For now, I've documented it, and maybe a later PR to fix it more properly.

Comment thread vignettes/Stochastics.Rmd
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see that this file presents pre-canned outputs stored in ../figures/, which is fine if these are generated from data that doesn't live in the pkg and/or can't be easily imported - or can it? Either/or? One option would be to add some example data as package data, which would also be a good templtate when developing more functions.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, for now I plotted examples using real data in the file share and canned the results; I think I probably wouldn't want to have any genuine model data in the package, but yes, could manufacturer some interesting enough fake data which would make the vignette a better test - and would mean I wouldn't have to re-can if we snazz the plots up with ggplot2! Perhaps will push the discussion to the same time as ggplot2 improvements.

Comment thread R/stochastic_graphs.R
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overall I'd push for the data processing and plotting to be separated. Functions downstream of data processing could be coded as pipe-friendly, so that operations can be chained together in vignettes etc.

@pratikunterwegs
Copy link
Copy Markdown

Thanks @weshinsley - happy to sit with VIMC folks and go over needs for this, for a future PR ofc.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants