Stoner stochastic graphing overhaul#37
Conversation
Codecov Report✅ All modified and coverable lines are covered by tests. Additional details and impacted files@@ Coverage Diff @@
## master #37 +/- ##
===========================================
+ Coverage 98.01% 100.00% +1.98%
===========================================
Files 23 22 -1
Lines 2568 2640 +72
===========================================
+ Hits 2517 2640 +123
+ Misses 51 0 -51 ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
pratikunterwegs
left a comment
There was a problem hiding this comment.
I've put down some comments. I wasn't able to run the tests due to an error in a script somewhere. I won't hold up merging this as GHA checks seem to pass, and this seems to work for @kgaythorpe overall - or at least she's approved it.
Some design notes: I would try going for a separation of concerns and split off data processing from plotting in stone_stochastic_graph. It might be good to break this fn up into smaller ones too.
I wonder whether it might be worth shifting to {ggplot2}-based plotting as well.
| ##' @importFrom utils write.csv | ||
| ##' @param path The root folder of the stochastic data. | ||
|
|
||
| stone_stochastic_make_meta <- function(path) { |
There was a problem hiding this comment.
Could we check that path exists and is write-able? checkmate::test_path_for_output() or something similar would work.
| scenarios <- unique(unlist(lapply(files, `[[`, 2))) | ||
| df <- data.frame() | ||
| for (scenario in scenarios) { | ||
| matches <- files[unlist(lapply(files, `[[`, 2)) == scenario] | ||
| countries <- unique(unlist(lapply(matches, `[[`, 3))) | ||
| countries <- gsub(".pq", "", countries) |
There was a problem hiding this comment.
Could we get a comment or a section in the fn docs with what's intended here?
There was a problem hiding this comment.
Added some comments throughout that section...
|
|
||
| stone_stochastic_make_meta <- function(path) { | ||
|
|
||
| explore_files <- function(touchstone, folder, disease, group) { |
There was a problem hiding this comment.
Could we get docs on the expectations for these args? How are these injected into this scope? Is the expectation that they're present in the global scope?
It might be worth adding defaults to the fn signature - perhaps drawing from package constants?
There was a problem hiding this comment.
And perhaps some input checking so users know how and why their fn calls fail would be great.
There was a problem hiding this comment.
explore_files and touchstone_meta are only called from the code further down in stone_stochastic_make_meta - the user won't ever call them.
stone_stochastic_make_meta assumes the stochastic file share is perfectly curated (which is an interesting assumption), so it can work out all of these things automatically from the folder and file names.
I'll do some more commenting to clarify
| first <- file.path(path, touchstone, folder, files[1]) | ||
| ds <- arrow::open_dataset(first) | ||
| outcomes <- ds$schema$names | ||
| outcomes <- outcomes[!outcomes %in% c("run_id", "disease", "year", "age", |
There was a problem hiding this comment.
I see a candidate for a package constant...
| } | ||
|
|
||
| touchstones <- basename(list.dirs(paste0(path, "/"), recursive = FALSE)) | ||
| res <- data.table::rbindlist(lapply(touchstones, touchstone_meta)) |
There was a problem hiding this comment.
Could we apply this pattern - rbindlist() applied to a functional over a list, to the loop above too? No real reason other than it's nice.
There was a problem hiding this comment.
Yes, why not. It's tidier...
| stone_stochastic_graph <- function(base, | ||
| touchstones, disease, groups, country, | ||
| scenarios, outcome, | ||
| xaxis = "time", | ||
| filter = NULL, | ||
| by_cohort = FALSE, log = FALSE, | ||
| packit_id = NULL, packit_file = NULL, | ||
| include_stochastics = TRUE, | ||
| include_quantiles = TRUE, | ||
| include_mean = TRUE, | ||
| include_median = TRUE, | ||
| scenario2 = NULL) { | ||
| include_median = TRUE) { |
There was a problem hiding this comment.
Could we get input checks on these args, esp. touchstones, disease etc.?
There was a problem hiding this comment.
Semi-meaningful errors come out a bit later at the "file-not-found" level - but yes, I'll see if I can add something a bit better...
| ##' A feature-rich-ish graph plotting function, which can show the stochastic | ||
| ##' line for an outcome for different runs, the mean, median and quantiles. | ||
| ##' If two scenarios are specified, then the burden difference between | ||
| ##' scenarios is plotted. Additionally, if multiple groups or multiple |
There was a problem hiding this comment.
Would it be better to split the single scenario and scenario comparison functions into two? Might make it more modular.
There was a problem hiding this comment.
The differences in code between the single/comparison graphs ends up very slight - just y-axis/title labelling, a different line in get_burden_difference, so... not sure...
|
|
||
| stone_stochastic_graph <- function(base, |
There was a problem hiding this comment.
Could we get a @return tag saying what this fn returns? Seems to be a list with base graph elements. I'm not against base plotting per se, but if the list elements are going to be saved or re-used it seems like a good use case for {ggplot2} and gg-class elements.
There was a problem hiding this comment.
Yep - @return tag done - it is indeed a list of exactly 1 or 2 base plots, which is a bit odd. ggplot2 and combining the graphs in a panel would probably be the nicer way, but ggplot isn't quite in my hitting zone yet, so I may resort to some more skilled ggplotters for some snazzy plots.
In the meantime, up to two base plots, stochastic_explorer can then replay the plots to put them separately in the place it wants. RStudio plots the last (or only) one, so the user has to call plot again, and can only see them one at a time, which is a bit non-ideal.
For now, I've documented it, and maybe a later PR to fix it more properly.
There was a problem hiding this comment.
I see that this file presents pre-canned outputs stored in ../figures/, which is fine if these are generated from data that doesn't live in the pkg and/or can't be easily imported - or can it? Either/or? One option would be to add some example data as package data, which would also be a good templtate when developing more functions.
There was a problem hiding this comment.
Yeah, for now I plotted examples using real data in the file share and canned the results; I think I probably wouldn't want to have any genuine model data in the package, but yes, could manufacturer some interesting enough fake data which would make the vignette a better test - and would mean I wouldn't have to re-can if we snazz the plots up with ggplot2! Perhaps will push the discussion to the same time as ggplot2 improvements.
There was a problem hiding this comment.
Overall I'd push for the data processing and plotting to be separated. Functions downstream of data processing could be coded as pipe-friendly, so that operations can be chained together in vignettes etc.
|
Thanks @weshinsley - happy to sit with VIMC folks and go over needs for this, for a future PR ofc. |
A pretty horrible PR really with a lot of churn to achieve some specific things. It might be best to just review the finished stochastic_file and stochastic_graph files (and maybe read the vignette) - and not treat this as a diff, as it's so messy,
Anywyay - it achieves the following, which kind of had to come in one go, or something else would break...
stochastic_file has a new command
stone_stochastic_make_metawhich makes a meta.csv in the root of the stochastic files (\wpia-hn2.hpc.dide.ic.ac.uk\vimc_stochastics), giving meta-data about the groups, diseases, touchstones, scenarios, outcomes and countries, so we know what we've got at a glance, rather than having to do a nasty trawl.stochastic_explorer (the shiny app) uses the meta file instead of having to do exactly that nasty trawl, which saves about 20 seconds on startup, and makes everything a bit less sluggish generally. Some churn in utils.R with how it finds the info.
stochastic_graph can now produce two graphs at once with the same y-axis scale, in order to compare estimates in different touchstones, or by different modellin groups (keeping all other selections the same) - with the additional possibility of being able to plot the difference (often called impact) between two scenarios. This also was quite a churny update.
stochastic_graph can also plot with age as the x-axis, and filter which years to reduce - as well as plotting by time on the x-axis, and choosing which ages to reduce. Basically this is all flexed up.
All the above is available from the shiny app....
And written up in the vignette.