JSON metrics: use serde, add --sample-name flag #11
Open
werner291 wants to merge 1 commit into
Addresses NKI-GCF#10.

Changes:
- Add serde + serde_json; derive Serialize on a new MetricsReport struct whose field names match the existing JSON keys exactly.
- Add a --sample-name (-s) CLI flag. When provided, SAMPLE_NAME is included in the JSON output; when omitted, the field is absent (backward-compatible).
- Fix: the old hand-rolled write_json emitted NaN for FRACTION_DUPLICATION when no reads were processed (0/0). NaN is not valid JSON and would break strict parsers, so null is now emitted instead.
- Drive-by: type-annotate an empty vec in the bktree tests to resolve a type-inference ambiguity introduced by serde_json's PartialEq impls.
- Add 4 unit tests: serde-vs-legacy value equivalence, sample_name presence/absence, and the zero-reads null-fraction case.
Hi, I'm Werner. I'm interested in bioinformatics, especially cancer genomics, so I figured I'd try to connect with the field through open-source contributions. I saw your issue #10 and figured I'd send in a PR.
Closes #10.
What this does
- Replaces the hand-rolled write_json with serde (the de facto standard serialization library for Rust). The field names and numeric values are unchanged: a unit test parses both the old and new output as JSON and compares them field by field.
- Adds a --sample-name (-s) flag. When provided, a SAMPLE_NAME field is included in the JSON output. When omitted, the field is absent, so existing scripts that parse the JSON are unaffected.

Things we noticed along the way
- When no reads are processed, FRACTION_DUPLICATION is 0/0, which the old code wrote as NaN. NaN is not valid JSON and will break strict parsers, so this PR emits null instead. Happy to split it out if you'd prefer a separate fix.

Sample output
Without --sample-name:
{"UNPAIRED_READS_EXAMINED":1000,"PAIRED_READS_EXAMINED":5000,...}

With --sample-name my_sample:
{"SAMPLE_NAME":"my_sample","UNPAIRED_READS_EXAMINED":1000,...}