Skip to content

JSON metrics: use serde, add --sample-name flag#11

Open
werner291 wants to merge 1 commit into
NKI-GCF:masterfrom
werner291:json-metrics-serde
Open

JSON metrics: use serde, add --sample-name flag#11
werner291 wants to merge 1 commit into
NKI-GCF:masterfrom
werner291:json-metrics-serde

Conversation

@werner291

Copy link
Copy Markdown

Hi, I'm Werner. I've got an interest in bioinformatics, especially in cancer genomics, so I figured I'd try to find some connections to the domain through some open-source contributions. I saw your issue #10 and figured I'd send in a PR.

Closes #10.

What this does

  • Replaces the manual JSON formatting in write_json with serde (Rust's standard serialization library). The field names and numeric values are unchanged — a unit test parses both the old and new output as JSON and compares them field by field.
  • Adds a --sample-name (-s) flag. When provided, a SAMPLE_NAME field is included in the JSON output. When omitted, the field is absent, so existing scripts that parse the JSON are unaffected.

Things we noticed along the way

  • NaN in JSON output: When no reads are processed, FRACTION_DUPLICATION is 0/0 which the old code wrote as NaN — but NaN is not valid JSON and will break strict parsers. This PR emits null instead. Happy to split it out if you'd prefer a separate fix.
  • Test coverage: We couldn't verify end-to-end behavior because there are no integration tests with BAM fixtures. The unit tests confirm the JSON values match, but we had no way to run the full pipeline and diff the output. If it would be useful, I'd be happy to help set up a small integration test with a sample BAM file.

Sample output

Without --sample-name:

{"UNPAIRED_READS_EXAMINED":1000,"PAIRED_READS_EXAMINED":5000,...}

With --sample-name my_sample:

{"SAMPLE_NAME":"my_sample","UNPAIRED_READS_EXAMINED":1000,...}

Addresses NKI-GCF#10. Changes:

- Add serde + serde_json; derive Serialize on a new MetricsReport struct
  whose field names match the existing JSON keys exactly.
- Add --sample-name (-s) CLI flag. When provided, SAMPLE_NAME is included
  in the JSON output; when omitted, the field is absent (backward-compatible).
- Fix: the old hand-rolled write_json emitted NaN for FRACTION_DUPLICATION
  when no reads were processed (0/0). NaN is not valid JSON and would break
  strict parsers. Now emits null instead.
- Drive-by: type-annotate an empty vec in bktree tests to resolve a
  type-inference ambiguity introduced by serde_json's PartialEq impls.
- Add 4 unit tests: serde-vs-legacy value equivalence, sample_name
  presence/absence, and the zero-reads null-fraction case.
@werner291 werner291 force-pushed the json-metrics-serde branch from 0f5b7de to 57a7924 Compare April 13, 2026 10:01
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Optionally output metrics as json for easy parsing

1 participant