
feat: enhance biallelic_snps_to_plink with sex calls, phenotypes, and… #1051

Open
31puneet wants to merge 13 commits into malariagen:master from 31puneet:fix/issue-730-plink-enhancements

@31puneet (Contributor) commented Mar 5, 2026

Overview

Fixes #730

What this PR does

  • Adds sex calls to the .fam file from sample_metadata() (M→1, F→2, unknown→0)
  • Makes n_snps optional — defaults to None, using all available SNPs
  • Adds custom output filenames via new output_name parameter
  • Adds phenotype support via new phenotypes parameter (user-provided mapping)
  • Fixes chromosome numbering to PLINK conventions (2R→1, 2L→2, 3R→3, 3L→4, X→23)
  • Item 5 (family tree) deferred as noted in the issue
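A minimal sketch of the two mappings listed above and of how a `.fam` row could be assembled from them. The helper names are hypothetical and not taken from `to_plink.py`, and reusing the sample ID as the family ID is an assumption. The column order (family ID, individual ID, father, mother, sex, phenotype) and the `-9` missing-phenotype code follow the standard PLINK `.fam` layout.

```python
# Hypothetical helpers for illustration; not the code in this PR.

# Sex call -> PLINK sex code. Anything not listed is coded 0 (unknown).
SEX_TO_PLINK = {"M": 1, "F": 2}

# Contig -> PLINK chromosome code, as listed in the PR description.
GAMBIAE_CHROM_MAP = {"2R": 1, "2L": 2, "3R": 3, "3L": 4, "X": 23}


def plink_sex_code(sex_call):
    """Return the PLINK sex code for a sample's sex call."""
    return SEX_TO_PLINK.get(sex_call, 0)


def fam_line(sample_id, sex_call, phenotype=-9):
    """Build one space-delimited .fam row.

    Columns: family ID, individual ID, father, mother, sex, phenotype.
    Assumption: the sample ID doubles as the family ID, and both
    parents are 0 (unknown). -9 marks a missing phenotype.
    """
    sex = plink_sex_code(sex_call)
    return f"{sample_id} {sample_id} 0 0 {sex} {phenotype}"
```

A caller would pass the sex call from the sample metadata and, when a phenotype mapping is provided, the value looked up for that sample.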

Impact

  • No breaking changes — existing calls with explicit n_snps work identically
  • New parameters are optional with backward-compatible defaults
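To illustrate what "backward-compatible defaults" means here, the sketch below normalises the three optional arguments. Only the parameter names `n_snps`, `output_name`, and `phenotypes` come from this PR; the function name, the fallback file stem, and the returned dictionary are assumptions made for the example.

```python
from typing import Mapping, Optional


def resolve_plink_options(
    n_snps: Optional[int] = None,
    output_name: Optional[str] = None,
    phenotypes: Optional[Mapping[str, int]] = None,
    default_name: str = "plink_output",  # hypothetical fallback file stem
):
    """Turn the optional arguments into explicit settings.

    Leaving every argument at its default means: use all SNPs,
    use the fallback file stem, and write no phenotypes.
    """
    if n_snps is not None and n_snps <= 0:
        raise ValueError("n_snps must be a positive integer or None")
    return {
        "use_all_snps": n_snps is None,
        "n_snps": n_snps,
        "file_stem": output_name if output_name else default_name,
        "phenotypes": dict(phenotypes) if phenotypes else {},
    }
```

A call that passes an explicit `n_snps` takes the same path as before, which is the sense in which existing callers are unaffected.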

Files changed

  • malariagen_data/anoph/to_plink.py — main implementation
  • malariagen_data/anoph/plink_params.py — new parameter type aliases
  • tests/anoph/test_plink_converter.py — updated + 3 new tests

@codecov

codecov bot commented Mar 19, 2026

Codecov Report

❌ Patch coverage is 95.00000% with 1 line in your changes missing coverage. Please review.
✅ Project coverage is 89.55%. Comparing base (db8e6df) to head (caaf4b5).
⚠️ Report is 39 commits behind head on master.

| Files with missing lines | Patch % | Lines |
|---|---|---|
| malariagen_data/anoph/to_plink.py | 94.11% | 1 Missing ⚠️ |
Additional details and impacted files
@@            Coverage Diff             @@
##           master    #1051      +/-   ##
==========================================
- Coverage   89.99%   89.55%   -0.44%     
==========================================
  Files          51       51              
  Lines        5735     5919     +184     
==========================================
+ Hits         5161     5301     +140     
- Misses        574      618      +44     


@31puneet
Contributor Author

Hi @jonbrenas PTAL
I have resolved the merge conflicts.

@jonbrenas
Collaborator

Thanks @31puneet. I think your mapping of chromosomes will work fine for gambiae, but not so well for the other species, which leads me to believe that it should be part of the config file for each species. What do you think?

@31puneet
Contributor Author

Hi @jonbrenas, that's a great point. The hardcoded mapping won't work for Af1 or Adir1, since they have different contig layouts.
I think the cleanest approach would be to define a PLINK_CHROM_MAP in each species module, mapping contig names to PLINK chromosome codes, and then pass it through the constructor to PlinkConverter.
This way each species controls its own mapping. Would that approach work, or would you prefer it in the JSON config on GCS?
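A rough sketch of that idea, assuming a class-level `PLINK_CHROM_MAP` handed to the converter's constructor. The class names below are stand-ins and not the real `PlinkConverter` or species classes. The gambiae codes are the ones from this PR; the second species uses made-up contig names and codes purely to show the shape.

```python
class PlinkConverterSketch:
    """Stand-in for the converter; holds a per-species chromosome map."""

    def __init__(self, plink_chrom_map=None):
        # Copy the mapping so instances do not share mutable state.
        self._plink_chrom_map = dict(plink_chrom_map or {})

    def plink_chrom_code(self, contig):
        """Return the PLINK chromosome code for a contig name."""
        try:
            return self._plink_chrom_map[contig]
        except KeyError:
            raise ValueError(
                f"No PLINK chromosome code configured for contig {contig!r}"
            ) from None


class GambiaeSketch(PlinkConverterSketch):
    # Codes as listed in this PR description.
    PLINK_CHROM_MAP = {"2R": 1, "2L": 2, "3R": 3, "3L": 4, "X": 23}

    def __init__(self):
        super().__init__(plink_chrom_map=self.PLINK_CHROM_MAP)


class OtherSpeciesSketch(PlinkConverterSketch):
    # Hypothetical contigs and codes, for illustration only.
    PLINK_CHROM_MAP = {"contig_a": 1, "contig_b": 2}

    def __init__(self):
        super().__init__(plink_chrom_map=self.PLINK_CHROM_MAP)
```

Raising on an unknown contig, instead of falling back to a default code, keeps a missing per-species entry from silently producing a wrong chromosome column.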

@31puneet
Contributor Author

Hi @jonbrenas PTAL

@31puneet
Contributor Author

Hi @jonbrenas, thanks for the approval. You were worried about the mapping for the other classes (Af1, Adir1); do you want me to implement that?

@jonbrenas
Collaborator

Thanks, @31puneet. The problem is that the config files are part of the resources, not the API. What you could do is add the mapping to each class, which should work for now.

@31puneet
Contributor Author

> What you could do is add the mapping to each class, which should work for now.

Done!

Development

Successfully merging this pull request may close these issues.

Adding more functionalities to the data converter for use with plink
