
BTE MVP1 template changes #107

Merged: maximusunc merged 5 commits into main from cx-template-changes on Jun 26, 2026


@colleenXu (Contributor)

Max, please try out these new MVP1 templates (listed in the template_groups.json diff) and let me know what you think (fewer results? better results? faster?). If this is likely an improvement, maybe it'd be good to get it into CI ASAP for testing and into the "Test" environment deployment?

Background: About a month ago, we discussed how BTE's gene-intermediate template was the culprit blowing up on MVP1 queries and returning far too many results (238k for asthma, MONDO:0004979). I was tasked with investigating why and adjusting the template to return fewer results. I discovered that the Gene-DiseaseOrPheno hop had issues: for asthma, one predicate returned >1000 edges from geneticskp, and the other matched no MetaEdges. So I wrote new templates that hit Gene-DiseaseOrPheno MetaEdges that are stronger/more causal and come from fairly reliable resources (HPOA, G2P, AGR). See the commit messages for more details.

Some info for testing:

  • the associated_variant_contributes template is still liable to blow up and currently returns ~39k results for asthma (with subclassing turned off in Tier 0?). But...that's still better than before? The Gene-DiseaseOrPheno hop is reasonable (it returns 198-199 edges/intermediates from HPOA); it's the 2nd hop, Chem-Gene, that blows up. I didn't work on that hop 😖 (ran out of time). Based on a quick look, a lot of that comes from CTD (26663 of 54626 edges). But I was told it's not currently possible to constrain the source (ideally per template/QEdge)...
  • the other templates don't return any results for asthma. I instead tested them using:
    • affects_increases: OMIM:615190
    • affects_decreases: MONDO:0032942
    • has_phenotype: MONDO:0001068 (Gene-Disease edges from AGR). For asthma, this template's Gene-Disease hop does return 1 edge from HPOA.
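Since the source can't yet be constrained in a template, one way to see which resources dominate a blown-up hop is to tally the response's edges by primary knowledge source. This is a minimal sketch assuming a TRAPI 1.4+ knowledge graph, where each edge's `sources` list carries `resource_id` and `resource_role`; the function names are hypothetical and not part of BTE or this repo.

```python
from collections import Counter


def primary_source_counts(knowledge_graph: dict) -> Counter:
    """Count knowledge-graph edges by their primary knowledge source.

    Assumes TRAPI 1.4+ style edges: each has a "sources" list whose entries
    carry "resource_id" and "resource_role".
    """
    counts = Counter()
    for edge in knowledge_graph.get("edges", {}).values():
        primary = [
            src["resource_id"]
            for src in edge.get("sources", [])
            if src.get("resource_role") == "primary_knowledge_source"
        ]
        # Edges that declare no primary source are tallied under "unknown".
        counts[primary[0] if primary else "unknown"] += 1
    return counts


def source_share(counts: Counter, resource_id: str) -> float:
    """Fraction of all counted edges that come from one resource."""
    total = sum(counts.values())
    return counts[resource_id] / total if total else 0.0
```

With the figures quoted above, 26663 / 54626 ≈ 0.488, so CTD would account for roughly half of that hop's edges.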

Max told the BTE devs that this template was returning too many results - "238k results for asthma (MONDO:0004979)".
I investigated and found that the Gene-DoP edge had issues: (1) the gene_associated_with_condition predicate was returning many edges from geneticskp (not the original intent of the template), leading to many intermediate nodes; and (2) the biomarker edge was returning nothing because no MetaEdges matched.
I reviewed DINGO KGX metadata and found MetaEdges that retrieve "more causal" genes from fairly reliable resources (HPOA, G2P). So I adjusted the Gene-DoP edge to hit those MetaEdges specifically, rather than ones from geneticskp or semmeddb (using qualifier constraints).
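For reference, a TRAPI QEdge that narrows a hop through qualifier constraints could be built as below. This is a hypothetical helper, not the template JSON from this PR; the predicate and qualifier values in the example call are illustrative assumptions, not necessarily the ones the new templates use. In TRAPI, all entries in one `qualifier_set` must match (AND), while separate sets in `qualifier_constraints` are alternatives (OR).

```python
def make_qedge(subject: str, obj: str, predicates: list, qualifiers: dict) -> dict:
    """Build a TRAPI-style QEdge (hypothetical helper).

    All qualifier pairs go into a single qualifier_set, so an edge must
    carry every one of them to match.
    """
    constraints = []
    if qualifiers:
        constraints.append({
            "qualifier_set": [
                {"qualifier_type_id": qtype, "qualifier_value": value}
                for qtype, value in qualifiers.items()
            ]
        })
    return {
        "subject": subject,
        "object": obj,
        "predicates": predicates,
        "qualifier_constraints": constraints,
    }


# Illustrative values only; the actual templates may use different ones.
example = make_qedge(
    "gene", "disease", ["biolink:causes"],
    {"biolink:subject_form_or_variant_qualifier": "loss_of_function_variant_form"},
)
```

Edges from resources that never emit the required qualifiers simply fail to match, which is how a constraint like this can exclude sources without naming them.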
For the Chem-Gene hop, I also made a change: removed the "regulates" predicate since "affects" (now its parent) covers its cases.
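Dropping `regulates` is safe if the query engine expands each QEdge predicate to its descendants: once `affects` is the parent of `regulates`, listing both adds nothing. A minimal sketch of that pruning, assuming a child-to-parent map; the one-entry `PARENT` map and the function names are hypothetical.

```python
# Hypothetical child -> parent slice of the predicate hierarchy. Per this PR,
# "affects" is now the parent of "regulates"; nothing else is modeled here.
PARENT = {"biolink:regulates": "biolink:affects"}


def ancestors(predicate: str, parent: dict) -> set:
    """Collect every ancestor of a predicate by following child -> parent links."""
    found = set()
    while predicate in parent and parent[predicate] not in found:
        predicate = parent[predicate]
        found.add(predicate)
    return found


def prune_redundant(predicates: list, parent: dict) -> list:
    """Drop any predicate that has an ancestor also present in the list."""
    listed = set(predicates)
    return [p for p in predicates if not (ancestors(p, parent) & listed)]


print(prune_redundant(["biolink:affects", "biolink:regulates"], PARENT))
# prints ['biolink:affects']
```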
The Gene-has_phenotype-DoP hop hits MetaEdges from AGR and HPOA that seem fairly reliable. There aren't many currently, <3000.
For the Chem-Gene hop, I removed the "regulates" predicate since "affects" (now its parent) covers its cases.
A good proportion of G2P's Gene-Disease edges have directional qualifiers: loss of function or non-loss of function (different types of gain).
For these, more directional Chem-affects-Gene edges can be used to try to find potential drugs that counteract the gene variant effect.
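The counteracting logic can be sketched as a mapping from the G2P variant effect to the direction a Chem-affects-Gene edge would need. This assumes a loss-of-function variant calls for chemicals that increase the gene's activity or abundance and a non-loss (gain-type) variant calls for chemicals that decrease it; the Biolink qualifier names and values are assumptions about how such a QEdge might be expressed, and the helper names are hypothetical.

```python
from enum import Enum


class VariantEffect(Enum):
    """Directional qualifier on a G2P Gene-Disease edge (simplified)."""
    LOSS_OF_FUNCTION = "loss"
    NON_LOSS_OF_FUNCTION = "non-loss"  # the various gain-type effects


def counteracting_direction(effect: VariantEffect) -> str:
    """Direction a chemical's effect on the gene would need to oppose the variant.

    Assumption: loss of function is counteracted by increasing the gene's
    activity/abundance; non-loss of function by decreasing it.
    """
    if effect is VariantEffect.LOSS_OF_FUNCTION:
        return "increased"
    return "decreased"


def chem_gene_qualifier_set(effect: VariantEffect) -> list:
    """Qualifier set for a hypothetical directional Chem-affects-Gene QEdge."""
    return [
        {"qualifier_type_id": "biolink:object_aspect_qualifier",
         "qualifier_value": "activity_or_abundance"},
        {"qualifier_type_id": "biolink:object_direction_qualifier",
         "qualifier_value": counteracting_direction(effect)},
    ]
```

Keeping the direction choice in one function makes it easy to pair each G2P qualifier with the matching affects_increases or affects_decreases style template.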

But these templates are so specific that they will only find results for a subset of diseases (e.g., no results for asthma, MONDO:0004979).
They are covered/encompassed by the broader template Chem-affects-Gene-associated_variant_contributes-DoP.json.
This is a defunct property; waiting for TRAPI 2.0 to reintroduce the behavior (COLLATE).
Put them all together in one folder, so only the templates currently in use are on the top level.
@colleenXu colleenXu requested a review from maximusunc June 25, 2026 09:02
codecov Bot commented Jun 25, 2026
Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 47.70%. Comparing base (f06c367) to head (038c497).
⚠️ Report is 23 commits behind head on main.
see 6 files with indirect coverage changes


Last update e09b904...038c497.

@maximusunc (Collaborator)

I ran asthma against this, and the largest query returned 20k results, which is much more manageable. It, of course, runs faster because we're not handling as many results. I can't speak to whether the results are any "better", but I'd like to merge this in and deploy to CI so that we can see any differences in the automated tests.

@maximusunc maximusunc merged commit 667a9fd into main Jun 26, 2026
2 checks passed
@maximusunc maximusunc deleted the cx-template-changes branch June 26, 2026 17:23