Commit 852e0ae

Update rules docs
1 parent d9530ea commit 852e0ae

2 files changed

Lines changed: 222 additions & 0 deletions

File tree

docs/index.md

Lines changed: 1 addition & 0 deletions
@@ -11,6 +11,7 @@ installation
 usage
 Parameter API <params_doc>
 output
+rules_parser
 contributing
 changelog_include
 ```

docs/rules_parser.md

Lines changed: 221 additions & 0 deletions
@@ -0,0 +1,221 @@
# Rule Expression Grammar and Parser

DRAM uses a small, explicit expression language to define **rules**, **conditions**, and **logical combinations** of features.
This page documents the grammar, available operators and functions, and an important design decision regarding **boolean operator ambiguity**.

---

## Overview

Rule expressions are used throughout DRAM to:

* Summarize (distill) annotated genes into a summary sheet (see `distill_metals.tsv` for examples)
* Evaluate complex rules for gene traits (see `trait_rules.tsv` for examples)
* Build visualizations (see `dram_viz` for examples)

The grammar is intentionally **restrictive** to avoid ambiguous interpretations that can easily arise in boolean logic.

---

## Expression Components

The lowest-level building blocks of expressions are genes:

```text
KXXXXX (KEGG Orthology ID)
```

These can be evaluated individually as presence/absence in your annotation sheet and then combined with other genes using boolean logic to get total trait presence/absence or scores.
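As a minimal sketch of this idea (not DRAM's implementation), presence/absence can be modelled as set membership. The `annotations` rows, the `has` helper, and the KO IDs below are hypothetical placeholders chosen for illustration; Python's `and`/`or` stand in for the rule language's `&`/`|`.

```python
# Hypothetical annotation rows: (gene_id, ko_id). IDs are placeholders only.
annotations = [
    ("gene_001", "K00001"),
    ("gene_002", "K00002"),
    ("gene_003", "K00004"),
]

# Presence/absence lookup: every KO ID seen anywhere in the annotations.
present = {ko for _, ko in annotations}


def has(ko_id: str) -> bool:
    """Return True if the KO ID appears in the annotation sheet."""
    return ko_id in present


# Combine individual presence/absence calls with boolean logic.
both = has("K00001") and has("K00002")    # analogous to: K00001 & K00002
either = has("K00003") or has("K00004")   # analogous to: K00003 | K00004
```

Here `both` and `either` are plain booleans, so they can feed directly into larger combinations or be summed into a score.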

### Boolean Operators

| Operator | Meaning     | Notes                                         |
| -------- | ----------- | --------------------------------------------- |
| `&`      | Logical AND | May only be chained with other `&` operators  |
| `\|`     | Logical OR  | May only be chained with other `\|` operators |

❗ Mixing `&` and `|` **requires explicit grouping**.

### Grouping

Square brackets are used to explicitly group expressions:

```text
[KXXXXX & KXXXXX] | KXXXXX
```

Brackets may contain **any valid expression**, including nested boolean logic.

### Step Expressions

Comma-separated expressions define **step rules**:

```text
KXXXXX & KXXXXX, KXXXXX | KXXXXX
```

Each step represents one stage in a sequence or pathway of evaluations. A list of steps can be passed to functions that operate on multiple steps, for example to check what percentage of steps is satisfied.
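One way to picture step splitting is a scan that cuts the rule at commas sitting outside any bracket or parenthesis. The `split_steps` helper below is a hypothetical sketch under that assumption, not DRAM's parser; it does not handle commas inside quoted strings.

```python
def split_steps(rule: str) -> list[str]:
    """Split a rule on commas that sit outside any [...] or (...) group."""
    steps: list[str] = []
    current: list[str] = []
    depth = 0  # current nesting depth of brackets/parentheses
    for ch in rule:
        if ch in "[(":
            depth += 1
        elif ch in "])":
            depth -= 1
            if depth < 0:
                raise ValueError("unbalanced closing bracket")
        if ch == "," and depth == 0:
            # A top-level comma ends the current step.
            steps.append("".join(current).strip())
            current = []
        else:
            current.append(ch)
    if depth != 0:
        raise ValueError("unbalanced opening bracket")
    steps.append("".join(current).strip())
    return steps


# Two steps: "K00001 & K00002" and "K00003 | K00004" (placeholder IDs).
two_steps = split_steps("K00001 & K00002, K00003 | K00004")
# Commas inside a group do not split: "[A, B] | C" stays one step.
grouped = split_steps("[A, B] | C, D")
```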

---

## Core Design Principle: No Implicit Boolean Precedence

DRAM **does not allow mixing `|` (OR) and `&` (AND) operators without explicit grouping**.

This means expressions like:

```text
A | B | C & D
```

are **invalid** and will result in a parsing error.

Instead, users must write:

```text
A | B | [C & D]
```

or:

```text
[A | B | C] & D
```

### Why this matters

In many languages, `&` has higher precedence than `|`, but this is:

* Often misunderstood
* Easy to misread in complex biological rules
* A frequent source of subtle bugs

DRAM therefore **disallows implicit precedence** and requires explicit grouping with brackets (`[...]`) whenever boolean operators are mixed.
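The rule can be illustrated with a small recursive evaluator that remembers the first operator seen at each bracket level and raises an error if a different one follows. This is a hedged sketch, not DRAM's actual parser: `tokenize`, `evaluate`, and the `present` set are hypothetical names, and only identifiers, `&`, `|`, and `[...]` are supported.

```python
import re

# One token: a bracket, an operator, or an identifier such as a KO ID.
TOKEN = re.compile(r"\s*([\[\]&|]|[A-Za-z0-9_]+)")


def tokenize(text: str) -> list[str]:
    text = text.strip()
    tokens, pos = [], 0
    while pos < len(text):
        match = TOKEN.match(text, pos)
        if match is None:
            raise ValueError(f"unexpected character {text[pos]!r} at {pos}")
        tokens.append(match.group(1))
        pos = match.end()
    return tokens


def evaluate(text: str, present: set[str]) -> bool:
    """Evaluate an expression against a set of present identifiers."""
    tokens = tokenize(text)
    pos = 0

    def parse_chain() -> bool:
        nonlocal pos
        value = parse_term()
        chain_op = None  # the single operator allowed at this bracket level
        while pos < len(tokens) and tokens[pos] in ("&", "|"):
            op = tokens[pos]
            if chain_op is None:
                chain_op = op
            elif op != chain_op:
                raise ValueError("mixing '&' and '|' requires [...] grouping")
            pos += 1
            rhs = parse_term()
            value = (value and rhs) if op == "&" else (value or rhs)
        return value

    def parse_term() -> bool:
        nonlocal pos
        if pos >= len(tokens):
            raise ValueError("unexpected end of expression")
        tok = tokens[pos]
        pos += 1
        if tok == "[":
            value = parse_chain()
            if pos >= len(tokens) or tokens[pos] != "]":
                raise ValueError("missing closing ']'")
            pos += 1
            return value
        if tok in ("]", "&", "|"):
            raise ValueError(f"unexpected token {tok!r}")
        return tok in present  # identifier: presence/absence lookup

    result = parse_chain()
    if pos != len(tokens):
        raise ValueError(f"unexpected token {tokens[pos]!r}")
    return result


present = {"A", "C", "D"}
grouped_or = evaluate("A | B | [C & D]", present)   # explicit grouping: accepted
grouped_and = evaluate("[A | B | C] & D", present)  # explicit grouping: accepted
try:
    evaluate("A | B | C & D", present)              # mixed operators: rejected
except ValueError as err:
    rejection = str(err)
```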

---

## Functions

Functions operate on expressions or values and may appear anywhere an expression is allowed.

### General Form

```text
function_name(arg1, arg2, ...)
```

Arguments may be:

* Expressions
* Numbers
* Strings
* Identifiers

---

### Available Functions

- not
- percent
- at_least
- column_count_values

#### not

Negates a boolean expression.

```text
not(A & B)
```

#### percent

Calculates the percentage of steps satisfied and compares it against a threshold, given as the first argument. Returns true if the percentage of satisfied steps meets or exceeds the threshold.

```text
percent(60, [A & B, C | D, E])
```

#### at_least

Checks whether at least a given number of steps (the first argument) are satisfied.

```text
at_least(2, [A & B, C | D, E])
```
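The two step-counting functions can be sketched over a list of already-evaluated steps. The Python functions below are hypothetical stand-ins that mirror the descriptions above, not DRAM's code; treating an empty step list as unsatisfied is an assumption of this sketch.

```python
def percent(threshold: float, steps: list[bool]) -> bool:
    """True if the share of satisfied steps, in percent, is >= threshold."""
    if not steps:
        return False  # assumption: an empty step list never satisfies a rule
    return 100.0 * sum(steps) / len(steps) >= threshold


def at_least(minimum: int, steps: list[bool]) -> bool:
    """True if the number of satisfied steps is >= minimum."""
    return sum(steps) >= minimum


# Three steps, two of them satisfied (about 66.7 percent).
steps = [True, False, True]
meets_60 = percent(60, steps)
meets_70 = percent(70, steps)
two_or_more = at_least(2, steps)
all_three = at_least(3, steps)
```

With two of three steps satisfied, the 60 percent threshold and the two-step minimum are met, while the 70 percent threshold and the three-step minimum are not.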
148+
149+
#### column_count_values
150+
151+
152+
```text
153+
/**
154+
* Counts the number of values in a specified column and evaluates conditions based on the provided operations and thresholds.
155+
*
156+
* @param column_name The name of the column to be analyzed.
157+
* @param value_op The operation to be used for comparing the column values against the value_threshold (e.g., "gt", "ge", "lt", "le", "eq", "ne").
158+
* @param value_threshold The threshold value to compare the column values against using value_op.
159+
* @param count_op The operation to be used for comparing the counted values against the count_threshold (e.g., "gt", "ge", "lt", "le", "eq", "ne").
160+
* @param count_threshold The threshold value to compare the counted values against using count_op.
161+
* @return boolean Returns true if the conditions defined by value_op and count_op are satisfied; otherwise, returns false.
162+
*/
163+
164+
column_count_values("column_name", "value_op", value_threshold, "count_op", count_threshold)
165+
```
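One plausible reading of this description is a two-stage check: count the rows whose column value passes `value_op`, then compare that count using `count_op`. The sketch below follows that reading. The explicit `rows` argument (a list of dictionaries) and the `OPS` table are assumptions made for illustration and are not part of the documented signature.

```python
import operator

# Map the documented operation names onto Python comparison functions.
OPS = {
    "gt": operator.gt,
    "ge": operator.ge,
    "lt": operator.lt,
    "le": operator.le,
    "eq": operator.eq,
    "ne": operator.ne,
}


def column_count_values(rows, column_name, value_op, value_threshold,
                        count_op, count_threshold):
    """Count rows passing the value test, then test that count."""
    matching = sum(
        1
        for row in rows
        if row.get(column_name) is not None
        and OPS[value_op](row[column_name], value_threshold)
    )
    return OPS[count_op](matching, count_threshold)


# Hypothetical annotation rows; three of the four values are >= 4.
rows = [
    {"heme_regulatory_motif_count": 5},
    {"heme_regulatory_motif_count": 4},
    {"heme_regulatory_motif_count": 1},
    {"heme_regulatory_motif_count": 6},
]
at_least_three = column_count_values(
    rows, "heme_regulatory_motif_count", "ge", 4, "ge", 3)
at_least_four = column_count_values(
    rows, "heme_regulatory_motif_count", "ge", 4, "ge", 4)
```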

---

### Piped Function Usage

Filter functions may be chained into other functions with the pipe operator (`->`):

```text
not(filter_contains(kegg_description,"nitrate reductase")) -> column_count_values(heme_regulatory_motif_count,ge,4,ge,3)
```

Filter functions prefilter the data before counting or other operations are applied. All filter functions live under the `filter_` namespace.

- filter_contains
- filter_compare

#### filter_contains

Checks if a specified column contains a given substring.

```text
filter_contains(column_name, substring)
```

#### filter_compare

Compares values in a specified column against a threshold using a given operation.

```text
filter_compare(column_name, operation, threshold)
```
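To make the prefiltering idea concrete, the sketch below models each filter as a per-row predicate and applies the filters one after another, so each stage only sees rows that survived the previous one. Row-wise evaluation, the `apply_filter` helper, and the sample rows are all assumptions of this sketch rather than documented DRAM behavior.

```python
import operator

OPS = {
    "gt": operator.gt,
    "ge": operator.ge,
    "lt": operator.lt,
    "le": operator.le,
    "eq": operator.eq,
    "ne": operator.ne,
}


def filter_contains(column_name, substring):
    """Build a predicate: does the row's column text contain the substring?"""
    return lambda row: substring in str(row.get(column_name, ""))


def filter_compare(column_name, operation, threshold):
    """Build a predicate: does the row's column value pass the comparison?"""
    return lambda row: (column_name in row
                        and OPS[operation](row[column_name], threshold))


def apply_filter(rows, predicate):
    """Keep only the rows accepted by the predicate (one pipeline stage)."""
    return [row for row in rows if predicate(row)]


# Hypothetical annotation rows.
rows = [
    {"kegg_description": "nitrate reductase alpha",
     "heme_regulatory_motif_count": 5},
    {"kegg_description": "cytochrome c",
     "heme_regulatory_motif_count": 4},
    {"kegg_description": "cytochrome c oxidase",
     "heme_regulatory_motif_count": 2},
]

# Stage 1 keeps the cytochrome rows; stage 2 keeps those with count >= 4.
kept = apply_filter(rows, filter_contains("kegg_description", "cytochrome"))
high = apply_filter(kept, filter_compare("heme_regulatory_motif_count", "ge", 4))
```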

---

## Valid vs Invalid Examples

### ✅ Valid

```text
A | B | C
A & B & C
[A & B] | C
A | [B & C]
filter_contains(col, "substring A") -> column_count_values(col,ge,2,ge,1)
```

---

### ❌ Invalid

```text
A | B | C & D
A & B | C
A | B & C | D
[A, B, C] -> percent(50)
```
