# Rule Expression Grammar and Parser

DRAM uses a small, explicit expression language to define **rules**, **conditions**, and **logical combinations** of features.
This page documents the grammar, the available operators and functions, and an important design decision regarding **boolean operator ambiguity**.

---

## Overview

Rule expressions are used throughout DRAM to:

* Distill (summarize) annotated genes into a summary sheet (see `distill_metals.tsv` for examples)
* Evaluate complex rules for gene traits (see `trait_rules.tsv` for examples)
* Build visualizations (see `dram_viz` for examples)

The grammar is intentionally **restrictive** to avoid the ambiguous interpretations that easily arise in boolean logic.

---

## Expression Components

The lowest-level building blocks of expressions are genes:

```text
KXXXXX (KEGG Orthology ID)
```

Each gene is evaluated individually as present or absent in your annotation sheet. Genes are then combined with boolean logic to determine overall trait presence/absence or scores.

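As a minimal sketch of the presence/absence step, assume the annotation sheet is a TSV whose KO IDs sit in a column named `ko_id`. That column name and the helper `present_kos` are hypothetical; DRAM's actual sheet layout may differ.

```python
import csv
import io


def present_kos(tsv_text: str, column: str = "ko_id") -> set[str]:
    """Return every KO ID that appears at least once in `column`.

    ASSUMPTION: "ko_id" is a hypothetical column name used for illustration;
    the real DRAM annotation sheet may name it differently.
    """
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    kos = set()
    for row in reader:
        value = (row.get(column) or "").strip()
        if value:  # a blank cell means no KO was assigned to that gene
            kos.add(value)
    return kos


sheet = "gene\tko_id\ngene_1\tK00370\ngene_2\tK00371\ngene_3\t\n"
kos = present_kos(sheet)
print("K00370" in kos)  # True: at least one gene carries this KO
print("K02567" in kos)  # False: no gene carries this KO
```

Every gene in an expression then reduces to a membership test against this set.
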
### Boolean Operators

| Operator | Meaning     | Notes                                         |
| -------- | ----------- | --------------------------------------------- |
| `&`      | Logical AND | May only be chained with other `&` operators  |
| `\|`     | Logical OR  | May only be chained with other `\|` operators |

❗ Mixing `&` and `|` **requires explicit grouping**.

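For a flat, bracket-free expression, the chaining rule can be sketched as below. This is an illustration, not DRAM's parser: it assumes whitespace-separated tokens and an already-built set `present` of KO IDs found in the annotation sheet; `eval_chain` is a hypothetical name.

```python
def eval_chain(expr: str, present: set[str]) -> bool:
    """Evaluate a bracket-free chain such as "K00370 & K00371 & K00374".

    ASSUMPTION: tokens are separated by whitespace (illustration only).
    """
    tokens = expr.split()
    if len(tokens) % 2 == 0:
        raise ValueError("expected: gene, operator, gene, ...")
    genes = tokens[0::2]            # even positions hold gene IDs
    operators = set(tokens[1::2])   # odd positions hold operators
    unknown = operators - {"&", "|"}
    if unknown:
        raise ValueError(f"unknown operator(s): {sorted(unknown)}")
    if len(operators) > 1:
        raise ValueError("mixing '&' and '|' requires explicit grouping with [...]")
    hits = [gene in present for gene in genes]
    # '&' chains need every gene present; '|' chains need at least one.
    return any(hits) if operators == {"|"} else all(hits)


present = {"K00370", "K00371"}
print(eval_chain("K00370 & K00371", present))  # True
print(eval_chain("K00370 & K02567", present))  # False
print(eval_chain("K02567 | K00371", present))  # True
```

A chain like `K00370 | K00371 & K02567` raises `ValueError` here rather than silently picking a precedence.
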
### Grouping

Square brackets are used to explicitly group expressions:

```text
[KXXXXX & KXXXXX] | KXXXXX
```

Brackets may contain **any valid expression**, including nested boolean logic.

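Because brackets may nest, a recursive-descent parser is a natural way to handle grouping. The following is a minimal sketch, not DRAM's actual parser: it assumes alphanumeric gene tokens, and the names `tokenize` and `evaluate` are hypothetical. It enforces the one-operator-per-level rule at every bracket depth.

```python
import re

# One token: a bracket, an operator, or an alphanumeric gene ID (e.g. K00370).
TOKEN = re.compile(r"\s*([\[\]&|]|[A-Za-z0-9_]+)")


def tokenize(expr: str) -> list[str]:
    tokens, pos = [], 0
    expr = expr.rstrip()
    while pos < len(expr):
        match = TOKEN.match(expr, pos)
        if match is None:
            raise ValueError(f"unexpected character near position {pos}")
        tokens.append(match.group(1))
        pos = match.end()
    return tokens


def evaluate(expr: str, present: set[str]) -> bool:
    tokens = tokenize(expr)
    pos = 0

    def parse_level() -> bool:
        # Operands at one bracket level, joined by a single kind of operator.
        nonlocal pos
        values = [parse_operand()]
        op = None
        while pos < len(tokens) and tokens[pos] in ("&", "|"):
            if op is not None and tokens[pos] != op:
                raise ValueError("mixing '&' and '|' requires explicit grouping with [...]")
            op = tokens[pos]
            pos += 1
            values.append(parse_operand())
        return any(values) if op == "|" else all(values)

    def parse_operand() -> bool:
        # A gene ID, or a bracketed group that is parsed recursively.
        nonlocal pos
        if pos >= len(tokens):
            raise ValueError("unexpected end of expression")
        token = tokens[pos]
        pos += 1
        if token == "[":
            value = parse_level()
            if pos >= len(tokens) or tokens[pos] != "]":
                raise ValueError("missing closing ']'")
            pos += 1
            return value
        if token in ("]", "&", "|"):
            raise ValueError(f"unexpected token {token!r}")
        return token in present

    result = parse_level()
    if pos != len(tokens):
        raise ValueError(f"unexpected token {tokens[pos]!r}")
    return result


present = {"K00370", "K00371"}
print(evaluate("[K00370 & K00371] | K02567", present))  # True
print(evaluate("K02567 | [K00370 & K99999]", present))  # False
```

Each bracket level is parsed independently, so `[A & B] | C` is accepted while `A & B | C` is rejected with a `ValueError`.
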
### Step Expressions

Comma-separated expressions define **step rules**:

```text
KXXXXX & KXXXXX, KXXXXX | KXXXXX
```

Each comma-separated expression is one step, and together the steps represent a sequence of evaluations, such as the steps of a pathway. Step lists can be passed to functions that operate on multiple steps, for example to check what percentage of steps is satisfied.

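Splitting a step expression takes slightly more care than splitting on every comma, because commas also appear inside function calls and bracketed step lists (as in `percent(60, [A & B, C | D, E])`). A minimal sketch, assuming that only commas at bracket depth zero separate top-level steps; `split_steps` is a hypothetical helper:

```python
def split_steps(expr: str) -> list[str]:
    """Split on commas that are not nested inside [...] or (...)."""
    steps, depth, current = [], 0, []
    for ch in expr:
        if ch in "[(":
            depth += 1
        elif ch in "])":
            depth -= 1
            if depth < 0:
                raise ValueError("unbalanced closing bracket")
        if ch == "," and depth == 0:
            steps.append("".join(current).strip())
            current = []
        else:
            current.append(ch)
    if depth != 0:
        raise ValueError("unbalanced opening bracket")
    steps.append("".join(current).strip())
    return steps


print(split_steps("K00370 & K00371, K02567 | K02568"))
# ['K00370 & K00371', 'K02567 | K02568']
print(split_steps("[A & B] | C, at_least(1, [D, E])"))
# ['[A & B] | C', 'at_least(1, [D, E])']
```

Each resulting step can then be evaluated on its own, yielding one true/false value per step.
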
---

## Core Design Principle: No Implicit Boolean Precedence

DRAM **does not allow mixing `|` (OR) and `&` (AND) operators without explicit grouping**.

This means expressions like:

```text
A | B | C & D
```

are **invalid** and will result in a parsing error.

Instead, users must state the grouping they intend, for example:

```text
A | B | [C & D]
```

or, with a different meaning:

```text
[A | B | C] & D
```

### Why this matters

In many languages, `&` has higher precedence than `|`, but this convention is:

* Often misunderstood
* Easy to misread in complex biological rules
* A frequent source of subtle bugs

DRAM therefore **disallows implicit precedence** and requires explicit grouping with brackets (`[...]`) whenever boolean operators are mixed.

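The risk is easy to quantify. This sketch enumerates every presence/absence combination of four genes and compares the two groupings a reader might have in mind for `A | B | C & D`; the function names are illustrative only.

```python
from itertools import product


def precedence_reading(a: bool, b: bool, c: bool, d: bool) -> bool:
    # A | B | [C & D]: the conventional "& binds tighter" interpretation
    return a or b or (c and d)


def left_to_right_reading(a: bool, b: bool, c: bool, d: bool) -> bool:
    # [A | B | C] & D: applying the operators strictly left to right
    return (a or b or c) and d


disagreements = [
    combo
    for combo in product([False, True], repeat=4)
    if precedence_reading(*combo) != left_to_right_reading(*combo)
]
print(f"{len(disagreements)} of 16 combinations disagree")  # 6 of 16 combinations disagree
print(disagreements[0])  # (False, True, False, False): only B is present
```

The two readings disagree whenever A or B is present and D is absent, so an ungrouped rule could report a trait as present or absent depending purely on how it was read.
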
---

## Functions

Functions operate on expressions or values and may appear anywhere an expression is allowed.

### General Form

```text
function_name(arg1, arg2, ...)
```

Arguments may be:

* Expressions
* Numbers
* Strings
* Identifiers

---

### Available Functions

- `not`
- `percent`
- `at_least`
- `column_count_values`

#### not

Negates a boolean expression.

```text
not(A & B)
```

#### percent

Calculates the percentage of steps that are satisfied and compares it against a threshold. Returns true if the percentage of satisfied steps meets or exceeds the threshold.

```text
percent(60, [A & B, C | D, E])
```

#### at_least

Checks whether at least a given number of steps are satisfied.

```text
at_least(2, [A & B, C | D, E])
```

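Once each step has been evaluated to true or false, both step functions reduce to counting. A minimal sketch, assuming the `percent` threshold is on a 0-100 scale as in `percent(60, ...)`; rejecting an empty step list is an assumption about edge-case behavior, not documented DRAM behavior.

```python
def percent(threshold: float, step_results: list[bool]) -> bool:
    """True if at least `threshold` percent of the steps are satisfied."""
    if not step_results:
        raise ValueError("percent() needs at least one step")
    satisfied_pct = 100 * sum(step_results) / len(step_results)
    return satisfied_pct >= threshold


def at_least(n: int, step_results: list[bool]) -> bool:
    """True if at least `n` steps are satisfied."""
    return sum(step_results) >= n


# Steps [A & B, C | D, E], where only the first two steps hold:
steps = [True, True, False]
print(percent(60, steps))  # True: 2 of 3 steps is about 66.7%, which meets 60
print(at_least(2, steps))  # True
print(at_least(3, steps))  # False
```
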
#### column_count_values

Counts how many values in a column satisfy a comparison, then checks whether that count satisfies a second comparison. Returns true if the number of values passing `value_op` itself passes `count_op`; otherwise returns false.

```text
column_count_values("column_name", "value_op", value_threshold, "count_op", count_threshold)
```

| Parameter         | Description                                                                                                   |
| ----------------- | ------------------------------------------------------------------------------------------------------------- |
| `column_name`     | Name of the column to analyze                                                                                 |
| `value_op`        | Operation comparing each column value against `value_threshold` (e.g. `gt`, `ge`, `lt`, `le`, `eq`, `ne`)      |
| `value_threshold` | Threshold that column values are compared against using `value_op`                                           |
| `count_op`        | Operation comparing the number of matching values against `count_threshold` (e.g. `gt`, `ge`, `lt`, `le`, `eq`, `ne`) |
| `count_threshold` | Threshold that the count of matching values is compared against using `count_op`                             |

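To make the two-stage comparison concrete, here is a minimal Python sketch of the semantics described above. It is not DRAM's implementation: the data is passed in explicitly as a hypothetical extra `rows` argument (a list of dicts), and rows with a missing or non-numeric value are assumed to be skipped.

```python
import operator

# Comparison names as listed for value_op and count_op.
OPS = {
    "gt": operator.gt, "ge": operator.ge,
    "lt": operator.lt, "le": operator.le,
    "eq": operator.eq, "ne": operator.ne,
}


def column_count_values(rows, column_name, value_op, value_threshold,
                        count_op, count_threshold) -> bool:
    compare_value = OPS[value_op]
    matching = 0
    for row in rows:
        try:
            value = float(row[column_name])
        except (KeyError, TypeError, ValueError):
            continue  # ASSUMPTION: rows without a usable numeric value are skipped
        if compare_value(value, value_threshold):
            matching += 1
    # Second stage: compare the number of matching rows against count_threshold.
    return OPS[count_op](matching, count_threshold)


rows = [
    {"heme_regulatory_motif_count": 5},
    {"heme_regulatory_motif_count": 4},
    {"heme_regulatory_motif_count": 1},
    {"heme_regulatory_motif_count": 7},
]
# At least 3 genes with >= 4 heme regulatory motifs: 5, 4 and 7 qualify.
print(column_count_values(rows, "heme_regulatory_motif_count", "ge", 4, "ge", 3))  # True
```

The call mirrors `column_count_values(heme_regulatory_motif_count,ge,4,ge,3)` from the piped example: three of the four toy rows have at least 4 motifs, and 3 >= 3.
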
---

### Piped Function Usage

Filter functions can be combined with other functions using the pipe operator `->`, which passes the filtered data on to the function to its right:

```text
not(filter_contains(kegg_description,"nitrate reductase")) -> column_count_values(heme_regulatory_motif_count,ge,4,ge,3)
```

Here, `not` inverts the filter, so `column_count_values` counts only genes whose `kegg_description` does **not** contain "nitrate reductase".

Filter functions prefilter the data before counting or other operations are applied. All filter functions live in the `filter_` namespace:

- `filter_contains`
- `filter_compare`

#### filter_contains

Checks whether a specified column contains a given substring.

```text
filter_contains(column_name, substring)
```

#### filter_compare

Compares values in a specified column against a threshold using a given operation.

```text
filter_compare(column_name, operation, threshold)
```

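The pipe can be pictured as function composition over rows. In this minimal sketch (an assumption, not DRAM's implementation), each DSL function returns a function over a list of row dicts, `negate` stands in for `not` (a reserved word in Python), the hypothetical `pipe` helper stands in for `->`, and the rows are toy data.

```python
import operator

# Comparison names as listed for column_count_values.
OPS = {
    "gt": operator.gt, "ge": operator.ge,
    "lt": operator.lt, "le": operator.le,
    "eq": operator.eq, "ne": operator.ne,
}


def filter_contains(column_name, substring):
    """Keep rows whose column contains `substring`."""
    return lambda rows: [r for r in rows if substring in str(r.get(column_name, ""))]


def filter_compare(column_name, operation, threshold):
    """Keep rows whose column value satisfies `operation` against `threshold`."""
    return lambda rows: [
        r for r in rows
        if r.get(column_name) is not None and OPS[operation](r[column_name], threshold)
    ]


def negate(row_filter):
    """not(...) applied to a filter: keep exactly the rows the filter drops."""
    def apply(rows):
        kept = {id(r) for r in row_filter(rows)}
        return [r for r in rows if id(r) not in kept]
    return apply


def column_count_values(column_name, value_op, value_threshold, count_op, count_threshold):
    """Count rows passing the value test, then compare that count."""
    def apply(rows):
        matching = sum(
            1 for r in rows
            if r.get(column_name) is not None
            and OPS[value_op](r[column_name], value_threshold)
        )
        return OPS[count_op](matching, count_threshold)
    return apply


def pipe(row_filter, consumer):
    """Hypothetical stand-in for `->`: feed the filtered rows to the next function."""
    return lambda rows: consumer(row_filter(rows))


rows = [
    {"kegg_description": "nitrate reductase alpha subunit", "heme_regulatory_motif_count": 6},
    {"kegg_description": "multiheme cytochrome c", "heme_regulatory_motif_count": 8},
    {"kegg_description": "multiheme cytochrome c", "heme_regulatory_motif_count": 4},
    {"kegg_description": "hypothetical protein", "heme_regulatory_motif_count": 0},
]
count_rule = column_count_values("heme_regulatory_motif_count", "ge", 4, "ge", 3)
rule = pipe(negate(filter_contains("kegg_description", "nitrate reductase")), count_rule)

print(count_rule(rows))  # True: 6, 8 and 4 all reach 4 motifs
print(rule(rows))        # False: without the nitrate reductase row, only 8 and 4 remain
```

The contrast between the two results shows why prefiltering matters: the same counting rule flips once the excluded genes are removed.
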
---

## Valid vs Invalid Examples

### ✅ Valid

```text
A | B | C
A & B & C
[A & B] | C
A | [B & C]
filter_contains(col, "substring A") -> column_count_values(col,ge,2,ge,1)
```

---

### ❌ Invalid

```text
A | B | C & D
A & B | C
A | B & C | D
[A, B, C] -> percent(50)
```

The first three mix `&` and `|` at the same level without brackets. The last pipes a step list into `percent`, which instead takes its steps as its second argument: `percent(50, [A, B, C])`.