diff --git a/README.md b/README.md index dd5ba60..908b0a1 100644 --- a/README.md +++ b/README.md @@ -332,7 +332,7 @@ Lockstep versioning — one dispatch bumps every product to the same `vX.Y.Z`. T - [ ] *(deferred to Phase 8)* Full-text search with BM25 + hybrid retrieval **Possible extras** *(no committed phase)* -- `HAVING`, `IN (subquery)`, `BETWEEN`, `GLOB` / `REGEXP`, `GROUP_CONCAT`, window functions +- `IN (subquery)`, `BETWEEN`, `GLOB` / `REGEXP`, `GROUP_CONCAT`, window functions - Composite and expression indexes (with cost analysis) - Alternate storage engines — LSM/SSTable for write-heavy workloads alongside the B-Tree - Benchmarks against SQLite diff --git a/docs/design-decisions.md b/docs/design-decisions.md index 5ef3bf4..87bd714 100644 --- a/docs/design-decisions.md +++ b/docs/design-decisions.md @@ -398,7 +398,7 @@ A hidden primary that the registry itself owns sidesteps both problems: every `* **Decision.** In [`eval_predicate`](../src/sql/executor.rs), a `WHERE` expression evaluating to `NULL` is treated as `false` — the row does *not* match. -**Why.** Matches SQL's three-valued logic in spirit: `NULL` propagates through comparisons, and a `WHERE` requires a definitely-true predicate. Doing strict 3VL would mean threading an explicit `Option` / "unknown" state through the evaluator. For a query surface that doesn't have `HAVING` or aggregate post-filters, implicit coercion to `false` at the `WHERE` boundary is equivalent for every statement we execute. +**Why.** Matches SQL's three-valued logic in spirit: `NULL` propagates through comparisons, and a `WHERE` requires a definitely-true predicate. Doing strict 3VL would mean threading an explicit `Option` / "unknown" state through the evaluator. Implicit coercion to `false` at the filter boundary is equivalent for every statement we execute; `HAVING` (SQLR-52) reuses the same collapse — a group whose predicate evaluates to `NULL` is dropped. **Cost.** Diverges subtly from strict SQL on edge cases involving `NULL` through `NOT` / `AND` / `OR`. If this matters later, the evaluator can be upgraded to 3VL without touching callers. diff --git a/docs/roadmap.md b/docs/roadmap.md index 8c95d6b..1e92747 100644 --- a/docs/roadmap.md +++ b/docs/roadmap.md @@ -734,7 +734,7 @@ Promotes the plan to a canonical user-facing reference at [`docs/concurrent-writ The remaining items — actually open, not retroactively rewritten: - Subqueries (scalar, `IN (SELECT ...)`, correlated) and CTEs (`WITH`, recursive) -- `HAVING` (post-aggregation filter) +- ~~`HAVING` (post-aggregation filter)~~ ✅ Shipped (SQLR-52) — group-row filter after aggregation; references GROUP BY keys, aggregate aliases, and direct aggregate calls (hidden-slot computation for HAVING-only aggregates). `HAVING` without `GROUP BY` stays rejected in v0. - `CASE WHEN … THEN … END`, `BETWEEN`, `GLOB`, `REGEXP`, `LIKE … ESCAPE ''` - Aggregates / `GROUP BY` / `DISTINCT` *over* joins (needs a single executor pass that knows about multiple input streams) - Multi-column / expression `ORDER BY`, `OFFSET`, `NULLS FIRST/LAST` diff --git a/docs/sql-engine.md b/docs/sql-engine.md index 72dc8b8..6b30299 100644 --- a/docs/sql-engine.md +++ b/docs/sql-engine.md @@ -166,14 +166,20 @@ contributes to its group), so the executor takes a separate path: 4. Emit one output row per group, in projection order — bare-column slots emit the captured group-key value, aggregate slots emit `AggState::finalize()`. -5. Apply DISTINCT (post-projection dedup), then ORDER BY (resolved +5. Apply `HAVING` (SQLR-52): the expression is lowered once — aggregate + calls become identifiers naming their output slot, with *hidden* + trailing slots appended for aggregates / GROUP BY keys referenced + only in HAVING — then evaluated per group row through a + `GroupRowScope` using the same expression evaluator as WHERE + (NULL-as-false). Hidden slots are stripped after filtering. +6. Apply DISTINCT (post-projection dedup), then ORDER BY (resolved against the *output* row by alias, bare column name, or aggregate display form), then LIMIT. Aggregate function names (`COUNT`/`SUM`/`AVG`/`MIN`/`MAX`) used in WHERE or any other scalar position get a friendly error redirecting the user -to the projection list (since `HAVING` isn't supported yet). DISTINCT -on `SUM`/`AVG`/`MIN`/`MAX` is rejected at parse time; only +to the projection list (`HAVING` is where post-aggregate filters go). +DISTINCT on `SUM`/`AVG`/`MIN`/`MAX` is rejected at parse time; only `COUNT(DISTINCT col)` is in v1. `LIKE` / `ILIKE` use a hand-rolled iterative two-pointer matcher in diff --git a/docs/supported-sql.md b/docs/supported-sql.md index abbd4a7..b659f68 100644 --- a/docs/supported-sql.md +++ b/docs/supported-sql.md @@ -184,6 +184,7 @@ FROM [AS ] [{INNER | LEFT [OUTER] | RIGHT [OUTER] | FULL [OUTER]} JOIN
[AS ] ON ]* [WHERE ] [GROUP BY [, , ...]] + [HAVING ] [ORDER BY [ASC|DESC]] [LIMIT ]; ``` @@ -204,6 +205,7 @@ COUNT([DISTINCT] ) -- counts non-NULL values, option - **`WHERE`**: any [expression](#expressions). Evaluated per row; NULL-as-false in WHERE context (three-valued logic collapsed to two-valued for filtering). Includes **`IS NULL`** / **`IS NOT NULL`** for explicit null tests, **`LIKE` / `NOT LIKE` / `ILIKE`** for pattern matching, and **`IN (list) / NOT IN (list)`** for set-membership against literal lists. - **`DISTINCT`**: `SELECT DISTINCT` deduplicates result rows after projection (and after aggregation, when both apply). `NULL` values compare equal to other `NULL`s for dedupe, matching SQL's DISTINCT semantic. - **`GROUP BY`**: one or more bare column names. Every non-aggregate item in the projection must appear in the `GROUP BY` list (the parser rejects the violation with a clear message). `GROUP BY ` without any aggregate behaves like an implicit `DISTINCT `. +- **`HAVING`** (SQLR-52): post-aggregation filter over the grouped output. `WHERE` filters rows before grouping; `HAVING` filters groups after aggregation. Requires `GROUP BY` (see [HAVING semantics](#having-semantics-sqlr-52)). - **Aggregates** (SQLR-3): `COUNT(*)`, `COUNT(col)`, `COUNT(DISTINCT col)`, `SUM(col)`, `AVG(col)`, `MIN(col)`, `MAX(col)`. `SUM` over an integer column stays `INTEGER` until a `REAL` input arrives or the running sum overflows `i64` (one-time promotion to `REAL`). `AVG` always returns `REAL` (or `NULL` on empty / all-NULL groups). `MIN` / `MAX` skip NULLs and use the same total order as `ORDER BY`. Aggregates over an empty table or empty group return `0` for `COUNT(*)` / `COUNT(col)` and `NULL` for the rest. - **`ORDER BY`**: single sort key, `ASC` (default) or `DESC`. For non-aggregating queries the key is any expression — including function calls — so KNN queries like `ORDER BY vec_distance_l2(embedding, [...]) LIMIT k` work end-to-end *(Phase 7b)*. For aggregating queries the key resolves against the *output* row by name: a bare identifier matches an alias or a `GROUP BY` column, and a function call like `COUNT(*)` matches an aggregate projection by its canonical display form. Sort key types must match across rows. - **`LIMIT`**: non-negative integer literal. `LIMIT 0` is valid (returns zero rows). When `DISTINCT` is in play, `LIMIT` is applied after deduplication so it counts unique rows. @@ -260,12 +262,28 @@ The executor includes a tiny optimizer: if the `WHERE` is exactly ` - Three-valued logic: if the LHS is `NULL`, the result is `NULL`; if the RHS list contains a `NULL` and no other entry matches, the result is `NULL`. In a `WHERE` both cases collapse to "row excluded", matching SQLite. - `IN (subquery)`, `IN UNNEST(...)`, and `BETWEEN` are not supported yet. +### `HAVING` semantics (SQLR-52) + +- Post-aggregation filter: groups whose `HAVING` expression evaluates to false or `NULL` are dropped (NULL-as-false, the same three-valued-logic collapse `WHERE` applies). +- **Requires `GROUP BY`.** The degenerate no-`GROUP-BY` single-group form SQLite allows is rejected with a clear `NotImplemented` — use `WHERE` for row-level filters. +- **What's in scope:** the `GROUP BY` key columns (their per-group values), aggregate output columns by alias (`SUM(salary) AS total … HAVING total > 100`), and aggregate calls written out directly (`HAVING COUNT(*) > 1`, matched case-insensitively by canonical display form). +- Aggregates and `GROUP BY` keys referenced **only** in `HAVING` work too — `SELECT dept FROM emp GROUP BY dept HAVING COUNT(*) > 1` computes the count without projecting it. +- Any other column reference is an error (matches SQLite: `HAVING` sees the grouped output, not the raw rows). +- The expression surface is the same as `WHERE`: comparisons, `AND` / `OR` / `NOT`, arithmetic, `IS [NOT] NULL`, `LIKE`, `IN (list)`. +- Runs after `WHERE` + aggregation, before `DISTINCT`, `ORDER BY`, and `LIMIT`. + +```sql +SELECT dept, COUNT(*) FROM emp GROUP BY dept HAVING COUNT(*) > 1; +SELECT dept, SUM(salary) AS total FROM emp GROUP BY dept HAVING total > 100000; +SELECT dept FROM emp GROUP BY dept HAVING COUNT(*) > 1 AND SUM(salary) > 100; +``` + ### What doesn't work - **Comma-separated FROM lists** (`FROM a, b`) — use an explicit `JOIN` / `CROSS JOIN`. `INNER` / `LEFT` / `RIGHT` / `FULL OUTER` / `CROSS` with `ON` / `USING` / `NATURAL` are all supported (see [JOIN semantics](#join-semantics-sqlr-5)) - **Aggregates** / **`GROUP BY`** / **`DISTINCT`** over a JOIN — pipe through a subquery once subqueries land - **Subqueries**, CTEs (`WITH`), views -- **`HAVING`** — pre-aggregation `WHERE` works; post-aggregation filtering does not yet +- **`HAVING` without `GROUP BY`** — the degenerate single-group form is rejected; `HAVING` with `GROUP BY` works (see [HAVING semantics](#having-semantics-sqlr-52)) - **`DISTINCT`** on `SUM` / `AVG` / `MIN` / `MAX` (only `COUNT(DISTINCT col)` is supported) - **`GROUP BY` on expressions** — bare column names only in v1 - **`LIKE … ESCAPE ''`**, **`IN (subquery)`**, **`BETWEEN`**, **`GLOB`**, **`REGEXP`** @@ -715,7 +733,7 @@ For context when you hit `NotImplemented`. See [Roadmap](roadmap.md) for when th - Views (`CREATE VIEW`) ### Aggregation & grouping -- `HAVING` — pre-aggregation `WHERE` works; post-aggregation filtering doesn't yet +- `HAVING` without `GROUP BY` — the degenerate single-group form; `HAVING` over grouped output works (SQLR-52) - `DISTINCT` on `SUM` / `AVG` / `MIN` / `MAX` (only `COUNT(DISTINCT col)` is supported) - `GROUP BY` on expressions — bare column names only - Other aggregate functions (`GROUP_CONCAT`, `STRING_AGG`, …) — only `COUNT` / `SUM` / `AVG` / `MIN` / `MAX` are wired diff --git a/src/sql/executor.rs b/src/sql/executor.rs index 6a65ab6..3d44b93 100644 --- a/src/sql/executor.rs +++ b/src/sql/executor.rs @@ -21,8 +21,8 @@ use crate::sql::db::table::{ use crate::sql::fts::{Bm25Params, PostingList}; use crate::sql::hnsw::{DistanceMetric, HnswIndex}; use crate::sql::parser::select::{ - AggregateArg, JoinConstraintKind, JoinType, OrderByClause, Projection, ProjectionItem, - ProjectionKind, SelectQuery, + AggregateArg, AggregateFn, JoinConstraintKind, JoinType, OrderByClause, Projection, + ProjectionItem, ProjectionKind, SelectQuery, parse_aggregate_call, }; // ----------------------------------------------------------------- @@ -279,8 +279,40 @@ pub fn execute_select_rows(query: SelectQuery, db: &Database) -> Result 1`). + // We append those as *hidden* trailing projection slots so the + // existing `aggregate_rows` accumulator computes them alongside + // the visible ones, then strip them after filtering. Aggregate + // calls in the HAVING tree are lowered to identifiers naming + // their output slot (`COUNT(*)` → identifier "COUNT(*)") so the + // shared expression evaluator can resolve them through a + // `GroupRowScope` like any other column. + let mut all_items = proj_items.clone(); + let having_expr = match &query.having { + Some(h) => { + for g in &query.group_by { + if !all_items + .iter() + .any(|i| i.output_name().eq_ignore_ascii_case(g)) + { + all_items.push(ProjectionItem { + kind: ProjectionKind::Column { + qualifier: None, + name: g.clone(), + }, + alias: None, + }); + } + } + Some(lower_having_expr(h, &mut all_items)?) + } + None => None, + }; + + // Validate aggregate column args (visible + HAVING-hidden). + for item in &all_items { if let ProjectionKind::Aggregate(call) = &item.kind && let AggregateArg::Column(c) = &call.arg && !table.contains_column(c.clone()) @@ -295,7 +327,18 @@ pub fn execute_select_rows(query: SelectQuery, db: &Database) -> Result = proj_items.iter().map(|i| i.output_name()).collect(); - let mut rows = aggregate_rows(table, &matching, &query.group_by, &proj_items)?; + let mut rows = aggregate_rows(table, &matching, &query.group_by, &all_items)?; + + if let Some(h) = &having_expr { + let all_columns: Vec = all_items.iter().map(|i| i.output_name()).collect(); + rows = filter_groups_by_having(rows, h, &all_columns)?; + } + // Drop the hidden HAVING-only slots back to the user-visible width. + if all_items.len() > proj_items.len() { + for row in &mut rows { + row.truncate(proj_items.len()); + } + } if query.distinct { rows = dedupe_rows(rows); @@ -3335,6 +3378,165 @@ fn aggregate_rows( Ok(rows) } +// ----------------------------------------------------------------- +// SQLR-52 — HAVING (post-aggregation filter) +// ----------------------------------------------------------------- + +/// Scope for evaluating a HAVING expression against one group's output +/// row. Column references resolve against the output column names — +/// GROUP BY keys, aggregate aliases, aggregate display forms like +/// `COUNT(*)` (the lowered shape `lower_having_expr` produces), and +/// the hidden HAVING-only slots appended by the executor. +struct GroupRowScope<'a> { + columns: &'a [String], + values: &'a [Value], +} + +impl RowScope for GroupRowScope<'_> { + fn lookup(&self, qualifier: Option<&str>, col: &str) -> Result { + // Output columns carry no table qualifier — `t.dept` in HAVING + // resolves by its column part, same as the aggregating ORDER BY. + let _ = qualifier; + self.columns + .iter() + .position(|c| c.eq_ignore_ascii_case(col)) + .map(|i| self.values[i].clone()) + .ok_or_else(|| { + SQLRiteError::Internal(format!( + "HAVING references '{col}', which is neither a GROUP BY column nor an \ + aggregate in scope" + )) + }) + } + + fn single_table_view(&self) -> Option<(&Table, i64)> { + None + } +} + +/// Rewrite a HAVING expression for group-row evaluation: every +/// aggregate call in the tree becomes an identifier naming its output +/// slot (`SUM(salary)` → identifier `"SUM(salary)"`), registering a +/// hidden projection slot for any aggregate not already in the SELECT +/// list so `aggregate_rows` computes it. Non-aggregate functions and +/// leaf expressions pass through untouched — the shared evaluator +/// handles (or rejects) them at filter time. +fn lower_having_expr(expr: &Expr, items: &mut Vec) -> Result { + Ok(match expr { + Expr::Function(func) => { + let is_aggregate = matches!( + func.name.0.as_slice(), + [ObjectNamePart::Identifier(ident)] if AggregateFn::from_name(&ident.value).is_some() + ); + if !is_aggregate { + return Ok(expr.clone()); + } + let call = parse_aggregate_call(func)?; + let display = call.display_name(); + // Resolvable already? Identifier lookup goes by output + // column name, so an unaliased projection of the same + // aggregate (output name == display form) suffices. An + // *aliased* one doesn't — its output name is the alias — + // so the call still gets a hidden slot of its own. + let already_known = items + .iter() + .any(|i| i.output_name().eq_ignore_ascii_case(&display)); + if !already_known { + items.push(ProjectionItem { + kind: ProjectionKind::Aggregate(call), + alias: None, + }); + } + Expr::Identifier(Ident::new(display)) + } + Expr::Nested(inner) => Expr::Nested(Box::new(lower_having_expr(inner, items)?)), + Expr::UnaryOp { op, expr: inner } => Expr::UnaryOp { + op: *op, + expr: Box::new(lower_having_expr(inner, items)?), + }, + Expr::BinaryOp { left, op, right } => Expr::BinaryOp { + left: Box::new(lower_having_expr(left, items)?), + op: op.clone(), + right: Box::new(lower_having_expr(right, items)?), + }, + Expr::IsNull(inner) => Expr::IsNull(Box::new(lower_having_expr(inner, items)?)), + Expr::IsNotNull(inner) => Expr::IsNotNull(Box::new(lower_having_expr(inner, items)?)), + Expr::InList { + expr: lhs, + list, + negated, + } => Expr::InList { + expr: Box::new(lower_having_expr(lhs, items)?), + list: list + .iter() + .map(|e| lower_having_expr(e, items)) + .collect::>>()?, + negated: *negated, + }, + Expr::Like { + negated, + any, + expr: lhs, + pattern, + escape_char, + } => Expr::Like { + negated: *negated, + any: *any, + expr: Box::new(lower_having_expr(lhs, items)?), + pattern: Box::new(lower_having_expr(pattern, items)?), + escape_char: escape_char.clone(), + }, + Expr::ILike { + negated, + any, + expr: lhs, + pattern, + escape_char, + } => Expr::ILike { + negated: *negated, + any: *any, + expr: Box::new(lower_having_expr(lhs, items)?), + pattern: Box::new(lower_having_expr(pattern, items)?), + escape_char: escape_char.clone(), + }, + // Leaves (identifiers, literals) and unsupported shapes pass + // through; the evaluator produces its own error for the latter. + other => other.clone(), + }) +} + +/// Keep only the groups whose HAVING expression evaluates truthy. +/// NULL collapses to false — same three-valued-logic coercion the +/// WHERE path applies. +fn filter_groups_by_having( + rows: Vec>, + having: &Expr, + columns: &[String], +) -> Result>> { + let mut out = Vec::with_capacity(rows.len()); + for row in rows { + let scope = GroupRowScope { + columns, + values: &row, + }; + let keep = match eval_expr_scope(having, &scope)? { + Value::Bool(b) => b, + Value::Null => false, + Value::Integer(i) => i != 0, + other => { + return Err(SQLRiteError::Internal(format!( + "HAVING clause must evaluate to boolean, got {}", + other.to_display_string() + ))); + } + }; + if keep { + out.push(row); + } + } + Ok(out) +} + /// SELECT DISTINCT post-pass. Walks the rows once with a `HashSet` of /// row-keys, preserving first-occurrence order. NULL == NULL for /// dedupe purposes, which matches the SQL DISTINCT semantic. @@ -4231,6 +4433,226 @@ mod tests { assert!(err.is_err(), "aggregates must not be allowed in WHERE"); } + // --------------------------------------------------------------------- + // SQLR-52 — HAVING (post-aggregation filter) + // --------------------------------------------------------------------- + // + // seed_employees groups: eng × 3 (salaries 100, 120, 100 → SUM 320), + // sales × 2 (90, NULL → SUM 90), ops × 1 (80). Groups materialize in + // first-occurrence order: eng, sales, ops. + + #[test] + fn having_count_filters_groups() { + let db = seed_employees(); + let r = run_rows( + &db, + "SELECT dept, COUNT(*) FROM emp GROUP BY dept HAVING COUNT(*) > 1;", + ); + // ops (1 member) drops; hidden HAVING slots must not leak into + // the output width. + assert_eq!(r.columns, vec!["dept".to_string(), "COUNT(*)".to_string()]); + let got: Vec<(String, i64)> = r + .rows + .iter() + .map(|row| (row[0].to_display_string(), expect_int(&row[1]))) + .collect(); + assert_eq!(got, vec![("eng".to_string(), 3), ("sales".to_string(), 2)]); + } + + #[test] + fn having_sum_threshold() { + let db = seed_employees(); + let r = run_rows( + &db, + "SELECT dept, SUM(salary) FROM emp GROUP BY dept HAVING SUM(salary) > 100;", + ); + assert_eq!(r.rows.len(), 1); + assert_eq!(r.rows[0][0].to_display_string(), "eng"); + assert_eq!(r.rows[0][1], Value::Integer(320)); + } + + #[test] + fn having_references_aggregate_alias() { + let db = seed_employees(); + let r = run_rows( + &db, + "SELECT dept, SUM(salary) AS total FROM emp GROUP BY dept HAVING total > 100;", + ); + assert_eq!(r.columns, vec!["dept".to_string(), "total".to_string()]); + assert_eq!(r.rows.len(), 1); + assert_eq!(r.rows[0][1], Value::Integer(320)); + } + + #[test] + fn having_aggregate_not_in_projection() { + let db = seed_employees(); + // COUNT(*) only exists in HAVING — computed via a hidden slot, + // stripped before output. + let r = run_rows( + &db, + "SELECT dept FROM emp GROUP BY dept HAVING COUNT(*) > 1;", + ); + assert_eq!(r.columns, vec!["dept".to_string()]); + let depts: Vec = r + .rows + .iter() + .map(|row| row[0].to_display_string()) + .collect(); + assert_eq!(depts, vec!["eng".to_string(), "sales".to_string()]); + } + + #[test] + fn having_group_key_not_in_projection() { + let db = seed_employees(); + // dept only exists in GROUP BY + HAVING, not the SELECT list. + let r = run_rows( + &db, + "SELECT COUNT(*) FROM emp GROUP BY dept HAVING dept = 'eng';", + ); + assert_eq!(r.columns, vec!["COUNT(*)".to_string()]); + assert_eq!(r.rows.len(), 1); + assert_eq!(r.rows[0][0], Value::Integer(3)); + } + + #[test] + fn having_compound_and_predicate() { + let db = seed_employees(); + let r = run_rows( + &db, + "SELECT dept FROM emp GROUP BY dept \ + HAVING COUNT(*) > 1 AND SUM(salary) > 100;", + ); + // eng passes both; sales passes COUNT but fails SUM (90). + assert_eq!(r.rows.len(), 1); + assert_eq!(r.rows[0][0].to_display_string(), "eng"); + } + + #[test] + fn having_composes_with_order_by_and_limit() { + let db = seed_employees(); + let r = run_rows( + &db, + "SELECT dept, COUNT(*) AS n FROM emp GROUP BY dept \ + HAVING n >= 1 ORDER BY n DESC LIMIT 2;", + ); + let got: Vec<(String, i64)> = r + .rows + .iter() + .map(|row| (row[0].to_display_string(), expect_int(&row[1]))) + .collect(); + assert_eq!(got, vec![("eng".to_string(), 3), ("sales".to_string(), 2)]); + } + + #[test] + fn having_can_exclude_every_group() { + let db = seed_employees(); + let r = run_rows( + &db, + "SELECT dept FROM emp GROUP BY dept HAVING COUNT(*) > 99;", + ); + assert_eq!(r.rows.len(), 0); + } + + #[test] + fn having_null_aggregate_collapses_to_false() { + let mut db = seed_employees(); + // mkt's only salary is NULL → SUM(salary) is NULL → NULL > 0 is + // unknown → group excluded (NULL-as-false, same as WHERE). + crate::sql::process_command( + "INSERT INTO emp (name, dept, salary) VALUES ('Zoe', 'mkt', NULL);", + &mut db, + ) + .unwrap(); + let r = run_rows( + &db, + "SELECT dept FROM emp GROUP BY dept HAVING SUM(salary) > 0;", + ); + let depts: Vec = r + .rows + .iter() + .map(|row| row[0].to_display_string()) + .collect(); + assert_eq!( + depts, + vec!["eng".to_string(), "sales".to_string(), "ops".to_string()], + "mkt (all-NULL salaries) must be filtered out" + ); + } + + #[test] + fn having_lowercase_function_form_matches() { + let db = seed_employees(); + let r = run_rows( + &db, + "SELECT dept FROM emp GROUP BY dept HAVING count(*) > 1;", + ); + assert_eq!(r.rows.len(), 2); + } + + #[test] + fn having_without_group_by_is_rejected() { + let mut db = seed_employees(); + let err = + crate::sql::process_command("SELECT COUNT(*) FROM emp HAVING COUNT(*) > 0;", &mut db); + match err { + Err(SQLRiteError::NotImplemented(msg)) => assert!( + msg.contains("HAVING without GROUP BY"), + "unexpected message: {msg}" + ), + other => panic!("expected NotImplemented, got {other:?}"), + } + } + + #[test] + fn having_unknown_column_is_rejected() { + let mut db = seed_employees(); + // `name` is neither a GROUP BY key nor an aggregate — typed error, + // not a silent NULL like the legacy single-table WHERE leniency. + let err = crate::sql::process_command( + "SELECT dept, COUNT(*) FROM emp GROUP BY dept HAVING name = 'Alice';", + &mut db, + ); + match err { + Err(e) => { + let msg = e.to_string(); + assert!( + msg.contains("HAVING references"), + "unexpected message: {msg}" + ); + } + Ok(_) => panic!("HAVING on an out-of-scope column must error"), + } + } + + #[test] + fn having_over_join_rejected_for_all_flavors() { + // Aggregates / GROUP BY over JOIN results are a separate increment + // (SQLR-6); until it lands, every JOIN flavor + GROUP BY + HAVING + // must surface the clean NotImplemented — no wrong results. + for flavor in ["INNER", "LEFT OUTER", "RIGHT OUTER", "FULL OUTER"] { + let sql = format!( + "SELECT customers.name, COUNT(*) FROM customers \ + {flavor} JOIN orders ON customers.id = orders.customer_id \ + GROUP BY customers.name HAVING COUNT(*) > 1;" + ); + let err = crate::sql::process_command(&sql, &mut seed_join_fixture()); + match err { + Err(SQLRiteError::NotImplemented(msg)) => { + assert!(msg.contains("JOIN"), "{flavor}: unexpected message: {msg}") + } + other => panic!("{flavor}: expected NotImplemented, got {other:?}"), + } + } + } + + /// Helper: unwrap an integer `Value` in HAVING tests. + fn expect_int(v: &Value) -> i64 { + match v { + Value::Integer(i) => *i, + other => panic!("expected integer value, got {other:?}"), + } + } + // --------------------------------------------------------------------- // SQLR-5 — JOINs (INNER / LEFT OUTER / RIGHT OUTER / FULL OUTER) // --------------------------------------------------------------------- diff --git a/src/sql/parser/select.rs b/src/sql/parser/select.rs index 23d5c4a..deabf97 100644 --- a/src/sql/parser/select.rs +++ b/src/sql/parser/select.rs @@ -27,7 +27,7 @@ impl AggregateFn { } } - fn from_name(name: &str) -> Option { + pub(crate) fn from_name(name: &str) -> Option { match name.to_ascii_lowercase().as_str() { "count" => Some(AggregateFn::Count), "sum" => Some(AggregateFn::Sum), @@ -226,6 +226,11 @@ pub struct SelectQuery { pub distinct: bool, /// `GROUP BY a, b` — bare column names. Empty = no GROUP BY. pub group_by: Vec, + /// SQLR-52 — raw sqlparser HAVING expression, evaluated by the + /// executor against each group's output row after aggregation. + /// Parser-level invariant: `Some` implies `group_by` is non-empty + /// (HAVING without GROUP BY is rejected in v0). + pub having: Option, } impl SelectQuery { @@ -272,11 +277,6 @@ impl SelectQuery { )); } }; - if having.is_some() { - return Err(SQLRiteError::NotImplemented( - "HAVING is not supported yet".to_string(), - )); - } // SQLR-3: parse GROUP BY into a list of bare column names. // GroupByExpr::Expressions(v, _) with an empty v is the "no // GROUP BY" shape; non-empty means we've got grouping. Reject @@ -309,6 +309,19 @@ impl SelectQuery { } }; + // SQLR-52 — HAVING is the post-aggregation filter, so it only + // makes sense against grouped output. SQLite allows the + // degenerate no-GROUP-BY single-group form, but the Phase 9e + // executor's grouping pipeline assumes an explicit GROUP BY; + // reject the degenerate shape rather than special-casing it. + if having.is_some() && group_by_cols.is_empty() { + return Err(SQLRiteError::NotImplemented( + "HAVING without GROUP BY is not supported in v0; use WHERE for row-level \ + filters or restructure with a subquery" + .to_string(), + )); + } + let (table_name, table_alias, joins) = extract_from_clause(from)?; let projection = parse_projection(projection)?; let order_by = parse_order_by(order_by.as_ref())?; @@ -364,6 +377,7 @@ impl SelectQuery { limit, distinct: distinct_flag, group_by: group_by_cols, + having: having.clone(), }) } } @@ -577,7 +591,7 @@ fn parse_projection_expr(expr: &Expr, alias: Option) -> Result Result { +pub(crate) fn parse_aggregate_call(func: &sqlparser::ast::Function) -> Result { // Function name: only unqualified names like COUNT(...). Qualified // names like `pkg.fn(...)` are out of scope. let name = match func.name.0.as_slice() { diff --git a/web/src/app/docs/page.tsx b/web/src/app/docs/page.tsx index 0511dda..29cc2dc 100644 --- a/web/src/app/docs/page.tsx +++ b/web/src/app/docs/page.tsx @@ -337,6 +337,7 @@ export default function DocsPage() { FROM employees{"\n"} WHERE active = TRUE{"\n"} GROUP BY dept{"\n"} + HAVING COUNT(*) > 1{"\n"} ORDER BY COUNT(*){" "} DESC; @@ -346,8 +347,10 @@ export default function DocsPage() { NOT LIKE / ILIKE use SQLite-style ASCII case folding.{" "} IN (literal-list) uses three-valued logic.{" "} - HAVING isn’t supported yet — wrap the aggregate - in a subquery once subqueries land. + HAVING filters groups after aggregation — it can + reference GROUP BY columns, aggregate aliases, or + aggregate calls directly (HAVING COUNT(*) > 1), + and requires GROUP BY.

ALTER TABLE / DROP / VACUUM

diff --git a/web/src/components/roadmap.tsx b/web/src/components/roadmap.tsx index 97bd531..0304c2d 100644 --- a/web/src/components/roadmap.tsx +++ b/web/src/components/roadmap.tsx @@ -182,7 +182,7 @@ const PHASES: Phase[] = [ desc: "Smaller, well-scoped follow-ups that slot in where they make sense — see the canonical roadmap doc for the full list.", bullets: [ - "HAVING · BETWEEN · CASE WHEN · scalar functions (LENGTH / UPPER / COALESCE / …)", + "BETWEEN · CASE WHEN · scalar functions (LENGTH / UPPER / COALESCE / …)", "Subqueries + CTEs · GROUP BY / DISTINCT over JOINs · multi-column ORDER BY · OFFSET · UNION", "Composite + expression indexes · CREATE VIEW / TRIGGER · FOREIGN KEY / CHECK", "MVCC checkpoint drain (re-enable Mvcc → Wal downgrade) · indexes under MVCC", diff --git a/web/src/components/sql-ref.tsx b/web/src/components/sql-ref.tsx index 0272e26..3bf7747 100644 --- a/web/src/components/sql-ref.tsx +++ b/web/src/components/sql-ref.tsx @@ -106,6 +106,7 @@ const SQL_REF: Row[] = [ "MIN", "MAX", "GROUP BY
", + "HAVING", ], }, { @@ -169,7 +170,7 @@ const SQL_REF: Row[] = [ const NOT_YET = [ "subqueries", "CTEs (WITH)", - "HAVING", + "HAVING without GROUP BY", "CASE WHEN", "BETWEEN", "GLOB / REGEXP",