
[PySpark] - Add partial support for window function #318

Open
mariotaddeucci wants to merge 5 commits into duckdb:main from mariotaddeucci:feat/pyspark-window-function



@mariotaddeucci mariotaddeucci commented Feb 16, 2026

Initial support for window functions by introducing a PySpark-like Window API and a WindowSpec class.

The implementation is partial. Partitioning, ordering by column name, and frame specification (rowsBetween / rangeBetween) are supported and converted to SQL window clauses. Extracting the ordering direction from Column expressions is not yet implemented (see the TODO in WindowSpec._columns_as_str), so passing Column expressions to orderBy currently raises a ContributionsAcceptedError.

Additionally, a set of functions was added for convenient use with .over(window):

  • row_number
  • rank
  • dense_rank
  • cume_dist
  • percent_rank
  • lag
  • lead
  • nth_value
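For readers less familiar with these functions, the following pure-Python sketch illustrates the standard SQL semantics of a few of them over a single partition that is already sorted in window order. It does not use DuckDB or the API added in this PR; the function names simply reuse the names from the list above, and the `lag` / `lead` helpers assume a non-negative `offset`.

```python
from typing import Any, List, Sequence


def row_number(values: Sequence[Any]) -> List[int]:
    """1-based position of each row, ignoring ties."""
    return list(range(1, len(values) + 1))


def rank(values: Sequence[Any]) -> List[int]:
    """Ties share a rank; the next distinct value skips ahead (1, 2, 2, 4)."""
    out: List[int] = []
    for i, value in enumerate(values):
        if i > 0 and value == values[i - 1]:
            out.append(out[-1])
        else:
            out.append(i + 1)
    return out


def dense_rank(values: Sequence[Any]) -> List[int]:
    """Ties share a rank and no ranks are skipped (1, 2, 2, 3)."""
    out: List[int] = []
    for i, value in enumerate(values):
        if i == 0:
            out.append(1)
        elif value == values[i - 1]:
            out.append(out[-1])
        else:
            out.append(out[-1] + 1)
    return out


def lag(values: Sequence[Any], offset: int = 1, default: Any = None) -> List[Any]:
    """Value `offset` rows before the current row, or `default` if none exists."""
    return [values[i - offset] if i - offset >= 0 else default for i in range(len(values))]


def lead(values: Sequence[Any], offset: int = 1, default: Any = None) -> List[Any]:
    """Value `offset` rows after the current row, or `default` if none exists."""
    n = len(values)
    return [values[i + offset] if i + offset < n else default for i in range(n)]


scores = [10, 20, 20, 30]  # one partition, already ordered
print(row_number(scores))  # [1, 2, 3, 4]
print(rank(scores))        # [1, 2, 2, 4]
print(dense_rank(scores))  # [1, 2, 2, 3]
print(lag(scores))         # [None, 10, 20, 20]
print(lead(scores))        # [20, 20, 30, None]
```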

@mariotaddeucci mariotaddeucci changed the title [PySpark] - Add window function support [wip][PySpark] - Add window function support Feb 17, 2026

I don't know how to do it yet, but the next step is to implement proper handling of Column expressions in ordering (including asc/desc).

@mariotaddeucci mariotaddeucci marked this pull request as ready for review February 17, 2026 15:23
@mariotaddeucci mariotaddeucci changed the title [wip][PySpark] - Add window function support [PySpark] - Add partial support for window function Feb 17, 2026
@evertlammerts evertlammerts force-pushed the feat/pyspark-window-function branch from 06feddd to 9718dcd Compare March 18, 2026 17:04
Copilot AI review requested due to automatic review settings March 18, 2026 17:04
Copilot AI left a comment

Copilot was unable to review this pull request because the user who requested the review is ineligible. To be eligible to request a review, you need a paid Copilot license, or your organization must enable Copilot code review.


aaron-ang commented May 11, 2026

Hi @binste, I would love to see this land. Thanks for the foundation here.

If you're not actively working on it, happy to pick it up. I can:

  1. rebase onto current main
  2. resolve the conflict in functions.py (a few new aggregate funcs landed adjacent to this block)
  3. finish the TODO in WindowSpec._columns_as_str so orderBy(col.desc()) / orderBy(col.asc()) works instead of raising ContributionsAcceptedError.

The plan is to introspect the underlying Expression produced by Column.desc() / Column.asc() and emit the matching DESC / ASC (and NULLS FIRST/LAST for *_nulls_* variants) into the SQL OVER() clause.
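The rendering half of that plan can be sketched independently of how the direction is recovered from the underlying Expression. The example below is a hypothetical illustration, not the proposed implementation: `SortTerm`, `term_from_method`, and `render_order_by` are invented names, and the lookup table keys are the PySpark `Column` sort method names (`asc`, `desc`, and the `*_nulls_*` variants). It assumes the introspection step has already produced a column name plus the method that was called, which is exactly the part the TODO leaves open.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class SortTerm:
    """Hypothetical description of one orderBy argument."""

    column: str
    descending: bool = False
    nulls_first: Optional[bool] = None  # None = emit nothing, keep the engine default


# PySpark Column sort method name -> (descending, nulls_first)
_SORT_METHODS = {
    "asc": (False, None),
    "desc": (True, None),
    "asc_nulls_first": (False, True),
    "asc_nulls_last": (False, False),
    "desc_nulls_first": (True, True),
    "desc_nulls_last": (True, False),
}


def term_from_method(column: str, method: str) -> SortTerm:
    """Build a SortTerm from a column name and the sort method applied to it."""
    descending, nulls_first = _SORT_METHODS[method]
    return SortTerm(column, descending, nulls_first)


def render_order_by(*terms: SortTerm) -> str:
    """Render the ORDER BY part of an OVER clause."""
    rendered = []
    for term in terms:
        sql = '"' + term.column.replace('"', '""') + '"'
        sql += " DESC" if term.descending else " ASC"
        if term.nulls_first is not None:
            sql += " NULLS FIRST" if term.nulls_first else " NULLS LAST"
        rendered.append(sql)
    return "ORDER BY " + ", ".join(rendered)


clause = render_order_by(
    term_from_method("salary", "desc_nulls_last"),
    term_from_method("name", "asc"),
)
print(clause)
```

When no explicit null ordering is requested, the sketch emits none and leaves the choice to the engine. It makes no attempt to reconcile default null ordering between Spark and DuckDB, which a real implementation would likely need to decide on.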

@evertlammerts: is this PR still on the roadmap? If a fresh take is welcome, I can put up a continuation PR crediting the original author.


binste commented May 11, 2026

Hi @aaron-ang! Maybe a mix-up: I don't think I was ever involved in this PR (and I'm also not a duckdb maintainer), but I fully agree that it would be great to see this land. Good luck, and thanks for picking it up!


aaron-ang commented May 11, 2026

Hi @binste, apologies for the mix-up. I meant to tag @mariotaddeucci.

4 participants