fix: type scalar UDF returns as Arrow arrays #1528

Open
BharatDeva wants to merge 1 commit into apache:main from BharatDeva:fix/udf-return-type-hints

Which issue does this PR close?

Closes #1516.

Rationale for this change

The scalar UDF typing currently binds the callable return type to pyarrow.DataType. That makes type checkers expect UDF implementations to return a data type object, even though scalar UDF callables return Arrow arrays containing values of the declared return type.

This shows up with the existing examples/python-udf.py pattern, where a function annotated as returning pa.Array is rejected by mypy.

What changes are included in this PR?

This PR updates the scalar UDF type hints in python/datafusion/user_defined.py so that:

  • the UDF callable return type is bounded to pa.Array
  • ScalarUDF.__init__ accepts a pa.Field for the resolved return field
  • the decorator helper accepts the public pa.DataType | pa.Field return-field input

Runtime behavior is unchanged.
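
The effect of changing the bound can be sketched without DataFusion at all. In the snippet below, `DataType`, `Field`, and `Array` are empty stand-in classes for the pyarrow types, and `register_old` / `register_new` are hypothetical helpers, not the actual signatures in python/datafusion/user_defined.py. The point is only to show how a `TypeVar` bound on the callable's return type decides which annotations a type checker accepts.

```python
from __future__ import annotations

from typing import Callable, TypeVar


class DataType:
    """Stand-in for pyarrow.DataType (assumption for this sketch)."""


class Field:
    """Stand-in for pyarrow.Field (assumption for this sketch)."""


class Array:
    """Stand-in for pyarrow.Array (assumption for this sketch)."""


# Old-style bound: the callable is expected to return a data type object.
_ROld = TypeVar("_ROld", bound=DataType)
# New-style bound: the callable is expected to return an array.
_RNew = TypeVar("_RNew", bound=Array)


def register_old(
    func: Callable[..., _ROld], return_type: DataType
) -> Callable[..., _ROld]:
    """Hypothetical registration helper using the old bound."""
    return func


def register_new(
    func: Callable[..., _RNew], return_field: DataType | Field
) -> Callable[..., _RNew]:
    """Hypothetical registration helper using the new bound."""
    return func


def is_null(values: Array) -> Array:
    """Toy UDF body; a real one would compute a boolean array."""
    return Array()


# Accepted by a type checker: Array satisfies the new bound.
accepted = register_new(is_null, DataType())

# Runs fine, but a type checker such as mypy would be expected to flag
# this call, because Array does not satisfy the DataType bound.
flagged = register_old(is_null, DataType())
```

Both calls behave identically at runtime, which matches the claim that only static typing is affected.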

Are these changes tested?

Yes. I ran the following local checks:

  • uv tool run ruff@0.15.1 check --config pyproject.toml python/datafusion/user_defined.py examples/python-udf.py
    • Result: All checks passed!
  • uv tool run ruff@0.15.1 format --check --config pyproject.toml python/datafusion/user_defined.py examples/python-udf.py
    • Result: 2 files already formatted
  • git diff --check
    • Result: passed with no whitespace errors

Are there any user-facing changes?

Yes, for static typing only. User-defined scalar functions that return Arrow arrays should type-check more accurately. There is no runtime API change.

LLM-generated code disclosure

This type-hint update was prepared with assistance from OpenAI Codex and manually reviewed before submission.

Development

Successfully merging this pull request may close these issues.

Type annotation requires UDFs to return a type instead of an array