Skip to content

[c++] Add nullability support for data type in c++#525

Open
charlesdong1991 wants to merge 7 commits intoapache:mainfrom
charlesdong1991:issue-510
Open

[c++] Add nullability support for data type in c++#525
charlesdong1991 wants to merge 7 commits intoapache:mainfrom
charlesdong1991:issue-510

Conversation

@charlesdong1991
Copy link
Copy Markdown
Contributor

Purpose

Preserve nullability when constructing Fluss schemas through the C++ binding. Previously, the C++ DataType class had no nullable flag, so all types defaulted to nullable regardless of what the user specified.

Linked issue: close #510

Tests

All tests have passed locally

Documentation

Docs are updated accordingly.

Copy link
Copy Markdown
Contributor

@leekeiabstraction leekeiabstraction left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

TY for the PR, left a question.

Comment thread bindings/cpp/src/types.rs Outdated
Comment on lines +418 to +419
nullable: true,
element_nullable: true,
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you elaborate why and the impact of hardcoding nullability here? Also other places where element nullability is hardcoded to true.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

hardcoding here is on purpose actually, since the spec struct has 2 nullability concept, where nullable is the nullability of this constructed type where element_nullable is for the leaf element of an array.

and function like build_array_type_from_leaf recursively calls to construct leaf which is scaler, so element_nullable is unused.

ArrayWriter it only needs the type for encoding i believe, and putting default as true is aligned with Fluss default, which means all types are nullable unless explicitly marked.

I think probably best to add some comments in case of questions in future, WDYT? @leekeiabstraction

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think probably best to add some comments in case of questions in future

Sounds good to me.

@charlesdong1991
Copy link
Copy Markdown
Contributor Author

thanks for the question. i replied, let me know if further question @leekeiabstraction

if find time, would be great to give a look @fresh-borzoni @leekeiabstraction 🙏

Copy link
Copy Markdown
Member

@fresh-borzoni fresh-borzoni left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@charlesdong1991 Ty for the PR, overall LGTM
Left some comments, PTAL

Comment thread bindings/cpp/src/types.rs Outdated
Comment thread bindings/cpp/src/lib.rs Outdated
element_data_type: i32,
element_precision: i32,
element_scale: i32,
element_nullable: bool,
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we split FfiDataTypeSpec to Scalar/Array, this field can go away too = just make array_nullability one entry longer (last = leaf)

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yeah, i think it's a nice idea to split as this is added a bit "ugly" for compatibility, i think splitting will be cleaner!

Comment thread bindings/cpp/src/ffi_converter.hpp Outdated
bool nullable = true;
if (static_cast<size_t>(i) < flat.array_nullability.size()) {
nullable = flat.array_nullability[static_cast<size_t>(i)] != 0;
} else if (i == 0) {
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

but do we have the case for this?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this shouldn't be the case anymore after splitting

Comment thread bindings/cpp/src/types.rs Outdated
array_nesting: u32,
array_nullability: Vec<u8>,
nullable: bool,
element_nullable: bool,
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is it only for backward compatibility, mb we can drop it?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes, it's for backward compatibility, let me check this drop option! 👍

@charlesdong1991
Copy link
Copy Markdown
Contributor Author

thanks for great suggestion @fresh-borzoni i split the array from scaler, PTAL!

Copy link
Copy Markdown
Member

@fresh-borzoni fresh-borzoni left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@charlesdong1991 Ty, LGTM 👍

Comment thread bindings/cpp/src/lib.rs
Copy link
Copy Markdown
Member

@fresh-borzoni fresh-borzoni May 7, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: doc comment on structurally_compatible (lib.rs:3617-3618) says "C++ callers never control nullability explicitly" - no longer true after this PR.
Worth rewording to reflect the actual reason we ignore nullability: the Rust-side element type is always reconstructed as nullable (encoding doesn't depend on it), so comparing the bit is meaningless

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch! 👍 rephrased in 62832ae @fresh-borzoni

@fresh-borzoni
Copy link
Copy Markdown
Member

@leekeiabstraction Do you have any further comments on this PR?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[c++] Add nullability support to DataType in C++ binding

3 participants