[c++] Add nullability support for data type in c++#525
[c++] Add nullability support for data type in c++#525charlesdong1991 wants to merge 7 commits intoapache:mainfrom
Conversation
leekeiabstraction
left a comment
There was a problem hiding this comment.
TY for the PR, left a question.
| nullable: true, | ||
| element_nullable: true, |
There was a problem hiding this comment.
Can you elaborate why and the impact of hardcoding nullability here? Also other places where element nullability is hardcoded to true.
There was a problem hiding this comment.
hardcoding here is on purpose actually, since the spec struct has 2 nullability concept, where nullable is the nullability of this constructed type where element_nullable is for the leaf element of an array.
and function like build_array_type_from_leaf recursively calls to construct leaf which is scaler, so element_nullable is unused.
ArrayWriter it only needs the type for encoding i believe, and putting default as true is aligned with Fluss default, which means all types are nullable unless explicitly marked.
I think probably best to add some comments in case of questions in future, WDYT? @leekeiabstraction
There was a problem hiding this comment.
I think probably best to add some comments in case of questions in future
Sounds good to me.
|
thanks for the question. i replied, let me know if further question @leekeiabstraction if find time, would be great to give a look @fresh-borzoni @leekeiabstraction 🙏 |
fresh-borzoni
left a comment
There was a problem hiding this comment.
@charlesdong1991 Ty for the PR, overall LGTM
Left some comments, PTAL
| element_data_type: i32, | ||
| element_precision: i32, | ||
| element_scale: i32, | ||
| element_nullable: bool, |
There was a problem hiding this comment.
If we split FfiDataTypeSpec to Scalar/Array, this field can go away too = just make array_nullability one entry longer (last = leaf)
There was a problem hiding this comment.
yeah, i think it's a nice idea to split as this is added a bit "ugly" for compatibility, i think splitting will be cleaner!
| bool nullable = true; | ||
| if (static_cast<size_t>(i) < flat.array_nullability.size()) { | ||
| nullable = flat.array_nullability[static_cast<size_t>(i)] != 0; | ||
| } else if (i == 0) { |
There was a problem hiding this comment.
but do we have the case for this?
There was a problem hiding this comment.
this shouldn't be the case anymore after splitting
| array_nesting: u32, | ||
| array_nullability: Vec<u8>, | ||
| nullable: bool, | ||
| element_nullable: bool, |
There was a problem hiding this comment.
is it only for backward compatibility, mb we can drop it?
There was a problem hiding this comment.
yes, it's for backward compatibility, let me check this drop option! 👍
|
thanks for great suggestion @fresh-borzoni i split the array from scaler, PTAL! |
fresh-borzoni
left a comment
There was a problem hiding this comment.
@charlesdong1991 Ty, LGTM 👍
There was a problem hiding this comment.
Nit: doc comment on structurally_compatible (lib.rs:3617-3618) says "C++ callers never control nullability explicitly" - no longer true after this PR.
Worth rewording to reflect the actual reason we ignore nullability: the Rust-side element type is always reconstructed as nullable (encoding doesn't depend on it), so comparing the bit is meaningless
There was a problem hiding this comment.
Good catch! 👍 rephrased in 62832ae @fresh-borzoni
|
@leekeiabstraction Do you have any further comments on this PR? |
Purpose
Preserve nullability when constructing Fluss schemas through the C++ binding. Previously, the C++
DataTypeclass had no nullable flag, so all types defaulted to nullable regardless of what the user specified.Linked issue: close #510
Tests
All tests have passed locally
Documentation
Docs are updated accordingly.