Skip to content

Add VECTOR and BSON data types support#16

Open
okramarenko wants to merge 13 commits intoadd10supportfrom
add_vector
Open

Add VECTOR and BSON data types support#16
okramarenko wants to merge 13 commits intoadd10supportfrom
add_vector

Conversation

@okramarenko
Copy link
Copy Markdown
Collaborator

@okramarenko okramarenko commented Apr 28, 2026

Note

Medium Risk
Adds new protocol metadata parsing and session initialization for extended data types, and changes parameter/bulk copy serialization paths; mistakes could surface as incorrect type mapping or runtime format errors when reading/writing binary payloads.

Overview
Adds end-to-end support for SingleStore extended data types VECTOR and BSON when enable_extended_types_metadata is enabled (SingleStore >= 8.5.28), including a new connection string option Enable Extended Data Types (default true) and session setup/re-setup to turn the metadata on.

Updates protocol/type mapping to parse extended column metadata (ExtendedTypeCode, vector dimensions/element type), expose new provider types SingleStoreDbType.Bson/Vector, and return vectors as typed ReadOnlyMemory<T> via new Vector*ColumnReaders (with payload length validation).

Extends write paths: parameters can now infer Vector from numeric arrays (without confusing byte[] vs sbyte[]), serialize vectors as literals for text queries and as binary for prepared statements, and bulk copy auto-generates the proper UNHEX(...):>VECTOR(...) / :>BSON expressions using schema metadata; adds extensive side-by-side tests for vector/bson querying, parameter round-trips, and bulk copy.

Reviewed by Cursor Bugbot for commit d354db9. Bugbot is set up for automated code reviews on this repo. Configure here.

Comment thread src/SingleStoreConnector/Protocol/Payloads/ColumnDefinitionPayload.cs Outdated
Comment thread src/SingleStoreConnector/Core/TypeMapper.cs Outdated
Comment thread src/SingleStoreConnector/SingleStoreParameter.cs
valueBytes.CopyTo(bytes, i * sizeof(double));
}
return bytes;
}
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Big-endian double conversion missing NET5 optimization path

Low Severity

ConvertDoublesToBytes uses the BitConverter.GetBytes/Array.Reverse fallback on big-endian platforms, but ConvertFloatsToBytes (and the corresponding VectorFloat64ColumnReader) both use BinaryPrimitives.WriteDoubleLittleEndian/ReadDoubleLittleEndian under #if NET5_0_OR_GREATER. The double converter is functionally correct but inconsistent with every other conversion method in the PR.

Additional Locations (1)
Fix in Cursor Fix in Web

Reviewed by Cursor Bugbot for commit 64c0676. Configure here.

Comment thread src/SingleStoreConnector/Core/ServerSession.cs Outdated
((column.ColumnFlags & ColumnFlags.Blob) != 0 || column.ColumnType is ColumnType.TinyBlob or ColumnType.Blob or ColumnType.MediumBlob or ColumnType.LongBlob);
IsLong = mySqlDbType != SingleStoreDbType.Vector &&
column.ColumnLength > 255 &&
((column.ColumnFlags & ColumnFlags.Blob) != 0 || column.ColumnType is ColumnType.TinyBlob or ColumnType.Blob or ColumnType.MediumBlob or ColumnType.LongBlob);
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

BSON column not excluded from IsLong like Vector

Low Severity

The IsLong calculation explicitly excludes SingleStoreDbType.Vector but does not exclude SingleStoreDbType.Bson. Since BSON is transported as ColumnType.Blob with columnSize: uint.MaxValue, it will have column.ColumnLength > 255 and the Blob flag set, causing IsLong to be true. If the intent was to treat these new extended types consistently (similar to how Vector is excluded), BSON may also need exclusion here. If BSON being IsLong = true is intentional since BSON documents can genuinely be large, this can be dismissed.

Fix in Cursor Fix in Web

Reviewed by Cursor Bugbot for commit 08a4139. Configure here.

throw new FormatException(
$"Expected 5 additional bytes for VECTOR extended metadata, but only {remainingExtendedBytes} remained.");

VectorDimensions = reader.ReadUInt32();
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Zero VectorDimensions treated as valid instead of unspecified

Medium Severity

VectorDimensions is read as a raw uint via reader.ReadUInt32(), so if the server sends 0 to indicate "unspecified dimensions," it's stored as the non-null value 0 rather than null. This causes ValidateLength to expect exactly 0 bytes of data (rejecting any actual vector payload with a FormatException), SingleStoreBulkCopy to generate invalid SQL like VECTOR(0, F32), and ColumnSize to be set to 0.

Additional Locations (2)
Fix in Cursor Fix in Web

Reviewed by Cursor Bugbot for commit 0c9c0b5. Configure here.

/// Gets or sets the expected element type name for a VECTOR parameter (e.g., "F32", "I64").
/// Used for validation when sending VECTOR data to the server.
/// </summary>
public string? VectorElementTypeName { get; set; }
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unused VectorDimensions and VectorElementTypeName on SingleStoreParameter

Low Severity

The new public properties VectorDimensions and VectorElementTypeName on SingleStoreParameter are declared and copied in the copy constructor, but never read by any serialization or validation logic. A grep across the connector source confirms they only appear in their own definition and copy constructor — they are not consumed by AppendSqlString, AppendBinary, or any other code path. The doc comments claim they are "used for validation when sending VECTOR data to the server," but no such validation exists.

Additional Locations (1)
Fix in Cursor Fix in Web

Reviewed by Cursor Bugbot for commit a7f39e4. Configure here.

{
await CloseAsync(changeState: true, ioBehavior ?? AsyncIOBehavior).ConfigureAwait(false);
throw;
}
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ResetConnectionAsync loses extended data types session setting

High Severity

The public ResetConnectionAsync method sends COM_RESET_CONNECTION which resets all session variables, including enable_extended_types_metadata. Unlike the connection pool's TryResetConnectionAsync (which re-enables the setting after reset), this public method never restores it. After calling ResetConnectionAsync(), VECTOR and BSON columns silently fall back to raw binary handling, losing their typed representations.

Additional Locations (1)
Fix in Cursor Fix in Web

Reviewed by Cursor Bugbot for commit 0bda7ee. Configure here.

Copy link
Copy Markdown

@cursor cursor Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cursor Bugbot has reviewed your changes using default mode and found 1 potential issue.

There are 6 total unresolved issues (including 5 from previous reviews).

Fix All in Cursor

❌ Bugbot Autofix is OFF. To automatically fix reported issues with cloud agents, have a team admin enable autofix in the Cursor dashboard.

Reviewed by Cursor Bugbot for commit d354db9. Configure here.

{
await CloseAsync(changeState: true, ioBehavior ?? AsyncIOBehavior).ConfigureAwait(false);
throw;
}
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Double extended-types initialization for pooled connections

Medium Severity

InitializeSessionAsync sends SET SESSION enable_extended_types_metadata = TRUE on every OpenAsync, but TryResetConnectionAsync in ServerSession already sends the same command when a pooled connection is reset. Since EnableExtendedDataTypes defaults to true, every connection open from the pool incurs an extra database round-trip that is redundant for pooled sessions. Only brand-new (non-pooled) sessions actually need the call from InitializeSessionAsync.

Additional Locations (1)
Fix in Cursor Fix in Web

Reviewed by Cursor Bugbot for commit d354db9. Configure here.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Development

Successfully merging this pull request may close these issues.

1 participant