AvroSchemaConverter silently drops typed_value when converting shredded variant schemas #3477

@nssalian

Describe the bug, including details regarding any error messages, version, and platform.

I noticed this in parquet-cli while inspecting the schema of a shredded Parquet file.

Essentially, given this Parquet schema:

message table {
  required group data (VARIANT(1)) {
    required binary metadata;
    optional binary value;
    optional group typed_value {
      required group name {
        optional binary value;
        optional binary typed_value (STRING);
      }
      required group age {
        optional binary value;
        optional int32 typed_value (INTEGER(8,true));
      }
    }
  }
}

AvroSchemaConverter.convert() produces:

{
  "type": "record",
  "name": "data",
  "fields": [
    {"name": "metadata", "type": "bytes"},
    {"name": "value", "type": "bytes"}
  ]
}

The typed_value group, including its nested name and age fields, is missing from the converted schema.

Expected: The converter should convert all children of the VARIANT group, including typed_value when it is present.
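
To make the difference between the reported and the expected behavior concrete, here is a minimal, self-contained Java sketch. It deliberately does not call parquet-java or Avro: Node, exampleVariant, reportedPaths, and expectedPaths are hypothetical names invented for illustration, and the model only tracks field names, not types or repetition. It is a rough picture of "visit every child of the VARIANT group", not the converter's actual code or a proposed patch.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-in types; none of these are parquet-java or Avro classes.
class VariantChildrenSketch {

    /** A named schema node; a node without children stands for a primitive field. */
    static final class Node {
        final String name;
        final List<Node> children;

        Node(String name, Node... children) {
            this.name = name;
            this.children = Arrays.asList(children);
        }
    }

    /** Models the field names of the VARIANT group from the schema in this issue. */
    static Node exampleVariant() {
        return new Node("data",
            new Node("metadata"),
            new Node("value"),
            new Node("typed_value",
                new Node("name", new Node("value"), new Node("typed_value")),
                new Node("age", new Node("value"), new Node("typed_value"))));
    }

    /** Reported behavior: only the metadata and value children are kept. */
    static List<String> reportedPaths(Node variant) {
        List<String> out = new ArrayList<>();
        for (Node child : variant.children) {
            if (child.name.equals("metadata") || child.name.equals("value")) {
                collect(child, "", out);
            }
        }
        return out;
    }

    /** Expected behavior: every child is visited, so typed_value is kept too. */
    static List<String> expectedPaths(Node variant) {
        List<String> out = new ArrayList<>();
        for (Node child : variant.children) {
            collect(child, "", out);
        }
        return out;
    }

    /** Appends the dotted path of every leaf at or below the given node. */
    private static void collect(Node node, String prefix, List<String> out) {
        String path = prefix.isEmpty() ? node.name : prefix + "." + node.name;
        if (node.children.isEmpty()) {
            out.add(path);
            return;
        }
        for (Node child : node.children) {
            collect(child, path, out);
        }
    }

    public static void main(String[] args) {
        Node data = exampleVariant();
        System.out.println("reported: " + reportedPaths(data));
        System.out.println("expected: " + expectedPaths(data));
    }
}
```

In this toy model the reported list stops at metadata and value, while the expected list also contains the four leaves under typed_value (for example typed_value.age.typed_value). A real fix would additionally need to map each child to the appropriate Avro type, which this sketch does not attempt.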

Version: 1.18.0-SNAPSHOT

I can submit a fix for this myself if there are no objections. If we want to keep the current behavior, we should document it.

Component(s)

Avro
