Skip to content

dataset push: tabular pre-flight silently collapses duplicate header columns and accepts zero-data-row CSVs #73

@saadqbal

Description

@saadqbal

Severity: MEDIUM — two tabular CSV problems pass dry-run with no warning. Found during the #67 stress sweep.

(a) Duplicate header columns silently collapsed

printf 'a,a,label\n1,2,x\n3,4,y\n' > tab/duphdr/data.csv
tracebloc dataset push ./tab/duphdr ... --category tabular_classification --dry-run
# → "Inferred schema for 2 column(s)"  ✔   (the duplicate `a` was silently dropped)

Root cause: InferSchema() (internal/push/tabular.go:212) builds schema as a map[string]string keyed by column name, so the second a overwrites the first — a,a,label → 2 columns, with one column's data silently misaligned/dropped. No detection or warning.

(b) Header-only / zero-data-row CSV passes

printf 'a,b,label\n' > tab/headeronly/data.csv     # header, no rows
tracebloc dataset push ./tab/headeronly ... --dry-run
# → warning only about column typing; ✔ Dry-run complete

A CSV with zero data rows types every column nullable-FLOAT and passes. The "no values in the sample" warning is about typing, not "your dataset has 0 rows" — so an empty dataset gets a green check.

Expected

  • Detect repeated names in the header and error (or warn) — Error: duplicate column "a" in header; column names must be unique.
  • Detect a row count of 0 after the header and warn/error — Error: data.csv has a header but no data rows.

Part of #67.

Metadata

Metadata

Assignees

Labels

No labels
No labels

Type

No type
No fields configured for issues without a type.

Projects

No projects

Milestone

No milestone

Relationships

None yet

Development

No branches or pull requests

Issue actions