dataset push: --label-column is never checked against the CSV header — wrong name writes orphaned rows then fails registration #69

@saadqbal

Severity: HIGH — a single typo in --label-column produces a confusing partial-write failure, not a clear local error. Found during the #67 stress sweep.

Repro

CSV header is a,b,target, but the user passes --label-column label (typo / wrong name):

printf 'a,b,target\n1,2,x\n3,4,y\n' > tab/notlabel/data.csv
tracebloc dataset push ./tab/notlabel --no-input -n tracebloc-templates \
  --category tabular_classification --table qa_stress_notlabel --intent train --label-column label

Observed

  1. Local dry-run / pre-flight is green ("✔ Dry-run complete", summary shows label column: label).
  2. The ingestor's Data / Table Name / Duplicate validators all pass ("All validations passed successfully").
  3. Then during ingestion the rows are written to MySQL, and registration fails:
WARNING Specified label_column 'label' not found in record
ERROR Error sending batch to API: HTTP 400: [{"label":["This field may not be null."]}]
Error during ingestion: Backend rejected edge-label metadata; the dataset was NOT registered (its rows are already in the database).
Error: ingestion Job exited non-zero — see logs above   # exit 9

The failure message itself admits "its rows are already in the database" — i.e. an orphaned/partial write. (dataset rm does recover it, but the user has to know that.)

Root cause

internal/push/spec.go sets spec["label"] = a.LabelColumn (lines 254/296/299) without validating that the column exists. InferSchema() in internal/push/tabular.go already reads the header, so the label-column membership check costs one extra comparison against data that is already in memory.

Expected

During pre-flight (the step that already reads the CSV header), error if --label-column is not present in the header, e.g.:
Error: label column "label" not found in data.csv. Columns are: a, b, target.
For image categories the label lives in labels.csv; the same check applies there.

Part of #67.
