Severity: HIGH — a single typo in --label-column produces a confusing partial-write failure, not a clear local error. Found during the #67 stress sweep.
Repro
CSV header is a,b,target, but the user passes --label-column label (typo / wrong name):
mkdir -p tab/notlabel
printf 'a,b,target\n1,2,x\n3,4,y\n' > tab/notlabel/data.csv
tracebloc dataset push ./tab/notlabel --no-input -n tracebloc-templates \
--category tabular_classification --table qa_stress_notlabel --intent train --label-column label
Observed
- Local dry-run / pre-flight is green ("✔ Dry-run complete", summary shows label column: label).
- The ingestor's Data / Table Name / Duplicate validators all pass ("All validations passed successfully").
- Then during ingestion the rows are written to MySQL, and registration fails:
WARNING Specified label_column 'label' not found in record
ERROR Error sending batch to API: HTTP 400: [{"label":["This field may not be null."]}]
Error during ingestion: Backend rejected edge-label metadata; the dataset was NOT registered (its rows are already in the database).
Error: ingestion Job exited non-zero — see logs above # exit 9
The failure message itself admits "its rows are already in the database" — i.e. an orphaned/partial write. (dataset rm does recover it, but the user has to know that.)
Root cause
internal/push/spec.go sets spec["label"] = a.LabelColumn (lines 254/296/299) with no validation that the column exists. InferSchema() in internal/push/tabular.go already reads the header — the label-column membership check is one comparison away and free.
Expected
During pre-flight (the step that already reads the CSV header), error if --label-column is not present in the header, e.g.:
Error: label column "label" not found in data.csv. Columns are: a, b, target.
For image categories the label lives in labels.csv — same check applies there.
Part of #67.