Severity: MEDIUM. This hits the single most common tabular source (Excel / Windows "Save as CSV UTF-8", which prepends a BOM). Found during the #67 stress sweep.
Repro
mkdir -p tab/bom
printf '\xef\xbb\xbfa,b,label\n1,2,x\n3,4,y\n' > tab/bom/data.csv  # BOM + header
tracebloc dataset push ./tab/bom --no-input -n tracebloc-templates \
  --category tabular_classification --table qa_bom --intent train --label-column label --dry-run
# → ✔ Dry-run complete (no warning)
Observed / root cause
InferSchema() in internal/push/tabular.go reads the header with encoding/csv (which does not strip a leading BOM) and trims each name with strings.TrimSpace. TrimSpace does not remove U+FEFF (verified: unicode.IsSpace('\uFEFF') == false, so the bytes ef bb bf survive). As a result, the first column's inferred name becomes \ufeff<name> rather than <name>.
Consequences:
- --label-column <name> won't match the BOM-prefixed name. Combined with #69 ("dataset push: --label-column is never checked against the CSV header; wrong name writes orphaned rows then fails registration"), that mismatch isn't caught locally, which leads to an orphaned-row failure at ingest.
Expected
Strip a leading UTF-8 BOM from the first header cell during inference (e.g. strings.TrimPrefix(col, "\uFEFF") on the first column, or detect+strip at file read). Excel CSVs should "just work".
Part of #67.