diff --git a/README.md b/README.md index 02a7319..3d36102 100644 --- a/README.md +++ b/README.md @@ -59,6 +59,8 @@ tracebloc dataset push ./my-data \ --label-column label ``` +> **`tracebloc: command not found` after installing?** The binary installs to `~/.local/bin` when `/usr/local/bin` isn't writable, and an already-running shell won't see the new PATH entry until you open a new terminal (or `. ~/.bashrc`). See **[Troubleshooting installation](docs/troubleshooting.md)**. + What that runs under the curtain: 1. Reads kubeconfig, discovers the parent `tracebloc/client` release in the cluster diff --git a/docs/troubleshooting.md b/docs/troubleshooting.md new file mode 100644 index 0000000..272a7ff --- /dev/null +++ b/docs/troubleshooting.md @@ -0,0 +1,110 @@ +# Troubleshooting + +## `tracebloc: command not found` after install + +The installer downloads a single binary and drops it in a `bin` directory. If +that directory isn't on your shell's `PATH`, the `tracebloc` command won't be +found even though the binary is installed. + +### Why it happens + +`install.sh` installs to `/usr/local/bin` when that's writable. The usual +unprivileged one-liner (`curl … | sh`) **can't** write there, so it falls back +to **`$HOME/.local/bin`** — which isn't on `PATH` in many setups. + +The installer adds `$HOME/.local/bin` to the rc file your shell actually reads +(`~/.bashrc`, `~/.zshrc`, `~/.bash_profile` on macOS, `~/.config/fish/config.fish`, +or `~/.profile`), but a shell that's **already running** won't see the change — +you need a new shell, or to re-load the rc file. + +On Linux desktops there's an extra trap: the stock `~/.profile` only adds +`~/.local/bin` to `PATH` **at login**, and only if the directory already existed +at login time. The installer creates it mid-session, and a new terminal window +is an interactive *non-login* shell that reads `~/.bashrc` (not `~/.profile`) — +so "open a new terminal" alone never triggers the `~/.profile` logic. That's why +the installer writes to `~/.bashrc` directly. + +### Fix it + +**1. Confirm it's only a PATH problem** — run the binary by its full path: + +```sh +~/.local/bin/tracebloc version # or: /usr/local/bin/tracebloc version +``` + +If that prints a version, the install is fine and this is purely `PATH`. + +**2. Pick up the PATH entry the installer added** — open a **new** terminal, or +re-load your rc file in the current one: + +```sh +. ~/.bashrc # bash on Linux +. ~/.zshrc # zsh (macOS default) +. ~/.bash_profile # bash on macOS +``` + +Then: + +```sh +tracebloc version +``` + +**3. If it's still not found**, check which shell you're in and that the entry +landed in the matching rc file: + +```sh +echo "$SHELL" +grep -n '.local/bin' ~/.bashrc ~/.zshrc ~/.bash_profile ~/.profile 2>/dev/null +``` + +If the line is in the wrong file (e.g. `~/.bashrc` while you actually run zsh, or +`~/.bashrc` on macOS where Terminal opens a login shell that reads +`~/.bash_profile`), add it to the right one: + +```sh +echo 'export PATH="$HOME/.local/bin:$PATH"' >> ~/.zshrc # adjust file to your shell +. ~/.zshrc +``` + +### Cleanest alternative: install where PATH already points + +`/usr/local/bin` is on `PATH` for both login and non-login shells out of the box +on Linux and macOS. Installing there sidesteps the whole rc/login-shell question: + +```sh +# Move the binary you already have: +sudo mv ~/.local/bin/tracebloc /usr/local/bin/ + +# …or re-run the installer with write access to /usr/local/bin: +curl -fsSL https://github.com/tracebloc/cli/releases/latest/download/install.sh | sudo sh +``` + +### Windows (PowerShell) + +`install.ps1` installs to `%LOCALAPPDATA%\Programs\tracebloc` and adds it to your +**user** `PATH` automatically. The current PowerShell session won't refresh — +**open a new PowerShell window**, then: + +```powershell +tracebloc version +``` + +If it's still missing, confirm the entry and check nothing at Process/Machine +scope overrides it: + +```powershell +[Environment]::GetEnvironmentVariable('Path','User') -split ';' | Select-String tracebloc +``` + +## Server / SSH sessions + +Each SSH login is a *login* shell, so it reads `~/.profile` (or `~/.bash_profile` +/ `~/.bash_login` if one exists — bash reads only the **first** that's present). +If you have a `~/.bash_profile` that doesn't source `~/.profile` or `~/.bashrc`, +the PATH entry may be skipped. Either add the `export PATH=…` line to the file +your login shell actually reads, or use the `/usr/local/bin` approach above. + +## Still stuck? + +Open an issue at [github.com/tracebloc/cli/issues](https://github.com/tracebloc/cli/issues) +with your OS, shell (`echo "$SHELL"`), and the output of `ls -l ~/.local/bin/tracebloc`. diff --git a/internal/cli/dataset.go b/internal/cli/dataset.go index 80f7eba..1b63d92 100644 --- a/internal/cli/dataset.go +++ b/internal/cli/dataset.go @@ -483,7 +483,7 @@ contributors train against it without ever seeing the raw files.`)) } a.Spec.Schema = sch } else { - sch, skipped, ierr := push.InferSchema(layout.LabelsCSV) + sch, skipped, empty, ierr := push.InferSchema(layout.LabelsCSV) if ierr != nil { return &exitError{code: 3, err: fmt.Errorf("inferring schema from CSV: %w", ierr)} } @@ -495,6 +495,11 @@ contributors train against it without ever seeing the raw files.`)) _, _ = fmt.Fprintf(out, " (skipped framework-managed column(s): %s)\n", strings.Join(skipped, ", ")) } + if len(empty) > 0 { + _, _ = fmt.Fprintf(out, + " (warning: %d column(s) had no values in the sample and were typed FLOAT (nullable): %s)\n", + len(empty), strings.Join(empty, ", ")) + } } case push.IsImage(a.Spec.Category): // keypoint_detection needs --number-of-keypoints (dataset- diff --git a/internal/push/tabular.go b/internal/push/tabular.go index 8b10e74..d8aa87d 100644 --- a/internal/push/tabular.go +++ b/internal/push/tabular.go @@ -145,8 +145,10 @@ func ParseSchema(s string) (map[string]string, error) { // InferSchema reads the CSV header and a sample of rows and infers a // column→SQL-type map: all-integer columns → INT, otherwise // all-numeric → FLOAT, otherwise VARCHAR(255). Empty cells are -// ignored when judging a column; a column with no non-empty sampled -// value falls back to VARCHAR(255). +// ignored when judging a column; a column with NO non-empty sampled +// value is typed as a nullable FLOAT (not VARCHAR — an all-NULL VARCHAR +// is exactly what the ingestor's string validator rejects) and returned +// in `empty` so the caller can warn. // // It's a convenience so customers don't hand-write a --schema for the // common case. Non-numeric specials (timestamps, dates, booleans) @@ -155,20 +157,20 @@ func ParseSchema(s string) (map[string]string, error) { // Framework-managed columns (see reservedColumns — id, data_id, …) // are skipped and returned as the second value so the caller can tell // the customer they weren't included. -func InferSchema(csvPath string) (schema map[string]string, skipped []string, err error) { +func InferSchema(csvPath string) (schema map[string]string, skipped, empty []string, err error) { f, err := os.Open(csvPath) if err != nil { - return nil, nil, err + return nil, nil, nil, err } defer func() { _ = f.Close() }() r := csv.NewReader(f) header, err := r.Read() if err != nil { - return nil, nil, fmt.Errorf("reading CSV header from %s: %w", csvPath, err) + return nil, nil, nil, fmt.Errorf("reading CSV header from %s: %w", csvPath, err) } if len(header) == 0 { - return nil, nil, fmt.Errorf("CSV %s has no columns", csvPath) + return nil, nil, nil, fmt.Errorf("CSV %s has no columns", csvPath) } // Per-column running judgement. @@ -186,7 +188,7 @@ func InferSchema(csvPath string) (schema map[string]string, skipped []string, er break } if err != nil { - return nil, nil, fmt.Errorf("reading CSV row from %s: %w", csvPath, err) + return nil, nil, nil, fmt.Errorf("reading CSV row from %s: %w", csvPath, err) } for i := 0; i < len(header) && i < len(row); i++ { v := strings.TrimSpace(row[i]) @@ -221,9 +223,19 @@ func InferSchema(csvPath string) (schema map[string]string, skipped []string, er schema[col] = "INT" case sawValue[i] && couldBeFloat[i]: schema[col] = "FLOAT" + case !sawValue[i]: + // Entirely empty in the sample (e.g. an unmeasured analyte in a + // sparse panel). It can't be typed from data; default to a + // nullable FLOAT rather than VARCHAR — a tabular feature column + // is numeric far more often than text, and an all-NULL VARCHAR + // is exactly the shape the ingestor's string validator rejects. + // Reported in `empty` so the caller can warn / the user can + // --schema-override. + schema[col] = "FLOAT" + empty = append(empty, col) default: schema[col] = "VARCHAR(255)" } } - return schema, skipped, nil + return schema, skipped, empty, nil } diff --git a/internal/push/tabular_test.go b/internal/push/tabular_test.go index 322101c..28c6d02 100644 --- a/internal/push/tabular_test.go +++ b/internal/push/tabular_test.go @@ -66,7 +66,7 @@ func TestInferSchema(t *testing.T) { csv := writeFile(t, dir, "data.csv", "count,age,price,name\n1,30,9.99,alice\n2,40,19.5,bob\n") - schema, _, err := InferSchema(csv) + schema, _, _, err := InferSchema(csv) if err != nil { t.Fatalf("InferSchema: %v", err) } @@ -83,22 +83,26 @@ func TestInferSchema(t *testing.T) { } } -// TestInferSchema_EmptyColumnIsVarchar: a column with no non-empty -// sampled value can't be typed, so it falls back to VARCHAR(255) -// rather than being mislabeled INT/FLOAT. -func TestInferSchema_EmptyColumnIsVarchar(t *testing.T) { +// TestInferSchema_EmptyColumnIsFloat: a column with no non-empty sampled +// value can't be typed from data; it's returned as a nullable FLOAT (not +// VARCHAR — an all-NULL VARCHAR is what the ingestor's string validator +// rejects) and reported in the `empty` return so the caller can warn. +func TestInferSchema_EmptyColumnIsFloat(t *testing.T) { dir := t.TempDir() csv := writeFile(t, dir, "data.csv", "filled,empty\n1,\n2,\n") - schema, _, err := InferSchema(csv) + schema, _, empty, err := InferSchema(csv) if err != nil { t.Fatalf("InferSchema: %v", err) } - if schema["empty"] != "VARCHAR(255)" { - t.Errorf("schema[empty] = %q, want VARCHAR(255)", schema["empty"]) + if schema["empty"] != "FLOAT" { + t.Errorf("schema[empty] = %q, want FLOAT", schema["empty"]) } if schema["filled"] != "INT" { t.Errorf("schema[filled] = %q, want INT", schema["filled"]) } + if len(empty) != 1 || empty[0] != "empty" { + t.Errorf("empty = %v, want [empty]", empty) + } } // TestInferSchema_SkipsReservedColumns: a CSV with an `id` (or other @@ -109,7 +113,7 @@ func TestInferSchema_SkipsReservedColumns(t *testing.T) { dir := t.TempDir() csv := writeFile(t, dir, "data.csv", "id,feature_00,label\n1,1.5,0\n2,2.5,1\n") - schema, skipped, err := InferSchema(csv) + schema, skipped, _, err := InferSchema(csv) if err != nil { t.Fatalf("InferSchema: %v", err) } diff --git a/scripts/install.sh b/scripts/install.sh index 88f08b6..1af70f5 100755 --- a/scripts/install.sh +++ b/scripts/install.sh @@ -259,16 +259,91 @@ echo "Verify with:" echo " $PREFIX/$BINARY_NAME version" echo "" -# PATH guidance for the fallback case. -case ":$PATH:" in - *":$PREFIX:"*) ;; # already on PATH - *) - echo "Note: $PREFIX is not on \$PATH. Add this to your shell rc file:" - echo " export PATH=\"\$PATH:$PREFIX\"" - echo "" - ;; +# PATH handling. install.ps1 persists the PATH entry on Windows +# (SetEnvironmentVariable, User scope) — do the same on Unix by writing to +# the rc file the user's shell actually reads. The old print-only advice +# silently failed on Ubuntu: ~/.profile adds ~/.local/bin only at *login* +# and only if it already existed, but the installer creates it mid-session, +# so a new (non-login) terminal reading ~/.bashrc never picks it up. +# +# Decide whether to persist a PATH entry to the user's shell rc: +# - a user-local prefix (under $HOME — the unprivileged `curl | sh` fallback) +# ALWAYS needs one, even when a one-off `export` already put it on $PATH for +# this shell (that won't survive into a new terminal); +# - a non-$HOME prefix that ISN'T on $PATH (e.g. `--prefix /opt/tracebloc`) +# also needs one; +# - a non-$HOME prefix already on $PATH (e.g. the default /usr/local/bin) is +# on PATH for every shell and needs nothing. +# ($HOME is stripped of a trailing slash first so a HOME like "/home/u/" can't +# misclassify "/home/u/.local/bin" via a "/home/u//*" pattern it won't match.) +home_dir="${HOME%/}" +persist=no +case "$PREFIX" in + "$home_dir"/*) persist=yes ;; + *) case ":$PATH:" in *":$PREFIX:"*) ;; *) persist=yes ;; esac ;; esac +if [ "$persist" = "yes" ]; then + shell_name="$(basename "${SHELL:-sh}")" + case "$shell_name" in + zsh) rc="$HOME/.zshrc" ;; + bash) + # macOS Terminal opens a login shell (reads .bash_profile); + # Linux terminals are interactive non-login (read .bashrc). + if [ "$OS" = "darwin" ]; then rc="$HOME/.bash_profile"; else rc="$HOME/.bashrc"; fi + ;; + fish) rc="$HOME/.config/fish/config.fish" ;; + *) rc="$HOME/.profile" ;; + esac + + if [ "$shell_name" = "fish" ]; then + path_line="fish_add_path $PREFIX" + else + path_line="export PATH=\"$PREFIX:\$PATH\"" + fi + + # Track three outcomes precisely so the message can neither over- nor + # under-claim: already configured / freshly added / couldn't write. + state=failed + mkdir -p "$(dirname "$rc")" 2>/dev/null || true + # Idempotency: only an actual, non-comment PATH op that references $PREFIX + # counts as "already configured" — a bare comment or an unrelated line that + # merely mentions the dir must NOT pass, or we'd claim success while a new + # shell still can't find the binary (#61). Match PATH= / PATH+= / + # fish_add_path / zsh's path+=() (case-insensitive); the [^A-Za-z_] guard + # keeps PYTHONPATH=/MYPATH= out. + if grep -v '^[[:space:]]*#' "$rc" 2>/dev/null \ + | grep -iE '(^|[^A-Za-z_])path[+]?=|fish_add_path' \ + | grep -qF "$PREFIX"; then + state=present # rc already persists it — leave it alone + # Group the append so the redirection-open error (e.g. a read-only rc, or + # an unwritable parent dir) is suppressed too: `cmd >> "$rc" 2>/dev/null` + # leaks the shell's "Permission denied" because the >> open is attempted + # before 2>/dev/null applies. Wrapping in { ... } 2>/dev/null puts the + # stderr redirect in scope first. + elif { printf '\n# Added by the tracebloc CLI installer\n%s\n' "$path_line" >> "$rc"; } 2>/dev/null; then + state=added + fi + + echo "" + case "$state" in + added) + echo "Added $PREFIX to your PATH in $rc." + echo "Open a new terminal — or load it now: . \"$rc\"" + ;; + present) + echo "$PREFIX is already in your PATH config ($rc) — nothing to add." + echo "If a new terminal can't find it yet, open one — or load it now: . \"$rc\"" + ;; + *) + echo "Note: the installer couldn't update your shell config ($rc)." + echo "Add this line to it (or your shell's startup file), then open a new terminal:" + echo " $path_line" + ;; + esac + echo "" +fi + echo "First steps:" echo " $BINARY_NAME cluster info # confirm CLI can reach your cluster" echo " $BINARY_NAME dataset push --help # see the dominant flow"