Add CLI for text normalization with language and preset options by Karamouche · Pull Request #18 · gladiaio/normalization

Karamouche · 2026-04-15T18:59:43Z

What does this PR do?

- Adds a normalize CLI entry point backed by `normalization/cli.py` and `normalization/__main__.py`
- Registers it in `pyproject.toml` under `[project.scripts]` so it is available as `gladia-normalization` after install and as `uvx gladia-normalization` without a permanent install
- Documents the CLI in the README alongside the existing Python usage section

Type of change

Summary by CodeRabbit

New Features
- Added a new gladia-normalization CLI command for STT text normalization with language selection, file or stdin input, custom presets, and pipeline inspection (--describe).
- README updated with usage examples, including non-permanent execution via uvx for temporary command runs.

coderabbitai · 2026-04-15T18:59:58Z

📝 Walkthrough

Adds a command-line interface for the normalization package: a new normalization.cli module with a main() entrypoint, a package __main__.py, README CLI docs, and a pyproject.toml console script entry (gladia-normalization). No library public APIs were changed.

Changes

| Cohort / File(s) | Summary |
| --- | --- |
| Documentation `README.md` | Added CLI usage section with examples for single-text normalization, `--language`, file input (`--file`), stdin piping, `--preset` YAML usage, and `--describe` pipeline inspection; includes alternative `uvx gladia-normalization` example. |
| CLI implementation `normalization/cli.py`, `normalization/__main__.py` | New CLI: `normalization.cli:main()` implemented with `argparse`; enumerates languages and presets, loads pipeline via `load_pipeline(preset, language)`, selects input from positional arg / `--file` / piped stdin, supports `--describe`, and prints normalized output. `__main__.py` invokes `normalization.cli.main()`. |
| Packaging `pyproject.toml` | Added `[project.scripts]` entry to expose `gladia-normalization` → `normalization.cli:main`. No other metadata or code changes. |

Sequence Diagram(s)

sequenceDiagram
  participant User
  participant CLI as "normalization.cli"
  participant FS as "Presets (filesystem)"
  participant LangReg as "Language registry"
  participant Loader as "load_pipeline"
  participant Pipeline as "Normalization Pipeline"

  User->>CLI: invoke command (text / --file / stdin / --describe)
  CLI->>LangReg: enumerate available languages
  CLI->>FS: list available presets (yaml)
  CLI->>Loader: load_pipeline(preset, language)
  Loader->>Pipeline: initialize pipeline
  CLI->>Pipeline: pipeline.normalize(text) / pipeline.describe()
  Pipeline-->>CLI: normalized text / description
  CLI-->>User: print result
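
The flow in the diagram above could be sketched roughly as follows. This is not the code from `normalization/cli.py`: `FakePipeline`, the stub `load_pipeline`, the default values for `--language` and `--preset`, the precedence between the three input sources, and the injectable `stdin` parameter are all assumptions made so the example runs on its own.

```python
import argparse
import json
import sys


class FakePipeline:
    """Hypothetical stand-in for the object returned by load_pipeline()."""

    def normalize(self, text: str) -> str:
        # Placeholder behaviour only: collapse whitespace and lowercase.
        return " ".join(text.split()).lower()

    def describe(self) -> dict:
        return {"steps": ["collapse_whitespace", "lowercase"]}


def load_pipeline(preset: str, language: str) -> FakePipeline:
    # Assumption: the real loader takes (preset, language); this stub ignores both.
    return FakePipeline()


def main(argv=None, stdin=None) -> int:
    stdin = sys.stdin if stdin is None else stdin
    parser = argparse.ArgumentParser(prog="gladia-normalization")
    parser.add_argument("text", nargs="?")
    parser.add_argument("--language", "-l", default="en")  # assumed default
    parser.add_argument("--preset", default="default")  # assumed default
    parser.add_argument("--file")
    parser.add_argument("--describe", action="store_true")
    args = parser.parse_args(argv)

    pipeline = load_pipeline(args.preset, args.language)
    if args.describe:
        # Inspect the pipeline instead of normalizing any text.
        print(json.dumps(pipeline.describe(), indent=2))
        return 0

    # Assumed precedence: positional text, then --file, then piped stdin.
    if args.text is not None:
        text = args.text
    elif args.file:
        with open(args.file, encoding="utf-8") as handle:
            text = handle.read()
    elif not stdin.isatty():
        text = stdin.read()
    else:
        parser.error("no input text: pass TEXT, --file, or pipe via stdin")

    print(pipeline.normalize(text))
    return 0
```

Passing `argv` and `stdin` explicitly keeps the sketch easy to exercise without a subprocess; a console-script entry point would call `main()` with no arguments.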

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Suggested reviewers

lrossillon-gladia

Poem

A rabbit hops to the command-line door,
Presses keys and hears the words restore,
Presets and languages twirl in a line,
Pipelines hum—input turns fine,
Normalized text, a tidy carrot core. 🐇✨

🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
| --- | --- | --- | --- |
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 0.00% which is insufficient. The required threshold is 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |

✅ Passed checks (2 passed)

| Check name | Status | Explanation |
| --- | --- | --- |
| Title check | ✅ Passed | The title accurately summarizes the main change: adding a CLI entry point for text normalization with language and preset configuration options. |
| Description check | ✅ Passed | The PR description addresses the key changes and follows the template structure with clear sections, though marked checkboxes could be more specific. |

coderabbitai

Actionable comments posted: 3

🧹 Nitpick comments (1)

normalization/cli.py (1)

20-70: Add focused CLI tests for new execution paths.

Please add tests for --describe, stdin input, positional input, and error flows (no text, invalid preset). This is the main new user-facing surface and would benefit from regression coverage.

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed.

In `@normalization/cli.py` around lines 20 - 70, Add focused unit tests for the
CLI entrypoint main covering all execution paths: (1) describe flag -> run main
with ["--describe", "--preset", ...] or equivalent and assert
pipeline.describe() JSON is printed (mock load_pipeline to return an object with
describe()); (2) positional text -> run main with a positional argument and
assert pipeline.normalize(...) was called and printed; (3) stdin input ->
simulate non-tty stdin content and run main with no positional arg and assert
pipeline.normalize called with stdin text; (4) error flows -> simulate missing
text with stdin as tty and assert parser.error is triggered, and simulate
load_pipeline raising FileNotFoundError for invalid preset and assert
parser.error is invoked; use mocks for load_pipeline and a fake pipeline
exposing describe() and normalize() to avoid real I/O and reference the main
function, load_pipeline, pipeline.describe, pipeline.normalize, and argparse
parser behavior when implementing tests.

🤖 Prompt for all review comments with AI agents

Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@normalization/cli.py`:
- Around line 31-37: The CLI currently uses argparse choices for the
"--language" / "-l" option which prevents unknown language codes from reaching
load_pipeline(), breaking its fallback-to-default behavior; decide on intended
behavior and align both sides: either remove the choices parameter in the
argparse add_argument so unknown codes are passed through and let
load_pipeline(language) handle fallback, or keep choices and update
load_pipeline() to raise/handle errors consistently; locate the argument
definition in normalization/cli.py (the add_argument call for "--language"/"-l")
and the load_pipeline() function to implement the chosen approach and update
help text to reflect strict reject versus fallback behavior.
- Around line 53-57: The current try/except around load_pipeline only converts
FileNotFoundError to a CLI error; expand the handler to catch expected load-time
parse/validation errors (e.g. ValueError, TypeError, json.JSONDecodeError,
KeyError) so that load_pipeline(args.preset, args.language) failures are
surfaced via parser.error(str(exc)) instead of a traceback; keep unexpected
exceptions raising normally (or re-raise) and reference the load_pipeline call
and the existing parser.error usage when implementing the change.
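
The two `normalization/cli.py` comments above can be illustrated together. This is a sketch under assumptions, not the project's code: `KNOWN_LANGUAGES`, `build_parser`, `load_or_exit`, the stub `load_pipeline`, and the `missing` / `broken` preset names are all hypothetical. It contrasts a strict `choices` list with a pass-through `--language`, and routes expected load errors through `parser.error` instead of a traceback.

```python
import argparse

KNOWN_LANGUAGES = ["en", "fr"]  # hypothetical registry contents


def build_parser(strict: bool) -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(prog="gladia-normalization")
    parser.add_argument(
        "--language", "-l",
        default="en",
        # With choices set, argparse rejects unknown codes before
        # load_pipeline() is ever called, so no fallback can happen.
        choices=KNOWN_LANGUAGES if strict else None,
    )
    parser.add_argument("--preset", default="default")
    return parser


def load_pipeline(preset: str, language: str) -> str:
    """Hypothetical loader: falls back on unknown languages, rejects bad presets."""
    if preset == "missing":
        raise FileNotFoundError(f"preset not found: {preset}")
    if preset == "broken":
        raise ValueError(f"preset is malformed: {preset}")
    effective = language if language in KNOWN_LANGUAGES else "default"
    return f"{preset}:{effective}"


# Exception types named in the review comment. json.JSONDecodeError is a
# subclass of ValueError, so it is covered without being listed separately.
EXPECTED_LOAD_ERRORS = (FileNotFoundError, ValueError, TypeError, KeyError)


def load_or_exit(parser: argparse.ArgumentParser, argv: list) -> str:
    args = parser.parse_args(argv)
    try:
        return load_pipeline(args.preset, args.language)
    except EXPECTED_LOAD_ERRORS as exc:
        # Prints a usage message and exits with status 2, no traceback.
        parser.error(str(exc))
```

With `strict=False`, an unknown code such as `xx` reaches the loader and triggers its fallback; with `strict=True`, argparse exits first. Anything outside `EXPECTED_LOAD_ERRORS` still propagates normally.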

In `@README.md`:
- Around line 90-94: Update the README example to include the required --from
flag so the uvx invocation points to the actual script entry point: use --from
normalize when calling uvx for the package named gladia-normalization; locate
the example that shows uvx gladia-normalization "It's $50 at 3:00PM" --language
en and change it to explicitly pass --from normalize to ensure the correct
script is executed.

ℹ️ Review info

⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: bb9e7a20-f611-45f3-a490-e50edbe11da9

📥 Commits

Reviewing files that changed from the base of the PR and between baec872 and 6e105fe.

📒 Files selected for processing (4)

README.md
normalization/__main__.py
normalization/cli.py
pyproject.toml

feat: add CLI for text normalization with language and preset options

6e105fe

Karamouche requested review from egenthon-cmd and lrossillon-gladia April 15, 2026 18:59

coderabbitai Bot reviewed Apr 15, 2026

Karamouche added 3 commits April 15, 2026 15:37

feat: add support for normalizing text files via CLI

e032c2a

refactor: rename CLI command to 'gladia-normalization' for consistency

ca5947d

docs: update examples in README to avoid using $

f5288f8

Emma5099 approved these changes Apr 21, 2026

egenthon-cmd approved these changes Apr 21, 2026

Karamouche merged commit 165bd95 into main Apr 21, 2026
10 checks passed

Karamouche deleted the feat/cli branch April 21, 2026 15:46

Karamouche merged 4 commits into main from feat/cli
