Add CLI for text normalization with language and preset options#18

Merged
Karamouche merged 4 commits into main from feat/cli on Apr 21, 2026

Conversation

Karamouche (Collaborator) commented Apr 15, 2026

What does this PR do?

  • Adds a normalize CLI entry point backed by normalization/cli.py and normalization/__main__.py
  • Registers it in pyproject.toml under [project.scripts] so it is available as gladia-normalization after install and as uvx gladia-normalization without a permanent install
  • Documents the CLI in the README alongside the existing Python usage section

Type of change

  • New language
  • Edit existing language (fix a replacement, tweak config, …)
  • New normalization step
  • Edit existing step (bug fix, behaviour change)
  • New preset version
  • Bug fix (other)
  • Refactor / docs / CI
  • Other

Summary by CodeRabbit

  • New Features
    • Added a new gladia-normalization CLI command for STT text normalization with language selection, file or stdin input, custom presets, and pipeline inspection (--describe).
    • README updated with usage examples, including non-permanent execution via uvx for temporary command runs.

coderabbitai Bot commented Apr 15, 2026

Walkthrough

Adds a command-line interface for the normalization package: a new normalization.cli module with a main() entrypoint, a package __main__.py, README CLI docs, and a pyproject.toml console script entry (gladia-normalization). No library public APIs were changed.

Changes

| Cohort / File(s) | Summary |
|---|---|
| Documentation: README.md | Added CLI usage section with examples for single-text normalization, --language, file input (--file), stdin piping, --preset YAML usage, and --describe pipeline inspection; includes alternative uvx gladia-normalization example. |
| CLI implementation: normalization/cli.py, normalization/__main__.py | New CLI: normalization.cli:main() implemented with argparse; enumerates languages and presets, loads pipeline via load_pipeline(preset, language), selects input from positional arg / --file / piped stdin, supports --describe, and prints normalized output. __main__.py invokes normalization.cli.main(). |
| Packaging: pyproject.toml | Added [project.scripts] entry to expose gladia-normalization → normalization.cli:main. No other metadata or code changes. |
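
The control flow summarized in the table can be sketched roughly as below. This is a minimal illustration and not the code merged in this PR: FakePipeline and the load_pipeline stub are hypothetical stand-ins (only the call shape load_pipeline(preset, language) and the normalize()/describe() methods come from the walkthrough), and the default values "en" and "default" are assumptions.

```python
import argparse
import json
import sys
from pathlib import Path


class FakePipeline:
    """Hypothetical stand-in for the real pipeline object."""

    def __init__(self, preset, language):
        self.preset = preset
        self.language = language

    def normalize(self, text):
        # Placeholder behaviour only, NOT the library's normalization logic.
        return " ".join(text.lower().split())

    def describe(self):
        return {"preset": self.preset, "language": self.language}


def load_pipeline(preset, language):
    # Stub mirroring the call shape named in the walkthrough.
    return FakePipeline(preset, language)


def build_parser():
    parser = argparse.ArgumentParser(prog="gladia-normalization")
    parser.add_argument("text", nargs="?", help="text to normalize")
    parser.add_argument("--language", "-l", default="en")  # default assumed
    parser.add_argument("--preset", default="default")  # default assumed
    parser.add_argument("--file", type=Path, help="read input from a file")
    parser.add_argument("--describe", action="store_true")
    return parser


def read_input(args, parser, stdin):
    # Precedence follows the walkthrough: positional arg, then --file, then piped stdin.
    if args.text is not None:
        return args.text
    if args.file is not None:
        return args.file.read_text(encoding="utf-8")
    if not stdin.isatty():
        return stdin.read()
    parser.error("no input: pass text, --file, or pipe text via stdin")


def main(argv=None, stdin=None):
    parser = build_parser()
    args = parser.parse_args(argv)
    pipeline = load_pipeline(args.preset, args.language)
    if args.describe:
        # --describe prints the pipeline description and skips normalization.
        print(json.dumps(pipeline.describe(), indent=2))
        return 0
    text = read_input(args, parser, stdin if stdin is not None else sys.stdin)
    print(pipeline.normalize(text))
    return 0
```

Accepting argv and stdin as parameters is a choice made for this sketch so the function can be exercised without touching the real process state; whether the merged main() does the same is not stated in the PR.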

Sequence Diagram(s)

sequenceDiagram
  participant User
  participant CLI as "normalization.cli"
  participant FS as "Presets (filesystem)"
  participant LangReg as "Language registry"
  participant Loader as "load_pipeline"
  participant Pipeline as "Normalization Pipeline"

  User->>CLI: invoke command (text / --file / stdin / --describe)
  CLI->>LangReg: enumerate available languages
  CLI->>FS: list available presets (yaml)
  CLI->>Loader: load_pipeline(preset, language)
  Loader->>Pipeline: initialize pipeline
  CLI->>Pipeline: pipeline.normalize(text) / pipeline.describe()
  Pipeline-->>CLI: normalized text / description
  CLI-->>User: print result

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Suggested reviewers

  • lrossillon-gladia

Poem

A rabbit hops to the command-line door,
Presses keys and hears the words restore,
Presets and languages twirl in a line,
Pipelines hum—input turns fine,
Normalized text, a tidy carrot core. 🐇✨

🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
|---|---|---|---|
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 0.00%, which is insufficient. The required threshold is 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |

✅ Passed checks (2 passed)

| Check name | Status | Explanation |
|---|---|---|
| Title check | ✅ Passed | The title accurately summarizes the main change: adding a CLI entry point for text normalization with language and preset configuration options. |
| Description check | ✅ Passed | The PR description addresses the key changes and follows the template structure with clear sections, though marked checkboxes could be more specific. |

@coderabbitai coderabbitai Bot left a comment

Actionable comments posted: 3

🧹 Nitpick comments (1)
normalization/cli.py (1)

20-70: Add focused CLI tests for new execution paths.

Please add tests for --describe, stdin input, positional input, and error flows (no text, invalid preset). This is the main new user-facing surface and would benefit from regression coverage.
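
One possible shape for such tests is sketched below. It is an illustration under assumptions, not the project's test suite: the main() here is a condensed, hypothetical stand-in for normalization.cli.main that takes the loader as a parameter so the sketch stays self-contained, and FakeStdin, run_cli, and make_loader are helper names invented for this example. Against the real module, patching normalization.cli.load_pipeline with unittest.mock would play the role of the loader argument.

```python
import argparse
import io
import sys
from contextlib import redirect_stderr, redirect_stdout
from unittest import mock


def main(argv=None, loader=None):
    """Condensed, hypothetical stand-in for normalization.cli.main."""
    parser = argparse.ArgumentParser(prog="gladia-normalization")
    parser.add_argument("text", nargs="?")
    parser.add_argument("--language", "-l", default="en")
    parser.add_argument("--preset", default="default")
    parser.add_argument("--describe", action="store_true")
    args = parser.parse_args(argv)
    try:
        pipeline = loader(args.preset, args.language)
    except FileNotFoundError as exc:
        parser.error(str(exc))  # argparse exits with status 2
    if args.describe:
        print(pipeline.describe())
        return
    text = args.text
    if text is None and not sys.stdin.isatty():
        text = sys.stdin.read()
    if text is None:
        parser.error("no input text provided")
    print(pipeline.normalize(text))


class FakeStdin(io.StringIO):
    """StringIO whose isatty() answer is configurable."""

    def __init__(self, text="", tty=False):
        super().__init__(text)
        self._tty = tty

    def isatty(self):
        return self._tty


def run_cli(argv, stdin, loader):
    """Invoke main() with patched stdin; return (exit_code, stdout)."""
    out, code = io.StringIO(), 0
    with mock.patch.object(sys, "stdin", stdin):
        with redirect_stdout(out), redirect_stderr(io.StringIO()):
            try:
                main(argv, loader=loader)
            except SystemExit as exc:
                code = exc.code
    return code, out.getvalue()


def make_loader():
    pipeline = mock.Mock()
    pipeline.normalize.return_value = "normalized"
    pipeline.describe.return_value = "described"
    return mock.Mock(return_value=pipeline), pipeline


def test_describe():
    loader, pipeline = make_loader()
    assert run_cli(["--describe"], FakeStdin(tty=True), loader) == (0, "described\n")
    pipeline.normalize.assert_not_called()


def test_positional_text():
    loader, pipeline = make_loader()
    code, out = run_cli(["Hello", "-l", "fr"], FakeStdin(tty=True), loader)
    assert (code, out) == (0, "normalized\n")
    loader.assert_called_once_with("default", "fr")
    pipeline.normalize.assert_called_once_with("Hello")


def test_stdin_text():
    loader, pipeline = make_loader()
    code, _ = run_cli([], FakeStdin("piped text"), loader)
    assert code == 0
    pipeline.normalize.assert_called_once_with("piped text")


def test_missing_text_is_usage_error():
    loader, _ = make_loader()
    assert run_cli([], FakeStdin(tty=True), loader)[0] == 2


def test_invalid_preset_is_usage_error():
    loader = mock.Mock(side_effect=FileNotFoundError("preset not found"))
    assert run_cli(["hi", "--preset", "nope"], FakeStdin(tty=True), loader)[0] == 2
```

The test functions follow pytest naming so a runner would collect them, but they only rely on the standard library and can also be called directly.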

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@normalization/cli.py` around lines 20 - 70, Add focused unit tests for the
CLI entrypoint main covering all execution paths: (1) describe flag -> run main
with ["--describe", "--preset", ...] or equivalent and assert
pipeline.describe() JSON is printed (mock load_pipeline to return an object with
describe()); (2) positional text -> run main with a positional argument and
assert pipeline.normalize(...) was called and printed; (3) stdin input ->
simulate non-tty stdin content and run main with no positional arg and assert
pipeline.normalize called with stdin text; (4) error flows -> simulate missing
text with stdin as tty and assert parser.error is triggered, and simulate
load_pipeline raising FileNotFoundError for invalid preset and assert
parser.error is invoked; use mocks for load_pipeline and a fake pipeline
exposing describe() and normalize() to avoid real I/O and reference the main
function, load_pipeline, pipeline.describe, pipeline.normalize, and argparse
parser behavior when implementing tests.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@normalization/cli.py`:
- Around line 31-37: The CLI currently uses argparse choices for the
"--language" / "-l" option which prevents unknown language codes from reaching
load_pipeline(), breaking its fallback-to-default behavior; decide on intended
behavior and align both sides: either remove the choices parameter in the
argparse add_argument so unknown codes are passed through and let
load_pipeline(language) handle fallback, or keep choices and update
load_pipeline() to raise/handle errors consistently; locate the argument
definition in normalization/cli.py (the add_argument call for "--language"/"-l")
and the load_pipeline() function to implement the chosen approach and update
help text to reflect strict reject versus fallback behavior.
- Around line 53-57: The current try/except around load_pipeline only converts
FileNotFoundError to a CLI error; expand the handler to catch expected load-time
parse/validation errors (e.g. ValueError, TypeError, json.JSONDecodeError,
KeyError) so that load_pipeline(args.preset, args.language) failures are
surfaced via parser.error(str(exc)) instead of a traceback; keep unexpected
exceptions raising normally (or re-raise) and reference the load_pipeline call
and the existing parser.error usage when implementing the change.

In `@README.md`:
- Around line 90-94: Update the README example to include the required --from
flag so the uvx invocation points to the actual script entry point: use --from
normalize when calling uvx for the package named gladia-normalization; locate
the example that shows uvx gladia-normalization "It's $50 at 3:00PM" --language
en and change it to explicitly pass --from normalize to ensure the correct
script is executed.
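
The two normalization/cli.py comments above can be illustrated with a short sketch. It assumes nothing about the merged code: KNOWN_LANGUAGES, LOAD_ERRORS, build_parser, and load_or_exit are hypothetical names, and the language list is illustrative. The first function shows how argparse choices decides whether an unknown language code ever reaches the loader; the second shows a widened handler that turns expected load-time failures into a usage error while letting anything else propagate.

```python
import argparse

KNOWN_LANGUAGES = ("en", "fr")  # illustrative list, not the package's registry

# Load-time failures reported as CLI errors. json.JSONDecodeError is a
# subclass of ValueError, so it is covered without being listed.
LOAD_ERRORS = (FileNotFoundError, ValueError, TypeError, KeyError)


def build_parser(strict_language):
    parser = argparse.ArgumentParser(prog="gladia-normalization")
    parser.add_argument("text", nargs="?")
    parser.add_argument(
        "--language",
        "-l",
        default="en",
        # With choices set, argparse rejects unknown codes before the loader
        # runs; with choices=None, any string is passed through so the
        # loader's own fallback can apply.
        choices=KNOWN_LANGUAGES if strict_language else None,
    )
    parser.add_argument("--preset", default="default")
    return parser


def load_or_exit(parser, loader, preset, language):
    """Call loader(preset, language); report expected failures via parser.error."""
    try:
        return loader(preset, language)
    except LOAD_ERRORS as exc:
        # parser.error prints a usage message and exits with status 2.
        parser.error(f"could not load preset {preset!r}: {exc}")
```

With strict_language=True an unknown code such as "xx" ends in an argparse usage error; with strict_language=False the same code is handed to the loader unchanged. Exceptions outside LOAD_ERRORS are deliberately not caught, so genuine bugs still surface as tracebacks.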

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: bb9e7a20-f611-45f3-a490-e50edbe11da9

📥 Commits

Reviewing files that changed from the base of the PR and between baec872 and 6e105fe.

📒 Files selected for processing (4)
  • README.md
  • normalization/__main__.py
  • normalization/cli.py
  • pyproject.toml

Comment thread normalization/cli.py
Comment thread normalization/cli.py
Comment thread README.md
@Karamouche Karamouche merged commit 165bd95 into main Apr 21, 2026
10 checks passed
@Karamouche Karamouche deleted the feat/cli branch April 21, 2026 15:46
3 participants