Add CLI for text normalization with language and preset options #18
Karamouche merged 4 commits into `main`
📝 Walkthrough
Adds a command-line interface for the normalization package: a new …
Changes
Sequence Diagram(s)
sequenceDiagram
participant User
participant CLI as "normalization.cli"
participant FS as "Presets (filesystem)"
participant LangReg as "Language registry"
participant Loader as "load_pipeline"
participant Pipeline as "Normalization Pipeline"
User->>CLI: invoke command (text / --file / stdin / --describe)
CLI->>LangReg: enumerate available languages
CLI->>FS: list available presets (yaml)
CLI->>Loader: load_pipeline(preset, language)
Loader->>Pipeline: initialize pipeline
CLI->>Pipeline: pipeline.normalize(text) / pipeline.describe()
Pipeline-->>CLI: normalized text / description
CLI-->>User: print result
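In the diagram, the CLI enumerates the available presets on disk and the language registry before it builds its options. Below is a minimal sketch of that discovery step. It assumes that presets are `*.yaml`/`*.yml` files in a single directory and that the flags are the ones named in the diagram and review (positional text, `--file`, `--preset`, `--language`/`-l`, `--describe`). `list_presets`, `build_parser`, the default values, and the help strings are hypothetical and are not the package's actual code.

```python
from __future__ import annotations

import argparse
from pathlib import Path


def list_presets(presets_dir: Path) -> list[str]:
    """Hypothetical helper: preset names are the stems of *.yaml / *.yml files."""
    return sorted({p.stem for p in presets_dir.iterdir()
                   if p.is_file() and p.suffix in {".yaml", ".yml"}})


def build_parser(presets: list[str], languages: list[str]) -> argparse.ArgumentParser:
    """Hypothetical parser; defaults ("default", "en") are assumptions."""
    parser = argparse.ArgumentParser(prog="normalize",
                                     description="Normalize text with a preset pipeline.")
    # Input sources from the diagram: positional text, --file, or piped stdin.
    parser.add_argument("text", nargs="?", help="text to normalize")
    parser.add_argument("--file", help="read the input text from this file")
    # Discovered names are listed in --help rather than enforced here.
    parser.add_argument("--preset", default="default",
                        help="one of: " + ", ".join(presets))
    parser.add_argument("--language", "-l", default="en",
                        help="one of: " + ", ".join(languages))
    parser.add_argument("--describe", action="store_true",
                        help="print the pipeline description instead of normalizing")
    return parser
```

Listing the discovered names in the help text, rather than passing them as `choices=`, leaves validation to the loader. This matches the pass-through behaviour that one of the review comments below discusses.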
Estimated code review effort
🎯 3 (Moderate) | ⏱️ ~20 minutes
🚥 Pre-merge checks | ✅ 2 | ❌ 1
❌ Failed checks (1 warning)
✅ Passed checks (2 passed)
Actionable comments posted: 3
🧹 Nitpick comments (1)
normalization/cli.py (1)
20-70: Add focused CLI tests for new execution paths.
Please add tests for `--describe`, stdin input, positional input, and error flows (no text, invalid preset). This is the main new user-facing surface and would benefit from regression coverage.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@normalization/cli.py` around lines 20 - 70, Add focused unit tests for the CLI entrypoint main covering all execution paths: (1) describe flag -> run main with ["--describe", "--preset", ...] or equivalent and assert pipeline.describe() JSON is printed (mock load_pipeline to return an object with describe()); (2) positional text -> run main with a positional argument and assert pipeline.normalize(...) was called and printed; (3) stdin input -> simulate non-tty stdin content and run main with no positional arg and assert pipeline.normalize called with stdin text; (4) error flows -> simulate missing text with stdin as tty and assert parser.error is triggered, and simulate load_pipeline raising FileNotFoundError for invalid preset and assert parser.error is invoked; use mocks for load_pipeline and a fake pipeline exposing describe() and normalize() to avoid real I/O and reference the main function, load_pipeline, pipeline.describe, pipeline.normalize, and argparse parser behavior when implementing tests.
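Below is a sketch of those tests in pytest style. It uses `unittest.mock` for the fake pipeline. To stay self-contained, it includes a minimal stand-in `main()`. The stand-in takes injectable `load_pipeline` and `stdin` parameters purely for testability. That signature is an assumption and not the real `normalization.cli.main`, whose tests would instead patch the module's `load_pipeline` and `sys.stdin` with `mock.patch`.

```python
import argparse
import io
import json
import sys
from contextlib import redirect_stdout
from unittest import mock


def main(argv=None, load_pipeline=None, stdin=None) -> int:
    # Hypothetical stand-in for normalization.cli.main; dependencies are injected.
    stdin = sys.stdin if stdin is None else stdin
    parser = argparse.ArgumentParser(prog="normalize")
    parser.add_argument("text", nargs="?")
    parser.add_argument("--preset", default="default")
    parser.add_argument("--language", "-l", default="en")
    parser.add_argument("--describe", action="store_true")
    args = parser.parse_args(argv)
    try:
        pipeline = load_pipeline(args.preset, args.language)
    except FileNotFoundError as exc:
        parser.error(str(exc))  # prints usage to stderr, exits with status 2
    if args.describe:
        print(json.dumps(pipeline.describe()))
        return 0
    if args.text is not None:
        text = args.text
    elif not stdin.isatty():
        text = stdin.read()
    else:
        parser.error("no input text provided")
    print(pipeline.normalize(text))
    return 0


def run(argv, loader, stdin=None):
    """Run main() and capture what it prints to stdout."""
    out = io.StringIO()
    with redirect_stdout(out):
        code = main(argv, load_pipeline=loader, stdin=stdin)
    return code, out.getvalue()


def fake_loader():
    pipeline = mock.Mock()
    pipeline.normalize.side_effect = lambda t: t.lower()
    pipeline.describe.return_value = {"steps": []}
    return mock.Mock(return_value=pipeline), pipeline


def test_describe():
    loader, pipeline = fake_loader()
    assert run(["--describe"], loader) == (0, '{"steps": []}\n')
    pipeline.normalize.assert_not_called()


def test_positional_text():
    loader, pipeline = fake_loader()
    assert run(["Hello"], loader) == (0, "hello\n")
    loader.assert_called_once_with("default", "en")


def test_stdin_text():
    loader, pipeline = fake_loader()
    stdin = io.StringIO("FROM STDIN")  # StringIO.isatty() returns False
    assert run([], loader, stdin) == (0, "from stdin\n")
    pipeline.normalize.assert_called_once_with("FROM STDIN")


def test_missing_text_errors():
    loader, _ = fake_loader()
    tty = mock.Mock()
    tty.isatty.return_value = True  # simulate an interactive terminal
    try:
        run([], loader, tty)
    except SystemExit as exc:
        assert exc.code == 2
    else:
        raise AssertionError("expected SystemExit")


def test_invalid_preset_errors():
    loader = mock.Mock(side_effect=FileNotFoundError("no such preset: nope"))
    try:
        run(["x", "--preset", "nope"], loader)
    except SystemExit as exc:
        assert exc.code == 2
    else:
        raise AssertionError("expected SystemExit")
```

`parser.error()` raises `SystemExit(2)`, so the error-flow tests assert on the exit code instead of inspecting stderr.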
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@normalization/cli.py`:
- Around line 31-37: The CLI currently uses argparse choices for the
"--language" / "-l" option which prevents unknown language codes from reaching
load_pipeline(), breaking its fallback-to-default behavior; decide on intended
behavior and align both sides: either remove the choices parameter in the
argparse add_argument so unknown codes are passed through and let
load_pipeline(language) handle fallback, or keep choices and update
load_pipeline() to raise/handle errors consistently; locate the argument
definition in normalization/cli.py (the add_argument call for "--language"/"-l")
and the load_pipeline() function to implement the chosen approach and update
help text to reflect strict reject versus fallback behavior.
- Around line 53-57: The current try/except around load_pipeline only converts
FileNotFoundError to a CLI error; expand the handler to catch expected load-time
parse/validation errors (e.g. ValueError, TypeError, json.JSONDecodeError,
KeyError) so that load_pipeline(args.preset, args.language) failures are
surfaced via parser.error(str(exc)) instead of a traceback; keep unexpected
exceptions raising normally (or re-raise) and reference the load_pipeline call
and the existing parser.error usage when implementing the change.
In `@README.md`:
- Around line 90-94: Update the README example to include the required --from
flag so the uvx invocation points to the actual script entry point: use --from
normalize when calling uvx for the package named gladia-normalization; locate
the example that shows uvx gladia-normalization "It's $50 at 3:00PM" --language
en and change it to explicitly pass --from normalize to ensure the correct
script is executed.
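The two inline comments on `normalization/cli.py` can be combined in one sketch. It takes the pass-through option for `--language`: there is no `choices=`, so the loader owns the fallback. It also widens the `load_pipeline` error handling to expected load-time errors only. The registry, the default language, and the failing preset names are hypothetical stand-ins. Note that `json.JSONDecodeError` is a subclass of `ValueError`, so catching `ValueError` already covers it. If presets are parsed with PyYAML, its `yaml.YAMLError` would be a natural addition to the tuple. The strict alternative would keep `choices=` and have `load_pipeline()` raise on unknown codes instead.

```python
import argparse
import warnings

# Hypothetical stand-ins for the package's language registry and defaults.
LANGUAGES = {"en", "fr"}
DEFAULT_LANGUAGE = "en"

# Expected load-time failures that should become clean CLI errors.
# json.JSONDecodeError subclasses ValueError, so ValueError already covers it.
EXPECTED_LOAD_ERRORS = (FileNotFoundError, ValueError, TypeError, KeyError)


def load_pipeline(preset: str, language: str) -> dict:
    """Hypothetical loader illustrating fallback and failure modes."""
    if language not in LANGUAGES:
        # Lenient behaviour: unknown codes fall back to the default language.
        warnings.warn(f"unknown language {language!r}, using {DEFAULT_LANGUAGE!r}")
        language = DEFAULT_LANGUAGE
    if preset == "missing":
        raise FileNotFoundError(f"preset not found: {preset}")
    if preset == "broken":
        raise ValueError(f"invalid preset definition: {preset}")
    if preset == "crash":
        raise RuntimeError("unexpected internal failure")
    return {"preset": preset, "language": language}


def load_from_args(argv=None) -> dict:
    parser = argparse.ArgumentParser(prog="normalize")
    parser.add_argument("--preset", default="default")
    # No `choices=`: unknown codes reach load_pipeline(), which owns the fallback.
    parser.add_argument("--language", "-l", default=DEFAULT_LANGUAGE,
                        help="language code; unknown codes fall back to 'en'")
    args = parser.parse_args(argv)
    # Expected errors become parser.error() (usage + message on stderr, exit 2);
    # anything else, e.g. RuntimeError, still propagates with a full traceback.
    try:
        return load_pipeline(args.preset, args.language)
    except EXPECTED_LOAD_ERRORS as exc:
        parser.error(str(exc))
```

Because the tuple is explicit, a genuine bug (here the `RuntimeError`) still produces a traceback instead of being reported as a usage error.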
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: bb9e7a20-f611-45f3-a490-e50edbe11da9
📒 Files selected for processing (4)
- README.md
- normalization/__main__.py
- normalization/cli.py
- pyproject.toml
What does this PR do?
- Adds a `normalize` CLI entry point backed by `normalization/cli.py` and `normalization/__main__.py`
- Registers the entry point in `pyproject.toml` under `[project.scripts]` so it is available as `gladia-normalization` after install and as `uvx gladia-normalization` without a permanent install

Type of change
Summary by CodeRabbit