fix(dsl): avoid panic on unterminated string escape at EOF #3024
Signed-off-by: Sai Asish Y <say.apm35@gmail.com>
I have read the CLA Document and I hereby sign the CLA
📝 Walkthrough
The DSL lexer's string-escape handling now checks for end-of-input after consuming a backslash and breaks out of the escape loop instead of continuing to dispatch escape logic. A regression test verifies that lexing an unterminated string ending in a backslash returns a STRING token without panicking.
Changes
Lexer EOF Guard
Estimated code review effort: 1 (Trivial) | ~5 minutes
Suggested labels: bug, lexer
Suggested reviewers: tolgaOzen
🚥 Pre-merge checks: ✅ 4 passed | ❌ 1 failed (1 warning)
Actionable comments posted: 1
🧹 Nitpick comments (1)
pkg/dsl/lexer/lexer_test.go (1)
1192-1200: 🎯 Functional Correctness | 🔵 Trivial | ⚡ Quick win
Also assert on `tok.Literal` to catch content-corruption regressions.
This test only checks the token type, so it wouldn't catch the stray-backslash literal issue flagged in `lexer.go`. Consider also asserting `tok.Literal` (e.g., expecting an empty string) once that fix lands.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@pkg/dsl/lexer/lexer_test.go` around lines 1192 - 1200, The Unterminated string test in lexer_test.go only verifies the token type, so it can miss literal corruption regressions from the lexer. Update the `Unterminated string ending in a backslash does not panic` test around `NewLexer` and `NextToken` to also assert `tok.Literal`, ideally expecting the corrected empty string literal, so it fails if the stray-backslash content returns again.
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: 91c011a7-5866-4a07-8421-abf372f9a378
📒 Files selected for processing (2)
pkg/dsl/lexer/lexer.go
pkg/dsl/lexer/lexer_test.go
    if l.ch == '\\' {
        str += l.input[position:l.position]
        l.readChar() // Skip the backslash
        if l.ch == 0 {
            // Backslash at end of input (unterminated escape); stop before
            // position runs past the input.
            break
        }
🎯 Functional Correctness | 🟠 Major | ⚡ Quick win
Stray backslash leaks into token literal on EOF-after-backslash.
The panic is fixed, but `position` isn't advanced before the new `break`, so the final `str += l.input[position:l.position]` (line 236) re-includes the just-consumed backslash character. For input `"\`, the returned STRING literal ends up containing a raw `\` instead of being clean/empty, silently corrupting the parsed value instead of surfacing malformed input.
🐛 Proposed fix
     l.readChar() // Skip the backslash
     if l.ch == 0 {
         // Backslash at end of input (unterminated escape); stop before
         // position runs past the input.
    +    position = l.position
         break
     }
📝 Committable suggestion
    if l.ch == '\\' {
        str += l.input[position:l.position]
        l.readChar() // Skip the backslash
        if l.ch == 0 {
            // Backslash at end of input (unterminated escape); stop before
            // position runs past the input.
            position = l.position
            break
        }
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@pkg/dsl/lexer/lexer.go` around lines 212 - 219, The string scanning logic in
lexer.go’s string-reading path is still re-adding the trailing backslash when
EOF is reached after an escape. Update the branch in the lexer’s string token
accumulation (around the readChar/break handling in the string literal parser)
so the consumed backslash is not included in the final token value; advance the
tracking position before breaking or otherwise exclude that segment from the
final append. Verify the fix in the string token builder used by the lexer so an
input ending with a lone backslash does not return a STRING literal containing a
raw backslash.
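The stray-backslash behavior flagged in this review comment can be reproduced with a small model. The sketch below is a hypothetical, stripped-down lexer (`miniLexer`, `newMiniLexer`, and the `advanceOnEOF` flag are invented for illustration and are not the project's code); its escape handling is simplified to keep the escaped character verbatim. It only assumes the same shape as the snippet under review: a `position` marker for the start of the pending segment and a final `str += l.input[position:l.position]` after the loop.

```go
package main

import "fmt"

// miniLexer is a hypothetical, stripped-down model used only for this
// illustration; it is not the project's lexer.
type miniLexer struct {
	input        string
	position     int  // index of the character currently held in ch
	readPosition int  // index of the next character to read
	ch           byte // current character; 0 signals end of input
}

func newMiniLexer(input string) *miniLexer {
	l := &miniLexer{input: input}
	l.readChar()
	return l
}

// readChar loads the next character, or 0 once the input is exhausted.
func (l *miniLexer) readChar() {
	if l.readPosition >= len(l.input) {
		l.ch = 0
	} else {
		l.ch = l.input[l.readPosition]
	}
	l.position = l.readPosition
	l.readPosition++
}

// readString scans a string literal, assuming ch is the opening quote.
// advanceOnEOF toggles the reviewer's suggested `position = l.position`.
func (l *miniLexer) readString(advanceOnEOF bool) string {
	l.readChar() // skip the opening quote
	position := l.position
	var str string
	for {
		if l.ch == '\\' {
			str += l.input[position:l.position]
			l.readChar() // skip the backslash
			if l.ch == 0 {
				// Backslash at end of input: stop before dispatching escapes.
				if advanceOnEOF {
					position = l.position
				}
				break
			}
			str += string(l.ch) // simplified: keep the escaped character as-is
			l.readChar()
			position = l.position
			continue
		}
		if l.ch == '"' || l.ch == 0 {
			break
		}
		l.readChar()
	}
	str += l.input[position:l.position]
	return str
}

func main() {
	src := `"\` // a quote followed by a lone backslash
	fmt.Printf("%q\n", newMiniLexer(src).readString(false)) // "\\"
	fmt.Printf("%q\n", newMiniLexer(src).readString(true))  // ""
}
```

In this model, a test that checks only the token type would pass in both modes; comparing the returned literal is what separates them.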
Fixes #3004.
When the schema lexer reaches a backslash that is the final character of the input (an unterminated escape), `readChar` sets `l.ch` to 0 at EOF and `position` is then advanced past the end of the input, so the trailing `str += l.input[position:l.position]` slices out of bounds. The smallest trigger is a quote followed by a backslash (`"\`), reached from SchemaWrite via `Parse()`. This change breaks out of the escape branch when EOF is hit, before `position` overshoots, so the malformed string lexes as a normal STRING token instead of panicking. Added a lexer test that panics without the guard.
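The out-of-bounds slice described above comes down to a Go rule: a string slice expression `s[low:high]` requires `low <= high <= len(s)`, so an empty slice exactly at the end is legal, while one index further panics at run time. The following is a minimal sketch of that rule only; `trySlice` is a hypothetical helper written for this illustration and does not appear in the lexer.

```go
package main

import "fmt"

// trySlice returns input[start:end] and reports whether the slice
// expression panicked. It is a hypothetical helper for this illustration.
func trySlice(input string, start, end int) (s string, panicked bool) {
	defer func() {
		if recover() != nil {
			s, panicked = "", true
		}
	}()
	return input[start:end], false
}

func main() {
	input := `"\` // two bytes: a quote followed by a backslash

	// Slicing exactly at len(input) is legal and yields an empty string.
	s, p := trySlice(input, len(input), len(input))
	fmt.Printf("%q %v\n", s, p) // "" false

	// One step past the end is out of range and panics at run time.
	s, p = trySlice(input, len(input)+1, len(input)+1)
	fmt.Printf("%q %v\n", s, p) // "" true
}
```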