[claude-generated] fix(utils): prevent remove_think_tags from truncating responses with embedded <think> tags#2900
Conversation
…embedded <think> tags The previous regex `^(<think>.*?</think>|.*</think>)` with re.DOTALL would match everything from the start of the string to the last </think> when <think> tags appeared mid-text (e.g., in retrieved chunks). This caused legitimate response content before embedded <think> blocks to be silently discarded. The new implementation: 1. Handles orphaned </think> at the start (streaming case) by only matching non-tag characters before it. 2. Removes all complete <think>...</think> blocks anywhere in the text using non-greedy matching. Fixes HKUDS#2895 Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
|
@codex review |
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: aa7c4dcee3
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
The previous regex `^[^<]*</think>` failed when orphaned reasoning content contained `<` characters (e.g. math expressions like "2 < 3" or HTML snippets). Switch to a negative-lookahead pattern that skips `<think>` specifically rather than all `<` chars. Addresses review feedback from Codex on PR HKUDS#2900. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
|
@codex review |
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 28421fc180
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
Tests now import from lightrag.utils when dependencies are available (CI), falling back to a local mirror only when the full package can't be loaded. This ensures CI catches any drift between the test copy and the production implementation. Addresses Codex P2 review on PR HKUDS#2900. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
|
@codex review |
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: d6019bd240
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
|
@codex review |
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 6de7efb35d
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
Summary
remove_think_tagsregex inlightrag/utils.pywhich caused false truncation of LLM responses when retrieved chunks contained<think>tags (issue [Bug]:Responses are truncated when retrieved chunks contain <think> tags #2895)^(<think>.*?</think>|.*</think>)withre.DOTALLwould greedily match everything from the start of the string to the last</think>, discarding legitimate content before embedded<think>blocks</think>at the start (streaming), and (2) complete<think>...</think>blocks anywhere in the textCloses #2895
Test plan
tests/test_remove_think_tags.pypass<think>tags)</think>prefix, no think tags, think tags in middle of text, multiple blocks, multiline blocks, empty blocks🤖 Generated with Claude Code