[claude-generated] fix(utils): prevent remove_think_tags from truncating responses with embedded <think> tags #2900

Merged
danielaskdd merged 4 commits into HKUDS:main from sjhddh:fix/remove-think-tags-false-positive on Apr 11, 2026

@sjhddh (Contributor) commented Apr 5, 2026

Summary

  • Fixes the `remove_think_tags` regex in `lightrag/utils.py`, which falsely truncated LLM responses when retrieved chunks contained `<think>` tags (issue #2895, "[Bug]: Responses are truncated when retrieved chunks contain <think> tags")
  • The old regex `^(<think>.*?</think>|.*</think>)` with `re.DOTALL` fell through to its greedy second alternative whenever the response did not start with `<think>`, matching everything from the start of the string to the last `</think>` and discarding legitimate content before embedded `<think>` blocks
  • The new implementation handles two cases separately: (1) an orphaned `</think>` at the start (streaming output), and (2) complete `<think>...</think>` blocks anywhere in the text
  • Added 10 unit tests covering normal usage, the bug scenario, edge cases, and multiline content
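A minimal reproduction of the truncation, as a sketch. Only the pattern and `re.DOTALL` come from this PR; applying it with `re.sub`, the trailing `.strip()`, and the name `old_remove_think_tags` are assumptions for illustration.

```python
import re

# Pre-fix pattern quoted in this PR. Applying it with re.sub and stripping
# whitespace afterwards are assumptions of this sketch.
OLD_PATTERN = re.compile(r"^(<think>.*?</think>|.*</think>)", re.DOTALL)


def old_remove_think_tags(text: str) -> str:  # hypothetical name
    return OLD_PATTERN.sub("", text).strip()


# The response does not start with <think>, so the second alternative
# (.*</think>) runs: the greedy .* consumes everything up to the LAST </think>.
response = (
    "Answer: the cache is flushed nightly.\n"
    "Source: <think>internal notes</think> see the ops guide."
)
print(old_remove_think_tags(response))  # prints: see the ops guide.
```

The normal case still works, because a response that begins with `<think>` takes the lazy first alternative; only the mid-text case loses the answer.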

Closes #2895

Test plan

  • All 10 new tests in tests/test_remove_think_tags.py pass
  • Verified that the old regex fails on the reported bug case (mid-text `<think>` tags)
  • Verified that the new implementation correctly handles: standard think blocks at the start, an orphaned `</think>` prefix, text with no think tags, think tags mid-text, multiple blocks, multiline blocks, and empty blocks

🤖 Generated with Claude Code

…embedded <think> tags

The previous regex `^(<think>.*?</think>|.*</think>)` with re.DOTALL
would match everything from the start of the string to the last </think>
when <think> tags appeared mid-text (e.g., in retrieved chunks). This
caused legitimate response content before embedded <think> blocks to be
silently discarded.

The new implementation:
1. Handles orphaned </think> at the start (streaming case) by only
   matching non-tag characters before it.
2. Removes all complete <think>...</think> blocks anywhere in the text
   using non-greedy matching.

Fixes HKUDS#2895

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
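A sketch of the two-step approach this commit describes. The orphaned-close pattern `^[^<]*</think>` is the one named in the follow-up commit; running the steps as two `re.sub` calls, the final `.strip()`, and the name `remove_think_tags_v1` are assumptions, not the merged code. The last call shows the limitation the follow-up commit addresses.

```python
import re

# Step 1 (streaming): an orphaned </think> at the start, preceded only by
# characters other than "<". [^<] also matches newlines, so no flag is needed.
ORPHAN_CLOSE_V1 = re.compile(r"^[^<]*</think>")
# Step 2: every complete <think>...</think> block, matched non-greedily so
# separate blocks are removed one at a time instead of being merged.
THINK_BLOCK = re.compile(r"<think>.*?</think>", re.DOTALL)


def remove_think_tags_v1(text: str) -> str:  # hypothetical name
    text = ORPHAN_CLOSE_V1.sub("", text)
    text = THINK_BLOCK.sub("", text)
    return text.strip()  # assumed post-processing


bug_case = "Answer: flushed nightly.\nSource: <think>notes</think> see guide."
print(remove_think_tags_v1(bug_case))  # the leading answer now survives
print(remove_think_tags_v1("reasoning...</think>Final answer."))  # Final answer.
# Limitation: the "<" in "2 < 3" stops [^<]* before it reaches </think>.
print(remove_think_tags_v1("2 < 3, so yes</think>Final answer."))
```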

@danielaskdd (Collaborator)

@codex review

@chatgpt-codex-connector (Bot) left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: aa7c4dcee3

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Review comment thread on lightrag/utils.py (outdated)
The previous regex `^[^<]*</think>` failed when orphaned reasoning
content contained `<` characters (e.g. math expressions like "2 < 3"
or HTML snippets). Switch to a negative-lookahead pattern that skips
`<think>` specifically rather than all `<` chars.

Addresses review feedback from Codex on PR HKUDS#2900.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
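One way to write the negative-lookahead version, as a hedged sketch. The exact pattern merged into `lightrag/utils.py` is not quoted in this thread, so `^(?:(?!<think>).)*?</think>` is an assumption, as are the lazy quantifier (it stops at the first `</think>`), the final `.strip()`, and the helper name `remove_think_tags_v2`. The inputs are illustrative cases modeled on the test plan, not the PR's actual tests.

```python
import re

# Assumed refinement: consume any character (DOTALL) as long as it does not
# begin "<think>", then require </think>. A stray "<" such as in "2 < 3" no
# longer blocks the match. The lazy *? stops at the first </think>.
ORPHAN_CLOSE = re.compile(r"^(?:(?!<think>).)*?</think>", re.DOTALL)
THINK_BLOCK = re.compile(r"<think>.*?</think>", re.DOTALL)


def remove_think_tags_v2(text: str) -> str:  # hypothetical name
    text = ORPHAN_CLOSE.sub("", text)  # streaming case
    text = THINK_BLOCK.sub("", text)   # complete blocks anywhere
    return text.strip()                # assumed post-processing


cases = {
    "<think>plan</think>Final answer.": "Final answer.",
    "2 < 3, so yes</think>Final answer.": "Final answer.",
    "No tags here.": "No tags here.",
    "Intro <think>x</think>outro": "Intro outro",
    "<think>a</think>X<think>b</think>Y": "XY",
    "<think>line 1\nline 2</think>\nDone": "Done",
    "<think></think>Hi": "Hi",
}
for given, expected in cases.items():
    assert remove_think_tags_v2(given) == expected, given
print("all cases pass")
```

Because the lookahead rejects only the literal `<think>`, a mid-text block still stops step 1 from matching, so the text before it is preserved and step 2 removes just the block.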

@danielaskdd (Collaborator)

@codex review

@chatgpt-codex-connector (Bot) left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 28421fc180

Review comment thread on tests/test_remove_think_tags.py (outdated)
Tests now import from lightrag.utils when dependencies are available
(CI), falling back to a local mirror only when the full package can't
be loaded. This ensures CI catches any drift between the test copy
and the production implementation.

Addresses Codex P2 review on PR HKUDS#2900.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
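The import-with-fallback pattern might look like this sketch. Only the module path `lightrag.utils` and the function name `remove_think_tags` come from this PR; the fallback body, the `USING_PRODUCTION_IMPL` flag, and the test name are hypothetical.

```python
import re

try:
    # Prefer the production function so the tests catch any drift from it.
    from lightrag.utils import remove_think_tags

    USING_PRODUCTION_IMPL = True
except ImportError:  # also covers its subclass ModuleNotFoundError
    # Hypothetical local mirror, used only when lightrag (or one of its
    # dependencies) cannot be imported. It must be kept in sync by hand.
    USING_PRODUCTION_IMPL = False
    _ORPHAN_CLOSE = re.compile(r"^(?:(?!<think>).)*?</think>", re.DOTALL)
    _THINK_BLOCK = re.compile(r"<think>.*?</think>", re.DOTALL)

    def remove_think_tags(text: str) -> str:
        text = _ORPHAN_CLOSE.sub("", text)
        return _THINK_BLOCK.sub("", text).strip()


def test_keeps_text_around_embedded_block() -> None:
    result = remove_think_tags("Intro <think>x</think> outro")
    assert "Intro" in result and "outro" in result
    assert "<think>" not in result
```

The assertions check containment rather than exact strings so the same test passes against either implementation, whatever whitespace handling the production function applies.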

@danielaskdd (Collaborator)

@codex review

@chatgpt-codex-connector (Bot) left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: d6019bd240

Review comment thread on lightrag/utils.py
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

@danielaskdd (Collaborator)

@codex review

@chatgpt-codex-connector (Bot) left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 6de7efb35d

Review comment thread on tests/test_remove_think_tags.py
@danielaskdd merged commit e5f4061 into HKUDS:main on Apr 11, 2026
3 checks passed

Development

Successfully merging this pull request may close these issues.

[Bug]:Responses are truncated when retrieved chunks contain <think> tags

2 participants