feat(tools): add HugeGraph AI DeepWiki assistant #355

Open
LRriver wants to merge 13 commits into apache:main from LRriver:deepwiki-skill

Contributor

@LRriver LRriver commented Jun 1, 2026

Purpose

Add an optional HugeGraph AI repository knowledge assistant under tools/ai. The assistant is intended for Claude Code and Codex users who want repository-scoped Q&A for https://github.com/apache/hugegraph-ai, using DeepWiki as the online knowledge source while caching wiki contents locally for repeated context search.

Changes

  • Add tools/ai/hugegraph-ai-deepwiki-skill as a standalone installable module.
  • Include Claude and Codex plugin manifests plus marketplace manifests.
  • Add bilingual installation and usage docs: README.md and README-zh.md.
  • Add a small DeepWiki MCP client CLI for structure, contents, context, ask, and tools.
  • Keep the assistant isolated from runtime code and project dependencies.

Verification

  • python3 -m json.tool on all new JSON manifests.
  • python3 -m py_compile tools/ai/hugegraph-ai-deepwiki-skill/plugins/hugegraph-ai-deepwiki-skill/skills/hugegraph-ai-deepwiki-skill/scripts/deepwiki_mcp.py.
  • uv run --extra dev ruff format --check tools/ai/hugegraph-ai-deepwiki-skill/plugins/hugegraph-ai-deepwiki-skill/skills/hugegraph-ai-deepwiki-skill/scripts/deepwiki_mcp.py.
  • uv run --extra dev ruff check tools/ai/hugegraph-ai-deepwiki-skill/plugins/hugegraph-ai-deepwiki-skill/skills/hugegraph-ai-deepwiki-skill/scripts/deepwiki_mcp.py.
  • claude plugin validate tools/ai/hugegraph-ai-deepwiki-skill.
  • claude plugin validate tools/ai/hugegraph-ai-deepwiki-skill/plugins/hugegraph-ai-deepwiki-skill.
  • Temporary Codex install via codex plugin marketplace add and codex plugin add.
  • DeepWiki smoke tests: structure, cached context, and online ask for apache/hugegraph-ai.

Compatibility

This is an optional tool module only. It does not change HugeGraph AI runtime behavior, public APIs, package dependencies, or default configuration.

Copilot AI review requested due to automatic review settings June 1, 2026 10:57
dosubot (bot) added the size:XL (this PR changes 500-999 lines, ignoring generated files) and enhancement (new feature or request) labels on Jun 1, 2026
Contributor

Copilot AI left a comment

Pull request overview

Note

Copilot was unable to run its full agentic suite in this review.

Adds a new “HugeGraph AI DeepWiki” skill/plugin that can query the official DeepWiki MCP server, cache wiki contents locally, and provide repository-scoped Q&A and context search for apache/hugegraph-ai.

Changes:

  • Introduces a Python MCP client script (deepwiki_mcp.py) with commands for ask, structure, contents, context, and tools.
  • Adds repository profile mapping (references/repos.json) and agent/tool configuration (agents/openai.yaml).
  • Adds plugin packaging + documentation for Codex/Claude installs (plugin manifests, marketplace entries, READMEs, SKILL.md).

Reviewed changes

Copilot reviewed 10 out of 10 changed files in this pull request and generated 5 comments.

Summary per file (File: Description)

  • tools/ai/hugegraph-ai-deepwiki-skill/plugins/hugegraph-ai-deepwiki-skill/skills/hugegraph-ai-deepwiki-skill/scripts/deepwiki_mcp.py: Implements the MCP client, caching, and CLI workflows described in docs.
  • tools/ai/hugegraph-ai-deepwiki-skill/plugins/hugegraph-ai-deepwiki-skill/skills/hugegraph-ai-deepwiki-skill/references/repos.json: Defines repository alias → repoName mapping used by the CLI and skill.
  • tools/ai/hugegraph-ai-deepwiki-skill/plugins/hugegraph-ai-deepwiki-skill/skills/hugegraph-ai-deepwiki-skill/agents/openai.yaml: Declares the MCP dependency and default prompt wiring for the agent.
  • tools/ai/hugegraph-ai-deepwiki-skill/plugins/hugegraph-ai-deepwiki-skill/skills/hugegraph-ai-deepwiki-skill/SKILL.md: Documents the intended workflow (context search first, then ask).
  • tools/ai/hugegraph-ai-deepwiki-skill/plugins/hugegraph-ai-deepwiki-skill/.codex-plugin/plugin.json: Codex plugin manifest for distributing the skill.
  • tools/ai/hugegraph-ai-deepwiki-skill/plugins/hugegraph-ai-deepwiki-skill/.claude-plugin/plugin.json: Claude plugin manifest for distributing the skill.
  • tools/ai/hugegraph-ai-deepwiki-skill/README.md: Installation and usage docs (English).
  • tools/ai/hugegraph-ai-deepwiki-skill/README-zh.md: Installation and usage docs (Chinese).
  • tools/ai/hugegraph-ai-deepwiki-skill/.claude-plugin/marketplace.json: Marketplace entry for Claude plugin discovery.
  • tools/ai/hugegraph-ai-deepwiki-skill/.agents/plugins/marketplace.json: Marketplace entry for agents/plugin discovery.

Comment on lines +137 to +145
    max_seconds = float(os.environ.get("DEEPWIKI_MCP_STREAM_TIMEOUT", "120"))
    deadline = time.monotonic() + max_seconds
    timed_out = False

    while True:
        if time.monotonic() > deadline:
            timed_out = True
            break
        raw_line = response.readline()
Contributor Author

Fixed in 3a3e8e6.

Contributor Author

Follow-up Python 3.9 socket timeout compatibility fix added in 8305e3d.

Comment on lines +201 to +203
    try:
        with urllib.request.urlopen(req, timeout=90) as response:
            session_id = response.headers.get("Mcp-Session-Id")
Contributor Author

Fixed in 3a3e8e6.

Contributor Author

Follow-up Python 3.9 socket timeout compatibility fix added in 8305e3d.

Comment on lines +117 to +121
def write_text_atomic(path: Path, text: str) -> None:
    path.parent.mkdir(parents=True, exist_ok=True)
    tmp_path = path.with_suffix(path.suffix + ".tmp")
    tmp_path.write_text(text, encoding="utf-8")
    tmp_path.replace(path)
Contributor Author

Fixed in 3a3e8e6.
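
For readers following this thread: the quoted helper writes to a fixed `.tmp` sibling, which two concurrent runs could share. The thread does not show what 3a3e8e6 changed, so the following is only a minimal sketch of one common hardening, assuming a unique temporary file in the same directory followed by `os.replace`. The name `write_text_atomic_sketch` is hypothetical and does not come from the PR.

```python
import os
import tempfile
from pathlib import Path


def write_text_atomic_sketch(path: Path, text: str) -> None:
    """Write text to path via a unique temp file, then rename it into place."""
    path.parent.mkdir(parents=True, exist_ok=True)
    # A unique name in the same directory avoids collisions between processes
    # and keeps the final rename on a single filesystem.
    fd, tmp_name = tempfile.mkstemp(dir=path.parent, prefix=path.name + ".", suffix=".tmp")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as handle:
            handle.write(text)
        os.replace(tmp_name, path)
    except BaseException:
        # Clean up the temp file if anything failed before the rename.
        try:
            os.unlink(tmp_name)
        except FileNotFoundError:
            pass
        raise
```

Keeping the temporary file next to the target matters because a rename across filesystems is not atomic.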

"Accept": "application/json, text/event-stream",
"Content-Type": "application/json",
"Mcp-Protocol-Version": self.protocol_version,
"User-Agent": "hugegraph-ai-deepwiki-skill/0.1.4",
Contributor Author

Fixed in 3a3e8e6.

{
"protocolVersion": self.protocol_version,
"capabilities": {},
"clientInfo": {"name": "hugegraph-ai-deepwiki-skill", "version": "0.1.4"},
Contributor Author

Fixed in 3a3e8e6.

Member

@imbajin imbajin left a comment

Blocking: yes. Summary: The cache hardening still fails recoverable cache cases and the new MCP client lacks regression tests for its failure-prone paths. Evidence: py_compile/JSON smoke passed; fake-client cache write failure reproduced a hard McpError.

        return path.read_text(encoding="utf-8"), path, False

    text = read_wiki_contents(client, repo_name)
    write_text_atomic(path, text)
Member

‼️ Do not discard freshly fetched contents on cache write failure

Evidence: read_wiki_contents() runs before this write, but any OSError from write_text_atomic() is converted into McpError, so the command fails even though it already has the fresh wiki text. Impact: a broken cache directory makes context/contents unusable instead of falling back to the live result, which is the recovery path this change is trying to harden. Please return the fetched contents when only the cache write fails, and refetch when an existing cache read fails or is invalid.

Contributor Author

Fixed. ensure_cached_contents() now refetches when an existing cache read fails, and if read_wiki_contents() succeeds but the cache write fails, it returns the fresh DeepWiki contents instead of raising McpError. Added regression coverage for both cache write fallback and invalid cache refetch.
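
The fallback behavior described in this reply can be sketched roughly as follows. This is an illustration under assumptions, not the PR's code: `ensure_cached_contents_sketch` is a hypothetical name, the fetch and write steps are passed in as callables so the example is self-contained, the third tuple element is assumed to mean "fetched live", and an empty cache file is assumed to count as invalid.

```python
from pathlib import Path
from typing import Callable


def ensure_cached_contents_sketch(
    path: Path,
    fetch: Callable[[], str],
    write: Callable[[Path, str], None],
) -> tuple[str, Path, bool]:
    """Return (text, cache_path, fetched_live)."""
    try:
        cached = path.read_text(encoding="utf-8")
        if cached.strip():
            return cached, path, False
    except (OSError, UnicodeDecodeError):
        pass  # Missing, unreadable, or undecodable cache: fall through and refetch.

    text = fetch()  # Errors from the live fetch still propagate to the caller.
    try:
        write(path, text)
    except OSError:
        pass  # The cache write failed, but the fresh text is still usable.
    return text, path, True
```

The key design point is ordering: the write happens after the fetch and its failure is swallowed, so a broken cache directory only costs caching, not the answer.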

return parsed


def read_sse_response(response: Any, expected_id: int | None) -> dict[str, Any]:
Member

‼️ Add regression tests for the MCP client failure paths

Evidence: this PR adds a custom MCP/SSE client with timeout handling, atomic cache writes, CLI validation, and context scoring, but no automated tests cover those behaviors. Impact: the same timeout/cache paths already needed follow-up fixes in this PR, and future changes can regress them while py_compile and smoke checks still pass. Please add focused tests for read_sse_response(), cache read/write failure fallback, and the cached-context scoring/selection behavior.

Contributor Author

Fixed. Added focused unittest coverage for read_sse_response() timeout handling, cache write fallback, invalid cache refetch, and cached-context scoring/selection. Local Ruff check/format, py_compile, and unit tests pass; the latest PR CI is also passing.

Contributor Author

LRriver commented Jun 2, 2026

Blocking: yes. Summary: The cache hardening still fails recoverable cache cases and the new MCP client lacks regression tests for its failure-prone paths. Evidence: py_compile/JSON smoke passed; fake-client cache write failure reproduced a hard McpError.

@imbajin Addressed the DeepWiki MCP review feedback in the latest deepwiki-skill commits. Summary:

  • Cache read failures now trigger a fresh read_wiki_contents() fetch.
  • Cache write failures no longer discard already fetched DeepWiki contents.
  • Added regression tests for SSE timeout handling, cache fallback/refetch, and cached-context selection.
  • Fixed Ruff format/lint issues from CI.

Local validation passed: uv run ruff check ., uv run ruff format --check ., python3 -m py_compile ..., and python3 -m unittest .../tests/test_deepwiki_mcp.py.

Latest apache/hugegraph-ai#355 CI is passing.

Contributor

Copilot AI left a comment

Pull request overview

Copilot reviewed 11 out of 11 changed files in this pull request and generated no new comments.

Member

@imbajin imbajin left a comment

Blocking: yes. Summary: two packaging/runtime edge cases remain in the DeepWiki skill. Evidence: git archive/static review plus a local partial-SSE timeout reproduction.

@@ -0,0 +1,559 @@
#!/usr/bin/env python3
Member

‼️ Core script is omitted from source archives

Evidence: on current head 29aae41, git archive HEAD | tar -tf - includes tools/ai/hugegraph-ai-deepwiki-skill/plugins/hugegraph-ai-deepwiki-skill/tests/test_deepwiki_mcp.py but not this scripts/deepwiki_mcp.py file. The existing root .gitattributes has scripts/ export-ignore, so archive-based installs and release source tarballs ship the tests/manifests without the executable client. Please either move/rename the plugin script directory or add an explicit archive rule so the skill package is complete in source archives.

Contributor Author

Fixed in the latest deepwiki-skill head. Added explicit archive rules in the root .gitattributes for this skill’s scripts/ directory and files, and verified with git archive HEAD | tar -tf - that scripts/deepwiki_mcp.py is now included alongside the tests/manifests.
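
The `git archive HEAD | tar -tf -` check mentioned here could also be scripted. A minimal sketch, assuming the archive bytes are already in memory; `archive_has_member` is a hypothetical helper, not part of the PR.

```python
import io
import tarfile


def archive_has_member(archive_bytes: bytes, suffix: str) -> bool:
    """Return True if any path inside the tar archive ends with suffix."""
    with tarfile.open(fileobj=io.BytesIO(archive_bytes), mode="r:*") as archive:
        return any(name.endswith(suffix) for name in archive.getnames())
```

In a checkout, the bytes could come from `subprocess.run(["git", "archive", "HEAD"], capture_output=True, check=True).stdout`, and the check would then be `archive_has_member(data, "scripts/deepwiki_mcp.py")`.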

if expected_id is None or parsed.get("id") == expected_id:
return parsed

if data_lines:
Member

⚠️ Partial SSE timeouts still surface as parse errors

Evidence: a fake response that first returns b'data: {"jsonrpc":"2.0","id":7,\n' and then raises TimeoutError makes read_sse_response() raise McpError: DeepWiki MCP returned non-JSON content... instead of the timeout error path added above. The new timeout test only covers an immediate timeout before any bytes are buffered. Please treat the buffered partial event as a timeout when timed_out is true before parsing the trailing data_lines, and add a regression test for that branch.

Contributor Author

Fixed in the latest deepwiki-skill head. read_sse_response() now treats buffered partial SSE data as a timeout when the stream times out instead of parsing incomplete JSON. Added test_read_sse_response_reports_partial_event_timeout; local Ruff, py_compile, and unittest checks pass.
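
The partial-event case can be illustrated with a simplified reader. This is a sketch under assumptions rather than the script's actual implementation: `read_sse_response_sketch` and `McpErrorSketch` are hypothetical stand-ins, the `time.monotonic()` deadline loop quoted earlier in the thread is omitted, and wrapping of malformed JSON errors is left out.

```python
import json
from typing import Any, Optional


class McpErrorSketch(RuntimeError):
    """Stand-in for the script's McpError."""


def read_sse_response_sketch(response: Any, expected_id: Optional[int]) -> dict[str, Any]:
    data_lines: list[str] = []
    timed_out = False
    while True:
        try:
            raw_line = response.readline()
        except TimeoutError:
            timed_out = True
            break
        if not raw_line:
            break  # End of stream.
        line = raw_line.decode("utf-8").rstrip("\r\n")
        if line.startswith("data:"):
            data_lines.append(line[5:].lstrip())
        elif line == "" and data_lines:
            # A blank line terminates an event, so the buffered payload is complete.
            parsed = json.loads("\n".join(data_lines))
            data_lines = []
            if expected_id is None or parsed.get("id") == expected_id:
                return parsed

    if timed_out:
        # Anything still buffered is an incomplete event: report the timeout
        # instead of trying to parse truncated JSON.
        raise McpErrorSketch("DeepWiki MCP stream timed out before a complete event arrived.")
    if data_lines:
        parsed = json.loads("\n".join(data_lines))
        if expected_id is None or parsed.get("id") == expected_id:
            return parsed
    raise McpErrorSketch("DeepWiki MCP stream ended without a matching response.")
```

Checking `timed_out` before touching the trailing `data_lines` is the ordering the review asked for.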

Contributor

Copilot AI left a comment

Pull request overview

Copilot reviewed 12 out of 12 changed files in this pull request and generated 4 comments.

Comment on lines +21 to +24
## Requirements

- Python 3.9 or later for the bundled MCP client script.
- Network access to `https://mcp.deepwiki.com/mcp`.
Contributor Author

Fixed in ad797fd. The English README now states Python 3.10+ for the bundled MCP client, matching the repository requires-python >=3.10 and Ruff target.

Comment on lines +21 to +24
## 前置要求

- Python 3.9 或更高版本,用于运行随附的 MCP 客户端脚本。
- 当前环境需要能访问 `https://mcp.deepwiki.com/mcp`。
Contributor Author

Fixed in ad797fd. The Chinese README now states Python 3.10+ for the bundled MCP client, matching this repository runtime requirement.

Comment on lines +12 to +16
- Source repository: `https://github.com/apache/hugegraph-ai`
- DeepWiki page: `https://deepwiki.com/apache/hugegraph-ai`
- MCP endpoint: `https://mcp.deepwiki.com/mcp`
- Default repository: `apache/hugegraph-ai`
- Runtime requirements: Python 3.9+ and network access to the MCP endpoint.
Contributor Author

Fixed in ad797fd. SKILL.md now declares Python 3.10+ for the bundled MCP client.

Comment thread .gitattributes
Comment on lines +17 to +18
tools/ai/hugegraph-ai-deepwiki-skill/plugins/hugegraph-ai-deepwiki-skill/skills/hugegraph-ai-deepwiki-skill/scripts/ -export-ignore
tools/ai/hugegraph-ai-deepwiki-skill/plugins/hugegraph-ai-deepwiki-skill/skills/hugegraph-ai-deepwiki-skill/scripts/** -export-ignore
Contributor Author

Partially addressed in ad797fd. I re-tested git archive without these rules and the archive omitted scripts/deepwiki_mcp.py while keeping the tests, so the negation rules are required for source/archive installs. I kept them and added a comment explaining that they preserve the DeepWiki skill MCP client in release/source archives.

Member

@imbajin imbajin left a comment

Blocking: no. Summary: The skill package is mostly verified, but the CLI repository argument rejects a repository form the docs surface. Evidence: unittest passed (5 tests), py_compile passed, archive includes the skill script, and local resolve_repo("apache/hugegraph-ai") reproduction fails.

return repos


def resolve_repo(alias_or_name: str) -> str:
Member

⚠️ Accept the documented repository form

Evidence: resolve_repo("hugegraph-ai") returns apache/hugegraph-ai, but resolve_repo("apache/hugegraph-ai") raises McpError: Unknown repository alias ...; the skill docs and metadata repeatedly present the canonical apache/hugegraph-ai value. Impact: users who copy the documented repository name into --repo hit a hard CLI error. Please either accept full owner/repo names as a pass-through or make the CLI/docs explicitly alias-only, and add a regression test for that contract.

Contributor Author

Fixed in ad797fd. resolve_repo() now accepts both the documented canonical repository name apache/hugegraph-ai and the alias hugegraph-ai, while still limiting pass-through names to enabled repository profiles. Added test_resolve_repo_accepts_alias_and_full_repo_name for the contract.
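
The contract described in this reply (alias or full name, limited to enabled profiles) might look like the sketch below. The profile schema with `repoName` and `enabled` keys is an assumption based on the file summary and this reply; `REPO_PROFILES`, `resolve_repo_sketch`, and `McpErrorSketch` are hypothetical names.

```python
class McpErrorSketch(ValueError):
    """Stand-in for the script's McpError."""


# Hypothetical stand-in for the contents of references/repos.json.
REPO_PROFILES = {
    "hugegraph-ai": {"repoName": "apache/hugegraph-ai", "enabled": True},
}


def resolve_repo_sketch(alias_or_name: str, profiles: dict = REPO_PROFILES) -> str:
    """Map an alias or a full owner/repo name to an enabled repository name."""
    enabled = {
        alias: profile["repoName"]
        for alias, profile in profiles.items()
        if profile.get("enabled", True)
    }
    if alias_or_name in enabled:
        return enabled[alias_or_name]  # Alias form, e.g. "hugegraph-ai".
    if alias_or_name in enabled.values():
        return alias_or_name  # Full name that matches an enabled profile.
    raise McpErrorSketch(f"Unknown repository alias or name: {alias_or_name}")
```

Restricting pass-through names to known profiles keeps the tool repository-scoped while still accepting the form the docs show.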

Contributor Author

@LRriver LRriver left a comment

Submitting pending review draft before posting the latest follow-up replies.

Member

@imbajin imbajin left a comment

Blocking: no. Summary: One current-head lint failure remains in the bundled DeepWiki client. Evidence: uv run ruff check tools/ai/hugegraph-ai-deepwiki-skill fails with UP038 at deepwiki_mcp.py:323.

    except (TimeoutError, socket.timeout) as exc:  # noqa: UP041
        raise McpError(f"DeepWiki MCP request timed out after {stream_timeout_seconds():.0f}s.") from exc
    except urllib.error.URLError as exc:
        if isinstance(exc.reason, (TimeoutError, socket.timeout)):
Member

⚠️ Fix the current Ruff failure

Evidence: uv run ruff check tools/ai/hugegraph-ai-deepwiki-skill fails on this head with UP038 at this line. Impact: the bundled skill package cannot pass the repository's Ruff checks locally. Please use the Python 3.10 union form that the project lint rule expects.

Suggested change
-        if isinstance(exc.reason, (TimeoutError, socket.timeout)):
+        if isinstance(exc.reason, TimeoutError | socket.timeout):

Labels

enhancement New feature or request size:XL This PR changes 500-999 lines, ignoring generated files.

3 participants