feat(third_party): add Dewey managed RAG pipeline example by lambdabaa · Pull Request #500 · anthropics/claude-cookbooks

lambdabaa · 2026-04-06T06:24:40Z

Summary

Adds a notebook demonstrating how to build production document Q&A using Dewey as a managed RAG backend alongside the Anthropic Python SDK.

Dewey handles the full ingestion pipeline (PDF conversion, section extraction, chunking, embedding) behind a single API, letting developers focus on the application layer rather than infrastructure assembly.

The notebook covers:

Uploading PDFs (three foundational AI papers from ArXiv) to a Dewey collection
Waiting for async ingestion and inspecting the extracted section hierarchy
Hybrid BM25 + vector search (RRF) with chunk-level citation metadata
Section-aware retrieval: scan section titles/summaries cheaply before loading full chunk content
Streaming agentic research endpoint powered by claude-sonnet-4-6 with tool-call trace and source attribution
BYOK (bring your own Anthropic key) for direct cost transparency
A RAG chat loop using Dewey retrieval + Anthropic SDK (claude-haiku-4-5-20251001) generation
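
For readers unfamiliar with the fusion step named in the hybrid search item, here is a minimal sketch of Reciprocal Rank Fusion in plain Python. It is not Dewey's implementation: the function name, the chunk IDs, and the constant `k=60` (a commonly used default) are assumptions for illustration, and the PR does not say how Dewey parameterizes RRF.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of IDs into one list, best first.

    Each ID scores 1 / (k + rank) per list it appears in (rank starts at 1),
    and the scores are summed across lists.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, chunk_id in enumerate(ranking, start=1):
            scores[chunk_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by ID so the order is deterministic.
    return sorted(scores, key=lambda chunk_id: (-scores[chunk_id], chunk_id))


# Hypothetical chunk IDs from a keyword (BM25) ranker and a vector ranker.
bm25_hits = ["c3", "c1", "c7"]
vector_hits = ["c1", "c9", "c3"]
fused = reciprocal_rank_fusion([bm25_hits, vector_hits])
```

Chunks that both rankers return (`c1`, `c3`) accumulate two contributions and so rise above chunks that only one ranker found, without needing the BM25 and vector scores to be on a comparable scale.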

Notebook location

third_party/Dewey/dewey_rag_pipeline.ipynb

Dependencies

meetdewey — Dewey Python SDK
anthropic — Anthropic Python SDK
requests — for downloading ArXiv PDFs

All installed via %pip install at the top of the notebook.

nidhishgajjar · 2026-04-14T21:36:18Z

Orb Code Review (powered by GLM 5.1 on Orb Cloud)

New third-party cookbook: Production Document Q&A with Dewey's Managed RAG Backend.

Observations

1. Well-structured RAG pipeline walkthrough (Positive)
The notebook progresses logically: create collection → upload documents → hybrid search → section-aware retrieval → agentic research → chat loop. Each step builds on the previous one.

2. Section-aware retrieval is a good pattern (Positive)
The "scan sections cheaply → fetch chunks only where needed" pattern is a valuable technique for large document Q&A. Good that this is highlighted.
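
The pattern can be sketched without any SDK calls. In the sketch below, the `Section` class, the keyword-overlap scoring, and the `fetch_chunks` callable are all hypothetical stand-ins; the notebook's actual Dewey calls and ranking method are not shown in this PR conversation.

```python
from dataclasses import dataclass


@dataclass
class Section:
    section_id: str
    title: str
    summary: str


def rank_sections(query, sections, top_n=2):
    """Cheap pass: score sections on title and summary text only."""
    terms = set(query.lower().split())

    def overlap(section):
        words = set(f"{section.title} {section.summary}".lower().split())
        return len(terms & words)

    scored = [(overlap(s), s) for s in sections]
    scored = [(score, s) for score, s in scored if score > 0]
    scored.sort(key=lambda pair: -pair[0])
    return [s for _, s in scored[:top_n]]


def retrieve(query, sections, fetch_chunks, top_n=2):
    """Expensive pass: load full chunk content only for the top sections."""
    chunks = []
    for section in rank_sections(query, sections, top_n):
        chunks.extend(fetch_chunks(section.section_id))
    return chunks


# Toy data standing in for an ingested paper.
sections = [
    Section("s1", "Introduction", "motivation and overview"),
    Section("s2", "Scaled dot-product attention", "how attention weights are computed"),
    Section("s3", "Training", "optimizer and schedule"),
]
store = {
    "s1": ["intro chunk"],
    "s2": ["attention chunk 1", "attention chunk 2"],
    "s3": ["training chunk"],
}
loaded = []


def fetch_chunks(section_id):
    loaded.append(section_id)  # record which sections were actually loaded
    return store[section_id]


result = retrieve("how is attention computed", sections, fetch_chunks, top_n=1)
```

Only `s2` is fetched here; the other sections are rejected on their titles and summaries alone, which is where the cost saving comes from.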

3. API key handling (Low)

DEWEY_API_KEY = os.environ.get("DEWEY_API_KEY", "dwy_live_...")

The fallback value is clearly a placeholder, but using os.environ.get() with a default that will fail at runtime could be confusing. Consider using os.environ["DEWEY_API_KEY"] with no default to fail fast with a clear error message.
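
One way to get the fail-fast behavior is a small helper around `os.environ[...]`. The helper name `require_env` and the error wording are assumptions for this sketch, not code from the notebook; indexing `os.environ` directly, as suggested above, works equally well and raises `KeyError`.

```python
import os


def require_env(name: str) -> str:
    """Return the value of an environment variable, or stop with a clear message."""
    try:
        return os.environ[name]
    except KeyError:
        raise RuntimeError(
            f"{name} is not set; export it before running the notebook"
        ) from None


# Intended use at the top of the notebook (not executed here):
# DEWEY_API_KEY = require_env("DEWEY_API_KEY")
# ANTHROPIC_API_KEY = require_env("ANTHROPIC_API_KEY")
```

Either form stops at the configuration cell, so a missing key is reported before any request is sent with the placeholder value.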

4. Polling loop has no timeout (Medium)

def wait_for_ready(collection_id, doc_ids, poll_interval=5.0):
    pending = set(doc_ids)
    while pending:
        ...

This loop will run indefinitely if documents are stuck in processing. Consider adding a max_wait parameter:

deadline = time.time() + max_wait
while pending and time.time() < deadline:
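
A fuller sketch of the bounded loop follows. The `get_status` callable and the `"ready"` status string are hypothetical stand-ins for whatever the Dewey SDK returns, and `collection_id` is left out for brevity; only the deadline handling is the point here.

```python
import time


def wait_for_ready(doc_ids, get_status, poll_interval=5.0, max_wait=300.0):
    """Poll until every document is ready, or raise TimeoutError after max_wait seconds.

    get_status is a hypothetical callable: doc_id -> status string.
    """
    pending = set(doc_ids)
    # monotonic() is unaffected by system clock adjustments during the wait.
    deadline = time.monotonic() + max_wait
    while pending:
        pending = {doc_id for doc_id in pending if get_status(doc_id) != "ready"}
        if not pending:
            break
        if time.monotonic() >= deadline:
            raise TimeoutError(
                f"{len(pending)} document(s) still processing after {max_wait}s: "
                f"{sorted(pending)}"
            )
        time.sleep(poll_interval)
```

Raising instead of silently leaving the loop matters: with only `while pending and time.time() < deadline:`, a timed-out call returns normally and later cells would query documents that never finished ingesting.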

Summary

Well-structured RAG cookbook with a clear progression. The missing timeout in the polling loop is worth addressing before merge.

Assessment: approve (with suggestion to add polling timeout)

- Use os.environ[] instead of os.environ.get() with placeholder defaults for DEWEY_API_KEY and ANTHROPIC_API_KEY, so missing keys fail fast with a clear KeyError rather than silently using a broken placeholder
- Add max_wait parameter (default 300s) and deadline to wait_for_ready(), raising TimeoutError if documents are stuck in processing indefinitely

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

lambdabaa added 3 commits April 5, 2026 23:23

feat(third_party): add Dewey README

9e40eaa

feat(third_party): add Dewey RAG pipeline notebook

93dd74d

feat(registry): add Dewey RAG pipeline entry

cd9f183

lambdabaa wants to merge 4 commits into anthropics:main from lambdabaa:dewey-rag-pipeline

3 participants