feat(knowledge): thread metadata_filter through Knowledge.query and Crew.query_knowledge #5899

Open: adityasingh2400 wants to merge 1 commit into crewAIInc:main from adityasingh2400:feat-knowledge-metadata-filter-5805

@adityasingh2400 adityasingh2400 commented May 21, 2026

Summary

Fixes #5805. KnowledgeStorage.search already accepted a metadata_filter, but the Knowledge.query / aquery / Crew.query_knowledge layer never forwarded it, so users could not narrow knowledge retrieval by document metadata at query time. This PR fills all four gaps from the issue:

  1. KnowledgeConfig.metadata_filter: new optional dict[str, Any] | None field. results_limit now has ge=1 and score_threshold is constrained to (0, 1], so invalid values are rejected before they reach storage. This addresses the CodeRabbit review on the prior Devin attempt, #5806 ("Fix #5805: Add metadata_filter support to Knowledge querying pipeline").
  2. Knowledge.query / aquery: accept and forward metadata_filter to KnowledgeStorage.search / asearch. The agent path already calls these with **knowledge_config.model_dump(), so the new field flows through automatically when Agent.knowledge_config is set.
  3. Crew.query_knowledge / aquery_knowledge: accept and forward metadata_filter down to Knowledge.query / aquery.
  4. BaseKnowledgeSource._save_documents / _asave_documents: when self.metadata is non-empty, it is merged into each stored document via a new optional metadata arg on KnowledgeStorage.save / asave (previously this field was marked # Currently unused). The legacy no-metadata call path is preserved, so existing storage implementations and tests keep working.
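
The forwarding described in items 2 and 3 can be sketched without crewAI installed. This is a minimal stand-in, not the PR's code: RecordingStorage, SketchKnowledge, and SketchCrew are hypothetical classes, and the default values 3 and 0.35 are assumptions chosen only for illustration.

```python
from __future__ import annotations

from typing import Any


class RecordingStorage:
    """Hypothetical stand-in for KnowledgeStorage: records what search() receives."""

    def __init__(self) -> None:
        self.calls: list[dict[str, Any]] = []

    def search(
        self,
        query: list[str],
        limit: int = 3,  # assumed default
        metadata_filter: dict[str, Any] | None = None,
        score_threshold: float = 0.35,  # assumed default
    ) -> list[dict[str, Any]]:
        self.calls.append(
            {
                "query": query,
                "limit": limit,
                "metadata_filter": metadata_filter,
                "score_threshold": score_threshold,
            }
        )
        return []  # a real backend would return matching records


class SketchKnowledge:
    """Hypothetical Knowledge layer: forwards metadata_filter to storage."""

    def __init__(self, storage: RecordingStorage) -> None:
        self.storage = storage

    def query(self, query, results_limit=3, score_threshold=0.35, metadata_filter=None):
        return self.storage.search(
            query,
            limit=results_limit,
            metadata_filter=metadata_filter,
            score_threshold=score_threshold,
        )


class SketchCrew:
    """Hypothetical Crew layer: forwards metadata_filter to the Knowledge layer."""

    def __init__(self, knowledge: SketchKnowledge) -> None:
        self.knowledge = knowledge

    def query_knowledge(self, query, results_limit=3, score_threshold=0.35, metadata_filter=None):
        return self.knowledge.query(
            query,
            results_limit=results_limit,
            score_threshold=score_threshold,
            metadata_filter=metadata_filter,
        )


storage = RecordingStorage()
crew = SketchCrew(SketchKnowledge(storage))

# Explicit filter passed at the top layer reaches storage unchanged.
crew.query_knowledge(["greeting in French"], metadata_filter={"task": "translation"})
explicit_call = storage.calls[-1]

# Agent-style path: a dumped config is splatted into query() as keyword arguments.
config_dump = {"results_limit": 10, "score_threshold": 0.5, "metadata_filter": {"task": "translation"}}
crew.knowledge.query(["greeting in French"], **config_dump)
config_call = storage.calls[-1]

# Omitting the filter keeps the legacy behaviour (None reaches storage).
crew.query_knowledge(["anything"])
default_call = storage.calls[-1]
```

Because the dumped config keys match the keyword names of query(), adding a field to the config is enough for it to flow through the splat, which is the mechanism item 2 relies on.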

Usage:

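from crewai import Agent
from crewai.knowledge.knowledge_config import KnowledgeConfig

# Restrict retrieval to documents whose metadata matches the filter.
config = KnowledgeConfig(
    results_limit=10,
    score_threshold=0.5,
    metadata_filter={"task": "translation"},
)
agent = Agent(
    # role, goal, and backstory omitted for brevity
    knowledge_sources=[...],
    knowledge_config=config,
)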

Backward compatible: existing code without knowledge_config or metadata_filter behaves identically. KnowledgeConfig() still produces the same defaults.
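
For reviewers who want to see the bounds in isolation, here is a rough sketch of a config model with the same constraints (ge=1 on results_limit, gt=0 and le=1 on score_threshold). SketchKnowledgeConfig is a hypothetical model, not the real KnowledgeConfig, and the defaults 3 and 0.35 are assumptions; it requires pydantic v2.

```python
from typing import Any, Dict, Optional

from pydantic import BaseModel, Field, ValidationError


class SketchKnowledgeConfig(BaseModel):
    """Hypothetical mirror of the constraints described in this PR."""

    results_limit: int = Field(default=3, ge=1)  # assumed default
    score_threshold: float = Field(default=0.35, gt=0, le=1)  # assumed default
    metadata_filter: Optional[Dict[str, Any]] = Field(default=None)


def is_rejected(**kwargs: Any) -> bool:
    """Return True when pydantic refuses the given field values."""
    try:
        SketchKnowledgeConfig(**kwargs)
    except ValidationError:
        return True
    return False


rejected_limit = is_rejected(results_limit=0)       # violates ge=1
rejected_zero = is_rejected(score_threshold=0)      # violates gt=0
rejected_high = is_rejected(score_threshold=1.5)    # violates le=1
accepted_one = not is_rejected(score_threshold=1)   # upper bound is inclusive

# model_dump() yields the keyword arguments that get forwarded to a query call.
dumped = SketchKnowledgeConfig(metadata_filter={"task": "translation"}).model_dump()
```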

Changes

  • lib/crewai/src/crewai/knowledge/knowledge_config.py: add metadata_filter field; add ge=1 / gt=0,le=1 bounds.
  • lib/crewai/src/crewai/knowledge/knowledge.py: thread metadata_filter through query / aquery.
  • lib/crewai/src/crewai/crew.py: thread metadata_filter through query_knowledge / aquery_knowledge.
  • lib/crewai/src/crewai/knowledge/source/base_knowledge_source.py: forward self.metadata to storage in _save_documents / _asave_documents.
  • lib/crewai/src/crewai/knowledge/storage/base_knowledge_storage.py: extend the save / asave abstract contract with an optional metadata arg (single dict broadcast to all documents, or a list matched positionally).
  • lib/crewai/src/crewai/knowledge/storage/knowledge_storage.py: implement the new metadata arg, raising ValueError if a metadata list length does not match the documents list.
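
The metadata contract in the last two bullets (one dict broadcast to every document, or a list matched positionally, with a ValueError on a length mismatch) might look roughly like the helper below. build_records is a hypothetical function and plain dicts stand in for BaseRecord; the actual helper in the PR may differ.

```python
from __future__ import annotations

from typing import Any


def build_records(
    documents: list[str],
    metadata: dict[str, Any] | list[dict[str, Any]] | None = None,
) -> list[dict[str, Any]]:
    """Attach metadata to each document (hypothetical sketch, not the PR's helper)."""
    if metadata is None:
        # Legacy path: no metadata key at all.
        return [{"content": doc} for doc in documents]
    if isinstance(metadata, dict):
        # Single dict: copy it onto every document so records do not share state.
        return [{"content": doc, "metadata": dict(metadata)} for doc in documents]
    if len(metadata) != len(documents):
        raise ValueError(
            f"metadata list has {len(metadata)} entries but there are {len(documents)} documents"
        )
    # List: match entries to documents by position.
    return [{"content": doc, "metadata": dict(md)} for doc, md in zip(documents, metadata)]


docs = ["Bonjour means hello.", "Merci means thank you."]
broadcast = build_records(docs, {"task": "translation"})
positional = build_records(docs, [{"lang": "fr", "n": 1}, {"lang": "fr", "n": 2}])
legacy = build_records(docs)

try:
    build_records(docs, [{"lang": "fr"}])
    mismatch_rejected = False
except ValueError:
    mismatch_rejected = True
```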

Tests

New file lib/crewai/tests/knowledge/test_knowledge_metadata_filter.py with 17 tests covering:

  • KnowledgeConfig defaults, metadata_filter round-trip, and bounds validation for results_limit and score_threshold.
  • Knowledge.query / aquery forward metadata_filter to a recording fake storage, including the None default path.
  • Crew.query_knowledge / aquery_knowledge forward metadata_filter end to end.
  • BaseKnowledgeSource._save_documents / _asave_documents pass self.metadata when set, omit it when empty, and raise without storage.
  • KnowledgeStorage.save / asave attach metadata to every BaseRecord and reject mismatched metadata-list lengths.
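
As an illustration of the source-side cases listed above (pass metadata when set, omit it when empty, raise without storage), a test-style sketch could look like this. SketchSource and RecordingSaveStorage are hypothetical stand-ins, and the error message is an assumption.

```python
from __future__ import annotations

from typing import Any


class RecordingSaveStorage:
    """Hypothetical fake storage that records each save() call."""

    def __init__(self) -> None:
        self.calls: list[tuple[list[str], dict[str, Any] | None]] = []

    def save(self, documents: list[str], metadata: dict[str, Any] | None = None) -> None:
        self.calls.append((list(documents), metadata))


class SketchSource:
    """Hypothetical stand-in for a knowledge source with source-level metadata."""

    def __init__(
        self,
        chunks: list[str],
        metadata: dict[str, Any] | None = None,
        storage: RecordingSaveStorage | None = None,
    ) -> None:
        self.chunks = chunks
        self.metadata = metadata or {}
        self.storage = storage

    def _save_documents(self) -> None:
        if self.storage is None:
            raise ValueError("No storage found to save documents.")  # assumed message
        if self.metadata:
            self.storage.save(self.chunks, metadata=self.metadata)
        else:
            # Legacy call shape: no metadata argument at all.
            self.storage.save(self.chunks)


fake = RecordingSaveStorage()
SketchSource(["a", "b"], metadata={"lang": "fr"}, storage=fake)._save_documents()
SketchSource(["c"], storage=fake)._save_documents()

try:
    SketchSource(["d"])._save_documents()
    raised_without_storage = False
except ValueError:
    raised_without_storage = True
```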

Two existing mock-call assertions were updated to include the new metadata_filter kwarg (test_async_knowledge.py, test_knowledge_searchresult.py).

uv run --project lib/crewai pytest lib/crewai/tests/knowledge/ -p no:randomly
======================== 63 passed, 1 warning in 5.60s =========================

(test_excel_knowledge_source, test_docling_source, and test_multiple_docling_sources are deselected because they require optional pandas / docling extras that are unrelated to this change.)

Review checklist

  • Confirm the new KnowledgeConfig bounds match what storage backends expect; pydantic raises ValidationError instead of letting an out-of-range value reach the vector store.
  • Confirm BaseKnowledgeSource._save_documents now persists self.metadata. The original # Currently unused comment is gone and the field is documented.
  • End-to-end check with a real backend (ChromaDB / Qdrant): index documents with source-level metadata, then query with metadata_filter={...} and confirm only matching documents are returned.
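
Before running the end-to-end check against a real backend, the expected outcome can be stated with an in-memory stand-in. This sketch assumes plain exact-match semantics on every filter key; real backends such as ChromaDB or Qdrant have their own filter syntax and may behave differently, and the matches and search helpers are hypothetical.

```python
from __future__ import annotations

from typing import Any


def matches(metadata: dict[str, Any], metadata_filter: dict[str, Any] | None) -> bool:
    """Exact-match filter: every key in the filter must be present with an equal value."""
    if not metadata_filter:
        return True
    return all(k in metadata and metadata[k] == v for k, v in metadata_filter.items())


def search(records: list[dict[str, Any]], metadata_filter: dict[str, Any] | None = None) -> list[str]:
    """Return the content of records whose metadata passes the filter."""
    return [r["content"] for r in records if matches(r["metadata"], metadata_filter)]


# Documents indexed with source-level metadata (illustrative data only).
indexed = [
    {"content": "Bonjour means hello.", "metadata": {"task": "translation"}},
    {"content": "Summaries should be brief.", "metadata": {"task": "summarization"}},
    {"content": "Merci means thank you.", "metadata": {"task": "translation"}},
]

translation_hits = search(indexed, {"task": "translation"})
all_hits = search(indexed)
no_hits = search(indexed, {"task": "classification"})
```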

Closes #5805.

Summary by CodeRabbit

  • New Features

    • Introduced metadata filtering for knowledge queries, enabling users to attach metadata to knowledge sources and apply filters when retrieving information.
  • Tests

    • Added comprehensive test coverage for metadata attachment and filtering functionality across the knowledge system.


coderabbitai Bot commented May 21, 2026

Caution: Review failed. An error occurred during the review process. Please try again later.

Walkthrough

This PR implements metadata filtering support for knowledge retrieval by enabling configuration-time filter specification, forwarding filters through query layers, implementing storage-level metadata handling, and persisting source-level metadata to stored documents.

Changes

Metadata Filtering Feature

| Layer / File(s) | Summary |
| --- | --- |
| Metadata filtering API and configuration: lib/crewai/src/crewai/knowledge/knowledge_config.py, lib/crewai/src/crewai/knowledge/knowledge.py, lib/crewai/src/crewai/crew.py | KnowledgeConfig accepts optional metadata_filter field with validation constraints. Knowledge.query() and Knowledge.aquery() now accept and forward metadata_filter to storage. Crew.query_knowledge() and Crew.aquery_knowledge() similarly accept and forward metadata_filter to the knowledge layer. |
| Storage abstraction and metadata implementation: lib/crewai/src/crewai/knowledge/storage/base_knowledge_storage.py, lib/crewai/src/crewai/knowledge/storage/knowledge_storage.py | BaseKnowledgeStorage abstract methods updated to accept optional metadata (single dict or list). KnowledgeStorage implements _build_rag_documents helper to construct BaseRecord entries with metadata attachment, handling both single metadata applied to all documents and per-document metadata lists with length validation. |
| Knowledge source metadata persistence: lib/crewai/src/crewai/knowledge/source/base_knowledge_source.py | BaseKnowledgeSource adds metadata field (empty dict by default). _save_documents() and _asave_documents() now forward self.metadata to storage operations when non-empty, raising ValueError if storage is unset. |
| Comprehensive metadata filter test coverage: lib/crewai/tests/knowledge/test_knowledge_metadata_filter.py, lib/crewai/tests/knowledge/test_async_knowledge.py, lib/crewai/tests/knowledge/test_knowledge_searchresult.py | New test_knowledge_metadata_filter.py verifies metadata filter forwarding through Knowledge and Crew query layers, KnowledgeConfig validation, source metadata persistence, and KnowledgeStorage record construction. Existing tests updated to include metadata_filter=None in expected call signatures. |

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Poem

🐰 Metadata flows like carrots through rows,
Filtering knowledge where the config goes,
From Crew to Crew, through storage so true,
Each source remembers what it once knew! 🥕✨

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
| --- | --- | --- | --- |
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 36.73% which is insufficient. The required threshold is 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |

✅ Passed checks (4 passed)

| Check name | Status | Explanation |
| --- | --- | --- |
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title check | ✅ Passed | The title accurately reflects the main change: threading metadata_filter through Knowledge.query and Crew.query_knowledge methods, which is the primary feature addition. |
| Linked Issues check | ✅ Passed | All four objectives from issue #5805 are fully addressed: KnowledgeConfig.metadata_filter added, Knowledge.query/aquery forward metadata_filter to storage, Crew.query_knowledge/aquery_knowledge accept and forward metadata_filter, and BaseKnowledgeSource persists metadata when saving documents. |
| Out of Scope Changes check | ✅ Passed | All changes are directly aligned with the linked issue requirements. Field validators added to KnowledgeConfig (ge=1, gt=0, le=1) and new test coverage are appropriate scope for this feature implementation. |

@adityasingh2400 (Author)

Hi maintainers, this PR is from a fork and the required CI workflows (Lint, Run Type Checks, Run Tests on 3.10 to 3.13) are sitting in the action_required state, which is the first-time-contributor gate. Could one of you approve the workflow runs so the required checks can complete and the PR can be merged? The change threads an optional metadata_filter through Knowledge.query and Crew.query_knowledge so callers can scope retrieval without subclassing, with tests covering the new path. Happy to rebase if you need anything else. Thanks.

…rew.query_knowledge

KnowledgeStorage.search already accepted a metadata_filter, but the
Knowledge.query / aquery / Crew.query_knowledge layer never forwarded
it, so users could not narrow knowledge retrieval by document metadata
at query time. Adds metadata_filter to KnowledgeConfig (with bounds
validators on results_limit and score_threshold), threads it through
the public query APIs, and merges BaseKnowledgeSource.metadata into
stored document metadata so filters can match what was indexed.

Fixes crewAIInc#5805
@adityasingh2400 adityasingh2400 force-pushed the feat-knowledge-metadata-filter-5805 branch from ce096c6 to 83a5e80 Compare May 23, 2026 13:01

Development

Successfully merging this pull request may close these issues.

Knowledge metadata supported in storage but not configurable via KnowledgeConfig / Agent