feat: add session-start prompt-cache preload for crew kickoff (#5921)#5922
Open
devin-ai-integration[bot] wants to merge 3 commits into
Open
feat: add session-start prompt-cache preload for crew kickoff (#5921)#5922devin-ai-integration[bot] wants to merge 3 commits into
devin-ai-integration[bot] wants to merge 3 commits into
Conversation
Contributor
Author
🤖 Devin AI EngineerI'll be helping with this pull request! Here's what you should know: ✅ I will automatically:
Note: I can only respond to comments from users who have write access to this repository. ⚙️ Control Options:
|
Add opt-in cache_preload and cache_preload_strategy parameters to the
Crew class that fire lightweight 1-token cache-warming probes against
each agent's system prompt at kickoff time. This warms the provider's
prompt cache (Anthropic, OpenAI prefix caching, etc.) before the first
real task runs, reducing first-step latency and cache-write costs.
Implementation:
- BaseLLM.preload_probe(): sends max_tokens=1 completion with the
agent's system prompt; failures are logged and never propagated
- Crew.cache_preload / Crew.cache_preload_strategy fields
- Crew._preload_caches() with three strategies:
* parallel: concurrent probes via ThreadPoolExecutor
* sequential: one-by-one in agent order
* shared_prefix: warm common prefix once then per-agent suffixes;
falls back to parallel when prefix < 1024 chars
The feature is opt-in (cache_preload=False by default) and only
activates for crews with 2+ agents.
Co-Authored-By: João <joao@crewai.com>
018b902 to
158d962
Compare
Co-Authored-By: João <joao@crewai.com>
|
|
||
| from unittest.mock import MagicMock, patch | ||
|
|
||
| import pytest |
| patch.object(crew, "_run_sequential_process", return_value=MagicMock()): | ||
| try: | ||
| crew.kickoff() | ||
| except Exception: |
| patch.object(crew, "_run_sequential_process", return_value=MagicMock()): | ||
| try: | ||
| crew.kickoff() | ||
| except Exception: |
| patch.object(crew, "_run_sequential_process", return_value=MagicMock()): | ||
| try: | ||
| crew.kickoff() | ||
| except Exception: |
- Use explicit type annotation for original_max_tokens in preload_probe - Use self.__setattr__ to avoid type mismatch with subclass fields - Replace hasattr checks with isinstance(agent.llm, BaseLLM) for proper type narrowing - Ensure _get_agent_system_prompt returns str without Any leak Co-Authored-By: João <joao@crewai.com>
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
Implements the session-start prompt-cache preload feature requested in #5921. Adds opt-in
cache_preloadandcache_preload_strategyparameters to theCrewclass that fire lightweight 1-token cache-warming probes against each agent's system prompt atcrew.kickoff()time.This warms the provider's prompt cache (Anthropic prompt caching, OpenAI prefix caching, Gemini context caching) before the first real task runs, reducing first-step latency and cache-write costs for multi-agent crews.
Changes:
BaseLLM.preload_probe(system_prompt)— new method on the base LLM class that temporarily setsmax_tokens=1andtemperature=0, then delegates toself.call(). This works with all LLM implementations (native providers + LiteLLM). Failures are logged as warnings and do not block execution.Crew.cache_preload(bool, defaultFalse) — opt-in flag to enable cache warming at kickoff.Crew.cache_preload_strategy(Literal, default"parallel") — three strategies:parallel— probes fired concurrently via a thread poolsequential— probes fired one-by-one in agent ordershared_prefix— detects the common system-prompt prefix across agents; if ≥ 1024 chars, warms it once before per-agent suffixes; falls back to parallel otherwiseCrew._preload_caches()— internal method called duringkickoff()afterprepare_kickoff()completes (agents are fully initialized) but before the process runs. Only activates for crews with 2+ agents.Crew._get_agent_system_prompt(agent)— helper that builds the exact system prompt for an agent usingPrompts.task_execution().Crew._common_prefix(strings)— utility to find the longest common character prefix.Usage:
Review & Testing Checklist for Human
BaseLLM.preload_probemethod correctly sends a 1-token completion viaself.call()and does not raise on API errors (reviewlib/crewai/src/crewai/llms/base_llm.py)_preload_cachesis called at the right point inkickoff()— afterprepare_kickoff()but before process execution (reviewlib/crewai/src/crewai/crew.py)shared_prefixstrategy correctly falls back to parallel when the common prefix is < 1024 charscache_preload=True) against Anthropic/OpenAI to confirm cache-warming reduces first-step latencycache_preload=False(default) does not change any existing behaviorNotes
cache_preload=Falseby default, no behavioral changes for existing userspreload_probeis defined onBaseLLMso it works with all LLM implementations (native OpenAI, Anthropic, Gemini, Azure, Bedrock, and LiteLLM fallback)Link to Devin session: https://app.devin.ai/sessions/cd612f749eca4b80af1ebea64e832a8f