Skip to content

feat: add session-start prompt-cache preload for crew kickoff (#5921)#5922

Open
devin-ai-integration[bot] wants to merge 3 commits into
mainfrom
devin/1779691235-cache-preload-kickoff
Open

feat: add session-start prompt-cache preload for crew kickoff (#5921)#5922
devin-ai-integration[bot] wants to merge 3 commits into
mainfrom
devin/1779691235-cache-preload-kickoff

Conversation

@devin-ai-integration
Copy link
Copy Markdown
Contributor

@devin-ai-integration devin-ai-integration Bot commented May 25, 2026

Summary

Implements the session-start prompt-cache preload feature requested in #5921. Adds opt-in cache_preload and cache_preload_strategy parameters to the Crew class that fire lightweight 1-token cache-warming probes against each agent's system prompt at crew.kickoff() time.

This warms the provider's prompt cache (Anthropic prompt caching, OpenAI prefix caching, Gemini context caching) before the first real task runs, reducing first-step latency and cache-write costs for multi-agent crews.

Changes:

  • BaseLLM.preload_probe(system_prompt) — new method on the base LLM class that temporarily sets max_tokens=1 and temperature=0, then delegates to self.call(). This works with all LLM implementations (native providers + LiteLLM). Failures are logged as warnings and do not block execution.
  • Crew.cache_preload (bool, default False) — opt-in flag to enable cache warming at kickoff.
  • Crew.cache_preload_strategy (Literal, default "parallel") — three strategies:
    • parallel — probes fired concurrently via a thread pool
    • sequential — probes fired one-by-one in agent order
    • shared_prefix — detects the common system-prompt prefix across agents; if ≥ 1024 chars, warms it once before per-agent suffixes; falls back to parallel otherwise
  • Crew._preload_caches() — internal method called during kickoff() after prepare_kickoff() completes (agents are fully initialized) but before the process runs. Only activates for crews with 2+ agents.
  • Crew._get_agent_system_prompt(agent) — helper that builds the exact system prompt for an agent using Prompts.task_execution().
  • Crew._common_prefix(strings) — utility to find the longest common character prefix.

Usage:

crew = Crew(
    agents=[a1, a2, a3],
    tasks=[t1, t2, t3],
    cache_preload=True,                    # opt-in
    cache_preload_strategy="parallel",     # or "sequential" / "shared_prefix"
)
crew.kickoff()

Review & Testing Checklist for Human

  • Verify the BaseLLM.preload_probe method correctly sends a 1-token completion via self.call() and does not raise on API errors (review lib/crewai/src/crewai/llms/base_llm.py)
  • Verify _preload_caches is called at the right point in kickoff() — after prepare_kickoff() but before process execution (review lib/crewai/src/crewai/crew.py)
  • Verify the shared_prefix strategy correctly falls back to parallel when the common prefix is < 1024 chars
  • Test with a real multi-agent crew (cache_preload=True) against Anthropic/OpenAI to confirm cache-warming reduces first-step latency
  • Confirm that cache_preload=False (default) does not change any existing behavior

Notes

  • Feature is fully opt-in — cache_preload=False by default, no behavioral changes for existing users
  • Single-agent crews skip preloading entirely (no-op)
  • preload_probe is defined on BaseLLM so it works with all LLM implementations (native OpenAI, Anthropic, Gemini, Azure, Bedrock, and LiteLLM fallback)
  • 22 new tests added covering all strategies, field defaults, kickoff integration, and edge cases
  • All 126 existing crew tests and 56 LLM tests continue to pass

Link to Devin session: https://app.devin.ai/sessions/cd612f749eca4b80af1ebea64e832a8f

@devin-ai-integration
Copy link
Copy Markdown
Contributor Author

🤖 Devin AI Engineer

I'll be helping with this pull request! Here's what you should know:

✅ I will automatically:

  • Address comments on this PR. Add '(aside)' to your comment to have me ignore it.
  • Look at CI failures and help fix them

Note: I can only respond to comments from users who have write access to this repository.

⚙️ Control Options:

  • Disable automatic comment and CI monitoring

Comment thread tests/test_cache_preload.py Fixed
Comment thread lib/crewai/tests/test_cache_preload.py Fixed
Comment thread tests/test_cache_preload.py Fixed
Add opt-in cache_preload and cache_preload_strategy parameters to the
Crew class that fire lightweight 1-token cache-warming probes against
each agent's system prompt at kickoff time. This warms the provider's
prompt cache (Anthropic, OpenAI prefix caching, etc.) before the first
real task runs, reducing first-step latency and cache-write costs.

Implementation:
- BaseLLM.preload_probe(): sends max_tokens=1 completion with the
  agent's system prompt; failures are logged and never propagated
- Crew.cache_preload / Crew.cache_preload_strategy fields
- Crew._preload_caches() with three strategies:
  * parallel: concurrent probes via ThreadPoolExecutor
  * sequential: one-by-one in agent order
  * shared_prefix: warm common prefix once then per-agent suffixes;
    falls back to parallel when prefix < 1024 chars

The feature is opt-in (cache_preload=False by default) and only
activates for crews with 2+ agents.

Co-Authored-By: João <joao@crewai.com>
@devin-ai-integration devin-ai-integration Bot force-pushed the devin/1779691235-cache-preload-kickoff branch from 018b902 to 158d962 Compare May 25, 2026 07:01
Co-Authored-By: João <joao@crewai.com>

from unittest.mock import MagicMock, patch

import pytest
patch.object(crew, "_run_sequential_process", return_value=MagicMock()):
try:
crew.kickoff()
except Exception:
patch.object(crew, "_run_sequential_process", return_value=MagicMock()):
try:
crew.kickoff()
except Exception:
patch.object(crew, "_run_sequential_process", return_value=MagicMock()):
try:
crew.kickoff()
except Exception:
- Use explicit type annotation for original_max_tokens in preload_probe
- Use self.__setattr__ to avoid type mismatch with subclass fields
- Replace hasattr checks with isinstance(agent.llm, BaseLLM) for proper
  type narrowing
- Ensure _get_agent_system_prompt returns str without Any leak

Co-Authored-By: João <joao@crewai.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

0 participants