LCORE-2802: Bake embedding model into image and make HF_HUB_OFFLINE o…#2008
alessandralanz wants to merge 3 commits into
Walkthrough
The runtime image now pre-downloads an embedding model into a dedicated Hugging Face cache directory. The compose service environment now passes
Changes
Runtime image and service environment
Estimated code review effort
🎯 2 (Simple) | ⏱️ ~10 minutes
🚥 Pre-merge checks | ✅ 5
✅ Passed checks (5 passed)
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@deploy/lightspeed-stack/Containerfile`:
- Around line 135-137: The baked Hugging Face model ID in the Containerfile does
not match the runtime default used by the Solr embedding configuration. Update
the model download step to use the same canonical identifier as the default in
constants so the artifact baked into the image matches what runtime resolves
when HF_HUB_OFFLINE=1 is set. Keep the identifiers aligned between the
Containerfile model fetch and the default embedding model constant.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: ASSERTIVE
Plan: Pro
Run ID: 6e492bb9-dc3d-48d9-ac02-5580cf934c7e
📒 Files selected for processing (2)
- deploy/lightspeed-stack/Containerfile
- docker-compose-library.yaml
📜 Review details
⏰ Context from checks skipped due to timeout. (15)
- GitHub Check: radon
- GitHub Check: Pyright
- GitHub Check: unit_tests (3.12)
- GitHub Check: integration_tests (3.12)
- GitHub Check: integration_tests (3.13)
- GitHub Check: unit_tests (3.13)
- GitHub Check: spectral
- GitHub Check: mypy
- GitHub Check: E2E: server mode / ci / group 1
- GitHub Check: E2E: library mode / ci / group 1
- GitHub Check: E2E: library mode / ci / group 3
- GitHub Check: E2E: server mode / ci / group 3
- GitHub Check: E2E: library mode / ci / group 2
- GitHub Check: E2E: server mode / ci / group 2
- GitHub Check: E2E Tests for Lightspeed Evaluation job
🧰 Additional context used
🧠 Learnings (1)
📚 Learning: 2026-05-20T08:09:30.641Z
Learnt from: max-svistunov
Repo: lightspeed-core/lightspeed-stack PR: 1580
File: docs/design/llama-stack-config-merge/poc-results/library-mode/synthesized-run.yaml:107-110
Timestamp: 2026-05-20T08:09:30.641Z
Learning: In Llama-stack config YAMLs, when defining a Llama Guard safety shield entry, set `provider_shield_id` to the *guard model identifier* (e.g., `meta-llama/Llama-Guard-3-8B`). Do not use a chat/generative model id (e.g., `openai/gpt-4o-mini`): a chat-model id (or `native_override`) indicates only an override landed and does **not** mean the safety shield is actually gating queries. Ensure any E2E coverage for the related implementation (JIRA/E2E tests) exercises a real Llama Guard model to verify that the shield is effective.
Applied to files:
docker-compose-library.yaml
🔇 Additional comments (1)
docker-compose-library.yaml (1)
61-64: LGTM!
ENV HF_HOME=/app-root/.hf-models
RUN mkdir -p /app-root/.hf-models && \
    python3.12 -c "from sentence_transformers import SentenceTransformer; SentenceTransformer('ibm-granite/granite-embedding-30m-english')" && \
🎯 Functional Correctness | 🟠 Major | ⚡ Quick win
Bake the same model ID the runtime defaults to.
Line 137 downloads ibm-granite/granite-embedding-30m-english, but src/constants.py:235-238 sets the default Solr embedding model to sentence-transformers/ibm-granite/granite-embedding-30m-english. With HF_HUB_OFFLINE=1 as the default path, OKP/Solr can still miss the baked artifact and fail when it resolves the configured default model ID. Use one canonical identifier in both places.
Suggested fix
ENV HF_HOME=/app-root/.hf-models
RUN mkdir -p /app-root/.hf-models && \
- python3.12 -c "from sentence_transformers import SentenceTransformer; SentenceTransformer('ibm-granite/granite-embedding-30m-english')" && \
+ python3.12 -c "from sentence_transformers import SentenceTransformer; SentenceTransformer('sentence-transformers/ibm-granite/granite-embedding-30m-english')" && \
  chown -R 1001:1001 /app-root/.hf-models
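To make the identifier mismatch concrete, here is a minimal Python sketch of how a model ID could map to a folder under `HF_HOME/hub`. It assumes the cache naming scheme implied by the folder reported in the PR's Testing section (`models--ibm-granite--granite-embedding-30m-english`), where `/` in the ID becomes `--`. The helper names `cache_folder_name` and `is_baked` are hypothetical and are not part of the repository. It also assumes the runtime looks up the configured ID unchanged; if the runtime strips a provider prefix before resolving the model, the two IDs may resolve to the same artifact and this check would not apply.

```python
from pathlib import Path

# IDs quoted in the review comment above.
BAKED_ID = "ibm-granite/granite-embedding-30m-english"
RUNTIME_DEFAULT_ID = "sentence-transformers/ibm-granite/granite-embedding-30m-english"


def cache_folder_name(repo_id: str, repo_type: str = "model") -> str:
    """Hypothetical helper: build '<type>s--<segment>--<segment>' from an ID.

    Assumption: this mirrors the hub cache layout seen in the image,
    where each '/' in the ID is replaced by '--'.
    """
    return "--".join([f"{repo_type}s", *repo_id.split("/")])


def is_baked(hf_home: Path, repo_id: str) -> bool:
    """Hypothetical helper: True if a cache folder for repo_id exists."""
    return (hf_home / "hub" / cache_folder_name(repo_id)).is_dir()


if __name__ == "__main__":
    # Print both folder names so the difference is visible side by side.
    print(cache_folder_name(BAKED_ID))
    print(cache_folder_name(RUNTIME_DEFAULT_ID))
    # Inside the image this could be checked against the baked cache path.
    print(is_baked(Path("/app-root/.hf-models"), BAKED_ID))
```

Under these assumptions the two IDs produce different folder names, which is the scenario the comment warns about when `HF_HUB_OFFLINE=1` prevents a fallback download.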
Description
Pre-downloads the `ibm-granite/granite-embedding-30m-english` embedding model (~61MB) into the Docker image at build time for OKP/Solr vector search.
Type of change
Tools used to create PR
Identify any AI code assistants used in this PR (for transparency and review context)
Related Tickets & Documents
Checklist before requesting a review
Testing
Build verification:
- `docker compose -f docker-compose-library.yaml build lightspeed-stack`: image builds successfully with model download completing in ~20s
- `docker run --rm --entrypoint ls lightspeed-stack-lightspeed-stack /app-root/.hf-models/hub/`: confirms `models--ibm-granite--granite-embedding-30m-english` is present in the image
End-to-end verification (GitLab CI):
- `rag_chunks` with OKP content (scores ~1489), confirming the baked embedding model works correctly
- `HF_HUB_OFFLINE` override verified: `HF_HUB_OFFLINE=0` correctly allows runtime downloads when needed
Summary by CodeRabbit