Skip to content

LCORE-2802: Bake embedding model into image and make HF_HUB_OFFLINE o…#2008

Open
alessandralanz wants to merge 3 commits into
lightspeed-core:mainfrom
alessandralanz:embed-model-cache
Open

LCORE-2802: Bake embedding model into image and make HF_HUB_OFFLINE o…#2008
alessandralanz wants to merge 3 commits into
lightspeed-core:mainfrom
alessandralanz:embed-model-cache

Conversation

@alessandralanz

@alessandralanz alessandralanz commented Jun 26, 2026

Copy link
Copy Markdown
Contributor

Description

Pre-downloads the ibm-granite/granite-embedding-30m-english embedding model (~61MB) into the Docker image at build time for OKP/Solr vector search.

Type of change

  • Refactor
  • New feature
  • Bug fix
  • CVE fix
  • Optimization
  • Documentation Update
  • Configuration Update
  • Bump-up service version
  • Bump-up dependent library
  • Bump-up library or tool used for development (does not change the final image)
  • CI configuration change
  • Konflux configuration change
  • Unit tests improvement
  • Integration tests improvement
  • End to end tests improvement
  • Benchmarks improvement

Tools used to create PR

Identify any AI code assistants used in this PR (for transparency and review context)

  • Assisted-by: Claude (Claude Code CLI)
  • Generated by: N/A

Related Tickets & Documents

Checklist before requesting a review

  • I have performed a self-review of my code.
  • PR has passed all pre-merge test jobs.
  • If it is a core feature, I have added thorough tests.

Testing

  • Please provide detailed steps to perform tests related to this code change.
  • How were the fix/results from this change verified? Please provide relevant screenshots or results.

Build verification:

  1. docker compose -f docker-compose-library.yaml build lightspeed-stack: image builds successfully with model download completing in ~20s
  2. docker run --rm --entrypoint ls lightspeed-stack-lightspeed-stack /app-root/.hf-models/hub/: confirms models--ibm-granite--granite-embedding-30m-english is present in the image

End-to-end verification (GitLab CI):

  • Full evaluation pipeline ran successfully against mimir OKP server: 97 conversations, 506 evaluations (303 Pass, 155 Fail, 15 Error, 33 Skipped)
  • RAG queries return populated rag_chunks with OKP content (scores ~1489), confirming the baked embedding model works correctly
  • Pipeline completed within the 2-hour timeout on GitLab CI runner

HF_HUB_OFFLINE override verified:

  • Default behavior (offline) unchanged when env var is not set
  • Setting HF_HUB_OFFLINE=0 correctly allows runtime downloads when needed

Summary by CodeRabbit

  • New Features
    • Preloads the embedding model used for vector search during image build, helping the app start without needing to fetch it later.
    • Adds support for a new runtime environment variable for OKP-related access.
  • Bug Fixes
    • Improves container behavior when Hugging Face Hub access is offline by making the setting configurable instead of fixed.
    • Ensures model cache files are stored with the correct ownership for reliable runtime use.

@coderabbitai

coderabbitai Bot commented Jun 26, 2026

Copy link
Copy Markdown
Contributor

Review Change Stack

Warning

Review limit reached

@alessandralanz, we couldn't start this review because you've reached your PR review rate limit.

More reviews will be available in 3 minutes and 18 seconds. Learn how PR review limits work.

Your organization has used up its prepaid credits, and credit purchases are no longer available. Enable the review add-on in the billing tab to keep reviews running — you're only billed for reviews past your plan's rate limits ($0.25/file).

⌛ How to resolve this issue?

After more reviews become available, a review can be triggered using the @coderabbitai review command as a PR comment. Alternatively, push new commits to this PR.

To avoid repeated limits, reduce automatic review volume by pausing incremental auto-reviews earlier, using label-based review opt-in, excluding WIP or generated PR titles, or requesting reviews manually when the PR is ready. If your team needs uninterrupted high-volume reviews, an organization admin can enable usage-based credits.

🚦 How do rate limits work?

CodeRabbit enforces per-developer PR review limits for each organization. Most developers receive the normal plan review availability.

For paid Pro and Pro+ PR reviews, CodeRabbit uses adaptive limits for sustained high-volume activity. When a developer's recent PR review activity reaches the 95th percentile or higher among CodeRabbit users, additional reviews become available more gradually as earlier reviews age out of the rolling window.

Please see our Fair Usage Limits Policy for further information.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro

Run ID: 43fb118c-46fb-4c3b-957f-2919368f8c13

📥 Commits

Reviewing files that changed from the base of the PR and between acac105 and 9a9de52.

📒 Files selected for processing (1)
  • deploy/lightspeed-stack/Containerfile

Walkthrough

The runtime image now pre-downloads an embedding model into a dedicated Hugging Face cache directory. The compose service environment now passes RH_SERVER_OKP and makes HF_HUB_OFFLINE configurable from the host.

Changes

Runtime image and service environment

Layer / File(s) Summary
Bundle embedding model in runtime image
deploy/lightspeed-stack/Containerfile
HF_HOME now points to /app-root/.hf-models; the directory is created in the runtime stage, ibm-granite/granite-embedding-30m-english is downloaded during build, and the cache is chowned to 1001:1001.
Update service environment
docker-compose-library.yaml
The lightspeed-stack service environment now passes RH_SERVER_OKP from the host and reads HF_HUB_OFFLINE from the host with default -1 instead of a fixed 1.

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~10 minutes

🚥 Pre-merge checks | ✅ 5
✅ Passed checks (5 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title clearly summarizes the main change: baking the embedding model into the image and making HF_HUB_OFFLINE overridable.
Docstring Coverage ✅ Passed No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.
✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests
✨ Simplify code
  • Create PR with simplified code

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands.

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@deploy/lightspeed-stack/Containerfile`:
- Around line 135-137: The baked Hugging Face model ID in the Containerfile does
not match the runtime default used by the Solr embedding configuration. Update
the model download step to use the same canonical identifier as the default in
constants so the artifact baked into the image matches what runtime resolves
when HF_HUB_OFFLINE=1 is set. Keep the identifiers aligned between the
Containerfile model fetch and the default embedding model constant.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro

Run ID: 6e492bb9-dc3d-48d9-ac02-5580cf934c7e

📥 Commits

Reviewing files that changed from the base of the PR and between fe8dd40 and acac105.

📒 Files selected for processing (2)
  • deploy/lightspeed-stack/Containerfile
  • docker-compose-library.yaml
📜 Review details
⏰ Context from checks skipped due to timeout. (15)
  • GitHub Check: radon
  • GitHub Check: Pyright
  • GitHub Check: unit_tests (3.12)
  • GitHub Check: integration_tests (3.12)
  • GitHub Check: integration_tests (3.13)
  • GitHub Check: unit_tests (3.13)
  • GitHub Check: spectral
  • GitHub Check: mypy
  • GitHub Check: E2E: server mode / ci / group 1
  • GitHub Check: E2E: library mode / ci / group 1
  • GitHub Check: E2E: library mode / ci / group 3
  • GitHub Check: E2E: server mode / ci / group 3
  • GitHub Check: E2E: library mode / ci / group 2
  • GitHub Check: E2E: server mode / ci / group 2
  • GitHub Check: E2E Tests for Lightspeed Evaluation job
🧰 Additional context used
🧠 Learnings (1)
📚 Learning: 2026-05-20T08:09:30.641Z
Learnt from: max-svistunov
Repo: lightspeed-core/lightspeed-stack PR: 1580
File: docs/design/llama-stack-config-merge/poc-results/library-mode/synthesized-run.yaml:107-110
Timestamp: 2026-05-20T08:09:30.641Z
Learning: In Llama-stack config YAMLs, when defining a Llama Guard safety shield entry, set `provider_shield_id` to the *guard model identifier* (e.g., `meta-llama/Llama-Guard-3-8B`). Do not use a chat/generative model id (e.g., `openai/gpt-4o-mini`): a chat-model id (or `native_override`) indicates only an override landed and does **not** mean the safety shield is actually gating queries. Ensure any E2E coverage for the related implementation (JIRA/E2E tests) exercises a real Llama Guard model to verify that the shield is effective.

Applied to files:

  • docker-compose-library.yaml
🔇 Additional comments (1)
docker-compose-library.yaml (1)

61-64: LGTM!

Comment thread deploy/lightspeed-stack/Containerfile Outdated
Comment on lines +135 to +137
ENV HF_HOME=/app-root/.hf-models
RUN mkdir -p /app-root/.hf-models && \
python3.12 -c "from sentence_transformers import SentenceTransformer; SentenceTransformer('ibm-granite/granite-embedding-30m-english')" && \

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🎯 Functional Correctness | 🟠 Major | ⚡ Quick win

Bake the same model ID the runtime defaults to.

Line 137 downloads ibm-granite/granite-embedding-30m-english, but src/constants.py:235-238 sets the default Solr embedding model to sentence-transformers/ibm-granite/granite-embedding-30m-english. With HF_HUB_OFFLINE=1 as the default path, OKP/Solr can still miss the baked artifact and fail when it resolves the configured default model ID. Use one canonical identifier in both places.

Suggested fix
 ENV HF_HOME=/app-root/.hf-models
 RUN mkdir -p /app-root/.hf-models && \
-    python3.12 -c "from sentence_transformers import SentenceTransformer; SentenceTransformer('ibm-granite/granite-embedding-30m-english')" && \
+    python3.12 -c "from sentence_transformers import SentenceTransformer; SentenceTransformer('sentence-transformers/ibm-granite/granite-embedding-30m-english')" && \
     chown -R 1001:1001 /app-root/.hf-models
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
ENV HF_HOME=/app-root/.hf-models
RUN mkdir -p /app-root/.hf-models && \
python3.12 -c "from sentence_transformers import SentenceTransformer; SentenceTransformer('ibm-granite/granite-embedding-30m-english')" && \
ENV HF_HOME=/app-root/.hf-models
RUN mkdir -p /app-root/.hf-models && \
python3.12 -c "from sentence_transformers import SentenceTransformer; SentenceTransformer('sentence-transformers/ibm-granite/granite-embedding-30m-english')" && \
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@deploy/lightspeed-stack/Containerfile` around lines 135 - 137, The baked
Hugging Face model ID in the Containerfile does not match the runtime default
used by the Solr embedding configuration. Update the model download step to use
the same canonical identifier as the default in constants so the artifact baked
into the image matches what runtime resolves when HF_HUB_OFFLINE=1 is set. Keep
the identifiers aligned between the Containerfile model fetch and the default
embedding model constant.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant