LCORE-2802: Bake embedding model into image and make HF_HUB_OFFLINE o… by alessandralanz · Pull Request #2008 · lightspeed-core/lightspeed-stack

alessandralanz · 2026-06-26T20:16:29Z

Description

Pre-downloads the ibm-granite/granite-embedding-30m-english embedding model (~61MB) into the Docker image at build time for OKP/Solr vector search.

Type of change

Tools used to create PR

Identify any AI code assistants used in this PR (for transparency and review context)

Assisted-by: Claude (Claude Code CLI)
Generated by: N/A

Related Tickets & Documents

Related Issue #
Closes # LCORE-2802

Checklist before requesting a review

I have performed a self-review of my code.
PR has passed all pre-merge test jobs.
If it is a core feature, I have added thorough tests.

Testing

Please provide detailed steps to perform tests related to this code change.
How were the fix/results from this change verified? Please provide relevant screenshots or results.

Build verification:

docker compose -f docker-compose-library.yaml build lightspeed-stack: image builds successfully with model download completing in ~20s
docker run --rm --entrypoint ls lightspeed-stack-lightspeed-stack /app-root/.hf-models/hub/: confirms models--ibm-granite--granite-embedding-30m-english is present in the image

End-to-end verification (GitLab CI):

Full evaluation pipeline ran successfully against mimir OKP server: 97 conversations, 506 evaluations (303 Pass, 155 Fail, 15 Error, 33 Skipped)
RAG queries return populated rag_chunks with OKP content (scores ~1489), confirming the baked embedding model works correctly
Pipeline completed within the 2-hour timeout on GitLab CI runner

HF_HUB_OFFLINE override verified:

Default behavior (offline) unchanged when env var is not set
Setting HF_HUB_OFFLINE=0 correctly allows runtime downloads when needed

Summary by CodeRabbit

New Features
- Preloads the embedding model used for vector search during image build, helping the app start without needing to fetch it later.
- Adds support for a new runtime environment variable for OKP-related access.
Bug Fixes
- Improves container behavior when Hugging Face Hub access is offline by making the setting configurable instead of fixed.
- Ensures model cache files are stored with the correct ownership for reliable runtime use.

…verridable

coderabbitai · 2026-06-26T20:16:43Z

Warning

Review limit reached

@alessandralanz, we couldn't start this review because you've reached your PR review rate limit.

More reviews will be available in 3 minutes and 18 seconds. Learn how PR review limits work.

Your organization has used up its prepaid credits, and credit purchases are no longer available. Enable the review add-on in the billing tab to keep reviews running — you're only billed for reviews past your plan's rate limits ($0.25/file).

⌛ How to resolve this issue?

After more reviews become available, a review can be triggered using the @coderabbitai review command as a PR comment. Alternatively, push new commits to this PR.

To avoid repeated limits, reduce automatic review volume by pausing incremental auto-reviews earlier, using label-based review opt-in, excluding WIP or generated PR titles, or requesting reviews manually when the PR is ready. If your team needs uninterrupted high-volume reviews, an organization admin can enable usage-based credits.

🚦 How do rate limits work?

CodeRabbit enforces per-developer PR review limits for each organization. Most developers receive the normal plan review availability.

For paid Pro and Pro+ PR reviews, CodeRabbit uses adaptive limits for sustained high-volume activity. When a developer's recent PR review activity reaches the 95th percentile or higher among CodeRabbit users, additional reviews become available more gradually as earlier reviews age out of the rolling window.

Please see our Fair Usage Limits Policy for further information.

ℹ️ Review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro

Run ID: 43fb118c-46fb-4c3b-957f-2919368f8c13

📥 Commits

Reviewing files that changed from the base of the PR and between acac105 and 9a9de52.

📒 Files selected for processing (1)

deploy/lightspeed-stack/Containerfile

Walkthrough

The runtime image now pre-downloads an embedding model into a dedicated Hugging Face cache directory. The compose service environment now passes RH_SERVER_OKP and makes HF_HUB_OFFLINE configurable from the host.

Changes

Runtime image and service environment

Layer / File(s)	Summary
Bundle embedding model in runtime image `deploy/lightspeed-stack/Containerfile`	`HF_HOME` now points to `/app-root/.hf-models`; the directory is created in the runtime stage, `ibm-granite/granite-embedding-30m-english` is downloaded during build, and the cache is chowned to `1001:1001`.
Update service environment `docker-compose-library.yaml`	The `lightspeed-stack` service environment now passes `RH_SERVER_OKP` from the host and reads `HF_HUB_OFFLINE` from the host with default `-1` instead of a fixed `1`.

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~10 minutes

🚥 Pre-merge checks | ✅ 5

✅ Passed checks (5 passed)

Check name	Status	Explanation
Description Check	✅ Passed	Check skipped - CodeRabbit’s high-level summary is enabled.
Title check	✅ Passed	The title clearly summarizes the main change: baking the embedding model into the image and making HF_HUB_OFFLINE overridable.
Docstring Coverage	✅ Passed	No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check.
Linked Issues check	✅ Passed	Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check	✅ Passed	Check skipped because no linked issues were found for this pull request.

✨ Finishing Touches

🧪 Generate unit tests (beta)

Create PR with unit tests

✨ Simplify code

Create PR with simplified code

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

_{Comment @coderabbitai help to get the list of available commands.}

coderabbitai

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@deploy/lightspeed-stack/Containerfile`:
- Around line 135-137: The baked Hugging Face model ID in the Containerfile does
not match the runtime default used by the Solr embedding configuration. Update
the model download step to use the same canonical identifier as the default in
constants so the artifact baked into the image matches what runtime resolves
when HF_HUB_OFFLINE=1 is set. Keep the identifiers aligned between the
Containerfile model fetch and the default embedding model constant.

🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

Push a commit to this branch (recommended)
Create a new PR with the fixes

ℹ️ Review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro

Run ID: 6e492bb9-dc3d-48d9-ac02-5580cf934c7e

📥 Commits

Reviewing files that changed from the base of the PR and between fe8dd40 and acac105.

📒 Files selected for processing (2)

deploy/lightspeed-stack/Containerfile
docker-compose-library.yaml

📜 Review details

⏰ Context from checks skipped due to timeout. (15)

GitHub Check: radon
GitHub Check: Pyright
GitHub Check: unit_tests (3.12)
GitHub Check: integration_tests (3.12)
GitHub Check: integration_tests (3.13)
GitHub Check: unit_tests (3.13)
GitHub Check: spectral
GitHub Check: mypy
GitHub Check: E2E: server mode / ci / group 1
GitHub Check: E2E: library mode / ci / group 1
GitHub Check: E2E: library mode / ci / group 3
GitHub Check: E2E: server mode / ci / group 3
GitHub Check: E2E: library mode / ci / group 2
GitHub Check: E2E: server mode / ci / group 2
GitHub Check: E2E Tests for Lightspeed Evaluation job

🧰 Additional context used

🧠 Learnings (1)

📚 Learning: 2026-05-20T08:09:30.641Z

Learnt from: max-svistunov
Repo: lightspeed-core/lightspeed-stack PR: 1580
File: docs/design/llama-stack-config-merge/poc-results/library-mode/synthesized-run.yaml:107-110
Timestamp: 2026-05-20T08:09:30.641Z
Learning: In Llama-stack config YAMLs, when defining a Llama Guard safety shield entry, set `provider_shield_id` to the *guard model identifier* (e.g., `meta-llama/Llama-Guard-3-8B`). Do not use a chat/generative model id (e.g., `openai/gpt-4o-mini`): a chat-model id (or `native_override`) indicates only an override landed and does **not** mean the safety shield is actually gating queries. Ensure any E2E coverage for the related implementation (JIRA/E2E tests) exercises a real Llama Guard model to verify that the shield is effective.

Applied to files:

docker-compose-library.yaml

🔇 Additional comments (1)

docker-compose-library.yaml (1)

61-64: LGTM!

coderabbitai · 2026-06-26T20:20:38Z

+ENV HF_HOME=/app-root/.hf-models
+RUN mkdir -p /app-root/.hf-models && \
+    python3.12 -c "from sentence_transformers import SentenceTransformer; SentenceTransformer('ibm-granite/granite-embedding-30m-english')" && \


🎯 Functional Correctness | 🟠 Major | ⚡ Quick win

Bake the same model ID the runtime defaults to.

Line 137 downloads ibm-granite/granite-embedding-30m-english, but src/constants.py:235-238 sets the default Solr embedding model to sentence-transformers/ibm-granite/granite-embedding-30m-english. With HF_HUB_OFFLINE=1 as the default path, OKP/Solr can still miss the baked artifact and fail when it resolves the configured default model ID. Use one canonical identifier in both places.

Suggested fix

ENV HF_HOME=/app-root/.hf-models RUN mkdir -p /app-root/.hf-models && \ - python3.12 -c "from sentence_transformers import SentenceTransformer; SentenceTransformer('ibm-granite/granite-embedding-30m-english')" && \ + python3.12 -c "from sentence_transformers import SentenceTransformer; SentenceTransformer('sentence-transformers/ibm-granite/granite-embedding-30m-english')" && \ chown -R 1001:1001 /app-root/.hf-models

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change

ENV HF_HOME=/app-root/.hf-models

RUN mkdir -p /app-root/.hf-models && \

python3.12 -c "from sentence_transformers import SentenceTransformer; SentenceTransformer('ibm-granite/granite-embedding-30m-english')" && \

ENV HF_HOME=/app-root/.hf-models

RUN mkdir -p /app-root/.hf-models && \

python3.12 -c "from sentence_transformers import SentenceTransformer; SentenceTransformer('sentence-transformers/ibm-granite/granite-embedding-30m-english')" && \

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@deploy/lightspeed-stack/Containerfile` around lines 135 - 137, The baked Hugging Face model ID in the Containerfile does not match the runtime default used by the Solr embedding configuration. Update the model download step to use the same canonical identifier as the default in constants so the artifact baked into the image matches what runtime resolves when HF_HUB_OFFLINE=1 is set. Keep the identifiers aligned between the Containerfile model fetch and the default embedding model constant.

…path

LCORE-2802: Bake embedding model into image and make HF_HUB_OFFLINE o…

acac105

…verridable

coderabbitai Bot reviewed Jun 26, 2026

View reviewed changes

alessandralanz added 2 commits June 26, 2026 16:44

LCORE-2802: Fix HF_HOME to be build-time only, not global ENV

4484530

LCORE-2802: Remove HF_HOME pass-through that breaks default HF cache …

9a9de52

…path

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

LCORE-2802: Bake embedding model into image and make HF_HUB_OFFLINE o…#2008

LCORE-2802: Bake embedding model into image and make HF_HUB_OFFLINE o…#2008
alessandralanz wants to merge 3 commits into
lightspeed-core:mainfrom
alessandralanz:embed-model-cache

alessandralanz commented Jun 26, 2026 •

edited by coderabbitai Bot

Loading

Uh oh!

coderabbitai Bot commented Jun 26, 2026 •

edited

Loading

Review limit reached

Uh oh!

coderabbitai Bot left a comment

Uh oh!

coderabbitai Bot Jun 26, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Uh oh!

Conversation

alessandralanz commented Jun 26, 2026 • edited by coderabbitai Bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Description

Type of change

Tools used to create PR

Related Tickets & Documents

Checklist before requesting a review

Testing

Summary by CodeRabbit

Uh oh!

coderabbitai Bot commented Jun 26, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Review limit reached

Walkthrough

Changes

Estimated code review effort

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot Jun 26, 2026

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

alessandralanz commented Jun 26, 2026 •

edited by coderabbitai Bot

Loading

coderabbitai Bot commented Jun 26, 2026 •

edited

Loading