Commit 5fe0edd

Fix runtime spacy model loading for non-root containers (#26753)
* fix: run pip installs as non-root user to enable runtime model downloads

  When the container runs with securityContext runAsUser: 1000 but no /etc/passwd entry for that UID, Python disables user site-packages. This causes spacy.load() to fail after downloading a model via pip --user, because the installed package is invisible to the import system.

  Creating the openmetadata user (UID 1000, GID 1000) before pip installs solves this: all packages are installed under ~/.local at build time, so the user site-packages directory already exists when the interpreter starts. Runtime downloads (e.g. spacy models) land in the same directory and are immediately importable.

* Ensure `openmetadata` user owns `/ingestion` workdir
* Add local bin to PATH
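
The failure mode described in the commit message can be checked from inside a running container. The following is a minimal diagnostic sketch, assuming a POSIX system (it relies on the `pwd` module); `describe_user_site` is a hypothetical helper for illustration and is not part of this commit or of OpenMetadata.

```python
import os
import pwd
import site
import sys


def describe_user_site() -> dict:
    """Report whether the current UID is known and whether user site-packages is active."""
    uid = os.getuid()

    # pwd.getpwuid raises KeyError when the UID has no /etc/passwd entry.
    try:
        pwd.getpwuid(uid)
        has_passwd_entry = True
    except KeyError:
        has_passwd_entry = False

    # Path where `pip install --user` would place packages for this interpreter.
    user_site = site.getusersitepackages()

    return {
        "uid": uid,
        "has_passwd_entry": has_passwd_entry,
        # True, False, or None depending on how the interpreter was started.
        "enable_user_site": site.ENABLE_USER_SITE,
        "user_site": user_site,
        # A package installed under user_site is importable only if this is True.
        "user_site_on_sys_path": user_site in sys.path,
    }


if __name__ == "__main__":
    for key, value in describe_user_site().items():
        print(f"{key}: {value}")
```

Running this once as the `openmetadata` user and once under an arbitrary UID with no passwd entry should make any difference in `has_passwd_entry` and `user_site_on_sys_path` visible. Note that the user site directory is only added to `sys.path` at interpreter startup if it already exists.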
1 parent 08577ec commit 5fe0edd

2 files changed

Lines changed: 26 additions & 0 deletions

ingestion/operators/docker/Dockerfile

Lines changed: 13 additions & 0 deletions
@@ -75,6 +75,19 @@ WORKDIR ingestion/
 # Required for Airflow DockerOperator, as we need to run the workflows from a `python main.py` command in the container.
 COPY ingestion/operators/docker/*.py .
 
+# Create a non-root user with a writable home directory.
+# When the container runs with securityContext runAsUser: 1000 but no
+# /etc/passwd entry for that UID, Python disables user site-packages.
+# This causes runtime failures when tools like spacy download models
+# via pip --user, as the installed packages are invisible to the import
+# system. Creating the user ensures Python recognises UID 1000 and
+# enables the standard ~/.local install path.
+RUN groupadd -g 1000 openmetadata && useradd -m -u 1000 -g 1000 openmetadata
+RUN chown -R openmetadata:openmetadata /ingestion
+ENV HOME=/home/openmetadata
+ENV PATH="/home/openmetadata/.local/bin:${PATH}"
+USER openmetadata
+
 # Disable pip cache dir
 # https://pip.pypa.io/en/stable/topics/caching/#avoiding-caching
 ENV PIP_NO_CACHE_DIR=1

ingestion/operators/docker/Dockerfile.ci

Lines changed: 13 additions & 0 deletions
@@ -77,6 +77,19 @@ COPY ingestion/ .
 COPY openmetadata-spec /openmetadata-spec
 COPY scripts/datamodel_generation.py /scripts/datamodel_generation.py
 
+# Create a non-root user with a writable home directory.
+# When the container runs with securityContext runAsUser: 1000 but no
+# /etc/passwd entry for that UID, Python disables user site-packages.
+# This causes runtime failures when tools like spacy download models
+# via pip --user, as the installed packages are invisible to the import
+# system. Creating the user ensures Python recognises UID 1000 and
+# enables the standard ~/.local install path.
+RUN groupadd -g 1000 openmetadata && useradd -m -u 1000 -g 1000 openmetadata
+RUN chown -R openmetadata:openmetadata /ingestion
+ENV HOME=/home/openmetadata
+ENV PATH="/home/openmetadata/.local/bin:${PATH}"
+USER openmetadata
+
 # Disable pip cache dir
 # https://pip.pypa.io/en/stable/topics/caching/#avoiding-caching
 ENV PIP_NO_CACHE_DIR=1
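
Both Dockerfiles set `HOME` and prepend `~/.local/bin` to `PATH` so that user-level installs are writable and their entry points are found. A small sanity check along those lines might look like the sketch below; `check_runtime_env` is a hypothetical helper introduced here for illustration, and it only inspects the environment mapping it is given.

```python
import os
from typing import List, Mapping


def check_runtime_env(env: Mapping[str, str]) -> List[str]:
    """Return a list of problems with HOME and PATH for user-level pip installs."""
    problems: List[str] = []

    home = env.get("HOME", "")
    if not home:
        # Without HOME there is no ~/.local to check, so stop here.
        problems.append("HOME is not set")
        return problems

    # Runtime downloads need a writable home directory.
    if not os.access(home, os.W_OK):
        problems.append(f"HOME is not writable: {home}")

    # Console scripts from user installs land in ~/.local/bin.
    local_bin = os.path.join(home, ".local", "bin")
    path_entries = env.get("PATH", "").split(os.pathsep)
    if local_bin not in path_entries:
        problems.append(f"{local_bin} is not on PATH")

    return problems


if __name__ == "__main__":
    for problem in check_runtime_env(os.environ) or ["no problems found"]:
        print(problem)
```

Passing the environment in as an argument, rather than reading `os.environ` inside the function, keeps the check easy to exercise outside a container.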
