Fix gunicorn "Control server error" on kubernetes#7591
…ling updates

gunicorn 23.x introduced a control socket (gunicornc) that defaults to gunicorn.ctl relative to the working directory. Since pulpcore-content sets its CWD to WORKING_DIRECTORY (/var/lib/pulp/tmp by default), the socket lands on the shared PVC and persists across pod restarts, causing Permission denied when a new pod tries to recreate it during a rolling update.

Default to /tmp/pulpcore-content.ctl, which is pod-local ephemeral storage. Users who want a different path can override via gunicorn.conf.py.

Assisted-by: Claude Code
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
fixes: pulp#7574
# On k8s, the default location may persist across restarts and cause permission errors
# See: <https://github.com/pulp/pulpcore/issues/7574>
What was the default location?
This file belongs in /run/ somewhere.
> This directory contains system information data describing the system since it was booted. Files under this directory must be cleared (removed or truncated as appropriate) at the beginning of the boot process.
> [...]
> System programs that maintain transient UNIX-domain sockets must place them in this [/run] directory or an appropriate subdirectory as outlined above.
https://refspecs.linuxfoundation.org/FHS_3.0/fhs/ch03s15.html
It defaults to the current directory: https://gunicorn.org/guides/gunicornc/#start-gunicorn-with-control-socket
I wonder why.
Hmmm, the code apparently says something else:
https://github.com/benoitc/gunicorn/blob/9aa54703f4950818aed538dbee9578e868375cc9/gunicorn/config.py#L3138-L3146
Ok, it was changed in 25.2: benoitc/gunicorn@0ad47db
I guess we should not do anything, then.
This change was meant to improve on gunicorn's default (as of version 25.1), but upstream has since improved it on their own.
Or do you think it's still worth it, to account for the case where 25.1 is installed?
So what I understand is that gunicorn tries XDG_RUNTIME_DIR first and falls back to HOME.
I would claim that the variable XDG_RUNTIME_DIR should have been set. I'm not sure whether the OS in the container or the container runtime is to blame, but the default gunicorn behaviour seems sound to me, and your change makes it unnecessarily rigid.
We should probably propagate the option instead so it stays possible to overwrite it.
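A minimal sketch of the lookup order described in this comment (XDG_RUNTIME_DIR first, then HOME), for illustration only. It is not gunicorn's actual code: the function name `sketch_control_socket_path`, the bare `gunicorn.ctl` filename under each base directory, and the final fallback to a CWD-relative path are all assumptions.

```python
import os


def sketch_control_socket_path(env, filename="gunicorn.ctl"):
    """Illustrative lookup order only; NOT gunicorn's real implementation.

    Prefer XDG_RUNTIME_DIR, then HOME. If neither is set, return a bare
    filename, which resolves relative to the current working directory
    (an assumed last resort for this sketch).
    """
    for var in ("XDG_RUNTIME_DIR", "HOME"):
        base = env.get(var)
        if base:  # skip unset or empty variables
            return os.path.join(base, filename)
    return filename


if __name__ == "__main__":
    # In a container where XDG_RUNTIME_DIR is unset, HOME decides the path.
    print(sketch_control_socket_path(dict(os.environ)))
```

If XDG_RUNTIME_DIR is unset inside the container, a lookup of this shape ends up depending on HOME, which is why it matters whether the container runtime or the image sets the variable.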
> We should probably propagate the option instead so it stays possible to overwrite it.
That sounds good.
> So what I understand is that gunicorn tries XDG_RUNTIME_DIR first and falls back to HOME.
In the first release of the feature, the default was the current directory (whatever that was...). In the following minor (Y) release they changed it to this, which I agree is sane.
gunicorn 25.1.0 introduced a control socket (gunicornc) that defaults to gunicorn.ctl relative to the working directory. Since pulpcore-content sets its CWD to WORKING_DIRECTORY (/var/lib/pulp/tmp by default), the socket lands on the shared PVC and persists across pod restarts, causing Permission denied when a new pod tries to recreate it during a rolling update.

Default to /tmp/pulpcore-content.ctl, which is pod-local ephemeral storage. Users who want a different path can override via gunicorn.conf.py.
fixes: #7574
Assisted-by: Claude Code
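For illustration, a default that stays overridable could look roughly like the sketch below. It is not the code in this PR: the helper name `apply_control_socket_default` is hypothetical, and the option key `control_socket` is an assumption about how the setting is named. Only the default path /tmp/pulpcore-content.ctl comes from the description above.

```python
# Default taken from the PR description: pod-local ephemeral storage.
DEFAULT_CONTROL_SOCKET = "/tmp/pulpcore-content.ctl"


def apply_control_socket_default(options):
    """Return a copy of `options` with the control socket path filled in.

    Hypothetical helper: the "control_socket" key is an assumed name.
    An explicitly configured value always wins over the default.
    """
    merged = dict(options)  # do not mutate the caller's mapping
    if merged.get("control_socket") is None:
        merged["control_socket"] = DEFAULT_CONTROL_SOCKET
    return merged


if __name__ == "__main__":
    # No explicit setting: the pod-local default is applied.
    print(apply_control_socket_default({}))
    # An explicit setting is kept untouched.
    print(apply_control_socket_default({"control_socket": "/run/pulp/content.ctl"}))
```

Applying the default only when nothing was configured keeps the path adjustable, which is the concern raised in the review.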