bugfix: prevent SIGSEGV in event timer rbtree during worker shutdown #2497
Merged
zhuizhuhaomeng merged 1 commit into openresty:master on May 7, 2026
When a worker exits, ngx_worker_process_exit() calls
ngx_destroy_pool(cycle->pool), which fires the cycle's cleanup
handlers. One of these is ngx_http_lua_cleanup_vm, which calls
lua_close() on the worker's Lua VM. lua_close() runs LuaJIT GC
finalizers (__gc) on every live userdata, including TCP socket
cosocket userdata still bound to in-flight requests.
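The shutdown chain described above can be modelled with a toy pool, as a rough sketch only. Every `demo_*` name here is hypothetical and greatly simplified; this is not nginx's pool implementation, and `demo_cleanup_vm` merely stands in for the `ngx_http_lua_cleanup_vm` -> `lua_close()` -> `__gc` path.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical, simplified pool: only a linked list of cleanup handlers. */
typedef void (*demo_cleanup_pt)(void *data);

typedef struct demo_cleanup_s {
    demo_cleanup_pt         handler;
    void                   *data;
    struct demo_cleanup_s  *next;
} demo_cleanup_t;

typedef struct {
    demo_cleanup_t  *cleanup;
} demo_pool_t;

/* Register a handler on the pool; returns 0 on success, -1 on failure. */
static int
demo_pool_cleanup_add(demo_pool_t *pool, demo_cleanup_pt handler, void *data)
{
    demo_cleanup_t  *c;

    assert(pool != NULL && handler != NULL);

    c = malloc(sizeof(*c));
    if (c == NULL) {
        return -1;
    }

    c->handler = handler;
    c->data = data;
    c->next = pool->cleanup;   /* newest handler goes to the list head */
    pool->cleanup = c;

    return 0;
}

/* Destroying the pool runs every registered handler exactly once. */
static void
demo_destroy_pool(demo_pool_t *pool)
{
    demo_cleanup_t  *c, *next;

    for (c = pool->cleanup; c != NULL; c = next) {
        next = c->next;
        c->handler(c->data);
        free(c);
    }

    pool->cleanup = NULL;
}

/* Hypothetical VM state: counts how often finalizers were triggered. */
typedef struct {
    int  finalizers_run;
} demo_vm_t;

/* Stand-in for the VM cleanup handler that closes the VM and runs __gc. */
static void
demo_cleanup_vm(void *data)
{
    demo_vm_t  *vm = data;

    vm->finalizers_run++;
}
```

The point of the model is only that finalizers run from inside pool destruction, that is, at a moment when other worker state may already be going away.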
The __gc finalizers ngx_http_lua_socket_tcp_upstream_destroy and
ngx_http_lua_socket_downstream_destroy reach
ngx_http_lua_socket_tcp_finalize_{read,write}_part, which call
ngx_del_timer() on connection events. By that point the event
timer rbtree may already have been partially torn down by sibling
finalizers, so ngx_del_timer() corrupts the tree and the next
deletion segfaults inside ngx_rbtree_min / ngx_rbtree_delete.
Skip the cleanup when ngx_terminate or ngx_exiting is set: the
cycle pool is about to be destroyed and the kernel reclaims fds,
so unwinding events and timers from the __gc path is unnecessary
and unsafe. The same guard is already used in
ngx_http_lua_socket_tcp_setkeepalive() for the analogous case.
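The shape of the guard can be sketched in isolation as follows. This is an illustrative model under stated assumptions, not the patch itself: `demo_terminate` and `demo_exiting` are hypothetical stand-ins for `ngx_terminate` and `ngx_exiting`, and `demo_event_t`, `demo_del_timer` and `demo_finalize_read_part` are invented for the example.

```c
#include <assert.h>
#include <signal.h>
#include <stddef.h>

/* Hypothetical stand-ins for the worker's shutdown flags. */
static volatile sig_atomic_t  demo_terminate;
static volatile sig_atomic_t  demo_exiting;

/* Hypothetical, simplified event: only the field the guard cares about. */
typedef struct {
    unsigned  timer_set;   /* 1 while the event sits in the timer tree */
} demo_event_t;

/* Counts timer deletions so the behaviour can be observed. */
static int  demo_timer_deletions;

/* Stand-in for removing an event from the timer tree. */
static void
demo_del_timer(demo_event_t *ev)
{
    ev->timer_set = 0;
    demo_timer_deletions++;
}

/*
 * Finalizer-side cleanup with the shutdown guard.
 * Returns 1 if the cleanup ran, 0 if it was skipped because the worker
 * is terminating or exiting.
 */
static int
demo_finalize_read_part(demo_event_t *rev)
{
    assert(rev != NULL);

    if (demo_terminate || demo_exiting) {
        /* Shutdown in progress: leave the timer tree untouched. */
        return 0;
    }

    if (rev->timer_set) {
        demo_del_timer(rev);
    }

    return 1;
}
```

In this model the timer is removed during normal operation and left alone once either flag is set, which mirrors the reasoning that the pool and file descriptors are reclaimed anyway at exit.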
Reproduces under SIGTERM and SIGQUIT shutdown when there are
in-flight ngx.socket.tcp() / ngx.req.socket() coroutines, e.g.
long-lived ngx.sleep + tcpsock:receive() loops.
Crash signature:
ngx_rbtree_min src/core/ngx_rbtree.h
ngx_rbtree_delete src/core/ngx_rbtree.c
ngx_event_del_timer
ngx_http_lua_socket_tcp_finalize_read_part
ngx_http_lua_socket_tcp_finalize
ngx_http_lua_socket_tcp_cleanup
ngx_http_lua_socket_tcp_upstream_destroy (Lua __gc)
... LuaJIT GC sweep ...
ngx_http_lua_cleanup_vm
ngx_destroy_pool(cycle->pool)
ngx_worker_process_exit
zhuizhuhaomeng approved these changes on May 7, 2026
Summary
Worker shutdown can SIGSEGV inside ngx_rbtree_min / ngx_rbtree_delete when there are in-flight cosocket userdata at the time lua_close() runs.
ngx_worker_process_exit() calls ngx_destroy_pool(cycle->pool), which fires the cycle's cleanup handlers. One of these is ngx_http_lua_cleanup_vm, which calls lua_close() on the worker's Lua VM. lua_close() runs LuaJIT GC finalizers (__gc) on every live userdata, including TCP socket cosocket userdata still bound to in-flight requests.
The __gc finalizers ngx_http_lua_socket_tcp_upstream_destroy and ngx_http_lua_socket_downstream_destroy reach ngx_http_lua_socket_tcp_finalize_{read,write}_part, which call ngx_del_timer() on connection events. By that point the event timer rbtree may already have been partially torn down by sibling finalizers, so ngx_del_timer() corrupts the tree and the next deletion segfaults inside ngx_rbtree_min / ngx_rbtree_delete.
Fix
Skip the cleanup when ngx_terminate or ngx_exiting is set: the cycle pool is about to be destroyed and the kernel reclaims fds, so unwinding events and timers from the __gc path is unnecessary and unsafe.
The same guard is already used in ngx_http_lua_socket_tcp_setkeepalive() for the analogous case (cosocket put into the pool while the worker is shutting down).
Reproducer
Reproduces under SIGTERM and SIGQUIT shutdown when there are in-flight ngx.socket.tcp() / ngx.req.socket() coroutines, e.g. long-lived ngx.sleep + tcpsock:receive() loops.
Crash signature
ngx_http_lua_socket_downstream_destroy produces the same chain via the request-socket finalizer instead of the upstream finalizer.
Test plan
upstream/master (bbace592).
nginx -s stop / nginx -s quit with active cosockets: no SEGV.