From 2a5adc5971423b3449938c36fe3f59e2ca198a21 Mon Sep 17 00:00:00 2001
From: Mykhailo Chalyi
Date: Wed, 10 Jun 2026 01:27:14 +0000
Subject: [PATCH 1/2] fix(python): keep private-loop worker off Python during interpreter exit
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

The python.yml Examples job crashed intermittently (SIGABRT, ~25% of
runs) in langgraph_async_tool.py at process exit: the bashkit-py-loop
worker thread wakes when the engine is gc'd — commonly inside
Py_Finalize — and called Python::attach to close its asyncio loop.
Attaching a fresh thread state during finalization fatals CPython with
'PyGILState_Release: thread state must be current when releasing'.

Python::try_attach does not help: its finalization check is compiled
only for Python >= 3.13 and Py_IsInitialized() still returns 1 during
Py_FinalizeEx's GC on older versions (verified via core dumps).

The worker exit path no longer touches Python at all: the loop's Py ref
is dropped unattached (pyo3 defers the decref) and the loop is closed by
asyncio's BaseEventLoop.__del__ when the decref runs, or reclaimed by
the OS at process exit. Documented as TM-PY-030 variant (3).

Verified: example aborted 6/30 runs before, 0/80 across two stress runs
after; full bashkit-python pytest suite passes (700 passed, 1 skipped).
---
 crates/bashkit-python/src/lib.rs | 13 ++++++++++---
 specs/threat-model.md            | 11 +++++++++--
 2 files changed, 19 insertions(+), 5 deletions(-)

diff --git a/crates/bashkit-python/src/lib.rs b/crates/bashkit-python/src/lib.rs
index 7f744378..75e70564 100644
--- a/crates/bashkit-python/src/lib.rs
+++ b/crates/bashkit-python/src/lib.rs
@@ -2249,9 +2249,16 @@ impl PyPrivateAsyncLoop {
                     let _ = item.result_tx.send(result);
                 }

-                Python::attach(|py| {
-                    let _ = event_loop.bind(py).call_method0("close");
-                });
+                // THREAT[TM-PY-030]: do NOT touch Python on the exit path.
+                // The worker wakes here because the engine was gc'd, and that
+                // gc commonly runs inside Py_Finalize — attaching then crashes
+                // CPython (PyGILState_Release fatal: SIGABRT at interpreter
+                // exit). Even Python::try_attach cannot detect finalization
+                // before 3.13. Dropping `event_loop` without attaching is safe
+                // (pyo3 defers the decref); the loop is closed by asyncio's
+                // BaseEventLoop.__del__ when the deferred decref runs, or
+                // reclaimed by the OS at process exit.
+                drop(event_loop);
             })
             .map_err(|e| {
                 PyRuntimeError::new_err(format!("failed to spawn private loop thread: {e}"))
diff --git a/specs/threat-model.md b/specs/threat-model.md
index 35474cfe..9b14212b 100644
--- a/specs/threat-model.md
+++ b/specs/threat-model.md
@@ -1767,9 +1767,16 @@ and receive now both run inside `py.detach(...)`. (2) Pyclass dealloc runs attac
 and dropped the last `Arc`; tokio's default `Runtime::drop` joins in-flight blocking
 tasks, and an abandoned (timed-out) callback task must re-attach to finish — freezing
 the entire interpreter. The `PyRuntime` handle now shuts the runtime down
-with `shutdown_background()` on last drop. Regression tests:
+with `shutdown_background()` on last drop. (3) The private-loop worker thread called
+`Python::attach` on its exit path to close its asyncio loop; the worker usually wakes
+because the engine was gc'd, and that gc commonly runs inside `Py_Finalize` —
+attaching during finalization fatals CPython (`PyGILState_Release`, SIGABRT at
+interpreter exit; `Python::try_attach` cannot detect finalization before 3.13). The
+worker exit path no longer touches Python: the loop's `Py` ref is dropped unattached
+(deferred decref) and the loop is closed by `BaseEventLoop.__del__`. Regression tests:
 `tests/test_async_callbacks.py::test_async_callback_execute_sync_honors_timeout`,
-`…::test_dealloc_during_inflight_callback_does_not_deadlock`.
+`…::test_dealloc_during_inflight_callback_does_not_deadlock`; variant (3) is covered
+by the `langgraph_async_tool.py` example run in the Python CI Examples job.

 | TM-PY-029 | Host clock information disclosure | `datetime.date.today()` / `datetime.datetime.now()` expose host system time and timezone | Intentional — required for correct datetime semantics | **ACCEPTED** |

From 499a94c508d3a97a5026641134fe5cfe2c24c126 Mon Sep 17 00:00:00 2001
From: Mykhailo Chalyi
Date: Wed, 10 Jun 2026 01:31:31 +0000
Subject: [PATCH 2/2] chore(specs): sync TM-PY-030 summary row with variant (3)

Review feedback: the table row only described the two deadlock variants
while the paragraph below documents the interpreter-exit SIGABRT too.
---
 specs/threat-model.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/specs/threat-model.md b/specs/threat-model.md
index 9b14212b..4ac136ea 100644
--- a/specs/threat-model.md
+++ b/specs/threat-model.md
@@ -1745,7 +1745,7 @@ caller's GIL hold.
 | TM-PY-026 | reset() discards security config | `BashTool.reset()` creates new `Bash` with bare builder, dropping all configured limits | `PyBash::reset` and `BashTool::reset` rebuild via `replace_live_bash_with_builder` + `build_live_builder`, which preserves the original limits, env, and registered builtins | **MITIGATED** |
 | TM-PY-027 | Unbounded recursion in JSON conversion | `py_to_json`/`json_to_py` recurse without depth limit on nested dicts/lists | `json_to_py_inner`, `py_to_json_inner`, and the MontyObject converters all carry a `depth` arg; depth > `MAX_NESTING_DEPTH = 64` raises `ValueError("… nesting depth exceeds maximum of 64")` | **MITIGATED** |
-| TM-PY-030 | GIL deadlock via async-callback private loop | Private-loop dispatch blocked on a rendezvous channel while attached (GIL held), and pyclass dealloc joined in-flight blocking tasks that must re-attach to finish — either froze the whole process (observed as a 6 h CI hang) | Dispatch detaches around both the send and the receive; `PyRuntime` drop shuts the tokio runtime down with `shutdown_background()` instead of a blocking join | **MITIGATED** |
+| TM-PY-030 | GIL deadlock / exit crash via async-callback private loop | Private-loop dispatch blocked on a rendezvous channel while attached (GIL held); pyclass dealloc joined in-flight blocking tasks that must re-attach to finish (froze the whole process, observed as a 6 h CI hang); worker thread attached during interpreter finalization to close its loop (SIGABRT at process exit) | Dispatch detaches around both the send and the receive; `PyRuntime` drop shuts the tokio runtime down with `shutdown_background()` instead of a blocking join; worker exit path never touches Python (loop closed via `BaseEventLoop.__del__`) | **MITIGATED** |

 **TM-PY-026** (mitigated): `PyBash::reset` and `BashTool::reset` (`crates/bashkit-python/src/lib.rs`)
 rebuild the inner `Bash` via `replace_live_bash_with_builder` + `build_live_builder`, which