Skip to content

fix(lcm): segfault at cleanup#1771

Draft
paul-nechifor wants to merge 3 commits intodevfrom
paul/fix/segmentation-fault-2
Draft

fix(lcm): segfault at cleanup#1771
paul-nechifor wants to merge 3 commits intodevfrom
paul/fix/segmentation-fault-2

Conversation

@paul-nechifor
Copy link
Copy Markdown
Contributor

@paul-nechifor paul-nechifor commented Apr 10, 2026

Problem

Example broken runs:

Closes DIM-XXX

Solution

Breaking Changes

How to Test

Contributor License Agreement

  • I have read and approved the CLA.

@greptile-apps
Copy link
Copy Markdown
Contributor

greptile-apps bot commented Apr 10, 2026

Greptile Summary

This PR fixes a segfault during cleanup of LCMService caused by the main thread deleting the C++ LCM object (via stop()) while the LCM loop thread was still executing inside handle_timeout(). The fix defers LCM cleanup to the _lcm_loop finally block, so stop() only calls _cleanup_owned_lcm() directly when the thread has already exited; if join() times out and the thread is still alive, cleanup is left entirely to the loop. The new tests cover the timeout-race scenario, self-stopping from within the LCM thread, and concurrent publish/subscribe/unsubscribe operations.

Confidence Score: 5/5

Safe to merge — the race condition root cause is correctly identified and fixed, and the fix is well-tested.

The fix is targeted and correct: stop() now defers LCM cleanup to the loop thread's finally block when the thread is still alive, eliminating the race that caused the segfault. _cleanup_owned_lcm() is idempotent (guarded by self.l is not None), so double-cleanup is also safe. The new tests cover the timeout race, self-stop deadlock, and external LCM lifetime. No P0 or P1 issues found.

No files require special attention.

Important Files Changed

Filename Overview
dimos/protocol/service/lcmservice.py Core fix: added _cleanup_owned_lcm() helper and reworked stop() to avoid racing with the LCM loop thread during teardown; cleanup is deferred to the loop's finally block when the thread is still alive.
dimos/protocol/service/test_lcmservice.py Comprehensive new tests added: BlockingLCM for synchronisation, test_stop_timeout_preserves_lcm_until_loop_exits for the race-condition scenario, and test_stop_from_within_lcm_thread for self-stop deadlock prevention.
dimos/protocol/pubsub/impl/lcmpubsub.py Minor defensive guard added to publish() and subscribe() to log an error and return early when self.l is None after cleanup.

Sequence Diagram

sequenceDiagram
    participant Main as Main Thread
    participant LoopThread as LCM Loop Thread
    participant LCMObj as C++ LCM Object

    Main->>Main: _stop_event.set()
    Main->>LoopThread: thread.join(timeout)

    alt Thread exits before timeout
        LoopThread->>LoopThread: stop_event detected, exits while loop
        LoopThread->>LCMObj: _cleanup_owned_lcm() - del self.l
        LoopThread->>LoopThread: self._thread = None
        Main-->>Main: join returns, thread not alive
        Main->>Main: skip _cleanup_owned_lcm()
    else Thread still in handle_timeout when timeout expires
        Main-->>Main: join timed out, thread.is_alive() == True
        Main->>Main: SKIP _cleanup_owned_lcm() - key fix avoids segfault
        LoopThread->>LoopThread: handle_timeout returns, loop exits
        LoopThread->>LCMObj: _cleanup_owned_lcm() - del self.l
        LoopThread->>LoopThread: self._thread = None
    end
Loading

Reviews (1): Last reviewed commit: "fix(lcm): segfault at cleanup" | Re-trigger Greptile

@paul-nechifor paul-nechifor force-pushed the paul/fix/segmentation-fault-2 branch from 710f9f6 to 7fa805a Compare April 11, 2026 02:54
@paul-nechifor paul-nechifor force-pushed the paul/fix/segmentation-fault-2 branch from 7fa805a to ea6b398 Compare April 11, 2026 03:01
@leshy
Copy link
Copy Markdown
Contributor

leshy commented Apr 11, 2026

can we get a
python -m pytest -svm tool dimos/protocol/pubsub/benchmark/test_benchmark.py -k lcm
before and after? just to confirm there are no perf changes

@leshy
Copy link
Copy Markdown
Contributor

leshy commented Apr 11, 2026

2026-04-11_06-04

leshy
leshy previously approved these changes Apr 11, 2026
@leshy leshy enabled auto-merge (squash) April 11, 2026 03:17
@paul-nechifor paul-nechifor marked this pull request as draft April 11, 2026 05:12
auto-merge was automatically disabled April 11, 2026 05:12

Pull request was converted to draft

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants