Fix/system monitor by jaagut · Pull Request #875 · bit-bots/bitbots_main

jaagut · 2026-05-24T17:52:43Z

Summary

Proposed changes

Related issues

Checklist


Copilot

Pull request overview

Improves the ROS2 system_monitor package to collect and publish more robust system workload metrics (notably GPU stats) across different hardware backends, and ensures the required Workload message is generated in bitbots_msgs.

Changes:

Add Workload.msg to bitbots_msgs interface generation.
Refactor GPU monitoring to auto-detect NVIDIA (NVML), Jetson (sysfs), and AMD (pyamdgpuinfo) backends; tighten type consistency in collectors.
Adjust sampling behavior (CPU smoothing + lower default update frequency) and add nvidia-ml-py to the Pixi environment.
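The backend auto-detection described above can be pictured as a priority chain: try NVML first, then the Jetson sysfs load file, then pyamdgpuinfo. This is a minimal sketch, not the PR's `gpu.py`. The function name `detect_gpu_backend`, the detection order and the sysfs path `/sys/devices/gpu.0/load` are assumptions (the Jetson path differs between L4T releases), and the `pyamdgpuinfo.detect_gpus()` call is used as that package's documented GPU count.

```python
import importlib.util
import os
from typing import Optional

# Assumed Jetson sysfs load file; the real path varies between L4T releases.
JETSON_LOAD_PATH = "/sys/devices/gpu.0/load"


def detect_gpu_backend(jetson_load_path: str = JETSON_LOAD_PATH) -> Optional[str]:
    """Return "nvidia", "jetson", "amd", or None if no supported GPU is found."""
    # 1. Discrete NVIDIA GPU through NVML (module pynvml from nvidia-ml-py).
    if importlib.util.find_spec("pynvml") is not None:
        import pynvml

        try:
            pynvml.nvmlInit()
            try:
                if pynvml.nvmlDeviceGetCount() > 0:
                    return "nvidia"
            finally:
                pynvml.nvmlShutdown()
        except pynvml.NVMLError:
            pass  # library installed, but no driver or no device
    # 2. Jetson integrated GPU: load is exposed as a sysfs file.
    if os.path.exists(jetson_load_path):
        return "jetson"
    # 3. AMD GPU through pyamdgpuinfo.
    if importlib.util.find_spec("pyamdgpuinfo") is not None:
        import pyamdgpuinfo

        try:
            if pyamdgpuinfo.detect_gpus() > 0:
                return "amd"
        except Exception:
            pass  # treat any probing failure as "not detected"
    return None
```

Running the detection once at node start-up and keeping the result avoids repeating imports and hardware probing on every sample.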

Reviewed changes

Copilot reviewed 9 out of 10 changed files in this pull request and generated 5 comments.


File	Description
src/bitbots_msgs/CMakeLists.txt	Adds `Workload.msg` to rosidl-generated interfaces so downstream nodes can publish/subscribe it.
src/bitbots_misc/system_monitor/system_monitor/network_interfaces.py	Adds return type annotations for interface collection helpers.
src/bitbots_misc/system_monitor/system_monitor/monitor.py	Updates GPU collector call signature and aligns default “disabled” tuple types; minor comment grammar fix.
src/bitbots_misc/system_monitor/system_monitor/memory.py	Adds a typed return annotation for memory stats collection.
src/bitbots_misc/system_monitor/system_monitor/gpu.py	Replaces single-backend AMD logic with auto-detected NVIDIA/Jetson/AMD backends and improved error handling/logging.
src/bitbots_misc/system_monitor/system_monitor/cpus.py	Adds EMA smoothing for CPU usage values and updates return/type annotations.
src/bitbots_misc/system_monitor/config/config.yaml	Lowers default update frequency from 10 Hz to 2 Hz.
pixi.toml	Adds `nvidia-ml-py` dependency for NVML-based monitoring.
pixi.lock	Locks `nvidia-ml-py` into all environments.
.vscode/settings.json	Adds dictionary words related to new GPU monitoring terms.


+            handle = pynvml.nvmlDeviceGetHandleByIndex(0)
+            load = float(pynvml.nvmlDeviceGetUtilizationRates(handle).gpu)
+            mem_info = pynvml.nvmlDeviceGetMemoryInfo(handle)
+            vram_used = mem_info.used
+            vram_total = mem_info.total
+            temperature = float(pynvml.nvmlDeviceGetTemperature(handle, 0))
+            return (load, vram_used, vram_total, temperature)
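NVML reports `nvmlDeviceGetUtilizationRates(handle).gpu` as an integer percentage (0-100), and the Jetson hunk below also converts to percent; Copilot's later reply states that load was then normalized to a fraction for all backends. A minimal sketch of such a helper, assuming fraction semantics with defensive clamping (the name `percent_to_fraction` is hypothetical):

```python
def percent_to_fraction(percent: float) -> float:
    """Convert a 0-100 percentage, NVML's utilization unit, to a 0.0-1.0 fraction.

    The result is clamped defensively so an out-of-range reading stays in [0, 1].
    """
    return min(max(float(percent) / 100.0, 0.0), 1.0)
```

In the hunk above this would read `load = percent_to_fraction(pynvml.nvmlDeviceGetUtilizationRates(handle).gpu)`. Passing `pynvml.NVML_TEMPERATURE_GPU` instead of the literal `0` to `nvmlDeviceGetTemperature` selects the same sensor but states the intent.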


+            if raw_load is None:
+                continue
+            # Jetson reports GPU load in permille on current L4T kernels.
+            load = raw_load / 10.0
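Under the same fraction convention, a permille reading would be divided by 1000 rather than 10. The sketch below reads a single sysfs load file; it assumes, as the comment in the hunk says, that the file holds an integer in permille. The function name `read_jetson_load` and the `None`-on-failure contract are illustrative choices, not taken from the PR.

```python
from typing import Optional


def read_jetson_load(path: str) -> Optional[float]:
    """Read a Jetson GPU load file and return the load as a 0.0-1.0 fraction.

    Assumes the file contains one integer in permille (0-1000). Returns None if
    the file is missing or unparsable, so the caller can try another path.
    """
    try:
        with open(path, encoding="ascii") as f:
            raw_load = int(f.read().strip())
    except (OSError, ValueError):
        return None
    # permille -> fraction, clamped defensively
    return min(max(raw_load / 1000.0, 0.0), 1.0)
```

A caller can loop over candidate paths and `continue` when the result is `None`, mirroring the `if raw_load is None: continue` in the hunk.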


+    # smooth short-term sampling noise with exponential moving average
+    prev = _prev_usage[cpu_num]
+    if prev == 0.0:
+        smoothed = float(round(raw_usage, 2))
+    else:
+        smoothed = float(round((raw_usage * _EMA_ALPHA) + (prev * (1.0 - _EMA_ALPHA)), 2))
+
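The hunk above treats `prev == 0.0` as "no previous sample". A core that is genuinely idle reports 0.0, so on the next sample its average is re-seeded from the raw value instead of being smoothed. Commit 66048af replaces this with an explicit dict lookup; a sketch of that idea follows, with a hypothetical `_EMA_ALPHA` of 0.5 because the PR's value is not shown here.

```python
_EMA_ALPHA = 0.5  # hypothetical smoothing factor; the PR's value is not shown
_prev_usage: dict[int, float] = {}


def smooth_cpu_usage(cpu_num: int, raw_usage: float) -> float:
    """Exponential moving average of per-CPU usage.

    A missing dict key, not a 0.0 value, marks "no previous sample", so a core
    that really is idle keeps being smoothed instead of being re-seeded.
    """
    prev = _prev_usage.get(cpu_num)
    if prev is None:
        smoothed = raw_usage  # first sample seeds the average
    else:
        smoothed = raw_usage * _EMA_ALPHA + prev * (1.0 - _EMA_ALPHA)
    _prev_usage[cpu_num] = smoothed  # store unrounded to avoid drift
    return round(smoothed, 2)
```

With the 0.0 sentinel, the sequence 0.0, 100.0 would jump straight to 100.0; here the second sample is smoothed to 50.0.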


+def _collect_nvidia(node: Node) -> tuple[float, int, int, float]:
+    """Collect GPU metrics from NVIDIA GPU using pynvml."""
+    try:
+        import pynvml
+
+        pynvml.nvmlInit()
+        try:
+            handle = pynvml.nvmlDeviceGetHandleByIndex(0)
+            load = float(pynvml.nvmlDeviceGetUtilizationRates(handle).gpu)
+            mem_info = pynvml.nvmlDeviceGetMemoryInfo(handle)
+            vram_used = mem_info.used
+            vram_total = mem_info.total
+            temperature = float(pynvml.nvmlDeviceGetTemperature(handle, 0))
+            return (load, vram_used, vram_total, temperature)
+        finally:
+            try:
+                pynvml.nvmlShutdown()
+            except Exception:
+                pass
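`_collect_nvidia` calls `nvmlInit()` and `nvmlShutdown()` on every sample. Copilot's reply says the NVML init and handle are now cached instead. A sketch under that assumption, not the PR's code: the module-level `_nvml_handle` cache and the `atexit` shutdown are illustrative choices, and load is returned as a fraction to match the normalization described in the same reply.

```python
import atexit
from typing import Any, Optional

# Cached NVML handle for GPU 0; None until the first successful initialisation.
_nvml_handle: Optional[Any] = None


def _get_nvml_handle() -> Any:
    """Initialise NVML once and return the cached handle of GPU 0."""
    global _nvml_handle
    if _nvml_handle is None:
        import pynvml  # module provided by the nvidia-ml-py package

        pynvml.nvmlInit()
        try:
            handle = pynvml.nvmlDeviceGetHandleByIndex(0)
        except pynvml.NVMLError:
            pynvml.nvmlShutdown()  # undo the init so a later retry starts clean
            raise
        atexit.register(pynvml.nvmlShutdown)  # shut NVML down once, at exit
        _nvml_handle = handle
    return _nvml_handle


def collect_nvidia() -> tuple[float, int, int, float]:
    """Return (load fraction 0.0-1.0, VRAM used bytes, VRAM total bytes, temperature in °C)."""
    import pynvml

    handle = _get_nvml_handle()
    percent = pynvml.nvmlDeviceGetUtilizationRates(handle).gpu
    load = min(max(percent / 100.0, 0.0), 1.0)
    mem_info = pynvml.nvmlDeviceGetMemoryInfo(handle)
    temperature = float(pynvml.nvmlDeviceGetTemperature(handle, pynvml.NVML_TEMPERATURE_GPU))
    return (load, int(mem_info.used), int(mem_info.total), temperature)
```

Failures (missing driver, no device) surface as `ImportError` or `pynvml.NVMLError`, so the monitor node can catch them and report them through its ROS logger.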


+    If `node` is provided the ROS node's logger will be used for messages.
+
+    node: ROS node for logging (required for backend detection and error logging)
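The docstring excerpt above contradicts itself: it says the logger is used "if `node` is provided" but then calls `node` required. One consistent version, written as a hypothetical dispatcher: `collect_gpu`, the `backend` parameter, the `GPU_DISABLED` value and the small `Protocol` types standing in for an rclpy `Node` are all assumptions, not the PR's code.

```python
from typing import Callable, Optional, Protocol


class _Logger(Protocol):
    def warning(self, msg: str) -> None: ...


class _HasLogger(Protocol):
    """Anything with get_logger(), e.g. an rclpy Node."""

    def get_logger(self) -> _Logger: ...


# Hypothetical "disabled" value: load, VRAM used, VRAM total, temperature.
GPU_DISABLED: tuple[float, int, int, float] = (0.0, 0, 0, 0.0)


def collect_gpu(
    node: _HasLogger,
    backend: Optional[Callable[[], tuple[float, int, int, float]]],
) -> tuple[float, int, int, float]:
    """Collect GPU metrics from the detected backend.

    Args:
        node: ROS node whose logger reports backend errors. Always required.
        backend: Zero-argument collector for the detected GPU, or None if no
            supported GPU was found.

    Returns:
        (load as a 0.0-1.0 fraction, VRAM used in bytes, VRAM total in bytes,
        temperature in degrees Celsius), or GPU_DISABLED if unavailable.
    """
    if backend is None:
        return GPU_DISABLED
    try:
        return backend()
    except Exception as e:  # a failing sensor must not crash the monitor node
        node.get_logger().warning(f"GPU metrics unavailable: {e}")
        return GPU_DISABLED
```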


jaagut · 2026-05-25T06:19:28Z

@copilot please implement the suggestions.


Copilot · 2026-05-25T06:26:03Z

> @copilot please implement the suggestions.

Implemented the requested suggestions in bf629a3 (plus preceding commits in this update): GPU load is now consistently normalized to fraction semantics across backends, NVML init/handle are cached instead of per-sample init/shutdown, CPU EMA initialization no longer relies on 0.0 as a sentinel, and the GPU collector docstrings were corrected.

Refactor system monitor components for improved data handling and type consistency

c6fcff2

github-project-automation Bot added this to Software May 24, 2026

github-project-automation Bot moved this to 🆕 New in Software May 24, 2026

Fix system monitoring (on intel system)

33366a8

Enhance GPU monitoring by integrating NVIDIA and AMD detection, updating collection methods, and adding support for nvidia-ml-py package

jaagut force-pushed the fix/system_monitor branch from b051716 to 33366a8 May 24, 2026 19:12

Fix system_monitor GPU on jetson

c6842cf

jaagut marked this pull request as ready for review May 24, 2026 20:02

jaagut moved this from 🆕 New to 📋 Backlog in Software May 24, 2026

jaagut moved this from 📋 Backlog to 👀 In review in Software May 24, 2026

jaagut requested review from ChlukasX, Flova, MegaIng and Copilot May 24, 2026 20:02

Copilot started reviewing on behalf of jaagut May 24, 2026 20:03

Copilot AI reviewed May 24, 2026


Copilot started work on behalf of jaagut May 25, 2026 06:19

Copilot AI and others added 5 commits May 25, 2026 06:21

Fix system monitor GPU unit consistency and CPU EMA init

167b938

Agent-Logs-Url: https://github.com/bit-bots/bitbots_main/sessions/797bd202-b15d-4688-b2a5-48352421a742 Co-authored-by: jaagut <34797331+jaagut@users.noreply.github.com>

Add docstrings for GPU normalization helpers

a0fa5b3

Agent-Logs-Url: https://github.com/bit-bots/bitbots_main/sessions/797bd202-b15d-4688-b2a5-48352421a742 Co-authored-by: jaagut <34797331+jaagut@users.noreply.github.com>

Polish GPU backend globals and docstring parameter format

fbc7c96

Agent-Logs-Url: https://github.com/bit-bots/bitbots_main/sessions/797bd202-b15d-4688-b2a5-48352421a742 Co-authored-by: jaagut <34797331+jaagut@users.noreply.github.com>

Use explicit dict lookup for CPU EMA previous usage

66048af

Agent-Logs-Url: https://github.com/bit-bots/bitbots_main/sessions/797bd202-b15d-4688-b2a5-48352421a742 Co-authored-by: jaagut <34797331+jaagut@users.noreply.github.com>

Clarify defensive clamping in GPU helper docstrings

bf629a3

Agent-Logs-Url: https://github.com/bit-bots/bitbots_main/sessions/797bd202-b15d-4688-b2a5-48352421a742 Co-authored-by: jaagut <34797331+jaagut@users.noreply.github.com>

Copilot finished work on behalf of jaagut May 25, 2026 06:26

jaagut wants to merge 8 commits into main from fix/system_monitor


3 participants
