i40e: Fix kernel crash from PTP work queue corruption during resets by stephenchengCloud · Pull Request #33 · intel/ethernet-linux-i40e

stephenchengCloud · 2026-05-25T03:18:50Z

The driver crashes with NULL pointer dereferences in process_one_work() when PTP is enabled and the device undergoes reset operations:

[ 189646.890619] ALERT: BUG: unable to handle kernel NULL pointer dereference at 0000000000000008
[ 189646.890632] INFO: PGD 0 P4D 0
[ 189646.890638] WARN: Oops: 0000 [#1] SMP NOPTI
[ 189646.890645] WARN: CPU: 14 PID: 2948135 Comm: kworker/14:3 Tainted: G O 4.19.0+1 #1
[ 189646.890654] WARN: Hardware name: FUJITSU PRIMERGY RX2530 M4/D3383-A1, BIOS V5.0.0.12 R1.64.0 for D3383-A1x 03/06/2025
[ 189646.890676] WARN: Workqueue: (null) (events)
[ 189646.890688] WARN: RIP: e030:process_one_work+0x2e/0x370
[ 189646.890694] WARN: Code: 00 41 57 41 56 41 55 45 31 ed 41 54 55 48 89 fd 53 48 89 f3 48 83 ec 08 48 8b 06 4c 8b 67 40 48 89 c2 30 d2 a8 04 4c 0f 45 ea <49> 8b 45 08 44 8b b0 00 01 00 00 41 c1 ee 05 41 83 e6 01 41 f6 44
[ 189646.890711] WARN: RSP: e02b:ffffc9004ec9fe80 EFLAGS: 00010046
[ 189646.890717] WARN: RAX: 0000000fffffffe0 RBX: ffff8882b3f535d8 RCX: ffff8882bc7a4f60
[ 189646.890725] WARN: RDX: 0000000fffffff00 RSI: ffff8882b3f535d8 RDI: ffff88820469b680
[ 189646.890733] WARN: RBP: ffff88820469b680 R08: ffff8882b55a0a00 R09: 8080808080808080
[ 189646.890741] WARN: R10: 0000000000007ff0 R11: 0000000000000000 R12: ffff8882bc7a10c0
[ 189646.890749] WARN: R13: 0000000000000000 R14: ffff88820469b680 R15: ffff88820469b6a8
[ 189646.890766] WARN: FS: 0000000000000000(0000) GS:ffff8882bc780000(0000) knlGS:0000000000000000
[ 189646.890774] WARN: CS: e033 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 189646.890781] WARN: CR2: 00000000000000b0 CR3: 00000002b67a2000 CR4: 0000000000040660
[ 189646.890794] WARN: Call Trace:
[ 189646.890801] WARN: worker_thread+0x49/0x3e0
[ 189646.890809] WARN: kthread+0xf8/0x130
[ 189646.890815] WARN: ? rescuer_thread+0x310/0x310
[ 189646.890821] WARN: ? kthread_bind+0x10/0x10
[ 189646.890829] WARN: ret_from_fork+0x35/0x40

Root Cause - INIT_WORK() Called Multiple Times:

The work structure ptp_extts0_work is initialized via INIT_WORK() in i40e_ptp_set_timestamp_mode(), which is called from i40e_ptp_init() during:

Initial probe (i40e_probe -> i40e_setup_pf_switch -> i40e_ptp_init)
Every rebuild after reset (i40e_rebuild -> i40e_setup_pf_switch -> i40e_ptp_init)

When a PF reset occurs, the following race corrupts the work_struct:

i40e_rebuild() re-enables PTP interrupts via i40e_setup_misc_vector()
PTP timestamp interrupt fires immediately
- ISR schedules work: schedule_work(&pf->ptp_extts0_work)
- Work is queued with valid list head pointers
i40e_rebuild() continues -> i40e_ptp_init() is called - INIT_WORK(&pf->ptp_extts0_work, ...) overwrites list pointers - work_struct.entry.next = 0x0 (from INIT_WORK macro) - BUT work is still queued in workqueue!
worker_thread processes queued work
- Reads corrupted next pointer (0x0)
- NULL pointer dereference in process_one_work()

The hardware does not require link or traffic for this crash. A PF reset alone is sufficient, as it re-enables interrupts before software reconfiguration completes, creating a window where interrupts can fire while INIT_WORK() is about to execute.

Primary Fix - Initialize Work Once:

Move INIT_WORK(&pf->ptp_extts0_work, i40e_ptp_extts0_work) from i40e_ptp_set_timestamp_mode() to i40e_probe(). This ensures the work_struct is initialized exactly once during driver lifetime, never while work is queued or running.

Additionally, add cancel_work_sync(&pf->ptp_extts0_work) at the beginning of i40e_ptp_init(). This safely drains any pending work before PTP hardware reconfiguration during rebuilds.

Secondary Fix - Avoid Race During Driver Removal:

Move i40e_ptp_stop() to after i40e_vsi_release() in i40e_remove(). Previously, i40e_ptp_stop() ran before the VSIs were closed, which left a window where a PTP interrupt could fire after the PTP clock was unregistered:

T0: ptp_clock_unregister() called
T1: pf->ptp_clock = NULL
T2: PTP interrupt fires (still enabled in hardware)
T3: ISR schedules ptp_extts0_work
T4: Worker runs -> ptp_clock_event(NULL, ...) -> potential crash

By moving i40e_ptp_stop() after the VSI release loop, i40e_vsi_close() -> i40e_down() disables the misc interrupt vector first, eliminating this race.

However, this reordering means that by the time i40e_ptp_stop() runs, all VSIs have already been released and pf->vsi[pf->lan_vsi] is NULL. The original "removed PHC from " log message dereferenced pf->vsi[pf->lan_vsi]->netdev->name, which would now cause its own NULL pointer dereference. Drop the interface name from the log message and print only "removed PHC", since the netdev is no longer accessible at this point in the cleanup sequence.

Additionally, in i40e_ptp_stop(), move the PTP interrupt disable and add cancel_work_sync() to the beginning of the function. This provides defense in depth: PTP-specific interrupts are explicitly disabled and any pending work is drained before unregistering the PTP clock, regardless of the surrounding cleanup ordering.

Testing:

Before the fix, one of our customer's kernel crashed repeatedly after running for approximately one day under normal production workload. After applying the fix, no crashes have been observed.

The driver crashes with NULL pointer dereferences in process_one_work() when PTP is enabled and the device undergoes reset operations: [ 189646.890619] ALERT: BUG: unable to handle kernel NULL pointer dereference at 0000000000000008 [ 189646.890632] INFO: PGD 0 P4D 0 [ 189646.890638] WARN: Oops: 0000 [intel#1] SMP NOPTI [ 189646.890645] WARN: CPU: 14 PID: 2948135 Comm: kworker/14:3 Tainted: G O 4.19.0+1 intel#1 [ 189646.890654] WARN: Hardware name: FUJITSU PRIMERGY RX2530 M4/D3383-A1, BIOS V5.0.0.12 R1.64.0 for D3383-A1x 03/06/2025 [ 189646.890676] WARN: Workqueue: (null) (events) [ 189646.890688] WARN: RIP: e030:process_one_work+0x2e/0x370 [ 189646.890694] WARN: Code: 00 41 57 41 56 41 55 45 31 ed 41 54 55 48 89 fd 53 48 89 f3 48 83 ec 08 48 8b 06 4c 8b 67 40 48 89 c2 30 d2 a8 04 4c 0f 45 ea <49> 8b 45 08 44 8b b0 00 01 00 00 41 c1 ee 05 41 83 e6 01 41 f6 44 [ 189646.890711] WARN: RSP: e02b:ffffc9004ec9fe80 EFLAGS: 00010046 [ 189646.890717] WARN: RAX: 0000000fffffffe0 RBX: ffff8882b3f535d8 RCX: ffff8882bc7a4f60 [ 189646.890725] WARN: RDX: 0000000fffffff00 RSI: ffff8882b3f535d8 RDI: ffff88820469b680 [ 189646.890733] WARN: RBP: ffff88820469b680 R08: ffff8882b55a0a00 R09: 8080808080808080 [ 189646.890741] WARN: R10: 0000000000007ff0 R11: 0000000000000000 R12: ffff8882bc7a10c0 [ 189646.890749] WARN: R13: 0000000000000000 R14: ffff88820469b680 R15: ffff88820469b6a8 [ 189646.890766] WARN: FS: 0000000000000000(0000) GS:ffff8882bc780000(0000) knlGS:0000000000000000 [ 189646.890774] WARN: CS: e033 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 189646.890781] WARN: CR2: 00000000000000b0 CR3: 00000002b67a2000 CR4: 0000000000040660 [ 189646.890794] WARN: Call Trace: [ 189646.890801] WARN: worker_thread+0x49/0x3e0 [ 189646.890809] WARN: kthread+0xf8/0x130 [ 189646.890815] WARN: ? rescuer_thread+0x310/0x310 [ 189646.890821] WARN: ? kthread_bind+0x10/0x10 [ 189646.890829] WARN: ret_from_fork+0x35/0x40 Root Cause - INIT_WORK() Called Multiple Times: ----------------------------------------------- The work structure ptp_extts0_work is initialized via INIT_WORK() in i40e_ptp_set_timestamp_mode(), which is called from i40e_ptp_init() during: 1. Initial probe (i40e_probe -> i40e_setup_pf_switch -> i40e_ptp_init) 2. Every rebuild after reset (i40e_rebuild -> i40e_setup_pf_switch -> i40e_ptp_init) When a PF reset occurs, the following race corrupts the work_struct: 1. i40e_rebuild() re-enables PTP interrupts via i40e_setup_misc_vector() 2. PTP timestamp interrupt fires immediately - ISR schedules work: schedule_work(&pf->ptp_extts0_work) - Work is queued with valid list head pointers 3. i40e_rebuild() continues -> i40e_ptp_init() is called - INIT_WORK(&pf->ptp_extts0_work, ...) overwrites list pointers - work_struct.entry.next = 0x0 (from INIT_WORK macro) - BUT work is still queued in workqueue! 4. worker_thread processes queued work - Reads corrupted next pointer (0x0) - NULL pointer dereference in process_one_work() The hardware does not require link or traffic for this crash. A PF reset alone is sufficient, as it re-enables interrupts before software reconfiguration completes, creating a window where interrupts can fire while INIT_WORK() is about to execute. Primary Fix - Initialize Work Once: ------------------------------------ Move INIT_WORK(&pf->ptp_extts0_work, i40e_ptp_extts0_work) from i40e_ptp_set_timestamp_mode() to i40e_probe(). This ensures the work_struct is initialized exactly once during driver lifetime, never while work is queued or running. Additionally, add cancel_work_sync(&pf->ptp_extts0_work) at the beginning of i40e_ptp_init(). This safely drains any pending work before PTP hardware reconfiguration during rebuilds. Secondary Fix - Avoid Race During Driver Removal: ------------------------------------------------- Move i40e_ptp_stop() to after i40e_vsi_release() in i40e_remove(). Previously, i40e_ptp_stop() ran before the VSIs were closed, which left a window where a PTP interrupt could fire after the PTP clock was unregistered: T0: ptp_clock_unregister() called T1: pf->ptp_clock = NULL T2: PTP interrupt fires (still enabled in hardware) T3: ISR schedules ptp_extts0_work T4: Worker runs -> ptp_clock_event(NULL, ...) -> potential crash By moving i40e_ptp_stop() after the VSI release loop, i40e_vsi_close() -> i40e_down() disables the misc interrupt vector first, eliminating this race. However, this reordering means that by the time i40e_ptp_stop() runs, all VSIs have already been released and pf->vsi[pf->lan_vsi] is NULL. The original "removed PHC from <interface>" log message dereferenced pf->vsi[pf->lan_vsi]->netdev->name, which would now cause its own NULL pointer dereference. Drop the interface name from the log message and print only "removed PHC", since the netdev is no longer accessible at this point in the cleanup sequence. Additionally, in i40e_ptp_stop(), move the PTP interrupt disable and add cancel_work_sync() to the beginning of the function. This provides defense in depth: PTP-specific interrupts are explicitly disabled and any pending work is drained before unregistering the PTP clock, regardless of the surrounding cleanup ordering. Testing: -------- Before the fix, one of our customer's kernel crashed repeatedly after running for approximately one day under normal production workload. After applying the fix, no crashes have been observed. Signed-off-by: Stephen Cheng <stephen.cheng@citrix.com>

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

i40e: Fix kernel crash from PTP work queue corruption during resets#33

i40e: Fix kernel crash from PTP work queue corruption during resets#33
stephenchengCloud wants to merge 1 commit into
intel:mainfrom
stephenchengCloud:fix-kernel-crash-from-PTP-work-queue

stephenchengCloud commented May 25, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

stephenchengCloud commented May 25, 2026

Root Cause - INIT_WORK() Called Multiple Times:

Primary Fix - Initialize Work Once:

Secondary Fix - Avoid Race During Driver Removal:

Testing:

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant