Commit 61221d0

kaihuang authored and hansendc committed
KVM/TDX: Explicitly do WBINVD when no more TDX SEAMCALLs
On TDX platforms, during kexec, the kernel needs to make sure there are no dirty cachelines of TDX private memory before booting to the new kernel, to avoid silent memory corruption of the new kernel.

To do this, the kernel has a percpu boolean to indicate whether the cache of a CPU may be in an incoherent state. During kexec, namely in stop_this_cpu(), the kernel does WBINVD if that percpu boolean is true. TDX turns on that percpu boolean on a CPU when the kernel does SEAMCALL, thus making sure the cache will be flushed during kexec.

However, kexec has a race condition that, while remaining extremely rare, would be more likely in the presence of a relatively long operation such as WBINVD.

In particular, the kexec-ing CPU invokes native_stop_other_cpus() to stop all remote CPUs before booting to the new kernel. native_stop_other_cpus() then sends a REBOOT vector IPI to remote CPUs and waits for them to stop; if that times out, it also sends NMIs to the still-alive CPUs and waits again for them to stop. If the race happens, kexec proceeds before all CPUs have processed the NMI and stopped[1], and the system hangs.

But after tdx_disable_virtualization_cpu(), no more TDX activity can happen on this CPU. When kexec is enabled, flush the cache explicitly at that point; this moves the WBINVD to an earlier stage than stop_this_cpu(), avoiding a possibly lengthy operation at a time where it could cause this race.

[1] https://lore.kernel.org/kvm/b963fcd60abe26c7ec5dc20b42f1a2ebbcc72397.1750934177.git.kai.huang@intel.com/

[Make the new function a stub for !CONFIG_KEXEC_CORE. - Paolo]

Signed-off-by: Kai Huang <kai.huang@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Acked-by: Paolo Bonzini <pbonzini@redhat.com>
Tested-by: Farrah Chen <farrah.chen@intel.com>
Link: https://lore.kernel.org/all/20250901160930.1785244-8-pbonzini%40redhat.com
1 parent 5f9b5bd commit 61221d0

3 files changed

Lines changed: 35 additions & 0 deletions

arch/x86/include/asm/tdx.h

Lines changed: 6 additions & 0 deletions
@@ -228,5 +228,11 @@ static inline const char *tdx_dump_mce_info(struct mce *m) { return NULL; }
 static inline const struct tdx_sys_info *tdx_get_sysinfo(void) { return NULL; }
 #endif /* CONFIG_INTEL_TDX_HOST */
 
+#ifdef CONFIG_KEXEC_CORE
+void tdx_cpu_flush_cache_for_kexec(void);
+#else
+static inline void tdx_cpu_flush_cache_for_kexec(void) { }
+#endif
+
 #endif /* !__ASSEMBLER__ */
 #endif /* _ASM_X86_TDX_H */

arch/x86/kvm/vmx/tdx.c

Lines changed: 10 additions & 0 deletions
@@ -423,6 +423,16 @@ void tdx_disable_virtualization_cpu(void)
 		tdx_flush_vp(&arg);
 	}
 	local_irq_restore(flags);
+
+	/*
+	 * Flush cache now if kexec is possible: this is necessary to avoid
+	 * having dirty private memory cachelines when the new kernel boots,
+	 * but WBINVD is a relatively expensive operation and doing it during
+	 * kexec can exacerbate races in native_stop_other_cpus().  Do it
+	 * now, since this is a safe moment and there is going to be no more
+	 * TDX activity on this CPU from this point on.
+	 */
+	tdx_cpu_flush_cache_for_kexec();
 }
 
 #define TDX_SEAMCALL_RETRIES 10000

arch/x86/virt/vmx/tdx/tdx.c

Lines changed: 19 additions & 0 deletions
@@ -1872,3 +1872,22 @@ u64 tdh_phymem_page_wbinvd_hkid(u64 hkid, struct page *page)
 	return seamcall(TDH_PHYMEM_PAGE_WBINVD, &args);
 }
 EXPORT_SYMBOL_GPL(tdh_phymem_page_wbinvd_hkid);
+
+#ifdef CONFIG_KEXEC_CORE
+void tdx_cpu_flush_cache_for_kexec(void)
+{
+	lockdep_assert_preemption_disabled();
+
+	if (!this_cpu_read(cache_state_incoherent))
+		return;
+
+	/*
+	 * Private memory cachelines need to be clean at the time of
+	 * kexec.  Write them back now, as the caller promises that
+	 * there should be no more SEAMCALLs on this CPU.
+	 */
+	wbinvd();
+	this_cpu_write(cache_state_incoherent, false);
+}
+EXPORT_SYMBOL_GPL(tdx_cpu_flush_cache_for_kexec);
+#endif
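
Illustrative note (not part of the patch): the flag handling described in the commit message can be modelled in user space. The sketch below is only a rough model under stated assumptions. NR_CPUS, the flush counters and every model_* function are hypothetical stand-ins, the per-CPU variable is replaced by a plain array, and the real WBINVD is replaced by a counter increment. It is meant to show why an early flush leaves nothing expensive for the stop path to do, and is not kernel code.

```c
#include <assert.h>
#include <stdbool.h>

#define NR_CPUS 4	/* hypothetical CPU count for the model */

/* Models the per-CPU cache_state_incoherent flag as a plain array. */
static bool cache_state_incoherent[NR_CPUS];

/* Hypothetical counters: how many simulated WBINVDs each path issued. */
static int flushes_early;
static int flushes_in_stop;

/* Stand-in for any SEAMCALL: the cache may now hold dirty private lines. */
static void model_seamcall(int cpu)
{
	assert(cpu >= 0 && cpu < NR_CPUS);
	cache_state_incoherent[cpu] = true;
}

/*
 * Models tdx_cpu_flush_cache_for_kexec(): flush only if the flag is set,
 * then clear the flag so that no later path has to flush again.
 */
static void model_flush_cache_for_kexec(int cpu)
{
	assert(cpu >= 0 && cpu < NR_CPUS);
	if (!cache_state_incoherent[cpu])
		return;

	flushes_early++;			/* stands in for wbinvd() */
	cache_state_incoherent[cpu] = false;
}

/*
 * Models the check the commit message attributes to stop_this_cpu():
 * do the (slow) flush only if the flag is still set.
 */
static void model_stop_this_cpu(int cpu)
{
	assert(cpu >= 0 && cpu < NR_CPUS);
	if (cache_state_incoherent[cpu])
		flushes_in_stop++;		/* stands in for wbinvd() */
}
```

In this model, calling model_seamcall(1), then model_flush_cache_for_kexec(1), then model_stop_this_cpu(1) records one early flush and none in the stop path, which corresponds to the ordering the patch aims for. A CPU that reaches model_stop_this_cpu() with the flag still set does the flush there instead, as before the patch.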
