
Commit a96ef58

ryanhrob authored and kees committed
randomize_kstack: Unify random source across arches
Previously, different architectures used random sources of differing strength and cost to choose the random kstack offset. A number of architectures (loongarch, powerpc, s390, x86) used their timestamp counter, at whatever frequency it happened to run. Other arches (arm64, riscv) used entropy from the crng via get_random_u16(). There have been concerns that in some cases the timestamp counters may be too weak, because they can be easily guessed or influenced by user space, and get_random_u16() has been shown to be too costly for the level of protection that kstack offset randomization provides.

So let's use a common, architecture-agnostic source of entropy: a per-cpu prng, seeded at boot time from the crng. This has a few benefits:

 - We can remove choose_random_kstack_offset(); it was only there to try to make the timestamp counter value a bit harder to influence from user space [*].

 - The architecture code is simplified. All it has to do now is call add_random_kstack_offset() in the syscall path.

 - The strength of the randomness can be reasoned about independently of the architecture.

 - Arches previously using get_random_u16() now have much faster syscall paths; see the results below.

[*] Additionally, this gets rid of some redundant work on s390 and x86. Before this patch, those architectures called choose_random_kstack_offset() under arch_exit_to_user_mode_prepare(), which is also called for exception returns to userspace that were *not* syscalls (e.g. regular interrupts). Getting rid of choose_random_kstack_offset() avoids a small amount of redundant work for the non-syscall cases.

In some configurations, add_random_kstack_offset() will now call instrumentable code, so for a couple of arches I have moved the call a bit later, to the first point where instrumentation is allowed. This doesn't impact the efficacy of the mechanism.

There have been some claims that a prng may be weaker than the timestamp counter if not regularly reseeded. But the prng has a period of about 2^113, so as long as the prng state remains secret, it should not be possible to guess; if the prng state can be accessed, we have bigger problems. Additionally, we only consume 6 bits to randomize the stack, so there are only 64 possible random offsets. It would be trivial for an attacker to brute force the offset by repeating their attack and waiting for the random stack offset to be the desired one. The prng approach seems entirely proportional to this level of protection.

Performance data are provided below. The baseline is v6.18 with rndstack (kstack offset randomization) enabled for each respective arch. (I)/(R) indicate a statistically significant improvement/regression. The arm64 platform is AWS Graviton3 (m7g.metal); the x86_64 platform is AWS Sapphire Rapids (m7i.24xlarge):

+-----------------+--------------+---------------+---------------+
| Benchmark       | Result Class | per-cpu-prng  | per-cpu-prng  |
|                 |              | arm64 (metal) | x86_64 (VM)   |
+=================+==============+===============+===============+
| syscall/getpid  | mean (ns)    | (I) -9.50%    | (I) -17.65%   |
|                 | p99 (ns)     | (I) -59.24%   | (I) -24.41%   |
|                 | p99.9 (ns)   | (I) -59.52%   | (I) -28.52%   |
+-----------------+--------------+---------------+---------------+
| syscall/getppid | mean (ns)    | (I) -9.52%    | (I) -19.24%   |
|                 | p99 (ns)     | (I) -59.25%   | (I) -25.03%   |
|                 | p99.9 (ns)   | (I) -59.50%   | (I) -28.17%   |
+-----------------+--------------+---------------+---------------+
| syscall/invalid | mean (ns)    | (I) -10.31%   | (I) -18.56%   |
|                 | p99 (ns)     | (I) -60.79%   | (I) -20.06%   |
|                 | p99.9 (ns)   | (I) -61.04%   | (I) -25.04%   |
+-----------------+--------------+---------------+---------------+

I tested an earlier version of this change on x86 bare metal, and it showed a smaller but still significant improvement. The bare metal system wasn't available this time around, so testing was done in a VM instance; I'm guessing the cost of rdtsc is higher in VMs.

Acked-by: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: Ryan Roberts <ryan.roberts@arm.com>
Link: https://patch.msgid.link/20260303150840.3789438-3-ryan.roberts@arm.com
Signed-off-by: Kees Cook <kees@kernel.org>
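The per-cpu prng plumbing itself (the randomize_kstack header changes) is not among the hunks shown below. As a rough sketch of the approach described above, assuming the kernel's existing prandom machinery (a Tausworthe generator with period ~2^113, matching the figure in the message) — the names kstack_rng and kstack_rng_init here are illustrative, not necessarily those used by the patch:

    /*
     * Sketch of an architecture-agnostic entropy source for
     * add_random_kstack_offset(): a per-cpu Tausworthe prng, seeded
     * once at boot from the crng. Illustrative only.
     */
    #include <linux/prandom.h>
    #include <linux/percpu.h>
    #include <linux/init.h>

    static DEFINE_PER_CPU(struct rnd_state, kstack_rng);

    /* Seed each CPU's prng state from the crng at boot. */
    static int __init kstack_rng_init(void)
    {
        prandom_seed_full_state(&kstack_rng);
        return 0;
    }
    arch_initcall(kstack_rng_init);

    /*
     * Cheap per-syscall draw; the caller runs on the syscall entry
     * path, and KSTACK_OFFSET_MAX() later masks the value to 10 bits.
     */
    static __always_inline u32 kstack_rng_next(void)
    {
        return prandom_u32_state(raw_cpu_ptr(&kstack_rng));
    }

A draw like this costs a handful of shifts and xors, which is where the syscall latency win over get_random_u16() (and over rdtsc in a VM) would come from.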
1 parent 37beb42 commit a96ef58

14 files changed

Lines changed: 33 additions & 115 deletions


arch/Kconfig

Lines changed: 2 additions & 3 deletions
@@ -1519,9 +1519,8 @@ config HAVE_ARCH_RANDOMIZE_KSTACK_OFFSET
     def_bool n
     help
       An arch should select this symbol if it can support kernel stack
-      offset randomization with calls to add_random_kstack_offset()
-      during syscall entry and choose_random_kstack_offset() during
-      syscall exit. Careful removal of -fstack-protector-strong and
+      offset randomization with a call to add_random_kstack_offset()
+      during syscall entry. Careful removal of -fstack-protector-strong and
       -fstack-protector should also be applied to the entry code and
       closely examined, as the artificial stack bump looks like an array
       to the compiler, so it will attempt to add canary checks regardless
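The "artificial stack bump" this help text warns about is essentially an alloca of the random offset: the compiler treats it like a variable-length array and would emit stack-canary checks in the entry path unless the protector flags are stripped. A minimal sketch of the pattern, mirroring the shape of the existing add_random_kstack_offset() helper (the name kstack_bump is hypothetical):

    /*
     * Illustrative sketch: allocating a random amount shifts everything
     * the syscall handler subsequently places on the stack. To the
     * compiler this looks like a VLA, hence the canary concern above.
     */
    static __always_inline void kstack_bump(u32 offset)
    {
        u8 *ptr = __builtin_alloca(offset & 0x3FF);    /* cap at 10 bits */

        /* Keep the allocation alive after "ptr" goes out of scope. */
        asm volatile("" :: "r"(ptr) : "memory");
    }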

arch/arm64/kernel/syscall.c

Lines changed: 0 additions & 11 deletions
@@ -52,17 +52,6 @@ static void invoke_syscall(struct pt_regs *regs, unsigned int scno,
     }
 
     syscall_set_return_value(current, regs, 0, ret);
-
-    /*
-     * This value will get limited by KSTACK_OFFSET_MAX(), which is 10
-     * bits. The actual entropy will be further reduced by the compiler
-     * when applying stack alignment constraints: the AAPCS mandates a
-     * 16-byte aligned SP at function boundaries, which will remove the
-     * 4 low bits from any entropy chosen here.
-     *
-     * The resulting 6 bits of entropy is seen in SP[9:4].
-     */
-    choose_random_kstack_offset(get_random_u16());
 }
 
 static inline bool has_syscall_work(unsigned long flags)
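The deleted comment above encodes the entropy budget shared by most of the hunks that follow. As a toy illustration of that arithmetic, assuming the existing 10-bit KSTACK_OFFSET_MAX() mask from randomize_kstack.h:

    /* Toy illustration of the arithmetic in the deleted comments. */
    #define KSTACK_OFFSET_MAX(x)    ((x) & 0x3FF)    /* cap at 10 bits (1 KiB) */

    unsigned int distinct_kstack_offsets(void)
    {
        /*
         * A 16-byte-aligned SP zeroes bits [3:0], so only SP[9:4]
         * carry entropy: 2^(10 - 4) = 64 possible offsets.
         */
        return (KSTACK_OFFSET_MAX(~0u) >> 4) + 1;    /* == 64 */
    }

Those 64 possibilities are the basis of the brute-force argument in the commit message: whatever the entropy source, an attacker can simply retry until the offset is the one they want.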

arch/loongarch/kernel/syscall.c

Lines changed: 0 additions & 11 deletions
@@ -79,16 +79,5 @@ void noinstr __no_stack_protector do_syscall(struct pt_regs *regs)
             regs->regs[7], regs->regs[8], regs->regs[9]);
     }
 
-    /*
-     * This value will get limited by KSTACK_OFFSET_MAX(), which is 10
-     * bits. The actual entropy will be further reduced by the compiler
-     * when applying stack alignment constraints: 16-bytes (i.e. 4-bits)
-     * aligned, which will remove the 4 low bits from any entropy chosen
-     * here.
-     *
-     * The resulting 6 bits of entropy is seen in SP[9:4].
-     */
-    choose_random_kstack_offset(get_cycles());
-
     syscall_exit_to_user_mode(regs);
 }

arch/powerpc/kernel/syscall.c

Lines changed: 2 additions & 14 deletions
@@ -20,8 +20,6 @@ notrace long system_call_exception(struct pt_regs *regs, unsigned long r0)
 
     kuap_lock();
 
-    add_random_kstack_offset();
-
     if (IS_ENABLED(CONFIG_PPC_IRQ_SOFT_MASK_DEBUG))
         BUG_ON(irq_soft_mask_return() != IRQS_ALL_DISABLED);
 
@@ -30,6 +28,8 @@ notrace long system_call_exception(struct pt_regs *regs, unsigned long r0)
     CT_WARN_ON(ct_state() == CT_STATE_KERNEL);
     user_exit_irqoff();
 
+    add_random_kstack_offset();
+
     BUG_ON(regs_is_unrecoverable(regs));
     BUG_ON(!user_mode(regs));
     BUG_ON(arch_irq_disabled_regs(regs));
@@ -173,17 +173,5 @@ notrace long system_call_exception(struct pt_regs *regs, unsigned long r0)
     }
 #endif
 
-    /*
-     * Ultimately, this value will get limited by KSTACK_OFFSET_MAX(),
-     * so the maximum stack offset is 1k bytes (10 bits).
-     *
-     * The actual entropy will be further reduced by the compiler when
-     * applying stack alignment constraints: the powerpc architecture
-     * may have two kinds of stack alignment (16-bytes and 8-bytes).
-     *
-     * So the resulting 6 or 7 bits of entropy is seen in SP[9:4] or SP[9:3].
-     */
-    choose_random_kstack_offset(mftb());
-
     return ret;
 }

arch/riscv/kernel/traps.c

Lines changed: 0 additions & 12 deletions
@@ -344,18 +344,6 @@ void do_trap_ecall_u(struct pt_regs *regs)
             syscall_handler(regs, syscall);
         }
 
-        /*
-         * Ultimately, this value will get limited by KSTACK_OFFSET_MAX(),
-         * so the maximum stack offset is 1k bytes (10 bits).
-         *
-         * The actual entropy will be further reduced by the compiler when
-         * applying stack alignment constraints: 16-byte (i.e. 4-bit) aligned
-         * for RV32I or RV64I.
-         *
-         * The resulting 6 bits of entropy is seen in SP[9:4].
-         */
-        choose_random_kstack_offset(get_random_u16());
-
         syscall_exit_to_user_mode(regs);
     } else {
         irqentry_state_t state = irqentry_nmi_enter(regs);

arch/s390/include/asm/entry-common.h

Lines changed: 0 additions & 8 deletions
@@ -51,14 +51,6 @@ static __always_inline void arch_exit_to_user_mode(void)
 
 #define arch_exit_to_user_mode arch_exit_to_user_mode
 
-static inline void arch_exit_to_user_mode_prepare(struct pt_regs *regs,
-                                                  unsigned long ti_work)
-{
-    choose_random_kstack_offset(get_tod_clock_fast());
-}
-
-#define arch_exit_to_user_mode_prepare arch_exit_to_user_mode_prepare
-
 static __always_inline bool arch_in_rcu_eqs(void)
 {
     if (IS_ENABLED(CONFIG_KVM))

arch/s390/kernel/syscall.c

Lines changed: 1 addition & 1 deletion
@@ -96,8 +96,8 @@ void noinstr __do_syscall(struct pt_regs *regs, int per_trap)
 {
     unsigned long nr;
 
-    add_random_kstack_offset();
     enter_from_user_mode(regs);
+    add_random_kstack_offset();
     regs->psw = get_lowcore()->svc_old_psw;
     regs->int_code = get_lowcore()->svc_int_code;
     update_timer_sys();

arch/x86/entry/syscall_32.c

Lines changed: 2 additions & 2 deletions
@@ -247,7 +247,6 @@ __visible noinstr void do_int80_syscall_32(struct pt_regs *regs)
 {
     int nr = syscall_32_enter(regs);
 
-    add_random_kstack_offset();
     /*
      * Subtlety here: if ptrace pokes something larger than 2^31-1 into
      * orig_ax, the int return value truncates it. This matches
@@ -256,6 +255,7 @@
     nr = syscall_enter_from_user_mode(regs, nr);
     instrumentation_begin();
 
+    add_random_kstack_offset();
     do_syscall_32_irqs_on(regs, nr);
 
     instrumentation_end();
@@ -268,7 +268,6 @@ static noinstr bool __do_fast_syscall_32(struct pt_regs *regs)
     int nr = syscall_32_enter(regs);
     int res;
 
-    add_random_kstack_offset();
     /*
      * This cannot use syscall_enter_from_user_mode() as it has to
      * fetch EBP before invoking any of the syscall entry work
@@ -277,6 +276,7 @@
     enter_from_user_mode(regs);
 
     instrumentation_begin();
+    add_random_kstack_offset();
     local_irq_enable();
     /* Fetch EBP from where the vDSO stashed it. */
     if (IS_ENABLED(CONFIG_X86_64)) {

arch/x86/entry/syscall_64.c

Lines changed: 1 addition & 1 deletion
@@ -86,10 +86,10 @@ static __always_inline bool do_syscall_x32(struct pt_regs *regs, int nr)
 /* Returns true to return using SYSRET, or false to use IRET */
 __visible noinstr bool do_syscall_64(struct pt_regs *regs, int nr)
 {
-    add_random_kstack_offset();
     nr = syscall_enter_from_user_mode(regs, nr);
 
     instrumentation_begin();
+    add_random_kstack_offset();
 
     if (!do_syscall_x64(regs, nr) && !do_syscall_x32(regs, nr) && nr != -1) {
         /* Invalid system call, but still a system call. */

arch/x86/include/asm/entry-common.h

Lines changed: 0 additions & 12 deletions
@@ -82,18 +82,6 @@ static inline void arch_exit_to_user_mode_prepare(struct pt_regs *regs,
     current_thread_info()->status &= ~(TS_COMPAT | TS_I386_REGS_POKED);
 #endif
 
-    /*
-     * This value will get limited by KSTACK_OFFSET_MAX(), which is 10
-     * bits. The actual entropy will be further reduced by the compiler
-     * when applying stack alignment constraints (see cc_stack_align4/8 in
-     * arch/x86/Makefile), which will remove the 3 (x86_64) or 2 (ia32)
-     * low bits from any entropy chosen here.
-     *
-     * Therefore, final stack offset entropy will be 7 (x86_64) or
-     * 8 (ia32) bits.
-     */
-    choose_random_kstack_offset(rdtsc());
-
     /* Avoid unnecessary reads of 'x86_ibpb_exit_to_user' */
     if (cpu_feature_enabled(X86_FEATURE_IBPB_EXIT_TO_USER) &&
         this_cpu_read(x86_ibpb_exit_to_user)) {
