Skip to content

Commit b800058

Browse files
Peter Zijlstraingomolnar
authored andcommitted
perf/x86/intel: Cure bogus unwind from PEBS entries
Vince Weaver reported that perf_fuzzer + KASAN detects that PEBS event unwinds sometimes do 'weird' things. In particular, we seemed to be ending up unwinding from random places on the NMI stack. While it was somewhat expected that the event record BP,SP would not match the interrupt BP,SP in that the interrupt is strictly later than the record event, it was overlooked that it could be on an already overwritten stack. Therefore, don't copy the recorded BP,SP over the interrupted BP,SP when we need stack unwinds. Note that its still possible the unwind doesn't full match the actual event, as its entirely possible to have done an (I)RET between record and interrupt, but on average it should still point in the general direction of where the event came from. Also, it's the best we can do, considering. The particular scenario that triggered the bogus NMI stack unwind was a PEBS event with very short period, upon enabling the event at the tail of the PMI handler (FREEZE_ON_PMI is not used), it instantly triggers a record (while still on the NMI stack) which in turn triggers the next PMI. This then causes back-to-back NMIs and we'll try and unwind the stack-frame from the last NMI, which obviously is now overwritten by our own. Analyzed-by: Josh Poimboeuf <jpoimboe@redhat.com> Reported-by: Vince Weaver <vincent.weaver@maine.edu> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Arnaldo Carvalho de Melo <acme@kernel.org> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@gmail.com> Cc: Stephane Eranian <eranian@google.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: davej@codemonkey.org.uk <davej@codemonkey.org.uk> Cc: dvyukov@google.com <dvyukov@google.com> Cc: stable@vger.kernel.org Fixes: ca03770 ("perf, x86: Add PEBS infrastructure") Link: http://lkml.kernel.org/r/20161117171731.GV3157@twins.programming.kicks-ass.net Signed-off-by: Ingo Molnar <mingo@kernel.org>
1 parent ae31fe5 commit b800058

2 files changed

Lines changed: 24 additions & 13 deletions

File tree

arch/x86/events/intel/ds.c

Lines changed: 23 additions & 12 deletions
Original file line numberDiff line numberDiff line change
@@ -1108,20 +1108,20 @@ static void setup_pebs_sample_data(struct perf_event *event,
11081108
}
11091109

11101110
/*
1111-
* We use the interrupt regs as a base because the PEBS record
1112-
* does not contain a full regs set, specifically it seems to
1113-
* lack segment descriptors, which get used by things like
1114-
* user_mode().
1111+
* We use the interrupt regs as a base because the PEBS record does not
1112+
* contain a full regs set, specifically it seems to lack segment
1113+
* descriptors, which get used by things like user_mode().
11151114
*
1116-
* In the simple case fix up only the IP and BP,SP regs, for
1117-
* PERF_SAMPLE_IP and PERF_SAMPLE_CALLCHAIN to function properly.
1118-
* A possible PERF_SAMPLE_REGS will have to transfer all regs.
1115+
* In the simple case fix up only the IP for PERF_SAMPLE_IP.
1116+
*
1117+
* We must however always use BP,SP from iregs for the unwinder to stay
1118+
* sane; the record BP,SP can point into thin air when the record is
1119+
* from a previous PMI context or an (I)RET happend between the record
1120+
* and PMI.
11191121
*/
11201122
*regs = *iregs;
11211123
regs->flags = pebs->flags;
11221124
set_linear_ip(regs, pebs->ip);
1123-
regs->bp = pebs->bp;
1124-
regs->sp = pebs->sp;
11251125

11261126
if (sample_type & PERF_SAMPLE_REGS_INTR) {
11271127
regs->ax = pebs->ax;
@@ -1130,10 +1130,21 @@ static void setup_pebs_sample_data(struct perf_event *event,
11301130
regs->dx = pebs->dx;
11311131
regs->si = pebs->si;
11321132
regs->di = pebs->di;
1133-
regs->bp = pebs->bp;
1134-
regs->sp = pebs->sp;
11351133

1136-
regs->flags = pebs->flags;
1134+
/*
1135+
* Per the above; only set BP,SP if we don't need callchains.
1136+
*
1137+
* XXX: does this make sense?
1138+
*/
1139+
if (!(sample_type & PERF_SAMPLE_CALLCHAIN)) {
1140+
regs->bp = pebs->bp;
1141+
regs->sp = pebs->sp;
1142+
}
1143+
1144+
/*
1145+
* Preserve PERF_EFLAGS_VM from set_linear_ip().
1146+
*/
1147+
regs->flags = pebs->flags | (regs->flags & PERF_EFLAGS_VM);
11371148
#ifndef CONFIG_X86_32
11381149
regs->r8 = pebs->r8;
11391150
regs->r9 = pebs->r9;

arch/x86/events/perf_event.h

Lines changed: 1 addition & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -113,7 +113,7 @@ struct debug_store {
113113
* Per register state.
114114
*/
115115
struct er_account {
116-
raw_spinlock_t lock; /* per-core: protect structure */
116+
raw_spinlock_t lock; /* per-core: protect structure */
117117
u64 config; /* extra MSR config */
118118
u64 reg; /* extra MSR number */
119119
atomic_t ref; /* reference count */

0 commit comments

Comments
 (0)