Commit e4bf304

Merge tag 'trace-ringbuffer-v7.1' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace
Pull ring-buffer updates from Steven Rostedt:

 - Add remote buffers for pKVM

   pKVM has a hypervisor component that is used to protect the guest
   from the host kernel. This hypervisor is a black box to the kernel
   as the kernel is to user space. The remote buffers are used to have
   a memory mapping between the hypervisor and the kernel where the
   kernel may send commands to enable tracing within the hypervisor.
   Then the kernel will read this memory mapping just like user space
   can read the memory mapped ring buffer of the kernel tracing system.

   Since the hypervisor only has a single context, it doesn't need to
   worry about races between normal context, interrupt context and NMIs
   like the kernel does. The ring buffer it uses doesn't need to be as
   complex. The remote buffers are a simple version of the ring buffer
   that works in a single context. They are still per-CPU and use sub
   buffers. The data layout is the same as the kernel's ring buffer to
   share the same parsing.

   Currently, only ARM64 implements pKVM, but there's work to implement
   it also in x86. The remote buffer code is separated out from the ARM
   implementation so that it can be used in the future by x86.

   The ARM64 updates for pKVM are in the ARM/KVM tree, which merged in
   the remote buffers of this tree.

 - Make the backup instance non-reusable

   The backup instance is a copy of the persistent ring buffer so that
   the persistent ring buffer could start recording again without using
   the data from the previous boot. The backup isn't for normal tracing.
   It is made read-only, and after it is consumed, it is automatically
   removed.

 - Have backup copy persistent instance before it starts recording

   To allow the persistent ring buffer to start recording from the
   kernel command line commands, move the copy of the backup instance
   to before the command line options start recording.

 - Report header_page overwrite field as "char" and not "int"

   The Rust parser of the header_page file was triggering a warning
   when it defined the overwrite variable as "int" but it was only a
   single byte in size.

 - Fix memory barriers for the trace_buffer CPU mask

   When a CPU comes online, the bit is set to allow readers to know
   that the CPU buffer is allocated. The bit is set after the
   allocation is done, and a smp_wmb() is performed after the
   allocation and before the setting of the bit.

   Rather than adding a smp_rmb() to all readers, rely on the fact that
   once a buffer is created for a CPU it is not deleted if that CPU
   goes offline, so this allocation is almost always done at boot up
   before any readers exist. In the unlikely case where a CPU comes
   online for the first time after the system boot has finished, send
   an IPI to all CPUs to force the smp_rmb() on each CPU.

 - Show clock function being used in debugging ring buffer data

   When the ring buffer checks are enabled and the ring buffer detects
   an inconsistency in the times of the events, print out the clock
   being used when the error occurred. There was a very hard to hit bug
   that would happen every so often and it ended up being only
   triggered when the jiffies clock was being used. If the bug report
   had shown the clock being used, it would have been much easier to
   find the problem (an internal function was being traced, which
   caused the clock accounting to go off).

* tag 'trace-ringbuffer-v7.1' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace: (26 commits)
  ring-buffer: Prevent off-by-one array access in ring_buffer_desc_page()
  ring-buffer: Report header_page overwrite as char
  tracing: Allow backup to save persistent ring buffer before it starts
  tracing/Documentation: Add a section about backup instance
  tracing: Remove the backup instance automatically after read
  tracing: Make the backup instance non-reusable
  ring-buffer: Enforce read ordering of trace_buffer cpumask and buffers
  ring-buffer: Show what clock function is used on timestamp errors
  tracing: Check for undefined symbols in simple_ring_buffer
  tracing: load/unload page callbacks for simple_ring_buffer
  Documentation: tracing: Add tracing remotes
  tracing: selftests: Add trace remote tests
  tracing: Add a trace remote module for testing
  tracing: Introduce simple_ring_buffer
  ring-buffer: Export buffer_data_page and macros
  tracing: Add helpers to create trace remote events
  tracing: Add events/ root files to trace remotes
  tracing: Add events to trace remotes
  tracing: Add init callback to trace remotes
  tracing: Add non-consuming read to trace remotes
  ...
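
The CPU mask ordering fix in the pull message follows a publish pattern: finish the allocation first, then make the bit visible. What follows is a minimal userspace sketch of that pattern with C11 atomics, not the kernel code. It pairs a release store with an acquire load on the reader side, which is the textbook form; the commit instead avoids a reader-side smp_rmb() and relies on an IPI for the rare late-online case. Every name here (`bring_cpu_online`, `lookup_cpu_buffer`, `cpu_buffer_sketch`, `SKETCH_NR_CPUS`) is hypothetical.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdlib.h>

#define SKETCH_NR_CPUS 8

/* Hypothetical stand-in for a per-CPU ring buffer. */
struct cpu_buffer_sketch {
	int cpu;
};

static struct cpu_buffer_sketch *buffers[SKETCH_NR_CPUS];
static atomic_ulong online_mask;	/* bit N set: buffers[N] is usable */

/* Writer side: finish the allocation, then publish the bit. */
static int bring_cpu_online(int cpu)
{
	struct cpu_buffer_sketch *buf = malloc(sizeof(*buf));

	if (!buf)
		return -1;
	buf->cpu = cpu;
	buffers[cpu] = buf;

	/* Release ordering plays the role of smp_wmb() before setting the bit. */
	atomic_fetch_or_explicit(&online_mask, 1UL << cpu, memory_order_release);
	return 0;
}

/* Reader side: only dereference buffers[cpu] after observing the bit. */
static struct cpu_buffer_sketch *lookup_cpu_buffer(int cpu)
{
	unsigned long mask;

	/* Acquire ordering plays the role of a reader-side smp_rmb(). */
	mask = atomic_load_explicit(&online_mask, memory_order_acquire);
	if (!(mask & (1UL << cpu)))
		return NULL;
	return buffers[cpu];
}
```

A caller would invoke `bring_cpu_online(cpu)` once per CPU and `lookup_cpu_buffer(cpu)` from any reader; a reader that does not yet see the bit gets NULL rather than a half-initialized buffer.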
2 parents 1521829 + 6170922 commit e4bf304

28 files changed

Lines changed: 3654 additions & 136 deletions

Documentation/trace/debugging.rst

Lines changed: 19 additions & 0 deletions
@@ -159,3 +159,22 @@ If setting it from the kernel command line, it is recommended to also
 disable tracing with the "traceoff" flag, and enable tracing after boot up.
 Otherwise the trace from the most recent boot will be mixed with the trace
 from the previous boot, and may make it confusing to read.
+
+Using a backup instance for keeping previous boot data
+------------------------------------------------------
+
+It is also possible to record trace data at system boot time by specifying
+events with the persistent ring buffer, but in this case the data before the
+reboot will be lost before it can be read. This problem can be solved by a
+backup instance. From the kernel command line::
+
+  reserve_mem=12M:4096:trace trace_instance=boot_map@trace,sched,irq trace_instance=backup=boot_map
+
+On boot up, the previous data in the "boot_map" is copied to the "backup"
+instance, and the "sched:*" and "irq:*" events for the current boot are traced
+in the "boot_map". Thus the user can read the previous boot data from the "backup"
+instance without stopping the trace.
+
+Note that this "backup" instance is readonly, and will be removed automatically
+if you clear the trace data or read out all trace data from the "trace_pipe"
+or the "trace_pipe_raw" files.

Documentation/trace/index.rst

Lines changed: 11 additions & 0 deletions
@@ -92,6 +92,17 @@ interactions.
    user_events
    uprobetracer
 
+Remote Tracing
+--------------
+
+This section covers the framework to read compatible ring-buffers, written by
+entities outside of the kernel (most likely firmware or hypervisor)
+
+.. toctree::
+   :maxdepth: 1
+
+   remotes
+
 Additional Resources
 --------------------

Documentation/trace/remotes.rst

Lines changed: 66 additions & 0 deletions
@@ -0,0 +1,66 @@
+.. SPDX-License-Identifier: GPL-2.0
+
+===============
+Tracing Remotes
+===============
+
+:Author: Vincent Donnefort <vdonnefort@google.com>
+
+Overview
+========
+Firmware and hypervisors are black boxes to the kernel. Having a way to see what
+they are doing can be useful to debug both. This is where remote tracing buffers
+come in. A remote tracing buffer is a ring buffer written by the firmware or
+hypervisor into memory that is memory mapped to the host kernel. This is similar
+to how user space memory maps the kernel ring buffer but in this case the kernel
+is acting like user space and the firmware or hypervisor is the "kernel" side.
+With a trace remote ring buffer, the firmware and hypervisor can record events
+that the host kernel can see and expose to user space.
+
+Register a remote
+=================
+A remote must provide a set of callbacks `struct trace_remote_callbacks` whose
+description can be found below. These callbacks allow Tracefs to enable and
+disable tracing and events, to load and unload a tracing buffer (a set of
+ring-buffers) and to swap a reader page with the head page, which enables
+consuming reading.
+
+.. kernel-doc:: include/linux/trace_remote.h
+
+Once registered, an instance will appear for this remote in the Tracefs
+directory **remotes/**. Buffers can then be read using the usual Tracefs files
+**trace_pipe** and **trace**.
+
+Declare a remote event
+======================
+Macros are provided to ease the declaration of remote events, in a similar
+fashion to in-kernel events. A declaration must provide an ID, a description of
+the event arguments and how to print the event:
+
+.. code-block:: c
+
+	REMOTE_EVENT(foo, EVENT_FOO_ID,
+		RE_STRUCT(
+			re_field(u64, bar)
+		),
+		RE_PRINTK("bar=%lld", __entry->bar)
+	);
+
+Then those events must be declared in a C file with the following:
+
+.. code-block:: c
+
+	#define REMOTE_EVENT_INCLUDE_FILE foo_events.h
+	#include <trace/define_remote_events.h>
+
+This will provide a `struct remote_event remote_event_foo` that can be given to
+`trace_remote_register`.
+
+Registered events appear in the remote directory under **events/**.
+
+Simple ring-buffer
+==================
+A simple implementation for a ring-buffer writer can be found in
+kernel/trace/simple_ring_buffer.c.
+
+.. kernel-doc:: include/linux/simple_ring_buffer.h

fs/tracefs/inode.c

Lines changed: 1 addition & 0 deletions
@@ -664,6 +664,7 @@ struct dentry *tracefs_create_file(const char *name, umode_t mode,
 	fsnotify_create(d_inode(dentry->d_parent), dentry);
 	return tracefs_end_creating(dentry);
 }
+EXPORT_SYMBOL_GPL(tracefs_create_file);
 
 static struct dentry *__create_dir(const char *name, struct dentry *parent,
 				   const struct inode_operations *ops)

include/linux/ring_buffer.h

Lines changed: 58 additions & 0 deletions
@@ -251,4 +251,62 @@ int ring_buffer_map(struct trace_buffer *buffer, int cpu,
 void ring_buffer_map_dup(struct trace_buffer *buffer, int cpu);
 int ring_buffer_unmap(struct trace_buffer *buffer, int cpu);
 int ring_buffer_map_get_reader(struct trace_buffer *buffer, int cpu);
+
+struct ring_buffer_desc {
+	int cpu;
+	unsigned int nr_page_va; /* excludes the meta page */
+	unsigned long meta_va;
+	unsigned long page_va[] __counted_by(nr_page_va);
+};
+
+struct trace_buffer_desc {
+	int nr_cpus;
+	size_t struct_len;
+	char __data[]; /* list of ring_buffer_desc */
+};
+
+static inline struct ring_buffer_desc *__next_ring_buffer_desc(struct ring_buffer_desc *desc)
+{
+	size_t len = struct_size(desc, page_va, desc->nr_page_va);
+
+	return (struct ring_buffer_desc *)((void *)desc + len);
+}
+
+static inline struct ring_buffer_desc *__first_ring_buffer_desc(struct trace_buffer_desc *desc)
+{
+	return (struct ring_buffer_desc *)(&desc->__data[0]);
+}
+
+static inline size_t trace_buffer_desc_size(size_t buffer_size, unsigned int nr_cpus)
+{
+	unsigned int nr_pages = max(DIV_ROUND_UP(buffer_size, PAGE_SIZE), 2UL) + 1;
+	struct ring_buffer_desc *rbdesc;
+
+	return size_add(offsetof(struct trace_buffer_desc, __data),
+			size_mul(nr_cpus, struct_size(rbdesc, page_va, nr_pages)));
+}
+
+#define for_each_ring_buffer_desc(__pdesc, __cpu, __trace_pdesc) \
+	for (__pdesc = __first_ring_buffer_desc(__trace_pdesc), __cpu = 0; \
+	     (__cpu) < (__trace_pdesc)->nr_cpus; \
+	     (__cpu)++, __pdesc = __next_ring_buffer_desc(__pdesc))
+
+struct ring_buffer_remote {
+	struct trace_buffer_desc *desc;
+	int (*swap_reader_page)(unsigned int cpu, void *priv);
+	int (*reset)(unsigned int cpu, void *priv);
+	void *priv;
+};
+
+int ring_buffer_poll_remote(struct trace_buffer *buffer, int cpu);
+
+struct trace_buffer *
+__ring_buffer_alloc_remote(struct ring_buffer_remote *remote,
+			   struct lock_class_key *key);
+
+#define ring_buffer_alloc_remote(remote) \
+({ \
+	static struct lock_class_key __key; \
+	__ring_buffer_alloc_remote(remote, &__key); \
+})
 #endif /* _LINUX_RING_BUFFER_H */
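
The descriptor added in this hunk is a packed list of variable-length `ring_buffer_desc` records, one per CPU, stored back to back after the `trace_buffer_desc` header. The following userspace sketch shows how such a list could be sized, filled and walked. It copies the two structs from the hunk (renaming `__data` to `data` and dropping `__counted_by()`), assumes 4 KiB pages, and replaces `struct_size()`, `size_add()` and `size_mul()` with plain arithmetic that has no overflow checking. All `sketch_*` names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

#define SKETCH_PAGE_SIZE 4096UL	/* assumption: 4 KiB pages */

/* Userspace copies of the descriptors; __counted_by() is dropped. */
struct ring_buffer_desc {
	int cpu;
	unsigned int nr_page_va;	/* excludes the meta page */
	unsigned long meta_va;
	unsigned long page_va[];
};

struct trace_buffer_desc {
	int nr_cpus;
	size_t struct_len;
	char data[];			/* list of ring_buffer_desc */
};

/* Same arithmetic as trace_buffer_desc_size(): at least 2 pages, plus 1. */
static unsigned int sketch_nr_pages(size_t buffer_size)
{
	size_t pages = (buffer_size + SKETCH_PAGE_SIZE - 1) / SKETCH_PAGE_SIZE;

	return (unsigned int)(pages < 2 ? 2 : pages) + 1;
}

/* Plain-C stand-in for struct_size(desc, page_va, nr_pages). */
static size_t sketch_rb_desc_size(unsigned int nr_pages)
{
	return sizeof(struct ring_buffer_desc) + nr_pages * sizeof(unsigned long);
}

/* Step over one variable-length record to reach the next CPU's record. */
static struct ring_buffer_desc *sketch_next(struct ring_buffer_desc *desc)
{
	return (struct ring_buffer_desc *)((char *)desc +
					   sketch_rb_desc_size(desc->nr_page_va));
}

/* Allocate a descriptor for nr_cpus CPUs and fill in the per-CPU headers. */
static struct trace_buffer_desc *sketch_alloc_desc(size_t buffer_size, int nr_cpus)
{
	unsigned int nr_pages = sketch_nr_pages(buffer_size);
	size_t len = offsetof(struct trace_buffer_desc, data) +
		     (size_t)nr_cpus * sketch_rb_desc_size(nr_pages);
	struct trace_buffer_desc *tdesc = calloc(1, len);
	struct ring_buffer_desc *desc;
	int cpu;

	if (!tdesc)
		return NULL;
	tdesc->nr_cpus = nr_cpus;
	tdesc->struct_len = len;

	desc = (struct ring_buffer_desc *)tdesc->data;
	for (cpu = 0; cpu < nr_cpus; cpu++, desc = sketch_next(desc)) {
		desc->cpu = cpu;
		desc->nr_page_va = nr_pages;
	}
	return tdesc;
}

/* Walk the list the way for_each_ring_buffer_desc() does and sum the pages. */
static unsigned int sketch_total_pages(struct trace_buffer_desc *tdesc)
{
	struct ring_buffer_desc *desc = (struct ring_buffer_desc *)tdesc->data;
	unsigned int total = 0;
	int cpu;

	for (cpu = 0; cpu < tdesc->nr_cpus; cpu++, desc = sketch_next(desc))
		total += desc->nr_page_va;
	return total;
}
```

Because each record carries its own `nr_page_va`, the walker never needs a separate index table: the size of the current record is enough to find the next one.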

include/linux/ring_buffer_types.h

Lines changed: 41 additions & 0 deletions
@@ -0,0 +1,41 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+#ifndef _LINUX_RING_BUFFER_TYPES_H
+#define _LINUX_RING_BUFFER_TYPES_H
+
+#include <asm/local.h>
+
+#define TS_SHIFT 27
+#define TS_MASK ((1ULL << TS_SHIFT) - 1)
+#define TS_DELTA_TEST (~TS_MASK)
+
+/*
+ * We need to fit the time_stamp delta into 27 bits.
+ */
+static inline bool test_time_stamp(u64 delta)
+{
+	return !!(delta & TS_DELTA_TEST);
+}
+
+#define BUF_PAGE_HDR_SIZE offsetof(struct buffer_data_page, data)
+
+#define RB_EVNT_HDR_SIZE (offsetof(struct ring_buffer_event, array))
+#define RB_ALIGNMENT 4U
+#define RB_MAX_SMALL_DATA (RB_ALIGNMENT * RINGBUF_TYPE_DATA_TYPE_LEN_MAX)
+#define RB_EVNT_MIN_SIZE 8U /* two 32bit words */
+
+#ifndef CONFIG_HAVE_64BIT_ALIGNED_ACCESS
+# define RB_FORCE_8BYTE_ALIGNMENT 0
+# define RB_ARCH_ALIGNMENT RB_ALIGNMENT
+#else
+# define RB_FORCE_8BYTE_ALIGNMENT 1
+# define RB_ARCH_ALIGNMENT 8U
+#endif
+
+#define RB_ALIGN_DATA __aligned(RB_ARCH_ALIGNMENT)
+
+struct buffer_data_page {
+	u64 time_stamp; /* page time stamp */
+	local_t commit; /* write committed index */
+	unsigned char data[] RB_ALIGN_DATA; /* data of buffer page */
+};
+#endif
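
The `test_time_stamp()` helper in this header checks whether a timestamp delta needs more than 27 bits. A small userspace sketch of that check follows; the three macros and the function body are copied from the hunk, with `u64` replaced by `uint64_t`. The helper `delta_fits_in_event()` is hypothetical and only illustrates how a writer might use the test. If the clock counts nanoseconds, 27 bits cover about 134 ms between events.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define TS_SHIFT	27
#define TS_MASK		((1ULL << TS_SHIFT) - 1)
#define TS_DELTA_TEST	(~TS_MASK)

/* True when the delta needs more than 27 bits. */
static bool test_time_stamp(uint64_t delta)
{
	return !!(delta & TS_DELTA_TEST);
}

/*
 * Hypothetical helper: compute the delta between two timestamps and
 * report whether it fits in the 27-bit field.
 */
static bool delta_fits_in_event(uint64_t prev_ts, uint64_t ts, uint64_t *delta)
{
	*delta = ts - prev_ts;
	return !test_time_stamp(*delta);
}
```

The largest delta that still fits is `TS_MASK` itself; `TS_MASK + 1` sets bit 27 and trips the test.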

include/linux/simple_ring_buffer.h

Lines changed: 65 additions & 0 deletions
@@ -0,0 +1,65 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+#ifndef _LINUX_SIMPLE_RING_BUFFER_H
+#define _LINUX_SIMPLE_RING_BUFFER_H
+
+#include <linux/list.h>
+#include <linux/ring_buffer.h>
+#include <linux/ring_buffer_types.h>
+#include <linux/types.h>
+
+/*
+ * Ideally those struct would stay private but the caller needs to know
+ * the allocation size for simple_ring_buffer_init().
+ */
+struct simple_buffer_page {
+	struct list_head link;
+	struct buffer_data_page *page;
+	u64 entries;
+	u32 write;
+	u32 id;
+};
+
+struct simple_rb_per_cpu {
+	struct simple_buffer_page *tail_page;
+	struct simple_buffer_page *reader_page;
+	struct simple_buffer_page *head_page;
+	struct simple_buffer_page *bpages;
+	struct trace_buffer_meta *meta;
+	u32 nr_pages;
+
+#define SIMPLE_RB_UNAVAILABLE 0
+#define SIMPLE_RB_READY 1
+#define SIMPLE_RB_WRITING 2
+	u32 status;
+
+	u64 last_overrun;
+	u64 write_stamp;
+
+	struct simple_rb_cbs *cbs;
+};
+
+int simple_ring_buffer_init(struct simple_rb_per_cpu *cpu_buffer, struct simple_buffer_page *bpages,
+			    const struct ring_buffer_desc *desc);
+
+void simple_ring_buffer_unload(struct simple_rb_per_cpu *cpu_buffer);
+
+void *simple_ring_buffer_reserve(struct simple_rb_per_cpu *cpu_buffer, unsigned long length,
+				 u64 timestamp);
+
+void simple_ring_buffer_commit(struct simple_rb_per_cpu *cpu_buffer);
+
+int simple_ring_buffer_enable_tracing(struct simple_rb_per_cpu *cpu_buffer, bool enable);
+
+int simple_ring_buffer_reset(struct simple_rb_per_cpu *cpu_buffer);
+
+int simple_ring_buffer_swap_reader_page(struct simple_rb_per_cpu *cpu_buffer);
+
+int simple_ring_buffer_init_mm(struct simple_rb_per_cpu *cpu_buffer,
+			       struct simple_buffer_page *bpages,
+			       const struct ring_buffer_desc *desc,
+			       void *(*load_page)(unsigned long va),
+			       void (*unload_page)(void *va));
+
+void simple_ring_buffer_unload_mm(struct simple_rb_per_cpu *cpu_buffer,
+				  void (*unload_page)(void *));
+#endif

include/linux/trace_remote.h

Lines changed: 48 additions & 0 deletions
@@ -0,0 +1,48 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+
+#ifndef _LINUX_TRACE_REMOTE_H
+#define _LINUX_TRACE_REMOTE_H
+
+#include <linux/dcache.h>
+#include <linux/ring_buffer.h>
+#include <linux/trace_remote_event.h>
+
+/**
+ * struct trace_remote_callbacks - Callbacks used by Tracefs to control the remote
+ * @init:		Called once the remote has been registered. Allows the
+ *			caller to extend the Tracefs remote directory
+ * @load_trace_buffer:	Called before Tracefs accesses the trace buffer for the first
+ *			time. Must return a &trace_buffer_desc
+ *			(most likely filled with trace_remote_alloc_buffer())
+ * @unload_trace_buffer:
+ *			Called once Tracefs has no use for the trace buffer
+ *			(most likely call trace_remote_free_buffer())
+ * @enable_tracing:	Called on Tracefs tracing_on. It is expected from the
+ *			remote to allow writing.
+ * @swap_reader_page:	Called when Tracefs consumes a new page from a
+ *			ring-buffer. It is expected from the remote to isolate a
+ *			new reader-page from the @cpu ring-buffer.
+ * @reset:		Called on `echo 0 > trace`. It is expected from the
+ *			remote to reset all ring-buffer pages.
+ * @enable_event:	Called on events/event_name/enable. It is expected from
+ *			the remote to allow the writing event @id.
+ */
+struct trace_remote_callbacks {
+	int (*init)(struct dentry *d, void *priv);
+	struct trace_buffer_desc *(*load_trace_buffer)(unsigned long size, void *priv);
+	void (*unload_trace_buffer)(struct trace_buffer_desc *desc, void *priv);
+	int (*enable_tracing)(bool enable, void *priv);
+	int (*swap_reader_page)(unsigned int cpu, void *priv);
+	int (*reset)(unsigned int cpu, void *priv);
+	int (*enable_event)(unsigned short id, bool enable, void *priv);
+};
+
+int trace_remote_register(const char *name, struct trace_remote_callbacks *cbs, void *priv,
+			  struct remote_event *events, size_t nr_events);
+
+int trace_remote_alloc_buffer(struct trace_buffer_desc *desc, size_t desc_size, size_t buffer_size,
+			      const struct cpumask *cpumask);
+
+void trace_remote_free_buffer(struct trace_buffer_desc *desc);
+
+#endif
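
The callback table above is the whole contract between Tracefs and a remote: each callback receives the `priv` pointer that was passed at registration. The following userspace mock shows how a remote might fill in part of the table and keep its state behind `priv`. The struct is copied from the hunk; `struct demo_remote` and the `demo_*` functions are hypothetical, and nothing here talks to a real hypervisor or calls the kernel's `trace_remote_register()`.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct dentry;
struct trace_buffer_desc;

/* Userspace copy of the callback table from the hunk above. */
struct trace_remote_callbacks {
	int (*init)(struct dentry *d, void *priv);
	struct trace_buffer_desc *(*load_trace_buffer)(unsigned long size, void *priv);
	void (*unload_trace_buffer)(struct trace_buffer_desc *desc, void *priv);
	int (*enable_tracing)(bool enable, void *priv);
	int (*swap_reader_page)(unsigned int cpu, void *priv);
	int (*reset)(unsigned int cpu, void *priv);
	int (*enable_event)(unsigned short id, bool enable, void *priv);
};

/* Hypothetical remote state, handed back to every callback through priv. */
struct demo_remote {
	bool tracing;
	unsigned int swaps;
	unsigned int resets;
};

static int demo_enable_tracing(bool enable, void *priv)
{
	struct demo_remote *remote = priv;

	/* A real remote would forward this request to the firmware or hypervisor. */
	remote->tracing = enable;
	return 0;
}

static int demo_swap_reader_page(unsigned int cpu, void *priv)
{
	struct demo_remote *remote = priv;

	(void)cpu;	/* the mock keeps a single counter for all CPUs */
	remote->swaps++;
	return 0;
}

static int demo_reset(unsigned int cpu, void *priv)
{
	struct demo_remote *remote = priv;

	(void)cpu;
	remote->resets++;
	return 0;
}

/* Only three callbacks are filled in; this mock never calls the others. */
static struct trace_remote_callbacks demo_cbs = {
	.enable_tracing = demo_enable_tracing,
	.swap_reader_page = demo_swap_reader_page,
	.reset = demo_reset,
};
```

In kernel code the table and its state would be handed to `trace_remote_register()` together with the event array. Whether any callback may be left NULL there is not stated in this hunk, so a real remote should not copy that shortcut from the mock.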

include/linux/trace_remote_event.h

Lines changed: 33 additions & 0 deletions
@@ -0,0 +1,33 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+
+#ifndef _LINUX_TRACE_REMOTE_EVENTS_H
+#define _LINUX_TRACE_REMOTE_EVENTS_H
+
+struct trace_remote;
+struct trace_event_fields;
+struct trace_seq;
+
+struct remote_event_hdr {
+	unsigned short id;
+};
+
+#define REMOTE_EVENT_NAME_MAX 30
+struct remote_event {
+	char name[REMOTE_EVENT_NAME_MAX];
+	unsigned short id;
+	bool enabled;
+	struct trace_remote *remote;
+	struct trace_event_fields *fields;
+	char *print_fmt;
+	void (*print)(void *evt, struct trace_seq *seq);
+};
+
+#define RE_STRUCT(__args...) __args
+#define re_field(__type, __field) __type __field;
+
+#define REMOTE_EVENT_FORMAT(__name, __struct) \
+	struct remote_event_format_##__name { \
+		struct remote_event_hdr hdr; \
+		__struct \
+	}
+#endif
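
The three macros in this header build the in-buffer layout of an event: a `remote_event_hdr` carrying the event ID, followed by the declared fields. The sketch below copies the macros and `struct remote_event_hdr` into a userspace file and expands them for the `foo` event used in the documentation. The `u64` typedef and the `fill_foo()` helper are additions for illustration; the named variadic macro parameter (`__args...`) is a GNU extension, so this assumes GCC or Clang.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef uint64_t u64;	/* kernel type alias, supplied here for userspace */

struct remote_event_hdr {
	unsigned short id;
};

#define RE_STRUCT(__args...) __args
#define re_field(__type, __field) __type __field;

#define REMOTE_EVENT_FORMAT(__name, __struct) \
	struct remote_event_format_##__name { \
		struct remote_event_hdr hdr; \
		__struct \
	}

/*
 * Expands to:
 *   struct remote_event_format_foo {
 *           struct remote_event_hdr hdr;
 *           u64 bar;
 *   };
 */
REMOTE_EVENT_FORMAT(foo, RE_STRUCT(re_field(u64, bar)));

/* Hypothetical writer-side helper: fill one event record. */
static void fill_foo(struct remote_event_format_foo *evt, unsigned short id, u64 bar)
{
	evt->hdr.id = id;
	evt->bar = bar;
}
```

Placing the header first means a reader can look at `hdr.id` before it knows which event struct to apply to the rest of the record.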
