|
| 1 | +--- |
| 2 | +title: "Export Data to OpenTelemetry" |
| 3 | +metaTitle: "Tutorials | Integrations and Alerts | Pixie <> OpenTelemetry" |
| 4 | +metaDescription: "Export Data to OpenTelemetry" |
| 5 | +order: 3 |
| 6 | +redirect_from: |
| 7 | + - /tutorials/otel/ |
| 8 | +--- |
| 9 | + |
| 10 | + |
| 11 | +Pixie comes packaged with an OpenTelemetry exporter. You can write PxL scripts that define how Pixie DataFrames are transformed into OpenTelemetry data. This article walks through a script that exports HTTP data collected by Pixie to an OpenTelemetry endpoint. For more detail, see the [PxL reference documentation for the OpenTelemetry integration](/reference/pxl/otel-export). |
| 12 | + |
| 13 | + |
| 14 | +## Example OpenTelemetry Export PxL Script |
| 15 | + |
| 16 | +The following [PxL script](/tutorials/pxl-scripts/write-pxl-scripts/#overview) calculates the rate of HTTP requests made to each pod in your cluster and exports that data as an OpenTelemetry Gauge metric. |
| 17 | + |
| 18 | + |
| 19 | +```python |
| 20 | +import px |
| 21 | +# Read in the http_events table |
| 22 | +df = px.DataFrame(table='http_events', start_time='-10s') |
| 23 | + |
| 24 | +# Attach the pod and service metadata |
| 25 | +df.pod = df.ctx['pod'] |
| 26 | +df.service = df.ctx['service'] |
| 27 | +# Count the number of requests per pod, service, and request path |
| 28 | +df = df.groupby(['pod', 'service', 'req_path']).agg( |
| 29 | +    throughput=('latency', px.count), |
| 30 | +    time_=('time_', px.max), |
| 31 | +) |
| 32 | + |
| 33 | +# Change the denominator if you change start_time above. |
| 34 | +df.requests_per_s = df.throughput / 10 |
| 35 | + |
| 36 | +px.export(df, px.otel.Data( |
| 37 | +    # endpoint arg not required if run in a plugin that provides the endpoint |
| 38 | +    endpoint=px.otel.Endpoint( |
| 39 | +        url='0.0.0.0:55690', |
| 40 | +        headers={ |
| 41 | +            'apikey': '12345', |
| 42 | +        } |
| 43 | +    ), |
| 44 | +    resource={ |
| 45 | +        # service.name is required by OpenTelemetry. |
| 46 | +        'service.name': df.service, |
| 47 | +        'service.instance.id': df.pod, |
| 48 | +        'k8s.pod.name': df.pod, |
| 49 | +    }, |
| 50 | +    data=[ |
| 51 | +        px.otel.metric.Gauge( |
| 52 | +            name='http.throughput', |
| 53 | +            description='The number of messages sent per second', |
| 54 | +            value=df.requests_per_s, |
| 55 | +            attributes={ |
| 56 | +                'req_path': df.req_path, |
| 57 | +            } |
| 58 | +        ) |
| 59 | +    ] |
| 60 | +)) |
| 61 | +``` |
| 62 | + |
| 63 | + |
| 64 | + |
| 65 | +## The Data |
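| 66 | +The first part of this script (everything before the `px.export` call) reads in the `http_events` data, counts the requests made to each pod, service, and request path over the last 10 seconds, and divides that count by the window length to get a per-second rate. |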
| 67 | + |
| 68 | + |
| 69 | +```python |
| 70 | +import px |
| 71 | + |
| 72 | +# Read in the http_events table |
| 73 | +df = px.DataFrame(table='http_events', start_time='-10s') |
| 74 | + |
| 75 | +# Attach the pod and service metadata |
| 76 | +df.pod = df.ctx['pod'] |
| 77 | +df.service = df.ctx['service'] |
| 78 | + |
| 79 | +# Count the number of requests per pod, service, and request path |
| 80 | +df = df.groupby(['pod', 'service', 'req_path']).agg( |
| 81 | +    throughput=('latency', px.count), |
| 82 | +    time_=('time_', px.max), |
| 83 | +) |
| 84 | + |
| 85 | +# Calculate the rate for the time window |
| 86 | +df.requests_per_s = df.throughput / 10 |
| 87 | +``` |
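To make the arithmetic concrete, here is a minimal plain-Python sketch (not PxL) of what the group-by, count, and division compute. The sample rows, the `WINDOW_S` constant, and the `requests_per_second` helper are hypothetical and exist only for illustration; they are not part of Pixie.

```python
from collections import Counter

# Assumed to match start_time='-10s' in the PxL script.
WINDOW_S = 10

# Hypothetical rows standing in for http_events records with pod and
# service metadata already attached.
events = [
    {"pod": "default/cart-1", "service": "default/cart", "req_path": "/add"},
    {"pod": "default/cart-1", "service": "default/cart", "req_path": "/add"},
    {"pod": "default/cart-2", "service": "default/cart", "req_path": "/checkout"},
]


def requests_per_second(rows, window_s):
    """Count rows per (pod, service, req_path) and divide by the window length."""
    counts = Counter((r["pod"], r["service"], r["req_path"]) for r in rows)
    return {key: count / window_s for key, count in counts.items()}


rates = requests_per_second(events, WINDOW_S)
for key, rate in rates.items():
    print(key, rate)
```

If you change `start_time` to, say, `'-30s'`, the denominator has to become 30 for the result to remain a per-second rate.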
| 88 | + |
| 89 | + |
| 90 | + |
| 91 | +## Exporting |
| 92 | + |
| 93 | +To export the data, you’ll call `px.export` with the DataFrame as the first argument and the export target `px.otel.Data` as the second argument. |
| 94 | + |
| 95 | + |
| 96 | +```python |
| 97 | +px.export(df, px.otel.Data(...)) |
| 98 | +``` |
| 99 | + |
| 100 | + |
| 101 | +The export target (`px.otel.Data`) describes which columns to use for the corresponding OpenTelemetry fields. You specify a column using the same syntax as in a regular query: `df.column_name` or `df['column_name']`. Each reference must name a column available in the `df` argument, or the PxL compiler will throw an error. |
| 102 | + |
| 103 | + |
| 104 | +## Specifying a Collector Endpoint and Authentication |
| 105 | + |
| 106 | +The PxL OpenTelemetry exporter needs to talk with a collector. You specify the collector's address, and any headers it requires, via the `endpoint` parameter: |
| 107 | + |
| 108 | + |
| 109 | +```python |
| 110 | +endpoint=px.otel.Endpoint( |
| 111 | +    url='0.0.0.0:55690', |
| 112 | +    headers={ |
| 113 | +        'apikey': '12345', |
| 114 | +    } |
| 115 | +), |
| 116 | +``` |
| 117 | + |
| 118 | + |
| 119 | +The endpoint URL must be an OpenTelemetry gRPC endpoint and must be secured with SSL. Don’t specify a protocol prefix. Optionally, you can also specify the headers passed to the endpoint. Some OpenTelemetry collector providers look for authentication tokens or API keys in the connection context. The `headers` field is where you can add this information. |
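As a quick sanity check before pasting a URL into a script, the following plain-Python sketch tests that a string has the `host:port` shape used in the example, with no protocol prefix and a valid TCP port. The `is_valid_endpoint_url` helper is hypothetical and not part of Pixie; it assumes the URL always includes an explicit port, and the exporter itself may accept or reject values differently.

```python
def is_valid_endpoint_url(url: str) -> bool:
    """Return True if url looks like 'host:port' with no protocol prefix."""
    # A prefix such as 'https://' or 'grpc://' is not allowed.
    if "://" in url:
        return False
    # Split on the last colon so the host part may itself contain dots.
    host, sep, port = url.rpartition(":")
    if not sep or not host:
        return False
    if not (port.isascii() and port.isdigit()):
        return False
    # TCP ports are limited to the range 1..65535.
    return 1 <= int(port) <= 65535


print(is_valid_endpoint_url("0.0.0.0:55690"))
print(is_valid_endpoint_url("https://collector.example.com:443"))
```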
| 120 | + |
| 121 | +Note that if you’re writing a [plugin script](/reference/plugins/plugin-system), this information should be passed in from the plugin context. |
| 122 | + |
| 123 | + |
| 124 | +## Transforming Data |
| 125 | + |
| 126 | +The core idea of the PxL OpenTelemetry export is that you’re converting columnar data from a Pixie DataFrame into the fields of the OpenTelemetry data you want to capture. You reference a column using the attribute syntax `df.column_name`. Under the hood, Pixie converts the values of each row into a new OpenTelemetry message. Every referenced column must exist in the DataFrame that you are exporting (the first argument to `px.export`); otherwise you will receive a compiler error. |
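The row-to-message idea can be modeled in a few lines of plain Python. This is only a conceptual sketch, not Pixie's implementation: the `rows_to_messages` helper, the dictionary message shape, and the sample rows are all hypothetical, and the `KeyError` merely stands in for the compiler error described above.

```python
def rows_to_messages(columns, rows, resource, metric_name, value, attributes):
    """Build one message per row from attribute-key -> column-name mappings."""
    # Every referenced column must exist in the exported table.
    referenced = set(resource.values()) | {value} | set(attributes.values())
    missing = referenced - set(columns)
    if missing:
        raise KeyError(f"columns not in DataFrame: {sorted(missing)}")
    return [
        {
            "resource": {key: row[col] for key, col in resource.items()},
            "metric": {
                "name": metric_name,
                "value": row[value],
                "attributes": {key: row[col] for key, col in attributes.items()},
            },
        }
        for row in rows
    ]


# Hypothetical aggregated rows, shaped like the output of the example script.
columns = ["pod", "service", "req_path", "requests_per_s"]
rows = [
    {"pod": "default/cart-1", "service": "default/cart",
     "req_path": "/add", "requests_per_s": 0.2},
    {"pod": "default/cart-2", "service": "default/cart",
     "req_path": "/checkout", "requests_per_s": 0.1},
]

messages = rows_to_messages(
    columns,
    rows,
    resource={"service.name": "service", "k8s.pod.name": "pod"},
    metric_name="http.throughput",
    value="requests_per_s",
    attributes={"req_path": "req_path"},
)
print(len(messages))
```

Each input row yields exactly one message, and a mapping that names a column missing from `columns` fails before any message is built.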
| 127 | + |
| 128 | + |
| 129 | +## Specifying a Resource |
| 130 | + |
| 131 | +The `resource` parameter defines the entity producing the [telemetry data](https://github.com/open-telemetry/opentelemetry-specification/blob/main/specification/resource/sdk.md). You define the `resource` argument as a dictionary mapping attribute keys to the STRING columns that populate the attribute values. The PxL configuration expects `service.name` to be set; all other attributes are optional. |
| 132 | + |
| 133 | +When creating new attribute keys, keep in mind OpenTelemetry has a [recommended pattern](https://github.com/open-telemetry/opentelemetry-specification/blob/main/specification/resource/semantic_conventions/README.md#document-conventions) that you should follow to maintain broad compatibility with OpenTelemetry collectors. |
| 134 | + |
| 135 | +```python |
| 136 | +resource={ |
| 137 | +    # service.name is required by OpenTelemetry. |
| 138 | +    'service.name': df.service, |
| 139 | +    'service.instance.id': df.pod, |
| 140 | +    'k8s.pod.name': df.pod, |
| 141 | +}, |
| 142 | +``` |
| 143 | + |
| 144 | + |
| 145 | + |
| 146 | +## Specifying Data |
| 147 | + |
| 148 | +The `data` parameter allows you to specify a list of metrics or traces that are generated from the DataFrame. In the example script, we specify a single Gauge metric for the `df.requests_per_s` column. We also supply an attribute for the metric, `req_path`. Each Metric and Trace type supports a custom attribute field. Metric/Trace attributes work similarly to Resource attributes, but they are scoped only to the specific metric or span. |
| 149 | + |
| 150 | + |
| 151 | +```python |
| 152 | +data=[ |
| 153 | +    px.otel.metric.Gauge( |
| 154 | +        name='http.throughput', |
| 155 | +        description='The number of messages sent per second', |
| 156 | +        value=df.requests_per_s, |
| 157 | +        attributes={ |
| 158 | +            'req_path': df.req_path, |
| 159 | +        } |
| 160 | +    ) |
| 161 | +] |
| 162 | +``` |
| 163 | + |
| 164 | + |
| 165 | +We currently support a limited set of OpenTelemetry signal types: `metric.Gauge`, `metric.Summary`, and `trace.Span`. We also support a subset of the available fields for each instrument. You can see the full set of features [in our API documentation](/reference/pxl/otel-export). If you want support for other fields, please [open an issue](https://github.com/pixie-io/pixie). |
| 166 | + |