The Python package mirrors the TypeScript eval2otel.v1 contract and can also
emit OpenTelemetry spans when the optional OTel extras are installed. Without
those extras, it still validates Eval2Otel payloads and returns conversion
reports.

    pip install -e ".[otel,validation]"

    from eval2otel import instrument_all

    client = instrument_all()
    report = client.process_evaluation({
        "id": "case-1",
        "timestamp": 1700000000000,
        "model": "gpt-4o-mini",
        "system": "openai",
        "operation": "chat",
        "request": {"model": "gpt-4o-mini"},
        "response": {"model": "gpt-4o-mini"},
        "usage": {"inputTokens": 12, "outputTokens": 8},
        "performance": {"duration": 0.25},
        "conversation": {
            "messages": [
                {"role": "user", "content": "What shipped?"},
                {"role": "assistant", "content": "Eval2Otel Python OTLP hooks shipped."}
            ]
        },
        "provenance": {
            "sourceFramework": "deepeval",
            "runId": "nightly",
            "caseId": "case-1"
        }
    })
    assert report.contract_version == "eval2otel.v1"
    client.shutdown()

The package registers an opentelemetry_instrumentor entry point named
eval2otel. In an environment with opentelemetry-instrumentation installed,
opentelemetry-instrument can discover Eval2Otel and call the same
instrument_all() path used above:

    OTEL_SERVICE_NAME=my-ai-service \
    OTEL_EXPORTER_OTLP_ENDPOINT=http://localhost:4318 \
    OTEL_EXPORTER_OTLP_PROTOCOL=http/protobuf \
    EVAL2OTEL_PROVIDERS=openai,anthropic \
    opentelemetry-instrument python main.py

Programmatic use is also available:

    from eval2otel import Eval2OtelInstrumentor, get_instrumented_client

    Eval2OtelInstrumentor().instrument()
    client = get_instrumented_client()

instrument_all() reads:

- OTEL_SERVICE_NAME or EVAL2OTEL_SERVICE_NAME
- OTEL_EXPORTER_OTLP_TRACES_ENDPOINT or OTEL_EXPORTER_OTLP_ENDPOINT
- OTEL_EXPORTER_OTLP_PROTOCOL
- OTEL_INSTRUMENTATION_GENAI_CAPTURE_MESSAGE_CONTENT
- EVAL2OTEL_SAMPLE_RATE
- EVAL2OTEL_REDACT_PII
- EVAL2OTEL_PROVIDERS

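
To make the lookup order concrete, here is a minimal sketch of how these variables could be resolved. It is not the package's own code: the EnvSettings class and read_settings function are hypothetical names, the left-to-right precedence within each "or" pair is assumed from the order listed above, and the defaults (sample rate 1.0, flags enabled only by the literal value "true") are assumptions as well.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class EnvSettings:
    """Hypothetical container; the real client may store these differently."""
    service_name: Optional[str]
    endpoint: Optional[str]
    protocol: Optional[str]
    capture_content: bool
    sample_rate: float
    redact_pii: bool
    providers_raw: Optional[str]


def _is_true(value: Optional[str]) -> bool:
    # Assumption: only the literal "true" (any case) enables a flag.
    return (value or "").strip().lower() == "true"


def read_settings(env: Mapping[str, str] = os.environ) -> EnvSettings:
    return EnvSettings(
        # Assumed precedence: the first name in each pair wins when both are set.
        service_name=env.get("OTEL_SERVICE_NAME") or env.get("EVAL2OTEL_SERVICE_NAME"),
        endpoint=env.get("OTEL_EXPORTER_OTLP_TRACES_ENDPOINT")
        or env.get("OTEL_EXPORTER_OTLP_ENDPOINT"),
        protocol=env.get("OTEL_EXPORTER_OTLP_PROTOCOL"),
        capture_content=_is_true(env.get("OTEL_INSTRUMENTATION_GENAI_CAPTURE_MESSAGE_CONTENT")),
        # Assumption: an unset sample rate means "keep everything".
        sample_rate=float(env.get("EVAL2OTEL_SAMPLE_RATE", "1.0")),
        redact_pii=_is_true(env.get("EVAL2OTEL_REDACT_PII")),
        providers_raw=env.get("EVAL2OTEL_PROVIDERS"),
    )
```

Passing a plain dict instead of os.environ keeps the sketch easy to test, for example read_settings({"EVAL2OTEL_SERVICE_NAME": "svc"}).
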
Content capture is off by default. When
OTEL_INSTRUMENTATION_GENAI_CAPTURE_MESSAGE_CONTENT=true, message content is
emitted as span events and sampled by EVAL2OTEL_SAMPLE_RATE. When
EVAL2OTEL_REDACT_PII=true, the built-in redactor masks common emails,
bearer tokens, secret assignments, and long number sequences before content is
emitted.
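
The capture, sampling, and redaction steps described above can be sketched roughly as follows. This is an illustration only: the regular expressions, the placeholder strings, and the redact and should_capture helpers are hypothetical and are not taken from the package's built-in redactor. The sketch also assumes EVAL2OTEL_SAMPLE_RATE is a probability between 0 and 1.

```python
import random
import re

# Illustrative patterns only; the built-in redactor may match differently.
_PATTERNS = [
    # Email addresses
    (re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}"), "[EMAIL]"),
    # Bearer tokens, e.g. in an Authorization header
    (re.compile(r"(?i)\bbearer\s+[A-Za-z0-9._~+/=-]+"), "Bearer [TOKEN]"),
    # Secret assignments such as api_key=... or password: ...
    (re.compile(r"(?i)\b(api[_-]?key|secret|password|token)\s*[=:]\s*\S+"), r"\1=[SECRET]"),
    # Long digit runs (card or account numbers); the 9-digit floor is arbitrary
    (re.compile(r"\b\d{9,}\b"), "[NUMBER]"),
]


def redact(text: str) -> str:
    """Apply each masking pattern in order and return the masked text."""
    for pattern, replacement in _PATTERNS:
        text = pattern.sub(replacement, text)
    return text


def should_capture(capture_enabled: bool, sample_rate: float, rng=random.random) -> bool:
    """Capture content only when enabled, then keep a sample_rate fraction."""
    if not capture_enabled:
        return False
    return rng() < sample_rate
```

In this sketch a caller would check should_capture(...) first and pass message content through redact(...) only when redaction is enabled, just before attaching it to a span event.
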

When provider patching is enabled, instrument_all() exposes the results on
client.instrumentation_handles. Each handle reports whether the provider
package was available, whether a compatible instrumentor was invoked, and, if
not, why the provider could not be instrumented.
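
As a rough illustration of the three facts a handle carries, the sketch below models one with a small dataclass. The ProviderHandle class, its field names, and the probe_provider helper are hypothetical; the real handle objects may use different attribute names, so inspect client.instrumentation_handles in your installed version before relying on any of them.

```python
from dataclasses import dataclass
from importlib.util import find_spec
from typing import Callable, Optional


@dataclass(frozen=True)
class ProviderHandle:
    """Hypothetical shape of a handle; not the package's actual class."""
    provider: str
    available: bool          # was the provider package importable?
    instrumented: bool       # was a compatible instrumentor invoked?
    reason: Optional[str] = None  # set only when instrumentation did not happen


def probe_provider(
    provider: str,
    module_name: str,
    instrumentor: Optional[Callable[[], None]] = None,
) -> ProviderHandle:
    try:
        available = find_spec(module_name) is not None
    except (ImportError, ValueError):
        # find_spec raises when a parent package of a dotted name is missing.
        available = False
    if not available:
        return ProviderHandle(provider, False, False, f"module {module_name!r} is not installed")
    if instrumentor is None:
        return ProviderHandle(provider, True, False, "no compatible instrumentor found")
    instrumentor()
    return ProviderHandle(provider, True, True)
```

A handle with available=True and instrumented=False is the case worth logging, since the provider is installed but its calls will not be traced.
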

Supported provider names:

- openai
- anthropic
- google-generativeai
- bedrock
- cohere
- huggingface

Set EVAL2OTEL_PROVIDERS=openai,anthropic to limit discovery.
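
A comma-separated value like this could be checked against the supported names along the following lines. The select_providers helper is hypothetical, and three behaviours are assumptions rather than documented facts: an unset or empty variable selects every supported provider, names are trimmed and compared case-insensitively, and unknown names are reported rather than raising.

```python
from typing import List, Optional, Tuple

# Names copied from the supported provider list in this README.
SUPPORTED_PROVIDERS = (
    "openai",
    "anthropic",
    "google-generativeai",
    "bedrock",
    "cohere",
    "huggingface",
)


def select_providers(raw: Optional[str]) -> Tuple[List[str], List[str]]:
    """Return (selected, unknown) provider names for an EVAL2OTEL_PROVIDERS value."""
    if raw is None or not raw.strip():
        # Assumption: no restriction means all supported providers are considered.
        return list(SUPPORTED_PROVIDERS), []
    requested = [name.strip().lower() for name in raw.split(",") if name.strip()]
    selected = [name for name in requested if name in SUPPORTED_PROVIDERS]
    unknown = [name for name in requested if name not in SUPPORTED_PROVIDERS]
    return selected, unknown
```

Surfacing the unknown names makes a typo such as "antropic" visible instead of silently narrowing discovery.
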
Install the validation extra to use optional Pydantic models:

    from eval2otel.models import EvalResultModel

    payload = EvalResultModel.model_validate({
        "id": "case-1",
        "model": "gpt-4o-mini",
        "operation": "chat",
        "request": {"model": "gpt-4o-mini"},
        "performance": {"duration": 0.25},
    })
    client.process_evaluation(payload.to_eval_result())

From the repository root:

    PYTHONPATH=python python3 -m unittest discover -s python/tests