eval2otel Python SDK Preview

The Python package mirrors the TypeScript eval2otel.v1 contract and can also emit OpenTelemetry spans when the optional OTel extras are installed. Without those extras, it still validates Eval2Otel payloads and returns conversion reports.

pip install -e ".[otel,validation]"

from eval2otel import instrument_all

client = instrument_all()
report = client.process_evaluation({
    "id": "case-1",
    "timestamp": 1700000000000,
    "model": "gpt-4o-mini",
    "system": "openai",
    "operation": "chat",
    "request": {"model": "gpt-4o-mini"},
    "response": {"model": "gpt-4o-mini"},
    "usage": {"inputTokens": 12, "outputTokens": 8},
    "performance": {"duration": 0.25},
    "conversation": {
        "messages": [
            {"role": "user", "content": "What shipped?"},
            {"role": "assistant", "content": "Eval2Otel Python OTLP hooks shipped."}
        ]
    },
    "provenance": {
        "sourceFramework": "deepeval",
        "runId": "nightly",
        "caseId": "case-1"
    }
})

assert report.contract_version == "eval2otel.v1"
client.shutdown()

Zero-Code Instrumentation

The package registers an opentelemetry_instrumentor entry point named eval2otel. In an environment with opentelemetry-instrumentation installed, opentelemetry-instrument can discover Eval2Otel and call the same instrument_all() path used above:

OTEL_SERVICE_NAME=my-ai-service \
OTEL_EXPORTER_OTLP_ENDPOINT=http://localhost:4318 \
OTEL_EXPORTER_OTLP_PROTOCOL=http/protobuf \
EVAL2OTEL_PROVIDERS=openai,anthropic \
opentelemetry-instrument python main.py
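
If opentelemetry-instrument does not seem to pick up Eval2Otel, it can help to confirm that the entry point is visible in the active environment. The sketch below uses only the standard library; the group name opentelemetry_instrumentor and the entry point name eval2otel come from the text above, while the helper name list_instrumentors is hypothetical.

```python
from importlib.metadata import entry_points


def list_instrumentors(group="opentelemetry_instrumentor"):
    """Return the sorted names of entry points registered under `group`."""
    eps = entry_points()
    if hasattr(eps, "select"):
        # Python 3.10+ exposes a selectable interface.
        selected = eps.select(group=group)
    else:
        # Python 3.8/3.9 return a plain dict keyed by group name.
        selected = eps.get(group, [])
    return sorted(ep.name for ep in selected)


if __name__ == "__main__":
    names = list_instrumentors()
    print("eval2otel registered:", "eval2otel" in names)
    print("all instrumentors:", names)
```

An empty list usually means that the package is not installed in the interpreter that opentelemetry-instrument is running under.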

Programmatic use is also available:

from eval2otel import Eval2OtelInstrumentor, get_instrumented_client

Eval2OtelInstrumentor().instrument()
client = get_instrumented_client()

Environment

instrument_all() reads:

  • OTEL_SERVICE_NAME or EVAL2OTEL_SERVICE_NAME
  • OTEL_EXPORTER_OTLP_TRACES_ENDPOINT or OTEL_EXPORTER_OTLP_ENDPOINT
  • OTEL_EXPORTER_OTLP_PROTOCOL
  • OTEL_INSTRUMENTATION_GENAI_CAPTURE_MESSAGE_CONTENT
  • EVAL2OTEL_SAMPLE_RATE
  • EVAL2OTEL_REDACT_PII
  • EVAL2OTEL_PROVIDERS
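
For the two endpoint variables, the OpenTelemetry exporter specification gives the signal-specific variable precedence and, for HTTP protocols, appends /v1/traces to the generic endpoint. The sketch below illustrates that rule only. Whether eval2otel resolves the endpoint itself or leaves it to the OTLP exporter is not stated here, and the function name resolve_traces_endpoint is hypothetical.

```python
import os


def resolve_traces_endpoint(env=None):
    """Resolve the OTLP traces endpoint following the OTel exporter rules."""
    env = os.environ if env is None else env

    # A signal-specific endpoint wins and is used exactly as given.
    specific = env.get("OTEL_EXPORTER_OTLP_TRACES_ENDPOINT")
    if specific:
        return specific

    base = env.get("OTEL_EXPORTER_OTLP_ENDPOINT")
    if not base:
        return None  # nothing configured; exporter defaults apply

    # For HTTP protocols the generic endpoint is a base URL.
    # Assumption: an unset protocol leaves the base URL untouched.
    protocol = env.get("OTEL_EXPORTER_OTLP_PROTOCOL", "")
    if protocol.startswith("http/"):
        return base.rstrip("/") + "/v1/traces"
    return base


if __name__ == "__main__":
    print(resolve_traces_endpoint())
```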

Content capture is off by default. When OTEL_INSTRUMENTATION_GENAI_CAPTURE_MESSAGE_CONTENT=true, message content is emitted as span events and sampled by EVAL2OTEL_SAMPLE_RATE. When EVAL2OTEL_REDACT_PII=true, the built-in redactor masks common emails, bearer tokens, secret assignments, and long number sequences before content is emitted.
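
To make the redaction and sampling behavior concrete, here is a minimal sketch of a regex-based redactor and a rate-based sampling check. It is not the package's built-in redactor: the patterns, the placeholder strings, and the names redact and should_capture are all assumptions chosen to match the categories listed above.

```python
import random
import re

# Illustrative patterns only; the real redactor may differ.
_PATTERNS = [
    # Email addresses.
    (re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}"), "[EMAIL]"),
    # Bearer tokens, e.g. in an Authorization header.
    (re.compile(r"(?i)\bbearer\s+[A-Za-z0-9._~+/=-]+"), "Bearer [TOKEN]"),
    # Secret assignments such as api_key=... or password: ...
    (re.compile(r"(?i)\b(api[_-]?key|secret|password|token)\s*[:=]\s*\S+"),
     r"\1=[SECRET]"),
    # Long digit runs (assumed here to mean nine or more digits).
    (re.compile(r"\b\d{9,}\b"), "[NUMBER]"),
]


def redact(text):
    """Mask emails, bearer tokens, secret assignments, and long numbers."""
    for pattern, replacement in _PATTERNS:
        text = pattern.sub(replacement, text)
    return text


def should_capture(sample_rate, rng=random.random):
    """Return True for roughly `sample_rate` of calls (0.0 to 1.0)."""
    return rng() < sample_rate


if __name__ == "__main__":
    message = "Contact alice@example.com, api_key=sk-12345"
    if should_capture(1.0):
        print(redact(message))
```

Redacting before the sampling decision or after it gives the same output; what matters is that redaction runs before the content is attached to a span event.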

Provider Hooks

When provider patching is enabled, the client returned by instrument_all() exposes the results as client.instrumentation_handles. Each handle reports whether the provider package was available, whether a compatible instrumentor was invoked, and, if the provider could not be instrumented, the reason.
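
The three pieces of information a handle carries can be pictured as a small record. The sketch below is an illustration, not the package's actual type: the class name InstrumentationHandle, its field names, and the probe helper are hypothetical, and the availability check simply asks whether a module can be found.

```python
from dataclasses import dataclass
from importlib.util import find_spec
from typing import Optional


@dataclass
class InstrumentationHandle:
    """Hypothetical shape of a per-provider result."""
    provider: str
    available: bool        # was the provider package importable?
    instrumented: bool     # was a compatible instrumentor invoked?
    reason: Optional[str] = None  # set when instrumentation was skipped


def probe(provider, module_name):
    """Check availability only; real patching is out of scope here."""
    try:
        found = find_spec(module_name) is not None
    except (ImportError, ValueError):
        found = False
    if not found:
        return InstrumentationHandle(
            provider, False, False, f"package '{module_name}' is not installed"
        )
    return InstrumentationHandle(
        provider, True, False, "no instrumentor invoked in this sketch"
    )


if __name__ == "__main__":
    for handle in (probe("openai", "openai"), probe("anthropic", "anthropic")):
        print(handle)
```

Logging the handles at startup is a quick way to see why spans are missing for a given provider.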

Supported provider names:

  • openai
  • anthropic
  • google-generativeai
  • bedrock
  • cohere
  • huggingface

Set EVAL2OTEL_PROVIDERS=openai,anthropic to limit discovery.
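
A comma-separated value like this can be parsed as sketched below. The supported names are taken from the list above. The function name parse_providers and the handling of an unset variable (all providers) and of unknown names (returned separately) are assumptions, not documented behavior.

```python
import os

SUPPORTED_PROVIDERS = (
    "openai",
    "anthropic",
    "google-generativeai",
    "bedrock",
    "cohere",
    "huggingface",
)


def parse_providers(raw):
    """Split a comma-separated list into (selected, unknown) names."""
    if raw is None or not raw.strip():
        # Assumption: an unset or empty value means "discover everything".
        return list(SUPPORTED_PROVIDERS), []
    requested = [name.strip().lower() for name in raw.split(",") if name.strip()]
    selected = [name for name in requested if name in SUPPORTED_PROVIDERS]
    unknown = [name for name in requested if name not in SUPPORTED_PROVIDERS]
    return selected, unknown


if __name__ == "__main__":
    selected, unknown = parse_providers(os.environ.get("EVAL2OTEL_PROVIDERS"))
    print("selected:", selected)
    print("unknown:", unknown)
```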

Typed Validation

Install the validation extra to use optional Pydantic models:

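from eval2otel import instrument_all
from eval2otel.models import EvalResultModel

# Validate the payload against the optional Pydantic model.
payload = EvalResultModel.model_validate({
    "id": "case-1",
    "model": "gpt-4o-mini",
    "operation": "chat",
    "request": {"model": "gpt-4o-mini"},
    "performance": {"duration": 0.25},
})

# Convert the validated model back to an eval result and process it.
client = instrument_all()
client.process_evaluation(payload.to_eval_result())
client.shutdown()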

Development

From the repository root:

PYTHONPATH=python python3 -m unittest discover -s python/tests