This document describes how to properly instrument AI agents with OpenTelemetry following the GenAI semantic conventions, with specific guidance for MLflow and Phoenix compatibility.

Overview

Agents emit traces following OpenTelemetry GenAI Semantic Conventions. The OTEL Collector transforms these to target-specific formats for:

  • MLflow: Experiment tracking and trace analysis
  • Phoenix: LLM observability and debugging

Agent (gen_ai.*) → OTEL Collector → Transform → MLflow + Phoenix

Span Naming Conventions

Agent Invocation Span

Per GenAI Agent Spans Spec:

Condition                 Span Name Format
Agent name available      invoke_agent {gen_ai.agent.name}
Agent name unavailable    invoke_agent

Example:

# Correct span names:
"invoke_agent weather-assistant"  # When agent name is known
"invoke_agent"                    # Fallback when name unknown

Important: The gen_ai.operation.name attribute MUST be invoke_agent.
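The fallback rule above is simple enough to centralize in a helper. A minimal sketch (the function name is illustrative, not from the spec):

```python
from typing import Optional


def agent_span_name(agent_name: Optional[str]) -> str:
    """Spec-compliant span name: append the agent name only when it is known."""
    return f"invoke_agent {agent_name}" if agent_name else "invoke_agent"


print(agent_span_name("weather-assistant"))  # → invoke_agent weather-assistant
print(agent_span_name(None))                 # → invoke_agent
```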

Other Operation Spans

Operation            Span Name                          gen_ai.operation.name
LLM chat completion  chat {gen_ai.request.model}        chat
Tool execution       execute_tool {gen_ai.tool.name}    execute_tool
Embeddings           embeddings {gen_ai.request.model}  embeddings
Agent creation       create_agent {gen_ai.agent.name}   create_agent
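Each operation pairs a span name with one "target" attribute (the model, tool, or agent the operation acts on). A minimal sketch of that pairing, with illustrative names ("get_forecast" is a made-up tool):

```python
def operation_span(operation: str, target: str):
    """Return (span_name, attributes) for one row of the table above."""
    name_attr = {
        "chat": "gen_ai.request.model",
        "execute_tool": "gen_ai.tool.name",
        "embeddings": "gen_ai.request.model",
        "create_agent": "gen_ai.agent.name",
    }[operation]
    # gen_ai.operation.name always matches the span name prefix
    return f"{operation} {target}", {"gen_ai.operation.name": operation, name_attr: target}


name, attrs = operation_span("execute_tool", "get_forecast")
# name  → "execute_tool get_forecast"
# attrs → {"gen_ai.operation.name": "execute_tool", "gen_ai.tool.name": "get_forecast"}
```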

Required Attributes

Agent Span Attributes (invoke_agent)

Attribute               Type    Requirement             Description
gen_ai.operation.name   string  Required                Must be invoke_agent
gen_ai.provider.name    string  Required                Provider: langchain, crewai, openai, etc.
gen_ai.agent.name       string  Conditionally Required  Agent identifier
gen_ai.agent.id         string  Conditionally Required  Unique agent instance ID
gen_ai.conversation.id  string  Conditionally Required  Session/conversation ID (for multi-turn)
gen_ai.request.model    string  Conditionally Required  Model name, if applicable
error.type              string  Conditionally Required  Error class (only if the operation failed)
Recommended Attributes

Attribute                       Type      Description
gen_ai.response.finish_reasons  string[]  Why generation stopped: ["stop"], ["tool_calls"]
gen_ai.usage.input_tokens       int       Tokens in the prompt
gen_ai.usage.output_tokens      int       Tokens in the response
gen_ai.request.temperature      double    Sampling temperature
gen_ai.request.max_tokens       int       Maximum tokens to generate
gen_ai.response.model           string    Actual model used (may differ from the requested model)
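A minimal sketch of mapping a provider usage payload onto these attributes. The input key names (prompt_tokens, completion_tokens) assume an OpenAI-style usage object; adapt them for other providers:

```python
def usage_attributes(usage: dict, finish_reason: str) -> dict:
    """Translate a provider usage payload into recommended gen_ai.* attributes."""
    return {
        "gen_ai.usage.input_tokens": usage["prompt_tokens"],
        "gen_ai.usage.output_tokens": usage["completion_tokens"],
        # finish_reasons is an array even for a single choice
        "gen_ai.response.finish_reasons": [finish_reason],
    }


attrs = usage_attributes({"prompt_tokens": 42, "completion_tokens": 7}, "stop")
# attrs["gen_ai.usage.input_tokens"]      → 42
# attrs["gen_ai.response.finish_reasons"] → ["stop"]
```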

Optional (Opt-In, Sensitive)

Attribute                   Type  Description
gen_ai.input.messages       any   Full conversation history
gen_ai.output.messages      any   Model responses
gen_ai.system_instructions  any   System prompt
gen_ai.tool.definitions     any   Available tools

Span Kind

Scenario                                               Span Kind
Remote agent service (OpenAI Assistants, AWS Bedrock)  CLIENT
In-process agent (LangChain, CrewAI, custom)           INTERNAL

MLflow-Specific Requirements

MLflow reads specific attributes for its UI columns. These are in addition to GenAI conventions.

MLflow UI Column Mapping

UI Column   Attribute               Source
Request     mlflow.spanInputs       Root span
Response    mlflow.spanOutputs      Root span
Trace Name  mlflow.traceName        Root span or trace tag
Session     mlflow.trace.session    Trace tag (from gen_ai.conversation.id)
User        mlflow.user             Root span
Tokens      mlflow.span.chat_usage  LLM span (JSON: {"input_tokens": N, "output_tokens": M})
Type        mlflow.spanType         Span (AGENT, LLM, TOOL, CHAIN)
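The Tokens column expects mlflow.span.chat_usage to be a JSON string with exactly the key names shown in the table. A small sketch (the helper name is illustrative):

```python
import json


def chat_usage_attr(input_tokens: int, output_tokens: int) -> str:
    """Serialize token counts in the JSON shape MLflow's Tokens column reads."""
    return json.dumps({"input_tokens": input_tokens, "output_tokens": output_tokens})


# Set on the LLM span (not the root span):
# llm_span.set_attribute("mlflow.span.chat_usage", chat_usage_attr(42, 7))
```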

MLflow Span Type Values

mlflow.spanType  Use Case
AGENT            Agent invocation span
LLM              Direct LLM call
TOOL             Tool/function execution
CHAIN            Chain/workflow step
RETRIEVER        RAG retrieval
EMBEDDING        Embedding generation
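These values can be derived from gen_ai.operation.name, mirroring the collector transform shown later in this document. A sketch; the CHAIN fallback for unmapped operations is an assumption, not spec-mandated:

```python
_GENAI_TO_MLFLOW = {
    "invoke_agent": "AGENT",
    "chat": "LLM",
    "execute_tool": "TOOL",
    "embeddings": "EMBEDDING",
}


def mlflow_span_type(operation: str) -> str:
    """Map a gen_ai.operation.name value to mlflow.spanType (CHAIN by default)."""
    return _GENAI_TO_MLFLOW.get(operation, "CHAIN")


# mlflow_span_type("invoke_agent") → "AGENT"
```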

Critical: Root Span Attributes

MLflow reads mlflow.spanInputs and mlflow.spanOutputs from the ROOT span only.

For streaming responses, you must explicitly set these on the root span:

from contextvars import ContextVar

from opentelemetry import trace

tracer = trace.get_tracer("weather-agent")

_root_span: ContextVar = ContextVar('root_span', default=None)

def get_root_span():
    return _root_span.get()

# In middleware - store root span
async def middleware(request, call_next):
    user_input = extract_user_input(request)  # request parsing shown in the full example below
    with tracer.start_as_current_span("invoke_agent weather-assistant") as span:
        token = _root_span.set(span)
        span.set_attribute("mlflow.spanInputs", user_input)
        try:
            return await call_next(request)
        finally:
            _root_span.reset(token)

# In agent - set output on root span (NOT current span)
def on_complete(output):
    root_span = get_root_span()
    if root_span:
        root_span.set_attribute("mlflow.spanOutputs", output)

Phoenix (OpenInference) Requirements

Note: Phoenix is an optional component (components.phoenix.enabled). The OTEL Collector’s Phoenix pipeline (traces/phoenix) is only deployed when Phoenix is enabled.

Phoenix uses OpenInference semantic conventions.

Required Attributes

Attribute                Type    Description
openinference.span.kind  string  Required: AGENT, LLM, TOOL, etc.
input.value              string  Input to the operation
output.value             string  Output from the operation

OpenInference Span Kinds

Kind       Use Case
AGENT      Agent invocation
LLM        LLM calls
CHAIN      Chain/workflow
TOOL       Tool execution
RETRIEVER  RAG retrieval
EMBEDDING  Embedding generation
RERANKER   Document reranking
GUARDRAIL  Safety checks

LLM-Specific Attributes

Attribute                   Type     Description
llm.model_name              string   Model identifier
llm.system                  string   AI vendor (openai, anthropic)
llm.provider                string   Hosting provider, if different
llm.token_count.prompt      int      Input tokens
llm.token_count.completion  int      Output tokens
llm.input_messages          indexed  Chat messages (flattened)
llm.output_messages         indexed  Response messages
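"Indexed" means each message in a list becomes a set of index-prefixed attributes (llm.input_messages.0.message.role, llm.input_messages.0.message.content, and so on). A sketch of that flattening, with made-up message content:

```python
def flatten_messages(prefix: str, messages: list) -> dict:
    """Flatten chat messages into OpenInference's index-prefixed attributes."""
    attrs = {}
    for i, msg in enumerate(messages):
        attrs[f"{prefix}.{i}.message.role"] = msg["role"]
        attrs[f"{prefix}.{i}.message.content"] = msg["content"]
    return attrs


attrs = flatten_messages("llm.input_messages", [
    {"role": "user", "content": "What's the weather in Oslo?"},
])
# attrs → {"llm.input_messages.0.message.role": "user",
#          "llm.input_messages.0.message.content": "What's the weather in Oslo?"}
```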

Complete Agent Instrumentation Example

"""
Weather Agent with proper GenAI + MLflow + Phoenix instrumentation.
"""
from contextvars import ContextVar
from opentelemetry import trace
from opentelemetry.trace import SpanKind, Status, StatusCode

# Agent metadata
AGENT_NAME = "weather-assistant"
AGENT_VERSION = "1.0.0"
PROVIDER = "langchain"

# ContextVar for root span access
_root_span: ContextVar = ContextVar('root_span', default=None)

def get_root_span():
    return _root_span.get()


def create_tracing_middleware():
    """Middleware that creates properly named root span."""
    tracer = trace.get_tracer("weather-agent")

    async def middleware(request, call_next):
        # Parse user input from request
        user_input = extract_user_input(request)
        session_id = extract_session_id(request)

        # Create root span with correct naming
        span_name = f"invoke_agent {AGENT_NAME}"

        with tracer.start_as_current_span(
            span_name,
            kind=SpanKind.INTERNAL,  # In-process agent
        ) as span:
            # Store for agent code access
            token = _root_span.set(span)

            try:
                # === GenAI Semantic Conventions (Required) ===
                span.set_attribute("gen_ai.operation.name", "invoke_agent")
                span.set_attribute("gen_ai.provider.name", PROVIDER)
                span.set_attribute("gen_ai.agent.name", AGENT_NAME)

                # === GenAI (Conditionally Required) ===
                if session_id:
                    span.set_attribute("gen_ai.conversation.id", session_id)

                # === MLflow-Specific ===
                if user_input:
                    span.set_attribute("mlflow.spanInputs", user_input[:1000])
                span.set_attribute("mlflow.spanType", "AGENT")
                span.set_attribute("mlflow.traceName", AGENT_NAME)
                span.set_attribute("mlflow.version", AGENT_VERSION)

                # === OpenInference (Phoenix) ===
                span.set_attribute("openinference.span.kind", "AGENT")
                if user_input:
                    span.set_attribute("input.value", user_input[:1000])

                # Call agent handler
                response = await call_next(request)

                span.set_status(Status(StatusCode.OK))
                return response

            except Exception as e:
                span.set_status(Status(StatusCode.ERROR, str(e)))
                span.set_attribute("error.type", type(e).__name__)
                span.record_exception(e)
                raise
            finally:
                _root_span.reset(token)

    return middleware


def set_agent_output(output: str):
    """Set output on root span (call after agent completes)."""
    root_span = get_root_span()
    if root_span and root_span.is_recording():
        truncated = output[:1000]

        # GenAI convention (for general OTEL consumers)
        root_span.set_attribute("gen_ai.completion", truncated)

        # MLflow (Response column)
        root_span.set_attribute("mlflow.spanOutputs", truncated)

        # Phoenix/OpenInference
        root_span.set_attribute("output.value", truncated)

OTEL Collector Transforms

The OTEL Collector can transform GenAI attributes to target formats:

processors:
  transform/genai_to_mlflow:
    trace_statements:
      - context: span
        statements:
          # Copy gen_ai.conversation.id to mlflow.trace.session
          - set(attributes["mlflow.trace.session"], attributes["gen_ai.conversation.id"])
            where attributes["gen_ai.conversation.id"] != nil

          # Set mlflow.spanType based on operation
          - set(attributes["mlflow.spanType"], "AGENT")
            where attributes["gen_ai.operation.name"] == "invoke_agent"
          - set(attributes["mlflow.spanType"], "LLM")
            where attributes["gen_ai.operation.name"] == "chat"
          - set(attributes["mlflow.spanType"], "TOOL")
            where attributes["gen_ai.operation.name"] == "execute_tool"

  transform/genai_to_openinference:
    trace_statements:
      - context: span
        statements:
          # Map gen_ai.* to OpenInference llm.*
          - set(attributes["llm.model_name"], attributes["gen_ai.request.model"])
          - set(attributes["llm.system"], attributes["gen_ai.provider.name"])
          - set(attributes["llm.token_count.prompt"], attributes["gen_ai.usage.input_tokens"])
          - set(attributes["llm.token_count.completion"], attributes["gen_ai.usage.output_tokens"])

Attribute Summary Table

Purpose         GenAI Attribute             MLflow Attribute      OpenInference Attribute
Operation type  gen_ai.operation.name       mlflow.spanType       openinference.span.kind
Agent name      gen_ai.agent.name           mlflow.traceName      -
Model           gen_ai.request.model        -                     llm.model_name
Provider        gen_ai.provider.name        -                     llm.system
Session         gen_ai.conversation.id      mlflow.trace.session  -
Input           gen_ai.prompt               mlflow.spanInputs     input.value
Output          gen_ai.completion           mlflow.spanOutputs    output.value
Input tokens    gen_ai.usage.input_tokens   -                     llm.token_count.prompt
Output tokens   gen_ai.usage.output_tokens  -                     llm.token_count.completion

Testing Requirements

E2E tests should verify:

  1. Span naming: Root span name matches invoke_agent {agent_name} format
  2. Required attributes: gen_ai.operation.name, gen_ai.provider.name present
  3. MLflow columns: mlflow.spanInputs, mlflow.spanOutputs on root span
  4. Phoenix attributes: openinference.span.kind, input.value, output.value
  5. Session tracking: gen_ai.conversation.id propagated correctly
  6. Token counts: Present when LLM returns usage data
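Checks 1-4 can be sketched as plain assertions against the exported root span. Here the span is represented as a dict with "name" and "attributes" keys; adapt the accessors to whatever span objects your exporter (for example, the SDK's in-memory exporter) returns:

```python
def check_root_span(span: dict, agent_name: str) -> None:
    """Assert the root span satisfies checks 1-4 above."""
    attrs = span["attributes"]
    assert span["name"] == f"invoke_agent {agent_name}"                    # 1. naming
    assert attrs["gen_ai.operation.name"] == "invoke_agent"                # 2. required
    assert "gen_ai.provider.name" in attrs                                 # 2. required
    assert "mlflow.spanInputs" in attrs and "mlflow.spanOutputs" in attrs  # 3. MLflow
    assert attrs["openinference.span.kind"] == "AGENT"                     # 4. Phoenix
    assert "input.value" in attrs and "output.value" in attrs              # 4. Phoenix
```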

References