OpenTelemetry trace correlation

If your training stack already runs under OpenTelemetry, the SDK attaches the active trace context to every episode you log — no new kwargs, no manual ID copying. The portal renders a Tracing card on the episode detail page with the IDs and (when configured) a one-click "Open trace" button into your APM.

Available since robotrace-dev==0.1.0a2.

Why

Robotics teams who instrument their training loops or eval workers with OpenTelemetry already have one source of truth for what happened: the trace. The episode they logged into RoboTrace has historically lived in a parallel timeline — you saw the episode, then context-switched into Datadog / Honeycomb / Tempo / Jaeger to find the matching trace by hand.

Trace correlation closes that loop. Every episode created inside an active span carries trace_id / span_id / traceparent, so an episode in the portal is one click away from the full distributed trace it was part of, including every sensor fetch, policy inference, and downstream service call.

Install

pip install 'robotrace-dev[otel]==0.1.0a2'

Pinning the exact version is the most reliable way to install during alpha. The pin can be dropped once we cut 1.0.

The [otel] extra adds only opentelemetry-api (~30 KB), not the heavy opentelemetry-sdk. The expectation is that your existing OTel pipeline already provides the SDK, exporter, and context propagators — RoboTrace is just a consumer of the ambient context.

If you don't already use OpenTelemetry, you can either:

Skip the extra entirely. log_episode(...) keeps working exactly as before; the otel block is simply not attached.
Set up the OTel SDK side too (pip install opentelemetry-sdk, configure an exporter pointing at your APM). The OTel docs cover this; once it's running, RoboTrace picks it up without any RoboTrace-side change.

Usage

There is no usage. That's the point.

import robotrace as rt
from opentelemetry import trace
 
tracer = trace.get_tracer("training-loop")
 
with tracer.start_as_current_span("rollout") as span:
    span.set_attribute("policy.version", "pap-v3.2.1")
 
    rt.log_episode(
        name="warmup pick-and-place",
        source="real",
        policy_version="pap-v3.2.1",
        env_version="halcyon-cell-rev4",
        git_sha="abc1234",
        seed=8124,
        video="/tmp/rollout.mp4",
    )

rt.log_episode(...) reads opentelemetry.trace.get_current_span() internally — when the active span is the rollout span above, the episode picks up that span's trace and span IDs.

You'll see them on the episode detail page in the portal under the Tracing card.

What gets attached

The SDK serializes the W3C Trace Context — three fields, snake_case:

Field	Format	Purpose
`trace_id`	32 hex chars (lowercase)	The OTel trace this episode is part of. Pasteable into any APM search.
`span_id`	16 hex chars (lowercase)	The specific span active when the episode is created.
`traceparent`	`00-<trace_id>-<span_id>-<flags>` (W3C header form)	The full propagation header — pasteable into curl, gRPC metadata, etc.

These land in episodes.metadata.otel server-side and render in the portal automatically.
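
To make the three formats concrete, here is a minimal sketch of how such a block can be built from the integer IDs that the OpenTelemetry Python API exposes on a span context. The function name format_otel_block is hypothetical and is not part of the RoboTrace SDK. The example IDs are the sample values from the W3C Trace Context specification.

```python
def format_otel_block(trace_id: int, span_id: int, sampled: bool = False) -> dict:
    """Render integer IDs as the three snake_case fields described above.

    Hypothetical helper for illustration only; not the SDK's implementation.
    """
    trace_hex = format(trace_id, "032x")  # 32 lowercase hex chars, zero-padded
    span_hex = format(span_id, "016x")    # 16 lowercase hex chars, zero-padded
    flags = "01" if sampled else "00"     # bit 0 of the flags byte is the sampled bit
    return {
        "trace_id": trace_hex,
        "span_id": span_hex,
        # "00" is the W3C Trace Context version prefix
        "traceparent": f"00-{trace_hex}-{span_hex}-{flags}",
    }


example = format_otel_block(
    0x4BF92F3577B34DA6A3CE929D0E0E4736,
    0x00F067AA0BA902B7,
    sampled=True,
)
print(example["traceparent"])
```

Zero-padding matters here: a span ID with leading zero bytes must still render as 16 characters, otherwise the header no longer parses.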

Deep-linking into your APM

The portal looks for an environment variable named NEXT_PUBLIC_TRACE_URL_TEMPLATE. When it is set, the Open trace button on the Tracing card replaces the placeholders {trace_id}, {span_id}, and {traceparent} in the template URL with the episode's values.

Examples:

# Datadog APM
NEXT_PUBLIC_TRACE_URL_TEMPLATE=https://app.datadoghq.com/apm/trace/{trace_id}
 
# Honeycomb (replace <team> and <ds>)
NEXT_PUBLIC_TRACE_URL_TEMPLATE=https://ui.honeycomb.io/<team>/datasets/<ds>/trace?trace_id={trace_id}
 
# Grafana Tempo (replace <grafana> with your Grafana host)
NEXT_PUBLIC_TRACE_URL_TEMPLATE=https://<grafana>/explore?...&traceId={trace_id}
 
# Jaeger (replace <jaeger> with your Jaeger host)
NEXT_PUBLIC_TRACE_URL_TEMPLATE=https://<jaeger>/trace/{trace_id}

If the env var isn't set, the IDs and copy buttons still render — users can paste the IDs into their APM manually. The card itself only shows up when the episode actually has an OTel block in its metadata.
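
The substitution itself is simple enough to sketch. The snippet below is a Python illustration of the behaviour described above and is not the portal's actual code. build_trace_url is a hypothetical name, and returning None when the template or a required value is missing is an assumption that mirrors the button not rendering.

```python
from typing import Optional
from urllib.parse import quote

PLACEHOLDERS = ("trace_id", "span_id", "traceparent")


def build_trace_url(template: Optional[str], otel: Optional[dict]) -> Optional[str]:
    """Fill {trace_id}, {span_id}, and {traceparent} in a URL template.

    Returns None when no template is configured, when the episode has no
    otel block, or when the template needs a field the block lacks.
    """
    if not template or not otel:
        return None
    url = template
    for key in PLACEHOLDERS:
        token = "{" + key + "}"
        if token not in url:
            continue
        value = otel.get(key)
        if not value:
            return None
        # Percent-encode defensively; hex digits and "-" pass through unchanged.
        url = url.replace(token, quote(value, safe=""))
    return url


otel_block = {
    "trace_id": "4bf92f3577b34da6a3ce929d0e0e4736",
    "span_id": "00f067aa0ba902b7",
    "traceparent": "00-4bf92f3577b34da6a3ce929d0e0e4736-00f067aa0ba902b7-01",
}
print(build_trace_url("https://app.datadoghq.com/apm/trace/{trace_id}", otel_block))
```

Plain string replacement is used instead of str.format so that any other braces in a template URL are left untouched.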

What happens when there's no active span

The same call outside any tracer.start_as_current_span(...) block silently produces an episode with no otel block:

import robotrace as rt
rt.log_episode(name="standalone run", source="sim")
# → no `otel` field in payload, no Tracing card on the portal page

That's intentional. The SDK never fabricates a span ID — if your instrumentation didn't capture this run, the episode shouldn't pretend it did.

The same applies when:

You haven't installed [otel]. The soft import returns None and the SDK behaves as if OTel didn't exist.
OTel returns INVALID_SPAN. The sentinel zero IDs are filtered so the portal never renders an unclickable 0000…0000.
OTel raises an exception while resolving the span. We swallow it — SDK telemetry must not crash a customer training run.
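
The three fallbacks above can be sketched in a few lines. This is an illustration of the described behaviour under stated assumptions, not the SDK's source: current_otel_block is a hypothetical name, and the only OpenTelemetry calls used are get_current_span() and the span context's is_valid, trace_id, span_id, and trace_flags attributes.

```python
from typing import Optional

try:
    # Soft import: the [otel] extra is optional.
    from opentelemetry import trace as otel_trace
except ImportError:
    otel_trace = None


def current_otel_block() -> Optional[dict]:
    """Return the trace context of the active span, or None if there is none."""
    if otel_trace is None:
        return None  # OTel not installed: behave as if it didn't exist
    try:
        ctx = otel_trace.get_current_span().get_span_context()
        if not ctx.is_valid:
            return None  # INVALID_SPAN: never emit all-zero IDs
        trace_id = format(ctx.trace_id, "032x")
        span_id = format(ctx.span_id, "016x")
        flags = format(int(ctx.trace_flags), "02x")
    except Exception:
        return None  # telemetry must not crash the training run
    return {
        "trace_id": trace_id,
        "span_id": span_id,
        "traceparent": f"00-{trace_id}-{span_id}-{flags}",
    }


# Outside any span (or without OTel installed) there is nothing to attach.
print(current_otel_block())
```

Every failure path returns None rather than a placeholder, which is what keeps the Tracing card from appearing on episodes that were never traced.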

Sampling

We attach the trace context regardless of the W3C sampled bit. Reasoning: the episode row is its own retention domain. Even if your APM dropped the trace (sampling, retention, exporter outage), the episode metadata still records which trace this run was part of, so a human can reconstruct context months later.

If you'd rather respect the sampled bit verbatim, wrap the log_episode(...) call in your own conditional — pass an explicit metadata={...} dict that omits the OTel block when the span isn't sampled. We may add a config flag for this in 0.2 if it turns out to matter for someone.
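
For reference, the sampled bit is the lowest bit of the flags byte at the end of traceparent. Below is a minimal sketch of reading it from a header string; is_sampled is a hypothetical helper and not part of the RoboTrace SDK.

```python
def is_sampled(traceparent: str) -> bool:
    """Return True if the W3C sampled flag (bit 0 of the flags byte) is set."""
    parts = traceparent.split("-")
    if len(parts) < 4:
        raise ValueError(f"not a traceparent header: {traceparent!r}")
    flags = int(parts[3], 16)  # flags are two hex chars, e.g. "01"
    return bool(flags & 0x01)


print(is_sampled("00-4bf92f3577b34da6a3ce929d0e0e4736-00f067aa0ba902b7-01"))
print(is_sampled("00-4bf92f3577b34da6a3ce929d0e0e4736-00f067aa0ba902b7-00"))
```

In-process, the OpenTelemetry Python API exposes the same bit as trace_flags.sampled on the current span's context, which is the more direct thing to branch on before calling log_episode(...).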

What this isn't

Not OTel itself. RoboTrace doesn't ship spans, metrics, or logs to your collector. You bring your own OTel SDK and exporter.
Not a replacement for episode metadata. The OTel block is purely for correlation: policy_version, env_version, git_sha, seed, and metadata continue to be the source of truth for what the run was. Trace context tells you where else this run shows up, not what it did.