OpenTelemetry trace correlation
If your training stack already runs under OpenTelemetry, the SDK attaches the active trace context to every episode you log — no new kwargs, no manual ID copying. The portal renders a Tracing card on the episode detail page with the IDs and (when configured) a one-click "Open trace" button into your APM.
Available since robotrace-dev==0.1.0a2.
Why
Robotics teams who instrument their training loops or eval workers with OpenTelemetry already have one source of truth for what happened: the trace. The episode they logged into RoboTrace has historically lived in a parallel timeline — you saw the episode, then context-switched into Datadog / Honeycomb / Tempo / Jaeger to find the matching trace by hand.
Trace correlation closes that loop. Every episode created inside an
active span carries trace_id / span_id / traceparent, so an
episode in the portal is one click away (once a trace URL template
is configured) from the full distributed trace it was part of,
including every sensor fetch, policy inference, and downstream service call.
Install
pip install 'robotrace-dev[otel]==0.1.0a2'

The pin is the most reliable install during alpha and drops once
we cut 1.0.
The [otel] extra adds only opentelemetry-api (~30 KB), not
the heavy opentelemetry-sdk. The expectation is that your
existing OTel pipeline already provides the SDK, exporter, and
context propagators — RoboTrace is just a consumer of the
ambient context.
If you don't already use OpenTelemetry, you can either:
- Skip the extra entirely. log_episode(...) keeps working exactly as before; the field is simply not attached.
- Set up the OTel SDK side too (pip install opentelemetry-sdk, configure an exporter pointing at your APM). The OTel docs cover this; once it's running, RoboTrace picks it up without any RoboTrace-side change.
Usage
There is no usage. That's the point.
import robotrace as rt
from opentelemetry import trace

tracer = trace.get_tracer("training-loop")

with tracer.start_as_current_span("rollout") as span:
    span.set_attribute("policy.version", "pap-v3.2.1")
    rt.log_episode(
        name="warmup pick-and-place",
        source="real",
        policy_version="pap-v3.2.1",
        env_version="halcyon-cell-rev4",
        git_sha="abc1234",
        seed=8124,
        video="/tmp/rollout.mp4",
    )

rt.log_episode(...) reads opentelemetry.trace.get_current_span()
internally — when the active span is the rollout span above, the
episode picks up that span's trace and span IDs.
You'll see them on the episode detail page in the portal under the Tracing card.
What gets attached
The SDK serializes the W3C Trace Context — three fields, snake_case:
| Field | Format | Purpose |
|---|---|---|
| trace_id | 32 hex chars (lowercase) | The OTel trace this episode is part of. Pasteable into any APM search. |
| span_id | 16 hex chars (lowercase) | The specific span active at start_episode time. |
| traceparent | 00-<trace_id>-<span_id>-<flags> (W3C header form) | The full propagation header, pasteable into curl, gRPC metadata, etc. |
These land in episodes.metadata.otel server-side and render in the
portal automatically.
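As a rough illustration of the three formats in the table, the sketch below builds the fields from integer IDs, which is how OpenTelemetry's Python API represents them. The helper name format_otel_block is hypothetical and the sample IDs are the ones used in the W3C Trace Context examples; this is not RoboTrace's actual serializer.

```python
def format_otel_block(trace_id: int, span_id: int, trace_flags: int = 0x01) -> dict:
    """Hypothetical helper: render integer IDs as the three fields in the table."""
    trace_hex = f"{trace_id:032x}"    # 128-bit trace ID -> 32 lowercase hex chars
    span_hex = f"{span_id:016x}"      # 64-bit span ID -> 16 lowercase hex chars
    flags_hex = f"{trace_flags:02x}"  # 8-bit trace flags -> 2 hex chars
    return {
        "trace_id": trace_hex,
        "span_id": span_hex,
        # "00" is the version prefix defined by W3C Trace Context
        "traceparent": f"00-{trace_hex}-{span_hex}-{flags_hex}",
    }


# Sample IDs from the W3C Trace Context specification examples
block = format_otel_block(
    trace_id=0x4BF92F3577B34DA6A3CE929D0E0E4736,
    span_id=0x00F067AA0BA902B7,
)
print(block["traceparent"])
```

Zero-padding matters here: a span ID with leading zero bytes must still render as 16 characters, otherwise the header no longer parses.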
Deep-linking into your APM
The portal looks for an environment variable named
NEXT_PUBLIC_TRACE_URL_TEMPLATE. When set, the Open trace
button on the Tracing card fills any {trace_id}, {span_id},
or {traceparent} placeholder in the template URL with the episode's value.
Examples:
# Datadog APM
NEXT_PUBLIC_TRACE_URL_TEMPLATE=https://app.datadoghq.com/apm/trace/{trace_id}
# Honeycomb (replace <team> and <ds>)
NEXT_PUBLIC_TRACE_URL_TEMPLATE=https://ui.honeycomb.io/<team>/datasets/<ds>/trace?trace_id={trace_id}
# Grafana Tempo (your-grafana host)
NEXT_PUBLIC_TRACE_URL_TEMPLATE=https://<grafana>/explore?...&traceId={trace_id}
# Jaeger (your-jaeger host)
NEXT_PUBLIC_TRACE_URL_TEMPLATE=https://<jaeger>/trace/{trace_id}

If the env var isn't set, the IDs and copy buttons still render, so users can paste the IDs into their APM manually. The card itself only shows up when the episode actually has an OTel block in its metadata.
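The substitution itself is plain placeholder replacement. The sketch below mirrors it in Python purely for illustration; the portal's real implementation is not shown in this document, and build_trace_url is a hypothetical name.

```python
from typing import Optional

# The three placeholders named in the section above
PLACEHOLDERS = ("trace_id", "span_id", "traceparent")


def build_trace_url(template: Optional[str], otel: dict) -> Optional[str]:
    """Hypothetical helper: fill a trace URL template from an episode's otel block."""
    if not template:
        return None  # no template configured, so there is no link to build
    url = template
    for key in PLACEHOLDERS:
        value = otel.get(key)
        if value is not None:
            # Replace every occurrence of e.g. "{trace_id}" with its value
            url = url.replace("{" + key + "}", str(value))
    return url


otel = {
    "trace_id": "4bf92f3577b34da6a3ce929d0e0e4736",
    "span_id": "00f067aa0ba902b7",
}
print(build_trace_url("https://app.datadoghq.com/apm/trace/{trace_id}", otel))
```

A template only needs the placeholders its APM cares about; unused ones are simply never looked up in the URL.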
What happens when there's no active span
The same call without a surrounding with tracer.start_as_current_span(...)
silently produces an episode with no otel block:
import robotrace as rt
rt.log_episode(name="standalone run", source="sim")
# → no `otel` field in payload, no Tracing card on the portal page

That's intentional. The SDK never fabricates a span ID: if your instrumentation didn't capture this run, the episode shouldn't pretend it did.
The same applies when:
- You haven't installed [otel]. The soft import returns None and the SDK behaves as if OTel didn't exist.
- OTel returns INVALID_SPAN. The sentinel zero IDs are filtered so the portal never renders an unclickable 0000…0000.
- OTel raises an exception while resolving the span. We swallow it; SDK telemetry must not crash a customer training run.
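The three fallbacks above can be sketched as a single defensive lookup. This is a minimal illustration under stated assumptions, not the SDK's internal code: current_otel_block is a hypothetical name, and only public opentelemetry-api calls (get_current_span, get_span_context, is_valid, format_trace_id, format_span_id) are used.

```python
from typing import Optional

try:
    # Soft import: available only when opentelemetry-api is installed
    from opentelemetry import trace as otel_trace
except ImportError:
    otel_trace = None


def current_otel_block() -> Optional[dict]:
    """Hypothetical helper: return correlation fields, or None if nothing valid is active."""
    if otel_trace is None:
        return None  # OTel not installed: behave as if it didn't exist
    try:
        ctx = otel_trace.get_current_span().get_span_context()
        if not ctx.is_valid:
            return None  # INVALID_SPAN carries all-zero IDs; never emit those
        trace_id = otel_trace.format_trace_id(ctx.trace_id)
        span_id = otel_trace.format_span_id(ctx.span_id)
        flags = int(ctx.trace_flags)
        return {
            "trace_id": trace_id,
            "span_id": span_id,
            "traceparent": f"00-{trace_id}-{span_id}-{flags:02x}",
        }
    except Exception:
        return None  # a telemetry lookup must never crash the training run


result = current_otel_block()
print(result)
```

Returning None in all three cases keeps the caller's logic to a single check: attach the block if it exists, otherwise send the episode without it.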
Sampling
We attach the trace context regardless of the W3C sampled bit.
Reasoning: the episode row is its own retention domain. Even if
your APM dropped the trace (sampling, retention, exporter outage),
the episode metadata still records which trace this run was part
of, so a human can reconstruct context months later.
If you'd rather respect the sampled bit verbatim, wrap the
log_episode(...) call in your own conditional — pass an explicit
metadata={...} dict that omits the OTel block when the span isn't
sampled. We may add a config flag for this in 0.2 if it turns
out to matter for someone.
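If you do want to gate on the sampled bit, the check is a one-bit test on the trace-flags field of the traceparent. The sketch below shows only that check; is_sampled is a hypothetical helper, and how you then omit the OTel block from log_episode(...) follows the description above rather than anything shown here.

```python
def is_sampled(traceparent: str) -> bool:
    """Return True when the W3C sampled flag (lowest bit of trace-flags) is set."""
    parts = traceparent.split("-")
    if len(parts) != 4:
        return False  # not the four-part header form; treat as unsampled
    try:
        flags = int(parts[3], 16)  # trace-flags is a 2-char hex field
    except ValueError:
        return False  # malformed flags; treat as unsampled
    return bool(flags & 0x01)


sampled = is_sampled("00-4bf92f3577b34da6a3ce929d0e0e4736-00f067aa0ba902b7-01")
unsampled = is_sampled("00-4bf92f3577b34da6a3ce929d0e0e4736-00f067aa0ba902b7-00")
print(sampled, unsampled)
```

Treating malformed input as unsampled is a conservative choice for this sketch; you may prefer to log or raise instead.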
What this isn't
- Not OTel itself. RoboTrace doesn't ship spans, metrics, or logs to your collector. You bring your own OTel SDK and exporter.
- Not a replacement for episode metadata. The OTel block is purely correlation; policy_version, env_version, git_sha, seed, metadata continue to be the source of truth for what the run was. Trace context tells you where else this run shows up, not what it did.