robotrace.log_episode

The single one-shot entry point for ingesting an episode. Equivalent to start_episode(...) → upload all artifacts → finalize. Use this for the 95% case of "I have files on disk, log them and move on."

The contract is sacred

This signature is the sacred SDK contract. Once we cut 1.0.0, breaking it requires:

  • A major version bump (1.x → 2.0)
  • At least one minor of DeprecationWarning before the break ships, so existing training scripts get an early warning instead of a TypeError

Until 1.0.0 the surface may still evolve, and every change is recorded in the SDK changelog. Even so, from 0.2.0 onward the contract is frozen: public signatures only widen, and any removal goes through one full minor release of DeprecationWarning first.
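The deprecation rule can be sketched as follows. This shows one way a kwarg could be phased out without breaking existing callers. `log_episode_sketch` and its old `robot_id` kwarg are hypothetical, and this is not the SDK's code. It only illustrates the standard `warnings.warn(..., DeprecationWarning)` pattern that the policy implies.

```python
import warnings


def log_episode_sketch(*, name=None, robot=None, robot_id=None):
    """Hypothetical stand-in: `robot_id` plays the role of a kwarg being phased out."""
    if robot_id is not None:
        # Warn for one full minor release before the kwarg is actually removed.
        warnings.warn(
            "`robot_id` is deprecated; pass `robot=` instead",
            DeprecationWarning,
            stacklevel=2,  # attribute the warning to the caller's line, not this shim
        )
        if robot is None:
            robot = robot_id
    return {"name": name, "robot": robot}
```

Old call sites such as `log_episode_sketch(robot_id="franka-right")` keep working but emit a warning. To surface these early, run your test suite with `python -W error::DeprecationWarning`. That flag turns every DeprecationWarning into an exception.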

Already on OpenTelemetry? Calling log_episode(...) inside an active OTel span attaches trace_id / span_id / traceparent to the episode automatically - no new kwargs. See OpenTelemetry trace correlation for the install step (pip install 'robotrace-dev[otel]==0.3.0') and how the portal deep-links into your APM.
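A `traceparent` value follows the W3C Trace Context format. It joins four hyphen-separated fields: a version, a 32-hex-digit trace id, a 16-hex-digit parent span id, and trace flags. OpenTelemetry's Python API exposes the active span's ids as integers through `trace.get_current_span().get_span_context()`. The helper below is a stdlib-only sketch of that formatting, which can help when matching an episode's values against your APM. It is not the SDK's code, and `format_traceparent` is a hypothetical name.

```python
def format_traceparent(trace_id: int, span_id: int, sampled: bool = True) -> str:
    """Render integer ids as a W3C `traceparent` header value (version 00)."""
    if not 0 < trace_id < 1 << 128:
        raise ValueError("trace_id must be a non-zero 128-bit integer")
    if not 0 < span_id < 1 << 64:
        raise ValueError("span_id must be a non-zero 64-bit integer")
    flags = 0x01 if sampled else 0x00  # bit 0 of trace-flags is the "sampled" flag
    # Zero-padded lowercase hex: 32 digits for the trace id, 16 for the span id.
    return f"00-{trace_id:032x}-{span_id:016x}-{flags:02x}"
```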

Signature

def log_episode(
    *,
    # Identification
    name: str | None = None,
    source: Literal["real", "sim", "replay"] = "real",
    robot: str | None = None,
 
    # Reproducibility - load-bearing
    policy_version: str | None = None,
    env_version: str | None = None,
    git_sha: str | None = None,
    seed: int | None = None,
 
    # Artifact paths (uploaded inline via signed PUT URLs)
    video: str | Path | None = None,
    sensors: str | Path | None = None,
    actions: str | Path | None = None,
 
    # Run details
    duration_s: float | None = None,
    fps: float | None = None,
    metadata: Mapping[str, Any] | None = None,
 
    # Final state - defaults to "ready". Pass "failed" when the run
    # errored before producing usable data.
    status: Literal["ready", "failed"] = "ready",
 
    # Canonical failure timestamp (seconds from start). Only valid
    # with status="failed". Drives the frame-accurate "Failure"
    # marker on the replay scrubber - see `failure_time_s` below.
    failure_time_s: float | None = None,
) -> Episode: ...

All arguments are keyword-only - positional calls raise TypeError. This is intentional: it lets us add new params without breaking older call sites.
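The keyword-only behaviour comes from the bare `*` in the signature. The minimal sketch below uses hypothetical stand-in functions rather than the SDK. It shows both sides of the trade-off: positional calls fail loudly, and a later release can append a defaulted kwarg without touching existing call sites.

```python
def log_episode_v1(*, name=None, seed=None):
    """Hypothetical stand-in: the bare `*` makes every parameter keyword-only."""
    return {"name": name, "seed": seed}


def log_episode_v2(*, name=None, seed=None, fps=None):
    """A later 'release' that widens the signature with a new defaulted kwarg."""
    return {"name": name, "seed": seed, "fps": fps}


# Keyword call sites written against v1 run unchanged against v2.
for fn in (log_episode_v1, log_episode_v2):
    fn(name="pick_and_place", seed=42)

# A positional call is rejected outright instead of silently binding to the wrong slot.
try:
    log_episode_v1("pick_and_place")
except TypeError as exc:
    print(f"rejected: {exc}")
```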

Identification

name: str | None

Human-readable label for the run, shown in the episodes list. Falls back to episode_<short_id> when omitted. Use the same naming scheme across runs of the same task - it makes the list filterable.

source: "real" | "sim" | "replay"

Where the episode came from:

  • real - physical robot. The default.
  • sim - simulator (MuJoCo, Genesis, Isaac, Drake, etc.).
  • replay - generated by re-rolling a policy against a previously recorded observation stream. The eval engine sets this for you; you generally don't pass it manually.

robot: str | None

Stable identifier for the physical robot or sim configuration that produced the episode. We recommend a short slug (halcyon-bimanual-01, franka-right, ur5-cell-3) so the portal can group runs across days.

Reproducibility (load-bearing)

These four fields exist so future-you can re-roll a new policy against this episode and know what changed. Don't drop them to "simplify" - the eval engine literally can't run without them.

policy_version: str | None

A stable identifier for the policy / model checkpoint that produced this episode. Conventions we recommend:

| Style           | Example                                 |
| --------------- | --------------------------------------- |
| SL/IL           | ckpt_2026-05-01_step_180k               |
| RL              | ppo_2026-05-01_seed42                   |
| Frozen baseline | baseline_v1                             |
| VLA             | pap-v3.2.1 (semver against the policy)  |

Whatever you pick, make it resolvable. The re-roll feature can only re-run a policy version it can locate, so don't put random hashes here unless your registry can map them back to weights.
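One way to keep `policy_version` resolvable is to validate it against your own registry before logging. `POLICY_REGISTRY` and `resolvable_policy_version` below are hypothetical, and the checkpoint paths are placeholders. Swap in whatever model registry you already run.

```python
from pathlib import Path

# Hypothetical registry: policy_version string -> where the weights live.
POLICY_REGISTRY: dict[str, Path] = {
    "ckpt_2026-05-01_step_180k": Path("checkpoints/il/step_180k.pt"),
    "ppo_2026-05-01_seed42": Path("checkpoints/rl/ppo_seed42.pt"),
    "pap-v3.2.1": Path("checkpoints/vla/pap-v3.2.1.pt"),
}


def resolvable_policy_version(version: str) -> str:
    """Return `version` unchanged, or fail before logging if nothing maps it to weights."""
    if version not in POLICY_REGISTRY:
        raise ValueError(
            f"policy_version {version!r} is not in the registry; a re-roll could not locate it"
        )
    return version
```

At the call site this becomes `policy_version=resolvable_policy_version("pap-v3.2.1")`.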

env_version: str | None

The environment / world version. For sim, this is the build hash or config tag (mujoco_warehouse_v3, genesis-rev412). For real-world runs, it is the workcell setup version (cell_a_2026-04-12). Re-rolls need it to tell whether a comparison across policy_versions is fair.

git_sha: str | None

The git SHA of your training/inference code at the time the episode was produced. We don't validate that the SHA exists in any specific repo - that's between you and your CI. Seven characters minimum is the convention.

seed: int | None

The seed used by the policy / env. If your stack uses multiple seeds, pass the highest-level one and stash the rest in metadata.

Artifacts

Local file paths. Each is uploaded to Cloudflare R2 via a short-lived signed PUT URL - bytes never touch the RoboTrace origin server. The SDK streams from disk so memory stays flat regardless of file size.

video: str | Path | None

A video file (.mp4, .webm, .mov). The signed URL is minted with Content-Type: video/mp4, so the file's actual content type needs to match. Files up to 8 GB are supported in Phase 1; split longer episodes.

sensors: str | Path | None

A serialized sensor blob - typically a .npy, .npz, .h5, or .bin file containing per-step sensor arrays ((T, ...) shaped, time axis first). Format is opaque to the server: we store the bytes and let your replay tooling deserialize them.

actions: str | Path | None

A serialized actions blob - typically a .parquet, .feather, or .npy file containing the (T, action_dim) action vector. Required if you want to re-roll a different policy on this episode later.

The SDK sanity-checks file extensions against slot names. Passing actions="run.mp4" raises ConfigurationError, since the kwargs most likely got swapped.

Run details

duration_s: float | None

Wall-clock duration of the run in seconds. Shown on the detail page and used by the dashboard heatmap to weight cells.

fps: float | None

Sampling rate for the recorded sensors / actions. Used by the replay viewer (when it ships) to align video and sensor tracks.
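If your recorder keeps per-step timestamps, you can derive both run details from them. This sketch assumes timestamps are in seconds and strictly increasing. It measures from the first sample to the last, which can be slightly shorter than the true wall-clock run. `run_details` is a hypothetical helper.

```python
from collections.abc import Sequence


def run_details(timestamps: Sequence[float]) -> dict[str, float]:
    """Estimate duration_s and fps from per-step timestamps given in seconds."""
    if len(timestamps) < 2:
        raise ValueError("need at least two timestamps")
    duration_s = timestamps[-1] - timestamps[0]
    if not duration_s > 0:
        raise ValueError("timestamps must increase over the run")
    # N samples span N - 1 intervals, so divide the interval count by the span.
    fps = (len(timestamps) - 1) / duration_s
    return {"duration_s": duration_s, "fps": fps}
```

The keys match the kwarg names, so the result can be splatted straight into the call: `rt.log_episode(..., **run_details(timestamps))`.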

metadata: Mapping[str, Any] | None

Free-form JSON metadata, stored in the metadata jsonb column of the episode row. Use it for anything that doesn't fit the standard fields: operator, lighting, shift, hardware revision, task outcome, etc.

Don't put bytes or raw sensor values here - that's what the artifact slots are for. The column is indexed for JSON search but not designed for multi-MB blobs.
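A pre-flight check can enforce both rules before the call. This sketch round-trips the mapping through `json.dumps`, which rejects bytes and other non-JSON values with `TypeError`. It also passes `allow_nan=False`, because NaN and Infinity are not valid JSON. The 64 KiB ceiling is an arbitrary, hypothetical budget, not a documented server limit.

```python
import json
from collections.abc import Mapping
from typing import Any

MAX_METADATA_BYTES = 64 * 1024  # hypothetical budget, not a documented server limit


def check_metadata(metadata: Mapping[str, Any]) -> dict[str, Any]:
    """Fail fast if metadata isn't plain JSON or is large enough to belong in an artifact."""
    try:
        encoded = json.dumps(metadata, allow_nan=False)
    except (TypeError, ValueError) as exc:  # bytes, sets, NaN, custom objects...
        raise ValueError(f"metadata is not plain JSON: {exc}") from exc
    size = len(encoded.encode("utf-8"))
    if size > MAX_METADATA_BYTES:
        raise ValueError(f"metadata is {size} bytes; move large payloads into an artifact slot")
    return dict(metadata)
```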

status: "ready" | "failed"

Final state to flip the episode into. Defaults to "ready". Pass "failed" when you know the run errored before producing usable data - the episode still appears in the list but is filtered out of "recent successful runs" cards.

failure_time_s: float | None

Canonical failure timestamp in seconds from started_at. Set this when you know exactly when the run broke (collision watchdog, joint limit trip, manual abort) and want the replay scrubber to land on the right frame. The portal renders an amber "Failure" pin at this instant - ahead of any heuristic Failure Intelligence findings.

  • Only valid with status="failed". The SDK raises ValueError if you pass failure_time_s alongside status="ready", because that combination almost always means a mis-wired error handler.
  • Must be non-negative. Negative values raise ValueError at the SDK boundary. Server-side, a CHECK constraint additionally caps the value at duration_s; small float overshoots (within 1 ms of the run length) are clamped to duration_s instead of being rejected.
  • Optional. Leave it None (the default) and Failure Intelligence still produces best-effort markers from metadata heuristics.

import robotrace as rt

rt.log_episode(
    name="pick_and_place evening run",
    policy_version="pap-v3.2.1",
    video="run.mp4",
    duration_s=18.4,
    status="failed",
    failure_time_s=12.34,           # collision watchdog tripped here
    metadata={"failure_reason": "wrist collision"},
)

Return value

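from __future__ import annotations  # lets the SDK-defined names below stay unresolved here

from dataclasses import dataclass
from typing import Literal

@dataclass
class Episode:
    id: str                               # uuid, as str
    status: str                           # "ready" or "failed"
    storage: Literal["r2", "unconfigured"]
    upload_urls: dict[ArtifactKind, UploadUrl]   # ArtifactKind / UploadUrl: SDK-defined types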

You rarely need the return value from log_episode: by the time it returns, everything is already uploaded and finalized. It's mainly useful for capturing the episode id in your own logs:

ep = rt.log_episode(...)
my_logger.info("logged episode", episode_id=ep.id)

Errors

log_episode raises typed exceptions on every failure path. See Errors for the full hierarchy and recovery patterns. The most common ones in this call:

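| Exception          | When                                                                                         |
| ------------------ | -------------------------------------------------------------------------------------------- |
| ConfigurationError | api_key / base_url missing, a file path doesn't exist, or a file extension doesn't match its slot |
| AuthError          | API key invalid or revoked                                                                   |
| ValidationError    | Payload failed server-side validation                                                        |
| ConflictError      | (rare) Episode is somehow already archived                                                   |
| TransportError     | Network / DNS / timeout                                                                      |
| ServerError        | 5xx response; a candidate for retries                                                        |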

If an upload fails partway through, the SDK auto-flips the run to status="failed" with the failure reason in metadata.failure_reason before re-raising - so you don't end up with ghostly "recording" runs in the portal.
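TransportError and ServerError are the two rows that describe transient conditions, which makes them natural candidates for a thin retry wrapper. This sketch uses local stand-in exception classes. In real code you would catch the SDK's own types instead; see Errors for where they live. AuthError, ValidationError, and the rest are deliberately not retried. One caveat, assuming each log_episode call starts a fresh episode: after a partial upload, a retry produces a new episode rather than resuming the old one. The earlier attempt stays in the portal marked failed, as described above.

```python
import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")


class TransportError(Exception):
    """Local stand-in for the SDK's TransportError."""


class ServerError(Exception):
    """Local stand-in for the SDK's ServerError."""


RETRYABLE = (TransportError, ServerError)


def with_retries(
    call: Callable[[], T],
    attempts: int = 3,
    base_delay_s: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Retry `call` on transient errors; anything else propagates immediately."""
    if attempts < 1:
        raise ValueError("attempts must be >= 1")
    for attempt in range(1, attempts + 1):
        try:
            return call()
        except RETRYABLE:
            if attempt == attempts:
                raise  # out of attempts: surface the last transient error
            # Exponential backoff (base, 2x base, 4x base, ...) scaled by random jitter.
            sleep(base_delay_s * 2 ** (attempt - 1) * random.uniform(0.5, 1.0))
    raise AssertionError("unreachable")
```

Usage would look like `ep = with_retries(lambda: rt.log_episode(name="pick_and_place evening run", video="run.mp4"))`.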

Don'ts

  • Don't call log_episode from inside your training inner loop. Log once per episode, at the episode boundary, not once per step.
  • Don't put episode bytes in metadata. The DB is for metadata, R2 is for bytes.
  • Don't hard-code or print the API key in your training script; read it from an environment variable instead. The SDK never logs the key value.
  • Don't pass positional arguments. The contract is keyword-only on purpose.