Errors

Every error the SDK raises inherits from robotrace.RobotraceError. Catch by type, not by parsing message strings — the messages are human-readable and may change between minor versions. The types are stable and follow the same "sacred contract" rule as log_episode.

The hierarchy

RobotraceError
├── ConfigurationError       # missing api_key / base_url, bad path, etc.
├── TransportError           # network / timeout / DNS / TLS
└── APIError                 # the server responded with an error
    ├── AuthError            # 401 — bad / missing / revoked key
    ├── NotFoundError        # 404 — episode id doesn't exist (or cross-tenant)
    ├── ConflictError        # 409 — episode is archived, etc.
    ├── ValidationError      # 400 — payload didn't match the schema
    └── ServerError          # 5xx — flag for retries

APIError and its subclasses carry two extra attributes for debugging:

exc.status_code   # int — the HTTP status the server returned
exc.response_body # parsed JSON body (or raw text on non-JSON 5xx)
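For orientation, here is a minimal sketch of how a hierarchy with these attributes could be declared, plus a hypothetical error_for_status helper that picks the subclass from the HTTP status. It is not the SDK's source. The helper name, the message extraction, and the fallback to plain APIError for codes the table doesn't list (403 or 429, say) are all assumptions.

```python
# Illustrative sketch of the documented hierarchy; not the SDK's source.
from typing import Any, Dict, Type


class RobotraceError(Exception):
    """Base class for everything the SDK raises."""


class ConfigurationError(RobotraceError):
    """Bad or missing local configuration; raised before any request."""


class TransportError(RobotraceError):
    """The request never got a response (DNS, TCP, TLS, timeout)."""


class APIError(RobotraceError):
    """The server answered with an error status."""

    def __init__(self, message: str, status_code: int, response_body: Any = None) -> None:
        super().__init__(message)
        self.status_code = status_code      # HTTP status from the server
        self.response_body = response_body  # parsed JSON, or raw text


class AuthError(APIError):
    """401"""


class NotFoundError(APIError):
    """404"""


class ConflictError(APIError):
    """409"""


class ValidationError(APIError):
    """400"""


class ServerError(APIError):
    """5xx"""


# Hypothetical status-to-class table; unlisted 4xx codes fall back to APIError.
_BY_STATUS: Dict[int, Type[APIError]] = {
    400: ValidationError,
    401: AuthError,
    404: NotFoundError,
    409: ConflictError,
}


def error_for_status(status_code: int, body: Any) -> APIError:
    """Build the most specific APIError subclass for an error response."""
    if 500 <= status_code <= 599:
        cls: Type[APIError] = ServerError
    else:
        cls = _BY_STATUS.get(status_code, APIError)
    # Prefer the server's "error" field; otherwise use the raw body as text.
    if isinstance(body, dict):
        message = str(body.get("error", body))
    else:
        message = str(body)
    return cls(message, status_code, body)
```

Because every subclass shares one base, except APIError as exc: gives you exc.status_code and exc.response_body no matter which specific type was raised.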

When you'll see each one

ConfigurationError

The SDK is missing required configuration, or was given inputs it can't use. It's raised locally, before anything reaches the network. Common cases:

  • api_key not passed and ROBOTRACE_API_KEY not set
  • base_url not passed and ROBOTRACE_BASE_URL not set
  • A path passed to upload_video(...) doesn't exist
  • The deployment hasn't wired up R2 (storage="unconfigured") and your code calls ep.upload_video(...) anyway. The SDK fails loudly rather than silently dropping bytes.

from robotrace import ConfigurationError
 
# Assumes rt is an already-configured RoboTrace client (setup not shown).
try:
    rt.log_episode(name="oops", video="/missing/file.mp4")
except ConfigurationError as exc:
    print(f"fix your inputs: {exc}")

Don't retry — the inputs need to change first.
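If you want these problems to surface at startup rather than mid-run, a pre-flight check can mirror the first three cases in the list. The sketch below is hypothetical: resolve_config and check_video_path are not SDK functions, ConfigurationError is a local stand-in, and it assumes explicit arguments take precedence over the ROBOTRACE_API_KEY / ROBOTRACE_BASE_URL environment variables.

```python
import os
from pathlib import Path
from typing import Mapping, Optional, Tuple


class ConfigurationError(Exception):
    """Local stand-in for robotrace.ConfigurationError."""


def resolve_config(
    api_key: Optional[str] = None,
    base_url: Optional[str] = None,
    env: Optional[Mapping[str, str]] = None,
) -> Tuple[str, str]:
    """Explicit arguments win; otherwise fall back to the environment (assumed precedence)."""
    env = os.environ if env is None else env
    api_key = api_key or env.get("ROBOTRACE_API_KEY")
    base_url = base_url or env.get("ROBOTRACE_BASE_URL")
    if not api_key:
        raise ConfigurationError("api_key not passed and ROBOTRACE_API_KEY not set")
    if not base_url:
        raise ConfigurationError("base_url not passed and ROBOTRACE_BASE_URL not set")
    return api_key, base_url


def check_video_path(path: str) -> Path:
    """Fail before any upload starts if the file isn't there."""
    video = Path(path)
    if not video.is_file():
        raise ConfigurationError(f"video file not found: {path}")
    return video


if __name__ == "__main__":
    try:
        resolve_config()
        check_video_path("/missing/file.mp4")
    except ConfigurationError as exc:
        print(f"fix your inputs: {exc}")
```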

TransportError

The HTTP request failed before a response came back: a DNS failure, a TCP reset, a TLS handshake error, or a timeout. The SDK can't confirm whether the request landed, but retrying with backoff is generally safe:

from robotrace import TransportError
import time
 
for attempt in range(3):
    try:
        rt.log_episode(...)
        break
    except TransportError:
        if attempt == 2:
            raise
        time.sleep(2 ** attempt)  # waits 1s, then 2s; the third failure re-raises

The SDK doesn't auto-retry because what's safe depends on the call. Retrying a start_episode after a transport error is fine (the server might have created the row twice, but each row gets a unique id); retrying an upload PUT against an expired signed URL just wastes bytes.
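Since the retry decision lives in your code, a small helper keeps it explicit at each call site. This is a minimal sketch that assumes nothing about the SDK beyond its exception types; retry_with_backoff is a hypothetical name, and the injectable sleep exists only so the schedule can be tested without waiting.

```python
import time
from typing import Callable, Tuple, Type, TypeVar

T = TypeVar("T")


def retry_with_backoff(
    fn: Callable[[], T],
    retry_on: Tuple[Type[BaseException], ...],
    attempts: int = 3,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call fn, retrying only on the given exception types.

    Waits base_delay * 2**n seconds after failure n (0-based) and re-raises
    the last error once all attempts are used. Other exception types
    propagate immediately.
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(attempts):
        try:
            return fn()
        except retry_on:
            if attempt == attempts - 1:
                raise
            sleep(base_delay * 2 ** attempt)
    raise AssertionError("unreachable")
```

A call site then reads retry_with_backoff(lambda: rt.log_episode(...), retry_on=(TransportError,)). The lambda defers the call so the helper can invoke it repeatedly, and the retry_on tuple records which failures you've judged safe to repeat for that particular call.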

AuthError (401)

The API key is missing, malformed, or revoked. Don't retry — the user needs to mint a fresh key in Admin → Clients → <client> → API access.

from robotrace import AuthError
 
try:
    rt.log_episode(...)
except AuthError as exc:
    alerts.notify(  # your own alerting hook; not part of the SDK
        "RoboTrace key needs rotation",
        details=str(exc),
    )
    raise

NotFoundError (404)

The episode id doesn't exist, or belongs to a different client. We deliberately make these two cases indistinguishable server-side to avoid a UUID-enumeration oracle.

This won't happen in the normal log_episode(...) flow. You only see it if you constructed an Episode from a stale id and tried to finalize it.
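The server-side rule can be sketched in a few lines. This illustrates the technique and is not RoboTrace's server code: the in-memory store, EpisodeRow, and NotFound are hypothetical. The point is that the missing-row and wrong-tenant cases share one branch and one message, so the response gives a prober nothing to tell apart.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class EpisodeRow:
    id: str
    client_id: str
    name: str


class NotFound(Exception):
    """Stand-in for the server's 404 response."""


def get_episode(store: Dict[str, EpisodeRow], episode_id: str, caller_client_id: str) -> EpisodeRow:
    row: Optional[EpisodeRow] = store.get(episode_id)
    # "Doesn't exist" and "belongs to another client" share one branch and one
    # message, so the response can't be used to test whether an id exists.
    if row is None or row.client_id != caller_client_id:
        raise NotFound("episode not found")
    return row
```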

ConflictError (409)

The request is well-formed but conflicts with current server state. The most common cause: trying to finalize(...) an episode that's already been archived in the admin UI.

Restore the episode from /admin/episodes/<id> before retrying, or start a fresh episode.

ValidationError (400)

The payload didn't pass server-side validation. The error field in exc.response_body tells you which constraint tripped:

from robotrace import ValidationError
 
try:
    rt.log_episode(name="x" * 500)  # name is capped at 200 chars; other args omitted
except ValidationError as exc:
    print(exc)                # human message
    print(exc.response_body)  # e.g. {'error': 'name must be ≤ 200 chars'}

Don't retry without changing the inputs.
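When names are generated programmatically, you can avoid the round trip by enforcing the limit before the call. A minimal sketch, assuming the 200-character cap from the example above; check_episode_name and fit_episode_name are hypothetical helpers, and the server stays the source of truth, so a locally mirrored limit can drift if the server's changes.

```python
MAX_NAME_LEN = 200  # server-side cap quoted in the example above; may change


def check_episode_name(name: str) -> str:
    """Raise locally instead of paying for a 400 round trip."""
    if len(name) > MAX_NAME_LEN:
        raise ValueError(f"name must be ≤ {MAX_NAME_LEN} chars (got {len(name)})")
    return name


def fit_episode_name(name: str) -> str:
    """Truncate auto-generated names so they always fit."""
    return name[:MAX_NAME_LEN]
```

Python's len counts code points. If the server measures length differently (in bytes, for instance), names with non-ASCII characters could still be rejected, so keep the except ValidationError around the call.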

ServerError (5xx)

Something failed on the server side: a database hiccup, an R2 signing failure, and so on. These are worth retrying with exponential backoff. The SDK deliberately does not auto-retry because retrying a finalize could double-bill artifact storage in future paid tiers.

from robotrace import ServerError
import time
 
for attempt in range(5):
    try:
        rt.log_episode(...)
        break
    except ServerError:
        if attempt == 4:
            raise
        time.sleep(2 ** attempt)  # waits 1, 2, 4, 8s; the fifth failure re-raises

If ServerError persists past a few retries, check status.robotrace.dev (Phase 2) or ping us — there's likely an incident.
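Pulling the per-type advice on this page together: only TransportError and ServerError are transient, and everything else needs a change of inputs or server state first. Here is a sketch of that decision as a lookup, using compact stand-in classes so it runs on its own (in real code, import them from robotrace). The attempt budgets simply mirror the two example loops and are not SDK defaults.

```python
from typing import Dict, Type

# Compact stand-ins for the SDK's classes; in real code, import them from robotrace.
class RobotraceError(Exception): pass
class ConfigurationError(RobotraceError): pass
class TransportError(RobotraceError): pass
class APIError(RobotraceError): pass
class AuthError(APIError): pass
class NotFoundError(APIError): pass
class ConflictError(APIError): pass
class ValidationError(APIError): pass
class ServerError(APIError): pass

# Total attempts worth making per transient type. The numbers mirror the
# example loops on this page; they are not SDK defaults.
ATTEMPT_BUDGET: Dict[Type[RobotraceError], int] = {
    TransportError: 3,
    ServerError: 5,
}


def attempt_budget(exc: BaseException) -> int:
    """How many attempts in total this error justifies; 1 means don't retry."""
    for cls, budget in ATTEMPT_BUDGET.items():
        if isinstance(exc, cls):
            return budget
    # Config, auth, 404, 409 and 400 errors need new inputs or state first.
    return 1
```

Because the lookup uses isinstance, any subclass of TransportError or ServerError inherits its parent's budget. You may still want to exclude finalize calls from ServerError retries, given the billing caveat above.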

Catch-all pattern

For training scripts where you want one alert path for any RoboTrace problem without distinguishing types:

import sentry_sdk
from robotrace import RobotraceError
 
try:
    rt.log_episode(...)
except RobotraceError as exc:
    # Anything from the SDK — auth, config, network, server.
    # User code bugs (TypeError, ValueError) still propagate.
    sentry_sdk.capture_exception(exc)
    raise

RobotraceError deliberately does not inherit from OSError / IOError, so a broad except OSError: around checkpoint or dataset I/O in your training loop won't silently eat our errors and leave you wondering why nothing's showing up in the portal. (A blanket except Exception: still catches them, just as it catches everything else.)
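The effect is easy to check. In the sketch below, RobotraceError is a stand-in assumed to have the same bases as the real class (Exception, not OSError); you can confirm that for your installed version by inspecting robotrace.RobotraceError.__mro__. The OSError handler catches a genuine disk error but lets the SDK error through.

```python
class RobotraceError(Exception):
    """Stand-in for robotrace.RobotraceError: an Exception, but not an OSError."""


def training_step(log_episode) -> str:
    try:
        log_episode()
    except OSError as exc:
        # Typical handler for disk-full or missing-file problems in a training loop.
        return f"skipped I/O problem: {exc}"
    return "logged"


def broken_ingest() -> None:
    raise RobotraceError("ingest failed")


if __name__ == "__main__":
    try:
        training_step(broken_ingest)
    except RobotraceError as exc:
        # The OSError handler above did not swallow it.
        print(f"SDK error surfaced: {exc}")
```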

Server vs SDK redaction

The SDK never logs:

  • The value of your API key
  • The body of an ingest request (which can carry trade secrets)
  • Signed PUT URLs (they expire quickly, but anyone holding one can upload until then)

The server side has the same rule — see AGENTS.md → "Don't console.log SDK ingest payloads or API keys." If you find an exception message that leaks any of the above, it's a bug — please report it.
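If you wrap SDK calls in your own request logging, the same rule applies to your logs. This is a minimal sketch of log-safe formatting, not the SDK's internals: the function names and the header names treated as sensitive are assumptions. It strips the query string (where S3-style presigned URLs carry their signature), replaces credential headers, and records only the body's size.

```python
from typing import Dict, Mapping
from urllib.parse import urlsplit, urlunsplit

# Assumed credential-bearing header names; adjust to what your client sends.
SENSITIVE_HEADERS = {"authorization", "x-api-key"}


def redact_url(url: str) -> str:
    """Drop the query string and fragment, where presigned URLs keep their signature."""
    parts = urlsplit(url)
    return urlunsplit((parts.scheme, parts.netloc, parts.path, "", ""))


def redact_headers(headers: Mapping[str, str]) -> Dict[str, str]:
    """Copy headers, replacing credential values."""
    return {
        name: "<redacted>" if name.lower() in SENSITIVE_HEADERS else value
        for name, value in headers.items()
    }


def describe_request(method: str, url: str, headers: Mapping[str, str], body: bytes) -> str:
    """One log-safe line: body size only, never body contents."""
    return (
        f"{method} {redact_url(url)} "
        f"headers={redact_headers(headers)} body={len(body)} bytes"
    )
```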