Errors
Every error the SDK raises inherits from robotrace.RobotraceError.
Catch by type, not by parsing message strings — the messages are
human-readable and may change between minor versions. The types are
stable and follow the same "sacred contract" rule as
log_episode.
The hierarchy
RobotraceError
├── ConfigurationError   # missing api_key / base_url, bad path, etc.
├── TransportError       # network / timeout / DNS / TLS
└── APIError             # the server responded with an error
    ├── AuthError        # 401: bad / missing / revoked key
    ├── NotFoundError    # 404: episode id doesn't exist (or cross-tenant)
    ├── ConflictError    # 409: episode is archived, etc.
    ├── ValidationError  # 400: payload didn't match the schema
    └── ServerError      # 5xx: flag for retries
APIError and its subclasses carry two extra attributes for
debugging:
exc.status_code    # int: the HTTP status the server returned
exc.response_body  # parsed JSON body (or raw text on non-JSON 5xx)
When you'll see each one
ConfigurationError
The SDK is missing or misconfigured. Caught at the call site, never reaches the network. Common cases:
- api_key not passed and ROBOTRACE_API_KEY not set
- base_url not passed and ROBOTRACE_BASE_URL not set
- A path passed to upload_video(...) doesn't exist
- The deployment hasn't wired R2 (storage="unconfigured") and your code calls ep.upload_video(...) anyway; the SDK fails loud rather than silently dropping bytes
from robotrace import ConfigurationError

try:
    rt.log_episode(name="oops", video="/missing/file.mp4")
except ConfigurationError as exc:
    print(f"fix your inputs: {exc}")
Don't retry: the inputs need to change first.
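Some of the cases listed above can be checked before a long job starts. The following is a minimal sketch, not part of the SDK: `preflight_problems` is a hypothetical helper, and it only covers the environment-variable route (if you pass api_key or base_url explicitly, the corresponding check does not apply).

```python
import os
from pathlib import Path
from typing import List, Mapping, Optional


def preflight_problems(
    video_path: Optional[str] = None,
    env: Mapping[str, str] = os.environ,
) -> List[str]:
    """Hypothetical helper: list likely ConfigurationError causes up front."""
    problems = []
    # The two variable names come from the list of common cases above.
    for var in ("ROBOTRACE_API_KEY", "ROBOTRACE_BASE_URL"):
        if not env.get(var):
            problems.append(f"{var} is not set")
    # A missing video file is reported the same way as a missing variable.
    if video_path is not None and not Path(video_path).is_file():
        problems.append(f"video file not found: {video_path}")
    return problems
```

An empty list means none of these particular checks failed; the SDK may still raise ConfigurationError for reasons this sketch does not cover, such as unconfigured storage.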
TransportError
The HTTP request failed before the server could respond. DNS, TCP reset, TLS handshake, or a timeout. The request is not known to have landed, so retrying with backoff is generally safe:
from robotrace import TransportError
import time

for attempt in range(3):
    try:
        rt.log_episode(...)
        break
    except TransportError:
        if attempt == 2:
            raise
        time.sleep(2 ** attempt)  # 1 s, then 2 s (the third failure re-raises)
The SDK doesn't auto-retry because what's safe depends on the call:
retrying a start_episode after a transport error is fine (the server
might have created the row twice, but each gets a unique id);
retrying an upload PUT against an expired signed URL just wastes
bytes.
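Because retry safety depends on the call, one option is an opt-in wrapper that you apply only to calls you have decided are safe to repeat. This is a sketch under assumptions: `call_with_backoff` is a hypothetical helper, not an SDK function, and `TransportError` is redefined locally as a stand-in so the block runs without the SDK installed.

```python
import time
from typing import Callable, Tuple, Type, TypeVar

T = TypeVar("T")


class TransportError(Exception):
    """Stand-in for robotrace.TransportError so this sketch runs on its own."""


def call_with_backoff(
    fn: Callable[[], T],
    retry_on: Tuple[Type[BaseException], ...] = (TransportError,),
    attempts: int = 3,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call fn(), retrying on the given error types with exponential backoff."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(attempts):
        try:
            return fn()
        except retry_on:
            if attempt == attempts - 1:
                raise  # out of attempts: surface the last error unchanged
            sleep(base_delay * 2 ** attempt)  # base_delay, then 2x, 4x, ...
    raise AssertionError("unreachable")
```

With the real SDK you would import TransportError from robotrace instead of defining it, and wrap a call in a lambda, for example `call_with_backoff(lambda: rt.log_episode(...))`. The `sleep` parameter exists so tests can substitute a fake and avoid real delays.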
AuthError (401)
The API key is missing, malformed, or revoked. Don't retry — the user needs to mint a fresh key in Admin → Clients → <client> → API access.
from robotrace import AuthError

try:
    rt.log_episode(...)
except AuthError as exc:
    alerts.notify(
        "RoboTrace key needs rotation",
        details=str(exc),
    )
    raise
NotFoundError (404)
The episode id doesn't exist, or belongs to a different client. We deliberately make these two cases indistinguishable server-side to avoid a UUID-enumeration oracle.
This won't happen during normal log_episode(...) flow — you only
see it if you constructed an Episode from a stale id and tried to
finalize it.
ConflictError (409)
The request is well-formed but conflicts with current server state.
The most common cause: trying to finalize(...) an episode that's
already been archived in the admin UI.
Restore the episode from /admin/episodes/<id> before retrying, or
start a fresh episode.
ValidationError (400)
The payload didn't pass server-side validation. The server's
error field tells you which constraint tripped:
from robotrace import ValidationError

try:
    rt.log_episode(name="x" * 500, ...)  # name is capped at 200 chars
except ValidationError as exc:
    print(exc)                # human message
    print(exc.response_body)  # {'error': 'name must be ≤ 200 chars'}
Don't retry without changing the inputs.
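Since response_body is described as either a parsed JSON body or raw text, code that logs the server's explanation may want to handle both shapes. A minimal sketch follows; `server_error_message` is a hypothetical helper name, and the only assumption about the JSON shape is the `error` key shown in the example above.

```python
from typing import Any, Optional


def server_error_message(response_body: Any) -> Optional[str]:
    """Return the server's `error` field if present, else raw text, else None."""
    if isinstance(response_body, dict):
        # Parsed JSON body, e.g. {'error': 'name must be ≤ 200 chars'}.
        error = response_body.get("error")
        return str(error) if error is not None else None
    if isinstance(response_body, str):
        # Raw text body (documented for non-JSON 5xx responses).
        text = response_body.strip()
        return text or None
    return None
```

In an except block this would be called as `server_error_message(exc.response_body)`. Treat the result as a log or display string only; branch on the exception type, not on this text.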
ServerError (5xx)
Something blew up on the server side — database hiccup, R2 signing
failed, etc. Worth retrying with exponential backoff. The SDK
deliberately does not auto-retry because retrying a finalize
twice could double-bill artifact storage in future paid tiers.
from robotrace import ServerError
import time

for attempt in range(5):
    try:
        rt.log_episode(...)
        break
    except ServerError:
        if attempt == 4:
            raise
        time.sleep(2 ** attempt)  # 1, 2, 4, 8 seconds (the fifth failure re-raises)
If ServerError persists past a few retries, check
status.robotrace.dev (Phase 2) or
ping us — there's likely an incident.
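The per-type guidance above reduces to a simple rule: only TransportError and ServerError are candidates for retrying, and everything else needs a change of inputs, credentials, or server state first. The sketch below encodes that rule. The classes are local stand-ins that mirror the documented hierarchy so the block runs without the SDK, and `is_retryable` is a hypothetical helper, not an SDK function.

```python
# Stand-ins mirroring the documented hierarchy. With the real SDK, import
# these names from robotrace instead of defining them here.
class RobotraceError(Exception): pass
class ConfigurationError(RobotraceError): pass
class TransportError(RobotraceError): pass
class APIError(RobotraceError): pass
class AuthError(APIError): pass
class NotFoundError(APIError): pass
class ConflictError(APIError): pass
class ValidationError(APIError): pass
class ServerError(APIError): pass

# The only two types this page describes as worth retrying with backoff.
RETRYABLE = (TransportError, ServerError)


def is_retryable(exc: BaseException) -> bool:
    """True if the guidance on this page says a retry with backoff may help."""
    return isinstance(exc, RETRYABLE)
```

Even for retryable types, whether a retry is safe still depends on which call failed, so this check is a first filter and not a substitute for that judgment.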
Catch-all pattern
For training scripts where you want one alert path for any RoboTrace problem without distinguishing types:
from robotrace import RobotraceError

try:
    rt.log_episode(...)
except RobotraceError as exc:
    # Anything from the SDK: auth, config, network, server.
    # User code bugs (TypeError, ValueError) still propagate.
    sentry_sdk.capture_exception(exc)
    raise
RobotraceError deliberately does not inherit from
OSError / IOError, so a blanket except OSError: around file or
network code in your training loop won't silently eat our errors and
leave you wondering why nothing's showing up in the portal.
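One concrete consequence of staying outside the OSError family is that a handler written for ordinary I/O failures does not swallow SDK errors. The sketch below illustrates this with a local stand-in class; it assumes RobotraceError derives directly from Exception, which this page does not state explicitly, and `run_step`, `fail_io`, and `fail_sdk` are hypothetical names.

```python
class RobotraceError(Exception):
    """Stand-in for robotrace.RobotraceError (assumed base class: Exception)."""


def run_step(action):
    """Run one step of a loop, tolerating only I/O failures."""
    try:
        action()
    except OSError:
        # Covers FileNotFoundError, ConnectionError, TimeoutError, etc.
        return "io-error-swallowed"
    return "ok"


def fail_io():
    raise FileNotFoundError("dataset shard missing")


def fail_sdk():
    raise RobotraceError("upload rejected")
```

Calling `run_step(fail_io)` returns the swallowed marker, while `run_step(fail_sdk)` lets the RobotraceError propagate to the caller, where the catch-all pattern can report it.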
Server vs SDK redaction
The SDK never logs:
- The value of your API key
- The body of an ingest request (which can carry trade secrets)
- Signed PUT URLs (they expire fast but still)
The server side has the same rule — see AGENTS.md → "Don't
console.log SDK ingest payloads or API keys." If you find an
exception message that leaks any of the above, it's a bug — please
report it.