Observability and evals for AI-powered robots.
RoboTrace is the SDK, the storage, the replay, and the regression harness sitting between a robotics team's policy and the engineer who has to ship it. So your next quarter goes into the model, not into rebuilding the dashboard your team complains about every standup.
Robots regress in the field.
Every robotics team starts the same way — rosbags piling up on a dev machine, an MP4 from the wrist cam in someone's Downloads folder, regression checks in a notebook nobody can rerun. It works until a model regresses in the field and nobody can tell which run, which sensor, or which scene broke it.
The teams who ship anyway end up writing the same dashboard from scratch, every quarter. They build a custom episode store. They write a custom replay viewer. They duct-tape a regression harness. They lose two engineer-quarters to plumbing before a single new policy goes out the door.
RoboTrace is that dashboard, written once, ready on a pip install. So the engineers we work with spend their quarter on the policy that makes the robot do the right thing — not on the infrastructure underneath it.
Four principles we don't relitigate.
First-party data is the product.
Episodes can be tens of GB. We store them in object storage with signed URLs, never in Postgres, and we never train third-party general-purpose models on your sensor streams or policy weights.
The SDK contract is sacred.
Once log_episode ships in 1.0, breaking its signature would orphan every running training job. We bump the major version for breaking changes and ship deprecation warnings for at least one minor release first.
Reproducibility over magic.
Every episode keeps policy_version, env_version, git_sha, and seed. The dataset that trained v8 is still there, byte-identical, in week 12.
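Those four fields fit in a small record attached to every episode. A minimal sketch — the class name and the example values are illustrative, not the SDK's actual types:

```python
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class EpisodeMeta:
    """The four fields that make an episode re-trainable later."""

    policy_version: str  # e.g. "v8"
    env_version: str     # the scene / environment build the episode ran in
    git_sha: str         # exact commit of the policy and training code
    seed: int            # RNG seed, so the rollout can be replayed exactly


meta = EpisodeMeta(
    policy_version="v8",
    env_version="warehouse-0.3",  # hypothetical env name
    git_sha="9f2c1ab",
    seed=1234,
)
print(asdict(meta))  # ships alongside the episode bytes as plain JSON-able data
```

Freezing the dataclass is deliberate: reproducibility metadata is write-once — if it can be mutated after logging, week-12 you can no longer trust it.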
Robots in the physical world are not a demo.
Evals and re-rolls are useful proxies, not safety certifications. We never claim a green dashboard means it's safe to put a robot on a customer floor.
Phase 1 is invite-only and tightly scoped.
We're onboarding teams one at a time so we can stay close to the painful parts. Here's what's on the menu — and what isn't.
In scope today
- Episodes — synced video, sensors, action vectors with reproducibility metadata
- Object storage via Cloudflare R2 (or your own S3-compatible bucket)
- Replay & regression — re-roll candidate policies against historical observations
- ROS 2 (humble, jazzy), LeRobot datasets, raw NumPy episodes
- Out-of-distribution alerts and crash replays for deployed robots
- Tag, slice, and snapshot runs into reproducible training sets
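The re-roll idea in the list above can be sketched in plain Python — feed recorded observations to a candidate policy and flag episodes where its actions drift from what was logged. Every name, shape, and threshold here is illustrative, not the RoboTrace API:

```python
def reroll(policy, episodes, tolerance=0.05):
    """Replay logged observations through a candidate policy.

    `policy` is any callable mapping an observation to an action vector;
    `episodes` is a list of (observation, logged_action) step sequences.
    Returns the episodes whose mean per-step action drift exceeds
    `tolerance` — the regressions worth a human look.
    """
    failures = []
    for ep_id, steps in enumerate(episodes):
        drift = 0.0
        for obs, logged_action in steps:
            candidate = policy(obs)
            # mean absolute difference between candidate and logged action
            drift += sum(abs(c - l) for c, l in zip(candidate, logged_action)) / len(logged_action)
        drift /= len(steps)
        if drift > tolerance:
            failures.append((ep_id, drift))
    return failures
```

An identity check makes the contract concrete: a policy that reproduces the logged actions returns no failures, while one that shifts every action gets flagged.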
Out of scope (Phase 1)
- ROS 1 — explicitly out of scope; not on any roadmap
- Episode bytes in Postgres — heavy data lives in object storage
- Self-serve signup during Phase 1 — every account goes through admin approval
- Stripe and paid tiers during Phase 1 — invite-only and free until product-market fit
- Locales beyond English — no i18n scaffolding in Phase 1
- Safety certification or regulatory sign-off — we ship analytics, not approvals
The stack, in one paragraph.
Boring tech where it doesn't matter, picked for operability. The interesting bits are the SDK contract and the regression harness — everything else is commodity, and we like it that way.
- Web app: Next.js 16 (App Router) on Vercel
- Database & auth: Supabase Postgres with Row-Level Security
- Object storage: Cloudflare R2 (signed PUT / GET)
- SDK: Python ≥ 3.10, httpx-only hard dep
- CLI auth: Device-code flow (RFC 8628), no copy-pasted keys
- Robots supported: ROS 2 (humble · jazzy), LeRobot, raw NumPy
Want to put RoboTrace on your robot?
We're onboarding teams one at a time. Tell us what you're training and how you're shipping today — we usually write back personally within a week.