# Object storage
Episode artifacts (video, sensors, actions) live in Cloudflare R2, an S3-compatible object store. Heavy bytes never touch the RoboTrace origin server — the SDK uploads them directly to R2 using short-lived signed PUT URLs.
## Why R2
R2 charges $0 egress vs. S3's ~$0.09/GB. For a product where every episode page replays multi-GB videos to engineers' browsers, that single difference pays for the rest of the infra many times over. Storage and ops pricing is also slightly cheaper than S3:
| | R2 | S3 (Standard) |
|---|---|---|
| Storage | $0.015 / GB-month | $0.023 / GB-month |
| Egress | $0 | ~$0.09 / GB |
| Free tier | 10 GB + 1M Class A + 10M Class B / month, forever | 5 GB for 12 months |
R2 speaks the S3 wire protocol, so any S3-compatible tool — including our `@aws-sdk/client-s3` and the Python SDK — works against it by swapping the endpoint.
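Concretely, the only client-side change is the endpoint URL, which R2 derives from your account ID. A minimal sketch — the helper name is ours, but the endpoint format is Cloudflare's documented S3-compatible endpoint:

```python
def r2_endpoint(account_id: str) -> str:
    """Build the per-account S3-compatible endpoint for R2.

    Any S3 client (boto3, @aws-sdk/client-s3, the AWS CLI's
    --endpoint-url flag) can then be pointed here instead of AWS;
    credentials come from the R2 API token.
    """
    return f"https://{account_id}.r2.cloudflarestorage.com"

# e.g. boto3.client("s3", endpoint_url=r2_endpoint(account_id), ...)
```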
## Dev-optional, prod-required
R2 is deliberately optional in development:
- **Without R2** (env vars blank) — the ingest endpoint returns `storage: "unconfigured"` and an empty `upload_urls` array. The SDK can still test the metadata path end-to-end; episodes show up in `/admin/episodes` with no playable artifacts.
- **With R2** (all four env vars set) — the ingest endpoint mints signed PUT URLs and the SDK streams files straight to the bucket.
The Python SDK exposes the mode on `Episode.storage` so your training scripts can bail loudly if they expected R2 and didn't get it.
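A sketch of what that guard can look like in a training script — `require_r2` is illustrative, not part of the SDK; the only mode value this page documents is `"unconfigured"`:

```python
def require_r2(storage_mode: str) -> None:
    """Fail fast when the deployment can't store artifacts.

    `storage_mode` is the value the SDK surfaces on Episode.storage;
    "unconfigured" means the ingest endpoint returned no upload URLs.
    """
    if storage_mode == "unconfigured":
        raise RuntimeError(
            "RoboTrace deployment has no R2 configured; "
            "episodes would be metadata-only."
        )
```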
Before pointing real users at any deployment, walk through the production setup checklist. Without R2 the product literally has no way to store the actual episode data — it'd just be a metadata browser.
## Required env vars
```bash
R2_ACCOUNT_ID=…          # from Cloudflare → R2 sidebar
R2_ACCESS_KEY_ID=…       # from R2 → API Tokens → Create
R2_SECRET_ACCESS_KEY=…   # shown once at token creation
R2_BUCKET_EPISODES=…     # bucket name; we recommend "robotrace-episodes"
R2_PUBLIC_URL=           # optional; set when you connect a custom domain
```

Full Cloudflare clickpath in docs/PRODUCTION-SETUP.md → §1.
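A small startup check over these variables catches a half-configured deployment early. A sketch (the helper is ours; the variable names match the list above, and `R2_PUBLIC_URL` is deliberately excluded because it's optional):

```python
import os

REQUIRED_R2_VARS = (
    "R2_ACCOUNT_ID",
    "R2_ACCESS_KEY_ID",
    "R2_SECRET_ACCESS_KEY",
    "R2_BUCKET_EPISODES",
)

def missing_r2_vars(env=os.environ) -> list[str]:
    """Return the required R2 variables that are unset or blank."""
    return [name for name in REQUIRED_R2_VARS if not env.get(name)]
```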
## Bucket layout
Objects are keyed by `episodes/<client_id>/<episode_id>/<file>`:
```
episodes/
└── 8a4f01c2-…/              # client id
    └── e8a4f01c-2b39-…/     # episode id
        ├── video.mp4
        ├── sensors.bin
        └── actions.parquet
```

This layout means:
- A single client's data is a single prefix → easy to lifecycle, audit, or hard-delete.
- Object names are predictable so the admin UI can construct fresh signed read URLs without a database lookup.
- Filename always ends in the canonical extension expected for the artifact kind (helps when a CDN or browser sniffs MIME types).
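The "predictable names, no database lookup" property boils down to a pure function of a few strings. A sketch — helper names are ours; the key shape and canonical filenames are from the layout above:

```python
# Canonical filename per artifact kind, per the layout above.
ARTIFACT_FILES = {
    "video": "video.mp4",
    "sensors": "sensors.bin",
    "actions": "actions.parquet",
}

def episode_key(client_id: str, episode_id: str, kind: str) -> str:
    """Build the R2 object key for one artifact of one episode."""
    return f"episodes/{client_id}/{episode_id}/{ARTIFACT_FILES[kind]}"

def client_prefix(client_id: str) -> str:
    """Prefix covering every episode of one client (lifecycle/audit/delete)."""
    return f"episodes/{client_id}/"
```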
## Signed URL TTL
PUT URLs are valid for 30 minutes after they're minted. That's:
- Long enough for a slow uplink to push a multi-GB video.
- Short enough that a leaked URL isn't a long-lived credential.
If your upload exceeds 30 minutes, the SDK currently re-calls `POST /api/ingest/episode` to mint fresh URLs (which, today, creates a new episode row). A "regenerate URLs for an existing episode" endpoint is on the 0.2 roadmap.
## Content-Type matters
Each PUT URL is signed with a specific `Content-Type`. The PUT must match or R2 returns `403`. The SDK handles this for you; for raw HTTP clients see Ingest API → §2.
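For raw HTTP clients, the essential point is that the PUT carries exactly the `Content-Type` the URL was signed with. A stdlib sketch that builds (but doesn't send) such a request — the helper name is ours, and `video/mp4` is the natural type for `video.mp4`:

```python
import urllib.request

def build_put(signed_url: str, payload: bytes, content_type: str):
    """Build a PUT request whose Content-Type matches what was signed.

    If this header differs from the one baked into the signature,
    R2 rejects the upload with a 403.
    """
    return urllib.request.Request(
        signed_url,
        data=payload,
        method="PUT",
        headers={"Content-Type": content_type},
    )

# req = build_put(url, open("video.mp4", "rb").read(), "video/mp4")
# urllib.request.urlopen(req)  # performs the actual upload
```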
## CORS
Phase 1 uploads come from the Python SDK, which doesn't need CORS. When the in-browser upload UI lands (Phase 3+), add a CORS rule on the bucket allowing PUT/GET from your app origin. Example in the production checklist.
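As a rough shape of such a rule — illustrative only; the origin is a placeholder, and the exact form you paste into the R2 dashboard is in the production checklist:

```json
[
  {
    "AllowedOrigins": ["https://app.example.com"],
    "AllowedMethods": ["GET", "PUT"],
    "AllowedHeaders": ["Content-Type"],
    "MaxAgeSeconds": 3600
  }
]
```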
## Don'ts
- Don't put episode bytes in Postgres. The DB row holds metadata and a URL; bytes live in R2. This is rule one in `AGENTS.md`.
- Don't treat the public URL as authenticated. R2 buckets connected to a custom domain via `R2_PUBLIC_URL` are publicly readable by URL — that's why bucket keys include the random episode UUID (effectively unguessable). For the upcoming portal, read access will move behind signed GET URLs.
- Don't hand out the `R2_SECRET_ACCESS_KEY` to clients or staff. Only the Vercel server runtime needs it — rotate it quarterly per the production checklist.