LeRobot adapter
Reads Hugging Face LeRobot
datasets (formats v2.0 / v2.1 / v3.0) and creates one RoboTrace
episode per trajectory. Tiny install footprint - the adapter does
not depend on the heavy lerobot PyPI package (which would pull
torch, torchvision, pyav, and several CUDA wheels). It reads the
on-disk format directly with pyarrow + huggingface_hub, so the
install adds about 20 MB on top of the base SDK.
```python
from robotrace.adapters import lerobot

# Upload every trajectory in a Hub dataset as its own RoboTrace episode.
lerobot.upload_dataset(
    "lerobot/aloha_static_cups_open",
    policy_version="aloha-v1",
    env_version="aloha-cell-1",
)
```

That's the whole 95% case. Read on for the four explicit verbs, column auto-classification, multi-camera handling, and how this maps to LeRobot's data model.
Install
```shell
# Sensor / action only - passing canonical_camera or no cameras at all.
pip install 'robotrace-dev[lerobot]==0.3.0'

# With multi-camera horizontal tiling (most LeRobot datasets have 2+ cams).
pip install 'robotrace-dev[lerobot,video]==0.3.0'
```

Pinning an exact version is the most reliable way to install while the
SDK is pre-1.0; the pin can be dropped once 1.0 is released.
[lerobot] pulls in huggingface_hub, pyarrow, and numpy. The
[video] extra adds opencv-python for tiling multiple cameras
into one video.mp4.
For v2.x datasets, single-camera uploads (or any
canonical_camera="..."-pinned upload) don't need [video]
because the source mp4 is copied byte-for-byte without re-encoding.
For v3.0 datasets, video always needs [video]: episodes are
concatenated into shared mp4 shards, so even a single camera has to
be trimmed out by timestamp - there's no per-episode file to copy.
Sensor / action-only v3.0 uploads still skip opencv.
The four verbs
| Verb | What it does |
|---|---|
| lerobot.scan_dataset(repo) | Read-only introspection. Pulls only meta/* from the Hub. Returns a DatasetSummary with fps, episode count, frame count, camera list, and per-episode lengths and tasks. |
| lerobot.encode_episode(repo, idx, out) | Fetch one episode's parquet + per-camera mp4s, write video.mp4 / sensors.npz / actions.npz into out. Returns an EncodedEpisode with the file paths and provenance metadata. No upload. |
| lerobot.upload_episode(repo, idx, ...) | One-shot for a single trajectory: scan → encode to a tempdir → start_episode + ep.upload(kind, path) + finalize. Returns the finalized Episode. |
| lerobot.upload_dataset(repo, ...) | Bulk: walk every (or a subset of) trajectory and call upload_episode for each. Sequential - one episode at a time, fresh tempdir, so disk stays at one trajectory's worth at any moment. |
scan_dataset is the dry-run - most users start there to see how
many episodes the adapter would upload before paying the network
cost.
```python
summary = lerobot.scan_dataset("lerobot/aloha_static_cups_open")
print(summary.report())
# lerobot/aloha_static_cups_open (hub, v2.1, 50 fps)
# episodes: 50, frames: 12500
# cameras: observation.images.cam_high, observation.images.cam_low_left, observation.images.cam_low_right
# features: action, observation.state, next.reward, next.done
```

If it looks right, swap scan_dataset for upload_dataset and
you're done.
Local datasets vs. Hub datasets
The first argument can be either a Hub repo id (namespace/dataset-name)
or a local directory containing the meta/, data/, videos/
layout. Resolution is automatic - anything that exists on disk wins,
otherwise we hit the Hub.
```python
# Hub dataset (downloads files lazily, caches in ~/.cache/huggingface).
lerobot.upload_dataset("lerobot/pusht", policy_version="pusht-v1")

# Local dataset on a workstation.
lerobot.upload_dataset("/data/robot_runs/2026-05-10/", policy_version="pusht-v1")

# Pin the Hub revision to a specific commit / tag / branch.
lerobot.upload_dataset(
    "lerobot/aloha_static_cups_open",
    revision="v2.1",
    policy_version="aloha-v1",
)
```

For private or gated datasets, set HF_TOKEN in your environment -
huggingface_hub reads it automatically.
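
The resolution rule described above (a path that exists on disk wins, otherwise the string is treated as a Hub repo id) can be sketched in a few lines. This is only an illustration of the documented behaviour, not the adapter's code; resolve_source is a hypothetical name.

```python
from pathlib import Path


def resolve_source(source: str) -> str:
    """Return "local" if `source` exists on disk, else "hub".

    Hypothetical helper: it only mirrors the documented rule that
    anything present on disk takes precedence over a Hub lookup.
    """
    return "local" if Path(source).expanduser().exists() else "hub"


print(resolve_source("."))  # local (the current directory always exists)
```

One consequence of this rule, if the sketch matches the real behaviour: a relative directory whose path happens to look like a repo id would shadow the Hub dataset of the same name, so an absolute path is the less ambiguous way to point at local data.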
Column auto-classification
LeRobot datasets use a strong dotted-column convention, so the classifier is mechanical. The mapping (first match wins):
| Column pattern | → Slot |
|---|---|
| observation.images.<camera_key> | video (mp4 source) |
| action or action.<x> | actions |
| next.reward, next.done, next.success, next.<x> | episode_meta (rolled into per-episode metadata) |
| timestamp, frame_index, episode_index, index, task_index | internal (skipped) |
| observation.state | sensors |
| Any other observation.<x> | sensors |
| Anything else | sensors (safe default) |
Camera keys are read from info.json["features"] (any feature with
dtype: "video", falling back to the observation.images. prefix),
not from the parquet - LeRobot stores image data in
videos/.../<key>/...mp4 and references it by feature name only. The
classifier is a pure function - you can call
lerobot.classify_column("...") to sanity-check what the encoder
will do without writing anything to disk.
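
To make the first-match-wins ordering concrete, here is a minimal re-implementation of the table above. It is a sketch, not the adapter's source: classify_column_sketch is a hypothetical name, and the returned strings are assumed from the slot names in the table rather than taken from what lerobot.classify_column actually returns.

```python
# Bookkeeping columns the table marks as internal (skipped).
INTERNAL_COLUMNS = {"timestamp", "frame_index", "episode_index", "index", "task_index"}


def classify_column_sketch(name: str) -> str:
    """Map a LeRobot column name to a slot; the first matching rule wins."""
    if name.startswith("observation.images."):
        return "video"
    if name == "action" or name.startswith("action."):
        return "actions"
    if name.startswith("next."):
        return "episode_meta"
    if name in INTERNAL_COLUMNS:
        return "internal"
    # observation.state, any other observation.<x>, and everything else
    # fall through to the safe default.
    return "sensors"


for column in ("observation.images.cam_high", "action.gripper", "next.done", "index", "foo"):
    print(column, "->", classify_column_sketch(column))
```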
Multi-camera datasets
When a dataset has more than one observation.images.<key> feature,
the adapter tiles the per-camera mp4s horizontally into a single
video.mp4. Heights are black-padded so cameras with different
resolutions still align. Cameras are emitted in the order they
appear in info.json["features"], so the same dataset always
produces the same mosaic.
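
The geometry of that mosaic can be sketched without opencv. The helper below (mosaic_layout, a hypothetical name) assumes widths are left untouched and only heights are padded; where the padding sits vertically is not specified here, so the sketch computes just the canvas size and each camera's x offset.

```python
def mosaic_layout(sizes):
    """Compute canvas size and x offsets for horizontal tiling.

    `sizes` is a list of (width, height) pairs, one per camera, in the
    order the cameras appear in info.json["features"].
    """
    canvas_height = max(height for _, height in sizes)
    offsets = []
    x = 0
    for width, _ in sizes:
        offsets.append(x)  # left edge of this camera in the mosaic
        x += width
    return (x, canvas_height), offsets


canvas, offsets = mosaic_layout([(640, 480), (320, 240), (640, 360)])
print(canvas, offsets)  # (1600, 480) [0, 640, 960]
```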
If you only want one camera, pass canonical_camera:
```python
lerobot.upload_episode(
    "lerobot/aloha_static_cups_open",
    episode_index=0,
    canonical_camera="observation.images.cam_high",
    policy_version="aloha-v1",
)
```

On v2.x datasets, single-camera uploads skip the opencv code path entirely - no tile, no re-encode. The source mp4 is copied byte-for-byte and pushed to R2 as-is. On v3.0 datasets the camera still has to be trimmed out of its shared shard (see below), so opencv is required even for one camera.
How sensors / actions get packed
Each non-image column contributes a set of arrays into a single NPZ file per slot. Layout uses the column name as a namespace and preserves per-frame timestamps:
```text
sensors.npz
  observation.state/_t_ns              int64[N]       # nanosecond timestamps
  observation.state/value              float32[N, K]  # per-frame state vector
  observation.environment_state/_t_ns  int64[N]
  observation.environment_state/value  float32[N, M]
actions.npz
  action/_t_ns                         int64[N]
  action/value                         float32[N, A]
  action.gripper/_t_ns                 int64[N]
  action.gripper/value                 float32[N]
```

_t_ns is recovered from the parquet's timestamp column (LeRobot
stores it in seconds; we convert to nanoseconds for symmetry with
the ROS 2 adapter and the SDK ingest schema). Columns that aren't
1-D scalars or fixed-length lists of floats - e.g. structs, ragged
lists, strings - are skipped and recorded in
metadata.skipped_columns so you can spot them in the portal.
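
As a rough illustration of the key naming and the seconds-to-nanoseconds conversion described above, the sketch below builds the two entries for one column. pack_column is a hypothetical helper, and it uses plain Python lists to stay dependency-free, whereas the real files hold int64 / float32 arrays.

```python
def pack_column(name, timestamps_s, values):
    """Build the two NPZ entries for one column.

    Returns a dict keyed "<name>/_t_ns" and "<name>/value".
    """
    # LeRobot timestamps are float seconds; round to integer nanoseconds.
    t_ns = [round(t * 1_000_000_000) for t in timestamps_s]
    return {f"{name}/_t_ns": t_ns, f"{name}/value": list(values)}


packed = pack_column("action.gripper", [0.0, 0.02, 0.04], [0.1, 0.2, 0.3])
print(sorted(packed))                  # ['action.gripper/_t_ns', 'action.gripper/value']
print(packed["action.gripper/_t_ns"])  # [0, 20000000, 40000000]
```

Rounding rather than truncating matters here: float seconds such as 0.02 are not exactly representable, and truncation could land one nanosecond short.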
Episode outcome
next.reward, next.done, next.success and any other next.*
column don't go into actions.npz. They describe the episode's
outcome, so they're rolled up into the episode-level metadata
instead:
```json
{
  "adapter": "lerobot",
  "lerobot_repo_id": "lerobot/aloha_static_cups_open",
  "lerobot_codebase_version": "v2.1",
  "lerobot_episode_index": 0,
  "lerobot_episode_length": 250,
  "lerobot_tasks": ["pick up the cup"],
  "lerobot_episode_outcome": {
    "next.done": true,
    "next.reward": 0.42,
    "next.reward_sum": 87.5
  }
}
```

next.reward_sum is the trajectory's cumulative reward (LeRobot
stores per-step reward, so we sum once during encoding) - what
training pipelines usually want as a single quality signal per run.
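
The roll-up can be pictured with a small sketch. The summing of next.reward follows the description above; taking the last frame's value as the episode-level value for each next.* column is an assumption made for illustration, and summarize_outcome is a hypothetical name.

```python
def summarize_outcome(columns):
    """Roll per-step next.* columns into one episode-level dict.

    `columns` maps a column name to its list of per-frame values.
    Assumption: the episode-level value is the one on the last frame.
    """
    outcome = {}
    for name, values in columns.items():
        if not name.startswith("next.") or not values:
            continue  # only non-empty next.* columns describe the outcome
        outcome[name] = values[-1]
        if name == "next.reward":
            outcome["next.reward_sum"] = sum(values)
    return outcome


print(summarize_outcome({
    "next.reward": [0.0, 0.5, 1.0],
    "next.done": [False, False, True],
    "action": [1, 2, 3],
}))
# {'next.reward': 1.0, 'next.reward_sum': 1.5, 'next.done': True}
```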
Bulk uploads with progress
upload_dataset walks every trajectory by default. Pass
episode_indices= to upload a slice, and on_progress= to surface
per-episode progress in your own UI:
```python
def progress(done, total, episode, error):
    if error is not None:
        print(f" [{done}/{total}] FAILED: {error}")
    else:
        print(f" [{done}/{total}] {episode.id}")


lerobot.upload_dataset(
    "lerobot/aloha_static_cups_open",
    policy_version="aloha-v1",
    env_version="aloha-cell-1",
    episode_indices=range(0, 10),
    on_progress=progress,
)
```

Errors don't abort the loop by default - a single corrupted parquet
shouldn't kill a 50-episode upload. Pass stop_on_error=True to
fail fast.
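
The continue-on-error behaviour can be sketched as a generic loop. This is not the adapter's implementation: run_all and upload_one are hypothetical names, and only the on_progress(done, total, episode, error) signature and the stop_on_error flag come from the text above.

```python
def run_all(indices, upload_one, on_progress=None, stop_on_error=False):
    """Call `upload_one(idx)` for each index, reporting progress.

    Returns a list of (index, exception) pairs for the failures.
    """
    indices = list(indices)
    failures = []
    for done, idx in enumerate(indices, start=1):
        episode, error = None, None
        try:
            episode = upload_one(idx)
        except Exception as exc:  # record the failure and keep going
            error = exc
            failures.append((idx, exc))
        if on_progress is not None:
            on_progress(done, len(indices), episode, error)
        if error is not None and stop_on_error:
            raise error  # fail fast only when explicitly requested
    return failures
```

Collecting the failed indices makes a retry pass straightforward: feed them back in as the next run's indices.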
Encode-then-handle-it-yourself
encode_episode exposes the artifacts as files so you can inspect
or post-process before uploading:
```python
encoded = lerobot.encode_episode(
    "lerobot/aloha_static_cups_open",
    episode_index=0,
    output_dir="/tmp/encoded/",
)
print(encoded.duration_s, encoded.fps)
# 5.0 50.0
print([a.path for a in (encoded.video, encoded.sensors, encoded.actions) if a])
# [PosixPath('/tmp/encoded/video.mp4'),
#  PosixPath('/tmp/encoded/sensors.npz'),
#  PosixPath('/tmp/encoded/actions.npz')]
print(encoded.metadata["lerobot_episode_outcome"])
# {'next.done': True, 'next.reward': 0.95, 'next.reward_sum': 23.4}
```

Then drive start_episode / ep.upload(kind, path) directly.
Same plumbing upload_episode uses internally.
Format compatibility
| LeRobot version | Status | Notes |
|---|---|---|
| v2.0 / v2.1 | ✅ supported | One parquet per episode, one mp4 per episode per camera. Used by virtually every public lerobot/* Hub dataset through 2025. |
| v3.0 | ✅ supported | Multi-episode shards (lerobot >= 0.3.x). The adapter reads meta/episodes/*.parquet, slices the shared data parquet to one episode, and trims each camera clip out of the shared mp4. Video needs the [video] extra. |
| v4+ / unknown | ❌ clear error | Anything newer or unrecognized raises a ConfigurationError rather than guessing at a layout. |
Both layouts go through the same four verbs and produce the same
video.mp4 / sensors.npz / actions.npz shape - you don't pass a
version flag, the adapter detects it from info.json.
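
Version detection of this kind can be sketched as follows. The sketch assumes the version string lives under the codebase_version key of meta/info.json; detect_layout is a hypothetical name, and ValueError stands in for the adapter's ConfigurationError.

```python
import json


def detect_layout(info_json_text: str) -> str:
    """Return "v2" or "v3" from the contents of meta/info.json."""
    version = json.loads(info_json_text).get("codebase_version", "")
    if version in ("v2.0", "v2.1"):
        return "v2"
    if version == "v3.0":
        return "v3"
    # The adapter raises ConfigurationError here; ValueError is a stand-in.
    raise ValueError(f"unsupported LeRobot codebase_version: {version!r}")


print(detect_layout('{"codebase_version": "v2.1", "fps": 50}'))  # v2
```

Refusing unknown versions outright, rather than falling back to the nearest known layout, keeps a future format change from silently producing wrong episodes.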
How v3.0 differs on disk
v3.0 stopped storing one file per episode. Instead, many episodes are concatenated into shared parquet/mp4 shards, and each episode's location is recorded as relational metadata:
```text
my_dataset/
├── meta/
│   ├── info.json                                # + data_path / video_path templates
│   ├── tasks.parquet
│   └── episodes/chunk-000/file-000.parquet      # per-episode rows + locators
├── data/chunk-000/file-000.parquet              # many episodes concatenated
└── videos/<camera_key>/chunk-000/file-000.mp4
```

Each episode row carries data/chunk_index + data/file_index (which
data shard holds it), dataset_from_index / dataset_to_index (its
row range), and per camera videos/<key>/{chunk_index, file_index, from_timestamp, to_timestamp} (which mp4 shard and the
[from, to) window inside it). The adapter exposes these on
EpisodeMeta as data_chunk_index, dataset_from_index,
video_locators, etc. - so scan_dataset stays a cheap metadata-only
call and the heavy shards are only fetched at encode_episode time.
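
How those locators turn into one episode's data can be sketched in plain Python. Both helpers are hypothetical, and the sketch rests on two assumptions: that dataset_from_index / dataset_to_index refer to the global index column of the data shard, and that the video window maps to frames by multiplying by fps.

```python
def slice_episode_rows(rows, from_index, to_index):
    """Keep rows whose global `index` falls in [from_index, to_index)."""
    return [r for r in rows if from_index <= r["index"] < to_index]


def frame_window(from_timestamp, to_timestamp, fps):
    """Convert a [from, to) window in seconds to a half-open frame range."""
    return round(from_timestamp * fps), round(to_timestamp * fps)


# A toy shard holding three episodes of three frames each.
rows = [{"index": i, "episode_index": i // 3} for i in range(9)]
print(slice_episode_rows(rows, 3, 6))   # the three rows of episode 1
print(frame_window(5.0, 10.0, 50))      # (250, 500)
```

Both ranges are half-open, so consecutive episodes in a shard neither overlap nor leave a gap.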
Errors
| Exception | When |
|---|---|
| ConfigurationError | Repo / path doesn't exist, format is v4+/unrecognized, parquet/mp4 missing, or pyarrow / huggingface_hub aren't installed |
| AuthError | API key bad / revoked (raised by the underlying start_episode) |
| ValidationError | Server rejected the create payload |
| TransportError | Network / DNS / timeout during the create or upload |
If an upload fails partway through, the adapter (via
Client.start_episode's standard handling) flips the run to
status="failed" with the failure reason in
metadata.failure_reason before re-raising - so you don't end up
with ghostly "recording" runs in the portal.