LeRobot adapter

Reads Hugging Face LeRobot datasets (formats v2.0 / v2.1 / v3.0) and creates one RoboTrace episode per trajectory. The install footprint is small: the adapter does not depend on the heavy lerobot PyPI package (which would pull in torch, torchvision, pyav, and several CUDA wheels). Instead it reads the on-disk format directly with pyarrow + huggingface_hub, so the install adds about 20 MB on top of the base SDK.

from robotrace.adapters import lerobot
 
# Upload every trajectory in a Hub dataset as its own RoboTrace episode.
lerobot.upload_dataset(
    "lerobot/aloha_static_cups_open",
    policy_version="aloha-v1",
    env_version="aloha-cell-1",
)

That's the whole 95% case. Read on for the four explicit verbs, column auto-classification, multi-camera handling, and how this maps to LeRobot's data model.

Install

# Without opencv: sensor / action-only uploads, or v2.x uploads pinned with canonical_camera.
pip install 'robotrace-dev[lerobot]==0.3.0'
 
# With multi-camera horizontal tiling (most LeRobot datasets have 2+ cams).
pip install 'robotrace-dev[lerobot,video]==0.3.0'

Pinning an exact version is the most reliable way to install while the SDK is pre-1.0. The pin can be dropped once we cut 1.0.

[lerobot] pulls in huggingface_hub, pyarrow, and numpy. The [video] extra adds opencv-python for tiling multiple cameras into one video.mp4.

For v2.x datasets, single-camera uploads (or any canonical_camera="..."-pinned upload) don't need [video] because the source mp4 is copied byte-for-byte without re-encoding. For v3.0 datasets, video always needs [video]: episodes are concatenated into shared mp4 shards, so even a single camera has to be trimmed out by timestamp - there's no per-episode file to copy. Sensor / action-only v3.0 uploads still skip opencv.
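
The rules above reduce to a small decision function. The sketch below is illustrative only: needs_video_extra is a hypothetical helper, not part of the adapter, and it assumes you already know the dataset's format version and how many cameras will end up in video.mp4 (0 for sensor / action-only uploads, 1 when canonical_camera pins one).

```python
def needs_video_extra(codebase_version: str, cameras_uploaded: int) -> bool:
    """Return True when the [video] extra (opencv) is required.

    Hypothetical helper that mirrors the prose rules; not an adapter API.
    """
    if cameras_uploaded == 0:
        # Sensor / action-only upload: no video work at all.
        return False
    if codebase_version.startswith("v3."):
        # v3.0: the clip must be trimmed out of a shared mp4 shard.
        return True
    # v2.x: a single camera is copied byte-for-byte; only tiling needs opencv.
    return cameras_uploaded > 1


print(needs_video_extra("v2.1", 1))  # single pinned camera on v2.x
print(needs_video_extra("v3.0", 1))  # single camera on v3.0
```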

The four verbs

Verb | What it does
--- | ---
lerobot.scan_dataset(repo) | Read-only introspection. Pulls only meta/* from the Hub. Returns a DatasetSummary with fps, episode count, frame count, camera list, and per-episode lengths and tasks.
lerobot.encode_episode(repo, idx, out) | Fetch one episode's parquet + per-camera mp4s, write video.mp4 / sensors.npz / actions.npz into out. Returns an EncodedEpisode with the file paths and provenance metadata. No upload.
lerobot.upload_episode(repo, idx, ...) | One-shot for a single trajectory: scan → encode to a tempdir → start_episode + ep.upload(kind, path) + finalize. Returns the finalized Episode.
lerobot.upload_dataset(repo, ...) | Bulk: walk every trajectory (or a subset) and call upload_episode for each. Sequential: one episode at a time, each in a fresh tempdir, so disk usage stays at one trajectory's worth at any moment.

scan_dataset is the dry-run - most users start there to see how many episodes the adapter would upload before paying the network cost.

summary = lerobot.scan_dataset("lerobot/aloha_static_cups_open")
print(summary.report())
# lerobot/aloha_static_cups_open  (hub, v2.1, 50 fps)
#   episodes: 50, frames: 12500
#   cameras: observation.images.cam_high, observation.images.cam_low_left, observation.images.cam_low_right
#   features: action, observation.state, next.reward, next.done

If it looks right, swap scan_dataset for upload_dataset and you're done.

Local datasets vs. Hub datasets

The first argument can be either a Hub repo id (namespace/dataset-name) or a local directory containing the meta/, data/, videos/ layout. Resolution is automatic - anything that exists on disk wins, otherwise we hit the Hub.

# Hub dataset (downloads files lazily, caches in ~/.cache/huggingface).
lerobot.upload_dataset("lerobot/pusht", policy_version="pusht-v1")
 
# Local dataset on a workstation.
lerobot.upload_dataset("/data/robot_runs/2026-05-10/", policy_version="pusht-v1")
 
# Pin the Hub revision to a specific commit / tag / branch.
lerobot.upload_dataset(
    "lerobot/aloha_static_cups_open",
    revision="v2.1",
    policy_version="aloha-v1",
)
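
The "anything that exists on disk wins" rule can be pictured as a single path check. This is a minimal sketch of that idea, not the adapter's actual code: resolve_source is a hypothetical name, and the real resolution may handle more cases (for example validating the meta/ layout).

```python
import tempfile
from pathlib import Path


def resolve_source(repo_or_path: str) -> str:
    """Return "local" if the argument exists on disk, otherwise "hub".

    Hypothetical helper illustrating the documented precedence.
    """
    return "local" if Path(repo_or_path).expanduser().exists() else "hub"


# A directory that exists is treated as a local dataset.
with tempfile.TemporaryDirectory() as local_dir:
    local_kind = resolve_source(local_dir)

# A string that matches nothing on disk falls through to the Hub.
hub_kind = resolve_source("no-such-namespace-xyz/no-such-dataset-xyz")
print(local_kind, hub_kind)
```

One consequence worth keeping in mind: under this rule a relative directory that happens to be named like a repo id would shadow the Hub dataset, so an absolute path is the unambiguous choice for local data.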

For private or gated datasets, set HF_TOKEN in your environment - huggingface_hub reads it automatically.

Column auto-classification

LeRobot datasets use a strong dotted-column convention, so the classifier is mechanical. The mapping (first match wins):

Column pattern | Slot
--- | ---
observation.images.<camera_key> | video (mp4 source)
action or action.<x> | actions
next.reward, next.done, next.success, next.<x> | episode_meta (rolled into per-episode metadata)
timestamp, frame_index, episode_index, index, task_index | internal (skipped)
observation.state | sensors
Any other observation.<x> | sensors
Anything else | sensors (safe default)

Camera keys are read from info.json["features"] (any feature with dtype: "video", falling back to the observation.images. prefix), not from the parquet - LeRobot stores image data in videos/.../<key>/...mp4 and references it by feature name only. The classifier is a pure function - you can call lerobot.classify_column("...") to sanity-check what the encoder will do without writing anything to disk.
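
To make the first-match-wins order concrete, here is a standalone re-implementation of the mapping table. It is a sketch for illustration, not the adapter's source: classify_sketch is a hypothetical name, and returning the slot as a plain string is an assumption (the return type of lerobot.classify_column is not specified here).

```python
# Bookkeeping columns that are skipped rather than packed.
INTERNAL_COLUMNS = {"timestamp", "frame_index", "episode_index", "index", "task_index"}


def classify_sketch(column: str) -> str:
    """Map a LeRobot column name to a slot; rules are checked top to bottom."""
    if column.startswith("observation.images."):
        return "video"
    if column == "action" or column.startswith("action."):
        return "actions"
    if column.startswith("next."):
        return "episode_meta"
    if column in INTERNAL_COLUMNS:
        return "internal"
    # observation.state, any other observation.<x>, and everything else.
    return "sensors"


for name in ("observation.images.cam_high", "action.gripper", "next.done",
             "frame_index", "observation.state", "some_custom_column"):
    print(f"{name:32s} -> {classify_sketch(name)}")
```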

Multi-camera datasets

When a dataset has more than one observation.images.<key> feature, the adapter tiles the per-camera mp4s horizontally into a single video.mp4. Heights are black-padded so cameras with different resolutions still align. Cameras are emitted in the order they appear in info.json["features"], so the same dataset always produces the same mosaic.
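
The geometry of the mosaic follows directly from that description: widths add up, the canvas takes the tallest camera's height, and each camera starts where the previous one ended. The sketch below computes only that layout. TileLayout and tile_layout are hypothetical names, and where the black padding goes vertically (top, bottom, or split) is not stated above, so the sketch leaves it out.

```python
from typing import NamedTuple


class TileLayout(NamedTuple):
    width: int            # total mosaic width in pixels
    height: int           # mosaic height (tallest camera)
    x_offsets: list[int]  # left edge of each camera, in input order


def tile_layout(sizes: list[tuple[int, int]]) -> TileLayout:
    """Compute a horizontal mosaic layout from (width, height) pairs.

    Sizes must be given in the order the cameras appear in info.json.
    """
    if not sizes:
        raise ValueError("at least one camera is required")
    x_offsets = []
    x = 0
    for width, _height in sizes:
        x_offsets.append(x)
        x += width
    tallest = max(height for _width, height in sizes)
    return TileLayout(width=x, height=tallest, x_offsets=x_offsets)


layout = tile_layout([(640, 480), (320, 240), (640, 480)])
print(layout)
```

In this example the 320x240 camera occupies a 320-pixel-wide column of a 480-pixel-tall canvas, so 240 rows of that column are black padding.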

If you only want one camera, pass canonical_camera:

lerobot.upload_episode(
    "lerobot/aloha_static_cups_open",
    episode_index=0,
    canonical_camera="observation.images.cam_high",
    policy_version="aloha-v1",
)

On v2.x datasets, single-camera uploads skip the opencv code path entirely - no tile, no re-encode. The source mp4 is copied byte-for-byte and pushed to R2 as-is. On v3.0 datasets the camera still has to be trimmed out of its shared shard (see below), so opencv is required even for one camera.

How sensors / actions get packed

Each non-image column contributes a set of arrays into a single NPZ file per slot. Layout uses the column name as a namespace and preserves per-frame timestamps:

sensors.npz
  observation.state/_t_ns       int64[N]            # nanosecond timestamps
  observation.state/value       float32[N, K]       # per-frame state vector
  observation.environment_state/_t_ns  int64[N]
  observation.environment_state/value  float32[N, M]
 
actions.npz
  action/_t_ns                  int64[N]
  action/value                  float32[N, A]
  action.gripper/_t_ns          int64[N]
  action.gripper/value          float32[N]

_t_ns is recovered from the parquet's timestamp column (LeRobot stores it in seconds; we convert to nanoseconds for symmetry with the ROS 2 adapter and the SDK ingest schema). Columns that aren't per-frame scalars or fixed-length lists of floats (e.g. structs, ragged lists, strings) are skipped and recorded in metadata.skipped_columns so you can spot them in the portal.
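
A minimal sketch of the two conventions described here, the seconds-to-nanoseconds conversion and the <column>/_t_ns + <column>/value key naming. Both helpers are hypothetical, and the use of rounding is an assumption about how one would do the conversion, not a statement about the adapter's internals.

```python
def seconds_to_ns(timestamps_s: list[float]) -> list[int]:
    """Convert float seconds to integer nanoseconds.

    Rounds instead of truncating, because a float product can land just
    below the intended integer and truncation would then lose 1 ns.
    """
    return [round(t * 1_000_000_000) for t in timestamps_s]


def npz_keys(column: str) -> tuple[str, str]:
    """Return the (timestamps, values) array names for one column."""
    return f"{column}/_t_ns", f"{column}/value"


# Three frames of a 50 fps episode.
t_ns = seconds_to_ns([0.0, 0.02, 0.04])
print(t_ns)
print(npz_keys("observation.state"))
```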

Episode outcome

next.reward, next.done, next.success and any other next.* column don't go into actions.npz. They describe the episode's outcome, so they're rolled up into the episode-level metadata instead:

{
  "adapter": "lerobot",
  "lerobot_repo_id": "lerobot/aloha_static_cups_open",
  "lerobot_codebase_version": "v2.1",
  "lerobot_episode_index": 0,
  "lerobot_episode_length": 250,
  "lerobot_tasks": ["pick up the cup"],
  "lerobot_episode_outcome": {
    "next.done": true,
    "next.reward": 0.42,
    "next.reward_sum": 87.5
  }
}

next.reward_sum is the trajectory's cumulative reward (LeRobot stores per-step reward, so we sum once during encoding) - what training pipelines usually want as a single quality signal per run.
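
As a rough illustration of that roll-up, the sketch below sums a per-step next.reward column into a single value. The rows are made-up sample data, and using math.fsum is our choice for the example. How the adapter actually accumulates the sum is not specified here.

```python
import math

# Made-up per-step rows, shaped like the parquet's next.reward column.
rows = [
    {"next.reward": 0.0},
    {"next.reward": 0.5},
    {"next.reward": 1.0},
    {"next.reward": 1.0},
]

# math.fsum keeps the running sum accurate for long trajectories.
outcome = {"next.reward_sum": math.fsum(row["next.reward"] for row in rows)}
print(outcome)
```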

Bulk uploads with progress

upload_dataset walks every trajectory by default. Pass episode_indices= to upload a slice, and on_progress= to surface per-episode progress in your own UI:

def progress(done, total, episode, error):
    if error is not None:
        print(f"  [{done}/{total}] FAILED: {error}")
    else:
        print(f"  [{done}/{total}] {episode.id}")
 
lerobot.upload_dataset(
    "lerobot/aloha_static_cups_open",
    policy_version="aloha-v1",
    env_version="aloha-cell-1",
    episode_indices=range(0, 10),
    on_progress=progress,
)

Errors don't abort the loop by default - a single corrupted parquet shouldn't kill a 50-episode upload. Pass stop_on_error=True to fail fast.
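
The continue-on-error behavior is the usual "collect and keep going" loop. The sketch below shows that pattern with the same on_progress(done, total, episode, error) callback shape. run_all and fake_upload are hypothetical stand-ins, and details such as the return value or whether the callback fires before a fail-fast raise are assumptions, not documented behavior of upload_dataset.

```python
def run_all(indices, upload_one, on_progress=None, stop_on_error=False):
    """Call upload_one(idx) for each index, reporting progress per episode."""
    indices = list(indices)
    total = len(indices)
    uploaded = []
    for done, idx in enumerate(indices, start=1):
        episode, error = None, None
        try:
            episode = upload_one(idx)
            uploaded.append(episode)
        except Exception as exc:  # one bad episode should not kill the batch
            error = exc
        if on_progress is not None:
            on_progress(done, total, episode, error)
        if error is not None and stop_on_error:
            raise error  # fail fast when requested
    return uploaded


def fake_upload(idx):
    """Stand-in for a real upload; episode 1 simulates a corrupted parquet."""
    if idx == 1:
        raise ValueError("corrupted parquet")
    return f"episode-{idx}"


events = []
uploaded = run_all(
    range(3),
    fake_upload,
    on_progress=lambda done, total, ep, err: events.append((done, total, ep, err is not None)),
)
print(uploaded)
print(events)
```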

Encode-then-handle-it-yourself

encode_episode exposes the artifacts as files so you can inspect or post-process before uploading:

encoded = lerobot.encode_episode(
    "lerobot/aloha_static_cups_open",
    episode_index=0,
    output_dir="/tmp/encoded/",
)
 
print(encoded.duration_s, encoded.fps)
# 5.0 50.0
print([a.path for a in (encoded.video, encoded.sensors, encoded.actions) if a])
# [PosixPath('/tmp/encoded/video.mp4'),
#  PosixPath('/tmp/encoded/sensors.npz'),
#  PosixPath('/tmp/encoded/actions.npz')]
print(encoded.metadata["lerobot_episode_outcome"])
# {'next.done': True, 'next.reward': 0.42, 'next.reward_sum': 87.5}

Then drive start_episode / ep.upload(kind, path) directly. This is the same plumbing upload_episode uses internally.

Format compatibility

LeRobot version | Status | Notes
--- | --- | ---
v2.0 / v2.1 | ✅ supported | One parquet per episode, one mp4 per episode per camera. Used by virtually every public lerobot/* Hub dataset through 2025.
v3.0 | ✅ supported | Multi-episode shards (lerobot >= 0.3.x). The adapter reads meta/episodes/*.parquet, slices the shared data parquet to one episode, and trims each camera clip out of the shared mp4. Video needs the [video] extra.
v4+ / unknown | ❌ clear error | Anything newer or unrecognized raises a ConfigurationError rather than guessing at a layout.

Both layouts go through the same four verbs and produce the same video.mp4 / sensors.npz / actions.npz shape - you don't pass a version flag, the adapter detects it from info.json.
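
Detection of this kind usually comes down to reading one field and refusing anything unknown. The sketch below assumes the version lives under the codebase_version key of info.json (the same value surfaced as lerobot_codebase_version in the episode metadata). detect_layout is a hypothetical name, and ValueError stands in for the adapter's ConfigurationError.

```python
import json

PER_EPISODE_VERSIONS = {"v2.0", "v2.1"}  # one parquet / mp4 per episode
SHARDED_VERSIONS = {"v3.0"}              # many episodes per shard


def detect_layout(info_json_text: str) -> str:
    """Return "per-episode" or "sharded" from the text of meta/info.json."""
    version = json.loads(info_json_text).get("codebase_version")
    if version in PER_EPISODE_VERSIONS:
        return "per-episode"
    if version in SHARDED_VERSIONS:
        return "sharded"
    # Refuse to guess at layouts we do not recognize.
    raise ValueError(f"unsupported LeRobot codebase_version: {version!r}")


print(detect_layout('{"codebase_version": "v2.1", "fps": 50}'))
print(detect_layout('{"codebase_version": "v3.0", "fps": 50}'))
```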

How v3.0 differs on disk

v3.0 stopped storing one file per episode. Instead, many episodes are concatenated into shared parquet/mp4 shards, and each episode's location is recorded as relational metadata:

my_dataset/
├── meta/
│   ├── info.json                          # + data_path / video_path templates
│   ├── tasks.parquet
│   └── episodes/chunk-000/file-000.parquet # per-episode rows + locators
├── data/chunk-000/file-000.parquet         # many episodes concatenated
└── videos/<camera_key>/chunk-000/file-000.mp4

Each episode row carries data/chunk_index + data/file_index (which data shard holds it), dataset_from_index / dataset_to_index (its row range), and per camera videos/<key>/{chunk_index, file_index, from_timestamp, to_timestamp} (which mp4 shard and the [from, to) window inside it). The adapter exposes these on EpisodeMeta as data_chunk_index, dataset_from_index, video_locators, etc. - so scan_dataset stays a cheap metadata-only call and the heavy shards are only fetched at encode_episode time.
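
To show how those locators turn into concrete reads, the sketch below resolves one episode row to a data shard, a row slice, a video shard, and a time window. It is illustrative only: locate_episode is a hypothetical helper, the sample row is made up, the two path templates are assumptions modeled on the directory tree above (a real dataset supplies them in info.json), and treating the row range as half-open is also an assumption.

```python
# Assumed templates, shaped like the layout shown above.
DATA_PATH = "data/chunk-{chunk_index:03d}/file-{file_index:03d}.parquet"
VIDEO_PATH = "videos/{video_key}/chunk-{chunk_index:03d}/file-{file_index:03d}.mp4"


def locate_episode(row: dict, camera_key: str) -> dict:
    """Resolve one meta/episodes row to the shards and ranges to read."""
    prefix = f"videos/{camera_key}"
    return {
        "data_file": DATA_PATH.format(
            chunk_index=row["data/chunk_index"],
            file_index=row["data/file_index"],
        ),
        # Rows of the shared parquet that belong to this episode.
        "row_slice": slice(row["dataset_from_index"], row["dataset_to_index"]),
        "video_file": VIDEO_PATH.format(
            video_key=camera_key,
            chunk_index=row[f"{prefix}/chunk_index"],
            file_index=row[f"{prefix}/file_index"],
        ),
        # [from, to) window in seconds inside the shared mp4.
        "video_window_s": (row[f"{prefix}/from_timestamp"], row[f"{prefix}/to_timestamp"]),
    }


# Made-up row: the second 250-frame episode of a 50 fps dataset.
row = {
    "data/chunk_index": 0,
    "data/file_index": 0,
    "dataset_from_index": 250,
    "dataset_to_index": 500,
    "videos/observation.images.cam_high/chunk_index": 0,
    "videos/observation.images.cam_high/file_index": 0,
    "videos/observation.images.cam_high/from_timestamp": 5.0,
    "videos/observation.images.cam_high/to_timestamp": 10.0,
}
loc = locate_episode(row, "observation.images.cam_high")
print(loc)
```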

Errors

Exception | When
--- | ---
ConfigurationError | Repo / path doesn't exist, format is v4+/unrecognized, parquet/mp4 missing, or pyarrow / huggingface_hub aren't installed
AuthError | API key bad / revoked (raised by the underlying start_episode)
ValidationError | Server rejected the create payload
TransportError | Network / DNS / timeout during the create or upload

If an upload fails partway through, the adapter (via Client.start_episode's standard handling) flips the run to status="failed" with the failure reason in metadata.failure_reason before re-raising, so you don't end up with orphaned runs stuck in "recording" in the portal.