HDF5 adapter

Reads imitation-learning HDF5 files and turns each trajectory into a RoboTrace episode. Its only dependencies are h5py and numpy (no robomimic, LeRobot, or torch), and it covers the two layouts that dominate behavior cloning today:

  • robomimic - one file, many trajectories under data/demo_0, data/demo_1, … Each demo holds actions, an obs group (proprioception + camera image stacks), rewards, dones, states. One demo → one episode.
  • ALOHA / ACT - one file per episode: /action plus an /observations group (qpos, qvel, effort, and images/<camera> stacks). The whole file → one episode.

from robotrace.adapters import hdf5
 
# ALOHA single-episode file → one RoboTrace episode.
hdf5.upload_episode(
    "episode_0.hdf5",
    policy_version="act-v1",
    env_version="aloha-cell-1",
    fps=50,
)
 
# robomimic multi-demo file → one episode per demo.
hdf5.upload_dataset("low_dim.hdf5", policy_version="bc-v3", fps=20)

Install

# Proprioception + actions only (no video).
pip install 'robotrace-dev[hdf5]==0.3.0'
 
# With image streams → MP4 encoding.
pip install 'robotrace-dev[hdf5,video]==0.3.0'

[hdf5] pulls in h5py (a thin wrapper around libhdf5) and numpy, a few MB in total. [video] adds opencv-python, which encodes (T, H, W, C) image datasets into video.mp4. Sensor-only files never need opencv.

The four verbs

Verb | What it does
hdf5.scan_file(path, fps=...) | Read-only introspection. Returns a FileSummary with the detected layout, trajectory count, fps, robot, and camera datasets. No frames are decoded, so it is safe on multi-GB files.
hdf5.encode_episode(path, out, episode_index=...) | Encodes one trajectory into video.mp4, sensors.npz, and actions.npz under out and returns an EncodedEpisode. Makes no network calls.
hdf5.upload_episode(path, ...) | One-shot: encodes one trajectory to a temporary directory, then runs start_episode + upload_* + finalize. Returns the finalized Episode.
hdf5.upload_dataset(path, ...) | Bulk: walks every trajectory in a multi-demo file and uploads each one. Returns the list of finalized Episodes and accepts an on_progress callback.

Start with scan_file to confirm the layout and fps before uploading:

summary = hdf5.scan_file("low_dim.hdf5")
print(summary.report())
# low_dim.hdf5
#   layout: robomimic
#   trajectories: 200
#   fps: 20
#   robot_type: Panda
#   env: Lift
#   cameras: obs/agentview_image, obs/robot0_eye_in_hand_image
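scan_file can report the layout without decoding any frames because the two layouts differ in their dataset paths alone. As an illustration only, here is a hypothetical detect_layout helper (not the adapter's code) that works on a list of dataset paths; the "unknown" fallback and the numeric sorting of demo keys are assumptions of this sketch.

```python
import re

# Hypothetical sketch: tell robomimic and ALOHA files apart by dataset paths.
# This is NOT robotrace's implementation; names and fallback are assumptions.
DEMO_RE = re.compile(r"^data/(demo_\d+)/")


def detect_layout(paths):
    """Return (layout, trajectory_keys) for a list of HDF5 object paths."""
    paths = list(paths)
    demos = set()
    for p in paths:
        m = DEMO_RE.match(p)
        if m:
            demos.add(m.group(1))
    if demos:
        # Sort numerically so demo_2 comes before demo_10.
        return "robomimic", sorted(demos, key=lambda k: int(k.split("_")[1]))
    if "action" in paths and any(p.startswith("observations/") for p in paths):
        return "aloha", ["/"]  # the root group is the single episode
    return "unknown", []
```

With h5py, the paths of an open file f can be collected with paths = []; f.visit(paths.append), since Group.visit calls its argument with every object name and keeps going while it returns None.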

Slot mapping

Each dataset name within a trajectory is routed to a slot by classify_dataset, a pure function you can call directly to check where a given name will land or to pin that behavior in a test:

Dataset name | Slot
action, actions, action_dict/* | actions.npz
observations/images/<cam>, *_image, *_rgb, *_depth | video.mp4
rewards, dones, success, discount | episode metadata
qpos, qvel, robot0_eef_pos, states, anything else | sensors.npz
timestamp, frame_index, index | dropped (bookkeeping)

A dataset whose name looks like an image but which is not a (T, H, W, C) uint8 stack falls back to sensors.npz, and a note explaining why is recorded in skipped_datasets.
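The routing table, including the image fallback rule, can be restated as a small function. The sketch below is a hypothetical re-implementation for illustration: route_dataset, its (name, shape, dtype) arguments, and the order in which rules are checked are assumptions, and the real classify_dataset's signature may differ.

```python
import posixpath

# Hypothetical re-statement of the slot-mapping table; not the adapter's code.
ACTION_NAMES = {"action", "actions"}
METADATA_NAMES = {"rewards", "dones", "success", "discount"}
DROPPED_NAMES = {"timestamp", "frame_index", "index"}
IMAGE_SUFFIXES = ("_image", "_rgb", "_depth")


def route_dataset(name, shape, dtype):
    """Return (slot, note) for a dataset path relative to its trajectory.

    dtype is a string such as str(dataset.dtype), e.g. "uint8".
    """
    base = posixpath.basename(name)
    if base in ACTION_NAMES or name.startswith("action_dict/"):
        return "actions.npz", None
    if base in METADATA_NAMES:
        return "episode metadata", None
    if base in DROPPED_NAMES:
        return "dropped", None
    looks_like_image = (
        name.startswith("observations/images/") or base.endswith(IMAGE_SUFFIXES)
    )
    if looks_like_image:
        if len(shape) == 4 and dtype == "uint8":  # (T, H, W, C) uint8 stack
            return "video.mp4", None
        # Image-like name, wrong shape or dtype: fall back to sensors with a note.
        return "sensors.npz", f"{name}: image-like name but shape={shape}, dtype={dtype}"
    return "sensors.npz", None
```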

NPZ layout

Proprioception lands in sensors.npz, actions in actions.npz, using the same namespaced layout as the ROS 2, LeRobot, and Gymnasium adapters:

observations/qpos/value    float32[T, K]   # flattened per-step values
observations/qpos/_t_ns    int64[T]        # synthetic step clock
action/value               float32[T, action_dim]

Each dataset is read as (T, …) and reshaped to (T, K), where T is the number of steps and K is the product of the remaining dimensions.
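In other words, everything after the time axis is multiplied out. A sketch of the shape arithmetic with a hypothetical flattened_shape helper (not part of the adapter's API); on a NumPy array the same reshape is arr.reshape(len(arr), -1).

```python
import math


def flattened_shape(shape):
    """(T, ...) -> (T, K), where K is the product of the trailing dimensions."""
    if not shape:
        raise ValueError("expected at least a time axis")
    t, *rest = shape
    # math.prod of an empty list is 1, so a per-step scalar (T,) becomes (T, 1).
    return (t, math.prod(rest))
```

Under this rule a (T, 3, 4) dataset becomes (T, 12), and a per-step scalar such as (T,) becomes (T, 1).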

Timestamps & fps

HDF5 imitation files rarely store a per-step clock - the spacing is uniform by construction - so timestamps are synthesised from fps. Pass the real capture rate (ALOHA is typically 50, robomimic 20) via fps=. robomimic's control_freq is read from data.attrs["env_args"] automatically. When nothing declares a rate, the adapter assumes 30 and sets fps_assumed: true in the episode metadata.
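The rate resolution and the synthetic clock can be sketched as follows. resolve_fps and synthetic_clock are hypothetical helpers. The precedence (explicit fps= first, then control_freq, then the 30 fallback) and the assumption that control_freq sits under env_kwargs inside the env_args JSON string are choices made for this sketch, not documented adapter behavior. Timestamps are integer nanoseconds, matching the int64 _t_ns arrays in the NPZ layout.

```python
import json

FALLBACK_FPS = 30


def resolve_fps(fps=None, data_attrs=None):
    """Return (fps, fps_assumed) following the rules described above."""
    if fps is not None:  # assumption: an explicit caller value wins
        return fps, False
    env_args = (data_attrs or {}).get("env_args")
    if env_args is not None:
        # Assumption: env_args is a JSON string with env_kwargs.control_freq.
        control_freq = json.loads(env_args).get("env_kwargs", {}).get("control_freq")
        if control_freq:
            return control_freq, False
    return FALLBACK_FPS, True  # nothing declared a rate


def synthetic_clock(num_steps, fps):
    """Uniform per-step timestamps in ns: t_i = round(i * 1e9 / fps)."""
    return [round(i * 1e9 / fps) for i in range(num_steps)]
```

At 50 fps consecutive steps are 20 ms apart; at the 30 fps fallback the spacing is 33.3 ms, rounded to whole nanoseconds per step.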

Images & color order

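Image datasets are encoded into a single video.mp4. With one camera the frames are written as-is; with several, each video frame is a horizontal tile of every camera's frame at the same step index. Stored arrays are assumed to be RGB and are converted to BGR, the channel order OpenCV's encoder expects; pass image_color="bgr" if your file already stores BGR. To encode a single camera instead of the tile, name it with canonical_camera: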

hdf5.upload_episode(
    "episode_0.hdf5",
    fps=50,
    canonical_camera="observations/images/top",  # one camera, skip the tile
)
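To make the index-aligned horizontal tile and the RGB to BGR swap concrete, here is a toy sketch on nested Python lists. tile_step and to_bgr are hypothetical names, and the equal-height check is an assumption of the sketch (how the adapter handles cameras of different sizes is not covered here). With NumPy (H, W, C) arrays, the equivalents would be np.concatenate(frames, axis=1) and frame[..., ::-1].

```python
from itertools import chain

# Toy representation for illustration only: a camera is a list of frames,
# a frame is a list of rows, a row is a list of (c0, c1, c2) pixel tuples.


def to_bgr(frame):
    """Reverse the channel order of every pixel (RGB -> BGR)."""
    return [[pixel[::-1] for pixel in row] for row in frame]


def tile_step(cameras, index):
    """Place frame `index` of every camera side by side (horizontal tile)."""
    frames = [camera[index] for camera in cameras]
    if len({len(frame) for frame in frames}) != 1:
        raise ValueError("sketch assumes every camera has the same height")
    # zip pairs up row y of each frame; chain concatenates them left to right.
    return [list(chain.from_iterable(rows)) for rows in zip(*frames)]
```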

Episode metadata

The encoder merges facts read from the HDF5 file into the episode metadata. For demo_0 of low_dim.hdf5:

{
  "adapter": "hdf5",
  "hdf5_layout": "robomimic",
  "hdf5_source": "low_dim.hdf5",
  "hdf5_episode_index": 0,
  "hdf5_episode_key": "demo_0",
  "hdf5_trajectory_length": 137,
  "hdf5_robot_type": "Panda",
  "hdf5_env": "Lift",
  "hdf5_episode_outcome": { "dones": 1, "reward_sum": 4.0 }
}

Reproducibility fields (policy_version, env_version, git_sha, seed) come from the caller, same as every other adapter.

Defaults

Parameter | Default
source | "replay"
episode_index | 0
fps | read from file, else 30 (assumed)
image_color | "rgb"

Roadmap

Not yet shipped:

  • RLDS / Open X-Embodiment (TFDS) import
  • Streaming row-group reads for files that don't fit in memory

See also: log_episode for raw NumPy logging, LeRobot adapter, Gymnasium adapter.