HDF5 adapter

Reads imitation-learning HDF5 files and turns each trajectory into a RoboTrace episode. It depends only on h5py (and NumPy, which h5py itself requires), not on robomimic, lerobot, or torch, and covers the two layouts that dominate behavior cloning today:

robomimic - one file, many trajectories under data/demo_0, data/demo_1, … Each demo holds actions, an obs group (proprioception + camera image stacks), rewards, dones, states. One demo → one episode.
ALOHA / ACT - one file per episode: /action plus an /observations group (qpos, qvel, effort, and images/<camera> stacks). The whole file → one episode.

from robotrace.adapters import hdf5
 
# ALOHA single-episode file → one RoboTrace episode.
hdf5.upload_episode(
    "episode_0.hdf5",
    policy_version="act-v1",
    env_version="aloha-cell-1",
    fps=50,
)
 
# robomimic multi-demo file → one episode per demo.
hdf5.upload_dataset("low_dim.hdf5", policy_version="bc-v3", fps=20)

Install

# Proprioception + actions only (no video).
pip install 'robotrace-dev[hdf5]==0.3.0'
 
# With image streams → MP4 encoding.
pip install 'robotrace-dev[hdf5,video]==0.3.0'

[hdf5] pulls in h5py and numpy (a few MB; h5py is a thin libhdf5 wrapper). [video] adds opencv-python, which encodes (T, H, W, C) image datasets into video.mp4. A sensor-only file never pays the opencv cost.

The four verbs

Verb	What it does
`hdf5.scan_file(path, fps=...)`	Read-only introspection. Returns a `FileSummary` with the detected layout, trajectory count, fps, robot, and camera datasets. No frames decoded - safe on a multi-GB file.
`hdf5.encode_episode(path, out, episode_index=...)`	Encodes one trajectory into `video.mp4` / `sensors.npz` / `actions.npz` under `out`. Returns an `EncodedEpisode`. No network.
`hdf5.upload_episode(path, ...)`	One-shot: encode one trajectory to a tempdir → `start_episode` + `upload_*` + `finalize`. Returns the finalized `Episode`.
`hdf5.upload_dataset(path, ...)`	Bulk: walk every trajectory in a multi-demo file and upload each. Returns the list of finalized `Episode` objects and accepts an `on_progress` callback.

Start with scan_file to confirm the layout and fps before uploading:

summary = hdf5.scan_file("low_dim.hdf5")
print(summary.report())
# low_dim.hdf5
#   layout: robomimic
#   trajectories: 200
#   fps: 20
#   robot_type: Panda
#   env: Lift
#   cameras: obs/agentview_image, obs/robot0_eye_in_hand_image
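
As a rough illustration of how the two layouts can be told apart from group names alone, here is a minimal sketch. It is not the adapter's actual detection code: `detect_layout` is a hypothetical helper, and the return strings `"aloha"` and `"unknown"` are assumptions (only `"robomimic"` appears in the output above).

```python
def detect_layout(root):
    """Guess the layout from group names (hypothetical helper, not the adapter's API).

    `root` can be any mapping: a plain dict for testing, or an open h5py.File,
    which supports `in` and iteration over member names in the same way.
    """
    # robomimic: a top-level "data" group whose children are demo_0, demo_1, ...
    if "data" in root and any(key.startswith("demo_") for key in root["data"]):
        return "robomimic"
    # ALOHA / ACT: /action plus an /observations group at the file root.
    if "action" in root and "observations" in root:
        return "aloha"
    return "unknown"


# Nested dicts stand in for HDF5 groups so the sketch runs without a file.
robomimic_like = {"data": {"demo_0": {}, "demo_1": {}}}
aloha_like = {"action": [], "observations": {"qpos": [], "images": {}}}

print(detect_layout(robomimic_like))  # robomimic
print(detect_layout(aloha_like))      # aloha
```

With a real file, the same function could be called on the object returned by `h5py.File(path, "r")`.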

Slot mapping

Dataset names within a trajectory are routed by classify_dataset, a pure function you can call directly to pin behavior:

Dataset name	Slot
`action`, `actions`, `action_dict/*`	`actions.npz`
`observations/images/<cam>`, `*_image`, `*_rgb`, `*_depth`	`video.mp4`
`rewards`, `dones`, `success`, `discount`	episode metadata
`qpos`, `qvel`, `robot0_eef_pos`, `states`, anything else	`sensors.npz`
`timestamp`, `frame_index`, `index`	dropped (bookkeeping)

A dataset whose name looks like an image but whose data isn't a (T, H, W, C) uint8 stack falls back to sensors.npz and is noted in skipped_datasets.
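
The routing rules in the table can be approximated in a few lines. This is a sketch only: `route` is a hypothetical stand-in, not the real classify_dataset, whose signature and return values are not documented here. The rule order and the slot labels are assumptions.

```python
from fnmatch import fnmatchcase

# Patterns transcribed from the slot-mapping table.
ACTION_PATTERNS = ("action", "actions", "action_dict/*")
IMAGE_PATTERNS = ("observations/images/*", "*_image", "*_rgb", "*_depth")
METADATA_NAMES = {"rewards", "dones", "success", "discount"}
DROPPED_NAMES = {"timestamp", "frame_index", "index"}


def route(name, shape=None, dtype=None):
    """Return 'actions', 'video', 'metadata', 'dropped', or 'sensors' (hypothetical)."""
    leaf = name.rsplit("/", 1)[-1]  # last path component, e.g. "qpos"
    if any(fnmatchcase(name, pattern) for pattern in ACTION_PATTERNS):
        return "actions"
    if leaf in METADATA_NAMES:
        return "metadata"
    if leaf in DROPPED_NAMES:
        return "dropped"
    if any(fnmatchcase(name, pattern) for pattern in IMAGE_PATTERNS):
        if shape is None:
            return "video"  # no shape given: trust the name
        # An image-like name only counts as video for a (T, H, W, C) uint8 stack.
        is_stack = len(shape) == 4 and str(dtype) == "uint8"
        return "video" if is_stack else "sensors"
    return "sensors"  # qpos, qvel, states, and anything else


print(route("obs/agentview_image", (137, 84, 84, 3), "uint8"))  # video
print(route("obs/agentview_image", (137, 7), "float32"))        # sensors
print(route("action_dict/abs_pos"))                             # actions
```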

NPZ layout

Proprioception lands in sensors.npz, actions in actions.npz, using the same namespaced layout as the ROS 2, LeRobot, and Gymnasium adapters:

observations/qpos/value    float32[T, K]   # flattened per-step values
observations/qpos/_t_ns    int64[T]        # synthetic step clock
action/value               float32[T, action_dim]

Each dataset is read as (T, …) and reshaped to (T, K), where K is the product of the trailing dimensions.
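
The flattening step is plain NumPy. A minimal sketch, assuming float32 output as in the layout above; `flatten_steps` is a hypothetical name, not part of the adapter:

```python
import numpy as np


def flatten_steps(values):
    """Reshape a (T, ...) array to float32 (T, K), keeping the step axis first."""
    arr = np.asarray(values, dtype=np.float32)
    # -1 lets NumPy infer K as the product of all trailing dimensions.
    return arr.reshape(arr.shape[0], -1)


qpos = np.zeros((137, 14))         # already (T, K): unchanged apart from dtype
contacts = np.zeros((137, 2, 3))   # (T, 2, 3) becomes (T, 6)
gripper = np.zeros(137)            # (T,) becomes (T, 1)

entries = {
    "observations/qpos/value": flatten_steps(qpos),
    "observations/contacts/value": flatten_steps(contacts),
    "observations/gripper/value": flatten_steps(gripper),
}
for key, value in entries.items():
    print(key, value.shape, value.dtype)
```

The dataset names `contacts` and `gripper` are made up for the example.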

Timestamps & fps

HDF5 imitation files rarely store a per-step clock, because the spacing is uniform by construction, so timestamps are synthesised from fps. Pass the real capture rate via fps= (ALOHA is typically 50 Hz, robomimic 20 Hz). For robomimic files, control_freq is read automatically from data.attrs["env_args"]. When nothing declares a rate, the adapter assumes 30 and sets fps_assumed: true in the episode metadata.
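
The fallback order and the synthetic clock can be sketched as follows. Both helpers are hypothetical, not the adapter's API. The sketch assumes that an explicit fps wins over the file, and that env_args is a JSON string with control_freq under "env_kwargs", which is the robomimic convention.

```python
import json

import numpy as np

DEFAULT_FPS = 30.0  # the assumed rate when nothing declares one


def resolve_fps(explicit_fps=None, env_args_json=None):
    """Return (fps, assumed): explicit value, then control_freq, then the default."""
    if explicit_fps is not None:
        return float(explicit_fps), False
    if env_args_json:
        env_args = json.loads(env_args_json)
        control_freq = env_args.get("env_kwargs", {}).get("control_freq")
        if control_freq:
            return float(control_freq), False
    return DEFAULT_FPS, True


def step_clock_ns(num_steps, fps):
    """Synthesize int64 nanosecond timestamps with uniform 1/fps spacing."""
    return np.round(np.arange(num_steps) * (1e9 / fps)).astype(np.int64)


fps, assumed = resolve_fps(env_args_json='{"env_kwargs": {"control_freq": 20}}')
print(fps, assumed)                 # 20.0 False
print(step_clock_ns(3, 50))         # steps 20 ms apart: 0, 20000000, 40000000
```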

Images & color order

Image datasets are encoded to one video.mp4 (single camera) or a horizontal tile (multiple cameras, frames aligned by index). Stored arrays are assumed RGB and converted to BGR for the encoder; pass image_color="bgr" if your file already stores BGR.

hdf5.upload_episode(
    "episode_0.hdf5",
    fps=50,
    canonical_camera="observations/images/top",  # one camera, skip the tile
)
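
For intuition, the channel swap and the horizontal tile both reduce to array operations on (T, H, W, C) stacks. This is a sketch under the assumption that all cameras share the same frame count, height, and channel count. How the adapter handles mismatched sizes is not covered here, and both function names are hypothetical.

```python
import numpy as np


def rgb_to_bgr(frames):
    """Reverse the channel axis of a (T, H, W, 3) stack (RGB <-> BGR)."""
    return frames[..., ::-1]


def tile_horizontally(stacks):
    """Join equally sized (T, H, W, C) stacks side by side along the width axis."""
    return np.concatenate(stacks, axis=2)


# Two synthetic cameras: one pure red in RGB, one black.
top = np.zeros((4, 48, 64, 3), dtype=np.uint8)
top[..., 0] = 255
wrist = np.zeros((4, 48, 64, 3), dtype=np.uint8)

tiled = tile_horizontally([rgb_to_bgr(top), rgb_to_bgr(wrist)])
print(tiled.shape)     # (4, 48, 128, 3)
print(tiled[0, 0, 0])  # [  0   0 255]: red now sits in the last (BGR) channel
```

Frame i of the tile holds frame i of every camera, which matches the index-based alignment described above.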

Episode metadata

The encoder merges HDF5 facts into episode metadata:

{
  "adapter": "hdf5",
  "hdf5_layout": "robomimic",
  "hdf5_source": "low_dim.hdf5",
  "hdf5_episode_index": 0,
  "hdf5_episode_key": "demo_0",
  "hdf5_trajectory_length": 137,
  "hdf5_robot_type": "Panda",
  "hdf5_env": "Lift",
  "hdf5_episode_outcome": { "dones": 1, "reward_sum": 4.0 }
}

Reproducibility fields (policy_version, env_version, git_sha, seed) come from the caller, same as every other adapter.

Defaults

Parameter	Default
`source`	`"replay"`
`episode_index`	`0`
`fps`	read from file, else `30` (assumed)
`image_color`	`"rgb"`

Roadmap

Not yet shipped:

RLDS / Open X-Embodiment (TFDS) import
Streaming row-group reads for files that don't fit in memory

See also: log_episode for raw NumPy logging, LeRobot adapter, Gymnasium adapter.