Policy version conventions
Coming soon
This page is a placeholder for an upcoming topic — shipping alongside the eval engine (Phase 7). The URL is stable, so any links you write today won't rot when the real page lands. For now, the `log_episode` reference covers the closest live topic.
What this page will cover
The eval engine re-rolls a new policy against a recorded observation
stream, then diffs the new actions against what the recorded policy
actually did. To do that, it needs `policy_version` to be:
- Stable. Two episodes with the same `policy_version` should have been produced by the same weights.
- Resolvable. Given a `policy_version` string, the eval engine needs a way to load those weights to re-run them.
- Comparable. Adjacent versions should sort and diff in a meaningful order (semver, monotonic step counts, etc.).
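As a rough illustration of the Stable and Comparable properties, here is a minimal Python sketch. The `<name>-step<N>-<hash8>` format and the helpers `make_policy_version`, `parse_policy_version`, and `sort_key` are assumptions made up for this example. They are not part of RoboTrace, which currently treats `policy_version` as opaque text.

```python
import hashlib
import re
from typing import NamedTuple, Tuple

# Hypothetical format for illustration only: <name>-step<N>-<hash8>
_PATTERN = re.compile(
    r"^(?P<name>[a-z0-9_]+)-step(?P<step>\d+)-(?P<digest>[0-9a-f]{8})$"
)


class PolicyVersion(NamedTuple):
    name: str
    step: int
    digest: str


def make_policy_version(name: str, step: int, weights: bytes) -> str:
    """Build a version string whose suffix is derived from the weight bytes.

    Stable: the same weights always yield the same suffix.
    """
    digest = hashlib.sha256(weights).hexdigest()[:8]
    return f"{name}-step{step}-{digest}"


def parse_policy_version(text: str) -> PolicyVersion:
    """Split a version string into its parts, or raise ValueError."""
    match = _PATTERN.match(text)
    if match is None:
        raise ValueError(f"unrecognized policy_version: {text!r}")
    return PolicyVersion(match["name"], int(match["step"]), match["digest"])


def sort_key(text: str) -> Tuple[str, int]:
    """Comparable: order by name, then by numeric training step."""
    version = parse_policy_version(text)
    return (version.name, version.step)


early = make_policy_version("pick_place", 900, b"weights-a")
late = make_policy_version("pick_place", 12000, b"weights-b")
ordered = sorted([late, early], key=sort_key)
```

Sorting on the parsed step rather than the raw string matters here: a plain string sort would place `step12000` before `step900`.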
This page will document:
- Naming conventions per training paradigm (SL / IL / RL / VLA).
- How to wire your weight registry so RoboTrace can resolve a `policy_version` to a checkpoint.
- The few edge cases (frozen baselines, ensembles, distillation) where the model isn't a single checkpoint.
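Until the registry wiring is documented, the Resolvable requirement can be pictured with a sketch like the one below. `WeightRegistry` and its `register` and `resolve` methods are hypothetical names invented for this example, not a RoboTrace API. The sketch covers only the single-checkpoint case, so ensembles and other edge cases are out of scope.

```python
from pathlib import Path
from typing import Dict


class WeightRegistry:
    """Hypothetical in-memory map from policy_version to a checkpoint path."""

    def __init__(self) -> None:
        self._checkpoints: Dict[str, Path] = {}

    def register(self, policy_version: str, checkpoint: Path) -> None:
        """Record a checkpoint; refuse to repoint an existing version."""
        existing = self._checkpoints.get(policy_version)
        if existing is not None and existing != checkpoint:
            raise ValueError(
                f"{policy_version!r} is already bound to {existing}"
            )
        self._checkpoints[policy_version] = checkpoint

    def resolve(self, policy_version: str) -> Path:
        """Return the checkpoint path for a version, or raise LookupError."""
        try:
            return self._checkpoints[policy_version]
        except KeyError:
            raise LookupError(
                f"no checkpoint registered for {policy_version!r}"
            ) from None


registry = WeightRegistry()
registry.register("pick_place-step900", Path("ckpt/pick_place/900.pt"))
checkpoint = registry.resolve("pick_place-step900")
```

Rejecting a second, different checkpoint for the same version string is a deliberate choice in this sketch: it keeps one `policy_version` from silently pointing at two sets of weights.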
Until it lands
The `log_episode` reference covers the recommended naming patterns
today. RoboTrace treats `policy_version` as opaque text and doesn't
validate it in Phase 1. Even so, pick a convention you can resolve
back to weights, or the eval engine won't be able to re-roll your
runs.