Policy version conventions

Coming soon

This page is a placeholder for an upcoming topic that ships alongside the eval engine (Phase 7). The URL is stable, so links you write today will keep working when the real page lands. For now, the `log_episode` reference covers the closest live topic.

What this page will cover

The eval engine re-rolls a new policy against a recorded observation stream, then diffs the new actions against what the recorded policy actually did. To do that, it needs `policy_version` to be:

Stable. Two episodes with the same `policy_version` should have been produced by the same weights.
Resolvable. Given a `policy_version` string, the eval engine needs a way to load those weights to re-run them.
Comparable. Adjacent versions should sort and diff in a meaningful order (semver, monotonic step counts, etc.).
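
As a rough illustration of the comparable property, the sketch below sorts version strings under a hypothetical `<run-name>-step<N>` convention. The format, the regular expression, and the `sort_key` helper are assumptions made for this example, not something RoboTrace defines. The point is only that plain string sorting would place `step10000` before `step9000`, while a parsed key orders by step count.

```python
import re

# Hypothetical convention: "<run-name>-step<N>", e.g. "pick-place-step12000".
# RoboTrace does not prescribe this format; it is an assumption for illustration.
_VERSION_RE = re.compile(r"^(?P<run>.+)-step(?P<step>\d+)$")


def sort_key(policy_version: str) -> tuple[str, int]:
    """Return (run name, training step) so versions sort by step numerically."""
    match = _VERSION_RE.match(policy_version)
    if match is None:
        raise ValueError(f"unrecognized policy_version: {policy_version!r}")
    return match["run"], int(match["step"])


versions = ["pick-place-step10000", "pick-place-step9000", "pick-place-step500"]
# Sort by the parsed key rather than by the raw string.
ordered = sorted(versions, key=sort_key)
```

Zero-padding the step count (for example `step0009000`) would make plain string sorting work too, at the cost of fixing a maximum width up front.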

This page will document:

Naming conventions per training paradigm (SL / IL / RL / VLA).
How to wire your weight registry so RoboTrace can resolve a `policy_version` to a checkpoint.
The few edge cases (frozen baselines, ensembles, distillation) where the model isn't a single checkpoint.
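
Until the real page documents the registry wiring, the following is a minimal sketch of what "resolvable" could look like on your side. Everything here is hypothetical: the `ResolvedPolicy` type, the `REGISTRY` mapping, the `resolve` function, and the checkpoint paths are placeholders, and none of them are part of RoboTrace. The sketch returns a tuple of checkpoints so that an ensemble, which is not a single checkpoint, fits the same shape as an ordinary version.

```python
from dataclasses import dataclass
from pathlib import Path


@dataclass(frozen=True)
class ResolvedPolicy:
    """One or more checkpoints behind a single policy_version (hypothetical type)."""

    checkpoints: tuple[Path, ...]


# Hypothetical in-memory registry. A real one might live in a database or an
# object store; the names and paths below are placeholders.
REGISTRY: dict[str, ResolvedPolicy] = {
    "pick-place-step9000": ResolvedPolicy((Path("ckpt/pick-place/9000.pt"),)),
    "pick-place-ensemble-v1": ResolvedPolicy(
        (Path("ckpt/pick-place/8000.pt"), Path("ckpt/pick-place/9000.pt"))
    ),
}


def resolve(policy_version: str) -> ResolvedPolicy:
    """Look up the checkpoints for a policy_version, failing loudly if unknown."""
    try:
        return REGISTRY[policy_version]
    except KeyError:
        raise LookupError(
            f"no checkpoints registered for {policy_version!r}"
        ) from None
```

Failing loudly on an unknown version is a deliberate choice in this sketch: a silent fallback to some default checkpoint would break the stable property described above.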

Until it lands

The `log_episode` reference covers the recommended naming patterns today. Treat `policy_version` as opaque text: RoboTrace doesn't validate it in Phase 1. Still, pick a convention you can resolve back to weights, or the eval engine won't be able to re-roll your runs.
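
One possible way to keep a version string tied to its weights is to embed a short content hash of the checkpoint bytes. The sketch below is an illustrative scheme, not the recommended pattern from the `log_episode` reference: the `make_policy_version` helper and the `<run>-step<N>-<hash8>` layout are assumptions introduced here.

```python
import hashlib


def make_policy_version(run_name: str, step: int, weights: bytes) -> str:
    """Build "<run>-step<N>-<hash8>" from the run name, step, and weight bytes.

    Hypothetical helper: the layout is an assumption, not a RoboTrace format.
    """
    # The first 8 hex characters of SHA-256 keep the string short while making
    # accidental reuse of a version for different weights easy to spot.
    digest = hashlib.sha256(weights).hexdigest()[:8]
    return f"{run_name}-step{step}-{digest}"


# In practice `weights` would be the serialized checkpoint file contents.
version = make_policy_version("pick-place", 9000, b"example-weight-bytes")
```

Because the hash changes whenever the bytes change, two episodes that share a version string under this scheme were produced by the same serialized weights.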