`slm.evaluate()` (#1) ships the SDK and CLI surface, but nothing inside the product calls it yet: it is a leaf reachable only via `slm.evaluate(...)` and `shadowlm eval`. The task-quality number it produces is the "eval gate" the product thesis depends on ("run the shadow until it does the job as well as the frontier, then switch"), so it should be wired into the loop:

- `finetune()` eval-on-holdout: pass an eval set, attach the task-quality score to `TrainingRun` alongside `eval_loss`. This is the actual eval gate.
- `/v1/evaluate` endpoint: so the studio can evaluate.

None of these are blocking the merge of #1; this just tracks turning the primitive into something the loop consumes.
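
To make the eval-on-holdout idea concrete, here is a rough sketch of how the gate could look, not a proposal for the final shape. Only `TrainingRun` and `eval_loss` come from this issue; the `task_quality` field, `attach_task_quality`, `passes_eval_gate`, the `tolerance` parameter, and the `exact_match` scorer are all hypothetical names. The scorer is a stand-in for whatever `slm.evaluate()` actually returns, since its signature is not described here.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Sequence


@dataclass
class TrainingRun:
    """Hypothetical stand-in for the real TrainingRun."""
    eval_loss: float
    task_quality: Optional[float] = None  # assumed field name, not in the issue


def attach_task_quality(
    run: TrainingRun,
    eval_set: Sequence[dict],
    score_fn: Callable[[Sequence[dict]], float],
) -> TrainingRun:
    """Score the holdout set and record the result next to eval_loss."""
    if not eval_set:
        return run  # no eval set passed: leave task_quality unset
    run.task_quality = score_fn(eval_set)
    return run


def passes_eval_gate(
    run: TrainingRun, frontier_quality: float, tolerance: float = 0.0
) -> bool:
    """True once the shadow scores within `tolerance` of the frontier."""
    if run.task_quality is None:
        return False  # an unevaluated run never passes the gate
    return run.task_quality >= frontier_quality - tolerance


def exact_match(examples: Sequence[dict]) -> float:
    """Toy scorer (assumption): fraction of predictions equal to the expected output."""
    hits = sum(1 for ex in examples if ex["prediction"] == ex["expected"])
    return hits / len(examples)


# Toy holdout data, made up for illustration only.
holdout = [
    {"prediction": "a", "expected": "a"},
    {"prediction": "b", "expected": "b"},
    {"prediction": "c", "expected": "x"},
    {"prediction": "d", "expected": "d"},
]
run = attach_task_quality(TrainingRun(eval_loss=0.42), holdout, exact_match)
```

Keeping the score on the run itself, rather than in a separate report, means whatever decides "switch to the shadow" only has to look at one object. A run that was never evaluated fails the gate by default, which seems like the safer choice for a switch-over decision.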