I went through the data pipeline to check the future-leakage concerns from #227 and #265, and wanted to confirm my reading of the current code:
- Inference path looks clean. `KronosPredictor.predict()` computes the z-score mean/std on the lookback window only (`model/kronos.py:544`), so no future information leaks at predict time.
- The finetune dataset bug from #227 ("dataset leaks future information") also appears fixed. `finetune/dataset.py:107-117` now computes `x_mean`/`x_std` strictly from the lookback portion (`past_x = x[:past_len]`) rather than over the full lookback+horizon window.
What's still unclear to me (and I think this is what #277 was asking before it went quiet) is whether the pretrained checkpoints on HuggingFace (`NeoQuasar/Kronos-small` / `-base` / the tokenizer) were themselves retrained after this normalization fix, or whether they predate it. If the released weights were pretrained with the old full-window normalization, the model would have seen lookback inputs whose scale already depended on the horizon, so the leakage would effectively be baked into the checkpoints even though the current dataset code is correct.
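To make the concern concrete, here is a minimal sketch of the two normalization schemes as I understand them. It is not the Kronos code: the function names, the `eps` value, and the toy series are my own assumptions, and only `past_len`, `past_x`, `x_mean`, and `x_std` mirror the identifiers quoted above.

```python
import numpy as np


def normalize_lookback_only(x, past_len, eps=1e-5):
    """Z-score using statistics from the lookback rows only (the fixed behaviour)."""
    past_x = x[:past_len]
    x_mean = past_x.mean(axis=0)
    x_std = past_x.std(axis=0)
    return (x - x_mean) / (x_std + eps)


def normalize_full_window(x, eps=1e-5):
    """Z-score using statistics from lookback + horizon (the old, leaky behaviour)."""
    x_mean = x.mean(axis=0)
    x_std = x.std(axis=0)
    return (x - x_mean) / (x_std + eps)


# Hypothetical toy data: two windows with the same lookback but different futures.
past_len = 4
base = np.array([1.0, 2.0, 1.0, 2.0, 1.0, 1.0])  # 4 lookback steps + 2 horizon steps
shocked = base.copy()
shocked[past_len:] = 50.0  # only the horizon changes

# With lookback-only statistics, the normalized lookback is identical for both
# futures, so the model input carries no information about the horizon.
lookback_inputs_match = np.allclose(
    normalize_lookback_only(base, past_len)[:past_len],
    normalize_lookback_only(shocked, past_len)[:past_len],
)

# With full-window statistics, the normalized lookback shifts when the horizon
# changes, so the model input itself encodes information about the future.
full_window_inputs_match = np.allclose(
    normalize_full_window(base)[:past_len],
    normalize_full_window(shocked)[:past_len],
)

print(lookback_inputs_match)     # True
print(full_window_inputs_match)  # False
```

A model pretrained on inputs of the second kind could learn to read the future out of the input scale, which is why the age of the released weights matters independently of the current dataset code.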
Could you clarify:
1. Were the released HF weights retrained after the normalization fix, or are they from before it?
2. If they predate the fix, are there plans to re-release retrained checkpoints?
Thanks for open-sourcing Kronos!