Skip to content

Sgrasse/develop/2353 plot train#2354

Open
grassesi wants to merge 4 commits into
ecmwf:developfrom
grassesi:sgrasse/develop/2353-plot_train
Open

Sgrasse/develop/2353 plot train#2354
grassesi wants to merge 4 commits into
ecmwf:developfrom
grassesi:sgrasse/develop/2353-plot_train

Conversation

@grassesi
Copy link
Copy Markdown
Contributor

@grassesi grassesi commented May 12, 2026

Description

Issue Number

closes #2353

Is this PR a draft? Mark it as draft.

Checklist before asking for review

  • I have performed a self-review of my code
  • My changes comply with basic sanity checks:
    • I have fixed formatting issues with ./scripts/actions.sh lint
    • I have run unit tests with ./scripts/actions.sh unit-test
    • I have documented my code and I have updated the docstrings.
    • I have added unit tests, if relevant
  • I have tried my changes with data and code:
    • I have run the integration tests with ./scripts/actions.sh integration-test
    • (bigger changes) I have run a full training and I have written in the comment the run_id(s): launch-slurm.py --time 60
    • (bigger changes and experiments) I have shared a hegdedoc in the github issue with all the configurations and runs for this experiments
  • I have informed and aligned with people impacted by my change:
    • for config changes: the MatterMost channels and/or a design doc
    • for changes of dependencies: the MatterMost software development channel

@grassesi grassesi force-pushed the sgrasse/develop/2353-plot_train branch from 8c1343a to d65d133 Compare May 12, 2026 14:17
@grassesi grassesi requested a review from clessig May 12, 2026 14:27
@sophie-xhonneux
Copy link
Copy Markdown
Contributor

Is this urgent? Because if not I would not merge this, because it breaks everyone's plotting of runs, not sure what the value added is.

@github-actions github-actions Bot added the eval anything related to the model evaluation pipeline label May 12, 2026
@grassesi
Copy link
Copy Markdown
Contributor Author

grassesi commented May 13, 2026

Is this urgent? Because if not I would not merge this, because it breaks everyone's plotting of runs, not sure what the value added is.

Thanks for the quick response. This is not urgent, just something I noticed. Can you point me to what exactly is breaking peoples workflows? The removed code in train_logger.py was dysfunctional as far as I see. The change in plot_training.py enables previously (erroniously?) disabled code for additional plots. If indeed it breaks established workflows it definitively should not be merged.

@sophie-xhonneux
Copy link
Copy Markdown
Contributor

I didn't test, but from looking at the code (briefly) I worry, plot_train wouldn't be able to read old jsons.

@grassesi grassesi force-pushed the sgrasse/develop/2353-plot_train branch from 93b49f6 to ba7f11c Compare May 13, 2026 11:45
@grassesi
Copy link
Copy Markdown
Contributor Author

I didn't test, but from looking at the code (briefly) I worry, plot_train wouldn't be able to read old jsons.

If "old" jsons means:

  • what currently gets logged by develop => logs/<run_id>/<run_id>_train_metrics.json (has not been changed since 2 Months): This logic should not be affected at all.
  • the old .txt files: This logic is dead for almost a year.

But please lets not try to have this back-and-forth discussion. Please pinpoint your concerns in a review.

Copy link
Copy Markdown
Contributor

@TillHae TillHae left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good. Just some comments regarding things I would do different

Comment thread src/weathergen/utils/plot_training.py
Comment thread src/weathergen/utils/plot_training.py Outdated
if col_split[2].lower() == err.lower() and col_split[3] in channels:
data_cols += [col]

data_cols = list(data_cols)
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

data_cols already is a list

@clessig
Copy link
Copy Markdown
Collaborator

clessig commented May 13, 2026

I didn't test, but from looking at the code (briefly) I worry, plot_train wouldn't be able to read old jsons.

If "old" jsons means:

  • what currently gets logged by develop => logs/<run_id>/<run_id>_train_metrics.json (has not been changed since 2 Months): This logic should not be affected at all.
  • the old .txt files: This logic is dead for almost a year.

But please lets not try to have this back-and-forth discussion. Please pinpoint your concerns in a review.

Agreed, the text files can be removed.

@sophie-xhonneux
Copy link
Copy Markdown
Contributor

I didn't test, but from looking at the code (briefly) I worry, plot_train wouldn't be able to read old jsons.

If "old" jsons means:

* what currently gets logged by develop => logs/<run_id>/<run_id>_train_metrics.json (has not been changed since 2 Months): This logic should not be affected at all.

* the old .txt files: This logic is dead for almost a year.

But please lets not try to have this back-and-forth discussion. Please pinpoint your concerns in a review.

old jsons is the train_metrics files that plot_train reads...

Did you test if it works with those e.g. JEPA/forecasting ?

JEPA -s LatentLossSSLStudentTeacher.JEPA
forecasting --streams ERA5 --channels z_500 u_850 v_850 t_850 q_850 2t 10u avg --metrics mse --forecast-steps 3 --log-x

@grassesi grassesi force-pushed the sgrasse/develop/2353-plot_train branch from 8ccd80a to 889da45 Compare May 19, 2026 11:42
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

eval anything related to the model evaluation pipeline

Projects

Status: No status

Development

Successfully merging this pull request may close these issues.

Cleanup TrainLogger and plot_training

4 participants