Skip to content

call BatchSample method instead of repeating content#2348

Open
TillHae wants to merge 4 commits into
ecmwf:developfrom
TillHae:thauer/develop/issue_1791
Open

call BatchSample method instead of repeating content#2348
TillHae wants to merge 4 commits into
ecmwf:developfrom
TillHae:thauer/develop/issue_1791

Conversation

@TillHae
Copy link
Copy Markdown
Contributor

@TillHae TillHae commented May 11, 2026

Description

In get_num_source_steps - ModelBatch everything from get_num_steps - BatchSample is just repeated. Instead we can also just call the method on the parameter target_samples, which is a BacthSample object.

CC @clessig

Issue Number

Closes #1791

Is this PR a draft? Mark it as draft.

Checklist before asking for review

  • I have performed a self-review of my code
  • My changes comply with basic sanity checks:
    • I have fixed formatting issues with ./scripts/actions.sh lint
    • I have run unit tests with ./scripts/actions.sh unit-test
    • I have documented my code and I have updated the docstrings.
    • I have added unit tests, if relevant
  • I have tried my changes with data and code:
    • I have run the integration tests with ./scripts/actions.sh integration-test
    • (bigger changes) I have run a full training and I have written in the comment the run_id(s): launch-slurm.py --time 60
    • (bigger changes and experiments) I have shared a hegdedoc in the github issue with all the configurations and runs for this experiments
  • I have informed and aligned with people impacted by my change:
    • for config changes: the MatterMost channels and/or a design doc
    • for changes of dependencies: the MatterMost software development channel

@github-actions github-actions Bot added data Anything related to the datasets used in the project model Related to model training or definition (not generic infra) labels May 11, 2026
Copy link
Copy Markdown
Contributor

@grassesi grassesi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This does exactly what it says. I also noticed this function is not used throughout the entire codebase. Instead downstream code in the weathergen.model subpackage directly uses BatchSamples.get_num_steps. Maybe this method should be entirely removed. What do you think @clessig ?

@clessig
Copy link
Copy Markdown
Collaborator

clessig commented May 15, 2026

The underlying problem is that the function should read:

    def get_num_source_steps(self) -> int:
        """
        Get number of input/source steps from smallest of all available streams
        """
        # TODO: define explicitly
        lens = [
            len(stream.source_tokens_cells)
            for _, stream in self.source_samples.samples[0].streams_data.items()
        ]

        return min(lens)

That is, it should be for _, stream in self.source_samples.samples[0].streams_data.items() instead of for _, stream in self.target_samples.samples[0].streams_data.items(). Even better would be to also add a function get_num_source_steps() to StreamData and perhaps to the intermediate classes like Sample.

@TillHae : could you take care of this?

Copy link
Copy Markdown
Collaborator

@clessig clessig left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See last text comment

@github-project-automation github-project-automation Bot moved this to In Progress in WeatherGen-dev May 15, 2026
@TillHae
Copy link
Copy Markdown
Contributor Author

TillHae commented May 18, 2026

What I did with commit d79f737:

  • added getter for source and target steps to StreamData
  • added getter for source and target steps, which iterates through all streams, gets the corresponding length from the StreamData getter and returns the minimum to Sample
  • removed get_num_steps from BatchSamples and added get_num_source_steps and get_num_target_steps, which both just use the Sample getter
  • due to the removal of get_num_steps I needed to change all occurences to get_num_source_steps
  • the getter in ModelBatch also just use the getter from BatchSamples

@TillHae TillHae requested a review from clessig May 18, 2026 07:11
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

data Anything related to the datasets used in the project model Related to model training or definition (not generic infra)

Projects

Status: In Progress

Development

Successfully merging this pull request may close these issues.

Explicit definition of get_num_steps in BatchSamples

3 participants