Write selected channels during inference by sbAsma · Pull Request #2120 · ecmwf/WeatherGenerator

sbAsma · 2026-03-27T08:29:40Z

Description (Edited)

Changes in `config/default_config.yml`:

Added a new field under output to filter the channels that are written.

Changes in `packages/common/src/weathergen/common/io.py`:

Added a field in the zarr output metadata listing the channels being written.

Changes in `src/weathergen/utils/validation_io.py`:

Now trims targets_all and preds_all tensors to a per-stream allowlist or denylist before writing to the output zarr, via a new output_channels list derived from a deep copy of target_channels.
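A minimal sketch of the per-stream allowlist/denylist selection described above. The helper names `select_output_channels` and `trim_rows` are hypothetical and not taken from the PR, nested lists stand in for the real tensors, and reading `allow=False` as "denylist" is an assumption based on the description.

```python
from copy import deepcopy


def select_output_channels(target_channels, filter_list, allow=True):
    """Return (output_channels, keep_idx) for one stream.

    allow=True  -> keep only channels named in filter_list (allowlist)
    allow=False -> drop channels named in filter_list (denylist, assumed)
    An empty filter_list leaves the channels untouched.
    """
    # Work on a copy so the caller's target_channels list is never mutated.
    output_channels = deepcopy(target_channels)
    if not filter_list:
        return output_channels, list(range(len(output_channels)))

    wanted = set(filter_list)
    keep_idx = [
        i for i, ch in enumerate(output_channels) if (ch in wanted) == allow
    ]
    return [output_channels[i] for i in keep_idx], keep_idx


def trim_rows(rows, keep_idx):
    """Keep only the selected channel columns of a (points, channels) table."""
    return [[row[i] for i in keep_idx] for row in rows]


target_channels = ["2t", "10u", "10v"]
targets = [[280.1, 3.2, -1.0], [281.4, 2.7, 0.5]]  # made-up values

channels, idx = select_output_channels(target_channels, ["2t"], allow=True)
print(channels, trim_rows(targets, idx))
```

With real tensors the same `keep_idx` would be applied along the channel axis of both targets and predictions, so the written data and the recorded channel names stay aligned.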

Changes in `packages/evaluate/src/weathergen/evaluate/io/wegen_reader.py`:

Added get_channels(), which reads channel names directly from the Zarr output metadata.

To run with:

uv run --offline inference --from-run-id run_id --options test_config.samples_per_mini_epoch=1 validation_config.output.filter_output_channels.allow=True validation_config.output.filter_output_channels.ERA5=["2t"]
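To make the nesting of those `--options` keys explicit, here is a toy sketch that expands dotted `key=value` strings into a nested dict. `apply_overrides` is a hypothetical helper written for illustration; the project's real option parser is not shown in this PR and may behave differently.

```python
import ast


def apply_overrides(config, options):
    """Merge 'dotted.key=value' strings into a nested dict (in place)."""
    for option in options:
        dotted_key, raw_value = option.split("=", 1)
        *parents, leaf = dotted_key.split(".")
        node = config
        for name in parents:
            # Create intermediate levels on demand.
            node = node.setdefault(name, {})
        try:
            # Interpret Python-style literals such as 1, True, ["2t"].
            node[leaf] = ast.literal_eval(raw_value)
        except (ValueError, SyntaxError):
            node[leaf] = raw_value  # keep non-literal values as plain strings
    return config


config = apply_overrides(
    {},
    [
        "test_config.samples_per_mini_epoch=1",
        "validation_config.output.filter_output_channels.allow=True",
        'validation_config.output.filter_output_channels.ERA5=["2t"]',
    ],
)
print(config["validation_config"]["output"]["filter_output_channels"])
```

Under these assumptions the filter block ends up with an `allow` flag plus one entry per stream name (here `ERA5`) holding the list of channels.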

The output zarr file has the tree:

>>> print(root.tree())

/
└── 0
    └── ERA5
        ├── 0
        │   └── source
        │       ├── coords (40320, 2) float32
        │       ├── data (40320, 71) float32
        │       ├── geoinfo (40320, 0) float32
        │       └── times (40320,) datetime64
        ├── 1
        │   ├── prediction
        │   │   ├── coords (40320, 2) float32
        │   │   ├── data (40320, 1, 1) float32
        │   │   ├── geoinfo (40320, 0) float32
        │   │   └── times (40320,) datetime64
        │   └── target
        │       ├── coords (40320, 2) float32
        │       ├── data (40320, 1) float32
        │       ├── geoinfo (40320, 0) float32
        │       └── times (40320,) datetime64
        └── 2
            ├── prediction
            │   ├── coords (40320, 2) float32
            │   ├── data (40320, 1, 1) float32
            │   ├── geoinfo (40320, 0) float32
            │   └── times (40320,) datetime64
            └── target
                ├── coords (40320, 2) float32
                ├── data (40320, 1) float32
                ├── geoinfo (40320, 0) float32
                └── times (40320,) datetime64

>>> print(dict(root["0"]["ERA5"]["1"]["target"].attrs))
{'channels': ['2t'], 'geoinfo_channels': [], 'source_interval': {'start': '2023-10-01T00:00:00.000000000', 'end': '2023-10-01T06:00:00.000000000'}}
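A reader can recover the written channels from these attributes alone. The sketch below uses a plain dict in place of the Zarr attributes object, and `get_channels_from_attrs` with its `fallback` parameter is a hypothetical helper, not the `get_channels()` added in this PR.

```python
def get_channels_from_attrs(attrs, fallback=None):
    """Return channel names recorded in an output group's attributes."""
    channels = attrs.get("channels")
    if channels is None:
        # Assumption: outputs written before this change may lack the
        # 'channels' entry, so fall back to a caller-supplied list.
        return list(fallback or [])
    return list(channels)


# Attributes as printed above for root["0"]["ERA5"]["1"]["target"].
attrs = {
    "channels": ["2t"],
    "geoinfo_channels": [],
    "source_interval": {
        "start": "2023-10-01T00:00:00.000000000",
        "end": "2023-10-01T06:00:00.000000000",
    },
}

print(get_channels_from_attrs(attrs))
print(get_channels_from_attrs({}, fallback=["2t", "10u"]))
```

With an opened store, the same function could be called on `dict(root["0"]["ERA5"]["1"]["target"].attrs)`, as in the session above.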

Issue Number

Closes #1705

Checklist before asking for review

I have performed a self-review of my code
My changes comply with basic sanity checks:
- I have fixed formatting issues with ./scripts/actions.sh lint
- I have run unit tests with ./scripts/actions.sh unit-test
- I have documented my code and I have updated the docstrings.
- I have added unit tests, if relevant
I have tried my changes with data and code:
- I have run the integration tests with ./scripts/actions.sh integration-test
- (bigger changes) I have run a full training and I have written in the comment the run_id(s): launch-slurm.py --time 60
- (bigger changes and experiments) I have shared a hedgedoc in the github issue with all the configurations and runs for these experiments
I have informed and aligned with people impacted by my change:
- for config changes: the MatterMost channels and/or a design doc
- for changes of dependencies: the MatterMost software development channel

iluise

Thanks for the heads up! Minor suggestion to avoid retrieving the channels twice

iluise · 2026-03-27T10:49:21Z

            A list of channel names.
        """
        _logger.debug(f"Getting channels for stream {stream}...")
        all_channels = self.get_inference_stream_attr(stream, "val_target_channels")


I would just move it to line 152 (so before self.get_inference_stream_attr(stream, "val_target_channels")):

        write_output = self.get_inference_stream_attr(stream, "write_output")
        if write_output is not None:
            all_channels = [ch for ch in all_channels if ch in write_output]
        else:
            all_channels = self.get_inference_stream_attr(stream, "val_target_channels")
        _logger.debug(f"Channels found in config: {all_channels}")

Why do we need to change the reader at all here? all_channels should already contain whatever is written to the output via write_output.

So all_channels should be 2t in our example here.

@SavvasMel source is kept as it is. Only target and prediction channels are being filtered.

SavvasMel · 2026-03-27T08:41:59Z

  stream_id : 0
  source_exclude : ['w_', 'skt', 'tcw', 'cp', 'tp']
  target_exclude : ['w_', 'slor', 'sdor', 'tcw', 'cp', 'tp']
+  write_output: ['2t']


We need to be careful to remove this before merging, or set it to an empty list.

I am adding another param in default_config that controls whether channels are being filtered or not.

SavvasMel · 2026-03-27T08:44:06Z

        _logger.debug(f"Channels found in config: {all_channels}")
+
+        # filter to write_output subset if specified
+        write_output = self.get_inference_stream_attr(stream, "write_output")


It seems that this respects backward compatibility but please double check

I mean what happens if the stream does not have "write_output"?

I am working on checking the backward compatibility. To answer your question, get_inference_stream_attr returns an empty list if it doesn't find write_output
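If the lookup does return an empty list for streams without write_output, as stated above, then an `is not None` check would treat a missing key as an active filter that matches nothing. A hedged sketch of a check that treats both `None` and `[]` as "no filtering" follows; `resolve_channels` is a hypothetical helper, not code from the PR.

```python
def resolve_channels(all_channels, write_output):
    """Filter all_channels by write_output; None or [] means 'no filter'."""
    if not write_output:
        # Covers a missing key (None) and an empty list alike, so streams
        # without write_output keep their full channel list.
        return list(all_channels)
    requested = set(write_output)
    # Preserve the original channel order.
    return [ch for ch in all_channels if ch in requested]


all_channels = ["2t", "10u", "10v"]
print(resolve_channels(all_channels, ["2t"]))
print(resolve_channels(all_channels, []))
print(resolve_channels(all_channels, None))
```

The truthiness test keeps older stream configs working unchanged, at the cost of not being able to express "write no channels" with an empty list.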

Backward compatibility checks

- parse validation output filter_output_channels from default config format
- support allow or deny channel filtering per stream
- introduce output_channels for written targets and predictions
- read channels from zarr metadata in WeatherGenZarrReader

sbAsma · 2026-03-28T16:21:06Z

Hi @SavvasMel and @iluise

I had another idea for how to implement this, and I would really appreciate your input. Among the changes, I decided to remove the channel filtering from the stream config and add it to the default config instead, so that it can be overridden with --options. I also added a change requested by Christian: the output zarr file now carries metadata on the channels being written.

initial commit for write selected channels during inference

f306f2b

github-project-automation bot added this to WeatherGen-dev Mar 27, 2026

github-actions bot added the labels eval (anything related to the model evaluation pipeline), model (related to model training or definition, not generic infra), and model:inference (anything related to the inference step, not plotting or score computation) Mar 27, 2026

iluise reviewed Mar 27, 2026

SavvasMel reviewed Mar 27, 2026

sbAsma added 3 commits March 27, 2026 13:17

added config to control filtering of channels in output

8f953a3

renamed variable that filters channels

2e41993

defer val_target_channels lookup when filter_output_channels is set

28b3612

sbAsma changed the title ~~initial commit for write selected channels during inference~~ Write selected channels during inference Mar 28, 2026

github-actions bot added the infra Issues related to infrastructure label Mar 28, 2026

applied ruff requested fixes

967fdbd

sbAsma added 3 commits March 29, 2026 11:19

fixed backward compatibility for filtering of channels in output

fe4ea85

restore default config to default status

22a3f50

removed unecessary introduced code

55cf5b2

sbAsma requested a review from iluise April 10, 2026 05:28

sbAsma wants to merge 9 commits into ecmwf:develop from sbAsma:sbAsma/dev/1705-write-selected-channels

3 participants
