Skip to content

Add smaller fixes to make running from files more robust#15

Open
JoachimKoenigslieb wants to merge 3 commits into
dmidk:mainfrom
JoachimKoenigslieb:main
Open

Add smaller fixes to make running from files more robust#15
JoachimKoenigslieb wants to merge 3 commits into
dmidk:mainfrom
JoachimKoenigslieb:main

Conversation

@JoachimKoenigslieb

Copy link
Copy Markdown
Contributor

I wanted to use sunflow to do nowcasting over the US with data from GOES.

I ultimately get the data from this public AWS source here https://registry.opendata.aws/noaa-goes/.

Unfortunately my files are stored converted to grib2, so I'm getting into some quite off-label usage here. Maybe this is not something you guys are interested in supporting up stream here? The specific changes needed to support it are quite general and feels like reasonable changes tough.

Turns out that xarray is happy enough to read my files out of the box if I install cfgrib as a peer-dependency of sunflow. One big issue remains: I'm not storing the clearsky in these grib2 files! Instead I made a small config switch that allows for fetching the clearksy via the already-installed pvlib.

A list of fixes implemented are:

  • Make minutes available for formatting file names
  • use .expand_dims instead of .assign_coords to make sure we have both time dimension AND time coordinates when loading
  • Added a helper make_pvlib_clearsky_dataset which makes clearsky by calling pvlib. Also added a new config variable clearsky_source which defaults to file and switches behavior when set to pvlib.
  • check_solar_elevation now does not take Copenhagen as a default as that can easily lead to silent buggy behavior.

With these changes I'm able to do nowcasts over CONUS:
image

Joachim Kønigslieb added 3 commits June 15, 2026 14:35
@KristianHMoller KristianHMoller self-requested a review June 26, 2026 09:01
@KristianHMoller

Copy link
Copy Markdown
Contributor

Thanks a lot @JoachimKoenigslieb, this is awesome!
For my first quick test of the pvlib clear sky values, it worked perfectly.
One question: Is there a particular reason for your choice of the pvlib.clearsky.simplified_solis values for the clear-sky? I just noticed that the library had multiple options: https://pvlib-python.readthedocs.io/en/stable/reference/clearsky.html . But I can see that some options require additional input.

@KristianHMoller

Copy link
Copy Markdown
Contributor

In general, I think that support for reading in data from GRIB files would also be nice. Is it correctly understood, that this would require only cfgrib included as a dependency in the pyproject.toml file?

Comment thread sunflow/data_io.py
{month}: Two-digit month (e.g. 03)
{day}: Two-digit day (e.g. 09)
{hour}: Two-digit hour (e.g. 12)
{minute}: Two-digit minute (e.g. 30)

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Perhaps, we could add second as well here and below, to have full flexibility?

@JoachimKoenigslieb JoachimKoenigslieb Jun 26, 2026

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sounds good. Will add this! (probably next week tough :)

Regarding if all that is needed is cfgrib to read grib files: Yes, but this dependency is also quite hefty. I think it brings in eccodes itself as far as I understand.

For pvlib, this was the only way (from not much looking admittedly) of getting a field of clearsky values. Other methods in pvlib wanted some position tuples which would probably be quite slow I think.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point with cfgrib - no need to make the suite unnecessarily big!
And fair with the clearsky methods - I was mainly curious

Comment thread sunflow/main.py
for lon in (lon_min, lon_max)
)
logger.info(f"Maximum corner solar elevation: {solar_elevation:.2f} degrees")
if solar_elevation < 1:

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder, if the value of 1 degree should be increased a bit now that we verify the corner with the maximum elevation. For instance, the MSG-CPP dataset is only available for solar elevation greater than 6 degrees. Is 1 degree useful for your applications @JoachimKoenigslieb ?

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Below 6 degrees for maximum corner solar elevation, running with MSG-CPP input fails with:
Nowcast failed for 2026-06-29T02:00:00Z: precip contains only non-finite values

So if lower values than that are needed, it would need to be modifiable (which may be the most elegant solution) by either an argument or a environment variable.

Comment thread sunflow/forecast.py

from .geospatial import get_coordinates

PVLIB_CLEARSKY_VARIABLE_NAME = "pvlib_clearsky_ghi"

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I suggest moving this to config.py and importing from that, both here and in main.py. I just tested that and it seems to work without an issue.

@KristianHMoller

Copy link
Copy Markdown
Contributor

The command: pre-commit run --all should fix most linting issues and highlight the ones it cannot automatically fix.

Comment thread sunflow/forecast.py

fields = []
for time in times:
repeated_times = pd.DatetimeIndex([time] * len(flat_latitudes))

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
repeated_times = pd.DatetimeIndex([time] * len(flat_latitudes))
repeated_times = [time] * len(flat_latitudes)

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This seems to yield identical results and does not require pandas import into this file. But as pandas is already installed, either option is fine

Comment thread sunflow/forecast.py
times: list[datetime],
latitudes: np.ndarray,
longitudes: np.ndarray,
) -> xr.Dataset:

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
) -> xr.Dataset:
) -> xr.Dataset:
"""Compute a pvlib simplified-Solis clear-sky GHI dataset on a lat/lon grid.
For each timestep in `times`, solar position is computed at every grid
point by repeating the timestamp across all (lat, lon) pairs. The
simplified Solis clear-sky model is then applied to the apparent solar
elevation to produce global horizontal irradiance (GHI) in W m⁻².
Longitudes are normalised to the range [-180, 180] before the solar
position calculation.
Args:
times: Ordered list of timezone-aware datetimes for which to
compute clear-sky irradiance.
latitudes: 1-D array of latitude values in degrees North.
longitudes: 1-D array of longitude values in degrees East
(values > 180 are wrapped to [-180, 180]).
Returns:
xr.Dataset with a single variable keyed by
PVLIB_CLEARSKY_VARIABLE_NAME of shape (time, latitude, longitude)
containing GHI in W m⁻². The time coordinate is stored without
timezone information.
"""

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A suggestion for a docstring

Comment thread CHANGELOG.md
Comment on lines 8 to 23
## [Unreleased]
* Made minutes available for formatting file names, @JoachimKoenigslieb
- use `.expand_dims` instead of `.assign_coords` to make sure we have both time dimension and time coordinates when loading from files, @JoachimKoenigslieb
- Added a helper `make_pvlib_clearsky_dataset` which creates clearsky data by calling `pvlib`. Also added a new config variable `clearsky_source` which defaults to `file` and switches behavior when set to `pvlib`, @JoachimKoenigslieb
- `check_solar_elevation` now does not assume location is in Copenhagen by default, @JoachimKoenigslieb


### Added

- Added a check for the number of ensemble members, as the code currently supports only one [!13](https://github.com/dmidk/sunflow/pull/13), @KristianHMoller
- Subsetting to bounding box is now also done in the `s3` and `files` code paths [!11](https://github.com/dmidk/sunflow/pull/11), @JoachimKoenigslieb
- Subsetting to bounding box now correctly handles both ascending and descending lat/lon coordinates [!11](https://github.com/dmidk/sunflow/pull/11), @JoachimKoenigslieb

### Changed

- Pass down number of ensembles from configurations to `ProbabilisticAdvection`. This gives a roughly 3x speedup in the no ensembles case [!12](https://github.com/dmidk/sunflow/pull/12), @JoachimKoenigslieb

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
## [Unreleased]
### Added
- Added a helper `make_pvlib_clearsky_dataset` which creates clearsky data by calling `pvlib`. Also added a new config variable `clearsky_source` which defaults to `file` and switches behavior when set to `pvlib` [!15](https://github.com/dmidk/sunflow/pull/15), @JoachimKoenigslieb
- Made minutes available for formatting file names [!15](https://github.com/dmidk/sunflow/pull/15), @JoachimKoenigslieb
- Added a check for the number of ensemble members, as the code currently supports only one [!13](https://github.com/dmidk/sunflow/pull/13), @KristianHMoller
- Subsetting to bounding box is now also done in the `s3` and `files` code paths [!11](https://github.com/dmidk/sunflow/pull/11), @JoachimKoenigslieb
- Subsetting to bounding box now correctly handles both ascending and descending lat/lon coordinates [!11](https://github.com/dmidk/sunflow/pull/11), @JoachimKoenigslieb
### Changed
- `check_solar_elevation` now does not assume location is in Copenhagen by default [!15](https://github.com/dmidk/sunflow/pull/15), @JoachimKoenigslieb
- Use `.expand_dims` instead of `.assign_coords` to make sure we have both time dimension and time coordinates when loading from files [!15](https://github.com/dmidk/sunflow/pull/15), @JoachimKoenigslieb
- Pass down number of ensembles from configurations to `ProbabilisticAdvection`. This gives a roughly 3x speedup in the no ensembles case [!12](https://github.com/dmidk/sunflow/pull/12), @JoachimKoenigslieb

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am not sure if the "suggestion" works as intended here.

@KristianHMoller

Copy link
Copy Markdown
Contributor

All implemented functionality seems to work as intended! This is great!
I have a few suggestions of which the one regarding maximum solar elevation is scientific, whereas the others are mainly format

@AdamRJensen AdamRJensen left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I left a few comments, but my main concern that I want to raise is that climatological clear sky models as used in this PR are not ideal for forecasting applications given that water vapor varies significantly from day to day. But if this is for testing purposes, then it's fine to add.

Using CAMS clear sky data based on forecasting of aerosols is the way forward in my opinion.

Comment thread sunflow/forecast.py
solar_position = pvlib.solarposition.get_solarposition(
repeated_times,
flat_latitudes,
flat_longitudes,

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
flat_longitudes,
flat_longitudes,
method='nrel_numpy',

I suggest explicitly denoting which solar position model is used.

On a different note, the 'nrel_numpy' model is the most accurate but unnecessarily slow. Based on my recent research, solar position models such as Michalsky or USNO are plenty fine: https://solposx.readthedocs.io/. How fast they are is very implementation dependent, and most of these implementations are not developed for a grid of latitudes/longitudes.

Comment thread CHANGELOG.md
Comment on lines +11 to +12
- Added a helper `make_pvlib_clearsky_dataset` which creates clearsky data by calling `pvlib`. Also added a new config variable `clearsky_source` which defaults to `file` and switches behavior when set to `pvlib`, @JoachimKoenigslieb
- `check_solar_elevation` now does not assume location is in Copenhagen by default, @JoachimKoenigslieb

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

pvlib supports a lot of clear sky models, even remotely retrieving clear sky data from CAMS. I recommend being explicit here about which method is being used.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants