Skip to content
Open
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
5 changes: 5 additions & 0 deletions CHANGELOG.md
Original file line number Diff line number Diff line change
Expand Up @@ -6,6 +6,11 @@ The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.1.0/),
and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).

## [Unreleased]
* Made minutes available for formatting file names, @JoachimKoenigslieb
- use `.expand_dims` instead of `.assign_coords` to make sure we have both time dimension and time coordinates when loading from files, @JoachimKoenigslieb
- Added a helper `make_pvlib_clearsky_dataset` which creates clearsky data by calling `pvlib`. Also added a new config variable `clearsky_source` which defaults to `file` and switches behavior when set to `pvlib`, @JoachimKoenigslieb
- `check_solar_elevation` now does not assume location is in Copenhagen by default, @JoachimKoenigslieb
Comment on lines +11 to +12

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

pvlib supports a lot of clear sky models, even remotely retrieving clear sky data from CAMS. I recommend being explicit here about which method is being used.



### Added

Expand Down
12 changes: 11 additions & 1 deletion sunflow/data_io.py
Original file line number Diff line number Diff line change
Expand Up @@ -133,6 +133,7 @@ def generate_input_filename(
{month}: Two-digit month (e.g. 03)
{day}: Two-digit day (e.g. 09)
{hour}: Two-digit hour (e.g. 12)
{minute}: Two-digit minute (e.g. 30)

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Perhaps, we could add second as well here and below, to have full flexibility?

@JoachimKoenigslieb JoachimKoenigslieb Jun 26, 2026

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sounds good. Will add this! (probably next week tough :)

Regarding if all that is needed is cfgrib to read grib files: Yes, but this dependency is also quite hefty. I think it brings in eccodes itself as far as I understand.

For pvlib, this was the only way (from not much looking admittedly) of getting a field of clearsky values. Other methods in pvlib wanted some position tuples which would probably be quite slow I think.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point with cfgrib - no need to make the suite unnecessarily big!
And fair with the clearsky methods - I was mainly curious


Path separators in the result are supported, so a format like
``{year}/{month}/{day}/{dataset_name}_{timestamp}.nc`` resolves to a
Expand All @@ -154,6 +155,7 @@ def generate_input_filename(
month=time_step.strftime("%m"),
day=time_step.strftime("%d"),
hour=time_step.strftime("%H"),
minute=time_step.strftime("%M"),
)
return filename

Expand Down Expand Up @@ -226,7 +228,15 @@ def load_data_from_files(

try:
ds = xr.open_dataset(filepath)
collected.append(ds.assign_coords(time=[time_step.replace(tzinfo=None)]))
time_step_naive = time_step.replace(tzinfo=None)

# Ensure time is in both dimension and in coords.
if "time" in ds.dims:
ds = ds.assign_coords(time=[time_step_naive])
else:
ds = ds.expand_dims(time=[time_step_naive])

collected.append(ds)
logger.info(f"Loaded {data_type} from {filepath}")
except Exception as e:
logger.error(f"Failed to load {data_type} {filepath}: {e}")
Expand Down
40 changes: 40 additions & 0 deletions sunflow/forecast.py
Original file line number Diff line number Diff line change
Expand Up @@ -2,11 +2,51 @@
from datetime import datetime

import numpy as np
import pandas as pd
import pvlib
import xarray as xr
from Models.ProbabilisticAdvection import ProbabilisticAdvection

from .geospatial import get_coordinates

PVLIB_CLEARSKY_VARIABLE_NAME = "pvlib_clearsky_ghi"

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I suggest moving this to config.py and importing from that, both here and in main.py. I just tested that and it seems to work without an issue.



def make_pvlib_clearsky_dataset(
times: list[datetime],
latitudes: np.ndarray,
longitudes: np.ndarray,
) -> xr.Dataset:

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
) -> xr.Dataset:
) -> xr.Dataset:
"""Compute a pvlib simplified-Solis clear-sky GHI dataset on a lat/lon grid.
For each timestep in `times`, solar position is computed at every grid
point by repeating the timestamp across all (lat, lon) pairs. The
simplified Solis clear-sky model is then applied to the apparent solar
elevation to produce global horizontal irradiance (GHI) in W m⁻².
Longitudes are normalised to the range [-180, 180] before the solar
position calculation.
Args:
times: Ordered list of timezone-aware datetimes for which to
compute clear-sky irradiance.
latitudes: 1-D array of latitude values in degrees North.
longitudes: 1-D array of longitude values in degrees East
(values > 180 are wrapped to [-180, 180]).
Returns:
xr.Dataset with a single variable keyed by
PVLIB_CLEARSKY_VARIABLE_NAME of shape (time, latitude, longitude)
containing GHI in W m⁻². The time coordinate is stored without
timezone information.
"""

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A suggestion for a docstring

lon_grid, lat_grid = np.meshgrid(longitudes, latitudes)
lon_grid = np.where(lon_grid > 180, lon_grid - 360, lon_grid)
flat_latitudes = lat_grid.ravel()
flat_longitudes = lon_grid.ravel()

fields = []
for time in times:
repeated_times = pd.DatetimeIndex([time] * len(flat_latitudes))

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
repeated_times = pd.DatetimeIndex([time] * len(flat_latitudes))
repeated_times = [time] * len(flat_latitudes)

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This seems to yield identical results and does not require pandas import into this file. But as pandas is already installed, either option is fine

solar_position = pvlib.solarposition.get_solarposition(
repeated_times,
flat_latitudes,
flat_longitudes,

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
flat_longitudes,
flat_longitudes,
method='nrel_numpy',

I suggest explicitly denoting which solar position model is used.

On a different note, the 'nrel_numpy' model is the most accurate but unnecessarily slow. Based on my recent research, solar position models such as Michalsky or USNO are plenty fine: https://solposx.readthedocs.io/. How fast they are is very implementation dependent, and most of these implementations are not developed for a grid of latitudes/longitudes.

)
clearsky = pvlib.clearsky.simplified_solis(solar_position["apparent_elevation"])
fields.append(clearsky["ghi"].to_numpy().reshape(lat_grid.shape))

return xr.Dataset(
{
PVLIB_CLEARSKY_VARIABLE_NAME: (
["time", "latitude", "longitude"],
np.stack(fields),
)
},
coords={
"time": [time.replace(tzinfo=None) for time in times],
"latitude": latitudes,
"longitude": longitudes,
},
)


def preprocess_data(
data: xr.Dataset,
Expand Down
18 changes: 12 additions & 6 deletions sunflow/geospatial.py
Original file line number Diff line number Diff line change
Expand Up @@ -83,6 +83,9 @@ def get_coordinates(ds: xr.Dataset) -> tuple[np.ndarray, np.ndarray]:
elif "lat" in ds.coords and "lon" in ds.coords:
latitudes = ds.lat.values
longitudes = ds.lon.values
elif "latitude" in ds.coords and "longitude" in ds.coords:
latitudes = ds.latitude.values
longitudes = ds.longitude.values
else:
raise RuntimeError("Could not find coordinates in dataset.")

Expand All @@ -97,23 +100,26 @@ def get_coordinates(ds: xr.Dataset) -> tuple[np.ndarray, np.ndarray]:

def check_solar_elevation(
time: datetime,
lat: float = 55.6761,
lon: float = 12.5683,
lat: float,
lon: float,
) -> float:
"""Compute the solar elevation angle at a given time and location.

Uses pvlib to calculate the solar position. Defaults to Copenhagen,
Denmark (55.68°N, 12.57°E).
Uses pvlib to calculate the solar position. Longitudes in 0..360 convention
are converted to the -180..180 convention expected by pvlib.

Args:
time: Datetime (timezone-aware) for which to compute the elevation.
lat: Latitude in decimal degrees. Default 55.6761 (Copenhagen).
lon: Longitude in decimal degrees. Default 12.5683 (Copenhagen).
lat: Latitude in decimal degrees.
lon: Longitude in decimal degrees.

Returns:
Solar elevation angle in degrees above the horizon. Negative values
indicate the sun is below the horizon.
"""
if lon > 180:
lon -= 360

location = pvlib.location.Location(lat, lon)
solar_elevation = location.get_solarposition(time)["elevation"].values[0]
return solar_elevation
75 changes: 54 additions & 21 deletions sunflow/main.py
Original file line number Diff line number Diff line change
Expand Up @@ -22,8 +22,14 @@
save_forecast,
)
from .downloaders import download_past_data
from .forecast import multiply_clearsky, preprocess_data, simple_advection_forecast
from .geospatial import check_solar_elevation, get_bbox
from .forecast import (
PVLIB_CLEARSKY_VARIABLE_NAME,
make_pvlib_clearsky_dataset,
multiply_clearsky,
preprocess_data,
simple_advection_forecast,
)
from .geospatial import check_solar_elevation, get_bbox, get_coordinates
from .time_handler import generate_time_steps, round_time
from .validation import (
MissingClearskyDataError,
Expand Down Expand Up @@ -193,6 +199,10 @@ def run_nowcast(
"""
time_step_str = time_step.strftime("%Y-%m-%dT%H:%M:%SZ")
logger.info(f"--- Running nowcast for {time_step_str} ---")
nc_variable_names = config["nc_variable_names"].copy()
clearsky_source = config.get("clearsky_source", "file")
if clearsky_source == "pvlib":
nc_variable_names["sds_cs"] = PVLIB_CLEARSKY_VARIABLE_NAME

# Fetch current data (with retry loop in operational mode)
fetch_current_data_with_retry(
Expand All @@ -209,8 +219,13 @@ def run_nowcast(

# Check solar elevation
try:
solar_elevation = check_solar_elevation(time_step)
logger.info(f"Solar elevation: {solar_elevation:.2f} degrees")
lon_min, lat_min, lon_max, lat_max = map(float, bbox.split(","))
solar_elevation = max(
check_solar_elevation(time_step, lat=lat, lon=lon)
for lat in (lat_min, lat_max)
for lon in (lon_min, lon_max)
)
logger.info(f"Maximum corner solar elevation: {solar_elevation:.2f} degrees")
if solar_elevation < 1:

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder, if the value of 1 degree should be increased a bit now that we verify the corner with the maximum elevation. For instance, the MSG-CPP dataset is only available for solar elevation greater than 6 degrees. Is 1 degree useful for your applications @JoachimKoenigslieb ?

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Below 6 degrees for maximum corner solar elevation, running with MSG-CPP input fails with:
Nowcast failed for 2026-06-29T02:00:00Z: precip contains only non-finite values

So if lower values than that are needed, it would need to be modifiable (which may be the most elegant solution) by either an argument or a environment variable.

reason = "sun too low"
logger.warning(f"{reason.capitalize()}. Skipping.\n")
Expand Down Expand Up @@ -261,10 +276,20 @@ def run_nowcast(
if n_loaded == 0:
raise RuntimeError("No past data loaded. Cannot proceed.")

if clearsky_source == "pvlib":
latitudes, longitudes = get_coordinates(data)
data = data.merge(
make_pvlib_clearsky_dataset(
past_time_steps,
latitudes,
longitudes,
)
)

# Preprocess
logger.info("Preprocessing data...")
ratio_data, latitudes, longitudes = preprocess_data(
data, past_time_steps, config["nc_variable_names"]
data, past_time_steps, nc_variable_names
)

# Validate data shape
Expand Down Expand Up @@ -294,19 +319,27 @@ def run_nowcast(
clearsky_t0_time = time_step - timedelta(days=1)
all_clearsky_time_steps = [clearsky_t0_time] + previous_day_time_steps

# Fetch clearsky data with fallback to earlier days if a file is missing
logger.info("Fetching clearsky data...")
clearsky_data = fetch_clearsky_with_fallback(
all_clearsky_time_steps,
run_mode,
nowcast_config.max_clearsky_fallback_days,
config,
bbox,
dataset_name,
bbox_choice,
nowcast_config,
s3_config,
)
if clearsky_source == "pvlib":
logger.info("Generating pvlib clearsky data...")
clearsky_data = make_pvlib_clearsky_dataset(
all_clearsky_time_steps,
latitudes,
longitudes,
)
else:
# Fetch clearsky data with fallback to earlier days if a file is missing
logger.info("Fetching clearsky data...")
clearsky_data = fetch_clearsky_with_fallback(
all_clearsky_time_steps,
run_mode,
nowcast_config.max_clearsky_fallback_days,
config,
bbox,
dataset_name,
bbox_choice,
nowcast_config,
s3_config,
)

# Validate clearsky data
try:
Expand All @@ -319,7 +352,7 @@ def run_nowcast(
clearsky_data,
previous_day_time_steps,
expected_spatial_shape,
config["nc_variable_names"],
nc_variable_names,
)

# Multiply by clearsky to get actual solar irradiance
Expand All @@ -328,12 +361,12 @@ def run_nowcast(
ratio_forecast,
clearsky_data,
previous_day_time_steps,
config["nc_variable_names"],
nc_variable_names,
)

# Prepend timestep 0: current observation (ratio_data[-1]) × clearsky at t=0
sds_cs_t0 = clearsky_data.sel(time=clearsky_t0_time.replace(tzinfo=None))[
config["nc_variable_names"]["sds_cs"]
nc_variable_names["sds_cs"]
].values
solar_t0 = ratio_data[-1] * sds_cs_t0
solar_forecast = np.concatenate([solar_t0[np.newaxis, :, :], solar_forecast], axis=0)
Expand Down
Loading