Original file line number Diff line number Diff line change
@@ -0,0 +1,89 @@
general:
  description: Configuration for Germany solar forecasting with GFS data
  name: germany_gfs_config

input_data:
  gsp:
    zarr_path: "./data/germany/generation/germany_pv_2021.zarr"
    interval_start_minutes: -60
    interval_end_minutes: 480
    time_resolution_minutes: 30
    dropout_timedeltas_minutes: []
    dropout_fraction: 0.0
    public: False

  nwp:
    gfs:
      time_resolution_minutes: 180
      interval_start_minutes: -180
      interval_end_minutes: 540
      dropout_fraction: 0.0
      dropout_timedeltas_minutes: []
      zarr_path: "./data/germany/gfs/zarr/germany_gfs_2021_01.zarr"
      provider: "gfs"
      image_size_pixels_height: 32
      image_size_pixels_width: 40
      public: False
      channels:
        - dlwrf
        - dswrf
        - hcc
        - mcc
        - lcc
        - prate
        - r
        - t
        - tcc
        - u10
        - u100
        - v10
        - v100
        - vis
      normalisation_constants:
        dlwrf:
          mean: 287.7809
          std: 35.7133
        dswrf:
          mean: 22.2770
          std: 37.8986
        hcc:
          mean: 39.0351
          std: 40.4131
        lcc:
          mean: 81.1567
          std: 31.2783
        mcc:
          mean: 43.8942
          std: 41.4526
        prate:
          mean: 0.000017
          std: 0.000049
        r:
          mean: 92.2116
          std: 10.6941
        t:
          mean: 273.4538
          std: 2.9723
        tcc:
          mean: 88.8634
          std: 23.8125
        u10:
          mean: -1.0224
          std: 3.2745
        u100:
          mean: -1.4574
          std: 4.7261
        v10:
          mean: -1.3676
          std: 2.3645
        v100:
          mean: -1.9655
          std: 3.1743
        vis:
          mean: 12681.0664
          std: 10842.4053

  solar_position:
    interval_start_minutes: -60
    interval_end_minutes: 480
    time_resolution_minutes: 30
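
To make the interval settings above concrete, the following sketch counts how many time steps each input would contribute to one sample. It assumes both interval endpoints are inclusive, which this config does not state, and `steps_in_window` is a hypothetical helper rather than part of the data sampler.

```python
def steps_in_window(start_minutes: int, end_minutes: int, resolution_minutes: int) -> int:
    """Count time steps between start and end, both endpoints inclusive (assumed)."""
    span = end_minutes - start_minutes
    if resolution_minutes <= 0 or span < 0 or span % resolution_minutes != 0:
        raise ValueError("window must be a non-negative multiple of the resolution")
    return span // resolution_minutes + 1


# Values copied from the gsp and nwp.gfs sections of the config.
gsp_steps = steps_in_window(-60, 480, 30)
nwp_steps = steps_in_window(-180, 540, 180)
print(gsp_steps, nwp_steps)
```

Under that assumption the PV window spans 19 half-hourly steps and the GFS window spans 5 three-hourly steps.
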
41 changes: 41 additions & 0 deletions src/open_data_pvnet/configs/germany_gfs_config.yaml
@@ -0,0 +1,41 @@
general:
  name: "germany_gfs_config"
  description: "Configuration for Germany GFS data"

geographic_bounds:
  latitude_min: 47.0
  latitude_max: 55.0
  longitude_min: 5.0
  longitude_max: 15.0

data_source:
  base_url: "https://nomads.ncep.noaa.gov/pub/data/nccf/com/gfs/prod"
  resolution: "0p25"
  forecast_hours: [0, 3, 6, 9, 12, 15, 18]

variables:
  - dlwrf
  - dswrf
  - hcc
  - mcc
  - lcc
  - prate
  - r
  - t
  - tcc
  - u10
  - u100
  - v10
  - v100
  - vis

download:
  retry_attempts: 3
  retry_delay_seconds: 5
  timeout_seconds: 300

output:
  format: "zarr"
  chunk_time: 10
  chunk_lat: 32
  chunk_lon: 40
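
As a quick sanity check on the bounds and chunk sizes above, this sketch works out how many grid points the bounding box covers at the configured resolution. It assumes both bounds are inclusive and that `"0p25"` means 0.25°. `grid_points` is a hypothetical helper, not a function from the download script.

```python
import math


def grid_points(vmin: float, vmax: float, resolution_deg: float) -> int:
    """Number of grid points from vmin to vmax, both bounds inclusive (assumed)."""
    return round((vmax - vmin) / resolution_deg) + 1


resolution_deg = float("0p25".replace("p", "."))  # "0p25" -> 0.25

n_lat = grid_points(47.0, 55.0, resolution_deg)
n_lon = grid_points(5.0, 15.0, resolution_deg)

# Chunks needed per axis with chunk_lat: 32 and chunk_lon: 40.
chunks = (math.ceil(n_lat / 32), math.ceil(n_lon / 40))
print(n_lat, n_lon, chunks)
```

With inclusive bounds the grid is 33 x 41 points, one more than the 32 x 40 chunk size on each axis, so each axis would spill into a second, nearly empty chunk. If the downloader treats the upper bound as exclusive, the grid matches the chunk size exactly.
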
29 changes: 29 additions & 0 deletions src/open_data_pvnet/configs/germany_pv_data_config.yaml
@@ -0,0 +1,29 @@
general:
  name: "germany_pv_config"
  description: "Configuration for Germany PV data from SMARD API"

api:
  base_url: "https://www.smard.de/app/chart_data"
  filter_id: 4068
  region: "DE"
  resolution: "quarterhour"

collection:
  start_date: "2021-01-01"
  end_date: "2021-12-31"
  request_delay_seconds: 0.3

processing:
  time_resolution_minutes: 15
  aggregate_to_minutes: 30
  timezone: "UTC"

metadata:
  country: "Germany"
  gsp_id: "DE"
  latitude: 51.0
  longitude: 10.0

output:
  format: "zarr"
  csv_backup: true
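
The `processing` section implies a 15-minute to 30-minute aggregation step. Here is a minimal sketch of one way to do it with the standard library. It assumes the values are mean power in MW (so they are averaged, whereas energy per interval would be summed) and that each bucket is labelled by its start time. `aggregate_mean` is a hypothetical helper and may differ from what the download script does.

```python
from collections import defaultdict
from datetime import datetime, timedelta, timezone


def aggregate_mean(samples, bucket_minutes=30):
    """Average (timestamp, value) pairs into buckets labelled by bucket start."""
    if 60 % bucket_minutes != 0:
        raise ValueError("bucket_minutes must divide 60")
    buckets = defaultdict(list)
    for ts, value in samples:
        # Floor the timestamp to the start of its bucket.
        floored = ts - timedelta(
            minutes=ts.minute % bucket_minutes,
            seconds=ts.second,
            microseconds=ts.microsecond,
        )
        buckets[floored].append(value)
    return {ts: sum(vals) / len(vals) for ts, vals in sorted(buckets.items())}


# Four made-up quarter-hour readings starting at 12:00 UTC.
start = datetime(2021, 6, 1, 12, 0, tzinfo=timezone.utc)
quarter_hourly = [
    (start + timedelta(minutes=15 * i), v)
    for i, v in enumerate([100.0, 120.0, 140.0, 160.0])
]
half_hourly = aggregate_mean(quarter_hourly)
for ts, value in half_hourly.items():
    print(ts.isoformat(), value)
```
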
6 changes: 6 additions & 0 deletions src/open_data_pvnet/configs/germany_regions.csv
@@ -0,0 +1,6 @@
region_id,region_name,latitude_min,latitude_max,longitude_min,longitude_max,center_lat,center_lon
DE,Germany,47.0,55.0,5.0,15.0,51.0,10.0
DE_NORTH,Northern Germany,53.0,55.0,7.0,14.0,54.0,10.5
DE_SOUTH,Southern Germany,47.0,50.0,7.0,13.0,48.5,10.0
DE_EAST,Eastern Germany,50.5,53.0,12.0,15.0,51.75,13.5
DE_WEST,Western Germany,49.0,52.0,5.0,8.0,50.5,6.5
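
To show how the bounding boxes in this CSV behave, the sketch below looks up which regions contain a given point. The CSV text is copied inline so the example is self-contained. `regions_containing` is a hypothetical helper, and treating the box edges as inclusive is an assumption.

```python
import csv
import io

REGIONS_CSV = """region_id,region_name,latitude_min,latitude_max,longitude_min,longitude_max,center_lat,center_lon
DE,Germany,47.0,55.0,5.0,15.0,51.0,10.0
DE_NORTH,Northern Germany,53.0,55.0,7.0,14.0,54.0,10.5
DE_SOUTH,Southern Germany,47.0,50.0,7.0,13.0,48.5,10.0
DE_EAST,Eastern Germany,50.5,53.0,12.0,15.0,51.75,13.5
DE_WEST,Western Germany,49.0,52.0,5.0,8.0,50.5,6.5
"""


def regions_containing(lat, lon, csv_text=REGIONS_CSV):
    """Return region_ids whose bounding box contains (lat, lon), edges inclusive."""
    matches = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        in_lat = float(row["latitude_min"]) <= lat <= float(row["latitude_max"])
        in_lon = float(row["longitude_min"]) <= lon <= float(row["longitude_max"])
        if in_lat and in_lon:
            matches.append(row["region_id"])
    return matches


print(regions_containing(48.1, 11.6))  # a point in the south
print(regions_containing(49.5, 7.5))   # falls in two sub-regions
print(regions_containing(52.5, 10.0))  # falls in no sub-region
```

The sub-region boxes neither tile the country nor stay disjoint. For example, (49.5, 7.5) lies in both `DE_SOUTH` and `DE_WEST`, while (52.5, 10.0) matches only `DE`. Code that assigns points to regions would need to handle both cases.
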
138 changes: 138 additions & 0 deletions src/open_data_pvnet/scripts/GERMANY_README.md
@@ -0,0 +1,138 @@
# Germany Solar Forecasting Pipeline

A complete pipeline for training solar forecasting models for Germany using GFS weather data and SMARD PV generation data.

## Quick Start

### 1. Download Data

```bash
# Download PV generation data
python src/open_data_pvnet/scripts/downloading_pv_germany.py

# Download GFS weather data (recent data)
python src/open_data_pvnet/scripts/download_gfs_germany_fast.py --year 2026 --months 2 --max-days 1

# For historical data, install herbie-data first
pip install herbie-data
python src/open_data_pvnet/scripts/download_gfs_germany_fast.py --year 2024 --months 1 --max-days 1 --source herbie
```

### 2. Process Data

```bash
python src/open_data_pvnet/scripts/germany_pipeline.py process \
--pv-zarr ./data/germany/generation/germany_pv_2021.zarr \
--gfs-zarr ./data/germany/gfs/zarr/germany_gfs_2021_01.zarr \
--output-dir ./data/germany/processed
```

### 3. Run Tests

```bash
python src/open_data_pvnet/scripts/germany_pipeline.py test \
--pv-zarr ./data/germany/generation/germany_pv_2021.zarr \
--gfs-zarr ./data/germany/gfs/zarr/germany_gfs_2021_01.zarr
```

### 4. Train Model

```bash
python src/open_data_pvnet/scripts/train_germany_baseline.py \
--epochs 10 \
--output-dir ./models/germany
```

## Pipeline Commands

### Inspect Zarr Files

```bash
python src/open_data_pvnet/scripts/germany_pipeline.py inspect \
--zarr ./data/germany/gfs/zarr/germany_gfs_2021_01.zarr
```

### Process Data

Validates, aligns, and calculates normalization constants for PV and GFS data.

```bash
python src/open_data_pvnet/scripts/germany_pipeline.py process \
--pv-zarr <path> \
--gfs-zarr <path> \
--output-dir <output_directory>
```

**Outputs:**
- `normalization_constants.yaml` - Normalization statistics
- `processing_report.txt` - Data processing summary
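
As a rough illustration of the normalization step, the sketch below computes a per-channel mean and standard deviation and applies them as a z-score. It is not the pipeline's actual code: the function names are hypothetical, the sample values are made up, and the choice of population standard deviation is an assumption.

```python
import statistics


def normalization_constants(channel_values):
    """Per-channel mean and population standard deviation (assumed convention)."""
    return {
        name: {"mean": statistics.fmean(values), "std": statistics.pstdev(values)}
        for name, values in channel_values.items()
    }


def normalize(value, constants):
    """Standard z-score: (value - mean) / std."""
    return (value - constants["mean"]) / constants["std"]


# Made-up temperature samples in kelvin for the "t" channel.
toy = {"t": [270.0, 272.0, 274.0, 276.0]}
constants = normalization_constants(toy)
z = normalize(276.0, constants["t"])
print(constants["t"], z)
```

The resulting dictionary has the same `mean`/`std` shape as the `normalisation_constants` block in the model configuration.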

### Run Tests

Validates data loading, temporal alignment, and data quality.

```bash
python src/open_data_pvnet/scripts/germany_pipeline.py test \
--pv-zarr <path> \
--gfs-zarr <path> \
--output-dir <output_directory>
```

**Outputs:**
- `test_report.txt` - Test results
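
The temporal alignment check can be pictured as an interval intersection between the PV and GFS time ranges. The sketch below shows the idea under stated assumptions: `temporal_overlap` is a hypothetical helper, and the date ranges are illustrative values guessed from the file names, not read from the Zarr stores.

```python
from datetime import datetime, timezone


def temporal_overlap(a_start, a_end, b_start, b_end):
    """Return the (start, end) shared by two closed time ranges, or None."""
    start = max(a_start, b_start)
    end = min(a_end, b_end)
    return (start, end) if start <= end else None


utc = timezone.utc
# Illustrative ranges: a full year of PV data and one month of GFS data.
pv_range = (datetime(2021, 1, 1, tzinfo=utc), datetime(2021, 12, 31, 23, 30, tzinfo=utc))
gfs_range = (datetime(2021, 1, 1, tzinfo=utc), datetime(2021, 1, 31, 18, 0, tzinfo=utc))

overlap = temporal_overlap(*pv_range, *gfs_range)
print(overlap)

# Disjoint ranges yield None, which a test could report as a failure.
no_overlap = temporal_overlap(
    datetime(2021, 1, 1, tzinfo=utc), datetime(2021, 1, 31, tzinfo=utc),
    datetime(2024, 1, 1, tzinfo=utc), datetime(2024, 1, 2, tzinfo=utc),
)
print(no_overlap)
```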

## Data Formats

### PV Data
- **Dimensions:** (datetime_gmt, gsp_id)
- **Variables:** generation_mw, capacity_mwp, installedcapacity_mwp
- **Coordinates:** datetime_gmt, gsp_id

### GFS Data
- **Dimensions:** (init_time_utc, step, latitude, longitude)
- **Variables:** 14 weather channels (dlwrf, dswrf, hcc, mcc, lcc, prate, r, t, tcc, u10, u100, v10, v100, vis)
- **Resolution:** 0.25° (about 28 km north-south and about 17 km east-west at Germany's latitude)
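
Because the GFS data is indexed by `init_time_utc` and `step` rather than by valid time, matching it against PV timestamps means adding the two together. Here is a minimal standard-library sketch of that arithmetic. The init time is made up, and the steps are taken from `forecast_hours` in the GFS download config.

```python
from datetime import datetime, timedelta, timezone

# Hypothetical forecast run initialised at 06:00 UTC.
init_time_utc = datetime(2021, 1, 1, 6, 0, tzinfo=timezone.utc)

# Forecast steps matching forecast_hours: [0, 3, 6, 9, 12, 15, 18].
steps = [timedelta(hours=h) for h in (0, 3, 6, 9, 12, 15, 18)]

# Valid time is the moment each forecast value applies to.
valid_times = [init_time_utc + step for step in steps]
for step, valid in zip(steps, valid_times):
    print(step, "->", valid.isoformat())
```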

## Configuration

- `src/open_data_pvnet/configs/germany_gfs_config.yaml` - GFS download settings
- `src/open_data_pvnet/configs/germany_pv_data_config.yaml` - PV data settings
- `src/open_data_pvnet/configs/germany_regions.csv` - Regional boundaries
- `src/open_data_pvnet/configs/PVNet_configs/datamodule/configuration/germany_configuration.yaml` - Model config

## Scripts

| Script | Purpose |
|--------|---------|
| `downloading_pv_germany.py` | Download PV generation data from SMARD API |
| `download_gfs_germany_fast.py` | Download GFS weather data with subregion filtering |
| `germany_pipeline.py` | Main pipeline orchestrator (inspect, process, test) |
| `germany_utils.py` | Shared utility functions |
| `train_germany_baseline.py` | Train baseline forecasting model |

## Troubleshooting

### GFS Download Issues
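- **403 Forbidden:** Historical data (more than about 10 days old) is not served by the default source and requires S3 access. Use recent dates, or install `herbie-data` and pass `--source herbie` as shown in the Quick Start.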
- **Connection timeouts:** Check internet connection and NOAA server availability.

### Data Processing Issues
- Ensure Zarr files exist at specified paths
- Check temporal overlap in processing report
- Verify configuration paths are correct

### Test Failures
- Review test report at `./data/germany/tests/test_report.txt`
- Check data loading and temporal alignment
- Verify data quality and completeness

## Requirements

- Python 3.9+
- xarray, zarr, requests, pyyaml, tqdm, cfgrib, torch, pandas, numpy

## Data Sources

- **PV Generation:** [SMARD API](https://www.smard.de/) (Bundesnetzagentur)
- **Weather Data:** [GFS](https://www.ncei.noaa.gov/products/weather-global-forecast-system) (NOAA)
- **Region:** Germany (47-55°N, 5-15°E)