# Time Series Reconstruction for Power Systems

The recommended workflow is to use `uv`.
- Install `uv`:
  - For HPC users where `uv` is already installed, load `uv` with `module load uv`.
  - To install your own `uv` distribution, see the installation instructions.
- Install dependencies:

  ```shell
  uv sync
  ```
⚠️ The D-GITT-RTE7000 dataset takes ~500 GB of disk space and may take several hours to download. If running in an HPC environment, it is recommended to use a symlink to avoid straining network drives. ⚠️
- Install git lfs:

  ```shell
  git lfs install
  ```

- Create the directory structure and download the RTE7000 dataset:

  ```shell
  mkdir data/D-GITT-RTE7000
  git clone https://huggingface.co/datasets/OpenSynth/D-GITT-RTE7000-2021
  git clone https://huggingface.co/datasets/OpenSynth/D-GITT-RTE7000-2022
  git clone https://huggingface.co/datasets/OpenSynth/D-GITT-RTE7000-2023
  ```
| Data | Source | License |
|---|---|---|
| Network topology snapshots | D-GITT-RTE7000 (split across 3 datasets) | CC-BY-SA 4.0 |
| Regional load & generation time series | eco2mix | Open License v2 |
| Actual generation per unit | ENTSOE Transparency Platform | CC-BY 4.0 (See terms and conditions) |
| Shapefiles of France's regions | France GeoJSON | Open License v2 |
| List of generators in France | Open Data Reseaux Energies (in French) | Open License v2 |
| RTE substation geodata | Open Data Reseaux Energies (in French) | Open License v2 |
For convenience, geodata is included in this repository (see the `data/geodata` directory).
Substation geodata only includes the name and code of the region where each substation is located.
The first step of the data procedure is to process the roughly 300,000 XIIDM files into an I/O-efficient, tabular format.
- Create the output directory structure:

  ```shell
  mkdir -p data/TS4PS-RTE7000/CSV/branch
  mkdir -p data/TS4PS-RTE7000/CSV/bus
  mkdir -p data/TS4PS-RTE7000/CSV/gen
  mkdir -p data/TS4PS-RTE7000/CSV/load
  mkdir -p data/TS4PS-RTE7000/CSV/sub
  mkdir -p data/TS4PS-RTE7000/CSV/switch
  mkdir -p data/TS4PS-RTE7000/CSV/vol
  mkdir -p data/TS4PS-RTE7000/parquet/branch
  mkdir -p data/TS4PS-RTE7000/parquet/bus
  mkdir -p data/TS4PS-RTE7000/parquet/gen
  mkdir -p data/TS4PS-RTE7000/parquet/load
  mkdir -p data/TS4PS-RTE7000/parquet/sub
  mkdir -p data/TS4PS-RTE7000/parquet/switch
  mkdir -p data/TS4PS-RTE7000/parquet/vol
  ```
- Extract network components from XIIDM to CSV files.
  - If you are on a SLURM cluster, update the charge account in `slurm/extract_components.sbatch` as appropriate and run:

    ```shell
    mkdir slurm/logs/RTE7k-extract
    sbatch slurm/extract_components.sbatch
    ```

  - If you are not running on a SLURM cluster, or want to process a small batch of files, run the extraction script directly. To view the code's documentation, run:

    ```shell
    uv run src/ts4ps/datatools/rte7000/extract_components.py --help
    ```

    For instance, to process only the XIIDM files of Jan 1st, 2021, run:

    ```shell
    uv run src/ts4ps/datatools/rte7000/extract_components.py data/D-GITT-RTE7000 data/TS4PS-RTE7000 --datetime-begin 2021-01-01 --datetime-end 2021-01-02
    ```
- Convert from CSV to Parquet files, which can be read with pyarrow.
  If you are on a SLURM cluster, update `slurm/csv2parquet.sbatch` as appropriate and run:

  ```shell
  mkdir slurm/logs/RTE7k-csv2pq
  sbatch slurm/csv2parquet.sbatch
  ```
- Download and process regional time series:

  ```shell
  bash scripts/download_eco2mix_timeseries.sh
  uv run src/ts4ps/datatools/eco2mix/process_regional_time_series.py data/TS4PS-RTE7000/eco2mix/regional_cons
  ```
- Download and process generator information:

  ```shell
  bash scripts/download_generator_register.sh
  uv run src/ts4ps/datatools/eco2mix/generator_register.py data/TS4PS-RTE7000/eco2mix/generators
  ```
- Download and process actual generation for large units (>100 MW):
  - Download ENTSOE's Actual Generation per Unit data. This yields 36 CSV files (one per month), which should be placed in `data/TS4PS-RTE7000/entsoe/raw`.
  - Extract France-only data:

    ```shell
    mkdir data/TS4PS-RTE7000/entsoe/fr
    chmod u+x scripts/clean_entsoe.sh  # ensure the script is executable
    bash scripts/clean_entsoe.sh
    ```

    ⚠️ Some of the raw CSV lines have corrupted entries, which trigger errors in pandas' CSV reader. Fortunately, no data corresponding to RTE generators is corrupted. ⚠️
  - Consolidate France-only data:

    ```shell
    uv run src/ts4ps/datatools/entsoe.py data/TS4PS-RTE7000/entsoe
    ```

  - [Optional] Verify data integrity:

    ```shell
    md5sum data/TS4PS-RTE7000/entsoe/actual_generation_unit_output.csv
    ```

    should output `ac7df11c9cff145e4c11f3b453a96244`.

    ⚠️ The above checksum may not be valid if changes were made to the raw data. ⚠️
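If the pipeline is automated, the same integrity check can be scripted. A minimal Python sketch using only the standard library (the path and expected digest are taken from the `md5sum` step above):

```python
import hashlib
from pathlib import Path


def md5_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Compute the MD5 hex digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


expected = "ac7df11c9cff145e4c11f3b453a96244"
csv_path = Path("data/TS4PS-RTE7000/entsoe/actual_generation_unit_output.csv")
if csv_path.exists() and md5_of(csv_path) != expected:
    raise RuntimeError("consolidated ENTSOE CSV does not match the published checksum")
```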
Network geodata is reconstructed approximately by matching substations in the RTE7000 dataset to substations listed in the ODRE data.
Execute the geodata reconstruction script as follows:

```shell
uv run src/ts4ps/reconstruction/match_substations.py data/geodata data/TS4PS-RTE7000
```

which will export the reconstructed data into a CSV file located at `data/TS4PS-RTE7000/substations_matching.csv`.
The exported CSV file has the following structure:
| id_rte7000 | vnom_rte7000 | id_eco2mix | vnom_eco2mix | how_matched | region_code | generators_rte7000 |
|---|---|---|---|---|---|---|
| .CTLH | 63.0 | COTEL | 90.0 | score-based | 44 | "[["".CTLO3GROUP.1"", ""HYDRO""], ["".CTLO3GROUP.2"", ""HYDRO""]]" |
| .CTLO | 63.0 | Q.CRO | 63.0 | score-based | 24 | [] |
| .G.RO | 225.0 | G.ROU | 225.0 | score-based | 52 | [] |
| .NAVA | 63.0 | NAVA6 | 380.0 | score-based | 28 | [] |
where:

- `id_rte7000` and `id_eco2mix` are the substation IDs in the RTE7000 dataset and eco2mix data, respectively.
- `vnom_rte7000` and `vnom_eco2mix` are the substation's maximum nominal voltage levels in the RTE7000 and eco2mix data, respectively, in kilovolts (kV).
- `how_matched` describes the rationale for the matching.
- `region_code` is the INSEE code of the region where the substation is located. Missing values are replaced with `0`.
- `generators_rte7000` denotes the generators (ID and fuel type) that are attached to the substation in the RTE7000 dataset. This information is stored in JSON format.
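Because `generators_rte7000` is JSON text stored inside a CSV field, it needs an extra decoding step after loading. A short pandas sketch; the two rows below are copied from the sample table, and nothing else is assumed about the file:

```python
import io
import json

import pandas as pd

# Two rows copied from the sample table, in raw CSV form: the JSON column is
# quoted, with inner quotes doubled per the usual CSV escaping convention.
sample = io.StringIO(
    "id_rte7000,vnom_rte7000,id_eco2mix,vnom_eco2mix,how_matched,region_code,generators_rte7000\n"
    '.CTLH,63.0,COTEL,90.0,score-based,44,"[["".CTLO3GROUP.1"", ""HYDRO""], ["".CTLO3GROUP.2"", ""HYDRO""]]"\n'
    ".CTLO,63.0,Q.CRO,63.0,score-based,24,[]\n"
)
df = pd.read_csv(sample)
# Decode the JSON column into lists of [generator_id, fuel_type] pairs.
df["generators_rte7000"] = df["generators_rte7000"].apply(json.loads)

# E.g. list the hydro units attached to the first substation.
hydro_units = [gid for gid, fuel in df.loc[0, "generators_rte7000"] if fuel == "HYDRO"]
```

Reading the real `data/TS4PS-RTE7000/substations_matching.csv` works the same way, with `pd.read_csv(path)` in place of the in-memory sample.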
Run

```shell
uv run src/ts4ps/viz/capacity.py
```

and follow the instructions in the terminal to open the Dash app in your browser.
- Generation dispatches are disaggregated by region and fuel type, proportional to online capacity:

  ```shell
  sbatch slurm/disaggregate_generation.sbatch
  ```

  This script will replace the `"target_p"` column in the generator parquet files.
Individual loads are disaggregated using pre-populated disaggregation weights
sbatch slurm/disaggregate_load.sbatch
This script will populate the
"p0"column in the load parquet files.