AI4OPT/TS4PS-RTE7000

TS4PS

Time Series reconstruction for Power Systems

  1. Installation instructions
    1. Python setup
    2. RTE OpenAPI
    3. Download RTE7000 dataset
  2. External software
  3. Data sources
  4. Running the code

Installation instructions

Python setup

The recommended workflow is to use uv.

  1. Install uv:
    • For HPC users where uv is already installed, load uv with
      module load uv
    • To install your own uv distribution, see the uv installation instructions
  2. Install dependencies
    uv sync

Download RTE7000 dataset

⚠️ warning ⚠️

The D-GITT-RTE7000 dataset takes ~500GB of disk space and may take several hours to download. If running in an HPC environment, it is recommended to store the dataset on fast scratch storage and point a symlink at it, to avoid straining network drives.
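On an HPC system, the symlink approach might look like the following sketch. The scratch path is an assumption: substitute your cluster's fast storage location (many clusters expose it via $SCRATCH).

```shell
# Hypothetical scratch location -- adjust for your cluster.
SCRATCH_DIR="${SCRATCH:-$HOME/scratch}/D-GITT-RTE7000"

mkdir -p "$SCRATCH_DIR"
mkdir -p data
# Point the expected dataset path at scratch storage instead of a network drive.
ln -sfn "$SCRATCH_DIR" data/D-GITT-RTE7000
```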

  1. Install git lfs
    git lfs install
  2. Create directory structure and download RTE7000 dataset
    mkdir -p data/D-GITT-RTE7000
    cd data/D-GITT-RTE7000
    git clone https://huggingface.co/datasets/OpenSynth/D-GITT-RTE7000-2021
    git clone https://huggingface.co/datasets/OpenSynth/D-GITT-RTE7000-2022
    git clone https://huggingface.co/datasets/OpenSynth/D-GITT-RTE7000-2023

Data sources

| Data | Source | License |
|---|---|---|
| Network topology snapshots | D-GITT-RTE7000 (split across 3 datasets) | CC-BY-SA 4.0 |
| Regional load & generation time series | eco2mix | Open License v2 |
| Actual generation per unit | ENTSOE Transparency Platform | CC-BY 4.0 (see terms and conditions) |
| Shapefiles of France's regions | France GeoJSON | Open License v2 |
| List of generators in France | Open Data Reseaux Energies (in French) | Open License v2 |
| RTE substation geodata | Open Data Reseaux Energies (in French) | Open License v2 |

For convenience, geodata is included in this repository (see data/geodata directory). Substation geodata only includes the name and code of the region where each substation is located.

Running the code

Process RTE7000 data

The first step of the data procedure is to process the approximately 300,000 XIIDM files into an I/O-efficient, tabular format.

  1. Create output directory structure

    # bash brace expansion creates all 14 directories
    mkdir -p data/TS4PS-RTE7000/{CSV,parquet}/{branch,bus,gen,load,sub,switch,vol}
  2. Extract network components from XIIDM to CSV files

    • If you are on a SLURM cluster, update the charge account in slurm/extract_components.sbatch as appropriate and run
      mkdir slurm/logs/RTE7k-extract
      sbatch slurm/extract_components.sbatch
    • If you are not running on a SLURM cluster, or only want to process a small batch of files, you can run the extraction script directly. To view the code's documentation, run
      uv run src/ts4ps/datatools/rte7000/extract_components.py --help
      For instance, to process only the XIIDM files on Jan 1st, 2021, run
      uv run src/ts4ps/datatools/rte7000/extract_components.py data/D-GITT-RTE7000 data/TS4PS-RTE7000 --datetime-begin 2021-01-01 --datetime-end 2021-01-02
  3. Convert from CSV to Parquet files, which can be read with pyarrow

    If you are on a SLURM cluster, update slurm/csv2parquet.sbatch as appropriate and run

    mkdir slurm/logs/RTE7k-csv2pq
    sbatch slurm/csv2parquet.sbatch

Download and process public time series data

  1. Download and process regional time series

    bash scripts/download_eco2mix_timeseries.sh
    uv run src/ts4ps/datatools/eco2mix/process_regional_time_series.py data/TS4PS-RTE7000/eco2mix/regional_cons
  2. Download and process generator information

    bash scripts/download_generator_register.sh
    uv run src/ts4ps/datatools/eco2mix/generator_register.py data/TS4PS-RTE7000/eco2mix/generators
  3. Download and process actual generation for large units (>100MW)

    1. Download ENTSOE's Actual Generation per Unit data. This yields 36 CSV files (one per month), which should be placed in data/TS4PS-RTE7000/entsoe/raw

    2. Extract France-only data

      mkdir data/TS4PS-RTE7000/entsoe/fr
      chmod u+x scripts/clean_entsoe.sh  # ensure script is executable
      bash scripts/clean_entsoe.sh

      ⚠️ Some of the raw CSV lines have corrupted entries, which trigger errors in pandas' CSV reader. Fortunately, no data corresponding to RTE generators is corrupted ⚠️

    3. Consolidate France-only data

      uv run src/ts4ps/datatools/entsoe.py data/TS4PS-RTE7000/entsoe
    4. [Optional] Verify data integrity

      md5sum data/TS4PS-RTE7000/entsoe/actual_generation_unit_output.csv

      should output ac7df11c9cff145e4c11f3b453a96244.

      ⚠️ The above signature may not be valid if changes were made to raw data. ⚠️
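For reference, the pandas failure mode mentioned in the warning above can be worked around by skipping malformed rows. This is a generic sketch with made-up data, not the actual logic of scripts/clean_entsoe.sh.

```python
import io

import pandas as pd

# Made-up CSV where one row has a stray extra field, mimicking a corrupted entry.
raw = "unit,mw\nU1,100\nU2,200,EXTRA\nU3,300\n"

# on_bad_lines="skip" drops rows whose field count does not match the header.
df = pd.read_csv(io.StringIO(raw), on_bad_lines="skip")
print(df["unit"].tolist())  # the corrupted U2 row is dropped
```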

Geodata reconstruction

Network geodata information is reconstructed approximately by matching substations in the RTE7000 dataset to substations listed in the ODRE data.

Execute the geodata reconstruction script as follows:

uv run src/ts4ps/reconstruction/match_substations.py data/geodata data/TS4PS-RTE7000

which will export the reconstructed data into a CSV file located at data/TS4PS-RTE7000/substations_matching.csv.

⚠️⚠️ This reconstruction is heuristic, and its output should not be expected to be exact ⚠️⚠️

The exported CSV file has the following structure

| id_rte7000 | vnom_rte7000 | id_eco2mix | vnom_eco2mix | how_matched | region_code | generators_rte7000 |
|---|---|---|---|---|---|---|
| .CTLH | 63.0 | COTEL | 90.0 | score-based | 44 | [[".CTLO3GROUP.1", "HYDRO"], [".CTLO3GROUP.2", "HYDRO"]] |
| .CTLO | 63.0 | Q.CRO | 63.0 | score-based | 24 | [] |
| .G.RO | 225.0 | G.ROU | 225.0 | score-based | 52 | [] |
| .NAVA | 63.0 | NAVA6 | 380.0 | score-based | 28 | [] |

where

  • id_rte7000 and id_eco2mix are the substation ID in the RTE7000 dataset and eco2mix data, respectively
  • vnom_rte7000 and vnom_eco2mix are the substation's maximum nominal voltage levels in the RTE7000 and eco2mix data, respectively, in kilovolts (kV).
  • how_matched describes the rationale for the matching
  • region_code is the INSEE code of the region where the substation is located. Missing values are replaced with 0.
  • generators_rte7000 denotes the generators (ID and fuel type) that are attached to the substation in the RTE7000 dataset. This information is stored in JSON format.
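The JSON-encoded generators_rte7000 column can be decoded with the standard library. The sketch below parses an inline sample mirroring the rows shown above (note the doubled quotes, which are standard CSV escaping).

```python
import csv
import io
import json

# Inline sample mirroring the exported file's structure (rows shown above).
sample = (
    'id_rte7000,vnom_rte7000,id_eco2mix,vnom_eco2mix,how_matched,region_code,generators_rte7000\n'
    '.CTLH,63.0,COTEL,90.0,score-based,44,"[["".CTLO3GROUP.1"", ""HYDRO""], ["".CTLO3GROUP.2"", ""HYDRO""]]"\n'
    '.CTLO,63.0,Q.CRO,63.0,score-based,24,[]\n'
)

rows = list(csv.DictReader(io.StringIO(sample)))
# generators_rte7000 is JSON: a list of [generator_id, fuel_type] pairs.
generators = json.loads(rows[0]["generators_rte7000"])
hydro_units = [gen_id for gen_id, fuel in generators if fuel == "HYDRO"]
print(hydro_units)
```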

Online capacity vs actual regional totals

Run

uv run src/ts4ps/viz/capacity.py

and follow the instructions in the terminal to open the Dash app in your browser.

Load and generation disaggregation

  • Generation dispatches are disaggregated by region and fuel type, proportional to online capacity

    sbatch slurm/disaggregate_generation.sbatch

    This script will replace the "target_p" column in the generator parquet files.

  • Individual loads are disaggregated using pre-populated disaggregation weights

    sbatch slurm/disaggregate_load.sbatch

    This script will populate the "p0" column in the load parquet files.
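Conceptually, the proportional-to-capacity rule used for generation splits a regional total across units in proportion to each unit's online capacity. A minimal sketch, with made-up unit names and numbers:

```python
# Made-up regional total and per-unit online capacities (MW).
regional_total_mw = 900.0
online_capacity_mw = {"UNIT_A": 300.0, "UNIT_B": 600.0}

# Each unit receives its capacity share of the regional total.
total_capacity = sum(online_capacity_mw.values())
target_p = {
    unit: regional_total_mw * capacity / total_capacity
    for unit, capacity in online_capacity_mw.items()
}
print(target_p)  # shares sum back to the regional total
```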

About

Code for producing time series for the RTE7000 dataset
