diff --git a/data_schemas/pypsa_data_model.yaml b/data_schemas/pypsa_data_model.yaml
index 8fc6108..9e68f07 100644
--- a/data_schemas/pypsa_data_model.yaml
+++ b/data_schemas/pypsa_data_model.yaml
@@ -20,143 +20,253 @@
 # ---------------------------------------------------------------------------
 identity:
   schema_name: PyPSA Data Model
-  organization:
+  organization: Technische Universitaet Berlin
   maintainers:
-    - name:
-      affiliation:
-      github: <@handle>
-      email:
-  repository:
-  documentation:
-  license:
-  version:
-  maturity:
+    - name: Fabian Neumann
+      affiliation: Technische Universitaet Berlin
+      github: fneum
+      email: f.neumann@tu-berlin.de
+    - name: Fabian Hofmann
+      affiliation: Open Energy Transition
+      github: FabianHofmann
+      email: fabian.hofmann@openenergytransition.org
+    - name: Lukas Trippe
+      affiliation: Technische Universitaet Berlin
+      github: lkstrp
+      email: lkstrp@pm.me
+  repository: https://github.com/PyPSA/PyPSA
+  documentation: https://docs.pypsa.org
+  license: MIT
+  version: "1.1.2"
+  maturity: Production
   # Point us to the code — we'll review the technical details ourselves
-  link_to_schema_definition:
-  link_to_validation_logic:
-  link_to_timeseries_management:
-  link_to_entity_relation_diagram:
+  link_to_schema_definition: https://github.com/PyPSA/PyPSA/tree/master/pypsa/data
+  link_to_validation_logic: https://github.com/PyPSA/PyPSA/blob/master/pypsa/consistency.py
+  # Note: validation is being overhauled, migrating to Pydantic and Pandera.
+  # The linked file will be replaced soon.
+  link_to_timeseries_management: https://github.com/PyPSA/PyPSA/blob/master/pypsa/network/index.py
+  link_to_entity_relation_diagram: ~
 # ---------------------------------------------------------------------------
 # 2. What It Is & What It Covers
 # ---------------------------------------------------------------------------
 summary:
   description: |
-
-
+    The PyPSA data model uses a network as the top-level Python object, holding a
+    dict-like store of all component instances (buses, generators, lines, etc.) and
+    shared dimensions: time series, optional multi-horizon investment periods, and an
+    optional scenario dimension for stochastic networks. Each component type carries a
+    DataFrame for static attributes and a dict of DataFrames for time-varying ones, all
+    anchored to the network's shared dimensions. The attribute schema is declared in
+    CSV files and is currently being migrated to Pydantic and Pandera for stricter
+    validation. The underlying data can be accessed through different interfaces, such
+    as an xarray accessor, which exposes labeled N-dimensional arrays with named
+    dimensions that enable automatic alignment, broadcasting, and index-based selection
+    across the full parameter space.
+
   modeling_domains_supported: |
-
+    - Economic dispatch (ED) with integer and/or continuous unit commitment
+      formulations, renewable availability, storage incl. hydro reservoirs,
+      demand elasticity, and carrier conversion
+    - Linear optimal power flow (LOPF) for meshed AC-DC networks with
+      optional loss approximations
+    - Security-constrained LOPF (SCLOPF) with N-1 contingency analysis
+    - Capacity expansion planning (CEP) with continuous and discrete
+      investment decisions for generation, storage, conversion, and
+      transmission
+    - Pathway planning with co-optimised multi-period investment
+    - Multi-carrier energy sector coupling (electricity, heat, hydrogen, gas, etc.)
+    - Static non-linear AC/DC (Newton-Raphson) and linearised power flow
+      approximation
+    Various solution methods are supported across these domains, including
+    stochastic programming (risk-neutral and risk-averse CVaR formulations),
+    Modeling to Generate Alternatives (MGA), rolling-horizon foresight
+    optimisation, and optimisation problem reduction techniques such as
+    spatial clustering and temporal clustering (representative periods,
+    segmentation).
   what_does_it_NOT_cover: |
-
+    PyPSA is designed from an economic and operations research
+    perspective. Some features are technically available but are not the
+    focus of the framework:
+    - Electrical engineering detail needed for low-voltage distribution models.
+      Newton-Raphson is available but the core workflow uses LOPF.
+    - Imperfect competition / game-theoretic market models
+    - Multi-objective optimisation
+    The following can be implemented via custom constraints but are not
+    supported out of the box:
+    - Endogenous technology learning
+    - Maintenance scheduling optimisation
+    - Capacity market mechanisms
+    - ...
   data_captured: |
-
+    - Grid topology (buses, lines, transformers, links)
+    - Component attributes (generators, loads, stores, storage units, shunt
+      impedances), each carrying technical, economic, and operational parameters
+      as defined in the attribute schema
+    - Time series (availability profiles, load profiles, marginal costs, snapshot weightings)
+    - Scenarios (network-level dimension for stochastic programming, adding a scenario
+      axis to the full component data)
+    - Investment periods (network-level dimension for multi-period planning, with
+      per-period weightings, discount rates, and lifetimes)
+    - Energy carriers and CO2 emissions, global constraints (e.g. CO2 budgets)
+    - Standard type libraries for lines and transformers
+    - Geographic coordinates and shapes for spatial analysis
   conceptual_structure: |
-
+    Component-based with bus-centric topology:
+    - Components (Generators, Loads, Stores, ...)
+    - Buses form the nodes of a graph, branches (Lines, Transformers,
+      Links, Processes) form the edges
+    - Components can be abstract: energy assets can be composed from groups of
+      components (e.g. a hydrogen storage could be modelled with 2 Links for
+      electrolyser and fuel cell, 1 Store and 1 Bus connecting them)
+    Simple tabular format:
+    - Network topology and n-dimensional data are stored under the hood in a
+      tabular format
+    - Split into static (components × attributes) and dynamic
+      (snapshots × components) tables, extended with multi-indexing
+      for scenarios and investment periods
+    - Each attribute can be static or dynamic, avoiding forced time
+      series when a constant suffices. This is currently being expanded with a
+      segment dimension for piecewise linear cost curves.
 # ---------------------------------------------------------------------------
 # 3. Key Design Decisions
 # ---------------------------------------------------------------------------
 design:
   key_decisions:
-    - decision:
-      rationale:
-    - decision: <...>
-      rationale: <...>
+    - decision: Tabular format
+      rationale: Easiest for users to work with and closest to the input data form
+    - decision: Pandas-based storage
+      rationale: Easiest to learn and most users already know pandas
+    - decision: Static and dynamic attribute split
+      rationale: >
+        Avoids forcing time series when a constant suffices and gives simple access
+        to the most important data without handling n dimensions
+    - decision: Multiple accessors via PyPSA API
+      rationale: >
+        Component classes provide filtering, statistics, optimisation helpers, etc.
+        on top of the raw tabular data. Additional accessors let users
+        switch between pandas, n-dimensional, and aggregated views without
+        data duplication.
+    - decision: Full backwards compatibility
+      rationale: Large established user base depends on stable interfaces
+    - decision: Components organised per type, not per entity
+      rationale: Natural fit for tabular storage and vectorised operations
+    - decision: Bus-based network topology with abstract components
+      rationale: >
+        Components are fundamental building blocks that can be composed
+        in a variety of ways to represent energy assets
   schema_format: |
-
+    Simple CSV tables and runtime checks. We are currently migrating this to Pydantic
+    and Pandera for stricter validation.
   implementation_languages:
-    -
-    -
+    - Python
-  database_storage_backend:
+  database_storage_backend: |
+    - NetCDF
+    - HDF5
+    - CSV folders
+    - Excel
+    - DuckDB (coming soon)
+    - Parquet (coming soon)
   interoperability:
     imports_from:
-      -
-      -
+      - All formats above
+      - pandapower
+      - pypower
     exports_to:
-      -
-      -
+      - All formats above
-  data_tool_relation:
+  data_tool_relation: Tightly coupled
   extensibility: |
-
+    - Custom attributes on any component type can be set
+    - Any custom constraints can be defined via the API
+    - Components are fundamental building blocks that can be composed to represent a wide range of energy assets.
   units_handling: |
-
+    Units are documented per attribute but not enforced at runtime. No
+    unit conversion library is used.
   validation_approach: |
-
-
+    The ongoing migration to Pydantic and Pandera covers schema structure, types,
+    range checks, settings, cross-field validation, and network topology. If the
+    user can provide it, PyPSA can support it. Otherwise it should fail.
+
   governance: |
-
+    Core team of maintainers at TU Berlin and Open Energy Transition with community
+    contributions via GitHub. Conflicts between maintainers are resolved with simple
+    majority vote.
 # ---------------------------------------------------------------------------
 # 4. Real-World Usage
 # ---------------------------------------------------------------------------
 usage:
   tools_built_on_schema:
-    - tool:
-      relationship:
-      link:
-
+    - tool: See https://docs.pypsa.org/latest/home/models/
+
   largest_real_world_dataset: |
-
+    PyPSA-Eur: sector-coupled capacity expansion and dispatch model of the
+    full ENTSO-E area (~35 countries). Raw network has ~5,300 buses,
+    ~6,600 HVAC lines and 46 HVDC links, typically clustered to
+    50–250 nodes. Hourly resolution, multi-period pathway optimisation.
+    Data from OpenStreetMap (grid), ERA5/SARAH-3 (renewables), ENTSO-E
+    (demand), Danish Energy Agency (costs).
+    PyPSA-Earth extends geographic coverage globally with workflows available
+    for nearly every country on Earth.
   who_is_using_it:
-    -
-    - <...>
+    - We lost track of that. For known users, see https://docs.pypsa.org/latest/home/users/
   data_available:
-    - geographic_area:
+    - geographic_area: International
       content: |
-
-      access:
+        We lost track of that. There are many different PyPSA-based model implementations
+        for many different regions across the world. Many are public, but there are also
+        many private and proprietary ones that we know of but cannot access.
+        The link above lists the biggest public ones.
+      access: public, licensed, and proprietary
 # ---------------------------------------------------------------------------
 # 5. Limitations & Challenges
 # ---------------------------------------------------------------------------
 challenges:
   known_limitations:
-    -
-    - <...>
+    - The tabular format is not the most natural fit for holding network topology data.
+      This is mitigated by strict validation and convenience functions and accessors.
+    - Strict backwards compatibility constrains the pace of schema evolution.
   hardest_problems_encountered: |
-
+    PyPSA is a toolbox used for many different use cases; we sometimes
+    don't even know all the ways it is applied. Maintaining full backwards
+    compatibility while extending the schema and validation is challenging,
+    but this has been strictly followed since the release of v1.0.
 # ---------------------------------------------------------------------------
 # 6. Interoperability & Convergence
 # ---------------------------------------------------------------------------
 interoperability:
   areas_of_overlap_with_other_schemas: |
-
+    ~
   what_would_convergence_require: |
-
-
+    The PyPSA data model is deeply integrated into a variety of workflows, models,
+    and applications used by many users. We want to provide full backwards
+    compatibility for all of those users. There is also a lot of functionality already
+    built on PyPSA, either integrated into the framework (like clustering functionality
+    or the statistics/plotting module, which can be plugged into dashboards and GUI
+    applications without much further work) or in external tools created by the
+    community.
+    Support for interoperability-focused tools, or any other tool in general, can
+    therefore most likely only be incorporated via translators.
+
   biggest_thing_others_should_know: |
@@ -165,6 +275,6 @@ interoperability:
 # Metadata
 # ---------------------------------------------------------------------------
 card_metadata:
-  prepared_by:
-  date:
+  prepared_by: Lukas Trippe
+  date: "2026-03-11"
   info_sheet_version: "1.0"
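
Review note on the static and dynamic attribute split described under `conceptual_structure`: the idea that an attribute only needs a time series when a constant does not suffice can be illustrated with a small sketch. This is not PyPSA's implementation or API. Plain dictionaries stand in for the pandas tables, and the names `static`, `dynamic`, `resolve`, the component names, and the attribute `p_nom` are assumptions chosen for illustration.

```python
# Hypothetical sketch: plain dicts stand in for the pandas tables in the card.
snapshots = ["t0", "t1", "t2"]

# Static table: one row per component, one column per attribute.
static = {
    "gas": {"p_nom": 100.0, "marginal_cost": 50.0},
    "wind": {"p_nom": 80.0, "marginal_cost": 0.0},
}

# Dynamic tables: attribute -> component -> one value per snapshot.
# Only components whose attribute actually varies over time appear here.
dynamic = {
    "marginal_cost": {"gas": [50.0, 55.0, 60.0]},
}


def resolve(component, attribute):
    """Return one value per snapshot, preferring the time-varying entry."""
    series = dynamic.get(attribute, {}).get(component)
    if series is not None:
        return list(series)
    # Otherwise broadcast the static constant over all snapshots.
    return [static[component][attribute]] * len(snapshots)


gas_cost = resolve("gas", "marginal_cost")    # taken from the dynamic table
wind_cost = resolve("wind", "marginal_cost")  # broadcast from the static table
```

In this sketch only the gas generator carries a time series, while the wind generator keeps a single constant that is expanded on demand.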