Merged
2 changes: 1 addition & 1 deletion .github/workflows/ci.yml
@@ -24,7 +24,7 @@ jobs:
python-version: ${{ matrix.python-version }}

- name: Install uv
uses: astral-sh/setup-uv@v5
uses: astral-sh/setup-uv@v6

- name: Install project and dependencies
run: |
4 changes: 2 additions & 2 deletions .github/workflows/cleanup.yml
@@ -20,10 +20,10 @@ jobs:
- name: Set up Python
uses: actions/setup-python@v5
with:
python-version: "3.12"
python-version: "3.13"

- name: Install uv
uses: astral-sh/setup-uv@v5
uses: astral-sh/setup-uv@v6

- name: Install project and dependencies
run: |
4 changes: 2 additions & 2 deletions .github/workflows/publish.yml
@@ -23,10 +23,10 @@ jobs:
- name: Set up Python
uses: actions/setup-python@v5
with:
python-version: "3.12"
python-version: "3.13"

- name: Install uv
uses: astral-sh/setup-uv@v5
uses: astral-sh/setup-uv@v6

- name: Install project and dependencies
run: |
2 changes: 1 addition & 1 deletion .python-version
@@ -1 +1 @@
3.12
3.13
58 changes: 56 additions & 2 deletions README.md
@@ -1,12 +1,17 @@
# OEO Data Management

[![codecov](https://codecov.io/gh/ParticularlyPythonicBS/oeo_data/branch/develop/graph/badge.svg?token=O1ZU4OE5UY)](https://codecov.io/gh/ParticularlyPythonicBS/oeo_data)
[![pre-commit.ci status](https://results.pre-commit.ci/badge/github/ParticularlyPythonicBS/oeo_data/main.svg)](https://results.pre-commit.ci/latest/github/ParticularlyPythonicBS/oeo_data/main)
[![CI](https://github.com/ParticularlyPythonicBS/oeo_data/actions/workflows/ci.yml/badge.svg?branch=develop)](https://github.com/ParticularlyPythonicBS/oeo_data/actions/workflows/ci.yml)
[![Publish Dataset to R2](https://github.com/ParticularlyPythonicBS/oeo_data/actions/workflows/publish.yml/badge.svg?branch=develop)](https://github.com/ParticularlyPythonicBS/oeo_data/actions/workflows/publish.yml)
[![Cleanup Staging Bucket](https://github.com/ParticularlyPythonicBS/oeo_data/actions/workflows/cleanup.yml/badge.svg)](https://github.com/ParticularlyPythonicBS/oeo_data/actions/workflows/cleanup.yml)

This repository provides a command-line tool (`datamanager`) to manage large, versioned datasets (like SQLite files) using Git for metadata and Cloudflare R2 for object storage.
This is the official repository for versioned input databases used by the Open Energy Outlook (OEO) initiative. It contains a command-line tool (`datamanager`) designed to manage these Temoa-compatible SQLite databases using a secure, auditable, and CI/CD-driven workflow.

This approach avoids the pitfalls of storing large binary files directly in Git while still providing a robust, auditable version history for your data assets through a secure, CI/CD-driven workflow.
## About the Data

The SQLite databases hosted here are designed to be used as inputs for [Temoa](https://github.com/TemoaProject/temoa), an open-source energy system optimization model.
This data is curated and maintained by the Open Energy Outlook (OEO) team. The goal is to provide a transparent, version-controlled, and publicly accessible set of data for energy systems modeling and analysis.

## The Core Concept

@@ -119,6 +124,55 @@ flowchart TD

![Verify Output](assets/verification.png)

## 📖 The Data Publishing Workflow

All changes to the data—whether creating, updating, or deleting—follow a strict, safe, and reviewable Git-based workflow.

### Step 1: Create a New Branch

Always start by creating a new branch from the latest version of `main`. This isolates your changes.

```bash
git checkout main
git pull
git checkout -b feat/update-census-data
```

### Step 2: Prepare Your Changes

Use the `datamanager` tool to stage your changes. The `prepare` command handles both creating new datasets and updating existing ones.

```bash
# This uploads the file to the staging bucket and updates manifest.json locally
uv run datamanager prepare census-data.sqlite ./local-files/new-census.sqlite
```

The tool will guide you through the process. For other maintenance tasks like `rollback` or `delete`, use the corresponding command.
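The `prepare` step above both uploads to staging and edits `manifest.json` locally. The sketch below illustrates only the local bookkeeping half, under loud assumptions: the manifest layout (`datasets` mapping to `versions` entries with `sha256` and `commit`), the `"pending-merge"` placeholder, and the helper names are all hypothetical, not the schema or API of `datamanager.manifest`. It streams the file through SHA-256 so a large SQLite file is never read into memory at once.

```python
import hashlib
import json
import tempfile
from pathlib import Path

# Hypothetical placeholder: CI is described as filling in the real commit hash after merge.
PENDING = "pending-merge"


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash the file in 1 MiB chunks so large SQLite files stay out of memory."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def stage_version(manifest: dict, name: str, local_file: Path) -> dict:
    """Append a pending version entry for `name` (hypothetical manifest schema)."""
    dataset = manifest.setdefault("datasets", {}).setdefault(name, {})
    versions = dataset.setdefault("versions", [])
    entry = {
        "version": f"v{len(versions) + 1}",
        "sha256": sha256_of(local_file),
        "commit": PENDING,
    }
    versions.append(entry)
    return entry


if __name__ == "__main__":
    # Demo with a throwaway file standing in for ./local-files/new-census.sqlite.
    with tempfile.TemporaryDirectory() as tmp:
        db = Path(tmp) / "new-census.sqlite"
        db.write_bytes(b"example bytes")
        manifest: dict = {}
        stage_version(manifest, "census-data.sqlite", db)
        print(json.dumps(manifest, indent=2))
```

Numbering versions by list position and identifying them by content hash are choices made for this sketch; the real tool may do either differently.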

### Step 3: Commit and Push

Commit the modified `manifest.json` file to your branch with a descriptive message. This message will become the official description for the new data version.

```bash
git add manifest.json
git commit -m "feat: Add 2025 census data with new demographic columns"
git push --set-upstream origin feat/update-census-data
```

### Step 4: Open a Pull Request

Go to GitHub and open a pull request from your feature branch to `main`. The diff will clearly show the proposed changes to the manifest for your team to review.

### Step 5: Merge and Automate

Once the PR is reviewed, approved, and all status checks pass, merge it. The CI/CD pipeline takes over automatically:

- It copies the data from the staging bucket to the production bucket.
- It finalizes the `manifest.json` with the new commit hash and description.
- It pushes a final commit back to `main`.

The new data version is now live and available to all users via `datamanager pull`.
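The finalization bullets above can be pictured as a small in-place rewrite of the manifest. This is a minimal illustration only: it assumes a hypothetical layout where each dataset holds a `versions` list and unmerged entries carry a `"pending-merge"` commit placeholder, and it leaves out the staging-to-production copy and the push back to `main` entirely.

```python
import json


def finalize_manifest(manifest: dict, commit_hash: str, description: str) -> int:
    """Stamp every pending entry with the merge commit and description.

    Returns how many entries were finalized. The schema is hypothetical.
    """
    finalized = 0
    for dataset in manifest.get("datasets", {}).values():
        for entry in dataset.get("versions", []):
            if entry.get("commit") == "pending-merge":
                entry["commit"] = commit_hash
                entry["description"] = description
                finalized += 1
    return finalized


if __name__ == "__main__":
    manifest = {
        "datasets": {
            "census-data.sqlite": {
                "versions": [
                    {"version": "v1", "commit": "abc1234", "description": "initial"},
                    {"version": "v2", "commit": "pending-merge"},
                ]
            }
        }
    }
    count = finalize_manifest(
        manifest, "def5678", "feat: Add 2025 census data with new demographic columns"
    )
    print(count)  # 1
    print(json.dumps(manifest, indent=2))
```

Matching on a placeholder keeps the step idempotent in this sketch: running it a second time finalizes nothing.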

## 🚀 Usage

The primary workflow is now to **prepare** a dataset, then use standard Git practices to propose the change.
20 changes: 20 additions & 0 deletions docs/Makefile
@@ -0,0 +1,20 @@
# Minimal makefile for Sphinx documentation
#

# You can set these variables from the command line, and also
# from the environment for the first two.
SPHINXOPTS ?=
SPHINXBUILD ?= sphinx-build
SOURCEDIR = source
BUILDDIR = build

# Put it first so that "make" without argument is like "make help".
help:
	@$(SPHINXBUILD) -M help "$(SOURCEDIR)" "$(BUILDDIR)" $(SPHINXOPTS) $(O)

.PHONY: help Makefile

# Catch-all target: route all unknown targets to Sphinx using the new
# "make mode" option. $(O) is meant as a shortcut for $(SPHINXOPTS).
%: Makefile
	@$(SPHINXBUILD) -M $@ "$(SOURCEDIR)" "$(BUILDDIR)" $(SPHINXOPTS) $(O)
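Because of the catch-all `%: Makefile` rule, any target such as `html` is forwarded to Sphinx's make mode as `sphinx-build -M <target> source build`, followed by the contents of `SPHINXOPTS` and `O`. The hypothetical Python helper below rebuilds that argument list to make the routing explicit; it is not part of the repository, and it uses `shlex.split` to approximate how the shell splits the expanded option strings.

```python
import shlex


def sphinx_make_mode_command(
    target: str,
    sphinxopts: str = "",
    o: str = "",
    sphinxbuild: str = "sphinx-build",
    sourcedir: str = "source",
    builddir: str = "build",
) -> list[str]:
    """Mirror the argv that docs/Makefile's catch-all rule passes to sphinx-build."""
    return [
        sphinxbuild,
        "-M",
        target,
        sourcedir,
        builddir,
        *shlex.split(sphinxopts),  # $(SPHINXOPTS), split on whitespace by the shell
        *shlex.split(o),  # $(O), the documented shortcut for extra options
    ]


if __name__ == "__main__":
    print(sphinx_make_mode_command("html", sphinxopts="-W --keep-going"))
    # ['sphinx-build', '-M', 'html', 'source', 'build', '-W', '--keep-going']
```

So `make html SPHINXOPTS="-W --keep-going"`, run from the `docs/` directory, should be equivalent to the printed command; since `SOURCEDIR` and `BUILDDIR` are relative paths, running it from elsewhere would not find `source`.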
4 changes: 4 additions & 0 deletions docs/build/html/.buildinfo
@@ -0,0 +1,4 @@
# Sphinx build info version 1
# This file records the configuration used when building these files. When it is not found, a full rebuild will be done.
config: 1e45b9d71a28334c12a29c167ae132fb
tags: 645f666f9bcd5a90fca523b33c5a78b7
Binary file added docs/build/html/.doctrees/api_reference.doctree
Binary file not shown.
Binary file added docs/build/html/.doctrees/environment.pickle
Binary file not shown.
Binary file added docs/build/html/.doctrees/index.doctree
Binary file not shown.
Binary file added docs/build/html/.doctrees/workflow.doctree
Binary file not shown.
16 changes: 16 additions & 0 deletions docs/build/html/_sources/api_reference.rst.txt
@@ -0,0 +1,16 @@
API Reference
=============

This section provides auto-generated documentation from the project's source code.

datamanager.core
----------------

.. automodule:: datamanager.core
   :members:

datamanager.manifest
--------------------

.. automodule:: datamanager.manifest
   :members:
19 changes: 19 additions & 0 deletions docs/build/html/_sources/index.rst.txt
@@ -0,0 +1,19 @@
.. OEO Data Management documentation master file, created by
   sphinx-quickstart on Wed Jul 9 16:27:43 2025.
   You can adapt this file completely to your liking, but it should at least
   contain the root `toctree` directive.

OEO Data Management documentation
=================================

Welcome to the documentation for the Open Energy Outlook (OEO) Data Manager.

This tool provides a command-line interface and a secure, CI/CD-driven workflow for managing Temoa-compatible SQLite databases used by the OEO initiative.

.. toctree::
   :maxdepth: 2
   :caption: Contents:

   workflow
   usage
   api_reference
39 changes: 39 additions & 0 deletions docs/build/html/_sources/workflow.md.txt
@@ -0,0 +1,39 @@
# The Data Publishing Workflow

All changes to the data---whether creating, updating, or deleting---follow a strict, safe, and reviewable Git-based workflow.

### Step 1: Create a New Branch

Always start by creating a new branch from the latest version of `main`.

```bash
git checkout main
git pull
git checkout -b feat/update-census-data
```

### Step 2: Prepare Your Changes

Use the `datamanager` tool to stage your changes. The `prepare` command handles both creating new datasets and updating existing ones.

```bash
uv run datamanager prepare census-data.sqlite ./local-files/new-census.sqlite
```

### Step 3: Commit and Push

Commit the modified `manifest.json` file with a descriptive message. This message will become the official description for the new data version.

```bash
git add manifest.json
git commit -m "feat: Add 2025 census data with new demographic columns"
git push --set-upstream origin feat/update-census-data
```

### Step 4: Open a Pull Request

Go to GitHub and open a pull request from your feature branch to `main`.

### Step 5: Merge and Automate

Once the PR is reviewed, approved, and all status checks pass, merge it. The CI/CD pipeline takes over automatically, publishing the data to the production bucket and finalizing the manifest.