feat(elt-pipelines): Add initial project with example pipeline by WHTaylor · Pull Request #368 · ISISNeutronMuon/analytics-data-platform

WHTaylor · 2026-06-24T14:19:07Z

Creates an elt-pipelines project with a statusdisplay pipeline, which uses elt-common to ingest data from the ISIS cycles endpoint. The pipeline can be run using the instructions from the README.

- elt-common is currently included as a dependency in elt-pipelines using a relative path pointing at the package in the parent folder. This makes it easy to work on both locally, but means anything wanting to run pipelines needs both packages in its working directory; don't think it's a big deal, but maybe not ideal? Are we aiming to publish elt-common to PyPI so we can use it as a 'normal' dependency?
- The new pipeline is ingesting into an elt_cycles table for testing purposes. Once we want to migrate to this pipeline in production, it should probably start ingesting into cycles instead, but because it's using a different schema from the current DLT pipeline we'll need to replace the table entirely, so there will be a bit of extra work needed at the time.
- Something that only just occurred to me - why is it called statusdisplay? Should it change to something like cycles or isiscycles?

Summary by CodeRabbit

New Features
- Added a new pipeline that fetches and formats status data for ingestion.
- Introduced support for an optional extra with the required data-processing and HTTP libraries.
Documentation
- Added a project README with setup steps, dependency installation, and example run instructions.
Chores
- Added ignore rules for common Python, environment, and build artefacts.

ref #321

coderabbitai · 2026-06-24T14:19:21Z

Important

Review skipped

Auto incremental reviews are disabled on this repository.

Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 6fa262d1-537b-4007-8f06-aee5d37a6c66


Walkthrough

A new elt-pipelines Python package is introduced with pyproject.toml, .gitignore, and a README.md. The first pipeline implementation, statusdisplay.py, fetches accelerator cycle/phase data from a fixed API endpoint, reformats ISO date strings, and yields PyArrow tables via an Extract class.

Changes

elt-pipelines project setup and statusdisplay pipeline

| Layer / File(s) | Summary |
| --- | --- |
| Project scaffold: metadata, gitignore, and README `elt-pipelines/pyproject.toml`, `elt-pipelines/.gitignore`, `elt-pipelines/README.md` | `pyproject.toml` declares package metadata, `elt-common` as an editable local dep, a `statusdisplay` optional extra (`pyarrow`, `requests`), and a `dev` group. `.gitignore` excludes Python caches and build artefacts. `README.md` documents `uv`-based setup and `elt` CLI usage. |
| statusdisplay ingestion pipeline `elt-pipelines/pipelines/ingest/accelerator/statusdisplay/statusdisplay.py` | Adds `Extract` registering an `elt_cycles` replace-mode resource. `fetch()` performs HTTP GET with `RuntimeError` on failure. `clean()`/`reformat()` mutate phase `start`/`end` ISO strings to `"%Y-%m-%d %H:%M:%S"`. Converted payload is read into a PyArrow table via `pyarrow.json.read_json()`. |
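
The `clean()`/`reformat()` step summarised in the table could look roughly like the sketch below. It is an illustration rather than the code from the PR: the payload shape (a list of cycles, each with a `phases` list) is an assumption, and only the target format string and the `start`/`end` field names come from the summary above.

```python
from datetime import datetime

# Target format quoted in the walkthrough summary.
TARGET_FORMAT = "%Y-%m-%d %H:%M:%S"


def reformat(iso_string: str) -> str:
    """Convert an ISO 8601 timestamp string into TARGET_FORMAT."""
    # Note: a trailing "Z" is only accepted by fromisoformat on Python 3.11+.
    return datetime.fromisoformat(iso_string).strftime(TARGET_FORMAT)


def clean(cycles: list[dict]) -> list[dict]:
    """Rewrite each phase's start/end in place and return the same list.

    Assumed shape: [{"phases": [{"start": "...", "end": "..."}, ...]}, ...]
    """
    for cycle in cycles:
        for phase in cycle.get("phases", []):
            for key in ("start", "end"):
                if phase.get(key):  # leave missing or null values untouched
                    phase[key] = reformat(phase[key])
    return cycles
```

Mutating in place mirrors the "mutate" wording in the summary; returning the list as well lets the call be chained, as in `clean(fetch())`.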

Poem

A new pipeline burrows into the ground 🐇
With cycles and phases all neatly found,
Dates reformatted, arrows drawn tight,
The statusdisplay gleams in the night.
uv sync --extra statusdisplay — away we go,
Watch the elt-pipelines put on a show! 🎉

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
| --- | --- | --- | --- |
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 20.00% which is insufficient. The required threshold is 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |

✅ Passed checks (4 passed)

| Check name | Status | Explanation |
| --- | --- | --- |
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title check | ✅ Passed | The title clearly summarises the main change: adding the initial elt-pipelines project with an example pipeline. |
| Linked Issues check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Out of Scope Changes check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |


coderabbitai

Actionable comments posted: 3

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)

elt-pipelines/.gitignore (1)

1-8: 📐 Maintainability & Code Quality | 🟡 Minor | ⚡ Quick win

Ignore the local virtual environment directory.

The setup instructions create .venv, but it is not ignored here, so it can be accidentally committed.

Suggested patch

 # ignore basic python artifacts
 .env
+.venv/
 **/__pycache__/
 **/*.py[cod]
 **/*$py.class
 **/build/
 **/*.egg-info/

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@elt-pipelines/.gitignore` around lines 1 - 8, The .gitignore entry set is
missing the local virtual environment directory, so update the ignore rules to
also exclude the project’s .venv folder alongside the existing Python artifacts.
Keep the change in the same ignore list near the other environment/build entries
so the setup-created virtual environment is not accidentally committed.

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@elt-pipelines/pipelines/ingest/accelerator/statusdisplay/statusdisplay.py`:
- Around line 33-38: The fetch() helper currently calls requests.get(CYCLES_URL)
without a timeout and only handles non-OK responses, so transport failures can
escape unhandled. Update fetch() to wrap the requests.get call in try/except
requests.RequestException, add a timeout to the request, and raise one
RuntimeError that includes the CYCLES_URL context for both request failures and
bad responses.
- Around line 26-30: The JSON loading path in statusdisplay’s fetch/read flow
needs an empty-input guard because pyarrow.json.read_json will fail on a
zero-length stream. Update the logic around clean(fetch()) and the yield
pyarrow.json.read_json(f) call to detect when no rows are returned and
immediately yield an empty table or return early instead of building and parsing
an empty buffer.
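
The two findings above could be addressed along the lines of the sketch below. It is not the code from the PR. To keep the example dependency-free it swaps `requests` for the standard library's `urllib` (with `requests` the equivalent would be passing `timeout=` to `requests.get` and catching `requests.RequestException`). The URL is a placeholder, and the `opener` parameter and the `to_json_lines` helper are hypothetical names introduced for illustration.

```python
import io
import json
import urllib.request

# Placeholder only: the real CYCLES_URL is not shown in this thread.
CYCLES_URL = "https://example.invalid/cycles"


def fetch(url=CYCLES_URL, timeout=10.0, opener=urllib.request.urlopen):
    """GET `url` and decode its JSON body, raising one RuntimeError on any failure."""
    try:
        with opener(url, timeout=timeout) as response:
            return json.load(response)
    except (OSError, ValueError) as exc:
        # OSError covers urllib's URLError/HTTPError and socket timeouts;
        # ValueError covers a body that is not valid JSON.
        raise RuntimeError(f"Failed to fetch cycles from {url}: {exc}") from exc


def to_json_lines(rows):
    """Return a newline-delimited JSON buffer, or None when there are no rows."""
    if not rows:
        return None  # the caller should skip the PyArrow read entirely
    payload = "\n".join(json.dumps(row) for row in rows)
    return io.BytesIO(payload.encode("utf-8"))
```

A caller would then check the result before parsing, for example `buffer = to_json_lines(rows)` followed by `pyarrow.json.read_json(buffer)` only when `buffer` is not `None`.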

In `@elt-pipelines/README.md`:
- Around line 16-22: The setup instructions for the `uv sync` flow are
incomplete because `elt-pipelines` depends on the local sibling checkout of
`elt-common`. Update the README section that shows `uv venv`, `source
.venv/bin/activate`, and `uv sync` to explicitly tell users to clone or place
`elt-common` alongside `elt-pipelines` before running those commands, so the
editable local source can be resolved. Use the existing setup text in the README
and mention the dependency on `../elt-common` near the `uv sync` instructions.



ℹ️ Review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 91b01f08-3b41-40bf-bb15-c76f161ed326

📥 Commits

Reviewing files that changed from the base of the PR and between 857eba7 and 781e1d1.

⛔ Files ignored due to path filters (1)

elt-pipelines/uv.lock is excluded by !**/*.lock

📒 Files selected for processing (4)

elt-pipelines/.gitignore
elt-pipelines/README.md
elt-pipelines/pipelines/ingest/accelerator/statusdisplay/statusdisplay.py
elt-pipelines/pyproject.toml

martyngigg · 2026-06-25T10:32:36Z

Thanks for this. I'll take a look at the code shortly but to answer the questions:

* `elt-common` is currently included as a dependency in `elt-pipelines` using a relative path pointing at the package in the parent folder. This makes it easy to work on both locally, but means anything wanting to run pipelines needs both packages in its working directory; don't think it's a big deal, but maybe not ideal? Are we aiming to publish `elt-common` to PyPI so we can use it as a 'normal' dependency?

I think for the pipelines here that's fine, at least for now. I'd been mostly hoping to avoid publishing to PyPI if possible, as I wasn't really aiming to create a general-purpose package for all of the world. In that case the naming is then more challenging. Given you can install with pip from a git URL, including a versioned one, I thought that was a fine way to go. It's pure Python, so installing this way should be as easy as a package from PyPI.
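
For illustration, a versioned git install can be written as a direct-reference requirement string that pip accepts. The helper below is a hypothetical sketch that only builds that string; the tag `v0.1.0` and the `elt-common` subdirectory are assumptions, not values confirmed in this thread.

```python
def git_requirement(name, repo_url, ref, subdirectory=None):
    """Build a direct-reference requirement that pip can install from git.

    `ref` may be a tag, branch, or commit hash; `subdirectory` is needed when
    the package does not live at the repository root.
    """
    requirement = f"{name} @ git+{repo_url}@{ref}"
    if subdirectory:
        requirement += f"#subdirectory={subdirectory}"
    return requirement


REPO = "https://github.com/ISISNeutronMuon/analytics-data-platform.git"

# "v0.1.0" is a hypothetical tag and "elt-common" an assumed subdirectory.
print(git_requirement("elt-common", REPO, "v0.1.0", "elt-common"))
```

The resulting string can be passed to `pip install` in quotes or listed as a dependency in `pyproject.toml`.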

* The new pipeline is ingesting into an `elt_cycles` table for testing purposes. Once we want to migrate to this pipeline in production, it should probably start ingesting into `cycles` instead, but because it's using a different schema from the current DLT pipeline we'll need to replace the table entirely, so there will be a bit of extra work needed at the time

Makes sense!

* Something that only just occurred to me - why is it called `statusdisplay`? Should it change to something like `cycles` or `isiscycles`?

I was naming things after the system they came from, and the source of the cycles is the system that supports the ISIS status display. Happy to change if it's found to be too confusing - I'm not particularly wedded to it. It's emphasizing the need for more documentation around this though!

For some reason this didn't pick up when running the linter manually, but did fail in CI https://github.com/ISISNeutronMuon/analytics-data-platform/actions/runs/28166532207/job/83419815751. Making a purposefully incorrect change was enough to trigger the formatter

WHTaylor · 2026-06-25T11:29:14Z

ref the last two commits, I was poking around the pre-commit setup (because the markdown-lint step is slightly annoyingly slow) and saw that the ruff linter/formatter needed to be specifically set up to include the new directory. It then didn't pick up the already existing incorrect formatting on commit because the file hadn't changed.

martyngigg · 2026-06-25T13:23:21Z

+    return data
+
+
+if __name__ == "__main__":


I guess this facilitates easier debugging?

Yeah, it was useful for quick checks whilst working on the pipeline. Me as of yesterday thought it'd be a good idea to leave it in as an example, but me as of today disagrees, so I've taken it out.

martyngigg · 2026-06-25T13:55:22Z

My original thought here would be that the child directories of elt-pipelines would be named after the lakekeeper warehouse that the transformed models end up in, i.e. the cycles tables are associated with facility operations so end up in the `facility_ops` warehouse.

For the FASE data our thinking was to have a separate warehouse given there are more access controls required, e.g. for who can access what. In the facility_ops case the data can all simply be read-only. It would also then be feasible to have separate repositories for each set of pipelines targeting a given warehouse.

What do you think about having:

elt-pipelines/
|-- facility_ops/
|   |-- ingest/
|   |   |-- accelerator/
|   |-- transform/
|   |   |-- # not here yet but would be...
|   |-- pyproject.toml
|   |-- .gitignore
|   |-- ...
|-- other_warehouse

child directories of elt-pipelines would be named after the lakekeeper warehouse that the transformed models end up in

For the FASE data our thinking was to have a separate warehouse

I think this makes sense, and it might be possible to also use the directories for configuring pyiceberg to control the destination warehouse (either using the directory name instead of getting the default catalog here, or putting some amount of the config into the directories).
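
As a rough sketch of the first option, the warehouse name could be derived from a pipeline's location in the proposed layout. The function name `warehouse_for` and the example path are hypothetical, and how the resulting name is then handed to pyiceberg is deliberately left open here.

```python
from pathlib import PurePosixPath


def warehouse_for(pipeline_path, root="elt-pipelines"):
    """Return the directory directly under `root` on the way to `pipeline_path`.

    With the proposed layout this directory is named after the warehouse.
    """
    parts = PurePosixPath(pipeline_path).parts
    if root not in parts:
        raise ValueError(f"{pipeline_path!r} is not under {root!r}")
    index = parts.index(root)
    # Require at least root/<warehouse>/<something> so that a file sitting
    # directly in the root is not mistaken for a warehouse directory.
    if index + 2 >= len(parts):
        raise ValueError(f"{pipeline_path!r} is not inside a warehouse directory")
    return parts[index + 1]


# Hypothetical path following the proposed layout.
example = "elt-pipelines/facility_ops/ingest/accelerator/statusdisplay.py"
print(warehouse_for(example))
```

Raising on paths outside a warehouse directory is a design choice for the sketch; falling back to a default catalog would be an equally reasonable alternative.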

It would also then be feasible to have separate repositories for each set of pipelines targeting a given warehouse

This feels like it'd fragment the project, especially given the use of the relative path for the elt-common dependency; what benefits do you see it having?

This is technically the only use of pydantic-settings in the project, so it could be removed as a dependency. However, any pipeline that wants to include custom configuration will need to extend BaseSettings, so I think we should leave the dependency as is.

WHTaylor added 3 commits June 24, 2026 14:53

feat(elt-pipelines): Create initial project

b631c13

ref #321

feat(elt-pipelines): Add statusdisplay pipeline

e4e4592

ref #321

docs(elt-pipelines): Add running instructions to README

781e1d1

WHTaylor requested a review from a team as a code owner June 24, 2026 14:19

coderabbitai Bot reviewed Jun 24, 2026


feat(elt-pipelines): Add timeout to cycles fetch

08e8d9a

martyngigg self-assigned this Jun 25, 2026

WHTaylor added 2 commits June 25, 2026 12:20

Run ruff on elt-pipelines

92aee66

martyngigg reviewed Jun 25, 2026

WHTaylor wants to merge 7 commits into main from 321-pipelines-project


2 participants
