Refactor harvest orchestration and enhance Prefect worker setup #98
Open
JessyBarrette wants to merge 20 commits into
…rt for remote workers
…README for clarity
…d streamline folder handling
… tasks for clarity
…dicated task and enhance error handling
…atus and log files as artifacts for better visibility in the Prefect UI
…et harvesting logic for clarity and efficiency
…etching and CSV writing, improve error handling, and streamline CKAN record fetching with caching
…ord fetching process
…corresponding API routes
- Implement HarvestRun component to display details of a specific harvest run.
- Implement HarvestServer component to show datasets from a specific server with filtering options.
- Implement Sparkline component for visualizing dataset status history.
- Implement StatusBadge component for displaying status labels.
- Add slug utility functions for encoding and decoding URLs.
- Create useHarvestFetch hook for fetching data from the harvest API.
- Add styles for Harvest components and tables.
- Update routing in index.js to include new Harvest routes.
- Add translations for Harvest-related text in English and French.
- Implement new API routes for harvest data retrieval in the backend.
…scope, triggered source, and duration
Improve the Prefect worker setup for in-process harvesting and add support for remote workers. Update the README for clarity on metadata storage and configuration handling. Introduce Coolify-specific docker-compose overrides and streamline the orchestration of harvest jobs per server. Enhance error handling and logging in the ERDDAP harvester, and publish dataset status and logs as artifacts for better visibility. Normalize cron schedule handling and improve documentation throughout.
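To illustrate what normalizing a cron schedule value can involve, here is a minimal sketch. It is not the code from this PR: the function name `normalize_cron` and its rules (strip surrounding quotes, collapse whitespace, treat an empty value as "no schedule", require five fields) are assumptions chosen for the example.

```python
from typing import Optional


def normalize_cron(raw: Optional[str]) -> Optional[str]:
    """Return a cleaned five-field cron string, or None when no schedule is set.

    Hypothetical helper: the rules below are assumptions, not the PR's logic.
    """
    if raw is None:
        return None
    # Values copied from .env files often carry quotes and stray whitespace.
    cleaned = raw.strip().strip("'\"").strip()
    if not cleaned:
        return None
    fields = cleaned.split()
    if len(fields) != 5:
        raise ValueError(f"expected 5 cron fields, got {len(fields)}: {raw!r}")
    # Re-join with single spaces so equivalent schedules compare equal.
    return " ".join(fields)
```

Returning `None` for an empty value lets a caller skip attaching a schedule instead of failing at startup, which is one plausible way to make a cron variable optional.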
This pull request introduces significant improvements to the deployment, scaling, and orchestration of the harvester and related services, with a focus on better support for Prefect-based orchestration, multi-host/remote worker setups, and environment-specific overrides. The changes streamline how harvest flows are run and scheduled, improve documentation, and introduce new Docker Compose configurations for both production and specialized environments like Coolify.
Prefect orchestration and worker management:
- Replaces the previous Prefect deployment/worker setup with a new `prefect_worker` service that registers work pools and deployments on startup, runs harvest flows in-process (no per-run containers or Docker socket), and can be scaled horizontally with Docker Compose. Adds environment variables to control scheduling, deployment registration, and on-deploy triggers (`HARVESTER_CRON`, `VERNACULARS_CRON`, `RUN_ON_DEPLOY`, `REGISTER_DEPLOYMENTS`). ([[1]](https://github.com/cioos-siooc/explore-cioos/pull/98/files#diff-0a1c3356cafa536f2da1e810fe8ae075ca001848b63c20d86b004626789cfa88L76-L122), [[2]](https://github.com/cioos-siooc/explore-cioos/pull/98/files#diff-088d9f35d23a4347d221d71dd49b02b95001dff4abe637a40fe0bc04d502049cL54-R67), [[3]](https://github.com/cioos-siooc/explore-cioos/pull/98/files#diff-088d9f35d23a4347d221d71dd49b02b95001dff4abe637a40fe0bc04d502049cL68-R85), `README.md` [4])
- Adds a new `docker-compose.worker.yaml` for launching remote Prefect workers on additional hosts, allowing for distributed harvesting capacity. These workers poll the central Prefect server, do not register deployments, and require access to the central database and Prefect API. ([docker-compose.worker.yamlR1-R51](https://github.com/cioos-siooc/explore-cioos/pull/98/files#diff-84721629841278ceb728e5039aec40cf3bc45a21ad3dd65df058de2cc1e044ebR1-R51))
- Updates the Prefect server to use a non-conda image with asyncpg, ensuring metadata is stored in Postgres (not SQLite) for better concurrency and reliability. Includes logic to auto-create the `prefect` database if missing. ([docker-compose.production.yamlL139-R161](https://github.com/cioos-siooc/explore-cioos/pull/98/files#diff-0a1c3356cafa536f2da1e810fe8ae075ca001848b63c20d86b004626789cfa88L139-R161))

Environment and deployment configuration:
- Adds `docker-compose.coolify.yml` as a Coolify-specific override, which removes published host ports and sets up environment variables for Coolify's proxy/FQDN system. ([docker-compose.coolify.ymlR1-R40](https://github.com/cioos-siooc/explore-cioos/pull/98/files#diff-28660a25c4eefbe9ae070d880f200978aea222a171ea49e62d00a5516e3a9eb0R1-R40))
- Removes the sample local development override file (`docker-compose.override.yaml.sample`) to avoid confusion and clarify deployment practices. ([docker-compose.override.yaml.sampleL1-L18](https://github.com/cioos-siooc/explore-cioos/pull/98/files#diff-416e8201f68e086bd5fc13472e9ef05dca0310f18ba4247453d278907fb8053aL1-L18))
- Updates `.env.sample` with new variables for harvest scheduling, config file selection, and deployment registration, along with improved documentation. ([[1]](https://github.com/cioos-siooc/explore-cioos/pull/98/files#diff-088d9f35d23a4347d221d71dd49b02b95001dff4abe637a40fe0bc04d502049cL54-R67), [[2]](https://github.com/cioos-siooc/explore-cioos/pull/98/files#diff-088d9f35d23a4347d221d71dd49b02b95001dff4abe637a40fe0bc04d502049cL68-R85))

Docker Compose networking and environment updates:
- Modifies `docker-compose.yaml` to publish required ports for `db` and `nginx` by default, simplifying local development and aligning with the new override strategy for production/Coolify. ([[1]](https://github.com/cioos-siooc/explore-cioos/pull/98/files#diff-3fde9d1a396e140fefc7676e1bd237d67b6864552b6f45af1ebcc27bcd0bb6e9L5-R6), [[2]](https://github.com/cioos-siooc/explore-cioos/pull/98/files#diff-3fde9d1a396e140fefc7676e1bd237d67b6864552b6f45af1ebcc27bcd0bb6e9L33-R41))
- Removes Coolify-specific environment variables and logic from the base Compose file, moving them to the dedicated override. ([[1]](https://github.com/cioos-siooc/explore-cioos/pull/98/files#diff-3fde9d1a396e140fefc7676e1bd237d67b6864552b6f45af1ebcc27bcd0bb6e9L33-R41), [[2]](https://github.com/cioos-siooc/explore-cioos/pull/98/files#diff-3fde9d1a396e140fefc7676e1bd237d67b6864552b6f45af1ebcc27bcd0bb6e9L56-R54))

Code and task changes:
- `db-loader/cde_db_loader/__main__.py` now defines the main loader as a Prefect `@task` instead of a `@flow`, aligning it with the new orchestration approach. ([[1]](https://github.com/cioos-siooc/explore-cioos/pull/98/files#diff-b72bd0dde5d880ec74272e3030892ede678350af9f592f91242363b1c39a9e71L20-R20), [[2]](https://github.com/cioos-siooc/explore-cioos/pull/98/files#diff-b72bd0dde5d880ec74272e3030892ede678350af9f592f91242363b1c39a9e71L190-R190))

Prefect orchestration and scaling:
- … `prefect_worker` service and new scheduling/environment controls (`HARVESTER_CRON`, `VERNACULARS_CRON`, `RUN_ON_DEPLOY`, `REGISTER_DEPLOYMENTS`). ([[1]](https://github.com/cioos-siooc/explore-cioos/pull/98/files#diff-0a1c3356cafa536f2da1e810fe8ae075ca001848b63c20d86b004626789cfa88L76-L122), [[2]](https://github.com/cioos-siooc/explore-cioos/pull/98/files#diff-088d9f35d23a4347d221d71dd49b02b95001dff4abe637a40fe0bc04d502049cL54-R67), [[3]](https://github.com/cioos-siooc/explore-cioos/pull/98/files#diff-088d9f35d23a4347d221d71dd49b02b95001dff4abe637a40fe0bc04d502049cL68-R85), [[4]](https://github.com/cioos-siooc/explore-cioos/pull/98/files#diff-b335630551682c19a781afebcf4d07bf978fb1f8ac04c6bf87428ed5106870f5L175-R215))
- … `docker-compose.worker.yaml` for launching remote Prefect workers on other hosts, enabling distributed harvesting. ([docker-compose.worker.yamlR1-R51](https://github.com/cioos-siooc/explore-cioos/pull/98/files#diff-84721629841278ceb728e5039aec40cf3bc45a21ad3dd65df058de2cc1e044ebR1-R51))
- … `prefect` DB if needed. ([docker-compose.production.yamlL139-R161](https://github.com/cioos-siooc/explore-cioos/pull/98/files#diff-0a1c3356cafa536f2da1e810fe8ae075ca001848b63c20d86b004626789cfa88L139-R161))

Deployment/environment configuration:
- … `docker-compose.coolify.yml` for Coolify-specific overrides, removing host port exposure and integrating with Coolify's FQDN/proxy system. ([docker-compose.coolify.ymlR1-R40](https://github.com/cioos-siooc/explore-cioos/pull/98/files#diff-28660a25c4eefbe9ae070d880f200978aea222a171ea49e62d00a5516e3a9eb0R1-R40))
- … ([docker-compose.override.yaml.sampleL1-L18](https://github.com/cioos-siooc/explore-cioos/pull/98/files#diff-416e8201f68e086bd5fc13472e9ef05dca0310f18ba4247453d278907fb8053aL1-L18))
- … `.env.sample` with new scheduling and config variables, and improved documentation. ([[1]](https://github.com/cioos-siooc/explore-cioos/pull/98/files#diff-088d9f35d23a4347d221d71dd49b02b95001dff4abe637a40fe0bc04d502049cL54-R67), [[2]](https://github.com/cioos-siooc/explore-cioos/pull/98/files#diff-088d9f35d23a4347d221d71dd49b02b95001dff4abe637a40fe0bc04d502049cL68-R85))

Docker Compose and networking:
- … ([[1]](https://github.com/cioos-siooc/explore-cioos/pull/98/files#diff-3fde9d1a396e140fefc7676e1bd237d67b6864552b6f45af1ebcc27bcd0bb6e9L5-R6), [[2]](https://github.com/cioos-siooc/explore-cioos/pull/98/files#diff-3fde9d1a396e140fefc7676e1bd237d67b6864552b6f45af1ebcc27bcd0bb6e9L33-R41), [[3]](https://github.com/cioos-siooc/explore-cioos/pull/98/files#diff-3fde9d1a396e140fefc7676e1bd237d67b6864552b6f45af1ebcc27bcd0bb6e9L56-R54))

Code/task refactoring:
- … `@task` instead of `@flow`. ([[1]](https://github.com/cioos-siooc/explore-cioos/pull/98/files#diff-b72bd0dde5d880ec74272e3030892ede678350af9f592f91242363b1c39a9e71L20-R20), [[2]](https://github.com/cioos-siooc/explore-cioos/pull/98/files#diff-b72bd0dde5d880ec74272e3030892ede678350af9f592f91242363b1c39a9e71L190-R190))
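For readers who want a picture of how a worker entrypoint might consume the four environment variables named above, here is a minimal, standard-library-only sketch. Only the variable names come from this PR. The `WorkerSettings` class, the `load_settings` function, the accepted truthy strings, and the defaults (register deployments unless told otherwise, no run on deploy) are assumptions for illustration, not the PR's implementation.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional

# Assumption: which strings count as "true" is a choice made for this sketch.
_TRUTHY = {"1", "true", "yes", "on"}


@dataclass(frozen=True)
class WorkerSettings:
    """Hypothetical container for the worker's environment-driven options."""
    harvester_cron: Optional[str]
    vernaculars_cron: Optional[str]
    run_on_deploy: bool
    register_deployments: bool


def _flag(env: Mapping[str, str], name: str, default: bool) -> bool:
    """Read a boolean flag, falling back to `default` when unset or blank."""
    value = env.get(name)
    if value is None or not value.strip():
        return default
    return value.strip().lower() in _TRUTHY


def load_settings(env: Mapping[str, str] = os.environ) -> WorkerSettings:
    """Build settings from an environment mapping (defaults are assumptions)."""
    return WorkerSettings(
        harvester_cron=env.get("HARVESTER_CRON") or None,
        vernaculars_cron=env.get("VERNACULARS_CRON") or None,
        run_on_deploy=_flag(env, "RUN_ON_DEPLOY", default=False),
        register_deployments=_flag(env, "REGISTER_DEPLOYMENTS", default=True),
    )
```

Under these assumptions, a remote worker that should only poll the central server could be started with `REGISTER_DEPLOYMENTS=false`, while the central `prefect_worker` keeps the default. Passing the environment as a mapping keeps the function easy to test without touching the real process environment.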