
Anypod – Task List

1 Repo Bootstrap

  • Init git repo – git init --initial-branch=main && gh repo create.
  • pyproject.toml – minimal project metadata, uv backend, Python ≥ 3.14.
  • uv pip install --group dev – add dev deps: ruff, pytest, pytest-asyncio, mypy, pre-commit.
  • Pre-commit hooks – formatters & linter.
  • CI – GitHub Actions workflow running pytest on every PR.

2 Configuration Loader

  • Pydantic V2 models reflecting YAML keys.
  • load_config(path) -> dict[str, FeedConfig] (implemented via Pydantic Settings; see the sketch after this list).
  • Unit tests using fixture YAML.
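
A minimal sketch of the loader's intended shape. The real task calls for Pydantic Settings; this plain-BaseModel version only illustrates the return type, and the FeedConfig fields and top-level feeds: key are assumptions, not the final schema:

from pathlib import Path

import yaml  # PyYAML
from pydantic import BaseModel

class FeedConfig(BaseModel):
    url: str
    yt_args: list[str] = []          # field names are assumptions
    keep_last: int | None = None
    cron: str = "0 * * * *"          # hypothetical default

def load_config(path: Path) -> dict[str, FeedConfig]:
    # Parse the YAML config into validated per-feed models.
    raw = yaml.safe_load(path.read_text())
    return {feed_id: FeedConfig(**cfg) for feed_id, cfg in raw["feeds"].items()}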

3 Database Layer

  • CRUD helpers:
    • add_download
    • update_status
    • next_queued_downloads
    • get_download_by_id
    • get_errors
    • get_downloads_to_prune_by_keep_last
    • get_downloads_to_prune_by_since
    • delete_downloads
  • Tests with tmp in-memory DB.
  • Tests to make sure DB access is optimized (e.g. uses indexes).

3.2 File Manager Layer

  • Abstraction seam: encapsulate the base directory so future S3/GCS back-ends can subclass
  • Implement save_download_file(feed, file_name, data_stream) -> Path (atomic write; see the sketch after this list)
  • Implement delete_download_file(feed, file_name) -> bool
  • Implement download_exists(feed, file_name) -> bool
  • Implement get_download_stream(feed, file_name) -> IO[bytes]
  • Ensure directory hygiene: base download and feed directories exist (covered by save_download_file)
  • Write unit tests (tmp dir fixtures)
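
A sketch of the atomic-write path for a local-filesystem backend; the .part suffix and chunk size are arbitrary choices, not project decisions:

import os
from pathlib import Path
from typing import IO

class FileManager:
    def __init__(self, base_download_path: Path):
        self._base = base_download_path

    def save_download_file(self, feed: str, file_name: str, data_stream: IO[bytes]) -> Path:
        # Directory hygiene: ensure the base and feed directories exist.
        feed_dir = self._base / feed
        feed_dir.mkdir(parents=True, exist_ok=True)
        final_path = feed_dir / file_name
        tmp_path = feed_dir / f".{file_name}.part"
        with tmp_path.open("wb") as f:
            while chunk := data_stream.read(1 << 20):
                f.write(chunk)
            f.flush()
            os.fsync(f.fileno())
        # Atomic on POSIX as long as tmp and final share a filesystem.
        tmp_path.rename(final_path)
        return final_path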

3.5 Data Orchestration & Services Layer

This section details the components that manage the lifecycle of downloads, from discovery to storage and pruning. They are organized into a data_coordinator module and related services/wrappers.

3.5.1 db.py::Download model updates

  • from_row(cls, db_row: sqlite3.Row) -> Download class method for mapping.

3.5.2 YtdlpWrapper (ytdlp_wrapper.py)

  • Create ytdlp_wrapper.py and class YtdlpWrapper.
  • YtdlpWrapper.fetch_metadata(feed_id: str, url: str, yt_cli_args: list[str]) -> list[Download]: Fetches metadata for all downloads at the given URL using yt-dlp's metadata extraction capabilities.
  • YtdlpWrapper.download_media_to_file(download: Download, yt_cli_args: list[str], download_target_dir: Path) -> Path:
    • Purpose: Downloads media (video or audio) for the given entry to a specified directory, handling potential merges (e.g., video + audio) via yt-dlp and FFmpeg.
    • Arguments:
      • download: Metadata of the entry to download, used for naming and context.
      • yt_cli_args: List of command-line arguments for yt-dlp (e.g., format selection from feed config).
      • download_target_dir: The base directory where the feed-specific subfolder and media file will be created.
    • Returns: Path to the successfully downloaded media file.
  • Sandbox test using a Creative-Commons short video, covered by integration tests using real URLs.
  • Unit tests for YtdlpWrapper.

3.5.2.1 Logger Side Quest

  • Implement a global logging framework.
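
One possible shape for this, assuming stdlib logging and a single setup call at startup (the format string is arbitrary):

import logging

def setup_logging(level: str = "INFO") -> None:
    logging.basicConfig(
        level=getattr(logging, level.upper()),
        format="%(asctime)s %(levelname)s %(name)s: %(message)s",
    )

logger = logging.getLogger(__name__)  # per-module logger pattern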

3.5.3 Enqueuer (data_coordinator/enqueuer.py)

  • Constructor accepts DatabaseManager, YtdlpWrapper.
  • enqueue_new_downloads(feed_config: FeedConfig) -> int (see the sketch after this list):
    • Phase 1: Re-fetch metadata for existing DB entries with status 'upcoming'; update those now VOD to 'queued'.
    • Phase 2: Fetch metadata for latest N videos via YtdlpWrapper.fetch_metadata().
      • For each download not in DB:
        • If VOD (live_status=='not_live' and is_live==False), insert with status 'queued'.
        • If live or scheduled (live_status=='upcoming' or is_live==True), insert with status 'upcoming'.
      • For each existing 'upcoming' entry now VOD, update status to 'queued'.
    • Returns count of newly enqueued or transitioned-to-queued downloads.
  • Preprocess CLI args once so we don't re-parse them on every call
  • Handle cookies differently? They need to be included even at the discovery stage
    • Just include all args for all stages, including discovery; users will have limitations that need to be outlined in the docs
  • The DB layer should not leak any SQLite details -- it should abstract all of that away
    • For example, remove all references to sqlite.Row and sqlite.Error
  • Retries should apply more widely and, with enough failures, should transition the download to the error state
    • Maybe db.py needs a bump_error_count fn that handles this: bump the count until it becomes too high, then mark the download as error
  • Unit tests for Enqueuer with mocked dependencies.
  • A couple of integration tests.
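
A condensed sketch of the two-phase flow described above. The self._db / self._ytdlp attributes, the get_downloads_by_status helper, feed_config.id, and the Download fields (source_url, live_status, is_live) are assumptions:

def enqueue_new_downloads(self, feed_config: FeedConfig) -> int:
    count = 0

    # Phase 1: re-check entries already in the DB with status 'upcoming'.
    for row in self._db.get_downloads_by_status("upcoming", feed=feed_config.id):
        fresh = self._ytdlp.fetch_metadata(feed_config.id, row.source_url, feed_config.yt_args)
        if fresh and _is_vod(fresh[0]):
            self._db.update_status(feed_config.id, row.id, "queued")
            count += 1

    # Phase 2: discover the latest N entries at the feed URL.
    for download in self._ytdlp.fetch_metadata(feed_config.id, feed_config.url, feed_config.yt_args):
        existing = self._db.get_download_by_id(feed_config.id, download.id)
        if existing is None:
            download.status = "queued" if _is_vod(download) else "upcoming"
            self._db.add_download(download)
            if download.status == "queued":
                count += 1
        elif existing.status == "upcoming" and _is_vod(download):
            self._db.update_status(feed_config.id, download.id, "queued")
            count += 1
    return count

def _is_vod(d) -> bool:
    # VOD per the rules above: not live and not scheduled.
    return d.live_status == "not_live" and not d.is_live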

3.5.3.1 TODO Side Quest

  • Refactor Download Status Management (State Machine Implementation); a transition sketch follows at the end of this section
    • Phase 1: Database Layer (db.py)
      • Remove the existing DatabaseManager.update_status method.
      • Implement DatabaseManager.mark_as_queued_from_upcoming(feed: str, id: str) -> None:
        • Checks that the current status is in fact UPCOMING
        • Sets status = QUEUED.
        • Preserves retries and last_error.
      • Implement DatabaseManager.requeue_download(feed: str, id: str) -> None:
        • Add a note that this will happen due to:
          • manually requeueing an ERROR'd download,
          • manually requeueing in order to get the latest version of a download (i.e. it was previously DOWNLOADED)
          • un-SKIPping a video
          • don't implement this as logic; just include it as a note in the docstring
        • Sets status = QUEUED.
        • Sets retries = 0, last_error = NULL.
      • Implement DatabaseManager.mark_as_downloaded(feed: str, id: str) -> None:
        • Sets status = DOWNLOADED.
        • Sets retries = 0, last_error = NULL.
      • Implement DatabaseManager.skip_download(feed: str, id: str) -> None:
        • Sets status = SKIPPED.
        • Preserves retries and last_error.
        • Raises DownloadNotFoundError or DatabaseOperationError on failure.
      • Implement DatabaseManager.unskip_download(feed_id: str, download_id: str) -> DownloadStatus:
        • Checks that the download is currently SKIPPED.
        • Calls requeue_download(feed_id, download_id).
        • Returns DownloadStatus.QUEUED.
        • Raises DownloadNotFoundError or DatabaseOperationError on failure.
      • Implement DatabaseManager.archive_download(feed: str, id: str) -> None:
        • Sets status = ARCHIVED.
        • Preserves retries and last_error.
        • Raises DownloadNotFoundError or DatabaseOperationError on failure.
      • Modify DatabaseManager.get_download_by_id(feed: str, id: str) -> Download:
        • Change return type from Download | None to Download.
        • Raises DownloadNotFoundError if not found.
        • Raises DatabaseOperationError for other DB issues.
        • Raises ValueError if row parsing fails.
      • Verify DatabaseManager.upsert_download correctly handles initial setting of UPCOMING or QUEUED status based on the input Download object, ensuring retries and last_error are appropriate for new items.
      • Verify DatabaseManager.bump_retries remains the sole mechanism for incrementing retries and transitioning to ERROR status (and handles DownloadNotFoundError from get_download_by_id).
    • Phase 2: Service Layer Updates
      • Enqueuer (data_coordinator/enqueuer.py):
        • Update _process_single_download to correctly handle DownloadNotFoundError from get_download_by_id.
        • Refactor _update_download_status_in_db (or remove it) and its call sites (_update_status_to_queued_if_vod, _handle_existing_fetched_download):
          • When an UPCOMING download becomes a VOD, call download_db.mark_as_queued_from_upcoming. Adapt to its new signature (returns None, raises exceptions).
          • For other status changes previously handled by _update_download_status_in_db (e.g., an existing ERROR record being re-processed from feed and needing to be QUEUED), evaluate if download_db.requeue_download should be used or if current upsert_download logic in _handle_existing_fetched_download is sufficient.
        • Ensure bump_retries calls remain correct for metadata fetch failures.
      • Pruner (data_coordinator/pruner.py):
        • When pruning items, call download_db.archive_download. Adapt to its new signature (returns None, raises exceptions). File deletion logic is already handled by Pruner correctly before this step.
    • Phase 3: Test Updates
      • tests/anypod/db/test_db.py:
        • Remove tests for the old download_db.update_status.
        • Add/Update comprehensive unit tests for all new/modified download_db.mark_as_*, download_db.requeue_*, download_db.unskip_download, and download_db.get_download_by_id methods, including exception checking.
        • Ensure tests for upsert_download cover setting initial UPCOMING and QUEUED states.
        • Ensure tests for bump_retries are still valid and cover its role, especially DownloadNotFoundError handling.
      • tests/anypod/data_coordinator/test_enqueuer.py:
        • Update mocks and assertions for download_db.get_download_by_id to reflect new exception-raising behavior.
        • Update mocks for status update calls to the new download_db.mark_as_queued_from_upcoming or download_db.requeue_download methods. Verify correct arguments and exception handling.
        • Verify upsert_download is called with correctly statused Download objects.
  • Address various TODOs throughout the code base.
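
As promised, a sketch of one of the Phase 1 transitions. The DownloadStatus enum and the self._set_status helper are assumptions; the exception types are the ones named above:

def mark_as_queued_from_upcoming(self, feed: str, id: str) -> None:
    # UPCOMING -> QUEUED; retries and last_error are left untouched.
    download = self.get_download_by_id(feed, id)  # raises DownloadNotFoundError
    if download.status != DownloadStatus.UPCOMING:
        raise DatabaseOperationError(
            f"cannot queue {feed}/{id}: status is {download.status}, not UPCOMING"
        )
    self._set_status(feed, id, DownloadStatus.QUEUED)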

3.5.4 Downloader Service (data_coordinator/downloader.py)

  • Constructor accepts DatabaseManager, FileManager, YtdlpWrapper.
  • download_queued(feed_id: str, feed_config: FeedConfig, limit: int = -1) -> tuple[int, int]: (success_count, failure_count; see the sketch after this list)
    • Gets queued Download objects via DatabaseManager.get_downloads_by_status.
    • For each Download:
      • Call YtdlpWrapper.download_media_to_file(download, yt_cli_args).
        • Generate final file_name (e.g., using download.title and updated_metadata['ext']).
        • Call FileManager.save_download_file(feed_config.name, final_file_name, source_file_path=completed_file_path).
          • (Note: FileManager.save_download_file will need to implement moving a file from source_file_path to its final managed location.)
        • Update DB: status to 'downloaded', store final path from FileManager, update ext, filesize from updated_metadata.
      • On failure:
        • Update DB: status to 'error', log error, increment retries.
      • Ensure cleanup of source file regardless of success/failure of the individual download.
  • Unit tests for Downloader (Service) with mocked dependencies.
  • Debug mode for Enqueuer
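
A sketch of the per-item loop, with the constructor-injected dependencies as self._db / self._files / self._ytdlp; the final-file naming scheme, self._tmp_dir, and the bump_retries signature are assumptions:

from pathlib import Path

def download_queued(self, feed_id: str, feed_config: FeedConfig, limit: int = -1) -> tuple[int, int]:
    succeeded = failed = 0
    for download in self._db.get_downloads_by_status("queued", feed=feed_id, limit=limit):
        tmp_path: Path | None = None
        try:
            tmp_path = self._ytdlp.download_media_to_file(
                download, feed_config.yt_args, self._tmp_dir
            )
            final_name = f"{download.id}{tmp_path.suffix}"  # naming scheme assumed
            self._files.save_download_file(feed_id, final_name, source_file_path=tmp_path)
            self._db.mark_as_downloaded(feed_id, download.id)
            succeeded += 1
        except Exception as e:
            # bump_retries transitions to ERROR once the count gets too high.
            self._db.bump_retries(feed_id, download.id, str(e))
            failed += 1
        finally:
            # Clean up the source file regardless of outcome (it may already
            # have been moved into place by save_download_file).
            if tmp_path is not None and tmp_path.exists():
                tmp_path.unlink()
    return succeeded, failed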

3.5.5 Pruner (data_coordinator/pruner.py)

  • Use the old implementation for reference, but expect a largely complete rewrite
  • Constructor accepts DatabaseManager, FileManager.
  • prune_feed_downloads(feed_id: str, keep_last: int | None, prune_before_date: datetime | None) -> tuple[int, int]: (archived_count, files_deleted_count)
    • Uses DatabaseManager to get candidates
    • Uses FileManager.delete_download_file() to delete each candidate's media file.
    • Uses DatabaseManager.archive_download() to archive.
  • Unit tests for Pruner with mocked dependencies.

3.5.5.1 Database Refactoring & Feed Table

  • Split database classes: Refactor src/anypod/db/db.py into separate modules:
    • DownloadDatabase class for download-level operations (keep existing methods)
    • FeedDatabase class for feed-level operations (new functionality)
  • Feed table schema & operations:
    • Create feeds table with schema: last_sync_attempt, last_successful_sync, consecutive_failures, last_error, is_enabled, title, subtitle, description, language, author, image_url, source_type, total_downloads, downloads_since_last_rss, last_rss_generation
    • Add created_at and updated_at with defaults
    • Implement feed CRUD operations in FeedDatabase (see the sketch after this list):
      • Add FeedNotFoundError exception (similar to DownloadNotFoundError)
      • upsert_feed(feed: Feed) -> None - Insert or update a feed record, handling None timestamps to allow database defaults
      • get_feed_by_id(feed_id: str) -> Feed - Retrieve a specific feed by ID, raise FeedNotFoundError if not found
      • get_feeds(enabled: bool | None = None) -> list[Feed] - Get all feeds, or filter by enabled status if provided
      • mark_sync_success(feed_id: str) -> None - Set last_successful_sync to current timestamp, reset consecutive_failures to 0, clear last_error
      • mark_sync_failure(feed_id: str, error_message: str) -> None - Set last_failed_sync to current timestamp, increment consecutive_failures, set last_error
      • mark_rss_generated(feed_id: str, new_downloads_count: int) -> None - Set last_rss_generation to current timestamp, increment total_downloads by new_downloads_count, set downloads_since_last_rss to new_downloads_count
      • set_feed_enabled(feed_id: str, enabled: bool) -> None - Set is_enabled to the provided value
      • update_feed_metadata(feed_id: str, *, title: str | None = None, subtitle: str | None = None, description: str | None = None, language: str | None = None, author: str | None = None, image_url: str | None = None) -> None - Update feed metadata fields; only updates provided (non-None) fields; no-op if all None
  • Download table enhancements:
    • Add fields: quality_info
    • Add fields: discovered_at and updated_at, and potentially downloaded_at, maintained via SQLite triggers
    • Update DownloadDatabase methods to handle new fields
    • Update all places that create/modify downloads to populate new fields (Enqueuer, Downloader, etc.)
  • Config and model updates:
    • Rename FeedMetadata to FeedMetadataOverrides in feed_config.py
    • Add enabled field to FeedConfig
  • Feed metadata synchronization:
    • Compare FeedMetadataOverrides from config with stored feed metadata in DB
    • Update DB when config overrides change
    • Modify YtdlpWrapper to make best-effort extraction of non-overridden FeedMetadataOverrides fields
    • Mark fields for best-effort extraction when overrides are removed
  • Unit tests for both DownloadDatabase and FeedDatabase
  • On pruning, also update the total_downloads value
  • Ensure there aren't any read/modify/write loops that aren't protected by a transaction
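
As promised above, a sketch of two of the sync-state helpers. The raw SQL is illustrative only (this predates the SQLModel migration in 5.4.2), and self._conn is an assumed sqlite3.Connection:

from datetime import UTC, datetime

class FeedNotFoundError(Exception):
    pass

class FeedDatabase:
    def mark_sync_success(self, feed_id: str) -> None:
        # A single UPDATE statement: no unprotected read/modify/write loop.
        cursor = self._conn.execute(
            "UPDATE feeds SET last_successful_sync = ?,"
            " consecutive_failures = 0, last_error = NULL WHERE id = ?",
            (datetime.now(UTC).isoformat(), feed_id),
        )
        if cursor.rowcount == 0:
            raise FeedNotFoundError(feed_id)

    def mark_sync_failure(self, feed_id: str, error_message: str) -> None:
        cursor = self._conn.execute(
            "UPDATE feeds SET last_failed_sync = ?,"
            " consecutive_failures = consecutive_failures + 1,"
            " last_error = ? WHERE id = ?",
            (datetime.now(UTC).isoformat(), error_message, feed_id),
        )
        if cursor.rowcount == 0:
            raise FeedNotFoundError(feed_id)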

3.5.6 DataCoordinator Orchestrator (data_coordinator/coordinator.py)

  • Create data_coordinator/types/ folder with __init__.py and processing_results.py
  • Create ProcessingResults dataclass with counts, error tracking, status, and timing (see the sketch after this list)
  • Add archive_feed() method to Pruner class (sets is_enabled=False)
  • Constructor accepts Enqueuer, Downloader, Pruner, RSSFeedGenerator, FeedDatabase
  • process_feed(feed_id: str, feed_config: FeedConfig) -> ProcessingResults:
    • Calculate fetch_since_date from feed.last_successful_sync (NOT feed_config.since)
    • Execute phases in sequence: enqueue → download → prune → RSS generation
    • Inline error handling with graceful degradation between phases
    • Update last_successful_sync or last_failed_sync based on outcome
    • Return comprehensive ProcessingResults with all counts and errors
  • Update data_coordinator/__init__.py to export DataCoordinator
  • Integration tests for DataCoordinator focusing on full process_feed flow
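
A sketch of the results type referenced above; the exact field set is an assumption:

from dataclasses import dataclass, field
from datetime import timedelta

@dataclass
class ProcessingResults:
    feed_id: str
    enqueued: int = 0
    downloaded: int = 0
    download_failures: int = 0
    archived: int = 0
    rss_generated: bool = False
    errors: list[str] = field(default_factory=list)  # per-phase error messages
    duration: timedelta = timedelta(0)

    @property
    def ok(self) -> bool:
        return not self.errors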

3.5.7 Discrepancy Detection (in Pruner or new service)

  • Implement discrepancy detection logic (see the sketch after this list):
    • Find DB entries with DOWNLOADED status but no corresponding download file.
    • Find download files on disk with no corresponding DOWNLOADED DB entry.
    • (Optional) Automated resolution strategies or reporting for discrepancies.
  • Unit tests for discrepancy detection logic.
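
A sketch of the two-way comparison from the first two bullets; the feed_dir helper and the file_name field are hypothetical:

def find_discrepancies(self, feed_id: str) -> tuple[list[str], list[str]]:
    db_names = {d.file_name for d in self._db.get_downloads_by_status("downloaded", feed=feed_id)}
    disk_names = {p.name for p in self._files.feed_dir(feed_id).iterdir() if p.is_file()}
    missing_files = sorted(db_names - disk_names)  # DOWNLOADED row, no file on disk
    orphan_files = sorted(disk_names - db_names)   # file on disk, no DOWNLOADED row
    return missing_files, orphan_files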

4 Feed Generation

  • Determine if a read/write lock for the in-memory feed XML cache is needed for concurrency (see the sketch after this list)
  • Add new fields to Download
    • This will also involve potentially changing how values are updated, since some (like title) may change down the line; we should store the most recent value
  • Implement generate_feed_xml(feed_id) to write to in-memory XML after acquiring write lock
  • Implement get_feed_xml(feed_id) for HTTP handlers to read from in-memory XML after acquiring read lock
  • Write unit tests to verify enclosure URLs and MIME types in generated feeds
  • Figure out how to bring in the host URL.
  • Duration should be an int
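
A sketch of the cache guarded by a lock; generation itself is elided, so only the locking seam shows. The stdlib has no reader/writer lock, so a plain threading.Lock stands in; a real RW lock can replace it if read contention matters:

import threading

class FeedXmlCache:
    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._xml: dict[str, bytes] = {}

    def generate_feed_xml(self, feed_id: str, xml: bytes) -> None:
        with self._lock:  # the "write lock"
            self._xml[feed_id] = xml

    def get_feed_xml(self, feed_id: str) -> bytes:
        with self._lock:  # the "read lock"
            return self._xml[feed_id]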

4.1 Path Management Centralization

  • PathManager Implementation – Create centralized path/URL coordination class (see the sketch after this list):
    • Single source of truth for file system paths and URLs based on feed_id + download_id
    • Consistent 1:1 mapping between network paths and file paths
    • Methods for feed directories, RSS URLs, and media file paths/URLs
    • Google-style docstrings with proper Args/Returns/Raises sections
  • FileManager refactor
  • Pruner refactor
  • RSSFeedGenerator refactor
  • tests refactor
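
A sketch of the class described above; the directory layout and URL shapes mirror the /feeds and /media routes planned in section 6.4 but are otherwise assumptions:

from pathlib import Path

class PathManager:
    # Single source of truth: everything derives from (feed_id, download_id).

    def __init__(self, base_data_dir: Path, base_url: str):
        self._data = base_data_dir
        self._url = base_url.rstrip("/")

    def feed_media_dir(self, feed_id: str) -> Path:
        return self._data / "media" / feed_id

    def media_file_path(self, feed_id: str, download_id: str, ext: str) -> Path:
        return self.feed_media_dir(feed_id) / f"{download_id}.{ext}"

    def feed_url(self, feed_id: str) -> str:
        return f"{self._url}/feeds/{feed_id}.xml"

    def media_file_url(self, feed_id: str, download_id: str, ext: str) -> str:
        # Kept 1:1 with media_file_path.
        return f"{self._url}/media/{feed_id}/{download_id}.{ext}"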

5 Scheduler

5.1 Create Scheduler Module (src/anypod/schedule/)

  • Core scheduler implementation:
    • Add apscheduler to dependencies in pyproject.toml
    • Create type-safe APScheduler wrapper (apscheduler_core.py)
    • Use APScheduler with AsyncIOScheduler for async support
    • Schedule jobs based on feed cron expressions from config (see the sketch after this list)
    • Manage job lifecycle (add/remove/pause/resume)
    • Handle graceful shutdown with proper job cleanup
    • Each feed gets its own job with unique ID (the feed ID)
    • Job-level error handling with proper exception chaining
    • Direct DataCoordinator integration (no separate worker)
    • Context ID injection for log correlation
    • Invalid cron expression validation with SchedulerError
    • Remove explicit references to monkeypatch
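
A sketch of the wrapper around AsyncIOScheduler; SchedulerError is the exception named above, stubbed here for self-containment:

from collections.abc import Awaitable, Callable

from apscheduler.schedulers.asyncio import AsyncIOScheduler
from apscheduler.triggers.cron import CronTrigger

class SchedulerError(Exception):
    pass

class FeedScheduler:
    def __init__(self) -> None:
        self._scheduler = AsyncIOScheduler()

    def add_feed_job(self, feed_id: str, cron: str, job: Callable[[], Awaitable[None]]) -> None:
        try:
            trigger = CronTrigger.from_crontab(cron)
        except ValueError as e:
            raise SchedulerError(f"invalid cron for feed {feed_id!r}: {cron}") from e
        # One job per feed; the feed ID doubles as the job ID.
        self._scheduler.add_job(job, trigger, id=feed_id, replace_existing=True)

    def start(self) -> None:
        self._scheduler.start()

    def shutdown(self) -> None:
        self._scheduler.shutdown(wait=True)  # let in-flight jobs finish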

5.1.1 yt-dlp Day-Level Date Precision Accommodation

  • Date Window Calculation Logic (DataCoordinator):
    • Replace _calculate_fetch_until_date with day-aligned logic (see the sketch at the end of this section)
    • fetch_since_date should still be last_successful_sync
    • fetch_until_date should just be now(); simplify this logic -- no 2 * cron tick or anything. We can remove that from coordinator.py and debug_enqueuer.py
    • That may mean these values usually fall on the same day; that's fine, and we will dedup results later
    • Update last_successful_sync to fetch_until_date to ensure full coverage (was previously now())
    • Enhanced logging: log both high-resolution calculated window and day-aligned yt-dlp window while in the context of ytdlp_wrapper
  • Deduplication Enforcement:
    • Verify Enqueuer properly handles duplicate video IDs across multiple day fetches
    • If what's in the DB is identical to what we retrieved, don't update (an update would bump updated_at)
      • It's possible that some metadata has changed (e.g. the uploader edited the description); check for that and update if needed
    • Add deduplication tests: same video appearing in multiple day windows
    • Verify no updates occurred (updated_at is unchanged)
    • Verify deduplication works when same video appears in multiple day windows
  • Documentation Updates:
    • Update method docstrings: document day-aligned window strategy clearly
    • Update DESIGN_DOC.md: add section explaining yt-dlp day-level precision limitation
  • State Reconciler Alignment:
    • Update since parameter handling: should only be a date, not a datetime
    • When since changes, use day-aligned logic for requeuing archived downloads
    • Ensure consistency between enqueue windows and retention policy windows
  • Tie Feed table total_downloads to the Download table with triggers
  • When downloading an individual file that is out of the date range, yt-dlp returns an incomplete response, which causes internal errors
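
A sketch of the window math described at the top of this section; yt-dlp's date filters only carry day precision, so the high-resolution window is logged and then widened to whole days:

from datetime import UTC, datetime

def calculate_fetch_window(last_successful_sync: datetime) -> tuple[datetime, datetime]:
    fetch_since = last_successful_sync  # high-resolution lower bound
    fetch_until = datetime.now(UTC)     # simply now(); no cron-tick math
    return fetch_since, fetch_until

def day_aligned(fetch_since: datetime, fetch_until: datetime) -> tuple[str, str]:
    # Widen both ends to whole days (yt-dlp's YYYYMMDD format); any duplicates
    # this re-fetches are dropped by the Enqueuer's dedup pass.
    return fetch_since.strftime("%Y%m%d"), fetch_until.strftime("%Y%m%d")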

5.2 Init State Reconciliation

5.2.1 Create State Reconciler Module (src/anypod/state_reconciler.py)

  • Startup reconciliation implementation:
    • Compare config feeds with database feeds
    • Handle new feeds: insert into DB and set initial last_successful_sync
    • Handle removed feeds: mark as disabled in DB (set is_enabled=False)
    • Handle changed feeds: update metadata and configuration
    • Ensure every active feed has valid last_successful_sync before scheduling
    • Evaluate what happens if reconciliation fails midway through: would simply restarting get back to a correct state?
    • Time-box the sync window -- it currently only has a start time, but will also need an end time

5.2.2 Config Change Handling

  • Detect and apply changes to:
    • enabled: Update feed's is_enabled in database, add/remove from scheduler, trigger initial sync if false->true
      • last_successful_sync does not need to be optional as it is set proactively on new feed creation
    • url: Update existing feed's source_url, reset consecutive_failures to 0, clear last_error, reset last_successful_sync as if it were a new feed; keep download history
    • since expansion (earlier date): Query archived downloads with published >= new since, change status from ARCHIVED to QUEUED (will redownload)
      • Modify get_downloads_by_status to allow filtering by date so we don't retrieve the entire DB
      • Also consider storing these values (since and keep_last) in the Feed table so we only query the DB when there's a change
      • Modify requeue_download into requeue_downloads, which takes a variadic list and batch-modifies
      • Modify the Pydantic handling of since to accept JUST a day, then derive the TZ from tiered sources (see the sketch after this list):
        1. from the since value itself, if included
        2. from a TZ env var
        3. from the system clock (user would have had to override /etc/localtime)
    • since contraction (later date): Mark downloads with published < new since for archival on next prune cycle
    • keep_last increase: Query archived downloads ordered by published DESC, restore up to (new_keep_last - current_count) from ARCHIVED to QUEUED (will redownload)
      • modify count_downloads_by_status to accept multiple possible statuses and return all of them
    • keep_last decrease: No immediate action - will apply naturally on next prune cycle
    • metadata changes: Update feed table immediately (title, subtitle, description, language, author, image_url, categories, explicit), trigger RSS regeneration
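
As promised, a sketch of the tiered timezone resolution for since; the function name is hypothetical, and POSIX-style TZ strings are not handled:

import os
from datetime import datetime, tzinfo
from zoneinfo import ZoneInfo

def resolve_since_tz(value: datetime) -> tzinfo | None:
    if value.tzinfo is not None:           # 1. TZ included in the value itself
        return value.tzinfo
    if tz_name := os.environ.get("TZ"):    # 2. TZ environment variable
        return ZoneInfo(tz_name)
    # 3. System clock (user would have had to override /etc/localtime)
    return datetime.now().astimezone().tzinfo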

5.3 Dependencies and Testing

  • Unit tests for scheduler with mocked jobs
  • Unit tests for state reconciler covering:
    • New feed addition
    • Feed removal
    • Feed configuration changes
    • Metadata override changes
  • Integration tests for full startup sequence
  • Tests for graceful shutdown handling

5.4 Update CLI Default Mode (src/anypod/cli/default.py)

  • Main service orchestration:
    • Initialize all components (databases, services)
    • Run state reconciler on startup
    • Start scheduler with reconciled feeds
    • Perform initial sync for all feeds to populate RSS
    • Keep service running until shutdown signal
    • Change path_manager to automatically assume the tmp and media dirs -- it should only need a base dir
      • Also, divide storage into a data dir and config files (cookies.txt and config_path) so they can be separated in a Docker-friendly way
      • Also, it seems excessive to add PathManager to Enqueuer and Downloader JUST so they can retrieve the cookie
      • Especially because they've duplicated the cookie-retrieval logic -- this needs to be centralized somewhere else
      • The ytdlp impl looks fine, though
    • Optimize the discover/metadata/download loop to cut down on calls to yt-dlp
      • It looks like we can retrieve full video detail when querying a playlist without the --flat-playlist option
      • The jury's still out on channels, but maybe?
      • Maybe we can pre-emptively classify these when they are added, store the type in the DB, and pick the optimal retrieval strategy based on that classification
      • Future optimization: we could fetch detailed metadata in one call (86 fields vs 21), but it's 10x slower -- out of scope
      • Not sure we need ReferenceType anymore; SourceType might be good enough
      • I think we can get rid of the DISCOVERY type
      • Get rid of set_source_specific_ytdlp_options
    • Cut down on excessive logs
    • Use the shared conftest for more fixtures
    • Make the DB path a folder instead of a file -- SQLite creates companion files (e.g. .db-wal) in the same folder

5.4.1 Convert to async model for ytdlp

Context/Goals: Convert anypod from sync to async to enable cancellable long-running operations (especially yt-dlp calls). Currently yt-dlp operations block and can't be interrupted. The async conversion will wrap yt-dlp in subprocess calls that can be properly cancelled, and ripple async throughout the codebase. Key insight: keep CLI args as list[str] instead of converting to dict, eliminating complex dict→CLI conversion.

Implementation Tasks:

  • CLI Args Strategy: Remove dict conversion in feed_config.py - keep yt_args as list[str] throughout pipeline
  • YtdlpCore Async: Implement subprocess calls with --dump-single-json --flat-playlist for metadata, parse JSON to YtdlpInfo (see the sketch after this list)
  • Cancellation: Proper subprocess cleanup (proc.kill() + await proc.wait() on CancelledError)
  • Isolate yt-dlp cli args into YtdlpCore
  • Consistent naming across the ydl vs. ytdlp functions
    • Also separate the classes into different files
  • Remove unused YtdlpCore methods: parse_options(), set_date_range(), set_playlist_limit(), set_cookies()
  • Conversion Order: YtdlpCore → YtdlpWrapper → Enqueuer/Downloader/Pruner → DataCoordinator → StateReconciler
  • Consider if RSSFeedGenerator needs async updates (probably minimal since it's mostly CPU-bound)
    • the answer is no, for now at least
  • Implement graceful shutdown handling -- it hard-crashes on Ctrl-C right now
    • This includes during init, when we're not in APScheduler yet (maybe we should be?)
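
A sketch of the cancellable metadata call described above, using only flags already named in this list:

import asyncio
import json
from typing import Any

async def fetch_metadata_json(url: str, yt_cli_args: list[str]) -> dict[str, Any]:
    proc = await asyncio.create_subprocess_exec(
        "yt-dlp", "--dump-single-json", "--flat-playlist", *yt_cli_args, url,
        stdout=asyncio.subprocess.PIPE,
        stderr=asyncio.subprocess.PIPE,
    )
    try:
        stdout, stderr = await proc.communicate()
    except asyncio.CancelledError:
        proc.kill()        # make the long-running yt-dlp call interruptible
        await proc.wait()
        raise
    if proc.returncode != 0:
        raise RuntimeError(f"yt-dlp failed ({proc.returncode}): {stderr.decode()[:500]}")
    return json.loads(stdout)  # caller parses this into YtdlpInfo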

5.4.2 Use SQLAlchemy AsyncIO

  • Phase 1: Environment Setup & Dependencies
    • Add sqlalchemy[asyncio], sqlmodel, aiosqlite, and alembic to pyproject.toml using uv add.
  • Phase 2: Refactor Models to SQLModel (see the sketch at the end of this section)
    • Convert data models in src/anypod/db/types/ (download.py, feed.py) to inherit from SQLModel, marking them with table=True.
    • Use sqlmodel.Field to define primary keys, indexes, and other constraints.
    • Define the one-to-many relationship between Feed and Download using sqlmodel.Relationship.
    • Integrate enum types into SQLModels:
      • For Feed.source_type, declare sa_column=Column(Enum(SourceType)).
      • For Download.status, declare sa_column=Column(Enum(DownloadStatus)).
      • Remove legacy register_adapter calls.
  • Phase 3: Implement the Asynchronous Core
    • Create src/anypod/db/sqlalchemy_core.py to centralize database connectivity.
    • Implement create_async_engine using the sqlite+aiosqlite dialect and QueuePool (default).
    • Create an async_session_maker for producing AsyncSession instances.
    • Implement a session() async generator for dependency injection.
  • Phase 4: Refactor Data Access Logic
    • Convert all methods in src/anypod/db/feed_db.py and src/anypod/db/download_db.py to async def.
    • Refactor methods to accept an AsyncSession parameter instead of using a shared instance.
    • Replace sqlite-utils calls (upsert, rows_where, get) with SQLAlchemy ORM operations (session.add, session.execute(select(...))).
    • Propagate async keyword up the call stack through the data_coordinator and schedule modules.
  • Phase 5: Database Migrations with Alembic
    • Initialize Alembic with alembic init -t async migrations.
    • Configure alembic.ini with the sqlalchemy.url for the async driver.
    • Configure migrations/env.py to use SQLModel.metadata as the target_metadata.
    • Generate an initial migration script: alembic revision --autogenerate -m "Initial schema from SQLModels".
    • Replace database triggers (create_trigger) with Alembic-managed versions.
    • Review and apply the initial migration: alembic upgrade head.
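
As promised, a sketch of the SQLModel tables plus the Phase 3 async core. Fields are trimmed to the ones needed to show the patterns; SourceType members here are placeholders, and the DB path is assumed:

from enum import Enum

from sqlalchemy import Column, Enum as SAEnum
from sqlalchemy.ext.asyncio import AsyncSession, async_sessionmaker, create_async_engine
from sqlmodel import Field, Relationship, SQLModel

class SourceType(str, Enum):      # placeholder members
    CHANNEL = "channel"
    PLAYLIST = "playlist"

class DownloadStatus(str, Enum):  # statuses from section 3.5.3.1
    UPCOMING = "upcoming"
    QUEUED = "queued"
    DOWNLOADED = "downloaded"
    ERROR = "error"
    SKIPPED = "skipped"
    ARCHIVED = "archived"

class Feed(SQLModel, table=True):
    id: str = Field(primary_key=True)
    source_type: SourceType = Field(sa_column=Column(SAEnum(SourceType)))
    downloads: list["Download"] = Relationship(back_populates="feed")

class Download(SQLModel, table=True):
    feed_id: str = Field(foreign_key="feed.id", primary_key=True)
    id: str = Field(primary_key=True)
    status: DownloadStatus = Field(sa_column=Column(SAEnum(DownloadStatus)))
    feed: Feed = Relationship(back_populates="downloads")

engine = create_async_engine("sqlite+aiosqlite:///data/db/anypod.db")
async_session_maker = async_sessionmaker(engine, class_=AsyncSession, expire_on_commit=False)

async def session():
    # Async generator used for dependency injection.
    async with async_session_maker() as s:
        yield s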

5.4.3 Use aiofiles for file operations

  • Add aiofiles dependency and convert FileManager to use async file operations
    • Dependencies: Add aiofiles for async file operations
    • File Operations: Use aiofiles.os to replace path operations
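
A sketch of the conversion; aiofiles.os and aiofiles.os.path mirror the blocking os / os.path calls:

import aiofiles
import aiofiles.os

async def delete_download_file(path: str) -> bool:
    # Async replacement for the blocking exists/remove pair.
    if not await aiofiles.os.path.exists(path):
        return False
    await aiofiles.os.remove(path)
    return True

async def read_file_bytes(path: str) -> bytes:
    async with aiofiles.open(path, "rb") as f:
        return await f.read()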

5.5 Initial Sync Strategy

  • After reconciliation, trigger immediate sync:
    • Process all enabled feeds to populate RSS
    • Ensure RSS feeds available before HTTP server starts
    • Handle failures gracefully without blocking startup, unless the config is wrong -- that should cause a failure until it's fixed

6 HTTP Server

  • How do we break out the API and static serving? Different ports? For security reasons, we need to expose the static content but not the APIs
    • Decision: went with different ports

6.1 Project Structure Setup

  • Create new HTTP server module at src/anypod/server/
    • __init__.py - Server module exports
    • app.py - FastAPI application factory
    • dependencies.py - Dependency injection setup
    • models/ - Pydantic request/response models
    • routers/ - API route handlers organized by domain

6.1.1 Image Hosting

  • First, check on getting the full file path in there
    • https://i.ytimg.com/pl_c/PL8mG-RkN2uTw7PhlnAr4pZZz2QubIbujH/studio_square_thumbnail.jpg isn't enough; you need the extra ?sqp=CNnJ9cQG-oaymwEICOADEOADSFqi85f_AwYIwe77sQY%3D&rs=AOn4CLB5y7iZmQcD8vHcdJ4WtzLCK_wOuQ on the end
  • Download and host images locally

6.2 FastAPI Application Setup

  • Add fastapi, uvicorn to dependencies in pyproject.toml
  • Create FastAPI app factory with proper dependency injection (see the sketch after this list)
  • Set up CORS, logging, and error handling middleware
    • Ensure logging also includes the contextvar
  • Configure OpenAPI documentation with proper metadata
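
A sketch of the factory; router wiring is shown as comments because the routers are only planned at this point:

from fastapi import FastAPI
from fastapi.middleware.cors import CORSMiddleware

def create_app() -> FastAPI:
    app = FastAPI(title="Anypod", version="0.1.0")  # OpenAPI metadata
    app.add_middleware(CORSMiddleware, allow_origins=["*"])  # tighten for prod
    # Hypothetical wiring matching the planned server/routers/ layout:
    # app.include_router(feeds.router, prefix="/api")
    # app.include_router(downloads.router, prefix="/api")
    return app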

6.3 API Models (Pydantic)

  • FeedResponse - Feed data for API responses
  • FeedCreateRequest/FeedUpdateRequest - Feed modification requests
  • DownloadResponse - Download data for API responses
  • PaginatedResponse[T] - Generic paginated response wrapper (see the sketch after this list)
  • StatsResponse - System and feed statistics
  • ErrorResponse - Standardized error responses
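
A sketch of the generic wrapper; the field names are assumptions:

from typing import Generic, TypeVar

from pydantic import BaseModel

T = TypeVar("T")

class PaginatedResponse(BaseModel, Generic[T]):
    items: list[T]
    total: int
    limit: int
    offset: int

# Usage on a route, e.g.: response_model=PaginatedResponse[DownloadResponse]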

6.4 Router Implementation

  • feeds.py - All feed management endpoints
    • GET /api/feeds - List all feeds with pagination, filtering, and sorting
    • POST /api/feeds - Create new feed, will write to config file
    • GET /api/feeds/{feed_id} - Get detailed feed information
    • PUT /api/feeds/{feed_id} - Update feed configuration by modifying config file
    • DELETE /api/feeds/{feed_id} - Disables feed and archives all downloads
    • POST /api/feeds/{feed_id}/enable - Enable feed processing
    • POST /api/feeds/{feed_id}/disable - Disable feed processing
    • POST /api/feeds/{feed_id}/sync - Trigger manual sync/processing
    • GET /api/feeds/valid - Validate feed config before writing to config file
  • downloads.py - Download management endpoints
    • GET /api/feeds/{feed_id}/downloads - List downloads for feed (paginated, filtered)
    • GET /api/feeds/{feed_id}/downloads/{download_id} - Get specific download details
    • POST /api/feeds/{feed_id}/downloads/{download_id}/retry - Retry failed download
    • POST /api/feeds/{feed_id}/downloads/{download_id}/skip - Mark download as skipped
    • POST /api/feeds/{feed_id}/downloads/{download_id}/unskip - Remove skip status
    • DELETE /api/feeds/{feed_id}/downloads/{download_id} - Archive download and delete file
  • stats.py - Statistics and monitoring endpoints
    • GET /api/feeds/{feed_id}/stats - Detailed feed statistics
    • GET /api/stats/summary - System-wide statistics summary including storage
  • health.py - Health check endpoints
    • GET /api/health - Application health check
  • static.py - Content delivery endpoints
    • GET /feeds - List all rss feeds in directory
    • GET /feeds/{feed_id}.xml - RSS feed XML
    • GET /media - List all feeds in directory
    • GET /media/{feed_id} - List all files for a feed in directory
    • GET /media/{feed_id}/{filename}.{ext} - Media file download
    • GET /thumbnails - List all feeds in directory
    • GET /thumbnails/{feed_id} - List all thumbnails for a feed in directory
    • GET /thumbnails/{feed_id}/{filename}.{ext} - Thumbnail images
  • Unit tests with TestClient for all API endpoints
  • Integration tests with actual database operations
  • Maybe enforce that feed IDs match the same regex as in validation.py (a router sketch follows this list)
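
A sketch of the feeds router shape; the handler bodies are elided, and pagination parameters are assumptions:

from fastapi import APIRouter

router = APIRouter(prefix="/api/feeds", tags=["feeds"])

@router.get("")
async def list_feeds(limit: int = 50, offset: int = 0):
    # Returns a PaginatedResponse[FeedResponse] (see 6.3).
    ...

@router.post("/{feed_id}/sync", status_code=202)
async def trigger_sync(feed_id: str):
    # Bridges to the DataCoordinator via the 6.5 service layer.
    ...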

6.5 Integration with Existing Components

  • Create service layer to bridge HTTP API with existing DataCoordinator
  • Extend FeedDatabase/DownloadDatabase with new query methods for API needs
  • Add config file read/write utilities for feed CRUD operations
  • Implement proper error mapping from domain exceptions to HTTP responses

6.6 Key Features Implementation

  • Pagination: Implement cursor-based or offset-based pagination
  • Filtering: Add query parameters for status, date ranges, search
  • Sorting: Support multiple sort fields and directions
  • Validation: Comprehensive request validation using Pydantic
  • Error Handling: Consistent error responses with proper HTTP status codes

6.7 CLI Integration

  • Configure server host/port via environment variables
  • Ensure proper graceful shutdown handling
  • Entry in default.py to start uvicorn

6.8 Documentation

  • Comprehensive OpenAPI documentation
    • Example requests/responses for all endpoints

7 CLI & Flags

  • python -m anypod parses flags: --ignore-startup-errors, --retry-failed, --log-level.
  • Docstrings and argparse help messages.
  • Evaluate logged statements and make sure only relevant things get logged
  • When using --retry-failed, also require a date so that we disregard VERY old failures
    • Errors will be common because live videos may be deleted and re-uploaded as regular VODs
  • Write the README
    • Outline the limitations of passing yt-dlp flags -- which ones must be avoided?
    • Look up some well-established open-source projects and follow their documentation style

8 Docker & Dev Flow

  • Dockerfile (debian:bookworm-slim, default root, overridable UID/GID).
  • .dockerignore to exclude tests, .git, caches.
  • Set up a dev env with containers.

9 Release Automation

  • GH Action release-yt-dlp.yaml: on yt-dlp tag → rebuild, test, draft release.
  • GH Action deps-bump.yaml: weekly minor-bump PR; require manual approval for major bumps
    • Done with Dependabot instead

10 Extraneous

  • Make override enum settings case-insensitive (e.g. they currently require exactly EPISODIC or SERIAL)
  • Create an HTTP endpoint to reset ERROR-status videos
  • Create a top-level HTTP endpoint to reset ERROR-status videos across all feeds
  • Is it possible to introduce tier-based filtering for Patreon posts?
  • Support an "after-download metadata refresh" workflow: allow a download to be flagged for delayed re-processing so we can re-sync title/description/duration later without requeueing the media. Provide CLI/API controls plus scheduler hooks to re-pull metadata (e.g., for YouTube description edits) while keeping the original enclosure stable.

11 Manual Feeds

  • Delete a download from a feed?
  • Optional (additional?) yt args override in the request
  • Endpoint for manually triggering a feed

When all boxes are checked, you'll be able to run:

docker run \
  -v $(pwd)/config:/config \
  -v $(pwd)/data:/data \
  -p 8000:8000 \
  ghcr.io/thurstonsand/anypod:dev

…and subscribe to http://localhost:8000/feeds/this_american_life.xml in your podcast player.