
Anypod – Task List

1 Repo Bootstrap

  • Init git repo – git init --initial-branch=main && gh repo create.
  • pyproject.toml – minimal project metadata, uv backend, Python ≥ 3.14.
  • uv pip install --group dev – add dev deps: ruff, pytest, pytest-asyncio, mypy, pre-commit.
  • Pre-commit hooks – formatters & linter.
  • CI – GitHub Actions workflow running pytest on every PR.

2 Configuration Loader

  • Pydantic V2 models reflecting YAML keys.
  • load_config(path) -> dict[str, FeedConfig] (implemented via Pydantic Settings; see the sketch after this list).
  • Unit tests using fixture YAML.
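
A minimal sketch of the loader's intended shape. The real task calls for Pydantic Settings; this plain-BaseModel version only illustrates the return type, and the FeedConfig fields and top-level feeds: key are assumptions, not the final schema:

from pathlib import Path

import yaml  # PyYAML
from pydantic import BaseModel

class FeedConfig(BaseModel):
    url: str
    yt_args: list[str] = []          # field names are assumptions
    keep_last: int | None = None
    cron: str = "0 * * * *"          # hypothetical default

def load_config(path: Path) -> dict[str, FeedConfig]:
    # Parse the YAML config into validated per-feed models.
    raw = yaml.safe_load(path.read_text())
    return {feed_id: FeedConfig(**cfg) for feed_id, cfg in raw["feeds"].items()}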

3 Database Layer

  • CRUD helpers:
    • add_download
    • update_status
    • next_queued_downloads
    • get_download_by_id
    • get_errors
    • get_downloads_to_prune_by_keep_last
    • get_downloads_to_prune_by_since
    • delete_downloads
  • Tests with tmp in-memory DB.
  • Tests to make sure DB access is optimized (e.g. uses indexes).

3.2 File Manager Layer

  • Abstraction seam: encapsulate the base directory so future S3/GCS back-ends can subclass
  • Implement save_download_file(feed, file_name, data_stream) -> Path (atomic write; see the sketch after this list)
  • Implement delete_download_file(feed, file_name) -> bool
  • Implement download_exists(feed, file_name) -> bool
  • Implement get_download_stream(feed, file_name) -> IO[bytes]
  • Ensure directory hygiene: base download and feed directories exist (covered by save_download_file)
  • Write unit tests (tmp dir fixtures)
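
A sketch of the atomic-write path for a local-filesystem backend; the .part suffix and chunk size are arbitrary choices, not project decisions:

import os
from pathlib import Path
from typing import IO

class FileManager:
    def __init__(self, base_download_path: Path):
        self._base = base_download_path

    def save_download_file(self, feed: str, file_name: str, data_stream: IO[bytes]) -> Path:
        # Directory hygiene: ensure the base and feed directories exist.
        feed_dir = self._base / feed
        feed_dir.mkdir(parents=True, exist_ok=True)
        final_path = feed_dir / file_name
        tmp_path = feed_dir / f".{file_name}.part"
        with tmp_path.open("wb") as f:
            while chunk := data_stream.read(1 << 20):
                f.write(chunk)
            f.flush()
            os.fsync(f.fileno())
        # Atomic on POSIX as long as tmp and final share a filesystem.
        tmp_path.rename(final_path)
        return final_path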

3.5 Data Orchestration & Services Layer

This section details the components that manage the lifecycle of downloads, from discovery to storage and pruning. They are organized into a data_coordinator module and related services/wrappers.

3.5.1 db.py::Download model updates

  • from_row(cls, db_row: sqlite3.Row) -> Download class method for mapping.

3.5.2 YtdlpWrapper (ytdlp_wrapper.py)

  • Create ytdlp_wrapper.py and class YtdlpWrapper.
  • YtdlpWrapper.fetch_metadata(feed_id: str, url: str, yt_cli_args: list[str]) -> list[Download]: Fetches metadata for all downloads at the given URL using yt-dlp's metadata extraction capabilities.
  • YtdlpWrapper.download_media_to_file(download: Download, yt_cli_args: list[str], download_target_dir: Path) -> Path:
    • Purpose: Downloads media (video or audio) for the given entry to a specified directory, handling potential merges (e.g., video + audio) via yt-dlp and FFmpeg.
    • Arguments:
      • download: Metadata of the entry to download, used for naming and context.
      • yt_cli_args: List of command-line arguments for yt-dlp (e.g., format selection from feed config).
      • download_target_dir: The base directory where the feed-specific subfolder and media file will be created.
    • Returns: Path to the successfully downloaded media file.
  • Sandbox test using a Creative-Commons short video, covered by integration tests using real URLs.
  • Unit tests for YtdlpWrapper.

3.5.2.1 Logger Side Quest

  • Implement a global logging framework.
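
One possible shape for this, assuming stdlib logging and a single setup call at startup (the format string is arbitrary):

import logging

def setup_logging(level: str = "INFO") -> None:
    logging.basicConfig(
        level=getattr(logging, level.upper()),
        format="%(asctime)s %(levelname)s %(name)s: %(message)s",
    )

logger = logging.getLogger(__name__)  # per-module logger pattern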

3.5.3 Enqueuer (data_coordinator/enqueuer.py)

  • Constructor accepts DatabaseManager, YtdlpWrapper.
  • enqueue_new_downloads(feed_config: FeedConfig) -> int (see the sketch after this list):
    • Phase 1: Re-fetch metadata for existing DB entries with status 'upcoming'; update those now VOD to 'queued'.
    • Phase 2: Fetch metadata for latest N videos via YtdlpWrapper.fetch_metadata().
      • For each download not in DB:
        • If VOD (live_status=='not_live' and is_live==False), insert with status 'queued'.
        • If live or scheduled (live_status=='upcoming' or is_live==True), insert with status 'upcoming'.
      • For each existing 'upcoming' entry now VOD, update status to 'queued'.
    • Returns count of newly enqueued or transitioned-to-queued downloads.
  • Preprocess CLI args once so we don't re-parse them on every call
  • Handle cookies differently? They need to be included even at the discovery stage
    • Just include all args for all stages, including discovery; users will have limitations that need to be outlined in the docs
  • The DB layer should not leak any SQLite details -- it should abstract all of that away
    • For example, remove all references to sqlite.Row and sqlite.Error
  • Retries should apply more widely and, with enough failures, should transition the download to the error state
    • Maybe db.py needs a bump_error_count fn that handles this: bump the count until it becomes too high, then mark the download as error
  • Unit tests for Enqueuer with mocked dependencies.
  • A couple of integration tests.
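
A condensed sketch of the two-phase flow described above. The self._db / self._ytdlp attributes, the get_downloads_by_status helper, feed_config.id, and the Download fields (source_url, live_status, is_live) are assumptions:

def enqueue_new_downloads(self, feed_config: FeedConfig) -> int:
    count = 0

    # Phase 1: re-check entries already in the DB with status 'upcoming'.
    for row in self._db.get_downloads_by_status("upcoming", feed=feed_config.id):
        fresh = self._ytdlp.fetch_metadata(feed_config.id, row.source_url, feed_config.yt_args)
        if fresh and _is_vod(fresh[0]):
            self._db.update_status(feed_config.id, row.id, "queued")
            count += 1

    # Phase 2: discover the latest N entries at the feed URL.
    for download in self._ytdlp.fetch_metadata(feed_config.id, feed_config.url, feed_config.yt_args):
        existing = self._db.get_download_by_id(feed_config.id, download.id)
        if existing is None:
            download.status = "queued" if _is_vod(download) else "upcoming"
            self._db.add_download(download)
            if download.status == "queued":
                count += 1
        elif existing.status == "upcoming" and _is_vod(download):
            self._db.update_status(feed_config.id, download.id, "queued")
            count += 1
    return count

def _is_vod(d) -> bool:
    # VOD per the rules above: not live and not scheduled.
    return d.live_status == "not_live" and not d.is_live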

3.5.3.1 TODO Side Quest

  • Refactor Download Status Management (State Machine Implementation); a transition sketch follows at the end of this section
    • Phase 1: Database Layer (db.py)
      • Remove the existing DatabaseManager.update_status method.
      • Implement DatabaseManager.mark_as_queued_from_upcoming(feed: str, id: str) -> None:
        • Checks that the current status is in fact UPCOMING
        • Sets status = QUEUED.
        • Preserves retries and last_error.
      • Implement DatabaseManager.requeue_download(feed: str, id: str) -> None:
        • Add a note that this will happen due to:
          • manually requeueing an ERROR'd download,
          • manually requeueing in order to get the latest version of a download (i.e. it was previously DOWNLOADED)
          • un-SKIPping a video
          • don't implement this as logic; just include it as a note in the docstring
        • Sets status = QUEUED.
        • Sets retries = 0, last_error = NULL.
      • Implement DatabaseManager.mark_as_downloaded(feed: str, id: str) -> None:
        • Sets status = DOWNLOADED.
        • Sets retries = 0, last_error = NULL.
      • Implement DatabaseManager.skip_download(feed: str, id: str) -> None:
        • Sets status = SKIPPED.
        • Preserves retries and last_error.
        • Raises DownloadNotFoundError or DatabaseOperationError on failure.
      • Implement DatabaseManager.unskip_download(feed_id: str, download_id: str) -> DownloadStatus:
        • Checks that the download is currently SKIPPED.
        • Calls requeue_download(feed_id, download_id).
        • Returns DownloadStatus.QUEUED.
        • Raises DownloadNotFoundError or DatabaseOperationError on failure.
      • Implement DatabaseManager.archive_download(feed: str, id: str) -> None:
        • Sets status = ARCHIVED.
        • Preserves retries and last_error.
        • Raises DownloadNotFoundError or DatabaseOperationError on failure.
      • Modify DatabaseManager.get_download_by_id(feed: str, id: str) -> Download:
        • Change return type from Download | None to Download.
        • Raises DownloadNotFoundError if not found.
        • Raises DatabaseOperationError for other DB issues.
        • Raises ValueError if row parsing fails.
      • Verify DatabaseManager.upsert_download correctly handles initial setting of UPCOMING or QUEUED status based on the input Download object, ensuring retries and last_error are appropriate for new items.
      • Verify DatabaseManager.bump_retries remains the sole mechanism for incrementing retries and transitioning to ERROR status (and handles DownloadNotFoundError from get_download_by_id).
    • Phase 2: Service Layer Updates
      • Enqueuer (data_coordinator/enqueuer.py):
        • Update _process_single_download to correctly handle DownloadNotFoundError from get_download_by_id.
        • Refactor _update_download_status_in_db (or remove it) and its call sites (_update_status_to_queued_if_vod, _handle_existing_fetched_download):
          • When an UPCOMING download becomes a VOD, call download_db.mark_as_queued_from_upcoming. Adapt to its new signature (returns None, raises exceptions).
          • For other status changes previously handled by _update_download_status_in_db (e.g., an existing ERROR record being re-processed from feed and needing to be QUEUED), evaluate if download_db.requeue_download should be used or if current upsert_download logic in _handle_existing_fetched_download is sufficient.
        • Ensure bump_retries calls remain correct for metadata fetch failures.
      • Pruner (data_coordinator/pruner.py):
        • When pruning items, call download_db.archive_download. Adapt to its new signature (returns None, raises exceptions). File deletion logic is already handled by Pruner correctly before this step.
    • Phase 3: Test Updates
      • tests/anypod/db/test_db.py:
        • Remove tests for the old download_db.update_status.
        • Add/Update comprehensive unit tests for all new/modified download_db.mark_as_*, download_db.requeue_*, download_db.unskip_download, and download_db.get_download_by_id methods, including exception checking.
        • Ensure tests for upsert_download cover setting initial UPCOMING and QUEUED states.
        • Ensure tests for bump_retries are still valid and cover its role, especially DownloadNotFoundError handling.
      • tests/anypod/data_coordinator/test_enqueuer.py:
        • Update mocks and assertions for download_db.get_download_by_id to reflect new exception-raising behavior.
        • Update mocks for status update calls to the new download_db.mark_as_queued_from_upcoming or download_db.requeue_download methods. Verify correct arguments and exception handling.
        • Verify upsert_download is called with correctly statused Download objects.
  • Address various TODOs throughout the code base.
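
As promised, a sketch of one of the Phase 1 transitions. The DownloadStatus enum and the self._set_status helper are assumptions; the exception types are the ones named above:

def mark_as_queued_from_upcoming(self, feed: str, id: str) -> None:
    # UPCOMING -> QUEUED; retries and last_error are left untouched.
    download = self.get_download_by_id(feed, id)  # raises DownloadNotFoundError
    if download.status != DownloadStatus.UPCOMING:
        raise DatabaseOperationError(
            f"cannot queue {feed}/{id}: status is {download.status}, not UPCOMING"
        )
    self._set_status(feed, id, DownloadStatus.QUEUED)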

3.5.4 Downloader Service (data_coordinator/downloader.py)

  • Constructor accepts DatabaseManager, FileManager, YtdlpWrapper.
  • download_queued(feed_id: str, feed_config: FeedConfig, limit: int = -1) -> tuple[int, int]: (success_count, failure_count; see the sketch after this list)
    • Gets queued Download objects via DatabaseManager.get_downloads_by_status.
    • For each Download:
      • Call YtdlpWrapper.download_media_to_file(download, yt_cli_args).
        • Generate final file_name (e.g., using download.title and updated_metadata['ext']).
        • Call FileManager.save_download_file(feed_config.name, final_file_name, source_file_path=completed_file_path).
          • (Note: FileManager.save_download_file will need to implement moving a file from source_file_path to its final managed location.)
        • Update DB: status to 'downloaded', store final path from FileManager, update ext, filesize from updated_metadata.
      • On failure:
        • Update DB: status to 'error', log error, increment retries.
      • Ensure cleanup of source file regardless of success/failure of the individual download.
  • Unit tests for Downloader (Service) with mocked dependencies.
  • Debug mode for Enqueuer
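
A sketch of the per-item loop, with the constructor-injected dependencies as self._db / self._files / self._ytdlp; the final-file naming scheme, self._tmp_dir, and the bump_retries signature are assumptions:

from pathlib import Path

def download_queued(self, feed_id: str, feed_config: FeedConfig, limit: int = -1) -> tuple[int, int]:
    succeeded = failed = 0
    for download in self._db.get_downloads_by_status("queued", feed=feed_id, limit=limit):
        tmp_path: Path | None = None
        try:
            tmp_path = self._ytdlp.download_media_to_file(
                download, feed_config.yt_args, self._tmp_dir
            )
            final_name = f"{download.id}{tmp_path.suffix}"  # naming scheme assumed
            self._files.save_download_file(feed_id, final_name, source_file_path=tmp_path)
            self._db.mark_as_downloaded(feed_id, download.id)
            succeeded += 1
        except Exception as e:
            # bump_retries transitions to ERROR once the count gets too high.
            self._db.bump_retries(feed_id, download.id, str(e))
            failed += 1
        finally:
            # Clean up the source file regardless of outcome (it may already
            # have been moved into place by save_download_file).
            if tmp_path is not None and tmp_path.exists():
                tmp_path.unlink()
    return succeeded, failed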

3.5.5 Pruner (data_coordinator/pruner.py)

  • Use the old implementation for reference, but expect a largely complete rewrite
  • Constructor accepts DatabaseManager, FileManager.
  • prune_feed_downloads(feed_id: str, keep_last: int | None, prune_before_date: datetime | None) -> tuple[int, int]: (archived_count, files_deleted_count)
    • Uses DatabaseManager to get candidates
    • Uses FileManager.delete_download_file() to delete each candidate's media file.
    • Uses DatabaseManager.archive_download() to archive.
  • Unit tests for Pruner with mocked dependencies.

3.5.5.1 Database Refactoring & Feed Table

  • Split database classes: Refactor src/anypod/db/db.py into separate modules:
    • DownloadDatabase class for download-level operations (keep existing methods)
    • FeedDatabase class for feed-level operations (new functionality)
  • Feed table schema & operations:
    • Create feeds table with schema: last_sync_attempt, last_successful_sync, consecutive_failures, last_error, is_enabled, title, subtitle, description, language, author, image_url, source_type, total_downloads, downloads_since_last_rss, last_rss_generation
    • Add created_at and updated_at with defaults
    • Implement feed CRUD operations in FeedDatabase (see the sketch after this list):
      • Add FeedNotFoundError exception (similar to DownloadNotFoundError)
      • upsert_feed(feed: Feed) -> None - Insert or update a feed record, handling None timestamps to allow database defaults
      • get_feed_by_id(feed_id: str) -> Feed - Retrieve a specific feed by ID, raise FeedNotFoundError if not found
      • get_feeds(enabled: bool | None = None) -> list[Feed] - Get all feeds, or filter by enabled status if provided
      • mark_sync_success(feed_id: str) -> None - Set last_successful_sync to current timestamp, reset consecutive_failures to 0, clear last_error
      • mark_sync_failure(feed_id: str, error_message: str) -> None - Set last_failed_sync to current timestamp, increment consecutive_failures, set last_error
      • mark_rss_generated(feed_id: str, new_downloads_count: int) -> None - Set last_rss_generation to current timestamp, increment total_downloads by new_downloads_count, set downloads_since_last_rss to new_downloads_count
      • set_feed_enabled(feed_id: str, enabled: bool) -> None - Set is_enabled to the provided value
      • update_feed_metadata(feed_id: str, *, title: str | None = None, subtitle: str | None = None, description: str | None = None, language: str | None = None, author: str | None = None, image_url: str | None = None) -> None - Update feed metadata fields; only updates provided (non-None) fields; no-op if all None
  • Download table enhancements:
    • Add fields: quality_info
    • Add fields: discovered_at and updated_at, and potentially downloaded_at, maintained via SQLite triggers
    • Update DownloadDatabase methods to handle new fields
    • Update all places that create/modify downloads to populate new fields (Enqueuer, Downloader, etc.)
  • Config and model updates:
    • Rename FeedMetadata to FeedMetadataOverrides in feed_config.py
    • Add enabled field to FeedConfig
  • Feed metadata synchronization:
    • Compare FeedMetadataOverrides from config with stored feed metadata in DB
    • Update DB when config overrides change
    • Modify YtdlpWrapper to make best-effort extraction of non-overridden FeedMetadataOverrides fields
    • Mark fields for best-effort extraction when overrides are removed
  • Unit tests for both DownloadDatabase and FeedDatabase
  • On pruning, also update the total_downloads value
  • Ensure there aren't any read/modify/write loops that aren't protected by a transaction
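
As promised above, a sketch of two of the sync-state helpers. The raw SQL is illustrative only (this predates the SQLModel migration in 5.4.2), and self._conn is an assumed sqlite3.Connection:

from datetime import UTC, datetime

class FeedNotFoundError(Exception):
    pass

class FeedDatabase:
    def mark_sync_success(self, feed_id: str) -> None:
        # A single UPDATE statement: no unprotected read/modify/write loop.
        cursor = self._conn.execute(
            "UPDATE feeds SET last_successful_sync = ?,"
            " consecutive_failures = 0, last_error = NULL WHERE id = ?",
            (datetime.now(UTC).isoformat(), feed_id),
        )
        if cursor.rowcount == 0:
            raise FeedNotFoundError(feed_id)

    def mark_sync_failure(self, feed_id: str, error_message: str) -> None:
        cursor = self._conn.execute(
            "UPDATE feeds SET last_failed_sync = ?,"
            " consecutive_failures = consecutive_failures + 1,"
            " last_error = ? WHERE id = ?",
            (datetime.now(UTC).isoformat(), error_message, feed_id),
        )
        if cursor.rowcount == 0:
            raise FeedNotFoundError(feed_id)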

3.5.6 DataCoordinator Orchestrator (data_coordinator/coordinator.py)

  • Create data_coordinator/types/ folder with __init__.py and processing_results.py
  • Create ProcessingResults dataclass with counts, error tracking, status, and timing (see the sketch after this list)
  • Add archive_feed() method to Pruner class (sets is_enabled=False)
  • Constructor accepts Enqueuer, Downloader, Pruner, RSSFeedGenerator, FeedDatabase
  • process_feed(feed_id: str, feed_config: FeedConfig) -> ProcessingResults:
    • Calculate fetch_since_date from feed.last_successful_sync (NOT feed_config.since)
    • Execute phases in sequence: enqueue → download → prune → RSS generation
    • Inline error handling with graceful degradation between phases
    • Update last_successful_sync or last_failed_sync based on outcome
    • Return comprehensive ProcessingResults with all counts and errors
  • Update data_coordinator/__init__.py to export DataCoordinator
  • Integration tests for DataCoordinator focusing on full process_feed flow
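
A sketch of the results type referenced above; the exact field set is an assumption:

from dataclasses import dataclass, field
from datetime import timedelta

@dataclass
class ProcessingResults:
    feed_id: str
    enqueued: int = 0
    downloaded: int = 0
    download_failures: int = 0
    archived: int = 0
    rss_generated: bool = False
    errors: list[str] = field(default_factory=list)  # per-phase error messages
    duration: timedelta = timedelta(0)

    @property
    def ok(self) -> bool:
        return not self.errors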

3.5.7 Discrepancy Detection (in Pruner or new service)

  • Implement discrepancy detection logic (see the sketch after this list):
    • Find DB entries with DOWNLOADED status but no corresponding download file.
    • Find download files on disk with no corresponding DOWNLOADED DB entry.
    • (Optional) Automated resolution strategies or reporting for discrepancies.
  • Unit tests for discrepancy detection logic.
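
A sketch of the two-way comparison from the first two bullets; the feed_dir helper and the file_name field are hypothetical:

def find_discrepancies(self, feed_id: str) -> tuple[list[str], list[str]]:
    db_names = {d.file_name for d in self._db.get_downloads_by_status("downloaded", feed=feed_id)}
    disk_names = {p.name for p in self._files.feed_dir(feed_id).iterdir() if p.is_file()}
    missing_files = sorted(db_names - disk_names)  # DOWNLOADED row, no file on disk
    orphan_files = sorted(disk_names - db_names)   # file on disk, no DOWNLOADED row
    return missing_files, orphan_files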

4 Feed Generation

  • Determine if a read/write lock for the in-memory feed XML cache is needed for concurrency (see the sketch after this list)
  • Add new fields to Download
    • This will also involve potentially changing how values are updated, since some (like title) may change down the line; we should store the most recent value
  • Implement generate_feed_xml(feed_id) to write to in-memory XML after acquiring write lock
  • Implement get_feed_xml(feed_id) for HTTP handlers to read from in-memory XML after acquiring read lock
  • Write unit tests to verify enclosure URLs and MIME types in generated feeds
  • Figure out how to bring in the host URL.
  • Duration should be an int
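
A sketch of the cache guarded by a lock; generation itself is elided, so only the locking seam shows. The stdlib has no reader/writer lock, so a plain threading.Lock stands in; a real RW lock can replace it if read contention matters:

import threading

class FeedXmlCache:
    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._xml: dict[str, bytes] = {}

    def generate_feed_xml(self, feed_id: str, xml: bytes) -> None:
        with self._lock:  # the "write lock"
            self._xml[feed_id] = xml

    def get_feed_xml(self, feed_id: str) -> bytes:
        with self._lock:  # the "read lock"
            return self._xml[feed_id]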

4.1 Path Management Centralization

  • PathManager Implementation – Create centralized path/URL coordination class (see the sketch after this list):
    • Single source of truth for file system paths and URLs based on feed_id + download_id
    • Consistent 1:1 mapping between network paths and file paths
    • Methods for feed directories, RSS URLs, and media file paths/URLs
    • Google-style docstrings with proper Args/Returns/Raises sections
  • FileManager refactor
  • Pruner refactor
  • RSSFeedGenerator refactor
  • tests refactor
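
A sketch of the class described above; the directory layout and URL shapes mirror the /feeds and /media routes planned in section 6.4 but are otherwise assumptions:

from pathlib import Path

class PathManager:
    # Single source of truth: everything derives from (feed_id, download_id).

    def __init__(self, base_data_dir: Path, base_url: str):
        self._data = base_data_dir
        self._url = base_url.rstrip("/")

    def feed_media_dir(self, feed_id: str) -> Path:
        return self._data / "media" / feed_id

    def media_file_path(self, feed_id: str, download_id: str, ext: str) -> Path:
        return self.feed_media_dir(feed_id) / f"{download_id}.{ext}"

    def feed_url(self, feed_id: str) -> str:
        return f"{self._url}/feeds/{feed_id}.xml"

    def media_file_url(self, feed_id: str, download_id: str, ext: str) -> str:
        # Kept 1:1 with media_file_path.
        return f"{self._url}/media/{feed_id}/{download_id}.{ext}"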

5 Scheduler

5.1 Create Scheduler Module (src/anypod/schedule/)

  • Core scheduler implementation:
    • Add apscheduler to dependencies in pyproject.toml
    • Create type-safe APScheduler wrapper (apscheduler_core.py)
    • Use APScheduler with AsyncIOScheduler for async support
    • Schedule jobs based on feed cron expressions from config (see the sketch after this list)
    • Manage job lifecycle (add/remove/pause/resume)
    • Handle graceful shutdown with proper job cleanup
    • Each feed gets its own job with unique ID (the feed ID)
    • Job-level error handling with proper exception chaining
    • Direct DataCoordinator integration (no separate worker)
    • Context ID injection for log correlation
    • Invalid cron expression validation with SchedulerError
    • Remove explicit references to monkeypatch
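
A sketch of the wrapper around AsyncIOScheduler; SchedulerError is the exception named above, stubbed here for self-containment:

from collections.abc import Awaitable, Callable

from apscheduler.schedulers.asyncio import AsyncIOScheduler
from apscheduler.triggers.cron import CronTrigger

class SchedulerError(Exception):
    pass

class FeedScheduler:
    def __init__(self) -> None:
        self._scheduler = AsyncIOScheduler()

    def add_feed_job(self, feed_id: str, cron: str, job: Callable[[], Awaitable[None]]) -> None:
        try:
            trigger = CronTrigger.from_crontab(cron)
        except ValueError as e:
            raise SchedulerError(f"invalid cron for feed {feed_id!r}: {cron}") from e
        # One job per feed; the feed ID doubles as the job ID.
        self._scheduler.add_job(job, trigger, id=feed_id, replace_existing=True)

    def start(self) -> None:
        self._scheduler.start()

    def shutdown(self) -> None:
        self._scheduler.shutdown(wait=True)  # let in-flight jobs finish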

5.1.1 yt-dlp Day-Level Date Precision Accommodation

  • Date Window Calculation Logic (DataCoordinator):
    • Replace _calculate_fetch_until_date with day-aligned logic (see the sketch at the end of this section)
    • fetch_since_date should still be last_successful_sync
    • fetch_until_date should just be now(); simplify this logic -- no 2 * cron tick or anything. We can remove that from coordinator.py and debug_enqueuer.py
    • That may mean these values usually fall on the same day; that's fine, and we will dedup results later
    • Update last_successful_sync to fetch_until_date to ensure full coverage (was previously now())
    • Enhanced logging: log both high-resolution calculated window and day-aligned yt-dlp window while in the context of ytdlp_wrapper
  • Deduplication Enforcement:
    • Verify Enqueuer properly handles duplicate video IDs across multiple day fetches
    • If what's in the DB is identical to what we retrieved, don't update (an update would bump updated_at)
      • It's possible that some metadata has changed (e.g. the uploader edited the description); check for that and update if needed
    • Add deduplication tests: same video appearing in multiple day windows
    • Verify no updates occurred (updated_at is unchanged)
    • Verify deduplication works when same video appears in multiple day windows
  • Documentation Updates:
    • Update method docstrings: document day-aligned window strategy clearly
    • Update DESIGN_DOC.md: add section explaining yt-dlp day-level precision limitation
  • State Reconciler Alignment:
    • Update since parameter handling: should only be a date, not a datetime
    • When since changes, use day-aligned logic for requeuing archived downloads
    • Ensure consistency between enqueue windows and retention policy windows
  • Tie Feed table total_downloads to the Download table with triggers
  • When downloading an individual file that is out of the date range, yt-dlp returns an incomplete response, which causes internal errors
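
A sketch of the window math described at the top of this section; yt-dlp's date filters only carry day precision, so the high-resolution window is logged and then widened to whole days:

from datetime import UTC, datetime

def calculate_fetch_window(last_successful_sync: datetime) -> tuple[datetime, datetime]:
    fetch_since = last_successful_sync  # high-resolution lower bound
    fetch_until = datetime.now(UTC)     # simply now(); no cron-tick math
    return fetch_since, fetch_until

def day_aligned(fetch_since: datetime, fetch_until: datetime) -> tuple[str, str]:
    # Widen both ends to whole days (yt-dlp's YYYYMMDD format); any duplicates
    # this re-fetches are dropped by the Enqueuer's dedup pass.
    return fetch_since.strftime("%Y%m%d"), fetch_until.strftime("%Y%m%d")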

5.2 Init State Reconciliation

5.2.1 Create State Reconciler Module (src/anypod/state_reconciler.py)

  • Startup reconciliation implementation:
    • Compare config feeds with database feeds
    • Handle new feeds: insert into DB and set initial last_successful_sync
    • Handle removed feeds: mark as disabled in DB (set is_enabled=False)
    • Handle changed feeds: update metadata and configuration
    • Ensure every active feed has valid last_successful_sync before scheduling
    • Evaluate what happens if reconciliation fails midway through: would simply restarting get back to a correct state?
    • Time-box the sync window -- it currently only has a start time, but will also need an end time

5.2.2 Config Change Handling

  • Detect and apply changes to:
    • enabled: Update feed's is_enabled in database, add/remove from scheduler, trigger initial sync if false->true
      • last_successful_sync does not need to be optional as it is set proactively on new feed creation
    • url: Update existing feed's source_url, reset consecutive_failures to 0, clear last_error, reset last_successful_sync as if it were a new feed; keep download history
    • since expansion (earlier date): Query archived downloads with published >= new since, change status from ARCHIVED to QUEUED (will redownload)
      • Modify get_downloads_by_status to allow filtering by date so we don't retrieve the entire DB
      • Also consider storing these values (since and keep_last) in the Feed table so we only query the DB when there's a change
      • Modify requeue_download into requeue_downloads, which takes a variadic list and batch-modifies
      • Modify the Pydantic handling of since to accept JUST a day, then derive the TZ from tiered sources (see the sketch after this list):
        1. from the since value itself, if included
        2. from a TZ env var
        3. from the system clock (user would have had to override /etc/localtime)
    • since contraction (later date): Mark downloads with published < new since for archival on next prune cycle
    • keep_last increase: Query archived downloads ordered by published DESC, restore up to (new_keep_last - current_count) from ARCHIVED to QUEUED (will redownload)
      • modify count_downloads_by_status to accept multiple possible statuses and return all of them
    • keep_last decrease: No immediate action - will apply naturally on next prune cycle
    • metadata changes: Update feed table immediately (title, subtitle, description, language, author, image_url, categories, explicit), trigger RSS regeneration
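
As promised, a sketch of the tiered timezone resolution for since; the function name is hypothetical, and POSIX-style TZ strings are not handled:

import os
from datetime import datetime, tzinfo
from zoneinfo import ZoneInfo

def resolve_since_tz(value: datetime) -> tzinfo | None:
    if value.tzinfo is not None:           # 1. TZ included in the value itself
        return value.tzinfo
    if tz_name := os.environ.get("TZ"):    # 2. TZ environment variable
        return ZoneInfo(tz_name)
    # 3. System clock (user would have had to override /etc/localtime)
    return datetime.now().astimezone().tzinfo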

5.3 Dependencies and Testing

  • Unit tests for scheduler with mocked jobs
  • Unit tests for state reconciler covering:
    • New feed addition
    • Feed removal
    • Feed configuration changes
    • Metadata override changes
  • Integration tests for full startup sequence
  • Tests for graceful shutdown handling

5.4 Update CLI Default Mode (src/anypod/cli/default.py)

  • Main service orchestration:
    • Initialize all components (databases, services)
    • Run state reconciler on startup
    • Start scheduler with reconciled feeds
    • Perform initial sync for all feeds to populate RSS
    • Keep service running until shutdown signal
    • Change path_manager to automatically assume the tmp and media dirs -- it should only need a base dir
      • Also, divide storage into a data dir and config files (cookies.txt and config_path) so they can be separated in a Docker-friendly way
      • Also, it seems excessive to add PathManager to Enqueuer and Downloader JUST so they can retrieve the cookie
      • Especially because they've duplicated the cookie-retrieval logic -- this needs to be centralized somewhere else
      • The ytdlp impl looks fine, though
    • Optimize the discover/metadata/download loop to cut down on calls to yt-dlp
      • It looks like we can retrieve full video detail when querying a playlist without the --flat-playlist option
      • The jury's still out on channels, but maybe?
      • Maybe we can pre-emptively classify these when they are added, store the type in the DB, and pick the optimal retrieval strategy based on that classification
      • Future optimization: we could fetch detailed metadata in one call (86 fields vs 21), but it's 10x slower -- out of scope
      • Not sure we need ReferenceType anymore; SourceType might be good enough
      • I think we can get rid of the DISCOVERY type
      • Get rid of set_source_specific_ytdlp_options
    • Cut down on excessive logs
    • Use the shared conftest for more fixtures
    • Make the DB path a folder instead of a file -- SQLite creates companion files (e.g. .db-wal) in the same folder

5.4.1 Convert to async model for ytdlp

Context/Goals: Convert anypod from sync to async to enable cancellable long-running operations (especially yt-dlp calls). Currently yt-dlp operations block and can't be interrupted. The async conversion will wrap yt-dlp in subprocess calls that can be properly cancelled, and ripple async throughout the codebase. Key insight: keep CLI args as list[str] instead of converting to dict, eliminating complex dict→CLI conversion.

Implementation Tasks:

  • CLI Args Strategy: Remove dict conversion in feed_config.py - keep yt_args as list[str] throughout pipeline
  • YtdlpCore Async: Implement subprocess calls with --dump-single-json --flat-playlist for metadata, parse JSON to YtdlpInfo (see the sketch after this list)
  • Cancellation: Proper subprocess cleanup (proc.kill() + await proc.wait() on CancelledError)
  • Isolate yt-dlp cli args into YtdlpCore
  • Consistent naming across the ydl vs. ytdlp functions
    • Also separate the classes into different files
  • Remove unused YtdlpCore methods: parse_options(), set_date_range(), set_playlist_limit(), set_cookies()
  • Conversion Order: YtdlpCore → YtdlpWrapper → Enqueuer/Downloader/Pruner → DataCoordinator → StateReconciler
  • Consider if RSSFeedGenerator needs async updates (probably minimal since it's mostly CPU-bound)
    • the answer is no, for now at least
  • Implement graceful shutdown handling -- it hard-crashes on Ctrl-C right now
    • This includes during init, when we're not in APScheduler yet (maybe we should be?)
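
A sketch of the cancellable metadata call described above, using only flags already named in this list:

import asyncio
import json
from typing import Any

async def fetch_metadata_json(url: str, yt_cli_args: list[str]) -> dict[str, Any]:
    proc = await asyncio.create_subprocess_exec(
        "yt-dlp", "--dump-single-json", "--flat-playlist", *yt_cli_args, url,
        stdout=asyncio.subprocess.PIPE,
        stderr=asyncio.subprocess.PIPE,
    )
    try:
        stdout, stderr = await proc.communicate()
    except asyncio.CancelledError:
        proc.kill()        # make the long-running yt-dlp call interruptible
        await proc.wait()
        raise
    if proc.returncode != 0:
        raise RuntimeError(f"yt-dlp failed ({proc.returncode}): {stderr.decode()[:500]}")
    return json.loads(stdout)  # caller parses this into YtdlpInfo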

5.4.2 Use SQLAlchemy AsyncIO

  • Phase 1: Environment Setup & Dependencies
    • Add sqlalchemy[asyncio], sqlmodel, aiosqlite, and alembic to pyproject.toml using uv add.
  • Phase 2: Refactor Models to SQLModel (see the sketch at the end of this section)
    • Convert data models in src/anypod/db/types/ (download.py, feed.py) to inherit from SQLModel, marking them with table=True.
    • Use sqlmodel.Field to define primary keys, indexes, and other constraints.
    • Define the one-to-many relationship between Feed and Download using sqlmodel.Relationship.
    • Integrate enum types into SQLModels:
      • For Feed.source_type, declare sa_column=Column(Enum(SourceType)).
      • For Download.status, declare sa_column=Column(Enum(DownloadStatus)).
      • Remove legacy register_adapter calls.
  • Phase 3: Implement the Asynchronous Core
    • Create src/anypod/db/sqlalchemy_core.py to centralize database connectivity.
    • Implement create_async_engine using the sqlite+aiosqlite dialect and QueuePool (default).
    • Create an async_session_maker for producing AsyncSession instances.
    • Implement a session() async generator for dependency injection.
  • Phase 4: Refactor Data Access Logic
    • Convert all methods in src/anypod/db/feed_db.py and src/anypod/db/download_db.py to async def.
    • Refactor methods to accept an AsyncSession parameter instead of using a shared instance.
    • Replace sqlite-utils calls (upsert, rows_where, get) with SQLAlchemy ORM operations (session.add, session.execute(select(...))).
    • Propagate async keyword up the call stack through the data_coordinator and schedule modules.
  • Phase 5: Database Migrations with Alembic
    • Initialize Alembic with alembic init -t async migrations.
    • Configure alembic.ini with the sqlalchemy.url for the async driver.
    • Configure migrations/env.py to use SQLModel.metadata as the target_metadata.
    • Generate an initial migration script: alembic revision --autogenerate -m "Initial schema from SQLModels".
    • Replace database triggers (create_trigger) with Alembic-managed versions.
    • Review and apply the initial migration: alembic upgrade head.
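
As promised, a sketch of the SQLModel tables plus the Phase 3 async core. Fields are trimmed to the ones needed to show the patterns; SourceType members here are placeholders, and the DB path is assumed:

from enum import Enum

from sqlalchemy import Column, Enum as SAEnum
from sqlalchemy.ext.asyncio import AsyncSession, async_sessionmaker, create_async_engine
from sqlmodel import Field, Relationship, SQLModel

class SourceType(str, Enum):      # placeholder members
    CHANNEL = "channel"
    PLAYLIST = "playlist"

class DownloadStatus(str, Enum):  # statuses from section 3.5.3.1
    UPCOMING = "upcoming"
    QUEUED = "queued"
    DOWNLOADED = "downloaded"
    ERROR = "error"
    SKIPPED = "skipped"
    ARCHIVED = "archived"

class Feed(SQLModel, table=True):
    id: str = Field(primary_key=True)
    source_type: SourceType = Field(sa_column=Column(SAEnum(SourceType)))
    downloads: list["Download"] = Relationship(back_populates="feed")

class Download(SQLModel, table=True):
    feed_id: str = Field(foreign_key="feed.id", primary_key=True)
    id: str = Field(primary_key=True)
    status: DownloadStatus = Field(sa_column=Column(SAEnum(DownloadStatus)))
    feed: Feed = Relationship(back_populates="downloads")

engine = create_async_engine("sqlite+aiosqlite:///data/db/anypod.db")
async_session_maker = async_sessionmaker(engine, class_=AsyncSession, expire_on_commit=False)

async def session():
    # Async generator used for dependency injection.
    async with async_session_maker() as s:
        yield s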

5.4.3 Use aiofiles for file operations

  • Add aiofiles dependency and convert FileManager to use async file operations
    • Dependencies: Add aiofiles for async file operations
    • File Operations: Use aiofiles.os to replace path operations
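
A sketch of the conversion; aiofiles.os and aiofiles.os.path mirror the blocking os / os.path calls:

import aiofiles
import aiofiles.os

async def delete_download_file(path: str) -> bool:
    # Async replacement for the blocking exists/remove pair.
    if not await aiofiles.os.path.exists(path):
        return False
    await aiofiles.os.remove(path)
    return True

async def read_file_bytes(path: str) -> bytes:
    async with aiofiles.open(path, "rb") as f:
        return await f.read()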

5.5 Initial Sync Strategy

  • After reconciliation, trigger immediate sync:
    • Process all enabled feeds to populate RSS
    • Ensure RSS feeds available before HTTP server starts
    • Handle failures gracefully without blocking startup, unless the config is wrong -- that should cause a failure until it's fixed

6 HTTP Server

  • How do we break out the API and static serving? Different ports? For security reasons, we need to expose the static content but not the APIs
    • Decision: went with different ports

6.1 Project Structure Setup

  • Create new HTTP server module at src/anypod/server/
    • __init__.py - Server module exports
    • app.py - FastAPI application factory
    • dependencies.py - Dependency injection setup
    • models/ - Pydantic request/response models
    • routers/ - API route handlers organized by domain

6.1.1 Image Hosting

  • First, check on getting the full file path in there
    • https://i.ytimg.com/pl_c/PL8mG-RkN2uTw7PhlnAr4pZZz2QubIbujH/studio_square_thumbnail.jpg isn't enough; you need the extra ?sqp=CNnJ9cQG-oaymwEICOADEOADSFqi85f_AwYIwe77sQY%3D&rs=AOn4CLB5y7iZmQcD8vHcdJ4WtzLCK_wOuQ on the end
  • Download and host images locally

6.2 FastAPI Application Setup

  • Add fastapi, uvicorn to dependencies in pyproject.toml
  • Create FastAPI app factory with proper dependency injection (see the sketch after this list)
  • Set up CORS, logging, and error handling middleware
    • Ensure logging also includes the contextvar
  • Configure OpenAPI documentation with proper metadata
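
A sketch of the factory; router wiring is shown as comments because the routers are only planned at this point:

from fastapi import FastAPI
from fastapi.middleware.cors import CORSMiddleware

def create_app() -> FastAPI:
    app = FastAPI(title="Anypod", version="0.1.0")  # OpenAPI metadata
    app.add_middleware(CORSMiddleware, allow_origins=["*"])  # tighten for prod
    # Hypothetical wiring matching the planned server/routers/ layout:
    # app.include_router(feeds.router, prefix="/api")
    # app.include_router(downloads.router, prefix="/api")
    return app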

6.3 API Models (Pydantic)

  • FeedResponse - Feed data for API responses
  • FeedCreateRequest/FeedUpdateRequest - Feed modification requests
  • DownloadResponse - Download data for API responses
  • PaginatedResponse[T] - Generic paginated response wrapper (see the sketch after this list)
  • StatsResponse - System and feed statistics
  • ErrorResponse - Standardized error responses
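
A sketch of the generic wrapper; the field names are assumptions:

from typing import Generic, TypeVar

from pydantic import BaseModel

T = TypeVar("T")

class PaginatedResponse(BaseModel, Generic[T]):
    items: list[T]
    total: int
    limit: int
    offset: int

# Usage on a route, e.g.: response_model=PaginatedResponse[DownloadResponse]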

6.4 Router Implementation

  • feeds.py - All feed management endpoints
    • GET /api/feeds - List all feeds with pagination, filtering, and sorting
    • POST /api/feeds - Create new feed, will write to config file
    • GET /api/feeds/{feed_id} - Get detailed feed information
    • PUT /api/feeds/{feed_id} - Update feed configuration by modifying config file
    • DELETE /api/feeds/{feed_id} - Disables feed and archives all downloads
    • POST /api/feeds/{feed_id}/enable - Enable feed processing
    • POST /api/feeds/{feed_id}/disable - Disable feed processing
    • POST /api/feeds/{feed_id}/sync - Trigger manual sync/processing
    • GET /api/feeds/valid - Validate feed config before writing to config file
  • downloads.py - Download management endpoints
    • GET /api/feeds/{feed_id}/downloads - List downloads for feed (paginated, filtered)
    • GET /api/feeds/{feed_id}/downloads/{download_id} - Get specific download details
    • POST /api/feeds/{feed_id}/downloads/{download_id}/retry - Retry failed download
    • POST /api/feeds/{feed_id}/downloads/{download_id}/skip - Mark download as skipped
    • POST /api/feeds/{feed_id}/downloads/{download_id}/unskip - Remove skip status
    • DELETE /api/feeds/{feed_id}/downloads/{download_id} - Archive download and delete file
  • stats.py - Statistics and monitoring endpoints
    • GET /api/feeds/{feed_id}/stats - Detailed feed statistics
    • GET /api/stats/summary - System-wide statistics summary including storage
  • health.py - Health check endpoints
    • GET /api/health - Application health check
  • static.py - Content delivery endpoints
    • GET /feeds - List all rss feeds in directory
    • GET /feeds/{feed_id}.xml - RSS feed XML
    • GET /media - List all feeds in directory
    • GET /media/{feed_id} - List all files for a feed in directory
    • GET /media/{feed_id}/{filename}.{ext} - Media file download
    • GET /thumbnails - List all feeds in directory
    • GET /thumbnails/{feed_id} - List all thumbnails for a feed in directory
    • GET /thumbnails/{feed_id}/{filename}.{ext} - Thumbnail images
  • Unit tests with TestClient for all API endpoints
  • Integration tests with actual database operations
  • Maybe enforce that feed IDs match the same regex as in validation.py (a router sketch follows this list)
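
A sketch of the feeds router shape; the handler bodies are elided, and pagination parameters are assumptions:

from fastapi import APIRouter

router = APIRouter(prefix="/api/feeds", tags=["feeds"])

@router.get("")
async def list_feeds(limit: int = 50, offset: int = 0):
    # Returns a PaginatedResponse[FeedResponse] (see 6.3).
    ...

@router.post("/{feed_id}/sync", status_code=202)
async def trigger_sync(feed_id: str):
    # Bridges to the DataCoordinator via the 6.5 service layer.
    ...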

6.5 Integration with Existing Components

  • Create service layer to bridge HTTP API with existing DataCoordinator
  • Extend FeedDatabase/DownloadDatabase with new query methods for API needs
  • Add config file read/write utilities for feed CRUD operations
  • Implement proper error mapping from domain exceptions to HTTP responses

6.6 Key Features Implementation

  • Pagination: Implement cursor-based or offset-based pagination
  • Filtering: Add query parameters for status, date ranges, search
  • Sorting: Support multiple sort fields and directions
  • Validation: Comprehensive request validation using Pydantic
  • Error Handling: Consistent error responses with proper HTTP status codes

6.7 CLI Integration

  • Configure server host/port via environment variables
  • Ensure proper graceful shutdown handling
  • Entry in default.py to start uvicorn

6.8 Documentation

  • Comprehensive OpenAPI documentation
    • Example requests/responses for all endpoints

7 CLI & Flags

  • python -m anypod parses flags: --ignore-startup-errors, --retry-failed, --log-level.
  • Docstrings and argparse help messages.
  • Evaluate logged statements and make sure only relevant things get logged
  • When using --retry-failed, also require a date so that we disregard VERY old failures
    • Errors will be common because live videos may be deleted and re-uploaded as regular VODs
  • Write the README
    • Outline the limitations of passing yt-dlp flags -- which ones must be avoided?
    • Look up some well-established open-source projects and follow their documentation style

8 Docker & Dev Flow

  • Dockerfile (debian:bookworm-slim, default root, overridable UID/GID).
  • .dockerignore to exclude tests, .git, caches.
  • Set up a dev env with containers.

9 Release Automation

  • GH Action release-yt-dlp.yaml: on yt-dlp tag → rebuild, test, draft release.
  • GH Action deps-bump.yaml: weekly minor-bump PR; require manual approval for major bumps
    • Done with Dependabot instead

10 Extraneous

  • Make override enum settings case-insensitive (e.g. they currently require exactly EPISODIC or SERIAL)
  • Create an HTTP endpoint to reset ERROR-status videos
  • Create a top-level HTTP endpoint to reset ERROR-status videos across all feeds
  • Is it possible to introduce tier-based filtering for Patreon posts?
  • Support an "after-download metadata refresh" workflow: allow a download to be flagged for delayed re-processing so we can re-sync title/description/duration later without requeueing the media. Provide CLI/API controls plus scheduler hooks to re-pull metadata (e.g., for YouTube description edits) while keeping the original enclosure stable.

11 Manual Feeds

  • Delete a download from a feed?
  • Optional (additional?) yt args override in the request
  • Endpoint for manually triggering a feed

When all boxes are checked, you'll be able to run:

docker run \
  -v $(pwd)/config:/config \
  -v $(pwd)/data:/data \
  -p 8000:8000 \
  ghcr.io/thurstonsand/anypod:dev

…and subscribe to http://localhost:8000/feeds/this_american_life.xml in your podcast player.