Kemono posts downloader

A Node.js application for downloading posts and images from kemono.cr profiles with concurrent downloads, retry logic, and comprehensive error handling.

Features

Per-Profile Download State: Docker-optimized state management stored in download folders (v1.7.0)
Bulk Profile Processing: Download from multiple profiles using a simple text file
Concurrent Downloads: Configurable concurrent image downloads for faster processing
Smart Resume: Automatically detects and skips already downloaded content and completed profiles
Thumbnail Upgrade System: Automatically detects and upgrades small files (<500KB) to full resolution
Thumbnail Fallback: Downloads full resolution first, falls back to thumbnail on 404 errors
Browser Automation: Integrated Puppeteer with stealth mode for anti-bot bypass
Mega.nz Download Support: Automatically detects and downloads files/folders from mega.nz links with speed/ETA tracking
Google Drive Download Support: Automatically detects and downloads public files from Google Drive links
Dropbox Download Support: Automatically detects and downloads public files from Dropbox share links
Anti-Bot Detection: Proper HTTP headers (Referer, Origin, Sec-Fetch-*) to bypass protection
Retry Logic: Automatic retry with exponential backoff (5s → 10s → 20s) for failed downloads
Multiple Data Sources: Uses API endpoints with comprehensive HTML fallback for maximum compatibility
Robust Error Handling: Comprehensive error handling with detailed logging
Configurable Settings: Extensive configuration options via config.json
Progress Tracking: Real-time progress bars and detailed statistics

Installation

Clone the repository:

git clone https://github.com/servika/kemono-downloader.git
cd kemono-downloader

Install dependencies:

npm install

Copy example configuration files:

cp config.example.json config.json
cp profiles.example.txt profiles.txt

Edit profiles.txt with the profiles you want to download

Quick Start

Add kemono.cr profile URLs to profiles.txt, one per line:

https://kemono.cr/patreon/user/1
https://kemono.cr/patreon/user/2
https://kemono.cr/fanbox/user/3
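A minimal sketch of how a profiles file in this format could be parsed, assuming blank lines and lines starting with "#" are ignored and only https://kemono.cr/<service>/user/<id> URLs are accepted. parseProfiles is a hypothetical helper, not the project's actual parser.

```javascript
// Hypothetical helper: turn the contents of profiles.txt into { service, userId, url } entries.
// Assumptions: blank lines and "#" comment lines are skipped; anything that does not look
// like https://kemono.cr/<service>/user/<id> is reported as invalid instead of being dropped.
const PROFILE_URL = /^https?:\/\/kemono\.cr\/([^/\s]+)\/user\/([^/?#\s]+)/;

function parseProfiles(text) {
  const profiles = [];
  const invalid = [];
  for (const raw of text.split(/\r?\n/)) {
    const line = raw.trim();
    if (line === '' || line.startsWith('#')) continue; // skip blanks and comments
    const match = PROFILE_URL.exec(line);
    if (match) {
      profiles.push({ service: match[1], userId: match[2], url: line });
    } else {
      invalid.push(line);
    }
  }
  return { profiles, invalid };
}

const { profiles, invalid } = parseProfiles(
  'https://kemono.cr/patreon/user/1\n\n# comment\nhttps://kemono.cr/fanbox/user/3\nnot-a-url\n'
);
console.log(profiles.map((p) => `${p.service}:${p.userId}`), invalid);
```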

Run the downloader:

npm start

Or specify a custom profiles file:

node index.js my-profiles.txt

Configuration

The application creates a config.json file on first run with the following default settings:

{
  "download": {
    "maxConcurrentImages": 3,
    "maxConcurrentPosts": 1,
    "delayBetweenImages": 200,
    "delayBetweenPosts": 500,
    "delayBetweenAPIRequests": 500,
    "delayBetweenPages": 1000,
    "retryAttempts": 3,
    "retryDelay": 1000
  },
  "api": {
    "timeout": 45000,
    "userAgent": "Mozilla/5.0 (compatible; kemono-downloader)"
  },
  "storage": {
    "baseDirectory": "download",
    "createSubfolders": true,
    "sanitizeFilenames": true,
    "preserveOriginalNames": true
  },
  "logging": {
    "verboseProgress": true,
    "showSkippedFiles": true,
    "showDetailedErrors": true
  }
}
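If config.json only sets some keys, the rest presumably fall back to these defaults. The following is a sketch of one way to do that with a recursive merge. DEFAULTS mirrors the JSON above; mergeConfig is a hypothetical name, not necessarily how config.js works.

```javascript
// DEFAULTS mirrors the documented default config.json.
const DEFAULTS = {
  download: { maxConcurrentImages: 3, maxConcurrentPosts: 1, delayBetweenImages: 200,
    delayBetweenPosts: 500, delayBetweenAPIRequests: 500, delayBetweenPages: 1000,
    retryAttempts: 3, retryDelay: 1000 },
  api: { timeout: 45000, userAgent: 'Mozilla/5.0 (compatible; kemono-downloader)' },
  storage: { baseDirectory: 'download', createSubfolders: true, sanitizeFilenames: true,
    preserveOriginalNames: true },
  logging: { verboseProgress: true, showSkippedFiles: true, showDetailedErrors: true },
};

function isPlainObject(value) {
  return value !== null && typeof value === 'object' && !Array.isArray(value);
}

// Recursively merge `overrides` into a new object; `base` itself is never mutated,
// and any key missing from `overrides` keeps its default value.
function mergeConfig(base, overrides = {}) {
  const result = { ...base };
  for (const [key, value] of Object.entries(overrides)) {
    result[key] = isPlainObject(value) && isPlainObject(base[key])
      ? mergeConfig(base[key], value)
      : value;
  }
  return result;
}

const config = mergeConfig(DEFAULTS, JSON.parse('{"download": {"maxConcurrentImages": 5}}'));
console.log(config.download.maxConcurrentImages, config.download.retryAttempts); // 5 3
```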

Configuration Options

Download Settings

maxConcurrentImages: Number of images to download simultaneously (1-20)
maxConcurrentPosts: Number of posts to process simultaneously (recommended: 1)
delayBetweenImages: Milliseconds to wait between image downloads
delayBetweenPosts: Milliseconds to wait between post processing
delayBetweenAPIRequests: Milliseconds to wait between API calls
delayBetweenPages: Milliseconds to wait between page requests
retryAttempts: Number of retry attempts for failed downloads
retryDelay: Milliseconds to wait before retrying failed downloads
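A hypothetical validator for these settings. The 1-20 range for maxConcurrentImages comes from this README. The other rules (delays and retry values must be non-negative integers, maxConcurrentPosts at least 1) are assumptions.

```javascript
// Returns a list of human-readable problems; an empty list means the settings look valid.
function validateDownloadSettings(download) {
  const errors = [];
  const isNonNegativeInt = (v) => Number.isInteger(v) && v >= 0;

  const images = download.maxConcurrentImages;
  if (!Number.isInteger(images) || images < 1 || images > 20) {
    errors.push(`maxConcurrentImages must be an integer from 1 to 20 (got ${images})`);
  }
  if (!Number.isInteger(download.maxConcurrentPosts) || download.maxConcurrentPosts < 1) {
    errors.push('maxConcurrentPosts must be a positive integer (1 is recommended)');
  }
  for (const key of ['delayBetweenImages', 'delayBetweenPosts', 'delayBetweenAPIRequests',
    'delayBetweenPages', 'retryAttempts', 'retryDelay']) {
    if (!isNonNegativeInt(download[key])) {
      errors.push(`${key} must be a non-negative integer (got ${download[key]})`);
    }
  }
  return errors;
}

console.log(validateDownloadSettings({
  maxConcurrentImages: 25, maxConcurrentPosts: 1, delayBetweenImages: 200,
  delayBetweenPosts: 500, delayBetweenAPIRequests: 500, delayBetweenPages: 1000,
  retryAttempts: 3, retryDelay: -1,
}));
```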

API Settings

timeout: Request timeout in milliseconds
userAgent: User agent string for HTTP requests

Storage Settings

baseDirectory: Base directory for downloads
createSubfolders: Create subfolders for each user
sanitizeFilenames: Remove invalid characters from filenames
preserveOriginalNames: Keep original image filenames when possible

Logging Settings

verboseProgress: Show detailed progress information
showSkippedFiles: Display messages for skipped files
showDetailedErrors: Show detailed error messages

Directory Structure

Downloaded content is organized as follows:

download/
├── username1/
│   ├── .download-state.json     # Per-profile completion state (v1.7.0)
│   ├── post_id_1/
│   │   ├── post-metadata.json
│   │   ├── post.html (if API fails)
│   │   ├── image1.jpg
│   │   ├── image2.png
│   │   ├── mega_downloads/      # Files from mega.nz links
│   │   ├── google_drive_downloads/  # Files from Google Drive links
│   │   ├── dropbox_downloads/   # Files from Dropbox links
│   │   └── ...
│   └── post_id_2/
│       └── ...
└── username2/
    ├── .download-state.json     # Each profile has its own state file
    └── ...
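The layout above (download/<username>/<post_id>/<file>) could be built as in the sketch below, which also shows what sanitizeFilenames might guard against. The removed character set (Windows-reserved and control characters) and the helper names are assumptions, not the project's exact rules.

```javascript
const path = require('node:path');

// Hypothetical sanitizer: replace reserved and control characters, strip trailing
// dots/spaces (disallowed on Windows), and never return "", "." or "..".
function sanitizeSegment(name) {
  const cleaned = String(name)
    .replace(/[<>:"\/\\|?*\u0000-\u001f]/g, '_')
    .replace(/[. ]+$/, '')
    .trim();
  return cleaned === '' || /^\.+$/.test(cleaned) ? '_' : cleaned;
}

// Hypothetical path builder for <baseDirectory>/<username>/<postId>/<fileName>.
function buildFilePath(baseDirectory, username, postId, fileName) {
  const base = path.resolve(baseDirectory);
  const target = path.resolve(base, sanitizeSegment(username), sanitizeSegment(postId),
    sanitizeSegment(fileName));
  // Defense in depth against directory traversal: the result must stay inside base.
  if (!target.startsWith(base + path.sep)) {
    throw new Error(`Refusing to write outside ${base}: ${target}`);
  }
  return target;
}

console.log(buildFilePath('download', 'username1', 'post_id_1', 'image1.jpg'));
console.log(sanitizeSegment('../../etc/passwd')); // .._.._etc_passwd
```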

Features in Detail

Smart Resume Functionality

Detects previously downloaded posts and skips them automatically
Verifies image file integrity and re-downloads corrupted files
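Resumes interrupted runs by skipping completed posts and files (byte-level resume of partial files via HTTP Range headers is still listed under Suggested Improvements)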
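A sketch of a per-file resume decision that combines skipping with the thumbnail upgrade described under Features. Treating 500KB as 500 * 1024 bytes and treating zero-byte files as corrupt are assumptions. A real implementation would presumably apply "upgrade" only to images that have a full-resolution URL.

```javascript
const fs = require('node:fs');

const THUMBNAIL_THRESHOLD_BYTES = 500 * 1024; // "<500KB" from the Features list (assumed KiB)

// Returns 'download', 'redownload', 'upgrade' or 'skip' for a target file path.
function resumeAction(filePath) {
  let stats;
  try {
    stats = fs.statSync(filePath);
  } catch (err) {
    if (err.code === 'ENOENT') return 'download'; // never downloaded
    throw err;
  }
  if (stats.size === 0) return 'redownload'; // empty file left by an interrupted write
  if (stats.size < THUMBNAIL_THRESHOLD_BYTES) return 'upgrade'; // likely a thumbnail
  return 'skip'; // already present at full size
}
```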

Concurrent Downloads

Downloads multiple images simultaneously for faster processing
Configurable concurrency limits to avoid overwhelming servers
Automatic rate limiting with configurable delays
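The concurrency limit can be sketched as a small worker pool: at most `limit` tasks run at once and results keep the input order. concurrentDownloader.js reportedly uses a semaphore; this worker-pool version is a different technique that bounds concurrency the same way, and mapWithConcurrency is a hypothetical name.

```javascript
// Run `worker` over `items` with at most `limit` calls in flight.
// The first rejected worker rejects the whole call; per-file retries would wrap `worker`.
async function mapWithConcurrency(items, limit, worker) {
  const results = new Array(items.length);
  let next = 0;

  // Each runner keeps claiming the next unprocessed index until none are left.
  async function runner() {
    while (next < items.length) {
      const index = next++;
      results[index] = await worker(items[index], index);
    }
  }

  const runners = Array.from({ length: Math.min(limit, items.length) }, () => runner());
  await Promise.all(runners);
  return results;
}

const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

// Example: two "downloads" at a time, with a small delay inside each task as rate limiting.
mapWithConcurrency(['image1.jpg', 'image2.png', 'image3.gif'], 2, async (name) => {
  await sleep(10);
  return name.toUpperCase();
}).then(console.log);
```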

Error Handling

Automatic retry with exponential backoff for network failures
Graceful handling of timeouts and connection errors
Detailed error logging for troubleshooting
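Retry with exponential backoff can be sketched as below. With baseDelayMs = 5000 the waits are 5s → 10s → 20s, matching the schedule in the Features list. withRetry and its options are illustrative names, not the project's actual API.

```javascript
const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

// Call `task` up to retries + 1 times, doubling the wait after each failure.
async function withRetry(task, { retries = 3, baseDelayMs = 5000, onRetry = () => {} } = {}) {
  for (let attempt = 0; ; attempt++) {
    try {
      return await task(attempt);
    } catch (err) {
      if (attempt >= retries) throw err; // out of retries: surface the last error
      const delayMs = baseDelayMs * 2 ** attempt; // 1x, 2x, 4x, ...
      onRetry(err, attempt + 1, delayMs);
      await sleep(delayMs);
    }
  }
}
```

If retries were driven by the config above, retryAttempts = 3 and retryDelay = 1000 would give waits of 1s → 2s → 4s.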

Data Sources

API Endpoints: Primary method for fetching post data and image URLs
HTML Scraping: Fallback method when API endpoints are unavailable
Multiple Selectors: Uses various CSS selectors to find content across different page layouts
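The "API first, HTML fallback" order can be expressed as trying each source in turn and keeping the first that yields posts. fetchFromApi and scrapeHtml are hypothetical stand-ins for the project's kemonoApi.js and htmlParser.js.

```javascript
// Try each source in order; return the first non-empty result, or throw with every failure reason.
async function firstSuccessful(sources) {
  const failures = [];
  for (const { name, load } of sources) {
    try {
      const posts = await load();
      if (Array.isArray(posts) && posts.length > 0) return { source: name, posts };
      failures.push(`${name}: no posts`);
    } catch (err) {
      failures.push(`${name}: ${err.message}`);
    }
  }
  throw new Error(`All sources failed (${failures.join('; ')})`);
}

// Example: the API call fails, so the HTML scraper's result is used.
const fetchFromApi = async () => { throw new Error('HTTP 403'); };
const scrapeHtml = async () => [{ id: 'post_id_1' }];

firstSuccessful([
  { name: 'api', load: fetchFromApi },
  { name: 'html', load: scrapeHtml },
]).then(({ source, posts }) => console.log(source, posts.length)); // html 1
```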

Troubleshooting

Common Issues

No posts found for profile

Verify the profile URL is correct and accessible
Check if the user has public posts
Some profiles may require different scraping methods

Download failures

Check internet connection
Verify kemono.cr is accessible
Increase retry attempts in configuration
Reduce concurrent downloads if experiencing timeouts

"Cannot call write after a stream was destroyed" error

Fixed in version 1.1.0 (see Changelog)
If it still appears, update to the latest release, which includes the corrected fileUtils.js
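For context, a common Node.js way to avoid this class of error is to let stream.pipeline() connect the source to the file: it propagates errors and tears down both streams together, so nothing writes to an already-destroyed stream. This is a general pattern, not necessarily the exact fix in fileUtils.js. The ".part" temp-file convention is an assumption.

```javascript
const fs = require('node:fs');
const { pipeline } = require('node:stream/promises');

// Write `source` (any readable stream) to `destination` via a temporary ".part" file.
async function saveStream(source, destination) {
  const partial = `${destination}.part`;
  try {
    await pipeline(source, fs.createWriteStream(partial));
    await fs.promises.rename(partial, destination); // expose only complete files
  } catch (err) {
    await fs.promises.rm(partial, { force: true }); // best effort: drop the truncated file
    throw err;
  }
}
```

With axios, a request made with `responseType: 'stream'` exposes `response.data` as a readable stream that could be passed as `source`.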

API failures

The application automatically falls back to HTML scraping
Some content may only be available through specific methods

Debug Information

The application provides extensive logging:

Download progress and statistics
Error messages with detailed context
API vs HTML scraping status
File existence and integrity checks

Performance Tuning

For better performance:

Increase maxConcurrentImages (but not above 5-10)
Decrease delays if server allows
Use SSD storage for faster file operations

For server-friendly downloads:

Decrease maxConcurrentImages to 1-2
Increase delays between requests
Reduce retry attempts

Dependencies

Production

axios: HTTP client for API requests and downloads
cheerio: Server-side jQuery implementation for HTML parsing
fs-extra: Enhanced file system operations
megajs: Mega.nz file and folder download client for anonymous downloads
puppeteer-extra: Browser automation with stealth mode for anti-bot bypass
puppeteer-extra-plugin-stealth: Stealth plugin to avoid detection

Development

jest: Testing framework with comprehensive test suite (391 passing tests)
@jest/globals: Explicit imports of the Jest globals (describe, test, expect, jest)

License

This project is for educational and personal use only. Please respect kemono.cr's terms of service and be mindful of server resources.

Contributing

Fork the repository
Create a feature branch
Make your changes
Test thoroughly
Submit a pull request

Suggested Improvements

Based on code review and analysis, here are prioritized improvements to enhance the project:

High Priority

Testing & Quality ✅ Target: 90%+ Coverage (Current: 77.36%)

391 passing tests across 16 test suites
Excellent test coverage for external downloaders:
- megaDownloader.js (100% statements, 95.06% branches) ✅ - Full coverage with 45 tests including speed/ETA tracking
- googleDriveDownloader.js (98.8% statements, 81.13% branches) ✅ - 41 comprehensive tests for Google Drive downloads
- dropboxDownloader.js (96.9% statements, 90.76% branches) ✅ - 31 comprehensive tests for Dropbox downloads
Good test coverage across core components:
- concurrentDownloader.js (96.77% statements) ✅ - Comprehensive tests for semaphore logic, error handling, and concurrency
- urlUtils.js (100% statements, 98.43% branches) ✅ - Complete URL validation and parsing coverage
- config.js (98.21% statements) ✅ - Configuration management fully tested
- delay.js (100% statements) ✅ - Full coverage
- imageExtractor.js (90% statements) ✅ - Comprehensive media extraction tests
Areas needing improvement:
- fileUtils.js (66.35% statements) - File download edge cases need more tests
- downloadChecker.js (73.72% statements) - Download verification needs more edge case tests
- htmlParser.js (74.5% statements) - HTML parsing edge cases need coverage
- KemonoDownloader.js (67.3% statements) - Integration tests need expansion
- browserClient.js (59.47% statements) - Browser automation edge cases need more tests
- kemonoApi.js (79.71% statements, up from 44.85% in v1.7.0) - API edge cases and error scenarios need coverage
Overall Project Coverage: 77.36% statements, 63.86% branches, 78.67% functions, 78.2% lines
Add integration tests with real API calls using recorded responses
Add E2E tests for complete download scenarios

Error Handling & Resilience

Implement circuit breaker pattern for API calls to prevent cascade failures
Add retry with exponential backoff for transient network errors (partially implemented)
Implement request timeout controls with graceful degradation
Add structured logging with log levels (debug, info, warn, error) to file
Better error messages with actionable suggestions for common failures
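The circuit breaker idea could look roughly like this: after `failureThreshold` consecutive failures the breaker opens and rejects calls immediately for `cooldownMs`, then allows a trial call ("half-open"). The class and option names are hypothetical. For simplicity, concurrent trial calls are not limited while half-open.

```javascript
class CircuitBreaker {
  constructor({ failureThreshold = 5, cooldownMs = 30000, now = Date.now } = {}) {
    this.failureThreshold = failureThreshold;
    this.cooldownMs = cooldownMs;
    this.now = now; // injectable clock, useful for tests
    this.failures = 0;
    this.openedAt = null;
  }

  get state() {
    if (this.openedAt === null) return 'closed';
    return this.now() - this.openedAt >= this.cooldownMs ? 'half-open' : 'open';
  }

  async call(task) {
    if (this.state === 'open') {
      throw new Error('Circuit open: skipping request to avoid cascading failures');
    }
    try {
      const result = await task();
      this.failures = 0;
      this.openedAt = null; // a success closes the circuit
      return result;
    } catch (err) {
      this.failures += 1;
      // A failed trial call re-opens the breaker; otherwise open once the threshold is hit.
      if (this.state === 'half-open' || this.failures >= this.failureThreshold) {
        this.openedAt = this.now();
      }
      throw err;
    }
  }
}
```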

Performance Optimization

Implement HTTP connection pooling to reuse connections (currently creates new connections)
Add response caching for API calls to reduce redundant requests
Use streaming downloads for very large files to reduce memory usage
Add checksum verification (MD5/SHA256) to detect corrupted downloads
Implement download resume from partial files using HTTP Range headers
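Two of these items can be sketched with Node's built-in modules: hashing a file as a stream (so large files are not loaded into memory) and building a Range header from a partial file's size. Neither function exists in the project today; the names are hypothetical.

```javascript
const crypto = require('node:crypto');
const fs = require('node:fs');

// Stream a file through SHA-256 and resolve with the hex digest.
function sha256File(filePath) {
  return new Promise((resolve, reject) => {
    const hash = crypto.createHash('sha256');
    fs.createReadStream(filePath)
      .on('error', reject)
      .on('data', (chunk) => hash.update(chunk))
      .on('end', () => resolve(hash.digest('hex')));
  });
}

// Headers for resuming a partial file, or {} when there is nothing to resume.
// The server must answer 206 Partial Content; a 200 means it ignored the range
// and the file must be rewritten from the start.
function resumeHeaders(partialPath) {
  try {
    const { size } = fs.statSync(partialPath);
    return size > 0 ? { Range: `bytes=${size}-` } : {};
  } catch (err) {
    if (err.code === 'ENOENT') return {};
    throw err;
  }
}
```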

Medium Priority

Code Quality & Maintainability

Refactor large functions: Break down downloadPost() (108 lines) into smaller, testable units
Add JSDoc annotations for better IDE support and type safety
Extract magic numbers to named constants (delays, timeouts, retry attempts)
Use ES6 modules instead of CommonJS for modern JavaScript features
Implement dependency injection pattern for easier testing and mocking

User Experience Enhancements

CLI argument parsing using commander.js or yargs:

kemono-downloader --profile "url" --output "./downloads" --concurrent 5

Interactive mode for profile selection and configuration
Better progress indicators:
- Real-time download speed (KB/s, MB/s)
- ETA for remaining downloads
- Individual file progress bars
Dry-run mode to preview what would be downloaded without actually downloading
Filtering options: Download only specific date ranges, file types, or post IDs
Download history export to CSV/JSON for tracking and analysis
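As a dependency-free alternative to commander.js or yargs, the proposed flags could be parsed with Node's built-in util.parseArgs (the `default` option needs Node.js 18.11 or later). None of these flags exist yet; the option set, including --dry-run and the './download' default, is an assumption based on the list above.

```javascript
const { parseArgs } = require('node:util');

function parseCli(argv) {
  const { values, positionals } = parseArgs({
    args: argv,
    allowPositionals: true, // keeps `node index.js my-profiles.txt` working
    options: {
      profile: { type: 'string', multiple: true },
      output: { type: 'string', short: 'o', default: './download' },
      concurrent: { type: 'string', default: '3' },
      'dry-run': { type: 'boolean', default: false },
      help: { type: 'boolean', short: 'h', default: false },
      version: { type: 'boolean', default: false },
    },
  });
  const concurrent = Number(values.concurrent);
  if (!Number.isInteger(concurrent) || concurrent < 1) {
    throw new Error(`--concurrent must be a positive integer (got ${values.concurrent})`);
  }
  return { ...values, concurrent, profilesFile: positionals[0] };
}

console.log(parseCli(['--profile', 'https://kemono.cr/patreon/user/1', '--output', './downloads', '--concurrent', '5']));
```

Unknown flags make parseArgs throw by default (strict mode), which gives basic input validation for free.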

Security Enhancements

Content-type verification: Ensure downloaded files match expected MIME types
File size limits: Prevent downloading unexpectedly large files
Enhanced path sanitization: Additional checks against directory traversal attacks
Rate limiting feedback: Detect and handle 429 (Too Many Requests) responses
Suspicious file detection: Warn about unexpected file types or sizes
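Two of these checks are small enough to sketch: converting a Retry-After header (delta-seconds or HTTP-date, as defined in RFC 9110) into a wait time for 429 responses, and rejecting a download whose declared Content-Length exceeds a cap. The function names and the 60s fallback are assumptions.

```javascript
// Wait time in ms for a Retry-After value; `fallbackMs` when it is missing or unparseable.
function retryAfterMs(headerValue, { fallbackMs = 60000, now = Date.now() } = {}) {
  if (headerValue == null || headerValue === '') return fallbackMs;
  const trimmed = String(headerValue).trim();
  if (/^\d+$/.test(trimmed)) return Number(trimmed) * 1000; // delta-seconds form
  const date = Date.parse(trimmed); // HTTP-date form
  return Number.isNaN(date) ? fallbackMs : Math.max(0, date - now);
}

// True when the server-declared size is known and larger than maxBytes (a hypothetical setting).
function exceedsSizeLimit(contentLength, maxBytes) {
  const size = Number(contentLength);
  return Number.isFinite(size) && size > maxBytes;
}
```

Node lowercases incoming header names, so with axios the value would be read from `error.response.headers['retry-after']`.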

Low Priority

Advanced Features

Database storage (SQLite) for download tracking instead of filesystem checks
- Faster lookups for "already downloaded" checks
- Query download history
- Track download statistics over time
Download scheduling: Set time windows for downloads
Cloud storage support: Direct upload to S3, Google Cloud Storage, etc.
Duplicate detection: Find and remove duplicate images across posts using perceptual hashing
Web UI dashboard: Browser-based interface for monitoring and control
Docker packaging: An official Dockerfile/image for easy deployment and isolation (state handling already supports Docker setups as of v1.7.0)
Webhook notifications: Send alerts to Discord, Telegram, Slack on completion or errors
Multi-language support: Internationalization (i18n) for global users

Developer Experience

Pre-commit hooks: Automated linting and testing before commits
Continuous Integration: GitHub Actions for automated testing
Code coverage badges: Display coverage metrics in README
API documentation: Generate docs from JSDoc comments
Architecture diagrams: Visual representation of system components
Examples directory: Sample configurations and use cases
Contribution guidelines: CONTRIBUTING.md with development workflow

Quick Wins (Easy Improvements)

These can be implemented quickly with high impact:

Add .nvmrc file for Node.js version consistency
Add .editorconfig for consistent code formatting across editors
Add ESLint configuration for code quality enforcement
Add Prettier for automatic code formatting
Extract configuration to environment variables (.env file support)
Add --version and --help flags to CLI
Add download summary export (save stats to JSON file)
Add --validate-config command to check config.json syntax
Add bandwidth throttling option to limit download speed
Add retry queue visualization showing what's being retried

Technical Debt

Items that should be addressed to improve long-term maintainability:

Replace console.log with proper logger (winston, pino, or bunyan)
Implement proper event emitters for progress tracking instead of callbacks
Standardize error types with custom error classes
Remove hardcoded kemono.cr references to support other similar sites
Separate concerns: Split KemonoDownloader into smaller, focused classes
Add graceful shutdown handling for SIGINT/SIGTERM signals
Memory profiling: Identify and fix memory leaks in long-running downloads
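Graceful SIGINT/SIGTERM handling could follow this shape: the first signal aborts an AbortController that the download loop checks between files, and a second signal exits immediately. installShutdownHandler and downloadAll are hypothetical helpers, not existing project code.

```javascript
function installShutdownHandler({ onForceExit = () => process.exit(130) } = {}) {
  const controller = new AbortController();
  const handler = (signal) => {
    if (controller.signal.aborted) {
      onForceExit(signal); // second signal: stop right away
      return;
    }
    console.log(`Received ${signal}; finishing the current file, then stopping...`);
    controller.abort();
  };
  process.on('SIGINT', handler);
  process.on('SIGTERM', handler);
  const dispose = () => {
    process.off('SIGINT', handler);
    process.off('SIGTERM', handler);
  };
  return { signal: controller.signal, dispose };
}

// Check the signal between files so no file is abandoned half-written.
async function downloadAll(files, downloadOne, signal) {
  const done = [];
  for (const file of files) {
    if (signal.aborted) break;
    await downloadOne(file);
    done.push(file);
  }
  return done;
}
```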

Metrics & Monitoring

Add observability to understand system behavior:

Download statistics dashboard: Success rate, average speed, error types
Performance metrics: Response times, queue depths, memory usage
Health checks: Endpoint to verify system status
Alert thresholds: Notify when error rate exceeds acceptable levels

Changelog

Version 1.7.0 (Latest)

Per-Profile State Files: Docker-optimized state management stored in download folders
- State stored as .download-state.json in each profile's download folder
- Perfect for Docker containers where download volume is persistent but profiles.txt may be read-only
- Automatically skips completed profiles on subsequent runs
- Tracks completion status, timestamps, post/image counts, and errors per profile
- No modification of profiles.txt required
- Easy reset: delete .download-state.json file from profile folder
- Works seamlessly with Docker + NAS storage setups (e.g., Synology)
Version Display: Shows application version on startup for easy Docker verification
- Displays version banner from package.json
- Helps verify correct deployment in containerized environments
Test Suite Expansion: 391 passing tests with 77.36% overall coverage (improved from 75.01%)
- Added 27 comprehensive tests for profile file management (97.64% coverage)
- Improved kemonoApi.js coverage from 44.85% to 79.71%
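A sketch of how a per-profile .download-state.json could be read and written. The field names (completed, completedAt, postsDownloaded, imagesDownloaded, errors) are assumptions; the project's actual schema may differ. The write goes through a temp file and a rename so a crash does not leave half-written JSON, and a run with errors is recorded as not completed so the next run retries it.

```javascript
const fs = require('node:fs');
const path = require('node:path');

const STATE_FILE = '.download-state.json';

function readProfileState(profileDir) {
  try {
    return JSON.parse(fs.readFileSync(path.join(profileDir, STATE_FILE), 'utf8'));
  } catch (err) {
    if (err.code === 'ENOENT') return null; // no state yet, or deleted to reset the profile
    throw err;
  }
}

function writeProfileState(profileDir, state) {
  fs.mkdirSync(profileDir, { recursive: true });
  const target = path.join(profileDir, STATE_FILE);
  const temp = `${target}.tmp`;
  fs.writeFileSync(temp, JSON.stringify(state, null, 2));
  fs.renameSync(temp, target); // replace the old state in one step
}

function recordProfileRun(profileDir, { posts, images, errors = [] }) {
  writeProfileState(profileDir, {
    completed: errors.length === 0,
    completedAt: new Date().toISOString(),
    postsDownloaded: posts,
    imagesDownloaded: images,
    errors,
  });
}

const isProfileComplete = (profileDir) => readProfileState(profileDir)?.completed === true;
```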

Version 1.6.0

Download State Management Tools: Added utilities to manage and rebuild download state
- New rebuild-state command to scan existing downloads and create state file
- New check-state command to view current download state statistics
- Automatically marks completed profiles to skip re-verification on subsequent runs
- Critical performance improvement for large profile collections (450+ profiles)
- Persistent state tracking across Docker container restarts
- State file can be mounted as volume for Docker deployments
- Note: Superseded by per-profile state files in v1.7.0 for better Docker compatibility
State Tracking Enhancement: Improved existing download state tracking with utility scripts
- Solves slow startup times caused by re-verifying all previously downloaded posts
- Enables quick resume for interrupted downloads
- Comprehensive profile completion tracking

Version 1.4.0

Dropbox Download Support: Automatically detects and downloads public files from Dropbox share links
- Supports all Dropbox share URL formats (s/, scl/fi/, dropboxusercontent.com)
- Automatic dl=0 to dl=1 conversion for direct downloads
- Gracefully skips folder URLs with informative messages
- 96.9% test coverage with 31 comprehensive tests
- Progress tracking and exponential backoff retry logic
Google Drive Download Support: Automatically detects and downloads public files from Google Drive links
- Supports drive.google.com file URLs and Google Docs/Sheets/Slides
- Gracefully skips folders (requires API key for folder downloads)
- 98.8% test coverage with 41 comprehensive tests
- Exponential backoff retry logic and progress tracking
Mega.nz Progress Enhancement: Added download speed and ETA tracking
- Real-time speed calculation (MB/s)
- Smart ETA formatting (seconds, minutes, hours)
- Enhanced progress display matching modern download managers
Test Suite Expansion: 334 passing tests across 14 test suites, 75.01% overall coverage
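The URL handling described for Dropbox and Google Drive can be sketched with the WHATWG URL API. toDropboxDirectUrl follows the documented dl=0 → dl=1 conversion; the folder patterns (/sh/ and /scl/fo/ on Dropbox, /folders/ on Drive) and the Drive file-ID extraction are assumptions about common link shapes, not the project's code.

```javascript
// Returns a direct-download URL for a Dropbox file link, or null for non-Dropbox or folder links.
function toDropboxDirectUrl(shareUrl) {
  const url = new URL(shareUrl);
  if (!/(^|\.)dropbox(usercontent)?\.com$/.test(url.hostname)) return null;
  if (/^\/(sh|scl\/fo)\//.test(url.pathname)) return null; // folder links are skipped
  url.searchParams.set('dl', '1'); // dl=0 shows a preview page; dl=1 requests the file
  return url.toString();
}

// Returns the file ID from a Google Drive/Docs link, or null for folders and other hosts.
function googleDriveFileId(link) {
  const url = new URL(link);
  if (!/(^|\.)(drive|docs)\.google\.com$/.test(url.hostname)) return null;
  if (url.pathname.includes('/folders/')) return null; // folders need an API key
  const fromPath = url.pathname.match(/\/d\/([\w-]+)/); // /file/d/<id>/view, /document/d/<id>/edit
  return fromPath ? fromPath[1] : url.searchParams.get('id'); // open?id=<id>
}
```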

Version 1.3.0

Google Drive Download Support: Initial implementation

Version 1.2.0

Thumbnail Upgrade System: Automatically detects and upgrades small files (<500KB) to full resolution
Thumbnail Fallback: Downloads full resolution first, falls back to thumbnail on 404 errors
Browser Automation: Integrated Puppeteer with stealth mode for anti-bot bypass
Anti-Bot Detection: Proper HTTP headers (Referer, Origin, Sec-Fetch-*) to bypass 403 errors
Enhanced HTML Parser: Comprehensive HTML parsing with 4 fallback strategies and 100% test coverage
Exponential Backoff: Retry logic with 5s → 10s → 20s delays for failed requests
Test Coverage Improvements: 218 passing tests, improved coverage from 62% to 79.83%
- Added comprehensive tests for htmlParser.js (19 new tests)
- Improved test coverage for concurrentDownloader.js (98.64%)
- Fixed all failing tests and import path issues

Version 1.1.0

Fixed stream destruction error in download handling
Improved error handling and recovery
Enhanced concurrent download management
Better progress tracking and logging
Domain migration from kemono.party to kemono.cr

Version 1.0.0

Initial public release
Bulk profile processing
Concurrent downloads with configurable limits
Retry logic and error handling
API endpoints with HTML fallback
