A modern, high-performance key-value database written in Rust with support for both local and S3 storage backends. Panda combines the speed of in-memory operations with the durability of persistent storage using Parquet format and write-ahead logging.
- High-Performance: In-memory operations with persistent storage
- Dual Storage Backends: Local filesystem and AWS S3 support
- Parquet Integration: Efficient columnar storage format for analytics
- Write-Ahead Logging (WAL): Immediate durability with crash recovery
- Network Server: TCP-based client-server architecture
- Interactive Client: Command-line interface for database operations
- Automatic Snapshots: Periodic Parquet snapshots for data persistence
- Crash Recovery: Automatic recovery from WAL and Parquet files
- Storage Statistics: Real-time monitoring of storage usage
- Backward Compatibility: Support for legacy JSON snapshot format
- Cloud Storage: Store data in AWS S3 buckets
- Region Support: Configurable AWS regions
- Automatic Failover: Fallback to local storage if S3 is unavailable
- Cost Optimization: Efficient S3 operations with minimal API calls
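The failover behavior above can be sketched as a simple backend-selection step. `StorageBackend` and `select_backend` are illustrative names invented for this sketch, not Panda's actual API:

```rust
// Illustrative sketch of S3-to-local fallback; types and names are
// hypothetical, not Panda's real implementation.
#[derive(Debug, PartialEq)]
enum StorageBackend {
    S3 { bucket: String },
    Local { dir: String },
}

/// Pick S3 when it is configured and reachable; otherwise fall back
/// to the local data directory.
fn select_backend(s3_bucket: Option<&str>, s3_reachable: bool, local_dir: &str) -> StorageBackend {
    match s3_bucket {
        Some(bucket) if s3_reachable => StorageBackend::S3 { bucket: bucket.to_string() },
        _ => StorageBackend::Local { dir: local_dir.to_string() },
    }
}

fn main() {
    // S3 configured but unreachable: fall back to local storage.
    let backend = select_backend(Some("my-bucket"), false, "./data");
    println!("{:?}", backend);
}
```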
- Rust: 1.70 or newer
- Memory: 512MB+ RAM (depending on dataset size)
- Storage: 1GB+ free space for local storage
- AWS Account: For S3 storage backend
- IAM Permissions: S3 read/write access
- Environment Variables: AWS credentials configured
```bash
# Install Rust (if not already installed)
curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs | sh
source ~/.cargo/env

# Verify installation
rustc --version
cargo --version
```

```bash
# Clone the repository
git clone <repository-url>
cd panda

# Build the project
cargo build --release

# The binaries will be in target/release/:
# - server: Database server
# - client: Interactive client
```

```bash
# Start with default local storage
./target/release/server

# Or with a custom data directory
./target/release/server --data-dir /path/to/data
```

```bash
# Connect to the server
./target/release/client

# Or connect to a specific host/port
./target/release/client --host 127.0.0.1 --port 8080
```

```
# Store a key-value pair
PUT greeting "Hello, World!"

# Retrieve a value
GET greeting

# List all keys
LIST

# Delete a key
DELETE greeting

# Get database statistics
STATS

# Create a manual snapshot
SNAPSHOT

# Show help
HELP

# Exit
QUIT
```

The server automatically creates a `./data` directory for local storage:

- `kv_database.parquet`: Main data storage in Parquet format
- `kv_database.wal`: Write-ahead log for immediate durability
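The core of the command set above maps onto an in-memory map. A minimal sketch of that layer (illustrative only; Panda additionally logs every write to the WAL before applying it):

```rust
use std::collections::HashMap;

/// Minimal in-memory key-value layer mirroring PUT/GET/DELETE/LIST.
/// Illustrative only, not Panda's actual implementation.
struct Store {
    map: HashMap<String, String>,
}

impl Store {
    fn new() -> Self {
        Store { map: HashMap::new() }
    }
    fn put(&mut self, key: &str, value: &str) {
        self.map.insert(key.to_string(), value.to_string());
    }
    fn get(&self, key: &str) -> Option<&String> {
        self.map.get(key)
    }
    fn delete(&mut self, key: &str) -> bool {
        self.map.remove(key).is_some()
    }
    fn list(&self) -> Vec<&String> {
        self.map.keys().collect()
    }
}

fn main() {
    let mut store = Store::new();
    store.put("greeting", "Hello, World!");
    println!("{:?}", store.get("greeting")); // Some("Hello, World!")
    println!("{} key(s)", store.list().len()); // 1 key(s)
    store.delete("greeting");
    println!("{:?}", store.get("greeting")); // None
}
```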
To use S3 storage, set the following environment variables:

```bash
export AWS_ACCESS_KEY_ID=your_access_key
export AWS_SECRET_ACCESS_KEY=your_secret_key
export AWS_DEFAULT_REGION=us-east-1
export PANDA_S3_BUCKET=your-bucket-name
export PANDA_USE_S3=true
```

| Command | Description | Example |
|---|---|---|
| `GET <key>` | Retrieve value for key | `GET username` |
| `PUT <key> <value>` | Store key-value pair | `PUT username "john_doe"` |
| `DELETE <key>` | Remove key and value | `DELETE username` |
| `LIST` | Show all keys | `LIST` |
| `SIZE` | Show number of keys | `SIZE` |
| Command | Description | Example |
|---|---|---|
| `SNAPSHOT` | Create manual Parquet snapshot | `SNAPSHOT` |
| `STATS` | Show storage statistics | `STATS` |
| `HELP` | Show command help | `HELP` |
| `QUIT` | Exit client | `QUIT` |
The STATS command shows:
- Storage Type: Local or S3 (with bucket info)
- Parquet Size: Size of main data file in KB
- WAL Size: Size of write-ahead log in KB
- In-Memory Keys: Number of keys currently in memory
- Pending Operations: Operations waiting for next snapshot
Example output:

```
Storage: Local (./data), Parquet: 45KB, WAL: 2KB, In-memory: 150 keys, Pending: 12 ops
```
| Variable | Description | Default |
|---|---|---|
| `PANDA_USE_S3` | Enable S3 storage backend | `false` |
| `PANDA_S3_BUCKET` | S3 bucket name | None |
| `PANDA_S3_REGION` | AWS region | `us-east-1` |
| `PANDA_DATA_DIR` | Local data directory | `./data` |
| `PANDA_SNAPSHOT_INTERVAL` | Snapshot interval (operations) | `50` |
| `PANDA_SNAPSHOT_TIMEOUT` | Snapshot timeout (seconds) | `180` |
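Configuration like the table above can be read with `std::env`; `Config` here is a hypothetical struct whose defaults mirror the documented ones, not Panda's actual code:

```rust
use std::env;

/// Hypothetical configuration struct assembled from the environment
/// variables in the table above; defaults mirror the documented ones.
#[derive(Debug, PartialEq)]
struct Config {
    use_s3: bool,
    s3_bucket: Option<String>,
    s3_region: String,
    data_dir: String,
    snapshot_interval: u64,
}

impl Config {
    fn from_env() -> Self {
        Config {
            use_s3: env::var("PANDA_USE_S3").map(|v| v == "true").unwrap_or(false),
            s3_bucket: env::var("PANDA_S3_BUCKET").ok(),
            s3_region: env::var("PANDA_S3_REGION").unwrap_or_else(|_| "us-east-1".into()),
            data_dir: env::var("PANDA_DATA_DIR").unwrap_or_else(|_| "./data".into()),
            snapshot_interval: env::var("PANDA_SNAPSHOT_INTERVAL")
                .ok()
                .and_then(|v| v.parse().ok())
                .unwrap_or(50),
        }
    }
}

fn main() {
    println!("{:?}", Config::from_env());
}
```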
```bash
# Start server with custom port
./target/release/server --port 9090

# Start with a specific data directory
./target/release/server --data-dir /var/lib/panda

# Start with S3 backend
./target/release/server --s3-bucket my-database-bucket --s3-region us-west-2
```

- In-Memory Storage: Fast HashMap for active data
- Persistent Storage: Parquet files for data durability
- Write-Ahead Log: Immediate durability for operations
- Local Storage: File-based storage with directory management
- S3 Storage: Cloud storage with async operations
- Unified Interface: Common API for both backends
- TCP Server: Multi-threaded server handling concurrent clients
- Command Parser: Robust command parsing with error handling
- Response Format: Structured responses for easy client integration
- Command History: Up/down arrow navigation
- Auto-completion: Tab completion for commands
- Error Handling: Clear error messages and suggestions
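The command parsing described above can be sketched as a small tokenizer for the text protocol. `Command` and `parse_command` are illustrative names, not Panda's actual parser:

```rust
/// Illustrative parser for the text protocol; not Panda's real code.
/// Quoted values keep their inner spaces.
#[derive(Debug, PartialEq)]
enum Command {
    Put { key: String, value: String },
    Get { key: String },
    Delete { key: String },
    List,
}

fn parse_command(line: &str) -> Result<Command, String> {
    let line = line.trim();
    let (verb, rest) = match line.split_once(' ') {
        Some((v, r)) => (v, r.trim()),
        None => (line, ""),
    };
    match verb.to_ascii_uppercase().as_str() {
        "PUT" => {
            let (key, value) = rest
                .split_once(' ')
                .ok_or_else(|| "PUT requires a key and a value".to_string())?;
            // Strip optional surrounding quotes from the value.
            let value = value.trim().trim_matches('"');
            Ok(Command::Put { key: key.to_string(), value: value.to_string() })
        }
        "GET" => Ok(Command::Get { key: rest.to_string() }),
        "DELETE" => Ok(Command::Delete { key: rest.to_string() }),
        "LIST" => Ok(Command::List),
        other => Err(format!("unknown command: {other}")),
    }
}

fn main() {
    println!("{:?}", parse_command("PUT greeting \"Hello, World!\""));
}
```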
- Write Operations: Client → Server → WAL → In-Memory → Pending Operations → Parquet Snapshot
- Read Operations: Client → Server → In-Memory Cache → Response
- Recovery Process: Startup → Load Parquet → Apply WAL → Ready for Operations
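A write-ahead log in this style can be sketched as an append-only file replayed on startup. The tab-separated line format below is invented purely for illustration; Panda's real WAL format may differ:

```rust
use std::collections::HashMap;
use std::fs::{File, OpenOptions};
use std::io::{BufRead, BufReader, Write};
use std::path::Path;

/// Append one operation to the log. A tab-separated line format is
/// assumed here purely for illustration.
fn wal_append(path: &Path, op: &str, key: &str, value: &str) -> std::io::Result<()> {
    let mut f = OpenOptions::new().create(true).append(true).open(path)?;
    writeln!(f, "{}\t{}\t{}", op, key, value)?;
    f.sync_all() // flush to disk for immediate durability
}

/// Rebuild the in-memory map by replaying the log from the start.
fn wal_replay(path: &Path) -> std::io::Result<HashMap<String, String>> {
    let mut map = HashMap::new();
    for line in BufReader::new(File::open(path)?).lines() {
        let line = line?;
        let mut parts = line.splitn(3, '\t');
        match (parts.next(), parts.next(), parts.next()) {
            (Some("PUT"), Some(k), Some(v)) => {
                map.insert(k.to_string(), v.to_string());
            }
            (Some("DELETE"), Some(k), _) => {
                map.remove(k);
            }
            _ => {} // skip malformed lines
        }
    }
    Ok(map)
}

fn main() -> std::io::Result<()> {
    let path = std::env::temp_dir().join("panda_wal_demo.log");
    let _ = std::fs::remove_file(&path);
    wal_append(&path, "PUT", "greeting", "hello")?;
    wal_append(&path, "DELETE", "greeting", "")?;
    println!("{:?}", wal_replay(&path)?); // {}
    Ok(())
}
```

Replaying the log after a crash restores exactly the operations that were acknowledged, which is why every write is flushed before the server responds.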
- Write Throughput: ~10,000 ops/sec (local storage)
- Read Throughput: ~50,000 ops/sec (in-memory)
- Snapshot Creation: ~1,000 records/sec
- Recovery Time: <1 second for typical datasets
- Base Memory: ~10MB for server process
- Per Key: ~100 bytes overhead
- WAL Buffer: Configurable, typically 1-10MB
- Parquet Compression: ~70% space savings
- WAL Growth: Linear with operation count
- Snapshot Frequency: Configurable (default: 50 operations)
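The snapshot policy (every N pending operations, or after a timeout) can be expressed as a small predicate. The name `should_snapshot` and the exact rule are illustrative; Panda's real trigger logic may weigh additional factors:

```rust
use std::time::Duration;

/// Decide whether to take a Parquet snapshot. Illustrative only.
fn should_snapshot(pending_ops: u64, elapsed: Duration, interval: u64, timeout: Duration) -> bool {
    // Snapshot when enough operations have accumulated, or when any
    // pending work has been waiting longer than the timeout.
    pending_ops >= interval || (pending_ops > 0 && elapsed >= timeout)
}

fn main() {
    // 50 pending ops reaches the default interval: snapshot now.
    println!("{}", should_snapshot(50, Duration::from_secs(10), 50, Duration::from_secs(180)));
}
```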
- Write-Ahead Logging: Every operation is immediately persisted
- Periodic Snapshots: Automatic Parquet snapshots
- Crash Recovery: Automatic recovery from WAL and Parquet files
- Graceful Degradation: Fallback to local storage if S3 unavailable
- Corruption Detection: Validation of Parquet and WAL files
- Error Reporting: Detailed error messages for troubleshooting
- Health Checks: Built-in health check endpoints
- Metrics: Real-time storage and performance metrics
- Logging: Comprehensive logging for debugging
```bash
# Clone repository
git clone <repository-url>
cd panda

# Install dependencies
cargo build

# Run tests
cargo test

# Run with debug logging
RUST_LOG=debug cargo run --bin server
```

```
panda/
├── src/
│   ├── main.rs            # Server implementation
│   ├── client.rs          # Interactive client
│   └── lib/
│       └── wal_storage.rs # WAL implementation
├── data/                  # Default data directory
├── Cargo.toml             # Dependencies and build config
└── README.md              # This file
```
- arrow: Apache Arrow for Parquet support
- parquet: Parquet file format handling
- aws-sdk-s3: AWS S3 integration
- tokio: Async runtime for S3 operations
- serde: Serialization for WAL and configuration
- ctrlc: Graceful shutdown handling
- Fork the repository
- Create a feature branch
- Make your changes
- Add tests for new functionality
- Run the test suite
- Submit a pull request
- Follow Rust conventions
- Use meaningful variable names
- Add comments for complex logic
- Include error handling for all operations
This project is licensed under the MIT License - see the LICENSE file for details.
```bash
# Check if port is in use
netstat -an | grep 8080

# Check permissions for data directory
ls -la ./data/
```

```bash
# Verify AWS credentials
aws sts get-caller-identity

# Check S3 bucket permissions
aws s3 ls s3://your-bucket-name/
```

```bash
# Monitor memory usage
top -p $(pgrep server)

# Check disk space
df -h ./data/
```

- Issues: Create an issue on GitHub
- Documentation: Check this README and inline code comments
- Logs: Enable debug logging with `RUST_LOG=debug`
- Redis Protocol: Redis-compatible client interface
- HTTP API: RESTful API for web applications
- Replication: Multi-node replication support
- Encryption: At-rest and in-transit encryption
- Monitoring: Prometheus metrics integration
- Backup/Restore: Automated backup strategies
- Connection Pooling: Optimize S3 connection reuse
- Compression: Additional compression algorithms
- Indexing: Secondary indexes for complex queries
- Caching: Multi-level caching strategies
Panda - Fast, reliable, and scalable key-value storage for modern applications.