Building production-grade distributed systems with automated AWS deployments, achieving sub-1ms response times at 1K+ concurrent users
Actively seeking: Remote backend engineering positions
Specialization: Python backend + DevOps automation + Distributed systems
Location: Dhaka, Bangladesh (Open to worldwide remote)
Ask me about: FastAPI, System Design, AWS Infrastructure
I don't just write backend code; I architect complete production systems with full automation from infrastructure to deployment:
- Infrastructure as Code Expert - Automated AWS deployments managing 11+ EC2 instances with Pulumi & Ansible
- Performance Engineering - Optimized systems achieving sub-1ms response times with 1K+ concurrent users
- Polyglot Architecture - Go for performance-critical paths, Python for business logic
- DevOps Automation - Zero-touch deployments with CI/CD, containerization, and orchestration
- Distributed Systems - Built fault-tolerant architectures with auto-scaling, load balancing, and high availability
- Technical Writing - Published articles explaining complex architectures in simple words
6+ Production-Ready Applications Built
Sub-1ms API Response Times Achieved
11+ AWS EC2 Instances Automated
1K+ Concurrent Users Supported
Container Orchestration Systems Designed
500+ DSA Problems Solved
200K+ Technical Blog Readers
40+ Educational Videos Created
High Complexity - Polyglot Microservices + Production Frontend + Full Observability
Production-grade URL shortener with sub-1ms Go redirect service, React 18 frontend with two-layer navigation, complete OpenTelemetry observability stack, and cloud-native deployment on Render + Cloudflare Pages
- Architected polyglot microservices with Python FastAPI for create_service, high-performance Go (Chi) for redirect_service, and Celery worker_service, each independently scalable via docker-compose-decoupled.yml.
- Achieved sub-1ms Go redirect latency (vs 5-7ms Python) with clean internal/ package architecture, three-state circuit breaker (Closed → Open → Half-Open), and single-binary deployment.
- Built the "Snipl" frontend as a separate link-love repo consumed as a git submodule: Vite + React 18 + TypeScript, deployed to Cloudflare Pages, served as an installable PWA with offline-capable service worker and web manifest.
- Designed a two-layer navigation model for in-page UX: React Router 6 owns shareable URLs (/r/:shortKey, /_health, 404) while a ViewContext (activeView + footerPanel) drives glide transitions between Index, Dashboard, MultiLink, and Custom Links, with no URL change, no flash of empty state, and no back-button mismatch.
- Built direction-aware GlideView with Framer Motion (ease-out-quint [0.22, 1, 0.36, 1], ~60px X offset) and a modal-style FooterPanelView for legal/feedback content with overscroll-contain scroll isolation, aria-modal focus management, and ESC-to-close.
- Established a lock-step Layout invariant (h-screen overflow-hidden + fixed Navbar/Footer chrome) so manual scroll stays trapped inside the active view, which is required for panel scroll isolation to work.
- Respected prefers-reduced-motion throughout with a media-query-aware scrollToId helper and per-call behavior fallbacks; all animations have a no-motion code path.
- Implemented Redis sliding window rate limiter for Python FastAPI services using Lua script atomic operations, dual-layer architecture (Nginx 30r/m + App 10r/m), and IP+UA hash client identification with dedicated Redis DB isolation.
- Implemented cache-aside pattern with Redis (30-minute TTL) + MongoDB fallback, optimizing for 95%+ cache hit rate and automatic expiration handling.
- Deployed complete observability stack with OpenTelemetry collector, Tempo (distributed tracing), Loki (log aggregation via Promtail), and Grafana dashboards for end-to-end service visibility.
- Engineered production-grade resilience with PgBouncer connection pooling (53% reduction in overhead), atomic PostgreSQL key acquisition using SELECT FOR UPDATE SKIP LOCKED, and exponential backoff retries.
- Implemented intelligent key pre-population using Celery workers maintaining a pool of unused keys for instant URL creation without database latency, with hybrid strategy auto-selecting optimal insertion method.
- Built comprehensive testing infrastructure with multi-database mocking (SQLite, mongomock, fakeredis), async pytest framework, httpx API client testing, and isolated test environments.
- Deployed to Render (backend) + Cloudflare Pages (frontend) with branch-based preview environments, env-var driven BASE_URL/CORS configuration, and zero-touch CI/CD via GitHub Actions.
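The sliding window rate limiting and IP+UA client identification mentioned above can be illustrated with a small in-memory sketch. This is not the Redis + Lua implementation from the project; it only shows the sliding-window log idea in plain Python. The names `SlidingWindowLimiter`, `client_key`, and the injectable `clock` parameter are hypothetical and exist only for this example.

```python
import hashlib
import time
from collections import defaultdict, deque
from typing import Callable, Deque, Dict


def client_key(ip: str, user_agent: str) -> str:
    """Hypothetical helper: derive a stable client id from IP + User-Agent."""
    return hashlib.sha256(f"{ip}|{user_agent}".encode()).hexdigest()


class SlidingWindowLimiter:
    """In-memory sliding-window log limiter (illustrative, single process only)."""

    def __init__(self, limit: int, window_s: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.limit = limit
        self.window_s = window_s
        self.clock = clock
        self._hits: Dict[str, Deque[float]] = defaultdict(deque)

    def allow(self, client_id: str) -> bool:
        now = self.clock()
        hits = self._hits[client_id]
        # Drop timestamps that have slid out of the window.
        while hits and hits[0] <= now - self.window_s:
            hits.popleft()
        if len(hits) >= self.limit:
            return False
        hits.append(now)
        return True
```

A limiter built as `SlidingWindowLimiter(limit=10, window_s=60)` would correspond to the 10 requests per minute application layer. In a multi-process deployment the same check has to run atomically in a shared store, which is the role the Lua script plays in the Redis version.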
Technical Deep Dive: Read my Medium articles
Tech Stack: Go, Chi Router, FastAPI, Celery, Redis, PostgreSQL, MongoDB, Nginx, Docker, React 18, Vite, TypeScript, Framer Motion, TanStack Query, Zod, shadcn/ui, Tailwind, PWA, Cloudflare Pages, Render, Pulumi, Ansible, OpenTelemetry, Tempo, Loki, Grafana, Promtail, PgBouncer, Circuit Breaker, pytest, vitest, httpx, GitHub Actions
Key Learnings:
- Polyglot microservices: Go for performance-critical paths, Python for business logic
- Clean architecture with internal/ package structure in Go
- Circuit breaker pattern for fault tolerance in distributed systems
- Two-layer navigation: URL routes for shareable destinations, context state for in-page transitions
- Scroll-isolation invariants: h-screen overflow-hidden root + panel overscroll-contain
- a11y in motion: aria-modal, focus management, ESC handling, prefers-reduced-motion
- End-to-end observability with OpenTelemetry + Tempo + Loki
- Multi-database testing strategies with mocking frameworks
- Submodule workflow: pointer-based dependency between backend (this repo) and frontend (link-love)
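The three-state circuit breaker listed among the learnings can be sketched as a small state machine. The project's breaker lives in the Go redirect service; the version below is a Python illustration of the pattern only, and the class name, the thresholds, and the `clock` parameter are assumptions made for this example.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


class CircuitOpenError(RuntimeError):
    """Raised when a call is rejected because the breaker is open."""


class CircuitBreaker:
    """Closed -> Open -> Half-Open breaker (illustrative sketch)."""

    def __init__(self, failure_threshold: int = 3, reset_timeout_s: float = 30.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.failure_threshold = failure_threshold
        self.reset_timeout_s = reset_timeout_s
        self.clock = clock
        self.state = "closed"
        self.failures = 0
        self.opened_at = 0.0

    def call(self, fn: Callable[[], T]) -> T:
        if self.state == "open":
            if self.clock() - self.opened_at < self.reset_timeout_s:
                raise CircuitOpenError("circuit is open")
            self.state = "half_open"  # timeout elapsed: allow one trial call
        try:
            result = fn()
        except Exception:
            self._on_failure()
            raise
        self._on_success()
        return result

    def _on_failure(self) -> None:
        self.failures += 1
        # A failed trial call, or too many failures while closed, opens the circuit.
        if self.state == "half_open" or self.failures >= self.failure_threshold:
            self.state = "open"
            self.opened_at = self.clock()

    def _on_success(self) -> None:
        self.state = "closed"
        self.failures = 0
```

While the breaker is open, callers fail fast instead of waiting on a dependency that is already struggling; the half-open trial call is what lets the system recover without manual intervention.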
ElastiKube: Production K3s Autoscaler
Most Complex Infrastructure Project - ML-Enhanced Event-Driven Architecture
Production-grade autoscaling system for K3s clusters with 4-layer intelligent scaling architecture, ML-based predictive scaling, and multi-AZ high availability
- Architected 4-layer autoscaling system: (1) Data Collection for ML training, (2) Time-Aware Scaling with peak/off-peak thresholds (85%/60% vs 60%/40%), (3) Flash Sale Detection with emergency response to CPU spikes >30% in 2 minutes, (4) Predictive Scaling using Prophet models forecasting CPU 15 minutes ahead.
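The reactive layers (time-aware thresholds and flash sale detection) can be sketched as a single decision function. This is a simplified illustration, not the Lambda code. It assumes "CPU spikes >30% in 2 minutes" means a rise of more than 30 percentage points within a 120-second window, and it takes the threshold pair as a parameter because the text does not say which pair applies to peak hours. `Thresholds`, `is_flash_spike`, and `decide` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Sequence, Tuple

Sample = Tuple[float, float]  # (timestamp_seconds, cpu_percent), oldest first


@dataclass(frozen=True)
class Thresholds:
    scale_up: float    # CPU % above which a worker is added
    scale_down: float  # CPU % below which a worker is removed


def is_flash_spike(samples: Sequence[Sample],
                   window_s: float = 120.0, jump_pct: float = 30.0) -> bool:
    """True if CPU rose by more than jump_pct points within the last window_s."""
    if not samples:
        return False
    latest_t, latest_cpu = samples[-1]
    recent = [cpu for t, cpu in samples if latest_t - t <= window_s]
    return latest_cpu - min(recent) > jump_pct


def decide(samples: Sequence[Sample], thresholds: Thresholds) -> str:
    """Check the emergency layer first, then the time-aware thresholds."""
    if not samples:
        return "hold"
    if is_flash_spike(samples):
        return "emergency_scale_up"
    cpu = samples[-1][1]
    if cpu > thresholds.scale_up:
        return "scale_up"
    if cpu < thresholds.scale_down:
        return "scale_down"
    return "hold"
```

Checking the spike condition before the static thresholds lets a sudden jump trigger scaling even while absolute CPU is still below the scale-up line.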
Version Milestones:
- v1.3: Fast Worker Bootstrap with Pre-Baked AMI (91s → 30s bootstrap, auto-detect network iface, k3s-agent-binary role, Lambda SSM-only AMI)
- v1.2: ML Training Pipeline + Predictive Scaling (Layer 4): Prophet model, Kubernetes CronJob for automated weekly retraining, feature engineering, cross-validation
- v1.1: Layered Autoscaling Architecture: Time-Aware Scaling, Flash Sale Detection, Permanent Worker Protection, fixed CloudWatch LogGroups
- v1.0: Event-Driven Lambda Architecture with DynamoDB state management, Multi-AZ distribution, LIFO scale-down, Spot fallback, 17 CloudWatch alarms
Details:
- Implemented ML training pipeline with Kubernetes CronJob for automated weekly model retraining (Sunday 2 AM UTC), feature engineering (temporal cyclical encoding, lag features, rolling statistics), time-series cross-validation with MAE/RMSE metrics, and backtesting with prediction interval coverage.
- Built event-driven Lambda architecture with four specialized functions (Decision, Scale-Up, Scale-Down, Cleanup) orchestrated through EventBridge for fault tolerance, crash recovery via Write-Ahead Log (WAL), and distributed locking with 200s timeout.
- Designed multi-AZ high availability with round-robin worker distribution across 3 availability zones (ap-southeast-1a/b/c), single NAT Gateway optimization, and LIFO scale-down maintaining natural distribution balance.
- Implemented multi-layer idempotency including bootstrap verification, cooldown checks (scale-up: 300s, scale-down: 900s), pending instance detection, and automatic stale flag cleanup to prevent duplicate scaling operations.
- Integrated comprehensive observability with 17 CloudWatch alarms (CRITICAL/WARNING severity), Prometheus health graceful degradation (conservative defaults when unavailable), and fixed LogGroups for stable dashboard references.
- Engineered spot instance support with automatic On-Demand fallback when spot capacity unavailable (InsufficientInstanceCapacity, SpotInstanceCapacityNotAvailable, MaxSpotInstanceCountExceeded), graceful 2-minute node drain via SSM, and proper Kubernetes cleanup.
- Deployed infrastructure as code with Pulumi (VPC, EC2, Lambda, DynamoDB, EventBridge, IAM) and Ansible (k3s-worker-preinstall and k3s-agent-binary roles for bare-metal worker provisioning, worker-bake.yml for Pre-Baked AMI creation)
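Round-robin placement, LIFO scale-down, and the cooldown checks described above fit in a few lines. The sketch below is an illustration under assumptions, not the project's Lambda code: the `Worker` dataclass and the function names are hypothetical, while the AZ names and cooldown values come from the list above.

```python
from dataclasses import dataclass
from typing import List, Sequence

AZS = ("ap-southeast-1a", "ap-southeast-1b", "ap-southeast-1c")
SCALE_UP_COOLDOWN_S = 300
SCALE_DOWN_COOLDOWN_S = 900


@dataclass
class Worker:
    id: str
    az: str
    launched_at: float
    permanent: bool = False  # permanent workers are never scaled down


def next_az(worker_count: int, azs: Sequence[str] = AZS) -> str:
    """Round-robin: the Nth worker goes to AZ number N modulo the AZ count."""
    return azs[worker_count % len(azs)]


def pick_scale_down(workers: List[Worker], count: int) -> List[Worker]:
    """LIFO: remove the most recently launched non-permanent workers first."""
    removable = [w for w in workers if not w.permanent]
    removable.sort(key=lambda w: w.launched_at, reverse=True)
    return removable[:count]


def cooldown_elapsed(last_action_at: float, now: float, cooldown_s: float) -> bool:
    """Idempotency guard: skip scaling until the cooldown has passed."""
    return now - last_action_at >= cooldown_s
```

Because consecutive launches land in different zones, removing the newest workers first unwinds placement in reverse order, which is why LIFO scale-down tends to keep the zones balanced without extra bookkeeping.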
Tech Stack: AWS Lambda, EventBridge, DynamoDB, EC2, K3s, Prometheus, CloudWatch, Prophet, Kubernetes CronJob, SSM, Secrets Manager, S3, Python 3.11, Pulumi, Ansible, kubectl, Node Exporter, Pre-Baked AMI
Key Learnings:
- Layered autoscaling architecture combining reactive (time-aware, flash sale) and proactive (ML predictive) scaling
- Event-driven architecture patterns with Lambda chaining via EventBridge and WAL-based crash recovery
- Distributed systems state management with DynamoDB conditional writes (optimistic locking) and 200s distributed locks
- ML pipeline deployment with automated retraining via Kubernetes CronJob, model versioning in S3
- Multi-AZ infrastructure design with cost optimization (single NAT, round-robin scale-up, LIFO scale-down)
- Kubernetes cluster operations including node lifecycle, pod draining via SSM (120s timeout), and CronJob scheduling
- Pre-baked AMI patterns for fast worker bootstrap (91s → 30s), Ansible roles for bare-metal provisioning
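The optimistic locking idea behind conditional writes can be shown without AWS. The sketch below is an in-memory stand-in, not DynamoDB code: `StateStore`, `put_if_version`, and the `version` field are hypothetical. A real table would express the same check as a condition on the write.

```python
from typing import Any, Dict


class ConditionalCheckFailed(Exception):
    """Raised when the stored version does not match the caller's expectation."""


class StateStore:
    """In-memory stand-in for a table that supports conditional writes."""

    def __init__(self) -> None:
        self._items: Dict[str, Dict[str, Any]] = {}

    def get(self, key: str) -> Dict[str, Any]:
        # Missing items read as version 0 so the first write can use expected_version=0.
        return dict(self._items.get(key, {"value": None, "version": 0}))

    def put_if_version(self, key: str, value: Any, expected_version: int) -> int:
        current_version = self._items.get(key, {"version": 0})["version"]
        if current_version != expected_version:
            # Another writer got there first; the caller must re-read and retry.
            raise ConditionalCheckFailed(
                f"expected v{expected_version}, found v{current_version}"
            )
        new_version = expected_version + 1
        self._items[key] = {"value": value, "version": new_version}
        return new_version
```

Two scaling functions that read the same state at version 1 cannot both commit: the second write fails the version check instead of silently overwriting the first.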
High Complexity - Media Processing Pipeline
Full-Stack advanced video streaming solution with adaptive bitrate technology
- Engineered a secure and scalable video platform with a Django REST API and a React/TypeScript frontend, architected for high-performance adaptive streaming.
- Implemented a robust security model, using dj-rest-auth for token-based authentication and a protected media workflow (via Nginx X-Accel-Redirect) to ensure only authorized users can access streaming content.
- Built an asynchronous video processing pipeline using Celery, Redis, and FFMPEG to transcode videos for DASH playback, ensuring a smooth, low-latency user experience.
- Automated the entire cloud workflow, from provisioning AWS S3 infrastructure with Pulumi and configuring servers with Ansible, to deploying the Docker-containerized application via GitHub Actions.
Tech Stack: Django, React, Celery, Redis, PostgreSQL, FFMPEG, DASH, AWS S3, Nginx, Docker, Pulumi, Ansible
Distributed Job Queue System
Medium-High Complexity - Worker Orchestration
Scalable job processing system with advanced features
- Developed a distributed job queue system using FastAPI and Redis to manage asynchronous tasks with priority-based queuing and automatic worker scaling.
- Implemented a real-time monitoring dashboard with Jinja2 templates to provide visibility into job status, queue metrics, and worker activity.
- Engineered an automatic worker scaling mechanism based on job load and worker availability, using Docker Swarm to dynamically adjust resources.
- Created a comprehensive error handling and fault tolerance system, including automatic retries for failed jobs and a dead-letter queue for unrecoverable tasks.
- Designed a job dependency feature to ensure complex workflows are executed in the correct order, improving system reliability.
- Containerized all services (API, Worker, Monitor) using Docker for consistent deployment and simplified management.
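Priority queuing, automatic retries, and the dead-letter queue can be illustrated with the standard library alone. The project backs its queue with Redis; the sketch below is an in-process stand-in, and `Job`, `JobQueue`, and `max_attempts` are hypothetical names chosen for the example.

```python
import heapq
import itertools
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class Job:
    name: str
    priority: int  # lower number runs first
    attempts: int = 0


class JobQueue:
    """Priority queue with bounded retries and a dead-letter list (illustrative)."""

    def __init__(self, max_attempts: int = 3) -> None:
        self.max_attempts = max_attempts
        self.dead_letter: List[Job] = []
        self._heap: List[Tuple[int, int, Job]] = []
        self._seq = itertools.count()  # tie-breaker keeps FIFO order per priority

    def submit(self, job: Job) -> None:
        heapq.heappush(self._heap, (job.priority, next(self._seq), job))

    def run_next(self, handler: Callable[[Job], None]) -> bool:
        """Run the highest-priority job; return False if the queue is empty."""
        if not self._heap:
            return False
        _, _, job = heapq.heappop(self._heap)
        job.attempts += 1
        try:
            handler(job)
        except Exception:
            if job.attempts >= self.max_attempts:
                self.dead_letter.append(job)  # unrecoverable: park for inspection
            else:
                self.submit(job)  # retry later at the same priority
        return True
```

The sequence counter in each heap entry means two jobs with equal priority are never compared directly, so `Job` does not need to define ordering.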
Tech Stack: FastAPI, Redis, Docker Swarm, Jinja2
Medium Complexity - Full-Stack Application
Full-stack financial management application for tracking installments and payments
- Backend: High-performance API built with FastAPI, using SQLAlchemy for ORM with a PostgreSQL database.
- Frontend: Modern and responsive UI built with React, TypeScript, and Vite, styled with Tailwind CSS and Shadcn UI.
- Asynchronous Tasks: Celery and Redis manage background jobs like sending OTP and due date notification emails.
- Authentication: Secure JWT-based authentication with role-based access for customers and admins.
- Data Management: Alembic handles database schema migrations, and TanStack Query manages server state on the frontend.
- DevOps: Fully containerized with Docker and Docker Compose for reproducible development and deployment environments.
Tech Stack: FastAPI, React, TypeScript, PostgreSQL, SQLAlchemy, Redis, Celery, Docker, Tailwind CSS, Shadcn UI, Alembic
High Complexity - Async Communication
Production-ready notification microservice with service-to-service auth and distributed rate limiting
- Modern Backend: Python + FastAPI with async API endpoints, direct RabbitMQ consumer (no Celery) with pika for simplified ops
- Multi-Channel Delivery: Email (SendGrid), SMS (Twilio), Push (Firebase) via factory pattern
- JWT Service Auth: Scoped tokens for service-to-service auth with 60-min expiry
- Rate Limiting: Redis token bucket (100 req/min + burst per service) with Lua script atomic ops
- Redis Caching: 30s TTL cache for notification lookups with automatic invalidation on status update
- Idempotency + Retry: Worker skips already-sent notifications; 3 retry attempts (1s, 2s, 4s delays)
- Structured JSON Logging: Production observability via extra={} dict pattern
- Containerized: Docker Compose with PostgreSQL, RabbitMQ, Redis services
- Comprehensive Testing: pytest suite with unit and integration tests
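The idempotency and retry behaviour of the worker can be sketched as one function. This is a hedged illustration, not the service's code: it assumes one initial attempt followed by up to three retries delayed 1s, 2s, and 4s, and the `deliver` function, the `status` field, and the injectable `sleep` parameter are hypothetical.

```python
import time
from typing import Callable, Dict, List


def backoff_delays(retries: int = 3, base_s: float = 1.0) -> List[float]:
    """Exponential backoff schedule: base, 2*base, 4*base, ..."""
    return [base_s * (2 ** i) for i in range(retries)]


def deliver(notification: Dict, send: Callable[[Dict], None],
            sleep: Callable[[float], None] = time.sleep,
            retries: int = 3) -> bool:
    """Send once, retrying with backoff; skip notifications already sent."""
    if notification.get("status") == "sent":
        return True  # idempotent: a redelivered message is not sent twice
    for delay in [0.0] + backoff_delays(retries):
        if delay:
            sleep(delay)
        try:
            send(notification)
        except Exception:
            continue  # transient failure: fall through to the next attempt
        notification["status"] = "sent"
        return True
    notification["status"] = "failed"
    return False
```

Passing `sleep` as a parameter keeps the function testable without real waiting, and the early return on `status == "sent"` is what makes a message broker's at-least-once redelivery safe.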
Tech Stack: FastAPI, Python, pika, PostgreSQL, RabbitMQ, Redis, SQLAlchemy, Docker, pytest, JWT, SendGrid, Twilio, Firebase
Medium Complexity - HA Architecture
Enterprise-grade Todo application with AWS infrastructure
- Engineered full-stack application with FastAPI backend and React frontend
- Implemented Infrastructure as Code using Pulumi for AWS resource management
- Designed fault-tolerant architecture with load balancing across multiple AZs
- Built PostgreSQL replication system with automated backup/recovery
- Integrated Redis Sentinel for high availability caching
Tech Stack: FastAPI, React, AWS EC2, PostgreSQL, Redis Sentinel, Nginx, Docker
August 2024 - Present | Portfolio Projects
Building production systems to demonstrate platform engineering capabilities while actively seeking full-time opportunities
- Architected and deployed 5 production-grade applications serving 5,000+ real users across e-commerce, fintech, and SaaS domains
- Managed 11+ EC2 instances with 99.9% uptime through multi-AZ AWS infrastructure with automated deployment
- Built ElastiKube: ML-enhanced Kubernetes autoscaler achieving 60% cost reduction with 4-layer intelligent scaling (time-aware, flash sale detection, Prophet forecasting)
- Engineered polyglot URL shortener with Go redirect service achieving sub-1ms latency and comprehensive observability (OpenTelemetry, Tempo, Loki, Grafana)
- Automated infrastructure deployment with Pulumi & Ansible, reducing deployment time 93.75% (4 hours → 15 minutes)
Tech Stack: Python, Go, FastAPI, Django, AWS, Kubernetes, Docker, PostgreSQL, Redis, MongoDB, Pulumi, Ansible, OpenTelemetry, Grafana
June 2024 - August 2024 | Dhaka, Bangladesh
Delivered measurable business impact:
- Designed role-based admin dashboard for 200+ users with real-time meal analytics
- Automated 40% of manual effort in account management through intelligent workflows
- Built production-ready meal scheduling system using cron jobs with configurable time boundaries
Tech Stack: Python, Django, PostgreSQL, Docker, JavaScript, HTML/CSS
Bachelor of Science in Computer Science & Engineering
Daffodil International University | September 2017 - December 2022
- Building a Scalable URL Shortener: System Design to Production
- Complete architectural breakdown with Infrastructure as Code
- 100+ views, featured in system design discussions
- 200,000+ readers on Quora with tech insights in Bengali
- Nearly 200 followers engaging with technology content
- 40+ instructional videos on YouTube bridging Bengali tech education gap
- 500+ Problems Solved across multiple platforms
- Active on: BeeCrowd, LightOJ, HackerRank, LeetCode
- Contest Achievements:
- DIU Take-Off Programming Contest (Ranked 6th out of 300 participants)
- Multiple university-level programming contest participations
- Kubernetes - Container orchestration at scale
I'm actively seeking opportunities to work on:
- Distributed systems requiring high availability and fault tolerance
- Cloud-native applications with automated infrastructure
- Microservices architectures with proper observability
- Open-source projects where I can contribute infrastructure expertise
Looking for a backend engineer who can:
- Design scalable distributed systems
- Automate infrastructure from scratch
- Write clean, testable, maintainable code
- Document complex architectures clearly
Let's build something amazing together!
- Email: kaziiriad@gmail.com
- Phone: +880 1683152495
- LinkedIn: Sultan Mahmud
- Medium: @kazisultanmahmud
- YouTube: I.T. Darshonik


