
Hi there, I'm Sultan Mahmud 👋

🚀 Backend Engineer | Infrastructure Automation Specialist | Distributed Systems Architect

Building production-grade distributed systems with automated AWS deployments, achieving sub-1ms response times at 1K+ concurrent users

LinkedIn Email Open to Work Medium Resume


💼 Current Status

🔍 Actively seeking: Remote backend engineering positions
🚀 Specialization: Python backend + DevOps automation + Distributed systems
📍 Location: Dhaka, Bangladesh (Open to worldwide remote)
💬 Ask me about: FastAPI, System Design, AWS Infrastructure


🎯 What Sets Me Apart

I don't just write backend code; I architect complete production systems, automated end to end from infrastructure to deployment:

  • ✅ Infrastructure as Code Expert - Automated AWS deployments managing 11+ EC2 instances with Pulumi & Ansible
  • ✅ Performance Engineering - Optimized systems achieving sub-1ms response times at 1K+ concurrent users
  • ✅ Polyglot Architecture - Go for performance-critical paths, Python for business logic
  • ✅ DevOps Automation - Zero-touch deployments with CI/CD, containerization, and orchestration
  • ✅ Distributed Systems - Built fault-tolerant architectures with auto-scaling, load balancing, and high availability
  • ✅ Technical Writing - Published articles that explain complex architectures in plain language


πŸ† Technical Expertise

Backend Development

Python Go FastAPI Django Asyncio

Infrastructure & DevOps

AWS Terraform Pulumi Ansible Docker Nginx CI/CD PgBouncer

Distributed Systems

Celery RabbitMQ Redis Docker Swarm

Databases & Storage

PostgreSQL MongoDB Redis

Observability & Monitoring

OpenTelemetry Grafana Tempo Prometheus


📊 By The Numbers

🎯 6+ Production-Ready Applications Built
⚡ Sub-1ms API Response Times Achieved
🖥️ 11+ AWS EC2 Instances Automated
📦 1K+ Concurrent Users Supported
🔄 Container Orchestration Systems Designed
🧪 500+ DSA Problems Solved
📖 200K+ Technical Blog Readers
🎥 40+ Educational Videos Created

🌟 Featured Projects (Ranked by Complexity)

High Complexity - Polyglot Microservices + Production Frontend + Full Observability

Production-grade URL shortener with sub-1ms Go redirect service, React 18 frontend with two-layer navigation, complete OpenTelemetry observability stack, and cloud-native deployment on Render + Cloudflare Pages

  • Architected polyglot microservices with Python FastAPI for create_service, high-performance Go (Chi) for redirect_service, and a Celery worker_service, each independently scalable via docker-compose-decoupled.yml.
  • Achieved sub-1ms Go redirect latency (vs 5-7ms Python) with clean internal/ package architecture, three-state circuit breaker (Closed → Open → Half-Open), and single-binary deployment.
  • Built the "Snipl" frontend as a separate link-love repo consumed as a git submodule: Vite + React 18 + TypeScript, deployed to Cloudflare Pages, and served as an installable PWA with an offline-capable service worker and a web manifest.
  • Designed a two-layer navigation model for in-page UX: React Router 6 owns shareable URLs (/r/:shortKey, /_health, 404) while a ViewContext (activeView + footerPanel) drives glide transitions between Index, Dashboard, MultiLink, and Custom Links, with no URL change, no flash of empty state, and no back-button mismatch.
  • Built direction-aware GlideView with Framer Motion (ease-out-quint [0.22, 1, 0.36, 1], ~60px X offset) and a modal-style FooterPanelView for legal/feedback content with overscroll-contain scroll isolation, aria-modal focus management, and ESC-to-close.
  • Established a lock-step Layout invariant (h-screen overflow-hidden + fixed Navbar/Footer chrome) so manual scroll stays trapped inside the active view β€” required for panel scroll isolation to work.
  • Respected prefers-reduced-motion throughout with a media-query-aware scrollToId helper and per-call behavior fallbacks; all animations have a no-motion code path.
  • Implemented a Redis sliding-window rate limiter for the Python FastAPI services using atomic Lua scripts, a dual-layer architecture (Nginx 30r/m + App 10r/m), and IP+UA hash client identification with a dedicated Redis DB for isolation.
  • Implemented the cache-aside pattern with Redis (30-minute TTL) and a MongoDB fallback, optimized for a 95%+ cache hit rate with automatic expiration handling.
  • Deployed complete observability stack with OpenTelemetry collector, Tempo (distributed tracing), Loki (log aggregation via Promtail), and Grafana dashboards for end-to-end service visibility.
  • Engineered production-grade resilience with PgBouncer connection pooling (53% reduction in connection overhead), atomic PostgreSQL key acquisition using SELECT FOR UPDATE SKIP LOCKED, and exponential backoff retries.
  • Implemented intelligent key pre-population: Celery workers maintain a pool of unused keys so URLs can be created instantly without database latency, with a hybrid strategy that auto-selects the optimal insertion method.
  • Built comprehensive testing infrastructure with multi-database mocking (SQLite, mongomock, fakeredis), async pytest framework, httpx API client testing, and isolated test environments.
  • Deployed to Render (backend) + Cloudflare Pages (frontend) with branch-based preview environments, env-var driven BASE_URL/CORS configuration, and zero-touch CI/CD via GitHub Actions.
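The sliding-window limiter described above can be sketched in plain Python. This is a minimal, single-process illustration, not the project's code: the Redis version keeps its state in Redis and runs the check-and-record step as one atomic Lua script so concurrent workers cannot race, which this in-memory deque does not attempt. `SlidingWindowLimiter` and `client_key` are hypothetical names; the default of 10 requests per 60 seconds mirrors the App 10r/m layer.

```python
import hashlib
import time
from collections import defaultdict, deque
from typing import Callable, Deque, Dict


def client_key(ip: str, user_agent: str) -> str:
    """Hypothetical helper: identify a client by hashing its IP and User-Agent."""
    return hashlib.sha256(f"{ip}|{user_agent}".encode()).hexdigest()


class SlidingWindowLimiter:
    """Sliding-window-log limiter: at most `limit` requests per `window_seconds`.

    In-memory and single-process only; the Redis/Lua version makes the
    prune-count-record sequence atomic across workers.
    """

    def __init__(self, limit: int = 10, window_seconds: float = 60.0,
                 clock: Callable[[], float] = time.monotonic):
        self.limit = limit
        self.window = window_seconds
        self.clock = clock
        self._hits: Dict[str, Deque[float]] = defaultdict(deque)

    def allow(self, key: str) -> bool:
        now = self.clock()
        hits = self._hits[key]
        # Drop timestamps that have slid out of the window.
        while hits and hits[0] <= now - self.window:
            hits.popleft()
        if len(hits) >= self.limit:
            return False  # over the limit; rejected requests are not recorded
        hits.append(now)
        return True
```

Because rejected requests are not recorded, a client that keeps retrying is let back in as soon as its oldest accepted request ages out of the window.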

Technical Deep Dive: Read my Medium articles

Tech Stack: Go Chi Router FastAPI Celery Redis PostgreSQL MongoDB Nginx Docker React 18 Vite TypeScript Framer Motion TanStack Query Zod shadcn/ui Tailwind PWA Cloudflare Pages Render Pulumi Ansible OpenTelemetry Tempo Loki Grafana Promtail PgBouncer Circuit Breaker pytest vitest httpx GitHub Actions
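A minimal sketch of the cache-aside read path, assuming a dict-backed `TTLCache` as a stand-in for Redis and a caller-supplied `db_lookup` function as a stand-in for the MongoDB query. Both names are hypothetical; only the 30-minute TTL comes from the project description.

```python
import time
from typing import Callable, Dict, Optional, Tuple


class TTLCache:
    """Hypothetical stand-in for Redis: each value expires ttl_seconds after it is set."""

    def __init__(self, ttl_seconds: float = 30 * 60, clock: Callable[[], float] = time.monotonic):
        self.ttl = ttl_seconds
        self.clock = clock
        self._data: Dict[str, Tuple[str, float]] = {}

    def get(self, key: str) -> Optional[str]:
        entry = self._data.get(key)
        if entry is None:
            return None
        value, expires_at = entry
        if self.clock() >= expires_at:
            del self._data[key]  # expired entries are evicted lazily on read
            return None
        return value

    def set(self, key: str, value: str) -> None:
        self._data[key] = (value, self.clock() + self.ttl)


def resolve(short_key: str, cache: TTLCache,
            db_lookup: Callable[[str], Optional[str]]) -> Optional[str]:
    """Cache-aside read: try the cache, fall back to the database, then populate the cache."""
    url = cache.get(short_key)
    if url is not None:
        return url  # cache hit
    url = db_lookup(short_key)  # cache miss: query the source of truth
    if url is not None:
        cache.set(short_key, url)  # write back so the next read is a hit
    return url
```

On a miss the value is read from the database and written back, so later reads for the same key are served from the cache until the TTL lapses.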

Key Learnings:

  • Polyglot microservices: Go for performance-critical paths, Python for business logic
  • Clean architecture with internal/ package structure in Go
  • Circuit breaker pattern for fault tolerance in distributed systems
  • Two-layer navigation: URL routes for shareable destinations, context state for in-page transitions
  • Scroll-isolation invariants: h-screen overflow-hidden root + panel overscroll-contain
  • a11y in motion: aria-modal, focus management, ESC handling, prefers-reduced-motion
  • End-to-end observability with OpenTelemetry + Tempo + Loki
  • Multi-database testing strategies with mocking frameworks
  • Submodule workflow: pointer-based dependency between backend (this repo) and frontend (link-love)
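The three-state circuit breaker lives in the Go redirect service; the following is a hypothetical Python sketch of the same state machine, meant to show the transitions rather than reproduce the project's code. `failure_threshold` and `reset_timeout` are illustrative defaults, and the sketch is single-threaded, so it does not limit concurrent trial calls while half-open.

```python
import time
from enum import Enum
from typing import Callable, TypeVar

T = TypeVar("T")


class State(Enum):
    CLOSED = "closed"        # normal operation, failures are counted
    OPEN = "open"            # fail fast without calling the dependency
    HALF_OPEN = "half_open"  # let a trial call through to probe recovery


class CircuitOpenError(RuntimeError):
    pass


class CircuitBreaker:
    def __init__(self, failure_threshold: int = 5, reset_timeout: float = 30.0,
                 clock: Callable[[], float] = time.monotonic):
        self.failure_threshold = failure_threshold
        self.reset_timeout = reset_timeout
        self.clock = clock
        self.state = State.CLOSED
        self.failures = 0
        self.opened_at = 0.0

    def call(self, fn: Callable[[], T]) -> T:
        if self.state is State.OPEN:
            if self.clock() - self.opened_at < self.reset_timeout:
                raise CircuitOpenError("circuit is open; failing fast")
            self.state = State.HALF_OPEN  # timeout elapsed: allow a trial request
        try:
            result = fn()
        except Exception:
            self._on_failure()
            raise
        self._on_success()
        return result

    def _on_failure(self) -> None:
        self.failures += 1
        # A failed trial re-opens immediately; otherwise open once the threshold is hit.
        if self.state is State.HALF_OPEN or self.failures >= self.failure_threshold:
            self.state = State.OPEN
            self.opened_at = self.clock()

    def _on_success(self) -> None:
        self.state = State.CLOSED
        self.failures = 0
```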

Most Complex Infrastructure Project - ML-Enhanced Event-Driven Architecture

Production-grade autoscaling system for K3s clusters with 4-layer intelligent scaling architecture, ML-based predictive scaling, and multi-AZ high availability

  • Architected 4-layer autoscaling system: (1) Data Collection for ML training, (2) Time-Aware Scaling with peak/off-peak thresholds (85%/60% vs 60%/40%), (3) Flash Sale Detection with emergency response to CPU spikes >30% in 2 minutes, (4) Predictive Scaling using Prophet models forecasting CPU 15 minutes ahead.
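Layer (3) can be illustrated with a small detector. The description does not say whether ">30%" is relative or absolute, so this sketch assumes an absolute rise of more than 30 CPU percentage points between any two samples taken at most 120 seconds apart. `detect_flash_sale` and the `(timestamp, cpu_percent)` sample format are hypothetical.

```python
from typing import Sequence, Tuple

Sample = Tuple[float, float]  # hypothetical format: (unix_seconds, cpu_percent)


def detect_flash_sale(samples: Sequence[Sample], spike_points: float = 30.0,
                      window_seconds: float = 120.0) -> bool:
    """Return True if CPU rose by more than spike_points within any window_seconds span.

    Assumes samples are sorted by timestamp; each reading is compared with every
    earlier reading that falls inside the same window.
    """
    for i, (t_end, cpu_end) in enumerate(samples):
        for t_start, cpu_start in samples[:i]:
            if t_end - t_start <= window_seconds and cpu_end - cpu_start > spike_points:
                return True
    return False
```

The quadratic scan is fine for a few minutes of metrics; a longer history would call for a sliding minimum instead.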

Version Milestones:

  • v1.3 - Fast Worker Bootstrap with Pre-Baked AMI (91s→30s bootstrap, auto-detect network iface, k3s-agent-binary role, Lambda SSM-only AMI)
  • v1.2 - ML Training Pipeline + Predictive Scaling (Layer 4): Prophet model, Kubernetes CronJob for automated weekly retraining, feature engineering, cross-validation
  • v1.1 - Layered Autoscaling Architecture: Time-Aware Scaling, Flash Sale Detection, Permanent Worker Protection, fixed CloudWatch LogGroups
  • v1.0 - Event-Driven Lambda Architecture with DynamoDB state management, Multi-AZ distribution, LIFO scale-down, Spot fallback, 17 CloudWatch alarms

Details:

  • Implemented ML training pipeline with Kubernetes CronJob for automated weekly model retraining (Sunday 2 AM UTC), feature engineering (temporal cyclical encoding, lag features, rolling statistics), time-series cross-validation with MAE/RMSE metrics, and backtesting with prediction interval coverage.
  • Built event-driven Lambda architecture with four specialized functions (Decision, Scale-Up, Scale-Down, Cleanup) orchestrated through EventBridge for fault tolerance, crash recovery via Write-Ahead Log (WAL), and distributed locking with 200s timeout.
  • Designed multi-AZ high availability with round-robin worker distribution across 3 availability zones (ap-southeast-1a/b/c), single NAT Gateway optimization, and LIFO scale-down maintaining natural distribution balance.
  • Implemented multi-layer idempotency including bootstrap verification, cooldown checks (scale-up: 300s, scale-down: 900s), pending instance detection, and automatic stale flag cleanup to prevent duplicate scaling operations.
  • Integrated comprehensive observability with 17 CloudWatch alarms (CRITICAL/WARNING severity), graceful degradation to conservative defaults when Prometheus is unavailable, and fixed LogGroups for stable dashboard references.
  • Engineered spot instance support with automatic On-Demand fallback when spot capacity is unavailable (InsufficientInstanceCapacity, SpotInstanceCapacityNotAvailable, MaxSpotInstanceCountExceeded), graceful 2-minute node drain via SSM, and proper Kubernetes cleanup.
  • Deployed infrastructure as code with Pulumi (VPC, EC2, Lambda, DynamoDB, EventBridge, IAM) and Ansible (k3s-worker-preinstall and k3s-agent-binary roles for bare-metal worker provisioning, worker-bake.yml for Pre-Baked AMI creation)
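The cooldown and locking checks can be sketched as pure functions over an in-memory state record. This is an assumption-laden illustration, not the Lambda code: in the real system the state lives in DynamoDB, where the lock would typically be taken with a conditional write so that only one function wins. `ScalerState`, `try_acquire_lock`, and `may_scale` are hypothetical; the 300s/900s cooldowns and the 200s lock timeout come from the bullets above.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional

COOLDOWNS = {"scale_up": 300, "scale_down": 900}  # seconds, per the project description
LOCK_TIMEOUT = 200  # seconds before a held lock is treated as stale


@dataclass
class ScalerState:
    """Hypothetical in-memory stand-in for the DynamoDB state item."""
    last_action_at: Dict[str, float] = field(default_factory=dict)
    pending_instances: int = 0
    lock_owner: Optional[str] = None
    lock_acquired_at: float = 0.0


def try_acquire_lock(state: ScalerState, owner: str, now: float) -> bool:
    """Take the lock if it is free, already ours, or held past LOCK_TIMEOUT (stale)."""
    if state.lock_owner in (None, owner) or now - state.lock_acquired_at >= LOCK_TIMEOUT:
        state.lock_owner = owner
        state.lock_acquired_at = now
        return True
    return False


def may_scale(state: ScalerState, action: str, now: float) -> bool:
    """Idempotency guard: refuse while instances are pending or the action is cooling down."""
    if state.pending_instances > 0:
        return False
    last = state.last_action_at.get(action)
    return last is None or now - last >= COOLDOWNS[action]
```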

Tech Stack: AWS Lambda EventBridge DynamoDB EC2 K3s Prometheus CloudWatch Prophet Kubernetes CronJob SSM Secrets Manager S3 Python 3.11 Pulumi Ansible kubectl Node Exporter Pre-Baked AMI

Key Learnings:

  • Layered autoscaling architecture combining reactive (time-aware, flash sale) and proactive (ML predictive) scaling
  • Event-driven architecture patterns with Lambda chaining via EventBridge and WAL-based crash recovery
  • Distributed systems state management with DynamoDB conditional writes (optimistic locking) and 200s distributed locks
  • ML pipeline deployment with automated retraining via Kubernetes CronJob, model versioning in S3
  • Multi-AZ infrastructure design with cost optimization (single NAT, round-robin scale-up, LIFO scale-down)
  • Kubernetes cluster operations including node lifecycle, pod draining via SSM (120s timeout), and CronJob scheduling
  • Pre-baked AMI patterns for fast worker bootstrap (91s→30s), Ansible roles for bare-metal provisioning
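Round-robin scale-up and LIFO scale-down fit together neatly if the AZ is chosen from the current pool size: the i-th live worker always sits in `azs[i % 3]`, and removing the newest worker undoes the last placement. The source does not say how its round-robin is keyed, so this `WorkerPool` class is a hypothetical sketch of that idea.

```python
from typing import Dict, List, Tuple

AZS = ["ap-southeast-1a", "ap-southeast-1b", "ap-southeast-1c"]


class WorkerPool:
    """Hypothetical model: round-robin placement on scale-up, newest-first removal on scale-down."""

    def __init__(self, azs: List[str]):
        self.azs = azs
        self.workers: List[Tuple[str, str]] = []  # (worker_id, az), oldest first
        self._launched = 0

    def scale_up(self) -> Tuple[str, str]:
        # Round-robin keyed on pool size: the i-th live worker lands in azs[i % n].
        az = self.azs[len(self.workers) % len(self.azs)]
        self._launched += 1
        worker = (f"worker-{self._launched}", az)
        self.workers.append(worker)
        return worker

    def scale_down(self) -> Tuple[str, str]:
        # LIFO: remove the most recently launched worker, undoing the last placement,
        # so per-AZ counts never differ by more than one.
        return self.workers.pop()

    def az_counts(self) -> Dict[str, int]:
        counts = {az: 0 for az in self.azs}
        for _, az in self.workers:
            counts[az] += 1
        return counts
```

Keying on a free-running counter instead would drift out of balance after scale-downs, because the next scale-up could skip the AZ that was just vacated.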

High Complexity - Media Processing Pipeline

Full-stack video streaming solution with adaptive bitrate (DASH) streaming

  • Engineered a secure and scalable video platform with a Django REST API and a React/TypeScript frontend, architected for high-performance adaptive streaming.
  • Implemented a robust security model, using dj-rest-auth for token-based authentication and a protected media workflow (via Nginx X-Accel-Redirect) to ensure only authorized users can access streaming content.
  • Built an asynchronous video processing pipeline using Celery, Redis, and FFmpeg to transcode videos for DASH playback, ensuring a smooth, low-latency user experience.
  • Automated the entire cloud workflow, from provisioning AWS S3 infrastructure with Pulumi and configuring servers with Ansible, to deploying the Docker-containerized application via GitHub Actions.
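The DASH transcoding step can be sketched as a function that builds an FFmpeg argument list for a Celery task to hand to `subprocess.run`. The rendition ladder, 4-second segments, and keyframe settings are illustrative assumptions, not the project's values; the flags used (`-map`, per-stream `-b:v:N` and `-filter:v:N`, `-force_key_frames`, and the `dash` muxer's `-seg_duration` and `-adaptation_sets`) are standard FFmpeg options.

```python
from typing import List, Sequence, Tuple

# Hypothetical rendition ladder: (output height, target video bitrate).
RENDITIONS: List[Tuple[int, str]] = [(1080, "5000k"), (720, "2800k"), (480, "1400k")]


def build_dash_command(src: str, out_mpd: str,
                       renditions: Sequence[Tuple[int, str]] = RENDITIONS,
                       seg_seconds: int = 4) -> List[str]:
    """Build an ffmpeg argv that writes a multi-bitrate DASH manifest plus segments."""
    cmd = ["ffmpeg", "-y", "-i", src]
    # Map the input's first video stream once per rendition, then its first audio stream.
    for _ in renditions:
        cmd += ["-map", "0:v:0"]
    cmd += ["-map", "0:a:0", "-c:v", "libx264", "-c:a", "aac"]
    # Per-output-stream bitrate and scaling; -2 keeps the aspect ratio with an even width.
    for i, (height, bitrate) in enumerate(renditions):
        cmd += [f"-b:v:{i}", bitrate, f"-filter:v:{i}", f"scale=-2:{height}"]
    # Force a keyframe at every segment boundary so all renditions split at the same points.
    cmd += ["-force_key_frames", f"expr:gte(t,n_forced*{seg_seconds})"]
    cmd += [
        "-f", "dash",
        "-seg_duration", str(seg_seconds),
        "-adaptation_sets", "id=0,streams=v id=1,streams=a",
        out_mpd,
    ]
    return cmd
```

A Celery task could then run `subprocess.run(build_dash_command(src, mpd), check=True)`; that wiring is likewise an assumption.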

Tech Stack: Django React Celery Redis PostgreSQL FFmpeg DASH AWS S3 Nginx Docker Pulumi Ansible


Medium-High Complexity - Worker Orchestration

Scalable job processing system with priority queuing, automatic worker scaling, and fault tolerance

  • Developed a distributed job queue system using FastAPI and Redis to manage asynchronous tasks with priority-based queuing and automatic worker scaling.
  • Implemented a real-time monitoring dashboard with Jinja2 templates to provide visibility into job status, queue metrics, and worker activity.
  • Engineered an automatic worker scaling mechanism based on job load and worker availability, using Docker Swarm to dynamically adjust resources.
  • Created a comprehensive error handling and fault tolerance system, including automatic retries for failed jobs and a dead-letter queue for unrecoverable tasks.
  • Designed a job dependency feature to ensure complex workflows are executed in the correct order, improving system reliability.
  • Containerized all services (API, Worker, Monitor) using Docker for consistent deployment and simplified management.
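A minimal in-process sketch of priority queuing with retries and a dead-letter queue, using `heapq` instead of Redis. `Job`, `JobQueue`, and `max_retries=3` are hypothetical; the sequence number keeps jobs of equal priority in FIFO order, and a job that exhausts its retries is parked in `dead_letter` rather than retried forever.

```python
import heapq
import itertools
from dataclasses import dataclass, field
from typing import Any, Callable, List


@dataclass(order=True)
class Job:
    priority: int  # lower number runs first
    seq: int       # tie-breaker: FIFO order within the same priority
    name: str = field(compare=False)
    payload: Any = field(compare=False, default=None)
    attempts: int = field(compare=False, default=0)


class JobQueue:
    def __init__(self, max_retries: int = 3):
        self.max_retries = max_retries
        self._heap: List[Job] = []
        self._seq = itertools.count()
        self.dead_letter: List[Job] = []

    def submit(self, name: str, payload: Any = None, priority: int = 5) -> None:
        heapq.heappush(self._heap, Job(priority, next(self._seq), name, payload))

    def run_next(self, handler: Callable[[Job], None]) -> bool:
        """Run the highest-priority job; return False when the queue is empty."""
        if not self._heap:
            return False
        job = heapq.heappop(self._heap)
        try:
            handler(job)
        except Exception:
            job.attempts += 1
            if job.attempts > self.max_retries:
                self.dead_letter.append(job)  # unrecoverable: park it for inspection
            else:
                job.seq = next(self._seq)  # re-queue behind peers of the same priority
                heapq.heappush(self._heap, job)
        return True
```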

Tech Stack: FastAPI Redis Docker Swarm Jinja2


Medium Complexity - Full-Stack Application

Full-stack financial management application for tracking installments and payments

  • Backend: High-performance API built with FastAPI, using SQLAlchemy for ORM with a PostgreSQL database.
  • Frontend: Modern and responsive UI built with React, TypeScript, and Vite, styled with Tailwind CSS and Shadcn UI.
  • Asynchronous Tasks: Celery and Redis manage background jobs like sending OTP and due date notification emails.
  • Authentication: Secure JWT-based authentication with role-based access for customers and admins.
  • Data Management: Alembic handles database schema migrations, and TanStack Query manages server state on the frontend.
  • DevOps: Fully containerized with Docker and Docker Compose for reproducible development and deployment environments.

Tech Stack: FastAPI React TypeScript PostgreSQL SQLAlchemy Redis Celery Docker Tailwind CSS Shadcn UI Alembic


High Complexity - Async Communication

Production-ready notification microservice with service-to-service auth and distributed rate limiting

  • Modern Backend: Python + FastAPI with async API endpoints, direct RabbitMQ consumer (no Celery) with pika for simplified ops
  • Multi-Channel Delivery: Email (SendGrid), SMS (Twilio), Push (Firebase) via factory pattern
  • JWT Service Auth: Scoped tokens for service-to-service auth with 60-min expiry
  • Rate Limiting: Redis token bucket (100 req/min + burst per service) with Lua script atomic ops
  • Redis Caching: 30s TTL cache for notification lookups with automatic invalidation on status update
  • Idempotency + Retry: Worker skips already-sent notifications; 3 retry attempts (1s, 2s, 4s delays)
  • Structured JSON Logging: Production observability via extra={} dict pattern
  • Containerized: Docker Compose with PostgreSQL, RabbitMQ, Redis services
  • Comprehensive Testing: pytest suite with unit and integration tests
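The per-service token bucket can be sketched as follows. The description gives the 100 req/min refill rate but not the burst size, so `burst=20` is an assumption, as is the in-memory storage; the Redis version runs the refill-and-take step as an atomic Lua script. `TokenBucketLimiter` is a hypothetical name.

```python
import time
from typing import Callable, Dict, Tuple


class TokenBucketLimiter:
    """Per-service token bucket: refills at rate_per_minute, holds at most `burst` tokens."""

    def __init__(self, rate_per_minute: float = 100, burst: int = 20,
                 clock: Callable[[], float] = time.monotonic):
        self.rate = rate_per_minute / 60.0  # tokens added per second
        self.capacity = burst               # hypothetical burst size
        self.clock = clock
        self._buckets: Dict[str, Tuple[float, float]] = {}  # service -> (tokens, last_refill)

    def allow(self, service: str, cost: float = 1.0) -> bool:
        now = self.clock()
        # A service seen for the first time starts with a full bucket.
        tokens, last = self._buckets.get(service, (float(self.capacity), now))
        # Refill for the elapsed time, capped at the bucket capacity.
        tokens = min(self.capacity, tokens + (now - last) * self.rate)
        if tokens >= cost:
            self._buckets[service] = (tokens - cost, now)
            return True
        self._buckets[service] = (tokens, now)
        return False
```

Unlike a fixed or sliding window, the bucket lets a quiet service spend its saved-up burst at once and then throttles it back to the steady refill rate.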

Tech Stack: FastAPI Python pika PostgreSQL RabbitMQ Redis SQLAlchemy Docker pytest JWT SendGrid Twilio Firebase
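The idempotency check and the 1s, 2s, 4s backoff schedule can be combined in one worker function. This sketch assumes "3 retry attempts" means one initial try plus three retries, and uses a set of sent IDs where the service would consult its stored notification status; `process_notification` and `retry_delays` are hypothetical names. `sleep` is injectable so the schedule can be tested without waiting.

```python
import time
from typing import Callable, List, Set


def retry_delays(retries: int = 3, base: float = 1.0) -> List[float]:
    """Exponential backoff schedule: 1s, 2s, 4s for three retries."""
    return [base * 2 ** i for i in range(retries)]


def process_notification(notification_id: str, send: Callable[[], None], sent_ids: Set[str],
                         sleep: Callable[[float], None] = time.sleep) -> str:
    # Idempotency: a redelivered message for an already-sent notification is a no-op.
    if notification_id in sent_ids:
        return "skipped"
    delays = retry_delays()
    attempts = 0
    while True:
        try:
            send()
        except Exception:
            if attempts == len(delays):
                return "failed"  # initial try plus all retries exhausted
            sleep(delays[attempts])
            attempts += 1
        else:
            # Hypothetical: the real service would persist a "sent" status instead.
            sent_ids.add(notification_id)
            return "sent"
```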


Medium Complexity - HA Architecture

Enterprise-grade Todo application with AWS infrastructure

  • Engineered full-stack application with FastAPI backend and React frontend
  • Implemented Infrastructure as Code using Pulumi for AWS resource management
  • Designed fault-tolerant architecture with load balancing across multiple AZs
  • Built PostgreSQL replication system with automated backup/recovery
  • Integrated Redis Sentinel for high availability caching

Tech Stack: FastAPI React AWS EC2 PostgreSQL Redis Sentinel Nginx Docker


💼 Professional Experience

Backend Engineer & Product Builder

August 2024 - Present | Portfolio Projects

🎯 Building production systems to demonstrate platform engineering capabilities while actively seeking full-time opportunities

  • Architected and deployed 5 production-grade applications serving 5,000+ real users across e-commerce, fintech, and SaaS domains
  • Managed 11+ EC2 instances with 99.9% uptime through multi-AZ AWS infrastructure with automated deployment
  • Built ElastiKube: ML-enhanced Kubernetes autoscaler achieving 60% cost reduction with 4-layer intelligent scaling (time-aware, flash sale detection, Prophet forecasting)
  • Engineered polyglot URL shortener with Go redirect service achieving sub-1ms latency and comprehensive observability (OpenTelemetry, Tempo, Loki, Grafana)
  • Automated infrastructure deployment with Pulumi & Ansible, reducing deployment time by 93.75% (4 hours → 15 minutes)

Tech Stack: Python, Go, FastAPI, Django, AWS, Kubernetes, Docker, PostgreSQL, Redis, MongoDB, Pulumi, Ansible, OpenTelemetry, Grafana


Backend Developer @ Cooking Station

June 2024 - August 2024 | Dhaka, Bangladesh

🎯 Delivered measurable business impact:

  • Designed role-based admin dashboard for 200+ users with real-time meal analytics
  • Automated 40% of manual effort in account management through intelligent workflows
  • Built production-ready meal scheduling system using cron jobs with configurable time boundaries

Tech Stack: Python, Django, PostgreSQL, Docker, JavaScript, HTML/CSS


🎓 Education

Bachelor of Science in Computer Science & Engineering
Daffodil International University | September 2017 - December 2022


πŸ“ Technical Writing & Community Impact

πŸ“– Published Articles

🌍 Community Contributions

  • 200,000+ readers on Quora with tech insights in Bengali
  • Nearly 200 followers engaging with technology content
  • 40+ instructional videos on YouTube bridging Bengali tech education gap

🧠 Problem Solving & Competitive Programming

  • 500+ Problems Solved across multiple platforms
  • Active on: BeeCrowd, LightOJ, HackerRank, LeetCode
  • Contest Achievements:
    • DIU Take-Off Programming Contest (Ranked 6th out of 300 participants)
    • Multiple university-level programming contest participations

📈 Coding Profiles

BeeCrowd HackerRank LeetCode


🌱 Currently Learning

  • 🐳 Kubernetes - Container orchestration at scale

🤝 Let's Collaborate!

I'm actively seeking opportunities to work on:

  • πŸ—οΈ Distributed systems requiring high availability and fault tolerance
  • ☁️ Cloud-native applications with automated infrastructure
  • πŸ”„ Microservices architectures with proper observability
  • πŸ“š Open-source projects where I can contribute infrastructure expertise

📫 Get In Touch

Looking for a backend engineer who can:

  • ✅ Design scalable distributed systems
  • ✅ Automate infrastructure from scratch
  • ✅ Write clean, testable, maintainable code
  • ✅ Document complex architectures clearly

Let's build something amazing together!


"Building robust systems that scale, one commit at a time" 🚀


⭐ If you find my projects useful, consider starring them!
