Backend Engineer working on high-concurrency distributed systems and AI infrastructure
Built systems sustaining ~850 req/sec with sub-500ms latency under real-world load
Open to remote opportunities and global collaboration
- High-concurrency backend systems (500–850+ req/sec under load)
- Distributed AI systems (RAG, LLM orchestration, vector search)
- Event-driven architectures (Kafka, async pipelines)
- API-first platforms for integrations and automation
- Low-latency systems optimized using caching, batching, and routing
Production-grade distributed backend system designed to handle real-world AI workloads at scale.
- ~850 req/sec throughput (load-tested)
- 500+ concurrent requests with stable latency
- 100K+ documents processed
- Async FastAPI services (stateless, horizontally scalable)
- Redis distributed caching
- FAISS vector index for semantic retrieval
- Kafka-based event pipelines
- Multi-LLM routing layer with fallback handling
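The routing layer with fallback handling can be sketched as a priority list of providers tried in order, moving to the next on timeout or failure. This is a minimal stdlib-only illustration: the provider functions are hypothetical stand-ins for real LLM SDK calls, and the 2-second timeout is an assumed value.

```python
import asyncio

# Hypothetical providers; in production these would wrap real LLM client SDKs.
async def call_primary(prompt: str) -> str:
    raise TimeoutError("primary provider unavailable")  # simulate an outage

async def call_fallback(prompt: str) -> str:
    return f"fallback answer for: {prompt}"

async def route(prompt: str, providers) -> str:
    """Try providers in priority order; fall through to the next on failure."""
    last_exc = None
    for provider in providers:
        try:
            # Bound each attempt so a slow provider cannot stall the request.
            return await asyncio.wait_for(provider(prompt), timeout=2.0)
        except (TimeoutError, asyncio.TimeoutError) as exc:
            last_exc = exc  # record the error and try the next provider
    raise RuntimeError("all providers failed") from last_exc

result = asyncio.run(route("hello", [call_primary, call_fallback]))
```

Here the primary provider fails, so the request is transparently served by the fallback; callers never see the intermediate error.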
- Stateless architecture for horizontal scaling
- Async pipelines for non-blocking execution
- Cache-first design to reduce latency and cost
- Backpressure handling for stability
- API-first approach for extensibility
- ~40% latency reduction
- ~30% cost reduction
- Stable under sustained and burst traffic
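The cache-first design noted above checks the cache before doing any expensive work (an LLM call or vector search), which is where latency and cost savings come from. A minimal sketch, using an in-process dict with a TTL as a stand-in for Redis; the key names and 60-second TTL are illustrative assumptions:

```python
import time

cache: dict[str, tuple[float, str]] = {}  # in-memory stand-in for Redis
TTL = 60.0  # seconds; assumed expiry window

calls = 0  # counts how often we fall through to the expensive path

def expensive_lookup(key: str) -> str:
    global calls
    calls += 1
    return f"value:{key}"  # stand-in for an LLM call or vector search

def get(key: str) -> str:
    entry = cache.get(key)
    if entry is not None and time.monotonic() - entry[0] < TTL:
        return entry[1]  # cache hit: skip the expensive call entirely
    value = expensive_lookup(key)
    cache[key] = (time.monotonic(), value)  # populate on miss
    return value

a = get("doc:1")  # miss: computes and caches
b = get("doc:1")  # hit: served from cache, no second computation
```

With Redis the same pattern uses `GET`/`SETEX`, and the cache is shared across stateless service instances rather than per-process.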
```mermaid
graph TD
    Client --> CDN
    CDN --> LB[Load Balancer]
    LB --> API[FastAPI Gateway]
    API --> Cache[Redis Cache]
    API --> Workers[Async Workers]
    Workers --> VectorDB[FAISS Index]
    VectorDB --> LLM[LLM Providers]
    LLM --> Response
```
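The retrieval step in the diagram — workers querying the FAISS index for semantically similar documents — reduces to nearest-neighbour search over embedding vectors. A toy brute-force sketch in pure Python (FAISS does the same ranking over millions of vectors with optimized index structures; the document names and vectors here are fabricated examples):

```python
import math

# Toy embeddings; in the real system these come from an embedding model
# and are stored in a FAISS index rather than a dict.
docs = {
    "doc_a": [1.0, 0.0, 0.0],
    "doc_b": [0.6, 0.8, 0.0],
    "doc_c": [0.0, 0.0, 1.0],
}

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv)

def search(query_vec, k=2):
    """Rank all documents by cosine similarity and return the top k."""
    scored = sorted(docs, key=lambda d: cosine(query_vec, docs[d]), reverse=True)
    return scored[:k]

top = search([1.0, 0.1, 0.0])  # closest to doc_a, then doc_b
```

The retrieved documents are then passed as context to the LLM routing layer, which is the core of the RAG flow shown above.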
- Distributed systems and system design
- High-throughput backend engineering
- AI infrastructure and LLM systems
- Performance optimization and reliability
- Backend Engineer (Distributed Systems)
- AI Infrastructure
- High-Concurrency Systems