
Devesh Chauhan

Backend Engineer working on high-concurrency distributed systems and AI infrastructure

Built systems handling ~850 req/sec with sub-500ms latency under real-world load

Open to remote opportunities and global collaboration



What I Build

  • High-concurrency backend systems (500–850+ req/sec under load)
  • Distributed AI systems (RAG, LLM orchestration, vector search)
  • Event-driven architectures (Kafka, async pipelines)
  • API-first platforms for integrations and automation
  • Low-latency systems optimized using caching, batching, and routing
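A minimal sketch of the cache-first, low-latency pattern listed above. An in-memory dict stands in for Redis, and `expensive_lookup` is an illustrative placeholder for a real backend call (vector search, LLM inference); none of these names are from the production system.

```python
import time

cache: dict[str, str] = {}  # stand-in for a Redis instance


def expensive_lookup(query: str) -> str:
    """Simulates a slow backend call (e.g. retrieval + LLM inference)."""
    time.sleep(0.05)
    return f"result for {query}"


def get_or_compute(query: str) -> str:
    # Cache-first: serve from cache when possible; compute and store otherwise.
    if query in cache:
        return cache[query]
    result = expensive_lookup(query)
    cache[query] = result
    return result


first = get_or_compute("hello")   # slow path, populates the cache
second = get_or_compute("hello")  # fast path, served from cache
```

In the real system the cache layer would also carry a TTL and an invalidation policy, which this sketch omits.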

Flagship System — High-Concurrency AI Platform

Production-grade distributed backend system designed to handle real-world AI workloads at scale.

Scale

  • ~850 req/sec throughput (load-tested)
  • 500+ concurrent requests with stable latency
  • 100K+ documents processed

Architecture

  • Async FastAPI services (stateless, horizontally scalable)
  • Redis distributed caching
  • FAISS vector index for semantic retrieval
  • Kafka-based event pipelines
  • Multi-LLM routing layer with fallback handling
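The multi-LLM routing layer with fallback can be sketched roughly as follows. The provider functions here are in-memory stubs, not the actual provider clients; the real layer would wrap external LLM APIs and likely add timeouts and retry budgets.

```python
import asyncio


# Illustrative stub providers; real ones would call external LLM APIs.
async def primary_llm(prompt: str) -> str:
    raise TimeoutError("primary provider unavailable")


async def fallback_llm(prompt: str) -> str:
    return f"fallback answer for: {prompt}"


async def route(prompt: str, providers) -> str:
    """Try each provider in priority order; fall through on failure."""
    last_error = None
    for provider in providers:
        try:
            return await provider(prompt)
        except Exception as exc:
            last_error = exc  # remember why this provider failed
    raise RuntimeError("all providers failed") from last_error


answer = asyncio.run(route("hello", [primary_llm, fallback_llm]))
```

Here the primary provider fails, so the router transparently serves the request from the fallback.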

Engineering Decisions

  • Stateless architecture for horizontal scaling
  • Async pipelines for non-blocking execution
  • Cache-first design to reduce latency and cost
  • Backpressure handling for stability
  • API-first approach for extensibility

Impact

  • ~40% latency reduction
  • ~30% cost reduction
  • Stable under sustained and burst traffic

System Architecture

```mermaid
graph TD
    Client --> CDN
    CDN --> LB[Load Balancer]
    LB --> API[FastAPI Gateway]
    API --> Cache[Redis Cache]
    API --> Workers[Async Workers]
    Workers --> VectorDB[FAISS Index]
    VectorDB --> LLM[LLM Providers]
    LLM --> Response
```

Focus

  • Distributed systems and system design
  • High-throughput backend engineering
  • AI infrastructure and LLM systems
  • Performance optimization and reliability

Positioning

  • Backend Engineer (Distributed Systems)
  • AI Infrastructure
  • High-Concurrency Systems

Pinned Repositories

  1. OmniChat-AI (Public)

     High-concurrency LLM backend with multi-model routing, async architecture, and cost optimization.

  2. EnterpriseRAG-AI (Public)

     High-concurrency distributed RAG system with multi-tenant architecture, async APIs, and low-latency retrieval.