NVIDIA Dynamo

Dynamo is a new modular inference framework designed for serving large language models (LLMs) in multi-node distributed environments. It enables seamless scaling of inference workloads across GPU nodes and the dynamic allocation of GPU workers to address traffic bottlenecks at various stages of the model pipeline.

This GitHub organization hosts repositories for Dynamo's core components and integrations, including:

Core Framework

  • Distributed inference runtime with Rust-based orchestration
  • Python bindings for workflow customization
  • Multi-GPU/multi-node serving capabilities

LLM Optimized Components

  • Disaggregated Serving Engine: Decouples prefill from decode so that throughput can be maximized while still meeting latency SLOs
  • Intelligent Routing System: Prefix-based and load-aware request distribution
  • KV Cache Management: Manages the KV cache across distributed workers
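
To make the idea of prefix-based and load-aware routing concrete, here is a minimal sketch. It is not Dynamo's router or its API: the `Worker` record, the `route` function, and the `load_penalty` weight are all hypothetical names chosen for illustration. The sketch assumes each worker advertises the token prefixes it has cached and its number of in-flight requests, and it scores a worker by how much prefix it could reuse minus a penalty for its current load.

```python
from dataclasses import dataclass, field


@dataclass
class Worker:
    """Hypothetical worker record (not a Dynamo type)."""
    name: str
    active_requests: int = 0
    # Token prefixes this worker is assumed to hold in its KV cache.
    cached_prefixes: list = field(default_factory=list)


def shared_prefix_len(a, b):
    """Count the leading tokens that two sequences have in common."""
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n


def route(tokens, workers, load_penalty=2.0):
    """Pick the worker with the highest (prefix reuse - load penalty) score."""
    def score(worker):
        reuse = max(
            (shared_prefix_len(tokens, p) for p in worker.cached_prefixes),
            default=0,
        )
        return reuse - load_penalty * worker.active_requests

    return max(workers, key=score)


# Worker "b" already caches most of the prompt, so it wins while lightly loaded.
idle = Worker("a")
warm = Worker("b", active_requests=1, cached_prefixes=[(1, 2, 3, 4, 5, 6)])
chosen = route((1, 2, 3, 4, 5, 6, 7), [idle, warm])

# Once "b" is busy enough, the load penalty outweighs the cache hit.
busy = Worker("b", active_requests=4, cached_prefixes=[(1, 2, 3, 4, 5, 6)])
fallback = route((1, 2, 3, 4, 5, 6, 7), [idle, busy])
```

In this toy scoring, `chosen` is worker "b" and `fallback` is worker "a". A production router would weigh many more signals; the point here is only the trade-off between cache reuse and load.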

NVIDIA Inference Xfer Library (NIXL)

  • Abstracts memory across heterogeneous devices (CPU, GPU, and storage) and provides efficient, low-latency communication among them
  • Integrates with distributed inference servers such as Dynamo. The library targets distributed inference communication patterns so that the KV cache can be transferred efficiently in disaggregated LLM serving platforms.
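
The hand-off that disaggregated serving relies on can be pictured with a small sketch. This is not the NIXL API: `DeviceKind`, `MemoryRegion`, `transfer`, `prefill`, and `decode_step` are hypothetical names, and plain `bytearray` buffers stand in for real device memory. The sketch assumes a prefill worker writes KV state into its own region, a single device-agnostic `transfer` call copies it to the decode worker's region, and decoding then proceeds from the copy.

```python
from dataclasses import dataclass
from enum import Enum


class DeviceKind(Enum):
    CPU = "cpu"
    GPU = "gpu"
    STORAGE = "storage"


@dataclass
class MemoryRegion:
    """Hypothetical descriptor: device kind, owning worker, and the buffer."""
    kind: DeviceKind
    owner: str
    data: bytearray


def transfer(src, dst, nbytes):
    """Copy nbytes from src to dst through one code path for every device kind."""
    if nbytes > len(src.data) or nbytes > len(dst.data):
        raise ValueError("transfer exceeds region size")
    dst.data[:nbytes] = src.data[:nbytes]
    return nbytes


def prefill(prompt_tokens, region):
    """Stand-in for prefill: store one byte per prompt token as fake KV state."""
    kv = bytes(t % 256 for t in prompt_tokens)
    if len(kv) > len(region.data):
        raise ValueError("prompt does not fit in region")
    region.data[:len(kv)] = kv
    return len(kv)


def decode_step(region, kv_len):
    """Stand-in for decode: derive a next token from the transferred KV bytes."""
    return sum(region.data[:kv_len]) % 256


# Prefill on one worker, move the KV bytes, then decode on another worker.
prefill_mem = MemoryRegion(DeviceKind.GPU, "prefill-0", bytearray(16))
decode_mem = MemoryRegion(DeviceKind.CPU, "decode-0", bytearray(16))
kv_len = prefill([3, 5, 7], prefill_mem)
transfer(prefill_mem, decode_mem, kv_len)
next_token = decode_step(decode_mem, kv_len)
```

The caller never branches on `DeviceKind`; keeping the device details behind one transfer call is the design point this sketch is meant to show.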

Getting Started

To learn more about the NVIDIA Dynamo inference serving platform, see the Dynamo developer page and the Quickstart Guide, which covers container setup and basic workflows.

Documentation

User documentation on Dynamo features, APIs, and architecture is located in the Dynamo documents folder on GitHub.

FAQ

Consult the Dynamo FAQ Guide for frequently asked questions and answers.

Contribution & Support

  • Follow Contribution Guidelines
  • Report issues via GitHub Discussions
  • Enterprise support available through NVIDIA AI Enterprise

License

Apache 2.0 licensed with third-party attributions documented in each repository.

Note

This project is currently in alpha. APIs and components may change based on community feedback.

Pinned

  1. dynamo (Public)

    A Datacenter Scale Distributed Inference Serving Framework

    Rust, 6.6k stars, 1k forks

  2. nixl (Public)

    NVIDIA Inference Xfer Library (NIXL)

    C++, 997 stars, 300 forks

  3. aiconfigurator (Public)

    Offline optimization of your disaggregated Dynamo graph

    Python, 273 stars, 104 forks

  4. aiperf (Public)

    AIPerf is a comprehensive benchmarking tool that measures the performance of generative AI models served by your preferred inference solution.

    Python, 242 stars, 62 forks

  5. grove (Public)

    Kubernetes enhancements for Network Topology Aware Gang Scheduling & Autoscaling

    Go, 196 stars, 54 forks

  6. modelexpress (Public)

    Model Express is a Rust-based component meant to be placed next to existing model inference systems to speed up their startup times and improve overall performance.

    Rust, 55 stars, 21 forks

Repositories

Showing 10 of 11 repositories
  • nixl (Public)

    NVIDIA Inference Xfer Library (NIXL)

    C++, 997 stars, 300 forks, 42 open issues, 111 open pull requests, updated Apr 23, 2026
  • velo (Public)

    Rust, 3 stars, Apache-2.0, 1 fork, 0 open issues, 5 open pull requests, updated Apr 23, 2026
  • dynamo (Public)

    A Datacenter Scale Distributed Inference Serving Framework

    Rust, 6,625 stars, 1,047 forks, 219 open issues (11 issues need help), 443 open pull requests, updated Apr 23, 2026
  • aiperf (Public)

    AIPerf is a comprehensive benchmarking tool that measures the performance of generative AI models served by your preferred inference solution.

    Python, 242 stars, Apache-2.0, 62 forks, 16 open issues (1 issue needs help), 61 open pull requests, updated Apr 23, 2026
  • aiconfigurator (Public)

    Offline optimization of your disaggregated Dynamo graph

    Python, 273 stars, Apache-2.0, 104 forks, 21 open issues, 21 open pull requests, updated Apr 23, 2026
  • modelexpress (Public)

    Model Express is a Rust-based component meant to be placed next to existing model inference systems to speed up their startup times and improve overall performance.

    Rust, 55 stars, Apache-2.0, 21 forks, 5 open issues, 20 open pull requests, updated Apr 23, 2026
  • grove (Public)

    Kubernetes enhancements for Network Topology Aware Gang Scheduling & Autoscaling

    Go, 196 stars, Apache-2.0, 54 forks, 45 open issues (5 issues need help), 23 open pull requests, updated Apr 22, 2026
  • flextensor (Public)

    FlexTensor is a tensor offloading and management library for PyTorch that enables running large models on limited GPU memory by intelligently offloading tensors between GPU and CPU memory.

    Python, 94 stars, Apache-2.0, 11 forks, 0 open issues, 0 open pull requests, updated Apr 19, 2026
  • enhancements (Public)

    Enhancement Proposals and Architecture Decisions

    8 stars, Apache-2.0, 13 forks, 1 open issue, 51 open pull requests, updated Apr 14, 2026
  • aitune (Public)

    NVIDIA AITune is an inference toolkit designed for tuning and deploying Deep Learning models with a focus on NVIDIA GPUs.

    Python, 256 stars, Apache-2.0, 29 forks, 0 open issues, 0 open pull requests, updated Mar 13, 2026
