
Dynamo Release v0.2.0


@nv-anants nv-anants released this 01 May 00:33
ca728f6

Dynamo is an open source project under the Apache 2.0 license. The primary distribution is via pip wheels with minimal binary size. The ai-dynamo GitHub org hosts two repositories: dynamo and NIXL.

Dynamo is designed as the ideal next-generation inference server, building on the foundations of the Triton Inference Server. While Triton focuses on single-node inference deployments, we are committed to integrating its robust single-node capabilities into Dynamo over the next several months. We will maintain ongoing support for Triton while ensuring a seamless migration path to Dynamo for existing users once feature parity is achieved. As a vendor-agnostic serving framework, Dynamo supports multiple LLM inference engines, including TRT-LLM, vLLM, and SGLang, with varying degrees of maturity and support.
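Since the primary distribution is pip wheels, a typical install looks like the sketch below. The package name `ai-dynamo` and the pinned version are assumptions based on the project name and this release tag; check the project README for the authoritative command and any extras.

```shell
# Sketch: install the Dynamo wheel from PyPI, pinned to this release.
# Package name "ai-dynamo" is an assumption; verify against the README.
pip install "ai-dynamo==0.2.0"

# Confirm what was installed.
pip show ai-dynamo
```

Pinning the version keeps deployments reproducible across the frequent pre-1.0 releases.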

Dynamo v0.2.0 features:

  • GB200 support with ARM builds (Note: currently requires a container build)
  • Planner - new experimental support for spinning workers up and down based on load
  • Improved K8s deployment workflow
    • Installation wizard to enable easy configuration of Dynamo on your Kubernetes cluster
    • CLI to manage your operator-based deployments
    • Consolidated Custom Resources for Dynamo deployments
    • Documentation improvements (including Minikube guide to installing Dynamo Platform)

Future plans

Dynamo Roadmap

Known Issues

  • Benchmark guides are still being validated on public cloud instances (GCP / AWS)
  • Benchmarks on internal clusters show a 15% degradation from the results displayed in the summary graphs for the multi-node 70B configuration; this is under investigation.
  • TensorRT-LLM examples do not work in this release; fixes are landing in main.

What's Changed

New Contributors

Full Changelog: v0.1.1...v0.2.0