Dynamo Release v0.2.0
Dynamo is an open source project released under the Apache 2.0 license. It is distributed primarily as pip wheels with minimal binary size. The ai-dynamo GitHub org hosts two repos: dynamo and NIXL. Dynamo is designed as the next-generation inference server, building on the foundations of the Triton Inference Server. While Triton focuses on single-node inference deployments, we are committed to integrating its robust single-node capabilities into Dynamo within the next several months. We will maintain ongoing support for Triton while ensuring a seamless migration path for existing users to Dynamo once feature parity is achieved. As a vendor-agnostic serving framework, Dynamo supports multiple LLM inference engines, including TRT-LLM, vLLM, and SGLang, with varying degrees of maturity and support.
Dynamo v0.2.0 features:
- GB200 support with ARM builds (Note: currently requires a container build)
- Planner - new experimental support for spinning workers up and down based on load
- Improved K8s deployment workflow
  - Installation wizard to enable easy configuration of Dynamo on your Kubernetes cluster
  - CLI to manage your operator-based deployments
  - Consolidate Custom Resources for Dynamo Deployments
- Documentation improvements (including Minikube guide to installing Dynamo Platform)
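As an illustration of the idea behind the Planner item above (adding or removing workers as load changes), here is a minimal Python sketch. It is not Dynamo's Planner implementation or API: `ScalingPolicy`, `target_worker_count`, and all threshold values are hypothetical names and numbers assumed for this example only.

```python
from dataclasses import dataclass


@dataclass
class ScalingPolicy:
    # All values below are hypothetical defaults chosen for illustration.
    scale_up_load: float = 0.8    # average load above which one worker is added
    scale_down_load: float = 0.3  # average load below which one worker is removed
    min_workers: int = 1
    max_workers: int = 8


def target_worker_count(current: int, avg_load: float, policy: ScalingPolicy) -> int:
    """Return the desired worker count for a single planning step."""
    if avg_load > policy.scale_up_load:
        desired = current + 1
    elif avg_load < policy.scale_down_load:
        desired = current - 1
    else:
        desired = current
    # Clamp to the configured bounds so the pool never empties or grows unbounded.
    return max(policy.min_workers, min(policy.max_workers, desired))
```

Using two separate thresholds leaves a band of load values where the worker count is unchanged, which avoids flapping when load hovers near a single cutoff.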
Future plans
Known Issues
- Benchmark guides are still being validated on public cloud instances (GCP / AWS)
- Benchmarks on internal clusters show a 15% degradation relative to the results displayed in the summary graphs for multi-node 70B; this is being investigated.
- TensorRT-LLM examples do not currently work in this release; fixes are in progress in main.
What's Changed
- fix: fix max_local_prefill_length not being printed out in disagg router log by @tedzhouhk in #628
- docs: Add instructions to install git lfs by @tanmayv25 in #627
- fix: add DYNAMO_HOME env var to vLLM docker image by @nv-anants in #629
- fix: Account for Metrics.decode() changes by @rmccorm4 in #619
- fix: Update test_report by @pvijayakrish in #641
- fix: serviceArgs in config was not getting set for workers by @mohammedabdulwahhab in #640
- fix: adding conversion to string for notif id comparison by @nnshah1 in #638
- docs: Add documentation for UCX KV cache transfer in TRTLLM by @tanmayv25 in #639
- build: Define UCX env var to use NVLink when available by @tanmayv25 in #631
- feat: ETCD prefix watcher + python binding + runtime reconfiguration for router and disagg router by @tedzhouhk in #581
- fix: dynamo build should work with link syntax by @mohammedabdulwahhab in #646
- fix: change trtllm kv_router default block_size to 32 by @ziqif-nv in #642
- fix: signal handlers to clean up zombie vllm processes by @ishandhanani in #545
- feat: add .devcontainer based off images in container/ by @alec-flowers in #497
- fix: devcontainer mounts and vllm c api by @alec-flowers in #663
- fix: deploy command should support passing config by @mohammedabdulwahhab in #626
- feat(dynamo-run): improve available engines list in --help by @XueSongTap in #664
- feat: add dynamoDeployment CR finalizer by @julienmancuso in #623
- fix: set correct parent_hash for each kv block when publish kv events by @ziqif-nv in #671
- docs: Use the same term for dynamo base image across code snippets and text by @hutm in #670
- docs: move deploy docs to docs/guides by @hhzhang16 in #674
- fix: frontend and http server signal handling by @alec-flowers in #677
- fix: check for resource in pipeline helm chart by @julienmancuso in #687
- fix: ensure VLLM_LOGGING_LEVEL=xyz follows DYN_LOG=xyz by @ishandhanani in #692
- feat: replace dynamo server with dynamo cloud by @hhzhang16 in #696
- feat: base Dynamo docker image improvements and fixes by @hhzhang16 in #658
- fix: fix pipeline helm chart by @julienmancuso in #698
- docs: Benchmarking guide updates by @kthui in #678
- feat: bump vLLM version to v0.8.4 by @ptarasiewiczNV in #690
- chore: Replace TRD->Dynamo in llmctl help output by @rmccorm4 in #710
- fix: allow for an empty dynamo config file by @hhzhang16 in #712
- fix: cli version by @ishandhanani in #716
- docs: Remove outdated python-wheels directory reference by @rmccorm4 in #719
- fix: direct clients vs dependencies by @ishandhanani in #704
- feat: adding dynamo-tokens crate by @ryanolson in #718
- fix: bump GAP to r25.03 by @tedzhouhk in #724
- feat: make ingress configurable in operator by @julienmancuso in #717
- feat: configure logger with detail info by @tlipoca9 in #654
- feat: Add disagg skeleton example by @kylehh in #683
- fix: dynamo deploy helm chart cleanup by @mohammedabdulwahhab in #727
- docs: add dedicated minikube guide by @mohammedabdulwahhab in #735
- feat(dynamo-engine-vllm): vllm 0.8.X support by @grahamking in #728
- feat: gracefully shutdown endpoint by revoking etcd lease + python binding by @tedzhouhk in #730
- fix: Add missing deps for '--framework none' build by @rmccorm4 in #738
- chore: Remove TRT-LLM C++ engine in favor of Python one by @grahamking in #747
- docs: Support matrix post release. by @pvijayakrish in #736
- docs: add aggregated deployment guide for multi-node sized model by @GuanLuo in #713
- feat: make the model name to be the same as the HF repo name for dynamo-run by @AndyDai-nv in #749
- feat: add additional packages to log filters by @abrarshivani in #752
- chore(dynamo-run): Fix echo_core for EOS tokens by @grahamking in #759
- feat: add custom lease to worker components by @ishandhanani in #748
- chore: Add roadmap to main README.md by @harryskim in #763
- feat: MLA disaggregation support to vLLM patch by @ptarasiewiczNV in #745
- fix: Fix cancellation flow in python component graph by @pankajroark in #765
- fix: give the user ownership permissions of /opt/dynamo/venv by @hhzhang16 in #767
- docs: deployment docs improvements by @hhzhang16 in #753
- feat: add option to configure separate docker registry for pipelines docker images by @julienmancuso in #744
- chore: Update bug report to use dynamo env for collecting environment information by @nv-tusharma in #558
- docs: R1 disaggregation guide by @GuanLuo in #720
- feat: allow to CRUD dynamo pipelines by @julienmancuso in #761
- docs: Custom Backend/Worker Guide by @rmccorm4 in #608
- chore: fix arg name in example by @CormickKneey in #770
- build: add rust binaries in manylinux image by @nv-anants in #783
- feat: remove bento/yatai references by @julienmancuso in #782
- docs: add note to use release branch examples by @nv-anants in #793
- feat: Add log verbosity level flag to dynamo-run cli by @abrarshivani in #780
- feat: rename operator CRDs by @julienmancuso in #795
- feat: Add linux aarch64 support to dynamo-run build by @rmccorm4 in #802
- fix: Update TRTLLM version and fix disagg workflow by @tanmayv25 in #804
- chore: Increase sleep times from 2s -> 30s for startup logs by @rmccorm4 in #807
- feat: Warm-up mistral.rs engine to reduce latency on subsequent requests by @abrarshivani in #796
- feat: improve dynamo deployment CLI by @hhzhang16 in #798
- feat: Add unified x86 / aarch64 (ARM) build for TRTLLM image by @rmccorm4 in #803
- feat: remove old bento images by @julienmancuso in #801
- refactor: transition CLI to use typer for UX and testing by @ishandhanani in #703
- docs: Update README.md by @alec-flowers in #821
- feat: remove proxy side car by @julienmancuso in #822
- refactor: refactor dynamo serve part-1/N by @biswapanda in #788
- chore: Publish Model Deployment Card to NATS by @grahamking in #799
- fix: remove dynamo cloud login by @mohammedabdulwahhab in #824
- fix: Change default vLLM router to round-robin by @piotrm-nvidia in #597
- build: update cudarc dependency to crate version by @nv-anants in #815
- feat: add network configuration wizard during platform install by @julienmancuso in #820
- fix: add VLLM_KV_CAPI_PATH to vllm dockerfile to make kv routing working by @ziqif-nv in #832
- chore: update vllm wheel dependency version by @nv-anants in #828
- feat: misc changes while deploying by @hhzhang16 in #831
- fix: wrong lease_id by @alec-flowers in #833
- chore: bump NIXL version and package versions by @saturley-hall in #836
- feat: local planner for 0.2.0 release by @tedzhouhk in #398
- feat: Add unified x86 / aarch64 (ARM) build for VLLM image (#839) by @rmccorm4 in #871
- refactor: change trtllm example kv routing use python bindings | deal with trtllm partial blocks | trtllm event change (#866) by @ziqif-nv in #877
- chore: bump nixl commit to 0.2.0 rc1 by @saturley-hall in #878
- fix: manylinux tag in ai-dynamo-vllm wheel (#884) by @nv-anants in #887
- fix: add fastapi dependency in pyproject.toml (cherry-pick of #888) by @saturley-hall in #898
- chore: update support matrix by @saturley-hall in #880
- docs: cherry pick docs fixes for dynamo deploy by @mohammedabdulwahhab in #907
- fix: cherry pick fix for VLLM_KV_CAPI_PATH by @nnshah1 in #906
- docs: update pythonpath for starting planner (cherry-pick #890) by @saturley-hall in #908
New Contributors
- @XueSongTap made their first contribution in #664
- @kylehh made their first contribution in #683
- @pankajroark made their first contribution in #765
- @CormickKneey made their first contribution in #770
Full Changelog: v0.1.1...v0.2.0