feat(SGLang): Add DP-aware routing and dp_rank propagation across Prefill/Decode #4221

YAMY1234 · 2025-11-10T20:04:24Z

Overview:

Enable DP-aware routing across Dynamo ↔ SGLang so Prefill and Decode run on the same data_parallel_rank, improving cache locality and correctness in disaggregated mode.

Details:

Router
- StandaloneRouterHandler.best_worker_id() now uses best_worker() and returns (worker_id, dp_rank, overlap_blocks).
Registration
- Surface data_parallel_size in runtime config; log when dp_size > 1.
Handlers
- Decode: extract data_parallel_rank (preserving 0), honor router-selected dp_rank, and inject it into the disaggregated prefill request (post-serialization).
- Prefill: read data_parallel_rank from inner request and pass to engine.
- Backward-compatible with older routers that return a 2-tuple.
Init
- Prefill registers early so router can discover dp_size.
Tests
- tests/router/test_dp_rank_routing.py: validates dp_rank range and basic coverage across ranks.
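
The backward-compatibility point above can be sketched as follows. This is a minimal illustration, not the code in `decode_handler.py`: the helper name `unpack_router_result` is hypothetical, and it assumes a legacy router returns `(worker_id, overlap_blocks)` while a DP-aware router returns `(worker_id, dp_rank, overlap_blocks)`.

```python
from typing import Optional, Sequence, Tuple


def unpack_router_result(result: Sequence) -> Tuple[int, Optional[int], int]:
    """Normalize a router response to (worker_id, dp_rank, overlap_blocks).

    Assumption: legacy routers return (worker_id, overlap_blocks) and
    DP-aware routers return (worker_id, dp_rank, overlap_blocks).
    """
    if len(result) == 3:
        worker_id, dp_rank, overlap_blocks = result
        return worker_id, dp_rank, overlap_blocks
    if len(result) == 2:
        worker_id, overlap_blocks = result
        # No DP information from an older router: leave dp_rank unset.
        return worker_id, None, overlap_blocks
    raise ValueError(f"unexpected router response of length {len(result)}")
```

Returning `None` rather than `0` for the legacy case keeps "no rank selected" distinguishable from "rank 0 selected".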

Conducted sample verification on DP=4

grep 'Selected worker.*dp_rank' /workspace/files/dynamo/router.out | tail -5
2025-11-10T19:06:20.395092Z  INFO dynamo_llm::kv_router::scheduler: Selected worker: worker_id=7587890747736561486 dp_rank=3, logit: 0.188, cached blocks: 0, total blocks: 14737
2025-11-10T19:06:20.467151Z  INFO dynamo_llm::kv_router::scheduler: Selected worker: worker_id=7587890747736561486 dp_rank=2, logit: 0.188, cached blocks: 0, total blocks: 14737
2025-11-10T19:06:22.544802Z  INFO dynamo_llm::kv_router::scheduler: Selected worker: worker_id=7587890747736561486 dp_rank=2, logit: 0.188, cached blocks: 0, total blocks: 14737
2025-11-10T19:06:22.624922Z  INFO dynamo_llm::kv_router::scheduler: Selected worker: worker_id=7587890747736561486 dp_rank=2, logit: 0.188, cached blocks: 0, total blocks: 14737
2025-11-10T19:06:22.694884Z  INFO dynamo_llm::kv_router::scheduler: Selected worker: worker_id=7587890747736561486 dp_rank=3, logit: 0.188, cached blocks: 0, total blocks: 14737
grep 'Routing to prefill dp_rank' /workspace/files/dynamo/decode.out | tail -5
2025-11-10T19:06:20.395973Z  INFO decode_handler.generate: Routing to prefill dp_rank=3   
2025-11-10T19:06:20.468346Z  INFO decode_handler.generate: Routing to prefill dp_rank=2   
2025-11-10T19:06:22.545452Z  INFO decode_handler.generate: Routing to prefill dp_rank=2   
2025-11-10T19:06:22.625552Z  INFO decode_handler.generate: Routing to prefill dp_rank=2   
2025-11-10T19:06:22.695482Z  INFO decode_handler.generate: Routing to prefill dp_rank=3   
grep 'Prefill using dp_rank' /workspace/files/dynamo/prefill.out | tail -5
2025-11-10T19:06:20.397902Z  INFO prefill_handler.generate: Prefill using dp_rank=3   
2025-11-10T19:06:20.470409Z  INFO prefill_handler.generate: Prefill using dp_rank=2   
2025-11-10T19:06:22.548199Z  INFO prefill_handler.generate: Prefill using dp_rank=2   
2025-11-10T19:06:22.627056Z  INFO prefill_handler.generate: Prefill using dp_rank=2   
2025-11-10T19:06:22.697203Z  INFO prefill_handler.generate: Prefill using dp_rank=3
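
The manual grep comparison above could also be automated. The sketch below is only an illustration: the helper `dp_ranks` is hypothetical, and the sample lines are the first three entries of each excerpt quoted above.

```python
import re
from typing import List

DP_RANK_RE = re.compile(r"dp_rank=(\d+)")


def dp_ranks(log_lines: List[str]) -> List[int]:
    """Extract the dp_rank value from every log line that contains one."""
    ranks = []
    for line in log_lines:
        match = DP_RANK_RE.search(line)
        if match:
            ranks.append(int(match.group(1)))
    return ranks


router_log = [
    "2025-11-10T19:06:20.395092Z  INFO dynamo_llm::kv_router::scheduler: Selected worker: worker_id=7587890747736561486 dp_rank=3, logit: 0.188, cached blocks: 0, total blocks: 14737",
    "2025-11-10T19:06:20.467151Z  INFO dynamo_llm::kv_router::scheduler: Selected worker: worker_id=7587890747736561486 dp_rank=2, logit: 0.188, cached blocks: 0, total blocks: 14737",
    "2025-11-10T19:06:22.544802Z  INFO dynamo_llm::kv_router::scheduler: Selected worker: worker_id=7587890747736561486 dp_rank=2, logit: 0.188, cached blocks: 0, total blocks: 14737",
]
decode_log = [
    "2025-11-10T19:06:20.395973Z  INFO decode_handler.generate: Routing to prefill dp_rank=3",
    "2025-11-10T19:06:20.468346Z  INFO decode_handler.generate: Routing to prefill dp_rank=2",
    "2025-11-10T19:06:22.545452Z  INFO decode_handler.generate: Routing to prefill dp_rank=2",
]
prefill_log = [
    "2025-11-10T19:06:20.397902Z  INFO prefill_handler.generate: Prefill using dp_rank=3",
    "2025-11-10T19:06:20.470409Z  INFO prefill_handler.generate: Prefill using dp_rank=2",
    "2025-11-10T19:06:22.548199Z  INFO prefill_handler.generate: Prefill using dp_rank=2",
]

# The rank chosen by the router should reappear, in order, in both handlers.
ranks_match = dp_ranks(router_log) == dp_ranks(decode_log) == dp_ranks(prefill_log)
```

Comparing by position only works when the three logs cover the same requests in the same order, as in the excerpts above; a request ID would be a more robust join key if one is logged.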

Where should the reviewer start?

Routing API change: components/src/dynamo/router/__main__.py
DP rank propagation:
components/src/dynamo/sglang/request_handlers/llm/decode_handler.py
components/src/dynamo/sglang/request_handlers/llm/prefill_handler.py
Registration/init wiring:
components/src/dynamo/sglang/register.py, components/src/dynamo/sglang/main.py
Tests: tests/router/test_dp_rank_routing.py

Related Issues: (use one of the action keywords Closes / Fixes / Resolves / Relates to)

Relates to: #3274

Summary by CodeRabbit

New Features
- Added data-parallel rank-aware routing to intelligently distribute requests across parallel instances.
- Enabled readiness tracking for prefill operations.
- Exposed data-parallel configuration settings in runtime configuration.
Improvements
- Enhanced routing and request handling to properly manage distributed data-parallel setups.
Tests
- Added test coverage for data-parallel rank routing validation and distribution.

copy-pr-bot · 2025-11-10T20:04:27Z

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

github-actions · 2025-11-10T20:04:32Z

👋 Hi YAMY1234! Thank you for contributing to ai-dynamo/dynamo.

Just a reminder: The NVIDIA Test Github Validation CI runs an essential subset of the testing framework to quickly catch errors. Your PR reviewers may elect to test the changes comprehensively before approving your changes.

🚀

coderabbitai · 2025-11-10T20:10:23Z

Walkthrough

This PR implements data-parallel rank-aware routing in the Dynamo system by modifying the router to return 3-tuples with dp_rank information, registering prefill readiness with endpoints, exposing data_parallel_size in runtime configuration, and propagating dp_rank through request handlers (decode and prefill). New tests validate dp_rank coverage and distribution.

Changes

| Cohort / File(s) | Summary |
| --- | --- |
| Router API Refactoring `components/src/dynamo/router/__main__.py` | Modified `best_worker_id` method to call `best_worker` instead; refactored docstring and removed error logging in favor of direct RuntimeError raise. |
| Service Registration & Configuration `components/src/dynamo/sglang/main.py`, `components/src/dynamo/sglang/register.py` | Added prefill readiness registration with input/output types; exposed `data_parallel_size` from server args in runtime config with conditional logging. |
| Request Handlers — DP Rank Propagation `components/src/dynamo/sglang/request_handlers/llm/decode_handler.py`, `components/src/dynamo/sglang/request_handlers/llm/prefill_handler.py` | Enhanced decode and prefill handlers to extract, validate, and propagate `data_parallel_rank` through requests; decode handler unpacks 3-tuple router responses with fallback to 2-tuple; prefill handler consolidates generate kwargs with conditional dp_rank inclusion. |
| DP Rank Routing Tests `tests/router/test_dp_rank_routing.py` | Added test suite for data-parallel rank-aware routing with mocker-based harness; includes coverage validation and dp_rank distribution verification. |

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

- Router response unpacking and backward compatibility: The decode handler now unpacks 3-tuples with dp_rank while maintaining fallback to 2-tuples for older routers; verify edge cases and logging behavior.
- Data-parallel rank propagation chain: Ensure dp_rank flows consistently from router through decode/prefill handlers to engine calls without loss or corruption.
- Request validation and disaggregated format handling: Prefill handler adds validation for disaggregated format; verify error handling and payload structure assumptions.
- Readiness registration and runtime configuration: Confirm prefill readiness registration doesn't conflict with existing endpoint setup and that data_parallel_size exposure is properly initialized.
- Test infrastructure and mocker setup: New tests introduce MockerProcess context and runtime utilities; verify robustness of test isolation and scope handling.

Poem

🐰 Hops through ranks with gleeful cheer,
Data-parallel routing crystal clear,
Prefill whispers, decode responds,
DP-ranks dance through handler hands,
Three-tuples bloom where two once were!

Pre-merge checks

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
| --- | --- | --- | --- |
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 76.92% which is insufficient. The required threshold is 80.00%. | You can run `@coderabbitai generate docstrings` to improve docstring coverage. |

✅ Passed checks (2 passed)

| Check name | Status | Explanation |
| --- | --- | --- |
| Description check | ✅ Passed | The description provides a complete overview, detailed breakdown of changes across Router/Registration/Handlers/Init/Tests, reviewer entry points, and related issue reference, matching the template structure well. |
| Title check | ✅ Passed | The title clearly and specifically describes the main change: adding DP-aware routing and dp_rank propagation across Prefill/Decode handlers, which aligns with the substantial modifications across router, registration, handler, and test files. |

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

Comment `@coderabbitai help` to get the list of available commands and usage tips.

coderabbitai

Actionable comments posted: 1

🧹 Nitpick comments (3)

components/src/dynamo/sglang/request_handlers/llm/decode_handler.py (1)

205-210: Consider using explicit variable initialization instead of locals() check.

The use of 'prefill_dp_rank' in locals() to check if the variable was assigned works, but is fragile and unusual. Consider initializing prefill_dp_rank = None before the router response handling (around line 135) and then setting it conditionally.

Apply this diff to make the logic more explicit:

             if (
                 self.prefill_router_client is not None
                 and self.prefill_router_client.instance_ids()
             ):
+                prefill_dp_rank = None  # Will be set by router if DP-aware
                 token_ids = request["token_ids"]
                 stream = await self.prefill_router_client.generate(token_ids)
                 result = await anext(stream)

Then simplify lines 205-210:

-            # Use router-selected dp_rank (fallback to request-level if not provided)
-            if "prefill_dp_rank" in locals() and prefill_dp_rank is not None:
-                effective_dp_rank = prefill_dp_rank
-            elif data_parallel_rank is not None:
-                effective_dp_rank = data_parallel_rank
-            else:
-                effective_dp_rank = None
+            # Use router-selected dp_rank (fallback to request-level if not provided)
+            effective_dp_rank = prefill_dp_rank if prefill_dp_rank is not None else data_parallel_rank
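
The reason the suggested one-liner uses `is not None` rather than a plain truthiness test can be shown with a small sketch. Both function names here are illustrative and do not appear in the PR.

```python
from typing import Optional


def effective_dp_rank(
    prefill_dp_rank: Optional[int], data_parallel_rank: Optional[int]
) -> Optional[int]:
    """Prefer the router-selected rank; fall back to the request-level rank."""
    return prefill_dp_rank if prefill_dp_rank is not None else data_parallel_rank


def effective_dp_rank_truthy(
    prefill_dp_rank: Optional[int], data_parallel_rank: Optional[int]
) -> Optional[int]:
    """Counter-example: `or` treats rank 0 as missing and skips it."""
    return prefill_dp_rank or data_parallel_rank


# Router picked rank 0, request carried rank 2.
correct = effective_dp_rank(0, 2)         # keeps the router's choice, 0
wrong = effective_dp_rank_truthy(0, 2)    # silently falls through to 2
```

With `prefill_dp_rank` initialized to `None` before the router call, as the review suggests, the `locals()` lookup is no longer needed.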

tests/router/test_dp_rank_routing.py (2)

101-143: Improve test hygiene with unused variable naming and exception logging.

The test logic correctly validates dp_rank bounds and type. However, consider these minor improvements:

Use underscore prefix for intentionally unused unpacked variables

Use logging.exception instead of logging.error in exception handlers to include traceback

Apply these diffs:

             if hasattr(kv_push_router, "best_worker"):
-                worker_id, dp_rank, overlap = await kv_push_router.best_worker(
+                _worker_id, dp_rank, _overlap = await kv_push_router.best_worker(
                     [1, 2, 3, 4, 5]
                 )

         except Exception as e:
-            logger.error(f"Test failed: {e}")
+            logger.exception(f"Test failed: {e}")
             raise

146-197: Apply same test hygiene improvements as the first test.

The coverage test correctly exercises the router with varied token sequences and validates distribution across DP ranks. The threshold of 2 ranks is conservative but reasonable given that routing depends on cache overlap patterns.

Apply the same improvements as in test_router_returns_valid_dp_rank:

                 for i in range(50):
                     test_tokens = list(range(i * 7, i * 7 + 10))
-                    worker_id, dp_rank, overlap = await kv_push_router.best_worker(
+                    _worker_id, dp_rank, _overlap = await kv_push_router.best_worker(
                         test_tokens
                     )

         except Exception as e:
-            logger.error(f"Test failed: {e}")
+            logger.exception(f"Test failed: {e}")
             raise
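
The shape of the two checks discussed above (valid range, then coverage of at least two ranks) can be sketched without the real runtime. `FakeRouter` is a hypothetical stand-in that derives a rank from the token sum; it is not `KvPushRouter` and makes no claim about how the real router scores workers.

```python
import asyncio
from typing import List, Set, Tuple

DP_SIZE = 4  # assumed data-parallel size for this sketch


class FakeRouter:
    """Hypothetical stand-in for a DP-aware router."""

    async def best_worker(self, token_ids: List[int]) -> Tuple[int, int, int]:
        dp_rank = sum(token_ids) % DP_SIZE
        return 1, dp_rank, 0  # (worker_id, dp_rank, overlap_blocks)


async def collect_dp_ranks(router, num_requests: int = 50) -> Set[int]:
    """Send varied token sequences and collect the distinct ranks returned."""
    seen: Set[int] = set()
    for i in range(num_requests):
        test_tokens = list(range(i * 7, i * 7 + 10))
        _worker_id, dp_rank, _overlap = await router.best_worker(test_tokens)
        assert isinstance(dp_rank, int)
        assert 0 <= dp_rank < DP_SIZE
        seen.add(dp_rank)
    return seen


if __name__ == "__main__":
    ranks = asyncio.run(collect_dp_ranks(FakeRouter()))
    assert len(ranks) >= 2, f"expected coverage of at least 2 ranks, got {ranks}"
```

The unused tuple members are prefixed with underscores, matching the hygiene suggestion in this review.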

📜 Review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 7802f96 and 881be2f.

📒 Files selected for processing (6)

components/src/dynamo/router/__main__.py (1 hunks)
components/src/dynamo/sglang/main.py (2 hunks)
components/src/dynamo/sglang/register.py (1 hunks)
components/src/dynamo/sglang/request_handlers/llm/decode_handler.py (4 hunks)
components/src/dynamo/sglang/request_handlers/llm/prefill_handler.py (1 hunks)
tests/router/test_dp_rank_routing.py (1 hunks)

🧰 Additional context used

🧠 Learnings (1)

📓 Common learnings

Learnt from: oandreeva-nv
Repo: ai-dynamo/dynamo PR: 2989
File: lib/llm/src/block_manager/distributed/transfer.rs:6-6
Timestamp: 2025-09-18T21:47:44.143Z
Learning: For PR ai-dynamo/dynamo#2989, the ConnectorTransferBatcher architectural issues will be addressed in a follow-up PR by removing the duplicate batching logic and integrating distributed transfers with the existing TransferBatcher + LocalTransferManager pipeline, rather than adding bounded concurrency primitives like Semaphore.

🧬 Code graph analysis (5)

components/src/dynamo/sglang/request_handlers/llm/prefill_handler.py (2)

lib/llm/src/block_manager/kv_consolidator/subscriber.rs (1)

data_parallel_rank (40-42)

components/src/dynamo/sglang/request_handlers/handler_base.py (1)

_get_input_param (70-87)

components/src/dynamo/sglang/request_handlers/llm/decode_handler.py (2)

lib/llm/src/block_manager/kv_consolidator/subscriber.rs (1)

data_parallel_rank (40-42)

components/src/dynamo/sglang/protocol.py (1)

DisaggPreprocessedRequest (63-66)

components/src/dynamo/router/__main__.py (2)

lib/bindings/python/rust/llm/kv.rs (1)

best_worker (1186-1219)

lib/bindings/python/src/dynamo/_core.pyi (1)

best_worker (1404-1426)

tests/router/test_dp_rank_routing.py (2)

lib/bindings/python/src/dynamo/_core.pyi (3)

DistributedRuntime (35-65)

KvPushRouter (1342-1520)

KvRouterConfig (1044-1046)

tests/utils/managed_process.py (1)

ManagedProcess (71-568)

components/src/dynamo/sglang/main.py (2)

components/src/dynamo/sglang/register.py (1)

register_llm_with_readiness_gate (142-182)

lib/bindings/python/src/dynamo/_core.pyi (2)

ModelInput (1023-1025)

ModelType (1027-1034)

🪛 Ruff (0.14.4)

components/src/dynamo/sglang/request_handlers/llm/prefill_handler.py

79-82: Avoid specifying long messages outside the exception class

(TRY003)

components/src/dynamo/router/__main__.py

133-133: Avoid specifying long messages outside the exception class

(TRY003)

tests/router/test_dp_rank_routing.py

28-28: Standard pseudo-random generators are not suitable for cryptographic purposes

(S311)

35-35: Do not catch blind exception: Exception

(BLE001)

127-127: Unpacked variable worker_id is never used

Prefix it with an underscore or any other dummy variable pattern

(RUF059)

127-127: Unpacked variable overlap is never used

Prefix it with an underscore or any other dummy variable pattern

(RUF059)

142-142: Use logging.exception instead of logging.error

Replace with exception

(TRY400)

177-177: Unpacked variable worker_id is never used

Prefix it with an underscore or any other dummy variable pattern

(RUF059)

177-177: Unpacked variable overlap is never used

Prefix it with an underscore or any other dummy variable pattern

(RUF059)

196-196: Use logging.exception instead of logging.error

Replace with exception

(TRY400)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)

GitHub Check: Build and Test - dynamo

🔇 Additional comments (11)

components/src/dynamo/sglang/register.py (1)

97-102: LGTM! Clean exposure of data_parallel_size for DP-aware routing.

The changes correctly extract dp_size from server_args with a sensible default of 1, assign it to runtime_config, and log when DP mode is active. This provides the necessary foundation for DP-aware routing decisions.
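
A minimal sketch of the pattern described here, assuming `server_args` exposes a `dp_size` attribute and the runtime config has a `data_parallel_size` field. The function name and log message are illustrative, not the ones in `register.py`.

```python
import logging
from types import SimpleNamespace

logger = logging.getLogger(__name__)


def expose_dp_size(server_args, runtime_config) -> int:
    """Copy dp_size (default 1) onto the runtime config; log when DP is active."""
    dp_size = getattr(server_args, "dp_size", 1)
    runtime_config.data_parallel_size = dp_size
    if dp_size > 1:
        logger.info("DP mode active: data_parallel_size=%d", dp_size)
    return dp_size


# Usage with simple stand-in objects:
config = SimpleNamespace()
expose_dp_size(SimpleNamespace(dp_size=4), config)
```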

components/src/dynamo/sglang/main.py (1)

113-113: Minor comment cleanup.

components/src/dynamo/sglang/request_handlers/llm/decode_handler.py (3)

118-124: LGTM! Correctly preserves dp_rank=0 with explicit None check.

The explicit check for the key's presence and non-None value ensures that dp_rank=0 is not misinterpreted as falsy. The fallback to dp_rank provides backward compatibility.
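
The key lookup described above might look like the sketch below. It assumes the request is a plain mapping and that the legacy key is named `dp_rank`; the helper name `extract_dp_rank` is hypothetical.

```python
from typing import Any, Mapping, Optional


def extract_dp_rank(request: Mapping[str, Any]) -> Optional[int]:
    """Return the request's DP rank, checking the new key before the legacy one.

    A key counts only if it is present and not None, so rank 0 is preserved.
    """
    for key in ("data_parallel_rank", "dp_rank"):
        if key in request and request[key] is not None:
            return request[key]
    return None
```

A call such as `extract_dp_rank({"data_parallel_rank": None, "dp_rank": 0})` falls through the explicit `None` and still returns rank 0 from the legacy key.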

135-171: Well-designed DP-rank injection with backward compatibility.

The implementation correctly:

Handles both 3-tuple (new) and 2-tuple (legacy) router responses with appropriate fallback

Logs DP routing status once to avoid log spam

Injects dp_rank into the inner request after serialization to work around Pydantic's field validation

Uses defensive type checking before dictionary access
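
To illustrate the post-serialization injection, here is a sketch that uses a standard-library dataclass as a stand-in for the Pydantic model so that it runs without extra dependencies. `DisaggRequestSketch` and its fields are assumptions made for the example, not the real `DisaggPreprocessedRequest` schema.

```python
from dataclasses import asdict, dataclass, field
from typing import Any, Dict, Optional


@dataclass
class DisaggRequestSketch:
    """Hypothetical stand-in for the serialized disaggregated request."""

    request: Dict[str, Any]
    sampling_params: Dict[str, Any] = field(default_factory=dict)


def serialize_with_dp_rank(
    req: DisaggRequestSketch, dp_rank: Optional[int]
) -> Dict[str, Any]:
    """Serialize first, then add the rank to the inner request dict."""
    payload = asdict(req)  # deep copy; the original object is not mutated
    inner = payload.get("request")
    # Defensive type check before dictionary access.
    if dp_rank is not None and isinstance(inner, dict):
        inner["data_parallel_rank"] = dp_rank
    return payload


payload = serialize_with_dp_rank(
    DisaggRequestSketch(request={"token_ids": [1, 2, 3]}), dp_rank=0
)
```

Injecting after serialization means the model definition does not need a new field, at the cost of the rank being invisible to schema validation.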

225-242: LGTM! Consistent DP-rank propagation in aggregated mode.

The aggregated mode correctly mirrors the decode path pattern by conditionally including data_parallel_rank in generate_kwargs when present.

components/src/dynamo/router/__main__.py (1)

126-138: LGTM! Clean delegation to best_worker for DP-aware routing.

The changes correctly:

Update the method to call best_worker() which returns a 3-tuple including dp_rank

Update the docstring to reflect the new return signature

Simplify error handling with direct RuntimeError raise

The decode handler (decode_handler.py lines 137-155) already handles backward compatibility by checking tuple length, so this breaking change is properly managed at the caller level.

components/src/dynamo/sglang/request_handlers/llm/prefill_handler.py (2)

77-93: LGTM! Proper validation and DP-rank extraction for Prefill.

The changes correctly:

Validate the disaggregated request format with clear error messages

Extract inner_request and sampling_params

Preserve dp_rank=0 with explicit None checking, consistent with decode_handler.py

95-111: LGTM! Consistent DP-rank propagation to SGLang engine.

The generate_kwargs construction mirrors the decode handler pattern and correctly includes data_parallel_rank when present, with appropriate logging for DP-aware generation.
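
The conditional-inclusion pattern can be sketched as below. The helper name and the keys other than `data_parallel_rank` are illustrative; the actual arguments passed to the engine live in `prefill_handler.py`.

```python
from typing import Any, Dict, List, Optional


def build_generate_kwargs(
    input_ids: List[int],
    sampling_params: Dict[str, Any],
    data_parallel_rank: Optional[int] = None,
) -> Dict[str, Any]:
    """Build engine kwargs, adding data_parallel_rank only when it is set."""
    kwargs: Dict[str, Any] = {
        "input_ids": input_ids,
        "sampling_params": sampling_params,
        "stream": True,
    }
    if data_parallel_rank is not None:  # rank 0 is valid and must be forwarded
        kwargs["data_parallel_rank"] = data_parallel_rank
    return kwargs
```

Omitting the key entirely when no rank is known leaves the engine free to apply its own default placement.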

tests/router/test_dp_rank_routing.py (3)

26-28: LGTM! Simple random suffix generator for test isolation.

Note: Static analysis flagged S311 (non-cryptographic random), but this is acceptable for test namespace isolation where cryptographic security is not required.

31-43: LGTM! Robust runtime acquisition with fallback handling.

The broad exception handling (flagged by static analysis) is appropriate here for test utilities that need to handle various runtime scenarios gracefully.

46-98: LGTM! Well-structured mocker process wrapper for DP testing.

The MockerProcess class correctly:

Configures the mocker with data-parallel-size for DP support

Uses appropriate timeout and logging settings

Implements context manager protocol for clean resource management

coderabbitai · 2025-11-10T20:10:26Z

components/src/dynamo/sglang/main.py

+    # Register Prefill to expose dp_size to Router
+    await register_llm_with_readiness_gate(
+        engine,
+        generate_endpoint,
+        server_args,
+        dynamo_args,
+        input_type=ModelInput.Tokens,
+        output_type=ModelType.Chat | ModelType.Completions,
+        readiness_gate=None,
+    )


⚠️ Potential issue | 🔴 Critical

🧩 Analysis chain

Verify registration failure handling in Prefill initialization.

The early registration exposes dp_size to the Router, which is essential for DP-aware routing. However, since this registration occurs before endpoint serving starts (line 174), any registration failure will prevent the Prefill worker from becoming available.

Run the following script to check if there are any error recovery mechanisms or if registration failures should be handled differently:

🏁 Script executed:

#!/bin/bash
# Check error handling patterns in other init functions
rg -A 5 "register_llm_with_readiness_gate" components/src/dynamo/sglang/main.py

Length of output: 1044

I need to gather more context to properly verify the concern. Let me check the surrounding code and error handling patterns.

🏁 Script executed:

#!/bin/bash
# Get context around lines 163-172 to see if there's error handling
sed -n '150,185p' components/src/dynamo/sglang/main.py

Length of output: 1075

🏁 Script executed:

#!/bin/bash
# Check the implementation of register_llm_with_readiness_gate
fd -t f "register.py" components/src/dynamo/sglang/ --exec cat {}

Length of output: 7565

🏁 Script executed:

#!/bin/bash
# Look for try-catch or exception handling around registration calls
rg -B 10 -A 10 "await register_llm_with_readiness_gate" components/src/dynamo/sglang/main.py | head -40

Length of output: 675

I need to check the outer context to see if there's higher-level error handling that catches exceptions from this registration call.

🏁 Script executed:

#!/bin/bash
# Find the function containing the registration call (lines 163-172)
rg -B 30 "await register_llm_with_readiness_gate" components/src/dynamo/sglang/main.py | grep -E "^(async )?def |--" | head -20

Length of output: 68

🏁 Script executed:

#!/bin/bash
# Get full context - check if there's try-catch wrapping this function
rg "def.*register_prefill" components/src/dynamo/sglang/main.py -A 200 | grep -E "(try:|except|raise RuntimeError)" | head -20

Length of output: 0

🏁 Script executed:

#!/bin/bash
# Check how other registration calls handle failures (look at all call sites)
rg "register_llm_with_readiness_gate" components/src/dynamo/sglang/main.py -B 2 -A 8 | grep -E "(try:|except|readiness_gate)"

Length of output: 437

🏁 Script executed:

#!/bin/bash
# Get the complete init_prefill function
sed -n '130,200p' components/src/dynamo/sglang/main.py

Length of output: 2169

🏁 Script executed:

#!/bin/bash
# Check if there's a wrapper around init_prefill that handles exceptions
rg "init_prefill\(" components/src/dynamo/sglang/main.py -B 5 -A 5

Length of output: 732

🏁 Script executed:

#!/bin/bash
# Get all worker init functions to compare error handling patterns
rg "^async def init_" components/src/dynamo/sglang/main.py

Length of output: 473

🏁 Script executed:

#!/bin/bash
# Compare registration error handling across all init functions
for func in init_embedding init_multimodal_encode_worker init_multimodal_prefill_worker; do
  echo "=== $func ==="
  rg "$func" components/src/dynamo/sglang/main.py -A 100 | grep -B 5 -A 5 "register_llm_with_readiness_gate"
done

Length of output: 2138

🏁 Script executed:

#!/bin/bash
# Check if there's a top-level try-catch around all init calls
rg "if config.serving_mode" components/src/dynamo/sglang/main.py -B 10 -A 10

Length of output: 2422

🏁 Script executed:

#!/bin/bash
# Check how the main entry point handles init function exceptions
rg "async def main" components/src/dynamo/sglang/main.py -A 30

Length of output: 0

🏁 Script executed:

#!/bin/bash
# Get full init_embedding to see the pattern with tasks
sed -n '200,280p' components/src/dynamo/sglang/main.py

Length of output: 2717

🏁 Script executed:

#!/bin/bash
# Search for the main entry point or where these init functions are called
rg "def main|if __name__" components/src/dynamo/sglang/main.py -A 20

Length of output: 76

🏁 Script executed:

#!/bin/bash
# Check if there's error handling around the caller (look for try/except at module level)
rg "(asyncio.run|try:|except)" components/src/dynamo/sglang/main.py | head -20

Length of output: 407

Add error handling around Prefill registration to match other worker patterns.

The verification confirms that init_prefill lacks error handling for registration failures, while init_embedding and other workers properly wrap registration in try-catch blocks within asyncio.gather(). Registration failures will cause an uncaught RuntimeError that crashes the entire worker initialization.

Add try-catch around the registration call (lines 163-172) or refactor to use the concurrent pattern with asyncio.gather() and readiness_gate like init_embedding does (lines 220-237), allowing requests to queue until registration completes.

🤖 Prompt for AI Agents

In components/src/dynamo/sglang/main.py around lines 163 to 172, the Prefill registration call is unprotected and can raise an uncaught RuntimeError; wrap the await register_llm_with_readiness_gate(...) in a try/except that logs the error and sets/uses a readiness_gate (or refactor to start registration inside an asyncio.gather() task like init_embedding does) so failures do not crash initialization and requests can queue until registration completes; ensure the except block logs the exception and marks readiness_gate as failed/completed consistent with other workers.
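
One way to read the suggestion above is sketched here. It is an assumption-laden illustration: `register_with_gate` is a hypothetical helper, `register` stands in for the real `register_llm_with_readiness_gate(...)` call, and an `asyncio.Event` is used as the readiness gate. Whether a failure should be swallowed or should stop the worker is a design decision for the maintainers.

```python
import asyncio
import logging
from typing import Awaitable, Callable, Tuple

logger = logging.getLogger(__name__)


async def register_with_gate(
    register: Callable[[], Awaitable[None]], readiness_gate: asyncio.Event
) -> bool:
    """Run a registration coroutine; log failures instead of propagating them."""
    try:
        await register()
    except Exception:
        logger.exception("Prefill registration failed")
        return False
    readiness_gate.set()  # only open the gate once registration succeeded
    return True


async def demo() -> Tuple[bool, bool, bool, bool]:
    async def ok() -> None:
        return None

    async def boom() -> None:
        raise RuntimeError("registration failed")

    gate_ok, gate_bad = asyncio.Event(), asyncio.Event()
    succeeded = await register_with_gate(ok, gate_ok)
    failed = await register_with_gate(boom, gate_bad)
    return succeeded, gate_ok.is_set(), failed, gate_bad.is_set()
```

Using `logger.exception` keeps the traceback in the log, in line with the TRY400 hint reported elsewhere in this review.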

PeaBrane · 2025-11-10T20:29:34Z

Hi @YAMY1234, thanks for the contribution. Could you please check the following:

- The CodeRabbit comment
- The DCO CI failure (you may need to rebase and sign your commits)
- Some e2e checks verifying that DP routing works correctly. For example, send the same request to the router over and over and verify that it always routes to the same worker dp_rank, due to prefix caching. No need to update the actual sglang tests for now, as we are in the process of refactoring the tests.
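
The repeated-request check requested above could take roughly this shape. `PrefixCachingFakeRouter` is a hypothetical stand-in that remembers which rank served a prompt; a real e2e test would call the deployed router instead.

```python
import asyncio
from typing import Dict, List, Tuple


class PrefixCachingFakeRouter:
    """Hypothetical router that sends a previously seen prompt to the same rank."""

    def __init__(self, dp_size: int) -> None:
        self.dp_size = dp_size
        self._cache: Dict[Tuple[int, ...], int] = {}
        self._next = 0

    async def best_worker(self, token_ids: List[int]) -> Tuple[int, int, int]:
        key = tuple(token_ids)
        if key in self._cache:
            return 1, self._cache[key], 1  # cache hit: non-zero overlap
        dp_rank = self._next % self.dp_size  # cache miss: round-robin placement
        self._next += 1
        self._cache[key] = dp_rank
        return 1, dp_rank, 0


async def is_sticky(router, token_ids: List[int], repeats: int = 10) -> bool:
    """Return True if every repeat of the same request lands on one dp_rank."""
    ranks = set()
    for _ in range(repeats):
        _worker_id, dp_rank, _overlap = await router.best_worker(token_ids)
        ranks.add(dp_rank)
    return len(ranks) == 1
```

Against a real router, the first request may be placed anywhere; the property under test is only that later repeats follow it.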

tests/router/test_dp_rank_routing.py

PeaBrane · 2025-11-10T22:55:42Z

components/src/dynamo/sglang/main.py

    health_check_payload = SglangPrefillHealthCheckPayload(engine).to_dict()

+    # Register Prefill to expose dp_size to Router
+    await register_llm_with_readiness_gate(


We should probably not register the prefill workers as LLMs for now, as there may be unintended consequences. For now, let's assume (and comment) that dp rank routing is not supported for disagg. (WIP to enable integration of sglang with our new disagg frontend)

I just realized the scope of this PR is to enable dp rank propagation for prefill, so please ignore my above comment. Can you briefly motivate why we would want decode and prefill to run on the same dp_rank?

Thanks for the question!

In this PR, prefill and decode currently run on the same dp_rank because SGLang shards the KV-cache and other DP-scoped state per rank. Keeping the request on the same rank matches SGLang's PD design and recent fixes: after sglang #10169, the decode side explicitly consumes the prefill's dp_rank (prefill_dp_rank) and targets that rank in the decode stage; this is the path SGLang supports for PD correctness.

So this does not "force" a specific DP size; it just propagates and honors the rank that the router picked for prefill so that decode lands on the same rank, mirroring SGLang's behavior.

Yeah, but I think this is a really good moment to consider whether this design makes sense and truly brings a benefit in Dynamo...
In Dynamo, the prefill and decode workers are not necessarily on the same GPU, so the same dp_rank doesn't imply the same device. If prefill and decode are on different GPUs/nodes, we still minimize shuffle cardinality by sending the request to the decode worker that "owns" shard k. When co-location is possible (e.g., same worker/device), this naturally reduces to zero cross-rank/device copies; otherwise the transport performs a single well-defined transfer to the right shard instead of an arbitrary one.

components/src/dynamo/sglang/request_handlers/llm/decode_handler.py

YAMY1234 added 3 commits November 10, 2025 01:16

dp rank routing code

2d4b3b5

format fix

51bd5d7

optimize use of dp_rank/data_parallel_rank

881be2f

YAMY1234 requested review from a team as code owners November 10, 2025 20:04

pull-request-size bot added the size/L label Nov 10, 2025

github-actions bot added the external-contribution Pull request is from an external contributor label Nov 10, 2025

coderabbitai bot reviewed Nov 10, 2025

YAMY1234 changed the title ~~Feat(SGLang): Add DP-aware routing and dp_rank propagation across Prefill/Decode~~ feat(SGLang): Add DP-aware routing and dp_rank propagation across Prefill/Decode Nov 10, 2025

ishandhanani requested review from PeaBrane and ishandhanani November 10, 2025 20:13

github-actions bot added the feat label Nov 10, 2025

PeaBrane reviewed Nov 10, 2025

tests/router/test_dp_rank_routing.py (outdated)

PeaBrane requested a review from alec-flowers November 10, 2025 22:49

PeaBrane reviewed Nov 10, 2025

components/src/dynamo/sglang/request_handlers/llm/decode_handler.py (outdated)

YAMY1234 added 2 commits November 10, 2025 21:30

code optimize and add e2e test

424836a

remove redundant logging

0624354
