Description
Describe the Bug
Hi, I integrated vLLM with the latest Dynamo. To enable KVBM disk offload, I started the vLLM container with the following options:
--mount-workspace --use-nixl-gds
I also set these environment variables:
export DYN_KVBM_CPU_CACHE_GB=50
export DYN_KVBM_DISK_CACHE_GB=100
After launching vLLM, a file named cufile.log appeared and contained the following two lines:
06-11-2025 03:44:55:695 [pid=424 tid=488] NOTICE cufio-drv:830 running in compatible mode
06-11-2025 03:45:10:764 [pid=424 tid=488] ERROR cufio-fs:79 mount option not found in mount table data device: /dev/vda1
When I check with df -Th, I get:
/dev/vda1 ext4 697G 658G 39G 95% /tmp
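Since df -Th does not show mount options, one way to see what the mount table actually records for /tmp is to read /proc/mounts. The following is a minimal sketch, not part of Dynamo or cuFile; the function names mount_options and read_mount_options are hypothetical helpers, and which option cuFile expects to find is not confirmed here.

```python
def mount_options(mounts_text: str, mount_point: str) -> list[str]:
    """Return the mount options for mount_point from /proc/mounts-style text.

    Hypothetical helper for illustration only.
    """
    options: list[str] = []
    for line in mounts_text.splitlines():
        fields = line.split()
        # Fields: device, mount point, fs type, options, dump, pass
        if len(fields) >= 4 and fields[1] == mount_point:
            # Later entries shadow earlier ones, so the last match wins
            options = fields[3].split(",")
    return options


def read_mount_options(mount_point: str, path: str = "/proc/mounts") -> list[str]:
    """Read the live mount table (Linux only) and look up mount_point."""
    with open(path) as f:
        return mount_options(f.read(), mount_point)
```

Calling read_mount_options("/tmp") inside the container lists the options for the filesystem backing /tmp, which can then be compared against what the cuFile documentation requires for ext4.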
I’m not sure why cuFile cannot find the mount configuration.
The 100GB disk space seems to have been allocated, but I don’t see any corresponding file under /tmp.
When I run lsof to inspect open files, it shows that the disk cache files have been deleted while still in use:
VLLM::Wor 424 root 127u REG 253,1 99998760960 8670 /tmp/dynamo-kvbm-disk-cache-d2VwbN (deleted)
VLLM::Wor 424 root 128u REG 253,1 99998760960 8670 /tmp/dynamo-kvbm-disk-cache-d2VwbN (deleted)
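For context, a "(deleted)" entry in lsof is what Linux shows when a process creates a file, unlinks its path, and keeps the file descriptor open. The space stays allocated until the descriptor is closed, but no name is visible in the directory. The sketch below reproduces that state in isolation; it is an assumption about how such output can arise, not a confirmed description of what KVBM does, and unlinked_file_demo is a hypothetical name.

```python
import os
import tempfile


def unlinked_file_demo(size: int = 4096) -> tuple[bool, int, int]:
    """Create a temp file, unlink it, and keep using the open descriptor.

    Returns (path_still_exists, file_size, link_count).
    """
    fd, path = tempfile.mkstemp(prefix="demo-disk-cache-")
    try:
        os.unlink(path)          # remove the name; the inode stays alive
        os.ftruncate(fd, size)   # the file can still be resized ...
        os.pwrite(fd, b"x", 0)   # ... and written through the descriptor
        st = os.fstat(fd)
        return os.path.exists(path), st.st_size, st.st_nlink
    finally:
        os.close(fd)             # the space is released only here
```

While the descriptor is open, lsof would list such a file with "(deleted)" after its former path, and ls in the directory would not show it.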
Could you please help explain:
Why cuFile reports mount option not found in mount table for /dev/vda1?
Why the disk cache file is deleted while still being used (shown as (deleted) in lsof)?
Thanks a lot for your help!
Steps to Reproduce
- Launch the container with run.sh, passing --mount-workspace --use-nixl-gds.
- Launch vLLM inside the container with disk offload enabled.
Expected Behavior
Disk offload works without cuFile errors.
Actual Behavior
cuFile logs the "mount option not found in mount table" error shown above in cufile.log.
Environment
ai-dynamo 0.6.0
vllm 0.11.0
Additional Context
No response
Screenshots
No response