Replies: 3 comments
This is a textbook distributed connection/healthcheck failure, exactly the sort of cross-process bug that keeps popping up in vLLM + Ray setups. (In our issue map it's classified under "distributed infra: stale connection pool / cluster node health desync".) A Ray cluster can pass the built-in healthcheck and still fail when vLLM tries to schedule or allocate resources, due to socket state, firewall rules, or subtle config drift between nodes. Quick things to check: socket state between nodes, firewall rules on the Ray ports, and config drift between the head and workers.

If you want the step-by-step diagnosis checklist or a full breakdown of these connection issues, let me know and I'll share the reference.
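To make the "healthcheck passes but scheduling fails" point concrete, here is a minimal sketch. The helper name and logic are hypothetical (not vLLM or Ray API); it just compares the resource dict a Ray cluster reports (e.g. from `ray.cluster_resources()`) against what vLLM will request for its placement group:

```python
# Hypothetical helper: a healthcheck only probes liveness, not capacity,
# so a "healthy" cluster can still reject vLLM's placement group.
def diagnose_resources(cluster_resources: dict, tensor_parallel_size: int) -> str:
    """Return 'ok' or a short verdict given Ray's resource dict and vLLM's TP size."""
    gpus = cluster_resources.get("GPU", 0)
    if gpus >= tensor_parallel_size:
        return "ok"
    return (f"healthcheck can pass, but only {gpus} GPUs are visible; "
            f"vLLM needs {tensor_parallel_size} for its placement group")

# Example: a cluster that passes healthchecks but lacks GPUs
print(diagnose_resources({"CPU": 16.0, "GPU": 2.0}, 4))
```

Feeding it the dict from `ray.cluster_resources()` on your head node would show whether the failure is capacity rather than connectivity.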
Ray cluster connectivity issues with vLLM are tricky; here's a debugging checklist:

1. Check the Ray cluster is actually running

   ```bash
   ray status
   # Should show your nodes
   ```

2. Verify the head node address

   ```bash
   export RAY_ADDRESS="ray://<head-node-ip>:10001"
   ```

   or

   ```python
   ray.init(address="ray://<head-node-ip>:10001")
   ```

3. Port accessibility

   ```bash
   # From a worker, test the connection to the head
   nc -zv <head-ip> 6379   # Redis/GCS
   nc -zv <head-ip> 10001  # Client port
   nc -zv <head-ip> 8265   # Dashboard
   ```

4. vLLM-specific Ray init

   ```python
   from vllm import LLM

   # Let vLLM handle Ray
   llm = LLM(
       model="...",
       tensor_parallel_size=4,
       # Don't init Ray yourself; vLLM does it
   )
   ```

5. Common gotchas

   Fix: explicit connection

   ```python
   import ray
   ray.init(address="auto", ignore_reinit_error=True)
   # Then start vLLM
   ```

We've deployed vLLM on Ray clusters at RevolutionAI. What's your cluster setup: same machine or distributed?
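The port checks in step 3 can also be scripted from inside a worker pod where `nc` may not be installed. A minimal sketch using only the standard library (`<head-ip>` is a placeholder to substitute):

```python
import socket

def port_open(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # refused, unreachable, or hostname did not resolve
        return False

# Same ports as the nc checks above: Redis/GCS, Ray client, dashboard
for port in (6379, 10001, 8265):
    print(port, "open" if port_open("<head-ip>", port) else "closed")
```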
This is a common issue with vLLM + external Ray clusters on Kubernetes.

Root cause: the vLLM pod never registers itself as a worker node in the external Ray cluster, so Ray cannot place vLLM's workers there.

Fixes:

1. Run a Ray worker in the vLLM pod

   ```yaml
   containers:
   - name: vllm
     command:
     - /bin/bash
     - -c
     - |
       # Note: ray start expects the GCS address (<head>:6379),
       # not the ray:// client URL
       ray start --address=$RAY_ADDRESS --block &
       sleep 10  # Wait for node registration
       python -m vllm.entrypoints.openai.api_server ...
   ```

2. Use Ray job submission instead

   ```bash
   # Submit vLLM as a Ray job
   ray job submit --address $RAY_ADDRESS -- python -m vllm ...
   ```

3. Shared /tmp/ray volume

   ```yaml
   volumes:
   - name: ray-tmp
     emptyDir: {}
   volumeMounts:
   - name: ray-tmp
     mountPath: /tmp/ray
   ```

4. Set the node IP explicitly

   ```yaml
   env:
   - name: RAY_ADDRESS
     value: "ray://ray-cluster-head:10001"
   - name: VLLM_HOST_IP
     valueFrom:
       fieldRef:
         fieldPath: status.podIP
   ```

5. Use a KubeRay RayCluster with a vLLM worker group

   ```yaml
   workerGroupSpecs:
   - groupName: vllm-workers
     template:
       spec:
         containers:
         - name: vllm
           image: vllm/vllm-openai:latest
   ```

We deploy vLLM on Kubernetes at Revolution AI; starting the Ray worker inside the vLLM pod is the key fix.
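The fixed `sleep 10` in fix 1 is a guess at registration time; an explicit wait is more robust. A sketch, assuming the node-dict shape that `ray.nodes()` returns (`NodeManagerAddress`, `Alive` keys; verify against your Ray version):

```python
import time

def node_registered(nodes: list, pod_ip: str) -> bool:
    """True if pod_ip appears among alive Ray nodes (shape of ray.nodes())."""
    return any(n.get("NodeManagerAddress") == pod_ip and n.get("Alive")
               for n in nodes)

def wait_for_registration(get_nodes, pod_ip: str,
                          timeout: float = 30.0, poll: float = 1.0) -> bool:
    """Poll get_nodes() (e.g. ray.nodes) until pod_ip registers or timeout."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        if node_registered(get_nodes(), pod_ip):
            return True
        time.sleep(poll)
    return False
```

In the pod you would call `wait_for_registration(ray.nodes, os.environ["VLLM_HOST_IP"])` before launching the API server, instead of sleeping.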
I've been attempting to connect a vLLM engine (as part of KubeAI) to a Ray cluster (deployed by KubeRay) and have not had much success. For some reason it is unable to generate the file node_ip_address.json.

I can confirm that if I run `ray status` in the vLLM engine pod, I see exactly the same output as in the Ray cluster head pod, so vLLM is able to communicate with Ray. These are the logs from vLLM. Executing a health check from the vLLM engine pod returns an exit code of 0, which means the Ray cluster health is allegedly OK.

Has anyone seen the same behaviour before but successfully connected vLLM to an external Ray cluster?
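For reference, since node_ip_address.json is written under Ray's temp dir (/tmp/ray by default), one thing worth probing directly from the engine pod is whether that directory is actually writable. A minimal sketch (the helper name is made up):

```python
import os
import tempfile

def tmp_ray_writable(path: str = "/tmp/ray") -> bool:
    """Return True if we can create and write a file under `path`.

    A read-only or stale mount here would explain Ray failing to
    generate node_ip_address.json even though ray status works.
    """
    try:
        os.makedirs(path, exist_ok=True)
        with tempfile.NamedTemporaryFile(dir=path):
            return True
    except OSError:
        return False

print(tmp_ray_writable())
```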
Engine Config:
Versions:
Platform:
Stack Trace: