CVE & CISA-KEV Catalog

CVE-2026-53923

HIGH
7.5
CVSS v3
NVD

Description

vLLM is an inference and serving engine for large language models (LLMs). From 0.5.5 until 0.23.1rc0, integer truncation of tensor dimensions in vLLM's GGUF dequantize kernels (csrc/quantization/gguf/gguf_kernel.cu) causes partial tensor processing. The output tensor is allocated at full size via torch::empty (uninitialized memory), but the dequantize CUDA kernel processes only a truncated number of elements. The unfilled portion of the output tensor retains whatever was previously in GPU memory. In multi-tenant inference deployments, this residual GPU memory may contain tensor data from other users' inference requests, constituting information disclosure. This vulnerability is fixed in 0.23.1rc0.

CVSS v3 Vector

Exploitability

Attack VectorNetwork
Attack ComplexityLow
Privileges RequiredNone
User InteractionNone
ScopeUnchanged

Impact

ConfidentialityHigh
IntegrityNone
AvailabilityNone

CVSS:3.1/AV:N/AC:L/PR:N/UI:N/S:U/C:H/I:N/A:N

Exploit Intelligence

0.32%probability of exploitation in 30 days
24thpercentile

Low risk: more likely to be exploited than 24% of all known CVEs.

References

Find and fix vulnerabilities across your fleet

TridentStack Control continuously scans your Windows, macOS, and Linux fleet for known vulnerabilities, prioritizes them by severity and active exploitation, and patches them automatically.

Start free

This product uses NVD data but is not endorsed or certified by the NVD. EPSS scores courtesy of FIRST.org (https://www.first.org/epss). Source: CISA KEV Catalog. Data as of 2026-06-24.