CVE-2026-44223
MEDIUMDescription
vLLM is an inference and serving engine for large language models (LLMs). From 0.18.0 to before 0.20.0, the extract_hidden_states speculative decoding proposer in vLLM returns a tensor with an incorrect shape after the first decode step, causing a RuntimeError that crashes the EngineCore process. The crash is triggered when any request in the batch uses sampling penalty parameters (repetition_penalty, frequency_penalty, or presence_penalty). A single request with a penalty parameter (e.g., "repetition_penalty": 1.1) is sufficient to crash the server. This vulnerability is fixed in 0.20.0.
How to fix
Remediation is compiled from vendor and distribution security advisories. Always confirm against the linked source for your exact version and platform.
CVSS v3 Vector
Exploitability
Impact
CVSS:3.1/AV:N/AC:L/PR:L/UI:N/S:U/C:N/I:N/A:H
Exploit Intelligence
Low risk: more likely to be exploited than 29% of all known CVEs.
References
Embed a live status badge for CVE-2026-44223
Markdown
[](https://tridentstack.com/cve/CVE-2026-44223)HTML
<a href="https://tridentstack.com/cve/CVE-2026-44223"><img src="https://tridentstack.com/cve/badge/CVE-2026-44223.svg" alt="CVE-2026-44223"></a>Find and fix vulnerabilities across your fleet
TridentStack Control continuously scans your Windows, macOS, and Linux fleet for known vulnerabilities, prioritizes them by severity and active exploitation, and patches them automatically.
Start freeThis product uses NVD data but is not endorsed or certified by the NVD. EPSS scores courtesy of FIRST.org (https://www.first.org/epss). Source: CISA KEV Catalog. Data as of 2026-06-22.