github - ghsa-4qjh-9fv9-r85r

ghsa-4qjh-9fv9-r85r

Vulnerability from github

Published

2025-05-28 18:02

Modified

2025-06-27 21:06

Severity

2.6 (Low) - CVSS:3.1/AV:N/AC:H/PR:L/UI:R/S:U/C:L/I:N/A:N
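The 2.6 base score follows from the CVSS v3.1 base-score equations. The sketch below recomputes it from the vector string, using the metric weights and the Roundup function from the FIRST CVSS v3.1 specification. It handles only Scope:Unchanged vectors, which is all this advisory needs. The helper names `roundup` and `base_score` are illustrative, not part of any library.

```python
import math

# CVSS v3.1 metric weights (FIRST specification), restricted to what this sketch needs.
AV = {"N": 0.85, "A": 0.62, "L": 0.55, "P": 0.2}
AC = {"L": 0.77, "H": 0.44}
PR_UNCHANGED = {"N": 0.85, "L": 0.62, "H": 0.27}  # Privileges Required when Scope is Unchanged
UI = {"N": 0.85, "R": 0.62}
CIA = {"H": 0.56, "L": 0.22, "N": 0.0}


def roundup(value: float) -> float:
    """CVSS v3.1 Roundup: smallest one-decimal number >= value (float-safe variant)."""
    int_input = round(value * 100000)
    if int_input % 10000 == 0:
        return int_input / 100000.0
    return (math.floor(int_input / 10000) + 1) / 10.0


def base_score(vector: str) -> float:
    """Base score for a CVSS:3.1 vector with Scope:Unchanged."""
    # "CVSS:3.1/AV:N/..." -> {"AV": "N", ...}; the first element is the version prefix.
    parts = dict(item.split(":") for item in vector.split("/")[1:])
    if parts["S"] != "U":
        raise ValueError("this sketch only handles Scope:Unchanged")
    iss = 1 - (1 - CIA[parts["C"]]) * (1 - CIA[parts["I"]]) * (1 - CIA[parts["A"]])
    impact = 6.42 * iss
    exploitability = (
        8.22 * AV[parts["AV"]] * AC[parts["AC"]] * PR_UNCHANGED[parts["PR"]] * UI[parts["UI"]]
    )
    if impact <= 0:
        return 0.0
    return roundup(min(impact + exploitability, 10))


print(base_score("CVSS:3.1/AV:N/AC:H/PR:L/UI:R/S:U/C:L/I:N/A:N"))  # prints 2.6
```

Here Impact = 6.42 × 0.22 ≈ 1.41 and Exploitability = 8.22 × 0.85 × 0.44 × 0.62 × 0.62 ≈ 1.18. Their sum, about 2.59, rounds up to 2.6. The high attack complexity and the required user interaction are what keep the score low.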

Summary

Potential Timing Side-Channel Vulnerability in vLLM’s Chunk-Based Prefix Caching

Details

This issue arises from the prefix caching mechanism, which may expose the system to a timing side-channel attack.

Description

When a new prompt is processed and the PagedAttention mechanism finds a matching prefix chunk, the prefill phase runs faster, which shows up as a lower TTFT (Time to First Token). Our tests revealed that the timing differences caused by matching chunks are large enough to be recognized and exploited.

For instance, if a victim has submitted a sensitive prompt, or a valuable system prompt has been cached, an attacker sharing the same backend could try to guess the victim's input. Because TTFT drops when a submitted prefix matches a cached one, the attacker can measure TTFT to verify whether a guess is correct, potentially leaking private information.
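The verification step can be sketched with a toy timing model. The hypothetical `simulated_ttft_ms` assumes that prefill time grows linearly with the number of tokens not served from the cache, plus Gaussian noise. All constants are invented for illustration and are not measurements from this advisory. The attacker first calibrates a threshold on probes known to miss the cache, then labels a guess as a hit if the median TTFT over repeated probes falls below that threshold.

```python
import random
import statistics

# Hypothetical cost model (illustrative constants, not measured values).
BASE_MS = 20.0                 # fixed per-request overhead
MS_PER_UNCACHED_TOKEN = 0.5    # prefill cost for each token not served from the cache
NOISE_MS = 3.0                 # standard deviation of measurement noise


def simulated_ttft_ms(prompt_len: int, cached_tokens: int, rng: random.Random) -> float:
    """Toy TTFT: only the uncached part of the prompt has to be prefilled."""
    return BASE_MS + MS_PER_UNCACHED_TOKEN * (prompt_len - cached_tokens) + rng.gauss(0, NOISE_MS)


def guess_is_cached(samples_ms: list[float], threshold_ms: float) -> bool:
    """Label a guess as a prefix hit if the median of repeated probes is below the threshold."""
    return statistics.median(samples_ms) < threshold_ms


rng = random.Random(0)
prompt_len = 512

# Calibrate on probes that are known to miss the cache (e.g. random prefixes).
miss_calibration = [simulated_ttft_ms(prompt_len, 0, rng) for _ in range(50)]
threshold = statistics.median(miss_calibration) - 3 * statistics.stdev(miss_calibration)

# A correct guess reuses cached blocks for half of the prompt; a wrong guess reuses none.
hit_probe = [simulated_ttft_ms(prompt_len, 256, rng) for _ in range(10)]
miss_probe = [simulated_ttft_ms(prompt_len, 0, rng) for _ in range(10)]
print(guess_is_cached(hit_probe, threshold), guess_is_cached(miss_probe, threshold))
```

Taking the median over several probes is one simple way to suppress noise. In practice, the gap between hit and miss timings, relative to the noise, decides how many probes an attacker needs.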

Unlike token-by-token sharing mechanisms, vLLM’s chunk-based approach (PagedAttention) processes tokens in larger units (chunks). In our tests with chunk_size=2, the timing differences were large enough for an attacker to infer, at chunk granularity, whether portions of their input match the victim's prompt.
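Chunk-level matching can be pictured with a simplified model. The prompt's token IDs are split into full blocks of `chunk_size` tokens, and each block is hashed together with the hash of the block before it. A new prompt then reuses cached work only for the leading run of blocks whose hashes are already cached. The sketch below illustrates that idea under these assumptions and is not vLLM's implementation. `block_hashes`, `cached_prefix_tokens`, and the token IDs are hypothetical. It shows why, with chunk_size=2, a guess that matches 3 leading tokens gains only 2 cached tokens.

```python
import hashlib

BLOCK_SIZE = 2  # assumed to mirror the chunk_size=2 setting used in the tests


def block_hashes(token_ids: list[int], block_size: int = BLOCK_SIZE) -> list[str]:
    """Hash each *full* block, chaining in the previous block's hash, so a block
    matches only if the entire prefix before it matches too."""
    hashes = []
    parent = ""
    # Step over full blocks only; a trailing partial block is never cached.
    for start in range(0, len(token_ids) - block_size + 1, block_size):
        block = token_ids[start:start + block_size]
        digest = hashlib.sha256(f"{parent}|{block}".encode()).hexdigest()
        hashes.append(digest)
        parent = digest
    return hashes


def cached_prefix_tokens(cache: set[str], token_ids: list[int], block_size: int = BLOCK_SIZE) -> int:
    """Number of leading tokens whose blocks are already present in the cache."""
    matched = 0
    for digest in block_hashes(token_ids, block_size):
        if digest not in cache:
            break  # matching stops at the first block that differs
        matched += block_size
    return matched


# The victim's prompt populates the shared cache.
cache = set(block_hashes([11, 12, 13, 14, 15, 16]))

print(cached_prefix_tokens(cache, [11, 12, 13, 99, 15, 16]))  # 2: 3 tokens match, but only 1 full block
print(cached_prefix_tokens(cache, [11, 12, 13, 14, 0, 0]))    # 4
print(cached_prefix_tokens(cache, [0, 12, 13, 14, 15, 16]))   # 0: first block differs, nothing is reused
```

Because each block's hash depends on every block before it, the attacker can confirm the victim's prompt only one block at a time, starting from the beginning.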

Environment

GPU: NVIDIA A100 (40G)
CUDA: 11.8
PyTorch: 2.3.1
OS: Ubuntu 18.04
vLLM: v0.5.1

Configuration: We launched vLLM using the default settings and adjusted chunk_size=2 to evaluate the TTFT.

Leakage

We conducted our tests using LLaMA2-70B-GPTQ on a single device. We analyzed the timing differences when prompts shared prefixes of 2 chunks, and plotted the corresponding ROC curves. Our results suggest that timing differences can be reliably used to distinguish prefix matches, demonstrating a potential side-channel vulnerability.

[Figure: roc_curves_combined_block_2, ROC curves for distinguishing prefix matches by timing]

Results

In our experiment, we analyzed the response time differences between cache hits and misses in vLLM's PagedAttention mechanism. Using ROC curve analysis to assess how distinguishable these timing differences are, we observed the following:

- With a 1-token prefix, the ROC curve yielded an AUC of 0.571. This is above the chance level of 0.5, indicating that even with a short prefix, response times distinguish cache hits from misses better than guessing.
- When the prefix length increases to 8 tokens, the AUC rises to 0.99, showing that the attacker can almost perfectly identify cache hits with a longer prefix.
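The AUC values can be read as the probability that a randomly chosen cache-miss TTFT is larger than a randomly chosen cache-hit TTFT. This equals the Mann-Whitney U statistic divided by the number of hit/miss pairs, so 0.5 means the timings carry no information and 1.0 means perfect separation. The hypothetical helper below computes this value directly from two lists of timings. It is a generic sketch, not the authors' analysis code.

```python
def auc_from_timings(hit_ms: list[float], miss_ms: list[float]) -> float:
    """ROC AUC for the detector 'lower TTFT means cache hit'.

    Counts, over all (hit, miss) pairs, how often the miss is slower than the hit;
    ties count as one half. This is the Mann-Whitney U statistic normalised to [0, 1].
    """
    wins = 0.0
    for h in hit_ms:
        for m in miss_ms:
            if m > h:
                wins += 1.0
            elif m == h:
                wins += 0.5
    return wins / (len(hit_ms) * len(miss_ms))


# Toy timings: 8 of the 9 (hit, miss) pairs are ordered correctly.
print(auc_from_timings([10.0, 11.0, 12.0], [11.5, 13.0, 14.0]))
```

The pairwise loop is O(n·m), which is fine for a few thousand samples. For larger datasets, a rank-based computation gives the same result faster.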

Fixes

https://github.com/vllm-project/vllm/pull/17045
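The advisory itself only links the fix. A general mitigation for this class of leak is to stop sharing prefix-cache entries across trust boundaries. One way to do that is to seed the block-hash chain with a per-tenant salt that other tenants cannot supply. The sketch below illustrates that idea. `salted_block_hashes` and the salt strings are hypothetical, and this is not the code from the linked pull request.

```python
import hashlib


def salted_block_hashes(token_ids: list[int], salt: str, block_size: int = 2) -> list[str]:
    """Chained per-block hashes seeded with a per-tenant salt (hypothetical scheme).

    Identical prompts from tenants with different salts produce disjoint hashes,
    so they never share cache entries and reveal no cache-hit timing to each other.
    """
    hashes = []
    parent = hashlib.sha256(salt.encode()).hexdigest()  # the salt seeds the whole chain
    for start in range(0, len(token_ids) - block_size + 1, block_size):
        block = token_ids[start:start + block_size]
        parent = hashlib.sha256(f"{parent}|{block}".encode()).hexdigest()
        hashes.append(parent)
    return hashes


prompt = [11, 12, 13, 14]
victim_cache = set(salted_block_hashes(prompt, salt="tenant-a-secret"))
attacker_hashes = salted_block_hashes(prompt, salt="tenant-b-secret")

# Same tokens, different salt: no shared blocks, so nothing to time.
print(any(h in victim_cache for h in attacker_hashes))  # False
```

The trade-off is that tenants with different salts can no longer benefit from each other's cached prefixes, such as a shared system prompt.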


JSON


{
  "affected": [
    {
      "package": {
        "ecosystem": "PyPI",
        "name": "vllm"
      },
      "ranges": [
        {
          "events": [
            {
              "introduced": "0"
            },
            {
              "fixed": "0.9.0"
            }
          ],
          "type": "ECOSYSTEM"
        }
      ]
    }
  ],
  "aliases": [
    "CVE-2025-46570"
  ],
  "database_specific": {
    "cwe_ids": [
      "CWE-208"
    ],
    "github_reviewed": true,
    "github_reviewed_at": "2025-05-28T18:02:24Z",
    "nvd_published_at": "2025-05-29T17:15:21Z",
    "severity": "LOW"
  },
  "details": "This issue arises from the prefix caching mechanism, which may expose the system to a timing side-channel attack.\n\n## Description\nWhen a new prompt is processed, if the PageAttention mechanism finds a matching prefix chunk, the prefill process speeds up, which is reflected in the TTFT (Time to First Token). Our tests revealed that the timing differences caused by matching chunks are significant enough to be recognized and exploited.\n\nFor instance, if the victim has submitted a sensitive prompt or if a valuable system prompt has been cached, an attacker sharing the same backend could attempt to guess the victim\u0027s input. By measuring the TTFT based on prefix matches, the attacker could verify if their guess is correct, leading to potential leakage of private information.\n\nUnlike token-by-token sharing mechanisms, vLLM\u2019s chunk-based approach (PageAttention) processes tokens in larger units (chunks). In our tests, with chunk_size=2, the timing differences became noticeable enough to allow attackers to infer whether portions of their input match the victim\u0027s prompt at the chunk level.\n\n## Environment\n\n- GPU: NVIDIA A100 (40G)\n- CUDA: 11.8\n- PyTorch: 2.3.1\n- OS: Ubuntu 18.04\n- vLLM: v0.5.1\nConfiguration: We launched vLLM using the default settings and adjusted chunk_size=2 to evaluate the TTFT.\n\n## Leakage\nWe conducted our tests using LLaMA2-70B-GPTQ on a single device. We analyzed the timing differences when prompts shared prefixes of 2 chunks, and plotted the corresponding ROC curves. Our results suggest that timing differences can be reliably used to distinguish prefix matches, demonstrating a potential side-channel vulnerability.\n\u003cimg src=\"https://github.com/user-attachments/assets/db3491e9-02b7-424c-9b6d-56f553b39f2f\" alt=\"roc_curves_combined_block_2\" width=\"400\"/\u003e\n\n\n## Results\nIn our experiment, we analyzed the response time differences between cache hits and misses in vLLM\u0027s PageAttention mechanism. Using ROC curve analysis to assess the distinguishability of these timing differences, we observed the following results:\n- With a 1-token prefix, the ROC curve yielded an AUC value of 0.571, indicating that even with a short prefix, an attacker can reasonably distinguish between cache hits and misses based on response times.\n- When the prefix length increases to 8 tokens, the AUC value rises significantly to 0.99, showing that the attacker can almost perfectly identify cache hits with a longer prefix.\n\n## Fixes\n\n* https://github.com/vllm-project/vllm/pull/17045",
  "id": "GHSA-4qjh-9fv9-r85r",
  "modified": "2025-06-27T21:06:46Z",
  "published": "2025-05-28T18:02:24Z",
  "references": [
    {
      "type": "WEB",
      "url": "https://github.com/vllm-project/vllm/security/advisories/GHSA-4qjh-9fv9-r85r"
    },
    {
      "type": "ADVISORY",
      "url": "https://nvd.nist.gov/vuln/detail/CVE-2025-46570"
    },
    {
      "type": "WEB",
      "url": "https://github.com/vllm-project/vllm/pull/17045"
    },
    {
      "type": "WEB",
      "url": "https://github.com/vllm-project/vllm/commit/77073c77bc2006eb80ea6d5128f076f5e6c6f54f"
    },
    {
      "type": "WEB",
      "url": "https://github.com/pypa/advisory-database/tree/main/vulns/vllm/PYSEC-2025-53.yaml"
    },
    {
      "type": "PACKAGE",
      "url": "https://github.com/vllm-project/vllm"
    }
  ],
  "schema_version": "1.4.0",
  "severity": [
    {
      "score": "CVSS:3.1/AV:N/AC:H/PR:L/UI:R/S:U/C:L/I:N/A:N",
      "type": "CVSS_V3"
    }
  ],
  "summary": "Potential Timing Side-Channel Vulnerability in vLLM\u2019s Chunk-Based Prefix Caching"
}

pysec-2025-53

Vulnerability from pysec

Published

2025-05-29 17:15

Modified

2025-06-26 21:23

Details

vLLM is an inference and serving engine for large language models (LLMs). Prior to version 0.9.0, when a new prompt is processed and the PagedAttention mechanism finds a matching prefix chunk, the prefill phase runs faster, which shows up as a lower TTFT (Time to First Token). These timing differences caused by matching chunks are large enough to be recognized and exploited. This issue has been patched in version 0.9.0.

Aliases

CVE-2025-46570
GHSA-4qjh-9fv9-r85r

JSON


{
  "affected": [
    {
      "package": {
        "ecosystem": "PyPI",
        "name": "vllm",
        "purl": "pkg:pypi/vllm"
      },
      "ranges": [
        {
          "events": [
            {
              "introduced": "0"
            },
            {
              "fixed": "77073c77bc2006eb80ea6d5128f076f5e6c6f54f"
            }
          ],
          "repo": "https://github.com/vllm-project/vllm",
          "type": "GIT"
        },
        {
          "events": [
            {
              "introduced": "0"
            },
            {
              "fixed": "0.9.0"
            }
          ],
          "type": "ECOSYSTEM"
        }
      ],
      "versions": [
        "0.0.1",
        "0.1.0",
        "0.1.1",
        "0.1.2",
        "0.1.3",
        "0.1.4",
        "0.1.5",
        "0.1.6",
        "0.1.7",
        "0.2.0",
        "0.2.1",
        "0.2.1.post1",
        "0.2.2",
        "0.2.3",
        "0.2.4",
        "0.2.5",
        "0.2.6",
        "0.2.7",
        "0.3.0",
        "0.3.1",
        "0.3.2",
        "0.3.3",
        "0.4.0",
        "0.4.0.post1",
        "0.4.1",
        "0.4.2",
        "0.4.3",
        "0.5.0",
        "0.5.0.post1",
        "0.5.1",
        "0.5.2",
        "0.5.3",
        "0.5.3.post1",
        "0.5.4",
        "0.5.5",
        "0.6.0",
        "0.6.1",
        "0.6.1.post1",
        "0.6.1.post2",
        "0.6.2",
        "0.6.3",
        "0.6.3.post1",
        "0.6.4",
        "0.6.4.post1",
        "0.6.5",
        "0.6.6",
        "0.6.6.post1",
        "0.7.0",
        "0.7.1",
        "0.7.2",
        "0.7.3",
        "0.8.0",
        "0.8.1",
        "0.8.2",
        "0.8.3",
        "0.8.4",
        "0.8.5",
        "0.8.5.post1"
      ]
    }
  ],
  "aliases": [
    "CVE-2025-46570",
    "GHSA-4qjh-9fv9-r85r"
  ],
  "details": "vLLM is an inference and serving engine for large language models (LLMs). Prior to version 0.9.0, when a new prompt is processed, if the PageAttention mechanism finds a matching prefix chunk, the prefill process speeds up, which is reflected in the TTFT (Time to First Token). These timing differences caused by matching chunks are significant enough to be recognized and exploited. This issue has been patched in version 0.9.0.",
  "id": "PYSEC-2025-53",
  "modified": "2025-06-26T21:23:06.231251+00:00",
  "published": "2025-05-29T17:15:21+00:00",
  "references": [
    {
      "type": "ADVISORY",
      "url": "https://github.com/vllm-project/vllm/pull/17045"
    },
    {
      "type": "ADVISORY",
      "url": "https://github.com/vllm-project/vllm/security/advisories/GHSA-4qjh-9fv9-r85r"
    },
    {
      "type": "FIX",
      "url": "https://github.com/vllm-project/vllm/commit/77073c77bc2006eb80ea6d5128f076f5e6c6f54f"
    },
    {
      "type": "REPORT",
      "url": "https://github.com/vllm-project/vllm/pull/17045"
    }
  ]
}

cve-2025-46570

Vulnerability from cvelistv5

Published

2025-05-29 16:32

Modified

2025-05-29 18:05

Severity

2.6 (Low) - CVSS:3.1/AV:N/AC:H/PR:L/UI:R/S:U/C:L/I:N/A:N

Summary

vLLM’s Chunk-Based Prefix Caching Vulnerable to Potential Timing Side-Channel

References

URL	Tags
https://github.com/vllm-project/vllm/security/advisories/GHSA-4qjh-9fv9-r85r	x_refsource_CONFIRM
https://github.com/vllm-project/vllm/pull/17045	x_refsource_MISC
https://github.com/vllm-project/vllm/commit/77073c77bc2006eb80ea6d5128f076f5e6c6f54f	x_refsource_MISC

Impacted products

Vendor	Product
vllm-project	vllm


JSON


{
  "containers": {
    "adp": [
      {
        "metrics": [
          {
            "other": {
              "content": {
                "id": "CVE-2025-46570",
                "options": [
                  {
                    "Exploitation": "none"
                  },
                  {
                    "Automatable": "no"
                  },
                  {
                    "Technical Impact": "partial"
                  }
                ],
                "role": "CISA Coordinator",
                "timestamp": "2025-05-29T18:04:57.706360Z",
                "version": "2.0.3"
              },
              "type": "ssvc"
            }
          }
        ],
        "providerMetadata": {
          "dateUpdated": "2025-05-29T18:05:10.768Z",
          "orgId": "134c704f-9b21-4f2e-91b3-4a467353bcc0",
          "shortName": "CISA-ADP"
        },
        "title": "CISA ADP Vulnrichment"
      }
    ],
    "cna": {
      "affected": [
        {
          "product": "vllm",
          "vendor": "vllm-project",
          "versions": [
            {
              "status": "affected",
              "version": "\u003c 0.9.0"
            }
          ]
        }
      ],
      "descriptions": [
        {
          "lang": "en",
          "value": "vLLM is an inference and serving engine for large language models (LLMs). Prior to version 0.9.0, when a new prompt is processed, if the PageAttention mechanism finds a matching prefix chunk, the prefill process speeds up, which is reflected in the TTFT (Time to First Token). These timing differences caused by matching chunks are significant enough to be recognized and exploited. This issue has been patched in version 0.9.0."
        }
      ],
      "metrics": [
        {
          "cvssV3_1": {
            "attackComplexity": "HIGH",
            "attackVector": "NETWORK",
            "availabilityImpact": "NONE",
            "baseScore": 2.6,
            "baseSeverity": "LOW",
            "confidentialityImpact": "LOW",
            "integrityImpact": "NONE",
            "privilegesRequired": "LOW",
            "scope": "UNCHANGED",
            "userInteraction": "REQUIRED",
            "vectorString": "CVSS:3.1/AV:N/AC:H/PR:L/UI:R/S:U/C:L/I:N/A:N",
            "version": "3.1"
          }
        }
      ],
      "problemTypes": [
        {
          "descriptions": [
            {
              "cweId": "CWE-208",
              "description": "CWE-208: Observable Timing Discrepancy",
              "lang": "en",
              "type": "CWE"
            }
          ]
        }
      ],
      "providerMetadata": {
        "dateUpdated": "2025-05-29T16:32:42.794Z",
        "orgId": "a0819718-46f1-4df5-94e2-005712e83aaa",
        "shortName": "GitHub_M"
      },
      "references": [
        {
          "name": "https://github.com/vllm-project/vllm/security/advisories/GHSA-4qjh-9fv9-r85r",
          "tags": [
            "x_refsource_CONFIRM"
          ],
          "url": "https://github.com/vllm-project/vllm/security/advisories/GHSA-4qjh-9fv9-r85r"
        },
        {
          "name": "https://github.com/vllm-project/vllm/pull/17045",
          "tags": [
            "x_refsource_MISC"
          ],
          "url": "https://github.com/vllm-project/vllm/pull/17045"
        },
        {
          "name": "https://github.com/vllm-project/vllm/commit/77073c77bc2006eb80ea6d5128f076f5e6c6f54f",
          "tags": [
            "x_refsource_MISC"
          ],
          "url": "https://github.com/vllm-project/vllm/commit/77073c77bc2006eb80ea6d5128f076f5e6c6f54f"
        }
      ],
      "source": {
        "advisory": "GHSA-4qjh-9fv9-r85r",
        "discovery": "UNKNOWN"
      },
      "title": "vLLM\u2019s Chunk-Based Prefix Caching Vulnerable to Potential Timing Side-Channel"
    }
  },
  "cveMetadata": {
    "assignerOrgId": "a0819718-46f1-4df5-94e2-005712e83aaa",
    "assignerShortName": "GitHub_M",
    "cveId": "CVE-2025-46570",
    "datePublished": "2025-05-29T16:32:42.794Z",
    "dateReserved": "2025-04-24T21:10:48.175Z",
    "dateUpdated": "2025-05-29T18:05:10.768Z",
    "state": "PUBLISHED"
  },
  "dataType": "CVE_RECORD",
  "dataVersion": "5.1"
}
