ghsa-ggh4-5hvc-554r
Vulnerability from github
In the Linux kernel, the following vulnerability has been resolved:
bpf: Fail bpf_timer_cancel when callback is being cancelled
Given a schedule:
timer1 cb timer2 cb
bpf_timer_cancel(timer2); bpf_timer_cancel(timer1);
Both bpf_timer_cancel calls would wait for the other callback to finish executing, introducing a lockup.
Add an atomic_t count named 'cancelling' in bpf_hrtimer. This keeps track of all in-flight cancellation requests for a given BPF timer. Whenever cancelling a BPF timer, we must check if we have outstanding cancellation requests, and if so, we must fail the operation with an error (-EDEADLK) since cancellation is synchronous and waits for the callback to finish executing. This implies that we can enter a deadlock situation involving two or more timer callbacks executing in parallel and attempting to cancel one another.
Note that we avoid incrementing the cancelling counter for the target timer (the one being cancelled) if bpf_timer_cancel is not invoked from a callback, to avoid spurious errors. The whole point of detecting cur->cancelling and returning -EDEADLK is to not enter a busy wait loop (which may or may not lead to a lockup). This does not apply in case the caller is in a non-callback context, the other side can continue to cancel as it sees fit without running into errors.
Background on prior attempts:
Earlier versions of this patch used a bool 'cancelling' bit and used the following pattern under timer->lock to publish cancellation status.
lock(t->lock); t->cancelling = true; mb(); if (cur->cancelling) return -EDEADLK; unlock(t->lock); hrtimer_cancel(t->timer); t->cancelling = false;
The store outside the critical section could overwrite a parallel requests t->cancelling assignment to true, to ensure the parallely executing callback observes its cancellation status.
It would be necessary to clear this cancelling bit once hrtimer_cancel is done, but lack of serialization introduced races. Another option was explored where bpf_timer_start would clear the bit when (re)starting the timer under timer->lock. This would ensure serialized access to the cancelling bit, but may allow it to be cleared before in-flight hrtimer_cancel has finished executing, such that lockups can occur again.
Thus, we choose an atomic counter to keep track of all outstanding cancellation requests and use it to prevent lockups in case callbacks attempt to cancel each other while executing in parallel.
{ "affected": [], "aliases": [ "CVE-2024-42239" ], "database_specific": { "cwe_ids": [ "CWE-667" ], "github_reviewed": false, "github_reviewed_at": null, "nvd_published_at": "2024-08-07T16:15:46Z", "severity": "MODERATE" }, "details": "In the Linux kernel, the following vulnerability has been resolved:\n\nbpf: Fail bpf_timer_cancel when callback is being cancelled\n\nGiven a schedule:\n\ntimer1 cb\t\t\ttimer2 cb\n\nbpf_timer_cancel(timer2);\tbpf_timer_cancel(timer1);\n\nBoth bpf_timer_cancel calls would wait for the other callback to finish\nexecuting, introducing a lockup.\n\nAdd an atomic_t count named \u0027cancelling\u0027 in bpf_hrtimer. This keeps\ntrack of all in-flight cancellation requests for a given BPF timer.\nWhenever cancelling a BPF timer, we must check if we have outstanding\ncancellation requests, and if so, we must fail the operation with an\nerror (-EDEADLK) since cancellation is synchronous and waits for the\ncallback to finish executing. This implies that we can enter a deadlock\nsituation involving two or more timer callbacks executing in parallel\nand attempting to cancel one another.\n\nNote that we avoid incrementing the cancelling counter for the target\ntimer (the one being cancelled) if bpf_timer_cancel is not invoked from\na callback, to avoid spurious errors. The whole point of detecting\ncur-\u003ecancelling and returning -EDEADLK is to not enter a busy wait loop\n(which may or may not lead to a lockup). This does not apply in case the\ncaller is in a non-callback context, the other side can continue to\ncancel as it sees fit without running into errors.\n\nBackground on prior attempts:\n\nEarlier versions of this patch used a bool \u0027cancelling\u0027 bit and used the\nfollowing pattern under timer-\u003elock to publish cancellation status.\n\nlock(t-\u003elock);\nt-\u003ecancelling = true;\nmb();\nif (cur-\u003ecancelling)\n\treturn -EDEADLK;\nunlock(t-\u003elock);\nhrtimer_cancel(t-\u003etimer);\nt-\u003ecancelling = false;\n\nThe store outside the critical section could overwrite a parallel\nrequests t-\u003ecancelling assignment to true, to ensure the parallely\nexecuting callback observes its cancellation status.\n\nIt would be necessary to clear this cancelling bit once hrtimer_cancel\nis done, but lack of serialization introduced races. Another option was\nexplored where bpf_timer_start would clear the bit when (re)starting the\ntimer under timer-\u003elock. This would ensure serialized access to the\ncancelling bit, but may allow it to be cleared before in-flight\nhrtimer_cancel has finished executing, such that lockups can occur\nagain.\n\nThus, we choose an atomic counter to keep track of all outstanding\ncancellation requests and use it to prevent lockups in case callbacks\nattempt to cancel each other while executing in parallel.", "id": "GHSA-ggh4-5hvc-554r", "modified": "2024-08-08T15:31:29Z", "published": "2024-08-07T18:30:43Z", "references": [ { "type": "ADVISORY", "url": "https://nvd.nist.gov/vuln/detail/CVE-2024-42239" }, { "type": "WEB", "url": "https://git.kernel.org/stable/c/3e4e8178a8666c56813bd167b848fca0f4c9af0a" }, { "type": "WEB", "url": "https://git.kernel.org/stable/c/9369830518688ecd5b08ffc08ab3302ce2b5d0f7" }, { "type": "WEB", "url": "https://git.kernel.org/stable/c/d4523831f07a267a943f0dde844bf8ead7495f13" } ], "schema_version": "1.4.0", "severity": [ { "score": "CVSS:3.1/AV:L/AC:L/PR:L/UI:N/S:U/C:N/I:N/A:H", "type": "CVSS_V3" } ] }
- Seen: The vulnerability was mentioned, discussed, or seen somewhere by the user.
- Confirmed: The vulnerability is confirmed from an analyst perspective.
- Exploited: This vulnerability was exploited and seen by the user reporting the sighting.
- Patched: This vulnerability was successfully patched by the user reporting the sighting.
- Not exploited: This vulnerability was not exploited or seen by the user reporting the sighting.
- Not confirmed: The user expresses doubt about the veracity of the vulnerability.
- Not patched: This vulnerability was not successfully patched by the user reporting the sighting.