Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Library cleanup issues with OpenSSL - libp11 - yubihsm_pkcs11.so chain #527

Open
pugo opened this issue Mar 13, 2024 · 0 comments
Open

Library cleanup issues with OpenSSL - libp11 - yubihsm_pkcs11.so chain #527

pugo opened this issue Mar 13, 2024 · 0 comments

Comments

@pugo
Copy link

pugo commented Mar 13, 2024

We use libp11 in a chain like the following:

CST - OpenSSL - libpkcs11.so - yubihsm_pkcs11.so - YubiHSM 2

Where CST is i.MX Code Signing Tool, which is used to sign images for secure boot on NXP SOC:s.

When stress testing our signing I saw that the PKCS11 sessions are not correctly released, which after a short while under load causes errors due to lack of free sessions. With the yubihsm-connector the maximum number of consecutive sessions is 16 and the default timeout for inactive sessions is 30 seconds. When running more frequent than the timeouts can cope with signing operations fail.

I have been investigating the not released sessions, since we have a setup where signing operations can come in bursts that could potentially be hurt by the described problem. This could cause random build failures.

The results from my investigations are some smaller issues in the CST tool, but also one or two in libp11. They are described below.

I will make a PR for the first one.

Reference counting in p11_key.c pkcs11_object_free()

In the following code in p11_key.c the decrease of obj->refcnt happens before the following handling
of the obj->evp_key.

void pkcs11_object_free(PKCS11_OBJECT_private *obj)
{
        if(!obj)
                return;

        if (pkcs11_atomic_add(&obj->refcnt, -1, &obj->lock) != 0)
                return;
        if (obj->evp_key) {
                /* When the EVP object is reference count goes to zero, 
                 * it will call this function again. */
                EVP_PKEY *pkey = obj->evp_key;
                obj->evp_key = NULL;
                EVP_PKEY_free(pkey);
                return;
        }
        pkcs11_slot_unref(obj->slot);
        X509_free(obj->x509);
        OPENSSL_free(obj->label);
        pthread_mutex_destroy(&obj->lock);
        OPENSSL_free(obj);
}

In the if (obj->evp_key) section there is a comment that states that the pkcs11_object_free() will
get a new call as a result of the EVP_PKEY_free(pkey) call. At the end of that scope it returns.

When the second call comes, as stated in the comment, the obj->refcnt will be decreased once more.
This means that it will likely go negative and not 0 as tested in the if-clause. This means that it will return
and the five clean-up lines at the end of the function will never be run.

I will make a PR for the above.

Workaround in eng_front.c blocks cleanup

In the function engine_destroy() in eng_front.c there is a turned off call to ctx_finish(),
with a comment as below.

/* Destroy the context allocated with ctx_new() */
static int engine_destroy(ENGINE *engine)
{
	ENGINE_CTX *ctx;
	int rv = 1;

	ctx = get_ctx(engine);
	if (!ctx)
		return 0;

	/* ENGINE_remove() invokes our engine_destroy() function with
	 * CRYPTO_LOCK_ENGINE / global_engine_lock acquired.
	 * Any attempt to re-acquire the lock either by directly
	 * invoking OpenSSL functions, or indirectly via PKCS#11 modules
	 * that use OpenSSL engines, causes a deadlock. */
	/* Our workaround is to skip ctx_finish() entirely, as a memory
	 * leak is better than a deadlock. */
#if 0
	rv &= ctx_finish(ctx);
#endif

	rv &= ctx_destroy(ctx);
	ENGINE_set_ex_data(engine, pkcs11_idx, NULL);
	ERR_unload_ENG_strings();
	return rv;
}

Without the turned off call to ctx_finish() I can't see that the cleanup is run at all. I understand the comment about the deadlock issue, but the result of the decision is not only a memory leak but a leak of PKCS#11 sessions and possibly other resources.

Would it be possible to handle this in a better way? To handle the locks in a more detailed way on calls to ENGINE_remove()?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

1 participant