[Security vulnerability] Seeds are not changed on database save #219

Open
basak opened this issue Dec 15, 2020 · 15 comments · May be fixed by #366

Comments

@basak
Contributor

basak commented Dec 15, 2020

Hi,

Thank you for pykeepass!

I noticed what I believe is a security issue with the way that pykeepass saves a database (I've tested only KDBX). None of the four seeds are regenerated. These are:

  • kdbx.header.value.dynamic_header.master_seed.data
  • kdbx.header.value.dynamic_header.encryption_iv.data
  • kdbx.header.value.dynamic_header.transform_seed.data
  • kdbx.header.value.dynamic_header.protected_stream_key.data

I believe this results in a number of issues. Note that I'm not a cryptography expert. However, in my testing the official KeePass client regenerates these parameters on every save, so I think that alone is a good enough reason for pykeepass not to deviate from its behaviour.

Some issues I believe exist with not regenerating the seeds:

  • Everyone who uses create_database() will get the same seed for password derivation, which makes them vulnerable to precomputation attacks.
  • I'm looking to use pykeepass to derive a "subset" password database for lower-security devices. In this case, although I might generate the derived database from a master database, I would not expect the derived database to leak any information about the parts of the master database I specifically chose not to include. However, without regeneration of the seeds, an attacker with access to the derived database could precompute tables that make the master password faster to crack if it were later exposed.
  • Not regenerating protected_stream_key is less severe, but it would still allow an attacker who manages to access only the decrypted XML to compare passwords from different users for equality, which regenerating this header field would have prevented.

Note that this affects both create_database() and PyKeePass.save(). In both cases, all seeds should be regenerated, just like the "official" keepass client does it.

Here are test cases that should pass when this issue is fixed:

import pykeepass
import pytest


EXPECTED_SALT_ATTRS = [
    "kdbx.header.value.dynamic_header.master_seed.data",
    "kdbx.header.value.dynamic_header.encryption_iv.data",
    "kdbx.header.value.dynamic_header.transform_seed.data",
    "kdbx.header.value.dynamic_header.protected_stream_key.data",
]


def get_recursive_attr(obj, name):
    try:
        idx = name.index(".")
    except ValueError:
        return getattr(obj, name)
    else:
        return get_recursive_attr(getattr(obj, name[:idx]), name[idx + 1 :])


@pytest.mark.parametrize(["attr"], [(attr,) for attr in EXPECTED_SALT_ATTRS])
def test_creation_salts_differ(tmpdir, attr):
    a = pykeepass.create_database(str(tmpdir.join("a.kdbx")))
    b = pykeepass.create_database(str(tmpdir.join("b.kdbx")))

    assert get_recursive_attr(a, attr) != get_recursive_attr(b, attr)


@pytest.mark.parametrize(["attr"], [(attr,) for attr in EXPECTED_SALT_ATTRS])
def test_saved_salts_differ(tmpdir, attr):
    a = pykeepass.create_database(str(tmpdir.join("a.kdbx")))
    old_salt = get_recursive_attr(a, attr)
    a.save()
    b = pykeepass.PyKeePass(str(tmpdir.join("a.kdbx")))
    new_salt = get_recursive_attr(b, attr)
    assert old_salt != new_salt

@Evidlo
Member

Evidlo commented Dec 15, 2020

Thanks for reporting this.

After looking through the keepassxc source, it seems like master_seed, encryption_iv, and protected_stream_key are indeed regenerated on save.

(see the keepassxc source for kdbx4 and kdbx3)

@basak
Contributor Author

basak commented Dec 16, 2020

Yes, and additionally transform_seed is regenerated too, just a little more indirectly.

I used keepass' behaviour to verify, as opposed to keepassxc, but the result is the same.

@basak
Contributor Author

basak commented Dec 21, 2020

I implemented a workaround for now by poking into kdbx.header... and kdbx.body... and resetting the necessary values with secrets.token_bytes(len(previous_value)). There were a few gotchas which will probably also need to be tackled for a proper fix. Because my workaround lives in a pykeepass-using app rather than in pykeepass itself, it has diverged too far to be applied to pykeepass directly, but hopefully the following will help (a rough sketch follows after the list):

  1. Probably obvious to pykeepass authors, but it wasn't to me: the correct attributes inside the kdbx structure to change differ depending on major_version; and for version 4, they differ again depending on kdbx.header.value.dynamic_header.kdf_parameters.data.dict["$UUID"].
  2. After modifying kdbx.header.value, I needed to del kdbx.header.data because a RawCopy will rebuild from its data attribute if that still exists. I filed construct/construct#890 (RawCopy docstring: add note about rebuilding) to improve the documentation on this.
  3. Due to construct/construct#888 (build_file() crashes when using RawCopy with a value only), calling PyKeePass.save() then crashes. I filed construct/construct#889 (build_file() stream reading) to fix this; in the meantime, I worked around it by creating the stream myself (with w+b) and calling build_stream() instead. You'll either need construct/construct#888 fixed upstream before you can fix this issue, or you'll need a similar workaround in PyKeePass.save().

Phew!
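For reference, here is a rough sketch of those mechanics. It is not pykeepass API: the KDBX import path and the keyword arguments passed to build_stream() are assumptions based on how PyKeePass.save() drives the parser, and the exact set of fields to regenerate depends on major_version and the KDF (see the later comments in this thread).

import secrets

from pykeepass.kdbx_parsing.kdbx import KDBX  # import path assumed


def rekey_field(field):
    # replace a field's bytes with fresh random bytes of the same length
    field.data = secrets.token_bytes(len(field.data))


def save_rekeyed(kp, filename):
    dyn = kp.kdbx.header.value.dynamic_header
    rekey_field(dyn.master_seed)
    rekey_field(dyn.encryption_iv)
    # transform_seed, protected_stream_key and the KDF salt live in
    # version-dependent places; see the field lists later in this thread.

    # Gotcha 2: RawCopy rebuilds from .data if it still exists, silently
    # ignoring the edited .value (construct/construct#890).
    del kp.kdbx.header.data

    # Gotcha 3: work around construct/construct#888 by opening the stream
    # ourselves and calling build_stream() instead of build_file().
    # kp.password / kp.keyfile as stored by PyKeePass (assumed).
    with open(filename, 'w+b') as f:
        KDBX.build_stream(kp.kdbx, f, password=kp.password,
                          keyfile=kp.keyfile, transformed_key=None)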

@Evidlo
Member

Evidlo commented Dec 22, 2020

Thanks for the good work.

Probably obvious to pykeepass authors, but it wasn't to me: the correct attributes inside the kdbx structure to change differ depending on major_version; and for version 4, they differ again depending on kdbx.header.value.dynamic_header.kdf_parameters.data.dict["$UUID"].

Yes, the location of the KDF parameters within the database bytestream changes depending on the database version, which is reflected in the object that kdbx_parsing returns. Ideally the kdbx_parsing would abstract this away from the main pykeepass code, but this is a limitation of construct that I couldn't figure out how to get around when I wrote the parsing code. I'm starting to consider moving away from construct and just doing the parsing with the struct module (see here, although this is very old code and can't build databases). This would give us full control over where the parsed values are stored in the database object and might make things like converting between database versions easier.
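As a tiny illustration of that direction, reading the database version with the struct module only needs the fixed KDBX preamble layout (two 4-byte magic values followed by the minor and major version as little-endian uint16s). This is only a sketch of the idea, not a proposal:

import struct

def read_kdbx_version(raw):
    # magic1, magic2, then minor and major version, all little-endian
    magic1, magic2, minor, major = struct.unpack_from('<IIHH', raw, 0)
    return major, minor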

After modifying kdbx.header.value, I needed to del kdbx.header.data because a RawCopy will rebuild from its data attribute if that still exists. I filed construct/construct#890 to improve the documentation on this.

You actually figured out a problem I was having. I wanted to add functionality for changing the encryption and key derivation methods, but my changed values kept being overwritten by the previously copied bytes.

@basak
Contributor Author

basak commented Dec 23, 2020

Ideally the kdbx_parsing would abstract this away from the main pykeepass code, but this is a limitation of construct that I couldn't figure out how to get around when I wrote the parsing code. I'm starting to consider moving away from construct...

FWIW, I didn't know about construct before I started looking at pykeepass a couple of weeks ago, and I'm very impressed both by construct and how you have used it. It's really nice how pykeepass deals with the different format versions "all at once" in a single structure. I'm looking forward to using construct to solve a similar problem I'm busy solving in a different project.

Given the way construct works, I think it's necessarily the case that the data structure it produces mirrors the file format itself, rather than abstracting over fields that appear in different places in different format versions.

To abstract over this, I'd do it outside of construct, but still use construct as you have done. For example, you could create a "proxy" type Python object by capturing getattr() and setattr() that maps to different properties inside the current kdbx structure depending on major_version. This could be stateless except that it would be instantiated with a kdbx reference. It could then present a more unified view into kdbx, which would then be supplied by kdbx_parsing for the main PyKeePass class to use.
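A hypothetical sketch of that proxy idea (the class name and path table are purely illustrative, not pykeepass API, and only one field is shown):

from operator import attrgetter

class KdbxView:
    # map a stable name to the version-dependent location inside kdbx
    _PATHS = {
        3: {'protected_stream_key':
                'header.value.dynamic_header.protected_stream_key'},
        4: {'protected_stream_key':
                'body.payload.inner_header.protected_stream_key'},
    }

    def __init__(self, kdbx):
        object.__setattr__(self, '_kdbx', kdbx)

    def _field(self, name):
        version = self._kdbx.header.value.major_version
        return attrgetter(self._PATHS[version][name])(self._kdbx)

    def __getattr__(self, name):
        return self._field(name).data

    def __setattr__(self, name, value):
        self._field(name).data = value

The main PyKeePass code could then write, say, view.protected_stream_key = secrets.token_bytes(len(view.protected_stream_key)) without caring which format version it is dealing with.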

@arekbulski

Disclosure: I am the maintainer of Construct.

Thanks for the high praise, Basak. I was impressed by Construct too, and that is why I became its maintainer. Evidlo, I encourage you to post more issues at Construct's forum. It's clear you had some issues with Construct, but maybe we can resolve them. I don't think using the struct module is a good idea in a sophisticated project like this.

I can confirm issue #888 will be resolved upstream before the end of the year.

Another disclosure: I took Stanford's Cryptography class on Coursera, which I have to admit was probably the best course I've ever taken there, and I have about 20 certificates from the platform, so... yeah, back to the topic:

Unfortunately I don't know pykeepass well enough to advise on specifics, but I can say one thing: using the same seeds opens the way to rainbow tables, which I think is what basak meant by precomputation attacks. I believe basak is right on this.

basak added a commit to basak/pykeepass that referenced this issue Dec 27, 2020
The official KeePass client produces and expects a 64 bit field, not a
32 bit field, for this parameter.

Using the incorrect length happens to parse just fine. The correct
length is encoded into the header so there is no parsing misalignment.
And because the field is little endian, parsing the field itself as a 32
bit field even though it is actually a 64 bit field also works, provided
that the field value fits into 32 bits, which it typically does.

However, if we write the header field back out, we write it with a 32 bit
length, and KeePass rejects this. We weren't writing out the header
ourselves previously because kdbx.header.data wasn't being removed (see
libkeepass#219 (comment)).
However, if we start doing that, such as to fix issue libkeepass#219, this bug
is revealed.

The fix is trivial: we change the declaration to be a 64 bit field to
match the official client.
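The little-endian point in that commit message is easy to see in isolation: for values that fit in 32 bits, the first four bytes of the 64 bit little-endian encoding are identical to the 32 bit encoding, so parsing the field as 32 bits "works" until the header has to be written back out with the correct length.

import struct

rounds = 60000  # an arbitrary transform_rounds value that fits in 32 bits
assert struct.pack('<Q', rounds)[:4] == struct.pack('<I', rounds)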
@basak
Contributor Author

basak commented Dec 27, 2020

A couple more loose ends for version 3:

  1. There's a minor bug with a fix in #222 (kdbx3: fix length of transform_rounds field), without which the official KeePass client won't accept version 3 files.
  2. HeaderHash in the XML needs recalculating, without which the official KeePass client rejects a changed header as corrupt. Deleting the field also worked, though recalculating seems better. I think you could probably do this in UnprotectedStream, since the recomputed header data field should be available at that stage, though I haven't tested this; I guess it would need to be conditional on major_version. As a workaround, I recomputed the header an additional time by hand and then modified the tree attribute, and that worked (a rough sketch follows below).
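A minimal sketch of the by-hand recalculation in item 2, assuming the KDBX 3.1 convention that Meta/HeaderHash holds the base64-encoded SHA-256 of the raw header bytes, and that the freshly rebuilt header bytes are already in hand (kp.kdbx.body.payload.xml is assumed to be the parsed lxml tree):

import base64
import hashlib

def refresh_header_hash(kp, header_bytes):
    # HeaderHash lives under Meta in the decrypted XML
    digest = hashlib.sha256(header_bytes).digest()
    kp.kdbx.body.payload.xml.find('Meta/HeaderHash').text = (
        base64.b64encode(digest).decode())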

@basak
Contributor Author

basak commented Dec 27, 2020

One more: it's probably also wise to regenerate kdbx.header.value.dynamic_header.stream_start_bytes.data in the case of version 3, to stop an attacker building up sets of known plaintexts. I don't know if a cryptographic attack is possible otherwise, but again I believe the official client does this, so rather than trying to verify that skipping it is safe, I think it makes sense to just do it.

To summarise, here are the fields to regenerate for version 3:

    "kdbx.header.value.dynamic_header.master_seed.data",
    "kdbx.header.value.dynamic_header.encryption_iv.data",
    "kdbx.header.value.dynamic_header.protected_stream_key.data",
    "kdbx.header.value.dynamic_header.stream_start_bytes.data",

For version 4:

    "kdbx.header.value.dynamic_header.master_seed.data",
    "kdbx.header.value.dynamic_header.encryption_iv.data",
    "kdbx.body.payload.inner_header.protected_stream_key.data",

And additionally kdbx.header.value.dynamic_header.transform_seed.data for version 3, and a KDF-specific field for version 4 (e.g. kdbx.header.value.dynamic_header.kdf_parameters.data.dict.S.value if argon2 is used).

Then HeaderHash (if version 3) and the individual protected XML password elements need recalculating.

I've not looked into version < 3 (it doesn't affect me). A sketch pulling the above together follows below.
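To make that summary concrete, here is a sketch that restates the lists above as data and regenerates each field in place, keeping each field's existing length (the argon2 salt entry is commented out because it only applies when that KDF is in use; the HeaderHash and protected-value recalculation remain separate steps):

import secrets

FIELDS_V3 = [
    "kdbx.header.value.dynamic_header.master_seed.data",
    "kdbx.header.value.dynamic_header.encryption_iv.data",
    "kdbx.header.value.dynamic_header.transform_seed.data",
    "kdbx.header.value.dynamic_header.protected_stream_key.data",
    "kdbx.header.value.dynamic_header.stream_start_bytes.data",
]

FIELDS_V4 = [
    "kdbx.header.value.dynamic_header.master_seed.data",
    "kdbx.header.value.dynamic_header.encryption_iv.data",
    "kdbx.body.payload.inner_header.protected_stream_key.data",
    # plus the KDF-specific salt, e.g. for argon2:
    # "kdbx.header.value.dynamic_header.kdf_parameters.data.dict.S.value",
]

def regenerate_seeds(kp):
    version = kp.kdbx.header.value.major_version
    fields = FIELDS_V3 if version == 3 else FIELDS_V4
    for path in fields:
        parts = path.split(".")[1:]       # drop the leading "kdbx"
        parent = kp.kdbx
        for part in parts[:-1]:
            parent = getattr(parent, part)
        old = getattr(parent, parts[-1])
        setattr(parent, parts[-1], secrets.token_bytes(len(old)))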

Evidlo pushed a commit that referenced this issue Jan 4, 2021
pschmitt pushed a commit to pschmitt/pykeepass that referenced this issue Jan 27, 2021
@Evidlo
Member

Evidlo commented Feb 6, 2021

Here's my solution to generate new seeds in a simplified pseudo-KeePass database.

from construct import Struct, RawCopy, Bytes, Checksum, this, Subconstruct
from construct import stream_seek, stream_tell, stream_read, stream_write
from Cryptodome.Random import get_random_bytes
from io import BytesIO

# simple placeholder checksum function
def check_func(header_data, master_seed):
    return header_data + master_seed


class RandomBytes(Bytes):
    """Same as Bytes, but generate random bytes when building"""

    def _build(self, obj, stream, context, path):
        length = self.length(context) if callable(self.length) else self.length
        data = get_random_bytes(length)
        stream_write(stream, data, length, path)
        return data

class Copy(Subconstruct):
    """Same as RawCopy, but don't create parent container when parsing.
    Instead store data in ._data attribute of subconstruct, and never rebuild from data
    """

    def _parse(self, stream, context, path):
        offset1 = stream_tell(stream, path)
        obj = self.subcon._parsereport(stream, context, path)
        offset2 = stream_tell(stream, path)
        stream_seek(stream, offset1, 0, path)
        obj._data = stream_read(stream, offset2-offset1, path)
        return obj

    def _build(self, obj, stream, context, path):
        offset1 = stream_tell(stream, path)
        obj = self.subcon._build(obj, stream, context, path)
        offset2 = stream_tell(stream, path)
        stream_seek(stream, offset1, 0, path)
        obj._data = stream_read(stream, offset2-offset1, path)
        return obj

# simple database format with header and checksum
s = Struct(
    "header" / Copy(Struct(
        "master_seed" / RandomBytes(8),
        "more_data" / Bytes(8)
    )),
    "hash" / Checksum(
        Bytes(24),
        lambda ctx: check_func(ctx.header._data, ctx.header.master_seed),
        this
    )
)

# synthesize a 'database'
master_seed = b'00000000'
more_data = b'12345678'
header = master_seed + more_data
data = header + check_func(header, master_seed)

# parse the database
parsed = s.parse(data)

# rebuild and try to reparse final result
data2 = s.build(parsed)
s.parse(data2)
assert data != data2, "Database unchanged"

@Evidlo
Member

Evidlo commented Feb 17, 2021

I'd appreciate some feedback on the above.

@arekbulski

As far as Construct is concerned, I see no problems with your code.

@basak
Contributor Author

basak commented Feb 28, 2021

I really like this approach. Nice work! I wonder if RandomBytes and Copy are sufficiently general to be appropriate for inclusion in Construct itself? If not, they work equally well here.

One downside of this approach is that each build triggers a rekey. This is what we want to fix this issue, of course, but I wonder if there are times when we might want to build without rekeying. For example, for unit testing.

I don't have a strong feeling as to whether explicit or implicit rekeying would be better. I think it's difficult to say without attempting one approach or the other.

I think this mechanism should be sufficient to do most of the necessary work. HeaderHash (version 3) will need recalculating, per my notes above; I'm not sure whether this mechanism will handle that, or whether it needs implementing separately.
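One way the rekeying could be made explicit rather than automatic, building on the RandomBytes class above (purely a sketch): write the supplied bytes when the caller provides them, and only generate fresh ones when the field has been set to None before building. Unit tests could then round-trip a database unchanged, while PyKeePass.save() would opt in by clearing the fields first.

from construct import Bytes, stream_write
from Cryptodome.Random import get_random_bytes

class ExplicitRandomBytes(Bytes):
    """Write the given bytes if present; generate fresh random bytes only
    when the field's value is None at build time."""

    def _build(self, obj, stream, context, path):
        length = self.length(context) if callable(self.length) else self.length
        data = obj if obj is not None else get_random_bytes(length)
        stream_write(stream, data, length, path)
        return data

Building a parsed database as-is then keeps the old seed, while setting e.g. parsed.header.master_seed = None before build() requests a fresh one.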

@arekbulski

Those two classes are certainly creative work, but they won't get into Construct mainline. Copy just does what RawCopy already does, in a way I don't like (too obscure), and RandomBytes is not general enough to be included. But if this works for you, I don't see why not.

@A6GibKm
Contributor

A6GibKm commented Jun 23, 2021

Is there any update on this? It seems like a rather problematic issue.

@vszakats

vszakats commented Jan 7, 2023

Can someone suggest the necessary custom code to init the seeds (KDBX4, default KDF/encryption) before calling .save()?:

kp = create_database("test.kdbx", password=test, keyfile=None, transformed_key=None)
kp.kdbx.body.payload.xml = etree.fromstring(xml_str)
kp.kdbx.header.value.dynamic_header.master_seed.data = get_random_bytes(8)
kp.kdbx.header.value.dynamic_header.encryption_iv.data = ??
kp.kdbx.body.payload.inner_header.protected_stream_key.data = ??
#kp.kdbx.header.value.dynamic_header.kdf_parameters.data.dict.S.value =   # if argon2 is used
kp.save()

Without this, the output is reproducible, which is a showstopper for encrypted data.
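A hedged sketch of one way to fill in those question marks, based on the field lists earlier in this thread: keep each field's existing length rather than hard-coding sizes (note that the KDBX4 master_seed is 32 bytes, not 8), and drop the cached raw header bytes so the edits actually take effect when rebuilding.

import secrets

def _regen(field, attr='data'):
    # replace the field's bytes in place, keeping the existing length
    setattr(field, attr, secrets.token_bytes(len(getattr(field, attr))))

dyn = kp.kdbx.header.value.dynamic_header
_regen(dyn.master_seed)          # 32 bytes in KDBX4
_regen(dyn.encryption_iv)        # 16 bytes for AES, 12 for ChaCha20
_regen(kp.kdbx.body.payload.inner_header.protected_stream_key)
# _regen(dyn.kdf_parameters.data.dict.S, 'value')   # argon2 salt, if that KDF is used

# see the RawCopy discussion earlier in this thread
del kp.kdbx.header.data
kp.save()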

A6GibKm added a commit to A6GibKm/pykeepass that referenced this issue Nov 3, 2023
@A6GibKm A6GibKm linked a pull request Nov 3, 2023 that will close this issue
A6GibKm added a commit to A6GibKm/pykeepass that referenced this issue Nov 3, 2023
A6GibKm added a commit to A6GibKm/pykeepass that referenced this issue Nov 4, 2023