Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

HasTraits instantiation performance degradation from 6.0.0 => 6.1.1 => 6.2.0 #1648

Open
jwiggins opened this issue Jun 3, 2022 · 7 comments
Labels
component: core Issues related to the core library type: performance Issues related to speed or memory usage

Comments

@jwiggins
Copy link
Member

jwiggins commented Jun 3, 2022

This was reported to me informally by @dgoeries. He measured performance degradation of HasTraits instantiation on a large long-lived codebase when upgrading to Traits 6.1.1.

I've done my best to replicate what the code is doing in a minimally reproducible form.

Minimal Example

expand
import random
import secrets
import time

from traits.api import (
    Dict,
    Event,
    Float,
    HasStrictTraits,
    Instance,
    Property,
    String,
    Undefined,
    cached_property,
)


class Variable(HasStrictTraits):
    # data
    attributes = Dict()
    value = Undefined
    timestamp = Instance(object)
    event1 = Event()
    event2 = Event()

    # convenience properties
    attr1 = Property(depends_on='attributes')
    attr2 = Property(depends_on='attributes')
    attr3 = Property(depends_on='attributes')
    attr4 = Property(depends_on='attributes')
    attr5 = Property(depends_on='attributes')

    @cached_property
    def _get_attr1(self):
        return self.attributes.get('foo')

    @cached_property
    def _get_attr2(self):
        return self.attributes.get('foo')

    @cached_property
    def _get_attr3(self):
        return self.attributes.get('foo')

    @cached_property
    def _get_attr4(self):
        return self.attributes.get('foo')

    @cached_property
    def _get_attr5(self):
        return self.attributes.get('foo')


class ObjectRoot(Variable):
    value = Instance(dict, args=())
    name = String()
    schema_change = Event()


class NumberVariable(Variable):
    value = Float()


class StringVariable(Variable):
    value = String()


def create_random_schema(name="root", level=2):
    """Create a big and deeply nested object description."""
    VARS_PER_LEVEL = 5

    schema = {"name": name}
    level_schema = schema.setdefault("variables", {})
    for idx in range(VARS_PER_LEVEL):
        name_extra = f"{level}_{secrets.token_hex(2)}"
        if level > 0:
            name = f"subobj_{name_extra}"
            level_schema[name] = create_random_schema(name=name, level=level - 1)
        else:
            name = f"var_{name_extra}"
            vartype = "str" if idx % 2 == 0 else "num"
            varval = secrets.token_hex(4) if vartype == "str" else random.random() * 100
            level_schema[name] = {"name": name, "value": varval, "type": vartype}

    return schema


def generate_from_schema(schema):
    """Recursively generate an instance from a schema describing it."""

    def _gen_node(value):
        if "variables" in value:
            return generate_from_schema(value)
        valtype = value["type"]
        if valtype == "num":
            return NumberVariable(value=value["value"])
        elif valtype == "str":
            return StringVariable(value=value["value"])

        raise ValueError(f"{value}")

    obj = ObjectRoot(name=schema["name"])
    for key, value in schema["variables"].items():
        obj.value[key] = _gen_node(value)

    obj.schema_change = True
    return obj


def summarize_schema(schema, level=0):
    """Show a condensed representation of the object we're generating."""
    indent = " " * (level + 1) * 2
    if level == 0:
        print("Root:")

    for key, value in schema["variables"].items():
        if "variables" in value:
            print(f"{indent}{key}:")
            summarize_schema(value, level + 1)
        else:
            print(f"{indent}{key}: {value['type']}")


def main():
    COUNT = 9

    schema = create_random_schema()
    # summarize_schema(schema)

    accum_time = 0
    for _ in range(COUNT):
        start = time.perf_counter()
        generate_from_schema(schema)
        accum_time += time.perf_counter() - start

    print("Total time:", accum_time)
    print("Time per instantiation:", accum_time / COUNT)


if __name__ == "__main__":
    main()

Timings

Traits 6.2.0

Total time: 1.1325114750652574
Time per instantiation: 0.12583460834058416

Traits 6.1.1

Total time: 0.8695192660088651
Time per instantiation: 0.09661325177876279

Traits 6.0.0

Total time: 0.7810043709760066
Time per instantiation: 0.08677826344177851
@jwiggins jwiggins added the type: performance Issues related to speed or memory usage label Jun 3, 2022
@aaronayres35
Copy link
Contributor

FWIW changing properties to Property(observe='attributes') gives the following:

Traits 6.0.0 with depends_on (observe was introduced in 6.1.0)

Total time: 1.1431641150265932
Time per instantiation: 0.1270182350029548

Traits 6.1.1 with observe

Total time: 0.5715710348449647
Time per instantiation: 0.06350789276055163

Traits 6.2.0 with observe

Total time: 0.5453224293887615
Time per instantiation: 0.060591381043195724

Traits 6.3.2 with observe

Total time: 0.500484247226268
Time per instantiation: 0.05560936080291867

@dgoeries
Copy link

dgoeries commented Jun 3, 2022

Hi, thanks @jwiggins for the script. I realized a detail that I did not tell, sorry. We have it written now to take a default value and remove the caching, hoppla.

Modified example

Click to expand!
import random
import secrets
import time

from traits.api import (
    Dict,
    Event,
    Float,
    HasStrictTraits,
    Instance,
    Property,
    String,
    Undefined,
)


class Variable(HasStrictTraits):
    # data
    attributes = Property
    _attributes = Dict
    value = Undefined
    timestamp = Instance(object)
    event1 = Event()
    event2 = Event()

    # convenience properties
    attr1 = String
    attr2 = String
    attr3 = String
    attr4 = String
    attr5 = String

    def _set_attributes(self, value):
        self._attributes = value

    def _get_attributes(self):
        return self._attributes

    def _attr1_default(self):
        return self._attributes.get('foo')

    def _attr2_default(self):
        return self._attributes.get('foo')

    def _attr3_default(self):
        return self._attributes.get('foo')

    def _attr4_default(self):
        return self._attributes.get('foo')

    def _attr5_default(self):
        return self._attributes.get('foo')


class ObjectRoot(Variable):
    value = Instance(dict, args=())
    name = String()
    schema_change = Event()


class NumberVariable(Variable):
    value = Float()


class StringVariable(Variable):
    value = String()


random_attributes = {"foo": "bar"}


def create_random_schema(name="root", level=2):
    """Create a big and deeply nested object description."""
    VARS_PER_LEVEL = 5

    schema = {"name": name}
    level_schema = schema.setdefault("variables", {})
    for idx in range(VARS_PER_LEVEL):
        name_extra = f"{level}_{secrets.token_hex(2)}"
        if level > 0:
            name = f"subobj_{name_extra}"
            level_schema[name] = create_random_schema(name=name, level=level - 1)
        else:
            name = f"var_{name_extra}"
            vartype = "str" if idx % 2 == 0 else "num"
            varval = secrets.token_hex(4) if vartype == "str" else random.random() * 100
            level_schema[name] = {"name": name, "value": varval, "type": vartype}

    return schema


def generate_from_schema(schema):
    """Recursively generate an instance from a schema describing it."""

    def _gen_node(value):
        if "variables" in value:
            return generate_from_schema(value)
        valtype = value["type"]
        if valtype == "num":
            return NumberVariable(value=value["value"],
                                  attributes=random_attributes)
        elif valtype == "str":
            return StringVariable(value=value["value"],
                                  attributes=random_attributes)

        raise ValueError(f"{value}")

    obj = ObjectRoot(name=schema["name"])
    for key, value in schema["variables"].items():
        obj.value[key] = _gen_node(value)

    obj.schema_change = True
    return obj


def summarize_schema(schema, level=0):
    """Show a condensed representation of the object we're generating."""
    indent = " " * (level + 1) * 2
    if level == 0:
        print("Root:")

    for key, value in schema["variables"].items():
        if "variables" in value:
            print(f"{indent}{key}:")
            summarize_schema(value, level + 1)
        else:
            print(f"{indent}{key}: {value['type']}")


def main():
    COUNT = 1000

    schema = create_random_schema()
    # summarize_schema(schema)

    accum_time = 0
    for _ in range(COUNT):
        start = time.perf_counter()
        generate_from_schema(schema)
        accum_time += time.perf_counter() - start

    print("Total time:", accum_time)
    print("Time per instantiation:", accum_time / COUNT)


if __name__ == "__main__":
    main()

Timings

Traits 6.3.2

Total time: 1.316101769771194
Time per instantiation: 0.0013161017697711941

Traits 6.0.0

Total time: 0.7040128106891643
Time per instantiation: 0.0007040128106891643

@mdickinson mdickinson added the component: core Issues related to the core library label Jun 6, 2022
@mdickinson
Copy link
Member

Many thanks for this; the script is especially useful here.

Attached are some profile results using pyinstrument on a variant of this script, comparing the 6.0.0 release with the current main branch. (Python 3.10.4, macOS 12.4 / Intel).

Script
import random
import secrets

import pyinstrument

import traits
from traits.api import (Dict, Event, Float, HasStrictTraits, Instance,
                        Property, String, Undefined)


class Variable(HasStrictTraits):
    # data
    attributes = Property
    _attributes = Dict
    value = Undefined
    timestamp = Instance(object)
    event1 = Event()
    event2 = Event()

    # convenience properties
    attr1 = String
    attr2 = String
    attr3 = String
    attr4 = String
    attr5 = String

    def _set_attributes(self, value):
        self._attributes = value

    def _get_attributes(self):
        return self._attributes

    def _attr1_default(self):
        return self._attributes.get("foo")

    def _attr2_default(self):
        return self._attributes.get("foo")

    def _attr3_default(self):
        return self._attributes.get("foo")

    def _attr4_default(self):
        return self._attributes.get("foo")

    def _attr5_default(self):
        return self._attributes.get("foo")


class ObjectRoot(Variable):
    value = Instance(dict, args=())
    name = String()
    schema_change = Event()


class NumberVariable(Variable):
    value = Float()


class StringVariable(Variable):
    value = String()


random_attributes = {"foo": "bar"}


def create_random_schema(name="root", level=2):
    """Create a big and deeply nested object description."""
    VARS_PER_LEVEL = 5

    random.seed(12345)

    schema = {"name": name}
    level_schema = schema.setdefault("variables", {})
    for idx in range(VARS_PER_LEVEL):
        name_extra = f"{level}_{secrets.token_hex(2)}"
        if level > 0:
            name = f"subobj_{name_extra}"
            level_schema[name] = create_random_schema(
                name=name, level=level - 1
            )
        else:
            name = f"var_{name_extra}"
            vartype = "str" if idx % 2 == 0 else "num"
            varval = (
                secrets.token_hex(4)
                if vartype == "str"
                else random.random() * 100
            )
            level_schema[name] = {
                "name": name,
                "value": varval,
                "type": vartype,
            }

    return schema


def generate_from_schema(schema):
    """Recursively generate an instance from a schema describing it."""

    def _gen_node(value):
        if "variables" in value:
            return generate_from_schema(value)
        valtype = value["type"]
        if valtype == "num":
            return NumberVariable(
                value=value["value"], attributes=random_attributes
            )
        elif valtype == "str":
            return StringVariable(
                value=value["value"], attributes=random_attributes
            )

        raise ValueError(f"{value}")

    obj = ObjectRoot(name=schema["name"])
    for key, value in schema["variables"].items():
        obj.value[key] = _gen_node(value)

    obj.schema_change = True
    return obj


def main():
    COUNT = 10000
    version = traits.__version__

    schema = create_random_schema()

    profiler = pyinstrument.Profiler()
    profiler.start()

    for _ in range(COUNT):
        generate_from_schema(schema)

    profiler.stop()
    html_profile = profiler.output_html()
    with open(f"profile_{version}.html", "w", encoding="utf-8") as f:
        f.write(html_profile)


if __name__ == "__main__":
    main()

Profile results on 6.0.0 (as a screenshot, since GitHub won't let me upload the html file :-( ):

Screenshot 2022-06-06 at 10 54 19

Profile results on main:

Screenshot 2022-06-06 at 10 54 27

There's no one obvious single source for the performance regression, but there are a few low-hanging fruit. For one, it looks as though we're spending significant time validating keys and values for a Dict trait even when no key trait or value trait is specified. That's something that we could usefully special-case (with corresponding special cases for List and Set, of course).

@mdickinson
Copy link
Member

Looks like we're also losing some time in _post_init_trait_observers and _init_trait_observers, even though there's no observation going on. That seems worth an investigation.

As for very low-hanging fruit, this line unnecessarily looks up self.key_validator and self.value_validator once per item, when it should be creating the two validators as locals first. That's showing up in the profile above.

@jwiggins
Copy link
Member Author

@dgoeries Is it useful to link to the relevant code in Karabo now that it is visible on GitHub? Are you still stuck in an old version of traits, or has this been resolved?

@dgoeries
Copy link

Hey @jwiggins the code provided in the example nicely represents our binding code in https://github.com/European-XFEL/Karabo/blob/main/src/pythonGui/karabogui/binding/binding_types.py#L41

At the moment, we are stuck on traits 6.0.0 due to the significant performance drain.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
component: core Issues related to the core library type: performance Issues related to speed or memory usage
Projects
None yet
Development

No branches or pull requests

4 participants