Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add config option to specify json float serialization precision #905

Open
wants to merge 4 commits into
base: develop
Choose a base branch
from

Conversation

Gatsik
Copy link
Contributor

@Gatsik Gatsik commented May 8, 2022

Closes #875

@Gatsik Gatsik force-pushed the issue/#875-maximum-precision-for-float-serialization branch 2 times, most recently from bf7c3aa to 86d642a Compare May 12, 2022 21:42
@Gatsik
Copy link
Contributor Author

Gatsik commented May 12, 2022

test_ladder_game_draw_bug and test_game_ended_broadcasts_rating_update are failing probably because of customized precision
Do you want this custom precision only for queue_pop_time_delta field in matchamker_info message?

@Askaholic
Copy link
Collaborator

Hey, nice work! I think it makes sense to use the same precision for everything, so those tests probably need to be adjusted to have higher tolerance for error.

The second one is interesting because it means there will be some edge cases where the rating change is so small that it doesn’t appear to change at all. I think that’s ok though since the full change history will still be available through the replay details page and the rating graph.

@Askaholic
Copy link
Collaborator

One thing we have to keep in mind is that if we’re messing with the json encoding, we need to be very careful of the performance impact since this is one of the hottest functions in the entire code base. It would be amazing if we could just override the float serialization code and leave the rest of the implementation untouched. I found a little snippet that does this although it uses the mock.patch functionality to override private functions from the Json library. Still, this may be a good source of inspiration:

https://gist.github.com/Sukonnik-Illia/ed9b2bec1821cad437d1b8adb17406a3

@Gatsik
Copy link
Contributor Author

Gatsik commented May 22, 2022

It looks like we can't do much without creating custom python package that will use compiled C code.
I did some comparison between different methods using slightly modified script taken from https://developpaper.com/speed-comparison-of-five-json-libraries-in-python/ and here are the results of encoding on my machine:

patching (example given above): 107.800
current (this pr's code):  28.629

Default serialization

simplejson: 6.981
ujson: 3.503
stdlib json: 4.500

Round floats approach (https://stackoverflow.com/a/53798633):

simplejson: 14.196
ujson: 10.481
stdlib json: 11.109

Overriding iterencode function, which contains floatstr method, directly without patching:

15.991

Compiling custom mix of json and simplejson (most of the code is a copy from python/cpython#13233, built with setup.py from simplejson):

6.589

Time is in seconds, the data for encoding is:

{
    "command": "game_info",
    "visibility": "public",
    "password_protected": True,
    "uid": 13,
    "title": "someone's game",
    "state": "playing",
    "game_type": "custom",
    "featured_mod": "faf",
    "sim_mods": {},
    "mapname": "scmp_009",
    "map_file_path": "maps/scmp_009.zip",
    "host": "Foo",
    "num_players": 2,
    "launched_at": 1111111111.1112312312312,
    "rating_type": "faf",
    "rating_min": None,
    "rating_max": None,
    "enforce_rating_range": False,
    "team_ids": [
        {
            "team_id": 1,
            "player_ids": [1],
        },
        {
            "team_id": 2,
            "player_ids": [2],
        },
    ],
    "teams": {
        1: ["Foo"],
        2: ["Bar"],
    },
}

There is also a package called orjson which is super fast, but requires dict keys to be strings, so with modified initial data, in which all keys are strings, the result is:

0.624

Otherwise, using "round floats" approach in which we also convert dict keys to str:

8.457


class CustomJSONEncoder(json.JSONEncoder):
# taken from https://stackoverflow.com/a/53798633
def encode(self, o):
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like this way the best out of all the ones proposed so far. Have you measured the performance difference for this one? I imagine it would be noticeable especially for dicts since it makes a copy of the entire data structure, but maybe it's not too bad.

I think the other way that would have merit would be to make a wrapper type around float and use default to format it. That would require us to explicitly set the float rounding everywhere, but it would also give more control if we wanted to round rating to 3 or 4 places but timestamps only to 2.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh sorry, I see this is the 'round floats' approach you mentioned in your other comment.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't quite understand what do you mean by a wrapper, because if it requires us to call this wrapper every time we do some math, then why don't just use round?
Rewriting representation of the float won't help, the C code will operate with value and encode all the digits

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Something similar to this, but using our own class instead of Decimal: https://stackoverflow.com/a/3885198

class PFloat:
    def __init__(self, value: float, precision: int):
        self.value = value
        self.precision = precision

And then in the to_dict method you'd have to wrap the floats in this class:

def to_dict(self):
    return {
        "some_float": PFloat(self.some_float, 2)
    }

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, I can make a class, but I think it is unnecessary complication, because we already have built-in round function which effectively does the same thing

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Like, we have 2 options:

  1. Rewrite json's encode method to change encoding of all floats
  2. Rewrite every to_dict method where we will round every float with its own precision (the "more control" mentioned above)

server/servercontext.py Show resolved Hide resolved
tests/unit_tests/test_protocol.py Outdated Show resolved Hide resolved
tests/unit_tests/test_protocol.py Outdated Show resolved Hide resolved
tests/unit_tests/test_protocol.py Outdated Show resolved Hide resolved
if isinstance(o, (list, tuple)):
return [round_floats(x) for x in o]
return o
return super().encode(round_floats(o))
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe we should add a config option to disable this feature just in case it turns out to be too costly in production. Although, I think even if it doubles the json encoding time that will still be fine judging by the profiling information I've collected from the server in the past.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, sure, what would you call it?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think JSON_ROUND_FLOATS would be good. And then rename the other one to JSON_ROUND_FLOATS_MAX_DIGITS or JSON_ROUND_FLOATS_PRECISION

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should mocking be also conditional in tests?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think it's too important to have tests for the config values. They're generally simple enough that it doesn't matter

@Gatsik Gatsik force-pushed the issue/#875-maximum-precision-for-float-serialization branch from 2a356ce to c61d227 Compare May 28, 2022 19:33
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Maximum precision for float serialization
2 participants