
Simplify the codebase by avoiding metaprogramming #696

Open
oscarbenjamin opened this issue May 8, 2023 · 9 comments
Labels: enhancement (new feature requests or implementation), need decision
Milestone: 1.4

@oscarbenjamin

Following on from #683 (comment) concerning the speed of creating a context object; see also the past discussion in gh-178.

The mpmath codebase uses a number of metaprogramming features that increase complexity for what I suspect is relatively little benefit (mostly DRY and micro-optimisation). These make the codebase hard to follow and also defeat static code analysis, which is increasingly integrated into modern Python editors. I propose to remove these and move towards ordinary Python code with plain functions, methods and attributes.

Specifically I suggest:

  • Eliminate dynamic class generation. Make e.g. mpf a simple type that just has _mpf_ and ctx attributes.
  • Write methods in ordinary Python code rather than generating them dynamically either with exec or by attaching them from decorators.
  • Use more traditional methods for reducing boilerplate, such as reusable functions.

One example of this is dynamically created classes like:

class PythonMPContext:

    def __init__(ctx):
        ctx._prec_rounding = [53, round_nearest]
        ctx.mpf = type('mpf', (_mpf,), {})
        ctx.mpc = type('mpc', (_mpc,), {})
        ctx.mpf._ctxdata = [ctx.mpf, new, ctx._prec_rounding]
        ctx.mpc._ctxdata = [ctx.mpc, new, ctx._prec_rounding]
        ctx.mpf.context = ctx
        ctx.mpc.context = ctx
        ctx.constant = type('constant', (_constant,), {})
        ctx.constant._ctxdata = [ctx.mpf, new, ctx._prec_rounding]
        ctx.constant.context = ctx

There are also functions and methods created from code strings at runtime:
def binary_op(name, with_mpf='', with_int='', with_mpc=''):
    code = mpf_binary_op
    code = code.replace("%WITH_INT%", with_int)
    code = code.replace("%WITH_MPC%", with_mpc)
    code = code.replace("%WITH_MPF%", with_mpf)
    code = code.replace("%NAME%", name)
    np = {}
    exec(code, globals(), np)
    return np[name]

_mpf.__eq__ = binary_op('__eq__',
    'return mpf_eq(sval, tval)',
    'return mpf_eq(sval, from_int(other))',
    'return (tval[1] == fzero) and mpf_eq(tval[0], sval)')
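
The proposal is to write such methods out as ordinary Python code. A rough sketch (simplified, not a drop-in replacement) of the generated __eq__ above as a directly written method, using the same mpf_eq, from_int and fzero helpers from mpmath.libmp:

def __eq__(self, other):
    # Sketch only: the same three cases as the template above, written out
    # directly so that editors and static analysis can see the method.
    sval = self._mpf_
    if hasattr(other, '_mpf_'):
        return mpf_eq(sval, other._mpf_)
    if isinstance(other, int):
        return mpf_eq(sval, from_int(other))
    if hasattr(other, '_mpc_'):
        tval = other._mpc_
        return tval[1] == fzero and mpf_eq(tval[0], sval)
    return NotImplemented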

We also have methods that are dynamically added to classes:
def __init__(self):
    cls = self.__class__
    for name in cls.defined_functions:
        f, wrap = cls.defined_functions[name]
        cls._wrap_specfun(name, f, wrap)
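
With plain methods the equivalent could simply be written in the class body. A minimal sketch (illustrative only; the real wrappers also handle complex arguments, precision bumps and exceptions), reusing the existing convert/make_mpf/_prec_rounding context helpers and mpf_cos from mpmath.libmp:

def cos(ctx, x):
    # Sketch only: cos written as an ordinary context method instead of
    # being attached via _wrap_specfun when the context is created.
    prec, rounding = ctx._prec_rounding
    x = ctx.convert(x)
    return ctx.make_mpf(mpf_cos(x._mpf_, prec, rounding))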

Some of these metaprogramming steps happen at import time and others when a context object is created, and they add measurable overhead to the cost of creating a context:

In [9]: from mpmath import MPContext, mp

In [10]: %timeit MPContext()
538 µs ± 16.6 µs per loop (mean ± std. dev. of 7 runs, 1,000 loops each)

In [11]: %timeit mp.clone()
561 µs ± 27.3 µs per loop (mean ± std. dev. of 7 runs, 1,000 loops each)

In [12]: %timeit mp.cos(1)
10.9 µs ± 253 ns per loop (mean ± std. dev. of 7 runs, 100,000 loops each)

In [18]: %prun -s cumulative [mp.clone() for _ in range(1000)]
         863004 function calls in 1.736 seconds

   Ordered by: cumulative time

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
        1    0.000    0.000    1.736    1.736 {built-in method builtins.exec}
        1    0.000    0.000    1.736    1.736 <string>:1(<module>)
        1    0.002    0.002    1.736    1.736 <string>:1(<listcomp>)
     1000    0.005    0.000    1.734    0.002 ctx_mp.py:297(clone)
     1000    0.022    0.000    1.718    0.002 ctx_mp.py:63(__init__)
     1000    0.007    0.000    0.831    0.001 ctx_base.py:42(__init__)
     1000    0.364    0.000    0.783    0.001 functions.py:18(__init__)
     1000    0.355    0.000    0.775    0.001 ctx_mp.py:96(init_builtins)
   213000    0.270    0.000    0.378    0.000 ctx_mp_python.py:1014(_wrap_specfun)
    43000    0.356    0.000    0.372    0.000 ctx_mp_python.py:979(_wrap_libmp_function)
   220000    0.080    0.000    0.080    0.000 {built-in method builtins.setattr}
     1000    0.073    0.000    0.073    0.000 ctx_mp_python.py:585(__init__)
   256000    0.048    0.000    0.048    0.000 {method 'get' of 'dict' objects}
    17000    0.020    0.000    0.038    0.000 rational.py:31(__new__)
    14000    0.017    0.000    0.037    0.000 ctx_mp_python.py:336(__new__)
     1000    0.023    0.000    0.023    0.000 matrices.py:749(__init__)
    26000    0.020    0.000    0.020    0.000 {built-in method builtins.getattr}
    17000    0.016    0.000    0.018    0.000 rational.py:7(create_reduced)
     1000    0.008    0.000    0.015    0.000 ctx_base.py:52(_init_aliases)
    24000    0.011    0.000    0.011    0.000 {built-in method __new__ of type object at 0x7f2bc4270a40}
     1000    0.005    0.000    0.010    0.000 ctx_mp_python.py:612(_set_prec)
     1000    0.008    0.000    0.010    0.000 inverselaplace.py:668(__init__)
     1000    0.004    0.000    0.006    0.000 quadrature.py:461(__init__)
     5000    0.003    0.000    0.006    0.000 ctx_mp_python.py:597(make_mpf)
     1000    0.003    0.000    0.004    0.000 libmpf.py:59(prec_to_dps)
     1000    0.002    0.000    0.002    0.000 ctx_base.py:458(memoize)
     1000    0.002    0.000    0.002    0.000 rszeta.py:54(__init__)
     2000    0.002    0.000    0.002    0.000 quadrature.py:21(__init__)
     2000    0.002    0.000    0.002    0.000 {built-in method builtins.max}
     1000    0.001    0.000    0.001    0.000 ctx_mp_python.py:602(make_mpc)
     1000    0.001    0.000    0.001    0.000 {built-in method builtins.round}
     4000    0.001    0.000    0.001    0.000 inverselaplace.py:17(__init__)
     1000    0.001    0.000    0.001    0.000 ctx_mp_python.py:620(<lambda>)
     1000    0.001    0.000    0.001    0.000 ctx_mp_python.py:607(default)
     1000    0.001    0.000    0.001    0.000 {method 'update' of 'dict' objects}
     1000    0.000    0.000    0.000    0.000 {method 'items' of 'dict' objects}
        1    0.000    0.000    0.000    0.000 {method 'disable' of '_lsprof.Profiler' objects}

Ideally the cost of creating a new context object would be small enough that it could be worthwhile to create one just for a single function call. The context object only has a handful of genuine attributes, so it should not be this expensive to create:

In [29]: class Context:
    ...:     __slots__ = ('prec',)
    ...:     def __init__(self, prec=53):
    ...:         self.prec = prec
    ...: 

In [30]: %timeit Context()
235 ns ± 25.1 ns per loop (mean ± std. dev. of 7 runs, 1,000,000 loops each)

I think that it would be better to rewrite these things using ordinary Python code, with a primary emphasis on keeping the code simple and easy to understand and maintain. Future versions of CPython will also boost the performance of much Python code, but mpmath will potentially not benefit as much from those optimisations because they generally target simple, straightforward code.

With a simpler codebase it would also be easier to aim for bigger performance improvements by making it easier to swap different backends in and out and to leverage e.g. python_flint if available. The small performance benefits achieved by this metaprogramming could be made a lot greater just by having an optional C/Cython backend, at least for basic types like mpf, but any such backend in a non-dynamically-typed language would not be able to replicate the metaprogramming. The python_flint module could house a faster version of mpf and MPContext if it were to be used as an optional backend for mpmath.

@skirpichev
Collaborator

That seems reasonable. However,

Make e.g. mpf a simple type that just has _mpf_ and ctx attributes.

Are you suggesting the ctx attribute as an instance property?

I think we could use contextvars.ContextVar to store the current context and provide an API to get/set it. Then we could make ctx.mpf a simple type, and mpf.context would just query the current context. I believe this could be a relatively simple change.
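
Something along these lines, as a minimal sketch (made-up names, not a proposed API):

import contextvars

# Sketch only: one module-level ContextVar holds the current context;
# mpf.context queries it instead of relying on a class attribute of a
# dynamically generated per-context type.
_current_context = contextvars.ContextVar('mpmath_context')

def get_context():
    return _current_context.get()

def set_context(ctx):
    return _current_context.set(ctx)

class mpf:
    __slots__ = ('_mpf_',)

    def __init__(self, value):
        self._mpf_ = value

    @property
    def context(self):
        return get_context()

A real implementation would install a default context at import time; note that contextvars also makes the current context per-thread/per-task, which may or may not be the semantics we want.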

(Another issue with _mpf is that we have libmpf.py, which implements floating-point arithmetic in a functional style. I'm not sure whether this layer simplifies the codebase or adds a significant performance benefit.)

Write methods in ordinary Python code rather than generating them dynamically either with exec or by attaching them from decorators.

I doubt that the second one is actually a problem. Why do you think so?

There are also functions and methods created from code strings at runtime

See #702.
The remaining cases are make_hyp_summator() and identify().

@oscarbenjamin
Author

Are you suggesting the ctx attribute as an instance property?

Effectively it already is an instance attribute, but it is implemented by having a dynamically created class with a class attribute. I'm suggesting not having the dynamically generated class and instead just having ctx as an instance attribute.

Currently it is like this:

class _mpf:
    def __init__(self, value):
        self.value = value

    def __repr__(self):
        return "mpf(%s) (%s)" % (self.value, self.ctx)

class Context:
    def __init__(self, name):
        self.name = name
        self.mpf_type = type("mpf", (_mpf,), {})
        self.mpf_type.ctx = self

    def mpf(self, value):
        return self.mpf_type(value)

    def __repr__(self):
        return "Context(%s)" % self.name

ctx1 = Context("default")
ctx2 = Context("other")

a = ctx1.mpf(1)
b = ctx2.mpf(2)
print("a =", a)
print("b =", b)

I would instead write it like:

class mpf:
    def __init__(self, value, ctx):
        self.value = value
        self.ctx = ctx

    def __repr__(self):
        return "mpf(%s) (%s)" % (self.value, self.ctx)

class Context:
    def __init__(self, name):
        self.name = name

    def mpf(self, value):
        return mpf(value, self)

    def __repr__(self):
        return "Context(%s)" % self.name

ctx1 = Context("default")
ctx2 = Context("other")

a = ctx1.mpf(1)
b = ctx2.mpf(2)
print("a =", a)
print("b =", b)

@oscarbenjamin
Author

Write methods in ordinary Python code rather than generating them dynamically either with exec or by attaching them from decorators.

I doubt that the second one is actually a problem. Why do you think so?

One problem with the decorator approach as currently implemented is that it is slow when creating a context object (see the profile above).

Another problem is simply that it makes the codebase complicated to navigate and understand. Many different files and functions need to be inspected just to understand exactly how one function calls another. Usually I would follow things like that by jumping from one function to another in my editor using "go to definition" (gd in vim), but that fails, as does all static analysis such as type checking and code completion, when the methods are assigned dynamically.

@asmeurer
Contributor

I agree with this. I often find it hard to figure out where stuff is because of this. Especially if the only point of the metaprogramming is to avoid some code duplication, it would be better to either use standard methods of avoiding duplication (like factoring things into functions or subclasses), or to just duplicate the code.

@skirpichev
Collaborator

Another example is the multiple pickling issues with the current codebase.

Effectively it already is an instance attribute, but it is implemented by having a dynamically created class with a class attribute.

On the other hand, this is a huge change in semantics: right now we have a different mpf type per context. You suggest instead something like gmpy2.mpfr.

I did some experiments in the https://github.com/skirpichev/mpmath/tree/simple-mpf branch. Not sure it's worth a draft PR; let me know. But I'm surprised there are only ~20 test failures, a decent part of them being broken isinstance(foo, mp.mpf) checks. All the changes are in ctx_mp_python.py so far; the rest are things like the type(m)(-1) -> -ctx.one change in elliptic.py, which could be accepted in the master branch.

@oscarbenjamin
Author

On the other hand, this is a huge change in semantics: right now we have a different mpf type per context.

Is that a huge change in semantics? Do you think that code outside of mpmath really relies on these dynamic types?

Probably something in SymPy would break but that is just because SymPy reaches into the internals of mpmath too much.

I did some experiments in the https://github.com/skirpichev/mpmath/tree/simple-mpf branch. Not sure it's worth a draft PR; let me know.

The commit looks good to me.

But I'm surprised there are only ~20 test failures, a decent part of them being broken isinstance(foo, mp.mpf) checks.

I'm not so surprised because the dynamically created classes do not really achieve anything in terms of behaviour.

@skirpichev
Collaborator

Do you think that code outside of mpmath really relies on these dynamic types?

Maybe. Don't get me wrong, I like this idea. In the long term, with this, we could probably even use (optionally) gmpy's mpfr/mpc instead of mpmath's mpf/mpc.

But this will not be a backward-compatible change.

Probably something in SymPy would break but that is just because SymPy reaches into the internals of mpmath too much.

Let's test that: diofant/diofant#1356 (I think the situation will be more or less the same for SymPy).

UPD: and yes, it does break.

The commit looks good to me.

WIP: #719

I'm not so surprised because the dynamically created classes do not really achieve anything in terms of behaviour.

There are other issues (e.g. pickling), which aren't simple to fix.

@skirpichev
Collaborator

Ok, now there are only pickling-related failures.

@skirpichev
Collaborator

Now pickling works.

@skirpichev added this to the 1.4 milestone on Apr 1, 2024