RFC: Introducing "import- vs run- time" semantics mode to Python #26

Open
pfalcon opened this issue Jan 5, 2019 · 4 comments

pfalcon commented Jan 5, 2019

One of the biggest (performance) issues with Python is what I call overdynamicity: the fact that many symbols in a program are looked up at runtime by symbolic name. This includes global variables and functions, module variables and functions, and object attributes and methods. (Almost the only exception is local function variables, which are optimized and accessed by "address", more specifically by offset in the function stack frame.)
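As a rough illustration of the difference between name-based and slot-based access, here is a minimal sketch that inspects bytecode with CPython's `dis` module. Using CPython here is an assumption made for illustration (other implementations may compile this differently), and `circle_area` is a made-up example function.

```python
import dis
import math


def circle_area(r):
    # `math` is a global name: it is looked up by name on every call.
    # `r` is a local: it is accessed by index into the frame's local slots.
    return math.pi * r * r


opnames = [ins.opname for ins in dis.get_instructions(circle_area)]

# Opcode names vary slightly between CPython versions, so match by prefix.
uses_global_lookup = any(name.startswith("LOAD_GLOBAL") for name in opnames)
uses_fast_locals = any(name.startswith("LOAD_FAST") for name in opnames)

print("global looked up by name:", uses_global_lookup)
print("local accessed by slot:", uses_fast_locals)
```

The name `math` is stored in the code object's `co_names` and resolved through a dictionary lookup at run time, while `r` lives in `co_varnames` and is addressed by position.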

Such semantics allow overriding and customizing many aspects of the language, but at the same time lead to runtime inefficiency. However, the following are well-known facts:

  1. The majority of applications never override symbols in other modules.
  2. Of those which do, the majority do so once, at application startup (while "setting up the application environment").
  3. The remainder are quite specialized applications: either tooling (test runners, profilers, etc.) or applications which work around a problem instead of implementing or fixing it properly.

Formalized in terms of Python semantics, the following optimization approach can be proposed:

  1. During import time, a module can modify the runtime environment (including overriding symbols in other modules).
  2. However, at runtime, such modifications are not allowed.
  3. These rules apply recursively to all modules comprising a particular application. I.e. there is a clear "import-time" phase vs a "run-time" phase in the application lifetime. Note that this rules out runtime imports (imports do modify the runtime environment, but the environment must be settled by the time the runtime phase starts).
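The two phases can be approximated in plain Python to get a feel for the rules. The sketch below is only an emulation, not the proposed implementation: `FrozenModule` and `end_import_phase` are hypothetical names, and the check only covers attribute assignment on the module object (writes through `module.__dict__` or `globals()` would still get through).

```python
import types


class FrozenModule(types.ModuleType):
    """Hypothetical module type that rejects rebinding once frozen."""

    def __setattr__(self, name, value):
        raise RuntimeError(f"cannot rebind {self.__name__}.{name} at run time")

    def __delattr__(self, name):
        raise RuntimeError(f"cannot delete {self.__name__}.{name} at run time")


def end_import_phase(modules):
    # Switch every module to the frozen type; from now on, attribute
    # assignment on these modules raises RuntimeError.
    for module in modules:
        module.__class__ = FrozenModule


# Import phase: modifications are allowed.
demo = types.ModuleType("demo")
demo.answer = 41
demo.answer = 42

end_import_phase([demo])

# Run-time phase: modifications are rejected.
try:
    demo.answer = 0
except RuntimeError as exc:
    blocked = str(exc)
else:
    blocked = None

print(demo.answer, blocked)
```

A real implementation would enforce this in the compiler/runtime rather than via a module subclass; the sketch only shows where the phase boundary sits.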

Note also that "import time" effectively corresponds to "compile time" in other languages. Indeed, cached bytecode files are produced during the import phase, by compiling source into bytecode. But with conventional Python semantics, compiled bytecode has an implicit "module initialization function". That is required to support both conventional semantics and modularity. For example, module init code can (and indeed often does, per point 2 above) override symbols in other modules, so this has to be captured as imperative code. The proposed new semantics, however, effectively require executing module init code during import time and capturing its effects. As those effects can extend beyond the current module to the whole runtime environment, implementing the new semantics would require a whole-program approach.
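To make the "implicit module initialization function" concrete, here is a small sketch using the standard `compile()` and `exec()` builtins. The module source and the `<demo_module>` filename are made up for illustration.

```python
source = '''
def greet():
    return "hello"
'''

# Compiling produces a code object for the module body. Nothing is defined
# yet: the body is the module's implicit initialization function.
code = compile(source, "<demo_module>", "exec")

namespace = {}
defined_before = "greet" in namespace

# Executing the code object runs the module init code, which is what
# actually binds `greet` in the namespace.
exec(code, namespace)
defined_after = "greet" in namespace

print(defined_before, defined_after, namespace["greet"]())
```

Under the proposed semantics, the equivalent of the `exec()` step would happen during the import phase, and its resulting bindings would be what gets captured.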

pfalcon commented Jan 5, 2019

From the above, it's clear which constraints are placed on the code:

  1. All function and global variable definitions should be done in module init code.
  2. All class definitions should be done in module init code.
  3. All overriding of symbols in other modules should happen in module init code.

Note that "globals" are a particular case of a module namespace: "globals" are just the namespace of the current module, with a fallback to the "builtins" module.
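The lookup order for a global name can be written out explicitly. This is a simplified model, and `resolve_global` is a hypothetical helper, not an existing API:

```python
import builtins


def resolve_global(name, module_globals):
    """Model of global name resolution: module namespace, then builtins."""
    if name in module_globals:
        return module_globals[name]
    try:
        return getattr(builtins, name)
    except AttributeError:
        raise NameError(f"name {name!r} is not defined") from None


# Not defined in the module namespace: falls back to builtins.
from_builtins = resolve_global("len", {})

# Defined in the module namespace: shadows the builtin.
from_module = resolve_global("len", {"len": "shadowed"})

print(from_builtins is len, from_module)
```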

As an example, suppose we want to override the builtin print(). Code which is not compliant with the proposed approach:

import builtins

def my_print(*args, **kwargs):
    pass

def install_my_print():
    builtins.print = my_print

Compliant code:

import builtins

def my_print(*args, **kwargs):
    pass

builtins.print = my_print

pfalcon commented Jan 5, 2019

It should be noted which symbolic accesses can be optimized by this approach:

  • global variables and functions, module variables and functions - yes, these will be fixed at the end of import time, and thus can be accessed by address instead of symbolically.

  • object attributes and methods - no, these require dynamic dispatch and thus dynamic lookup of an attribute/method in an object whose type is known only at runtime. Optimizing this would require static type inference, and would be a next stage of optimization beyond the scope of this proposal.
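A toy model of "access by address" for the first case: once the namespace is fixed at the end of import time, each name can be mapped to a slot index once, and run-time accesses use the index only. The names `build_slot_table`, `scale` and `FACTOR` below are hypothetical and only serve the illustration.

```python
def build_slot_table(namespace):
    """Assign each name a fixed slot index and copy values into a list."""
    names = sorted(namespace)
    index = {name: i for i, name in enumerate(names)}
    slots = [namespace[name] for name in names]
    return index, slots


def scale(x):
    return x * 3


# State of a module namespace at the end of the import phase.
module_namespace = {"scale": scale, "FACTOR": 3}

index, slots = build_slot_table(module_namespace)

# "Compile time": the name is resolved to a slot number once.
SCALE_SLOT = index["scale"]

# "Run time": no name lookup is needed, only indexing.
result = slots[SCALE_SLOT](14)
print(result)
```

This only works because the namespace can no longer change after the table is built. Method calls such as `obj.method()` cannot be resolved this way, since the slot would depend on the type of `obj`, which is only known at run time.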

pfalcon changed the title from RFC: Introducing "import- vs run-time" semantics mode to Python to RFC: Introducing "import- vs run- time" semantics mode to Python on Jan 5, 2019
pfalcon commented Dec 28, 2019

To clearly separate import-time from run-time, we'd need to implement a special kind of "main" function to call after the import phase is over. It turns out that many good things like this were already considered, but some were rejected: https://www.python.org/dev/peps/pep-0299/ "Special __main__() function in modules".
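A minimal sketch of what such an entry point could look like when emulated in plain Python. The helper names `load_module` and `run_application`, and the choice of `main` as the entry point name, are assumptions for illustration only and not a settled design.

```python
import types


def load_module(name, source):
    """Import phase: execute module init code and collect its bindings."""
    module = types.ModuleType(name)
    exec(compile(source, f"<{name}>", "exec"), module.__dict__)
    return module


def run_application(module, entry_name="main"):
    """Run-time phase: call the entry point once all imports are done."""
    entry = getattr(module, entry_name, None)
    if not callable(entry):
        raise RuntimeError(f"module {module.__name__} has no {entry_name}()")
    return entry()


app_source = '''
GREETING = "hello"

def main():
    return GREETING + " from run time"
'''

app = load_module("app", app_source)   # import phase
result = run_application(app)          # run-time phase
print(result)
```

The point of the split is that everything before the `run_application()` call is import time, so the runtime environment could be treated as fixed from that call onward.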

pfalcon commented Jan 1, 2020
