A WIP Lazy Basic Block Versioning + Copy and Patch JIT Interpreter for CPython. NOTE: This is not any faster than CPython. It's just a test bed for cool experimental technologies that we want to have fun with.
We provide pre-compiled binaries for 64-bit Windows 10/11 via GitHub releases.
Python is a widely-used programming language. As a Python user, I want CPython code to execute quicker.
CPython is Python's reference implementation. Due to Python’s dynamic type semantics, CPython is generally unable to execute Python programs as fast as it potentially could with static type semantics.
We fork CPython to implement and experiment with features that will make it faster. Our fork is called pyLBBVAndPatch.
This section will be laden with some CS programming language terminology. We will do our best to keep that to a minimum and explain our features as simply as possible.
Before Orbital, while taking CS4215 in the previous semester, we have already achieved the following:
- Lazy basic block versioning interpreter
- Basic type propagation
- Type check removal
- Basic tests
- Comprehensive documentation
In short, lazy basic block versioning is a technique for collecting type information of executing code. A basic block is a "is a straight-line code sequence with no branches in except to the entry and no branches out except at the exit" (retrieved from Wikipedia, 21/6/2023). The code you write usually consists of multiple basic blocks. Lazy basic block versioning means we only generate code at runtime, block by block, as we observe the types. This allows us to collect runtime type information at basic block boundaries and optimize our basic blocks we generate them.
Type propagation means we can take the type information gathered from a single basic block, and propagate them down to the next basic block. Thus over time we can accumulate more and more type information.
Type check removal means removing type checks in dynamic Python. E.g. if you have a + b
, the fast path in Python has to check that these are int
or str
or float
, then if all those fail, rely on a generic +
function. These type checks incur overhead at runtime. With type information, if we know the types, that we don't need any type checks at all! This means we can eliminate type checks.
We have a rudimentary test script and Doxygen documentation for all our functions to follow common SWE practices.
This Orbital, we intend to refine pyLBBV. These include but are not limited to:
- General refactoring
- Bug fixing
- A more advanced type propagator.
- A more comprehensive test suite with Continuous Integration testing.
- A copy and patch JIT compiler.
A JIT(Just-in-Time) compiler is just a program that generates native machine executable code at runtime. Copy and Patch is a new fast compilation technique developed rather recently. The general idea is that compilation normally requires multiple steps, thus making compilation slow (recall how many steps your SICP meta-circular evaluator needs to execute JS)! Copy and patch makes compilation faster by skipping all the intermediate steps, and just creating "templates" for the final code. These "templates" are called stencils and they contain holes, i.e. missing values. All you have to do for compilation now is to copy and template, and patch in the holes. Thus making it very fast!
The main copy and patch JIT compiler is writte by Brandt Bucher, and we plan on integrating his work with ours. However, as such a compiler is not designed for use with basic block versioning, we will be handwriting x64 assembly code to get things working!
Our work here will be made publically available so that it will benefit CPython and its users, and we plan to collaborate closely with the developers of CPython in the course of this project.
Due to Python being a huge language, pyLBBVAndPatch intends to support and optimise a subset of Python. Specifically pyLBBVAndPatch focuses on integer and float arithmetic. We believe this scope is sufficient as an exploration of the LBBV + Copy and Patch JIT approach for CPython.
We did a major refactor of our code generation machinery. This makes the code easier to reason about.
- We managed to support recursive functions in Python!
- We are now able to build ourselves using the standard Python build suite. This is a huge accomplishment because it requires supporting a lot of Python code.
- We have fixed enough bugs that we can now run complex recursive Python functions like
help(1)
.
- We fixed bugs with how one type context (i.e snapshot) is deemed to be compatible with another type context.
- We now support collecting negative type information (e.g. that a variable is not and
int
or notfloat
). This allows for better type check elimination.
- We have added both feature tests and regression tests to our test script in tier2_test.py.
- We now have continous integration. We automatically build our project and run tests using GitHub Actions for Windows 64-bit, on every pull request and commit to the repository!
- All PRs require review and an approval before merging is allowed. All tests in CI must also pass. This follows standard best practices.
The copy-and-patch JIT compiler uses a stencil compiler provided by Brandt Bucher.
- At runtime, each basic block, except the branches (exits) are compiled to machine code.
- If compilation is successful, execution jumps into the machine code rather than the interpreter bytecode.
- The branches remain as CPython interpreter bytecode, to faciliatate easy branching.
- Upon encountering a branch, the interpreter leaves the machine code to go back into bytecode.
- Execution thus interleaves between machine code and the interpreter.
flowchart TD
dsl[bytecodes.c<br> Modified DSL <br> containing CPython <br> bytecode definitions]
typeprop[tier2_typepropagator.c.h <br> Type propagator]
instdef[generated_cases.c.h <br> CPython bytecode <br> interpreter <br> includes Tier 2 <br> bytecode]
stencils[jit_stencil.h <br> JIT versions of <br> CPython bytecode]
tier2[Tier2 Execution]
instdef -- Used in --> tier2
stencils -- Used in --> tier2
typeprop -- Used in --> tier2
tier2 -- Component of --> python.exe
subgraph Our Project
dsl -- Generates --> instdef
dsl -- Generates --> typeprop
end
subgraph Brandt Bucher's Stencil Compiler
instdef -- Compiles into --> stencils
end
sequenceDiagram
autonumber
participant CPython Compiler
participant Tier 0
participant Tier 1
box rgba(66,120,99,0.1) Our Project
participant Tier 2
participant Type Propagator
end
CPython Compiler ->> Tier 0: Emits code for <br> Tier 0 to execute
loop
Tier 0 ->> Tier 1: Individual instructions <br>profile the data it <br> receives and overwrites <br> itself to a more efficient <br> instruction.
Tier 1 ->> Tier 0: If optimisation is <br> invalid, de-optimise <br> back to Tier 0
end
Tier 1 ->> Tier 2: Code executed more <br> than 63 times and <br> Tier 1 instructions <br> present, triggers the <br> Tier 2 interpreter
loop Until exit scope executed
loop until Tier 2 encounters type-specialised tier 1 instruction or a branch instruction
Note over Tier 2: Tier 2 copies Tier 1 <br> instructions into a <br> buffer to be executed <br> according to runtime <br> type info
Tier 2 ->> Type Propagator: Requests type propagator
Type Propagator ->> Tier 2: Type propagator <br> updates runtime type <br> info based on <br>newly emitted code
end
Note over Tier 2: Emits a typeguard <br> or branch
Note over Tier 2: JIT compiles current <br> basic block into <br> machine code
Note over Tier 2: Execute machine code <br> until the end of the <br> basic block
Note over Tier 2: Execute the Tier 2 <br> branch instruction <br> or type guard
Tier 2 ->> Type Propagator: Requests type propagator
Type Propagator ->> Tier 2: Type propagator updates <br> runtime type info <br> based on branch taken
Note over Tier 2: Starts emitting <br> type specialised <br> branch according to <br> runtime type info
end
- Just general bugfixing.
- Evaluation and benchmarking
Results can be found here.
At the time of writing this (Milestone 3 release), pyLBBVAndPatch is only around 12% faster than CPython at bm_float_unboxed.py, which presents a significant regression from 39% faster in the original pyLBBV.
We will run the bm_nbody.py script if we can fix the remaining bugs.
- Refactor: Typeprop codegen by @JuliaPoo in #1
- Refactored type propagation codegen to more explicitly align with the formalism in our technical report (Appendix) and remove duplicated logic
- Fixed bug with possible reference cycle in the type propagation
TYPE_SET
operation
- feat: granular type checks by @Fidget-Spinner in #2
- Split
BINARY_CHECK_X
into simplyCHECK_X
and short-circuit failure paths. This allows us to follow more closely to the LBBV paper. It also means we can supported cases where one operand is known but the other unknown.
- Split
- Perf: Improved typeprop by switching overwrite -> set by @JuliaPoo in #6
- Stricter type propagation reduces type information loss
- Improved support for Tier 2 code generation
- Insert cache entries for
SUBSCR
tier 2 instructions in #13 - Fixed off by one error in forward jump calculation in #19
- Fixed wrong offset in
BB_BRANCH_IF_FLAG_SET
codegen in #21 - Fixed
BB_TEST_POP_IF_FALSE
not setting flag in #24 - Recursive generation of basic blocks in #29
- Mark interpreter frames as tier 2 or not in #34
- Make instruction offset calculation frame aware in #37
- Support multiple entry points in a Basic Block in #39 and #40
BINARY_OP
specialisations now apply toPySmallInt_Type
in #46
- Insert cache entries for
- Improved workflow:
- workflow: enable CI GH actions tests in #35
- Improved type propagator:
You should install LLVM 16.0 for your system.
You should follow the official CPython build instructions for your platform. https://devguide.python.org/getting-started/setup-building/
During the build process, errors may be printed, and the build process may error. However, the final Python executable should still be generated.
The majority of the changes and functionality are in Python/tier2.c
where Doxygen documentation
is written alongside the code, and in Tools/cases_generator/
which contains the DSL implementation.
We've written simple tests of the main functionalities. Unfortunately we did not have time to write comprehensive tests, and it doesn't seem worth it eitherways given the experimental nature of this project.
After building, run python tier2_test.py
or python.bat tier2_test.py
(on Windows) in the repository's root folder.
In tier2.c
, three flags can be set to print debug messages:
// Prints Tier 2 intepreter codegen debug messages
#define BB_DEBUG 0
// Prints typeprop debug messages
#define TYPEPROP_DEBUG 0
// Prints JIT codegen debug messages
#define JIT_DEBUG 0
More details can be found in our technical report. For an introduction to pyLBBV, refer to our presentation.