PyPy JIT for dummies
As you surely know, the key idea of PyPy is that we are too lazy to write a
JIT of our own: so, instead of spending nights writing a JIT, we spend years
coding a JIT generator that writes the JIT for us :-).
I'm not going to explain how the JIT generator does its job (perhaps this
will be the subject of another blog post), but rather how the generated JIT
works.
There are values that, if known at compile time (i.e., when the JIT compiler
runs), let the JIT produce very efficient code. In a dynamic language,
types are the primary example: for instance, suppose you are a compiler and
you have to compile the following Python function:
    def mysum(a):
        return a + 1
At compile time, you don't have any knowledge about the type of the parameter:
it could be an integer, a float, a user-defined object, etc. In this situation,
the only safe choice is to emit code that does the usual, slow, full lookup
to find out how to perform the operation.
On the other hand, suppose that you knew in advance that the parameter is an
integer: this time, you could emit code that exploits this extra
knowledge by directly performing a fast integer addition.
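To make the difference concrete, here is a rough sketch in plain Python of the two strategies. It is only an illustration, not what PyPy actually emits: the names generic_add, mysum_generic and mysum_int are made up for this example, and the generic path is a simplified version of the real lookup rules (it ignores subclass priority, among other things).

```python
def generic_add(a, b):
    # Slow path: find out at run time how to add, based on the operand types.
    # Assumption: simplified lookup, not the full Python protocol.
    add = getattr(type(a), "__add__", None)
    result = add(a, b) if add is not None else NotImplemented
    if result is NotImplemented:
        # Fall back to the reflected operation on the right operand.
        radd = getattr(type(b), "__radd__", None)
        result = radd(b, a) if radd is not None else NotImplemented
    if result is NotImplemented:
        raise TypeError("unsupported operand types for +")
    return result


def mysum_generic(a):
    # What you must do when nothing is known about the type of a.
    return generic_add(a, 1)


def mysum_int(a):
    # What you can do when a is known to be an int: no lookup, no fallback.
    # Only valid if the caller guarantees that a really is an int.
    return int.__add__(a, 1)
```

Both versions give the same answer for integers, but mysum_int skips all the type inspection. In a real JIT the specialized version would be a machine-level integer addition rather than a Python call.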
The idea behind the PyPy JIT is that if you don't have enough knowledge to
generate efficient code, you stop compiling and wait until you know
exactly what you need. Concretely, you emit code that runs up to the point
where you stopped compiling, and then triggers a special procedure that
restarts the compiler. This time the JIT compiler knows everything
it needs, because it can inspect the state of the running program.
Let's see an example: the first time the JIT compiles mysum, it produces
something like this pseudo-code: