Optimizations

Optimizations

Implementated optimizations:

Call pure builtins

Call pure builtin functions at compilation: replace the call with the result in the specialized bytecode, add guards on the called builtin functions.

The optimization is disabled when the builtin function is modified or if a variable with the same name is added to the global namespace of the function.

The optimization on the builtin NAME requires two guards:

  • NAME key in builtin namespace
  • NAME key in global namespace

Example:

Original Specialized
def func():
    return len("abc")
def func():
    return 3

Loop unrolling

for i in range(3): ... and for i in (1, 2, 3): ... are unrolled. By default, only loops with 16 iterations or less are optimized.

Note

If break and/or continue instructions are used in the loop body, the loop is not unrolled.

Configuration option: unroll_loops.

tuple example

Example with a tuple.

Original Loop unrolled
def func():
    for i in ("a", "b"):
        print(i)
def func():
    i = "a"
    print(i)

    i = "b"
    print(i)

No guard is required. The function has no specialized bytecode, the optimization is done directly on the function.

Original bytecode:

.     0 SETUP_LOOP              14 (to 17)
      3 LOAD_CONST               3 (('hello', 'world'))
      6 GET_ITER

>>    7 FOR_ITER                 6 (to 16)
     10 STORE_FAST               0 (i)

     13 JUMP_ABSOLUTE            7
>>   16 POP_BLOCK

>>   17 LOAD_CONST               0 (None)
     20 RETURN_VALUE

fatoptimizer bytecode:

LOAD_CONST   1 ("hello")
STORE_FAST   0 (i)

LOAD_CONST   2 ("world")
STORE_FAST   0 (i)

LOAD_CONST   0 (None)
RETURN_VALUE

range example

Example of a loop using range().

Original Loop unrolled
def func():
    for i in range(2):
        print(i)
def func():
    i = 0
    print(i)

    i = 1
    print(i)

The specialized bytecode requires two guards:

  • range builtin variable
  • range global variable

Combined with constant propagation, the code becomes even more interesting:

def func():
    i = 0
    print(0)

    i = 1
    print(1)

Note

Since replacing range() requires a specialization with guard, the optimization is only implemented at function level.

Simplify comprehensions

Simplify list-comprehension, set-comprehension and dict-comprehension. Optimization similar to Loop unrolling, but applied to comprehensions.

Examples (combined with Constant folding):

Comprehension Code Simplified
List-comprehension
[i for i in (1, 2, 3)]
[1, 2, 3]
Set-comprehension
{i*2 for i in "abc"}
{"aa", "bb", "cc"}
Dict-comprehension
{i : i * 2 for i in (1, 2, 3)}
{1: 2, 2: 4, 3: 6}

Configuration option: unroll_loops.

Constant propagation

Propagate constant values of variables.

Original Constant propagation
def func()
    x = 1
    y = x
    return y
def func()
    x = 1
    y = 1
    return 1

Configuration option: constant_propagation.

Constant folding

Compute simple operations at the compilation:

  • arithmetic operations:
    • a+b, a-b, a*b, a/b: int, float, complex
    • +x, -x, ~x: int, float, complex
    • a//b, a%b, a**b: int, float
    • a<<b, a>>b, a&b, a|b, a^b: int
  • comparison, tests:
    • a < b, a <= b, a >= b, a > b
    • a == b, a != b: don’t optimize bytes == str
    • obj in seq, obj not in seq: for bytes, str, tuple seq
    • not x: int
  • str: str + str, str * int
  • bytes: bytes + bytes, bytes * int
  • tuple: tuple + tuple, tuple * int
  • str, bytes, tuple, list: obj[index], obj[a:b:c]
  • dict: obj[index]
  • replace x in list with x in tuple if list only contains constants
  • replace x in set with x in frozenset if set only contains constants
  • simplify tests:
Code Constant folding
not(x is y) x is not y
not(x is not y) x is y
not(obj in seq) obj not in seq
not(obj not in seq) obj in seq

Note: not (x == y) is not replaced with x != y because not x.__eq__(y) can be different than x.__ne__(y) for deliberate reason Same rationale for not replacing not(x < y) with x >= y. For example, math.nan overrides comparison operators to always return False.

Examples of optimizations:

Code Constant folding
-(5) -5
+5 5
x in [1, 2, 3] x in (1, 2, 3)
x in {1, 2, 3} x in frozenset({1, 2, 3})
‘Python’ * 2 ‘PythonPython’
3 * (5,) (5, 5, 5)
‘python2.7’[:-2] ‘python2’
‘P’ in ‘Python’ True
9 not in (1, 2, 3) True
[5, 9, 20][1] 9

Configuration option: constant_folding.

Replace builtin constants

Replace __debug__ constant with its value.

Configuration option: replace_builtin_constant.

Dead code elimination

Remove the dead code.

Examples:

Code Dead code removed
if test:
    pass
else:
    else_block
if not test:
    else_block
if 1:
    body_block
body_block
if 0:
    body_block
pass
if False:
    body_block
else:
    else_block
else_block
while 0:
    body_block
pass
while 0:
    body_block
else:
    else_block
else_block
...
return ...
dead_code_block
...
return ...
...
raise ...
dead_code_block
...
raise ...
try:
    pass
except ...:
    ...
pass
try:
    pass
except ...:
    ...
else:
    else_block
else_block
try:
    pass
except ...:
    ...
else:
    else_block
finally:
    final_block
try:
   else_block
finally:
   final_block

Note

If a code block contains continue, global, nonlocal, yield or yield from, it is not removed.

Configuration option: remove_dead_code.

Copy builtin functions to constants

Opt-in optimization (disabled by default) to copy builtin functions to constants.

Example with a function simple:

def log(message):
    print(message)
Bytecode Specialized bytecode
LOAD_GLOBAL   0 (print)
LOAD_FAST     0 (message)
CALL_FUNCTION 1 (1 positional, 0 keyword pair)
POP_TOP
LOAD_CONST    0 (None)
RETURN_VALUE
LOAD_CONST      1 (<built-in function print>)
LOAD_FAST       0 (message)
CALL_FUNCTION   1 (1 positional, 0 keyword pair)
POP_TOP
LOAD_CONST      0 (None)
RETURN_VALUE

The first LOAD_GLOBAL instruction is replaced with LOAD_CONST. LOAD_GLOBAL requires to lookup in the global namespace and then in the builtin namespaces, two dictionary lookups. LOAD_CONST gets the value from a C array, O(1) lookup.

The specialized bytecode requires two guards:

  • print builtin variable
  • print global variable

The print() function is injected in the constants with the func.patch_constants() method.

The optimization on the builtin NAME requires two guards:

  • NAME key in builtin namespace
  • NAME key in global namespace

This optimization is disabled by default because it changes the Python semantics: if the copied builtin function is replaced in the middle of the function, the specialized bytecode still uses the old builtin function. To use the optimization on a project, you may have to add the following configuration at the top of the file:

__fatoptimizer__ = {'copy_builtin_to_constant': False}

Configuration option: copy_builtin_to_constant.

See also:

  • codetransformer: @asconstants(len=len) decorator replaces lookups to the len name with the builtin len() function
  • Thread on python-ideas mailing list: Specifying constants for functions by Serhiy Storchaka, propose to add const len=len (or alternatives) to declare a constant (and indirectly copy a builtin functions to constants)

Simplify iterable

Try to replace literals built at runtime with constants. Replace also range(start, stop, step) with a tuple if the range fits in the configuration.

When range(n) is replaced, two guards are required on range in builtin and global namespaces and the function is specialized.

This optimization helps loop unrolling.

Examples:

Code Simplified iterable
for x in range(3): ... for x in (0, 1, 2): ...
for x in {}: ... for x in (): ...
for x in [4, 5. 6]: ... for x in (4, 5, 6): ...

Configuration option: simplify_iterable.

Function inlining

Replace a function call site with the body of the called function.

Note

The implementation is currently experimental and so disabled by default.

Original code Function inlining
def g():
    return 42

def f():
    return g(x) + 3
def g():
    return 42

def f():
    return 42 + 3

Configuration option: inlining.

Comparison with the peephole optimizer

The CPython peephole optimizer only implements a few optimizations: constant folding, dead code elimination and optimizations of jumps. fatoptimizer implements more optimizations.

The peephole optimizer doesn’t support constant propagation. Example:

def f():
    x = 333
    return x
Regular bytecode fatoptimizer bytecode
LOAD_CONST               1 (1)
STORE_FAST               0 (x)
LOAD_FAST                0 (x)
RETURN_VALUE
LOAD_CONST               1 (333)
STORE_FAST               0 (x)
LOAD_CONST               1 (333)
RETURN_VALUE

The constant folding optimization of the peephole optimizer keeps original constants. For example, "x" + "y" is replaced with "xy" but "x" and "y" are kept. Example:

def f():
    return "x" + "y"
Regular constants fatoptimizer constants
(None, 'x', 'y', 'xy'): 4 constants (None, 'xy'): 2 constants

The peephole optimizer has a similar limitation even when building tuple constants. The compiler produces AST nodes of type ast.Tuple, the tuple items are kept in code constants.