## Three Pathological cases for the PyPy JIT

At PyCon, Maciej from the PyPy team asked for programs that the PyPy JIT did a really bad job on.

I have a directory full of over 200 little Python programs that attempt to solve Project Euler math problems. Only 186 of them get the right answer, and fewer of them run in a reasonable period of time. But this led me to concoct a quick benchmark comparing the Python implementations installed on my computer: CPython, CPython with Psyco, Unladen Swallow, and PyPy. (I haven't tested them with Jython or IronPython yet. Maybe someday.)

Basically, I just run all of euler*.py under four different Python implementations, record how long it took each implementation to run each program, and throw out the results for any programs that didn't work in all four versions of Python, or that didn't finish in 60 seconds or less on all of them.

Here are the results, for recent svn checkouts of PyPy and Unladen, CPython 2.6.4, and CPython 2.6.4 plus Psyco 1.6. (Don't get too obsessed with the exact results, since it's just a private benchmark and most of the code is unpublished. This is just background for the real information at the bottom of the post.)

script | pypy | unladen | psyco | CPython |
---|---|---|---|---|

euler1.py | 0.66 | 0.60 | 0.11 | 0.61 |

euler2.py | 0.10 | 0.10 | 0.10 | 0.40 |

euler3.py | 0.21 | 0.51 | 0.31 | 0.41 |

euler4.py | 0.20 | 0.51 | 0.20 | 0.31 |

euler5.py | 0.10 | 0.10 | 0.10 | 0.10 |

euler6.py | 0.10 | 0.10 | 0.10 | 0.10 |

euler7.py | 0.11 | 0.20 | 0.10 | 0.31 |

euler8.py | 0.10 | 0.10 | 0.10 | 0.10 |

euler9.py | 0.10 | 0.20 | 0.10 | 0.20 |

euler10.py | 2.23 | 3.98 | 1.74 | 7.73 |

euler11.py | 0.11 | 0.10 | 0.11 | 0.10 |

euler13.py | 0.10 | 0.10 | 0.10 | 0.10 |

euler14.py | 44.79 | 2.64 | 1.33 | 2.96 |

euler15.py | 0.10 | 0.10 | 0.10 | 0.10 |

euler16.py | 0.10 | 0.10 | 0.11 | 0.10 |

euler18.py | 0.10 | 0.10 | 0.10 | 0.10 |

euler19.py | 0.10 | 0.10 | 0.10 | 0.10 |

euler20.py | 0.10 | 0.10 | 0.10 | 0.10 |

euler21.py | 0.10 | 0.20 | 0.10 | 0.20 |

euler22.py | 0.10 | 0.10 | 0.10 | 0.10 |

euler23.py | 4.55 | 20.96 | 4.08 | 10.50 |

euler24.py | 6.77 | 4.18 | 7.88 | 4.65 |

euler25.py | 0.61 | 0.81 | 0.82 | 0.81 |

euler26.py | 8.47 | 10.10 | 9.09 | 10.41 |

euler27.py | 3.06 | 6.37 | 1.23 | 8.64 |

euler28.py | 0.10 | 0.10 | 0.10 | 0.10 |

euler29.py | 0.10 | 0.10 | 0.10 | 0.10 |

euler30.py | 3.04 | 3.55 | 5.62 | 3.85 |

euler32.py | 2.96 | 3.34 | 4.07 | 3.57 |

euler33.py | 0.10 | 0.11 | 0.10 | 0.10 |

euler34.py | 7.08 | 11.96 | 11.12 | 12.96 |

euler35.py | 5.97 | 18.01 | 9.91 | 25.38 |

euler36.py | 1.83 | 2.93 | 3.24 | 2.83 |

euler37.py | 11.23 | 10.83 | 17.89 | 12.13 |

euler38.py | 1.92 | 1.22 | 1.63 | 1.42 |

euler39.py | 0.10 | 0.30 | 0.10 | 0.20 |

euler40.py | 0.81 | 0.91 | 0.51 | 0.71 |

euler41.py | 4.18 | 3.54 | 4.86 | 4.05 |

euler42.py | 0.10 | 0.10 | 0.10 | 0.10 |

euler43.py | 36.69 | 27.91 | 37.50 | 29.19 |

euler44.py | 1.02 | 7.98 | 0.72 | 4.50 |

euler45.py | 6.95 | 2.54 | 2.36 | 2.45 |

euler46.py | 0.21 | 0.91 | 0.21 | 1.23 |

euler47.py | 0.71 | 1.92 | 0.61 | 2.97 |

euler48.py | 0.10 | 0.21 | 0.21 | 0.10 |

euler49.py | 0.31 | 1.02 | 0.72 | 0.72 |

euler51.py | 22.68 | 27.90 | 49.11 | 31.86 |

euler52.py | 0.71 | 0.81 | 0.91 | 1.11 |

euler53.py | 0.20 | 0.20 | 0.20 | 0.31 |

euler54.py | 0.30 | 0.20 | 0.21 | 0.21 |

euler56.py | 0.83 | 0.71 | 0.92 | 0.62 |

euler57.py | 0.41 | 0.51 | 0.52 | 0.72 |

euler58.py | 6.57 | 37.76 | 6.77 | 58.48 |

euler59.py | 12.85 | 6.88 | 9.81 | 15.98 |

euler61.py | 0.10 | 0.20 | 0.31 | 0.21 |

euler62.py | 0.31 | 0.20 | 0.41 | 0.20 |

euler63.py | 0.21 | 0.10 | 0.20 | 0.20 |

euler64.py | 10.93 | 38.67 | 44.00 | 59.01 |

euler65.py | 0.10 | 0.10 | 0.11 | 0.11 |

euler66.py | 2.43 | 8.49 | 9.82 | 12.13 |

euler67.py | 0.11 | 0.11 | 0.10 | 0.11 |

euler68.py | 0.10 | 0.10 | 0.10 | 0.10 |

euler69.py | 0.10 | 0.11 | 0.10 | 0.11 |

euler70.py | 0.41 | 0.73 | 1.03 | 0.61 |

euler71.py | 0.32 | 1.53 | 0.31 | 1.31 |

euler72.py | 7.85 | 32.59 | 9.77 | 56.56 |

euler73.py | 9.19 | 27.32 | 2.87 | 25.50 |

euler75.py | 5.15 | 2.22 | 0.61 | 2.32 |

euler77.py | 0.31 | 0.32 | 0.23 | 0.51 |

euler79.py | 0.20 | 0.20 | 0.10 | 0.10 |

euler80.py | 0.71 | 0.71 | 0.71 | 0.61 |

euler81.py | 0.10 | 0.20 | 0.10 | 0.10 |

euler82.py | 0.71 | 0.31 | 0.30 | 0.20 |

euler83.py | 0.20 | 0.61 | 0.41 | 0.51 |

euler84.py | 2.93 | 14.24 | 4.76 | 17.89 |

euler85.py | 6.67 | 7.20 | 11.16 | 11.84 |

euler87.py | 1.63 | 1.32 | 0.41 | 0.82 |

euler89.py | 0.10 | 0.10 | 0.10 | 0.10 |

euler93.py | 3.13 | 5.26 | 5.31 | 11.75 |

euler94.py | 34.40 | 26.40 | 26.30 | 27.80 |

euler97.py | 3.44 | 4.56 | 2.35 | 3.34 |

euler98.py | 0.30 | 0.61 | 0.51 | 0.51 |

euler99.py | 0.10 | 0.10 | 0.10 | 0.10 |

euler100.py | 0.10 | 0.10 | 0.10 | 0.10 |

euler101.py | 0.20 | 0.10 | 0.10 | 0.10 |

euler102.py | 0.10 | 0.10 | 0.10 | 0.10 |

euler103.py | 0.10 | 0.10 | 0.10 | 0.10 |

euler105.py | 0.61 | 0.30 | 0.81 | 0.30 |

euler106.py | 0.61 | 0.51 | 0.81 | 0.31 |

euler107.py | 0.20 | 0.41 | 0.20 | 0.20 |

euler108.py | 3.77 | 13.74 | 7.14 | 9.99 |

euler109.py | 0.31 | 1.22 | 0.31 | 1.64 |

euler111.py | 3.27 | 15.82 | 4.19 | 15.28 |

euler112.py | 5.96 | 12.45 | 11.42 | 13.50 |

euler114.py | 0.12 | 0.21 | 0.10 | 0.11 |

euler115.py | 0.22 | 0.41 | 0.30 | 0.62 |

euler116.py | 0.11 | 0.11 | 0.10 | 0.11 |

euler117.py | 0.10 | 0.10 | 0.10 | 0.11 |

euler119.py | 0.10 | 0.10 | 0.10 | 0.10 |

euler120.py | 0.10 | 0.10 | 0.10 | 0.10 |

euler121.py | 0.10 | 0.20 | 0.10 | 0.20 |

euler123.py | 5.87 | 5.26 | 8.78 | 7.05 |

euler124.py | 0.82 | 1.83 | 0.71 | 2.43 |

euler125.py | 0.93 | 1.42 | 1.33 | 1.32 |

euler135.py | 28.39 | 5.22 | 2.65 | 4.19 |

euler142.py | 0.20 | 0.81 | 0.11 | 0.41 |

euler157.py | 0.10 | 0.10 | 0.10 | 0.10 |

euler162.py | 0.10 | 0.10 | 0.10 | 0.10 |

euler171.py | 0.12 | 0.11 | 0.10 | 0.10 |

euler173.py | 0.92 | 1.42 | 1.02 | 1.01 |

euler174.py | 35.72 | 4.75 | 2.96 | 3.27 |

euler181.py | 0.10 | 0.10 | 0.11 | 0.10 |

euler190.py | 0.10 | 0.10 | 0.10 | 0.10 |

euler193.py | 0.10 | 0.10 | 0.10 | 0.10 |

euler202.py | 0.10 | 0.10 | 0.10 | 0.10 |

euler205.py | 1.83 | 0.72 | 2.35 | 0.71 |

euler207.py | 0.11 | 0.53 | 0.32 | 0.42 |

euler222.py | 0.10 | 0.10 | 0.10 | 0.10 |

euler233.py | 0.10 | 0.10 | 0.11 | 0.10 |

euler234.py | 5.77 | 6.62 | 7.38 | 7.48 |

euler235.py | 0.20 | 0.20 | 0.42 | 0.20 |

euler240.py | 16.17 | 14.96 | 17.38 | 19.59 |

total | 409.3 | 492.26 | 393.54 | 593.8 |

wins | 77 | 48 | 67 | 45 |

So all the JITs are faster than CPython for this set of programs, but none is twice as fast. PyPy has the most wins, but Psyco has the lowest total time.

While PyPy does very well overall, it's an order of magnitude slower than all the others on 3 of the programs: 14, 135, and 174. If PyPy fixed whatever made those ones slow, it would definitely be the overall leader.

Here's my euler14.py

#!/usr/bin/env python """Project Euler, problem 14 The following iterative sequence is defined for the set of positive integers: n -> n / 2 (n is even) n -> 3n + 1 (n is odd) Which starting number, under one million, produces the longest chain? """ def next_num(num): if num & 1: return 3 * num + 1 else: return num // 2 MAX_NUM = 1000000 lengths = {1: 0} def series_length(num): global lengths if num in lengths: return lengths[num] else: num2 = next_num(num) result = 1 + series_length(num2) lengths[num] = result return result def main(): num_with_max_length = 1 max_length = 0 for ii in range(1, MAX_NUM): length = series_length(ii) if length > max_length: max_length = length num_with_max_length = ii print num_with_max_length, max_length if __name__ == "__main__": main()

Why is 14 so slow? My guess would be that it's because it heavily mutates a global dictionary.

Here's euler135.py:

#!/usr/bin/env python """Project Euler, problem 135 Given the positive integers, x, y, and z, are consecutive terms of an arithmetic progression, the least value of the positive integer, n, for which the equation, x**2 - y**2 - z**2 = n, has exactly two solutions is n = 27: 34**2 - 27**2 - 20**2 = 12**2 - 9**2 - 6**2 = 27 It turns out that n = 1155 is the least value which has exactly ten solutions. How many values of n less than one million have exactly ten distinct solutions? """ """ x**2 - y**2 - z**2 = n y = z + d x = z + 2 * d d > 0 z > 0 (z + 2 d) ** 2 - (z + d) ** 2 - z ** 2 = n (z**2 + 4dz + 4d**2) - (z**2 + 2dz + d**2) - z**2 = n z**2 + 4dz + 4d**2 -z**2 - 2dz - d**2 - z**2 = n -z**2 + 2dz + 3d**2 - n = 0 a = -1 b = 2d c = 3d**2 - n z = (-b +/- sqrt(b**2-4ac)) / 2a z = (-2d +/- sqrt(4d**2+4(3d**2-n))) / -2 z = (-2d +/- sqrt(4d**2+12d**2-4n))) / -2 z = (-2d +/- sqrt(16d**2-4n))) / -2 z = (-2d +/- 2 sqrt(4d**2 - n)) / -2 z = (-d +/- sqrt(4d**2 - n)) / -1 z = d +/- sqrt(4d**2 - n) 4d**2 - n >= 0 n <= 4d**2 4d**2 - n is a perfect square """ def main(): limit = 1000000 counts = limit * [0] for d in range(1, limit / 4 + 1): d24 = d ** 2 * 4 if d24 < limit: start = 1 counts[d24] += 1 else: start = int((d24 - limit) ** 0.5) if start < 1: start = 1 for root in xrange(start, 2 * d): n = d24 - root ** 2 if n <= 0: break if n < limit: z = root + d y = z + d x = y + d #print "n", n, "d", d, "root", root, "x", x, "y", y, "z", z #assert x**2 - y**2 - z**2 == n counts[n] += 1 if d > root: z = d - root y = z + d x = y + d #print "n", n, "d", d, "root", root, "x", x, "y", y, "z", z #assert x**2 - y**2 - z**2 == n counts[n] += 1 total = 0 assert counts[0] == 0 for val in counts: if val == 10: total += 1 print total if __name__ == "__main__": main()

Why is PyPy so slow on this one? Well, maybe it's because all the work happens in one function, if PyPy doesn't JIT a function while it's still running. Or maybe it's the list with a million elements in it.

Finally, euler174.py:

#!/usr/bin/env python """Project Euler, problem 174 We shall define a square lamina to be a square outline with a square "hole" so that the shape possesses vertical and horizontal symmetry. Given eight tiles it is possible to form a lamina in only one way: 3x3 square with a 1x1 hole in the middle. However, using thirty-two tiles it is possible to form two distinct laminae. If t represents the number of tiles used, we shall say that t = 8 is type L(1) and t = 32 is type L(2). Let N(n) be the number of t <= 1000000 such that t is type L(n); for example, N(15) = 832. What is sum(N(n)) for 1 <= n <= 10? """ from math import ceil from collections import defaultdict def gen_laminae(limit): """Yield laminae with up to limit squares, as tuples (outer, inner, squares)""" for outer in range(3, limit // 4 + 2): if outer & 1: min_min_inner = 1 else: min_min_inner = 2 min_inner_squared = outer ** 2 - limit if min_inner_squared < 0: min_inner = min_min_inner else: min_inner = max(min_min_inner, int(ceil(min_inner_squared ** 0.5))) if outer & 1 != min_inner & 1: min_inner += 1 outer_squared = outer ** 2 for inner in range(min_inner, outer - 1, 2): squares = outer_squared - inner ** 2 yield (outer, inner, squares) def main(): squares_to_count = defaultdict(int) for (outer, inner, squares) in gen_laminae(1000000): squares_to_count[squares] += 1 histogram = defaultdict(int) for val in squares_to_count.values(): if val <= 10: histogram[val] += 1 print sum(histogram.values()) if __name__ == "__main__": main()

Perhaps this one is slow because it heavily uses generators and a defaultdict?

Anyway, I'm not claiming any of these are great programs. I know others have much better solutions to these problems. And this code is certainly not optimized for PyPy. They're just examples of small programs that were originally written for CPython, and do work correctly on PyPy, but happen to be much slower on PyPy than on CPython.

Again, note these are the exceptions not the rule. PyPy runs most pure-Python code great. Just not these three programs.