Making Python run faster: a case study

December 25, 2016

As a langauge, Python was originally designed for faster development cycles, and not for execution speed. This principle remains very much alive today, and although it has garnered the language a bad reputation for its sluggishness, there are a few very common antipatterns that you can avoid to make your Python run faster. In this post, I will demonstrate some of these ideas by exploring solutions to a very simple algorithm: calculating the average (or, to be more precise, the arithmetic mean) out of a list of integers.

For our experiment, we’ll time our functions on three lists of different lenghts, as the size of the list will affect our solution’s runtime. We can start by generating some test data:

import random
random.seed(1)

lengths = [10, 100, 1000]
lsts = [[random.randint(0, 100) for _ in range(length)] for length in lengths]
# [[17, 72, ...another 8], [83, 48, ...another 98], [3, 60, ...another 998]]

With our three test lists, we can start writing up solutions. Notice that all our functions will start with an if lst check, since we’ll only return a value if a list isn’t empty.

The naive solution

Let’s start simple. We iterate through the indices of the list, and we keep track of two values as we go: the running sum, and the running count. When we get to the end, we return the sum over the count. That is the very definition of how to calculate an average.

def mean1(lst):
    if lst:
        running_sum = 0
        running_count = 0
        for i in range(len(lst)):
            running_sum += lst[i]
            running_count += 1
        return running_sum / running_count

This gives us the right solution, and the following runtimes:

List Length	10	100	1000
Runtime (µs)	1.483	9.789	129.695

We can surely do better, though.

Still naive, but more pythonic

If you work with Python long enough, or if you peruse questions on it on Stack Overflow, you’ll see the word pythonic pop up again, and again. Generally, it refers to concise and readable patterns that the community tends to follow. Most times, their popularity is based on actual efficiencies, and well deserved, even if their users don’t even know that’s the case. The word can take the place of “clean,” or “idiomatic,” but ultimately has no real meaning.

What makes the answer above not pythonic, you ask? The iteration pattern. If you come from a language like C/C++, or even Java, you’re probably used to processing iterables via indexing, leading to code like:

for i in range(len(lst)):
    num += lst[i]

While this case looks simple, in more complex algorithms it can be hard to keep track of the i’s and j’s to grab objects out of our list by an index. In Python, most cases of range(len(lst)) are a mistake that we can avoid via the iterator protocol, or perhaps with enumerate if indexed access is unavoidable. This helps us not only make our code more readable, but also more efficient:

def mean2(lst):
    if lst:
        running_sum = 0
        running_count = 0
        for number in lst:
            running_sum += number
            running_count += 1
        return running_sum / running_count

As you can see below, we get a nice performance bump, especially as our input list grows:

List Length	10	100	1000
Runtime (µs)	0.956	7.647	97.220

The preferred way, falling into C

One of the virtues of Python, is that under the hood it is written in very efficient C code. While rolling our own functions tends to be fun, in most cases there is some built-in method that we can call to do the work for us. Knowing that such functions exist, and thinking of them when solving problems is half the battle when writing Python. In this case, its the sum and len functions that we want to invoke:

def mean3(lst):
    if lst:
        return sum(lst) / len(lst)

Since Python’s len implementation on lists takes constant time, the time complexity is dominated by the O(N) operation required to sum, iterating over the numbers only once. (Thanks to Leon Sasson for reminding me that len is O(1))

List Length	10	100	1000
Runtime (µs)	0.409	1.041	7.348

This change gives us more than a performance bump. The code is an order of magnitude faster relative to the previous solution and it gives us a considerably simpler answer.

Using other libraries

It is important to note that there are other libraries that can provide you with a way to compute the mean of a series. In Python 3 there is the statistics.mean function, which uses the Kahan Summation algorithm to reduce numerical errors at the expense of 50 to 100 times more computation time than our preferred solution, depending on list size.

Additionally you could also use numpy.mean which has its own kind of overhead due to its array based operations, but can be very efficient when computing very large datasets thanks to other optimizations.

Neither of these are really comparable to the other mean functions we’ve written, as they serve different, and very specific, purposes. However, since numpy.mean and statistics.mean do compute the mean, innocent programmers might use the less performant but specialized variants unknowingly for no reason.

So what’s the best choice?

Obviously, the measured runtime will vary depending on the machine the tests run on. What we actually care about are the ratios. In the table below I have normalized the values in each column to the most efficient version for a given input list, and also included the numpy and statistics functions for reference:

Solution	Length 10	Length 100	Length 1000
`mean1`	3.6	9.4	17.7
`mean2`	2.3	7.3	13.2
`mean3`	1.0	1.0	1.0
`numpy.mean`	25.9	15.8	9.4
`statistics.mean`	45.9	65.5	79.2

While all the solutions proposed run in linear O(N) time, there are clear winners and losers. If your algorithms don’t require the extra precision, nor the array functionality, you should go with the mean3 version described above.

I would encourage you to play with the code, and generate your own results, by tweaking my script, which you can find here.

Can you think of other more efficient, more clever, ways to calculate the mean of a list with Python? Let me know!

Update (1/5/17): Leon Sasson correctly pointed out that len is O(1) on Python lists. This alters my explanation for mean3’s performance, but not its expected time complexity.

All tests were run on Python 3.5.2 on a mid-2014 MBPr with 16GB of RAM and a 2.5 GHz Intel Core i7 processor.
Image from “Reptiles and birds. A popular account of the various orders” by Figuier, Louis, 1819-1894 Gillmore, Parker. - Licensed under Public Domain, via Internet Archive Book Images