Making Python run faster: a case studyDecember 25, 2016
As a langauge, Python was originally designed for faster development cycles, and not for execution speed. This principle remains very much alive today, and although it has garnered the language a bad reputation for its sluggishness, there are a few very common antipatterns that you can avoid to make your Python run faster. In this post, I will demonstrate some of these ideas by exploring solutions to a very simple algorithm: calculating the average (or, to be more precise, the arithmetic mean) out of a list of integers.
For our experiment, we’ll time our functions on three lists of different lenghts, as the size of the list will affect our solution’s runtime. We can start by generating some test data:
With our three test lists, we can start writing up solutions. Notice that all our functions will start with an
if lst check, since we’ll only return a value if a list isn’t empty.
The naive solution
Let’s start simple. We iterate through the indices of the list, and we keep track of two values as we go: the running sum, and the running count. When we get to the end, we return the sum over the count. That is the very definition of how to calculate an average.
This gives us the right solution, and the following runtimes:
We can surely do better, though.
Still naive, but more pythonic
If you work with Python long enough, or if you peruse questions on it on Stack Overflow, you’ll see the word pythonic pop up again, and again. Generally, it refers to concise and readable patterns that the community tends to follow. Most times, their popularity is based on actual efficiencies, and well deserved, even if their users don’t even know that’s the case. The word can take the place of “clean,” or “idiomatic,” but ultimately has no real meaning.
What makes the answer above not pythonic, you ask? The iteration pattern. If you come from a language like C/C++, or even Java, you’re probably used to processing iterables via indexing, leading to code like:
While this case looks simple, in more complex algorithms it can be hard to keep track of the i’s and j’s to grab objects out of our list by an index. In Python, most cases of
range(len(lst)) are a mistake that we can avoid via the iterator protocol, or perhaps with
enumerate if indexed access is unavoidable. This helps us not only make our code more readable, but also more efficient:
As you can see below, we get a nice performance bump, especially as our input list grows:
The preferred way, falling into C
One of the virtues of Python, is that under the hood it is written in very efficient C code. While rolling our own functions tends to be fun, in most cases there is some built-in method that we can call to do the work for us. Knowing that such functions exist, and thinking of them when solving problems is half the battle when writing Python. In this case, its the
len functions that we want to invoke:
len implementation on lists takes constant time, the time complexity is dominated by the
O(N) operation required to sum, iterating over the numbers only once. (Thanks to Leon Sasson for reminding me that
This change gives us more than a performance bump. The code is an order of magnitude faster relative to the previous solution and it gives us a considerably simpler answer.
Using other libraries
It is important to note that there are other libraries that can provide you with a way to compute the mean of a series. In Python 3 there is the
statistics.mean function, which uses the Kahan Summation algorithm to reduce numerical errors at the expense of 50 to 100 times more computation time than our preferred solution, depending on list size.
Additionally you could also use
numpy.mean which has its own kind of overhead due to its array based operations, but can be very efficient when computing very large datasets thanks to other optimizations.
Neither of these are really comparable to the other
mean functions we’ve written, as they serve different, and very specific, purposes. However, since
statistics.mean do compute the mean, innocent programmers might use the less performant but specialized variants unknowingly for no reason.
So what’s the best choice?
Obviously, the measured runtime will vary depending on the machine the tests run on. What we actually care about are the ratios. In the table below I have normalized the values in each column to the most efficient version for a given input list, and also included the
statistics functions for reference:
|Solution||Length 10||Length 100||Length 1000|
While all the solutions proposed run in linear
O(N) time, there are clear winners and losers. If your algorithms don’t require the extra precision, nor the array functionality, you should go with the
mean3 version described above.
I would encourage you to play with the code, and generate your own results, by tweaking my script, which you can find here.
Can you think of other more efficient, more clever, ways to calculate the mean of a list with Python? Let me know!
Update (1/5/17): Leon Sasson correctly pointed out that
O(1) on Python lists. This alters my explanation for
mean3’s performance, but not its expected time complexity.
- All tests were run on Python 3.5.2 on a mid-2014 MBPr with 16GB of RAM and a 2.5 GHz Intel Core i7 processor.
- Image from “Reptiles and birds. A popular account of the various orders” by Figuier, Louis, 1819-1894 Gillmore, Parker. - Licensed under Public Domain, via Internet Archive Book Images
Want to see more articles like this? Sign up below: