The Descriptor Protocol, and Python Black Magic
April 26, 2016
Late last night, I saw a very confusing tweet:
— Jake VanderPlas (@jakevdp) April 26, 2016
Like any self-respecting programmer, the first thing I did was copy and paste that into a terminal, even though I knew exactly what to expect. My response was simple:
Since I graduated last summer, I have been writing a lot of both Python 2 and 3. This snippet seemed like something I should understand well. However, I did not, so this post is an attempt to fix that. I was inspired by Julia Evans and her campaign to share the things she learns, however incomplete her understanding might be.
This post assumes you have at least a basic understanding of Python and OOP. For a good overview of OOP in Python, I recommend Leonardo Giordani’s series, which builds up nicely from simple concepts to the internals of Python classes (he also has one on Python 2.x, although I haven’t read it closely).
So, what is this black magic?
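Here is a minimal version of the puzzle. It assumes a class A with a single method b, the names used throughout the rest of this post. The exact snippet in the tweet may have looked different.

```python
# Assumed reconstruction of the puzzle; the tweet's exact code may differ.
class A:
    def b(self):
        pass

# Python 3 prints True; Python 2 prints False.
print(A.b is A.b)
```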
My first instinct was to check the behavior of the comparison itself. While == delegates to an object’s __eq__ method to check equality, the is operator checks identity. So if A.b is A.b evaluates to False, the two sides can’t be the same object in memory!
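The difference between the two operators is easy to see with a class whose __eq__ always answers True. AlwaysEqual is a hypothetical name invented for this sketch.

```python
class AlwaysEqual:
    """Hypothetical class whose __eq__ claims equality with anything."""

    def __eq__(self, other):
        return True

x = AlwaysEqual()
y = AlwaysEqual()

print(x == y)  # True: == calls AlwaysEqual.__eq__
print(x is y)  # False: two distinct objects
print(x is x)  # True: the very same object
```

One caveat applies when checking this with id. An id is only guaranteed unique among objects that are alive at the same time. In CPython, id(A.b) == id(A.b) can therefore come out True even on Python 2, because the first temporary is freed and its memory reused before the second one exists. Binding both results to names first makes the comparison reliable.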
As expected! The memory locations (as given by id) in Python 2 are different, causing the identity check to fail. Not so in Python 3. So far so good. But why do we get an unbound method on one end and a function on the other? How are these objects even stored internally? In most cases, Python uses a dictionary, accessible under __dict__, to store the attributes, or namespace, of an object (note that not all objects have a __dict__, but that is a different story).

Let’s look up b in both versions. Python 2 gives us an unbound method and Python 3 spits out a function. Yet if we check the type of b inside the enclosing A.__dict__, we see they are both functions. How does this work? This is caused by the design of the Descriptor Protocol. When Python looks up an attribute and finds an object in a class __dict__ that defines a __get__ method (as every function does), it returns the result of calling __get__ instead of the stored object. In Python 2, the protocol sets in place a type distinction based on how the function object is accessed. In the Descriptor HowTo Guide, Raymond Hettinger explains:
In Python 3, the distinction between bound and unbound methods doesn’t exist. Strangely, though, the docs for Python 3 are not up to date, so I can’t tell what the underlying behavior is. The same code clearly has a different output:
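The sketch below shows what Python 3 reports, assuming a class A with a method b as in the post. It also calls the function's __get__ method directly, which is the hook the Descriptor Protocol uses. Called with no instance (None), __get__ hands back the function itself. Called with an instance, it returns a bound method.

```python
import types

class A:
    def b(self):
        pass

raw = A.__dict__["b"]          # the object actually stored in the class namespace
print(type(raw))               # <class 'function'>
print(type(A.b))               # <class 'function'> (Python 2: <type 'instancemethod'>)
print(A.b is raw)              # True: class access returns the stored function

# Attribute access is routed through the function's __get__ method.
print(raw.__get__(None, A) is raw)  # True: no instance, so the function comes back as-is

a = A()
bound = raw.__get__(a, A)
print(isinstance(bound, types.MethodType))  # True: with an instance, a bound method
```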
The documentation also explains that bound and unbound methods are backed by the same C implementation. They differ only in the value of their im_self attribute, which is NULL when unbound. My guess is that in Python 2, a new instancemethod object wrapping the function is created at runtime on every access, whether bound or unbound. In Python 3, that instantiation would only happen when bound, since unbound methods don’t exist. This would make sense, because the function’s __get__ method runs each time you access the attribute.
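The bound side of that guess is easy to inspect in Python 3. A bound method is a thin wrapper holding the original function (__func__) and the instance (__self__). Python 2 exposed the same pair as im_func and im_self, and for an unbound method im_self was None at the Python level. This is a sketch, assuming a class A with a method b as in the post.

```python
class A:
    def b(self):
        pass

a = A()
method = a.b

# A bound method wraps the function stored on the class plus the instance.
print(method.__func__ is A.__dict__["b"])  # True: the very function stored on the class
print(method.__self__ is a)                # True: the instance it is bound to
print(type(method).__name__)               # method (Python 2: instancemethod)
```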
If that were the case, we would expect that accessing b on an instance of A would always return a different object, regardless of which Python runtime we’re on, as such methods are always bound:
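Here is a sketch of that check, assuming a class A with a method b as in the post. Each access is bound to a name, which keeps both method objects alive so memory reuse can't skew the result.

```python
class A:
    def b(self):
        pass

a = A()
first = a.b
second = a.b

print(first is second)  # False: a new bound method object is built on each access
print(first == second)  # True: both wrap the same function and the same instance
```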
So, the reason A.b is A.b holds in Python 3 but not in Python 2 is this whole bound/unbound story. Seems like the Descriptor Protocol is responsible for this sorcery! Magic is just technology we don’t understand, yet.
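To check that the Descriptor Protocol really is the lever, the following sketch brings back Python 2-style behavior in Python 3. It is not how CPython implements methods. UnboundStyle and FakeUnbound are hypothetical names invented for this example. The descriptor's __get__ builds a fresh wrapper even on class access, and that alone is enough to make A.b is A.b false again.

```python
class FakeUnbound:
    """Hypothetical stand-in for Python 2's unbound method: a function plus its class."""

    def __init__(self, func, owner):
        self.__func__ = func
        self.owner = owner

    def __call__(self, instance, *args, **kwargs):
        # Python 2 rejected calls whose first argument was not an instance of the class.
        if not isinstance(instance, self.owner):
            raise TypeError(
                f"unbound method must be called with an instance of {self.owner.__name__}"
            )
        return self.__func__(instance, *args, **kwargs)


class UnboundStyle:
    """Hypothetical descriptor that mimics Python 2 method lookup."""

    def __init__(self, func):
        self.func = func

    def __get__(self, instance, owner=None):
        if instance is None:
            # Class access: build a brand-new wrapper every time.
            return FakeUnbound(self.func, owner)
        # Instance access: defer to the function's own __get__ (a bound method).
        return self.func.__get__(instance, owner)


class A:
    @UnboundStyle
    def b(self):
        return "called"


print(A.b is A.b)  # False: a fresh FakeUnbound per access, as in Python 2
print(A.b(A()))    # called
print(A().b())     # called
```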
If you have more insight into the inner workings of this, I’d love to hear about it.
Update (4/26/16): Jake VanderPlas replied to my tweet and pointed to a 2009 post by Guido describing the behavior. Apparently, the bound/unbound distinction was introduced as a way to achieve “first-class everything,” which methods didn’t quite fit into. Python 3’s undoing of unbound methods is just a further expression of the idea.
Update 2 (4/29/16): Today I received an email from Todd Jennings, who pointed me to the bug that tracks the out-of-date documentation for Python 3. Sadly, it is marked as still waiting.
Update 3 (8/22/16): After I attended PyBay, Wesley Chun pointed out that the definition of A was that of a classic 2.x class, while the rest of the article used new-style classes. Changing the class definition to inherit from object (as in, class A(object):) doesn’t change the behavior that I describe above, for either Python 2.x or 3.x. To remain true to the original tweet, I have kept the class definition without explicit inheritance, but the distinction is important.
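In Python 3 every class is new-style, so the two spellings define equivalent classes. The quick check below uses Implicit and Explicit, which are hypothetical names for this sketch. On Python 2, Implicit would instead be a classic class with no bases at all.

```python
class Implicit:          # hypothetical: the tweet's spelling
    def b(self):
        pass

class Explicit(object):  # hypothetical: the explicit new-style spelling
    def b(self):
        pass

print(Implicit.__bases__)                        # (<class 'object'>,)
print(Implicit.__bases__ == Explicit.__bases__)  # True
print(Implicit.b is Implicit.b, Explicit.b is Explicit.b)  # True True
```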
Image: “The Witch No. 1” by Baker, Joseph E. - Licensed under Public Domain, via Wikimedia Commons