The Descriptor Protocol, and Python Black Magic
April 26, 2016
Late last night, I saw a very confusing tweet:
— Jake VanderPlas (@jakevdp) April 26, 2016
Like any self-respecting programmer, the first thing I did was copy and paste that into a terminal, even though I knew exactly what to expect. My response was simple:
@jakevdp what is this black magic!?!?— Avy Faingezicht (@avyfain) April 26, 2016
Since I graduated last summer, I have been writing lots of both Python 2 and 3. This snippet seemed like something I should understand well. However, I did not, so this post is an attempt to solve that. I was inspired by Julia Evans, and her campaign to share the things she learns, however incomplete her understanding might be.
This post assumes you have at least a basic understanding of Python and OOP. For a good overview of OOP in Python, I recommend Leonardo Giordani’s series which builds up nicely from simple concepts to the internals of Python classes (he also has one on Python 2.x, although I haven’t read it closely).
So, what is this black magic?
My first instinct was to check the behavior of the comparison itself. While == delegates to an object’s __eq__ method to check equality, the is keyword checks identity, so those objects can’t be the same in memory!
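To make the two checks concrete, here is a minimal sketch. It is not the original tweet or terminal session. AlwaysEqual is a hypothetical helper, and the class A with an empty method b is an assumption based on the A.b is A.b expression this post discusses. The comments describe Python 3.

```python
class AlwaysEqual:
    """Hypothetical helper: == is answered by __eq__, not by identity."""
    def __eq__(self, other):
        return True


x, y = AlwaysEqual(), AlwaysEqual()
equal = (x == y)        # decided by AlwaysEqual.__eq__
identical = (x is y)    # decided by object identity alone


class A:  # assumed shape of the class under discussion
    def b(self):
        pass


# Hold both results in variables so the two objects are alive at the same
# time; comparing id() values of temporaries can be misleading.
first = A.b
second = A.b
same_object = first is second
same_id = id(first) == id(second)

print(equal, identical)      # Python 3: True False
print(same_object, same_id)  # Python 3: True True
```

On Python 2 the post reports that the identity check on A.b fails, which is the puzzle the rest of the article works through.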
As expected! The memory locations (as given by id) in Python 2 are different, causing the identity check to fail. Not so in Python 3. So far so good. But why do we get unbound method on one end and function on the other? How are these objects even stored internally? In most cases, Python uses a dictionary, accessible under __dict__, to store the local variables, or namespace, of an object (note that not all objects have a __dict__, but that is a different story). Let’s look up b in A:
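A minimal sketch of that lookup, assuming a class A with an empty method b (the original session output is not reproduced here). The comments describe Python 3; the note about Python 2 restates what this post reports.

```python
class A:  # assumed shape of the class under discussion
    def b(self):
        pass


via_attribute = A.b          # normal attribute access on the class
via_dict = A.__dict__['b']   # raw object stored in the class namespace

# Python 3: both are plain functions, and in fact the very same object.
# Python 2 (per this post): A.b is an instancemethod, while the entry
# in A.__dict__ is still a function.
print(type(via_attribute).__name__)
print(type(via_dict).__name__)
print(via_attribute is via_dict)
```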
Huh? In Python 2 we get an instancemethod, while Python 3 spits out a function, but if we check the type inside the enclosing __dict__ we see they are both functions? How does this work? This is caused by the design of the Descriptor Protocol, which governs attribute lookup: when the object found in a class’s __dict__ defines a __get__ method, attribute access can call that method and return its result rather than the stored object itself. In Python 2, the protocol sets in place a type distinction based on how the function object is accessed. In the doc, Raymond Hettinger explains:
In Python 3, this distinction between bound and unbound doesn’t exist, but strangely, the docs for Python 3 are not up to date, so I can’t tell what the underlying behavior is. The same code clearly has a different output:
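One way to poke at the Python 3 behavior directly is to call the function’s __get__ method by hand, which is roughly what attribute access does behind the scenes. This is a sketch for Python 3 only, again assuming a class A with an empty method b; it is not the output from the original post.

```python
class A:  # assumed shape of the class under discussion
    def b(self):
        pass


func = A.__dict__['b']   # the raw function stored on the class
a = A()

from_class = func.__get__(None, A)   # roughly what A.b triggers
from_instance = func.__get__(a, A)   # roughly what a.b triggers

# Python 3: class access hands back the function itself, untouched.
print(from_class is func)

# Instance access wraps the function in a bound method that remembers
# both the instance and the underlying function.
print(type(from_instance).__name__)
print(from_instance.__self__ is a, from_instance.__func__ is func)
```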
Also explained in the documentation is the fact that both bound and unbound methods are backed by the same C implementation, except for the value of their im_self attribute, which is NULL when unbound. So I am guessing that in Python 2 a new instancemethod object is created at runtime every time the function is accessed, regardless of whether it is bound or unbound, while in Python 3 the instantiation only happens when bound, given that unbound methods don’t exist. This would make sense, as the function’s __get__ must be executed each time you access it.
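That guess can be modeled in pure Python. The sketch below is an emulation under assumptions, not CPython’s actual implementation: FunctionLike is a hypothetical stand-in for a Python 3 function object, written as a descriptor whose __get__ returns itself on class access and builds a fresh bound method on instance access.

```python
import types


class FunctionLike:
    """Hypothetical stand-in for a Python 3 function object."""

    def __init__(self, func):
        self.func = func

    def __call__(self, *args, **kwargs):
        return self.func(*args, **kwargs)

    def __get__(self, obj, objtype=None):
        if obj is None:
            # Accessed through the class: no wrapper is created.
            return self
        # Accessed through an instance: build a new bound method each time.
        return types.MethodType(self, obj)


class A:
    @FunctionLike
    def b(self):
        return "called"


a = A()
print(A.b is A.b)   # True: the same descriptor object both times
print(a.b is a.b)   # False: two distinct bound-method objects
print(a.b())        # called
```

A Python 2 style variant would presumably build a wrapper object in the class-access branch as well, which is one way to picture why A.b is A.b fails there.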
If that were the case, we would expect that accessing b on an instance of A would always return a different object, regardless of which Python runtime we’re on, as they are always bound:
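A minimal sketch of that check, assuming the same class A with an empty method b (the original session output is not reproduced here). The comments describe Python 3.

```python
class A:  # assumed shape of the class under discussion
    def b(self):
        pass


a = A()

# Keep both results alive so they cannot share a recycled memory address.
m1 = a.b
m2 = a.b

print(m1 is m2)           # False: each access builds a new bound method
print(m1 == m2)           # True: same underlying function, same instance
print(m1.__self__ is a)   # True: the method is bound to a
```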
So, the reason why A.b is A.b holds in Python 3, and not in Python 2, is this whole bound/unbound story. Seems like the Descriptor Protocol is responsible for this sorcery! Magic is just technology we don’t understand, yet.
If you have more insight into the inner workings of this, I’d love to hear about it.
Update (4/26/16): Jake VanderPlas replied to my tweet, and pointed to a 2009 post by Guido describing the behavior. Apparently, the bound/unbound distinction was introduced as a way to achieve “first-class everything,” which methods didn’t quite fit into. Python 3’s undoing of unbound methods is just a further expression of the idea.
Update 2 (4/29/16): Today I received an email from Todd Jennings, who pointed me to the bug that tracks the out-of-date documentation for Python 3. Sadly, it is marked as still waiting.
Update 3 (8/22/16): After attending PyBay, Wesley Chun pointed out that the definition of A was that of a classic 2.x class, while the rest of the article used new-style classes. Changing the class definition to inherit from object (as in, class A(object):) doesn’t change the behavior that I describe above, for either Python 2.x or 3.x. To remain true to the original tweet, I have kept the class definition without explicit inheritance, but the distinction is important.
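For the Python 3 side of that claim, a small sketch: the two spellings produce the same kind of class there, since every class implicitly inherits from object. The class names Implicit and Explicit are hypothetical, and this says nothing about Python 2, where the two spellings create different kinds of classes.

```python
class Implicit:            # no explicit base, as in the original tweet
    def b(self):
        pass


class Explicit(object):    # explicit inheritance from object
    def b(self):
        pass


# Python 3: both are ordinary classes with object at the end of the MRO.
implicit_has_object_base = Implicit.__mro__ == (Implicit, object)
same_metaclass = type(Implicit) is type(Explicit) is type

# The identity check from the article gives the same answer for both.
implicit_identity = Implicit.b is Implicit.b
explicit_identity = Explicit.b is Explicit.b

print(implicit_has_object_base, same_metaclass)
print(implicit_identity, explicit_identity)
```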
Image: “The Witch No. 1” by Baker, Joseph E. - Licensed under Public Domain, via Wikimedia Commons