An Apprentice Experiment in Python Programming, Part 4
[Note to readers: The Jupyter notebook version of this post is here]
Python Objects in Memory (from comments)
In the previous post, purge commented:
Due to Python’s style of reference passing, most of these print statements will show matching id values even if you use any kind of object, not just True/False. Try to predict the output here, then run it to check:
def compare(x, y):
    print(x == y, id(x) == id(y), x is y)

a = {"0": "1"}
b = {"0": "1"}
print(a == b, id(a) == id(b), a is b)
compare(a, b)
c = a
d = a
print(c == d, id(c) == id(d), c is d)
compare(c, d)
When I was coming up with an answer to this question, I got stuck on what the is operator did. I only had a vague sense of how to use it (I knew comparison with None was done via is but didn’t know why), so I had to look up what is actually did.
Identity comparisons
The operators “is” and “is not” test for an object’s identity: “x is y” is true if and only if x and y are the same object. An Object’s identity is determined using the “id()” function. “x is not y” yields the inverse truth value.
Here’s the doc for id()
id(obj, /) Return the identity of an object. This is guaranteed to be unique among simultaneously existing objects. (CPython uses the object’s memory address.)
Then I understood that is would literally check if two objects are the same object. So in the above example we’d get True False False from print(a == b, id(a) == id(b), a is b) and True True True from print(c == d, id(c) == id(d), c is d).
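To see the difference between equality and identity side by side, here is a small sketch. The variable names and the extra mutation step are mine, not from the original exchange. Two dicts built separately compare equal with == but are distinct objects, while an alias shares identity with the original:

```python
# Two separately constructed dicts with equal contents.
a = {"0": "1"}
b = {"0": "1"}
# An alias: a second name bound to the same object as a.
c = a

equal_but_distinct = (a == b, a is b)   # equality holds, identity does not
alias_is_same = (a == c, a is c)        # both hold

# Mutating through the alias is visible through a, but not through b.
c["new"] = "entry"
print(equal_but_distinct, alias_is_same)
print("new" in a, "new" in b)
```

The mutation at the end is the practical reason the distinction matters: names that are identical share every change, names that are merely equal do not.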
Object Storage in Memory
Speaking of checking whether two objects are the same object stored at the same location in memory, gilch made more comments about object storage models (paraphrased):
Compared to C/C++, Python has a more consistent object storage model: everything is an object, and only references to objects are stored on the stack, pointing to the actual objects stored in the heap. This means that Python objects are scattered all over the place. One important aspect of CPU optimization is caching contiguous blocks of memory in CPU caches, but Python’s model causes the cache-miss rate to be high, since two objects adjacent to each other in memory are likely unrelated. This performance degradation is the price for Python’s simple memory model.
For computing tasks with high performance requirements, NumPy is optimized to make use of blocks of contiguous memory.
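As a rough illustration of the difference between a list of scattered objects and one contiguous block, here is a sketch that uses only the standard library. The array module is my stand-in for NumPy here (an assumption made so the example needs no third-party install), and the sizes reported by sys.getsizeof are CPython implementation details:

```python
import sys
from array import array

n = 1000
as_list = [float(i) for i in range(n)]   # a list of references to separate float objects
as_array = array("d", as_list)           # one contiguous buffer of C doubles

view = memoryview(as_array)
print(view.contiguous, as_array.itemsize, view.nbytes)

# The list itself only stores pointers; every float is its own heap object on top of that.
list_total = sys.getsizeof(as_list) + sum(sys.getsizeof(x) for x in as_list)
print(list_total, sys.getsizeof(as_array))
```

The array keeps all its values in a single buffer, which is the layout CPU caches favor; the list has to chase a pointer for every element.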
== and is
Some remarks gilch made about == and is:
The == operator calls the __eq__ method of an object. The default __eq__, inherited from object, falls back on is: it checks whether the two objects are the same object (the Python data model documentation describes this default under object.__eq__). We can have two instances of a number, but not two instances of True or False.
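A minimal sketch of these remarks follows. The classes Plain and Point are hypothetical names I made up for illustration. A class without its own __eq__ compares by identity, a class that defines __eq__ compares however it likes, and bool() always hands back the same True object:

```python
class Plain:
    """No __eq__ defined, so == falls back to the identity-based default."""


class Point:
    """Defines __eq__, so == compares by value."""

    def __init__(self, x):
        self.x = x

    def __eq__(self, other):
        if not isinstance(other, Point):
            return NotImplemented
        return self.x == other.x


p1, p2 = Plain(), Plain()
q1, q2 = Point(1), Point(1)

print(p1 == p2, p1 == p1)   # identity-based equality
print(q1 == q2, q1 is q2)   # value-based equality, still two objects
print(bool(1) is True)      # there is only one True object
```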
{} and Set Constructor
We went off on a tangent where gilch checked my understanding of sets. We encountered some corner cases, like Python treating True as equal to 1 and False as equal to 0:
>>> {1, True}
{1}
Curly braces are used to write both sets and dictionaries, but {} itself is interpreted as an empty dictionary instead of an empty set:
>>> type({})
<class 'dict'>
To make an empty set, we’d use the set() constructor:
>>> set()
set()
Gilch gave me a puzzle: make an empty set without using the set() constructor. I came up with the answer {1} - {1} pretty quickly, but gilch had another solution in mind that did not involve using any numbers or letters. Hint: passing an iterable to a constructor gives a different result than writing the same iterable inside a list or set literal:
>>> list("hello")
['h', 'e', 'l', 'l', 'o']
>>> ["hello"]
['hello']
>>> set("hello")
{'o', 'l', 'e', 'h'}
>>> {"hello"}
{'hello'}
Using splat, the other way to make an empty set without using the set() constructor is:
>>> {*[]}
set()
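The splat trick generalizes: unpacking any iterable inside braces builds a set from its elements, which is what the constructor does. A short sketch (the variable names are mine):

```python
empty = {*[]}            # unpack zero elements into a set literal
also_empty = {*()}       # any empty iterable works
letters = {*"hello"}     # same elements as set("hello")
one_item = {"hello"}     # no unpacking: a set holding one string

print(type(empty), empty == set())
print(letters == set("hello"), one_item)
```

Because there is no key: value pair inside the braces, Python parses {*[]} as a set literal rather than a dictionary.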
Magic Methods for Attributes (Continued from last time)
When I was working on the solution that involved modifying the __dict__ last time, I was getting pretty confused about the difference between dir(), vars() and __dict__.
Gilch started by asking me to construct a simple class and make an instance:
class SimpleClass:
    def __init__(self, x):
        self.x = x

sc = SimpleClass(42)
Then we listed out the attributes of sc in different ways:
>>> dir(sc)
['__class__', '__delattr__', '__dict__', '__dir__', '__doc__', '__eq__', '__format__', '__ge__', '__getattribute__', '__gt__', '__hash__', '__init__', '__init_subclass__', '__le__', '__lt__', '__module__', '__ne__', '__new__', '__reduce__', '__reduce_ex__', '__repr__', '__setattr__', '__sizeof__', '__str__', '__subclasshook__', '__weakref__', 'x']
>>> set(dir(sc)) - set(dir(object))
{'__weakref__', '__module__', 'x', '__dict__'}
>>> set(dir(sc)) - set(dir(type(sc))) - set(dir(object))
{'x'}
>>> sc.x
42
>>> vars(sc)
{'x': 42}
>>> sc.__dict__
{'x': 42}
>>> type(sc).__dict__
mappingproxy({'__module__': '__main__', '__init__': <function SimpleClass.__init__ at 0x7feba0967dc0>, '__dict__': <attribute '__dict__' of 'SimpleClass' objects>, '__weakref__': <attribute '__weakref__' of 'SimpleClass' objects>, '__doc__': None})
The difference between dir and vars is that dir returns all attributes of an object, including the attributes of its class and attributes inherited from its superclasses; on the other hand, vars only returns the attributes stored in the object’s own __dict__, which excludes class-level and inherited attributes. This StackOverflow question goes into more details.
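Here is a small sketch of that difference, using a hypothetical Base/Child pair (not from the session) so that there is one attribute at each level: on the instance, on the class, and on a superclass.

```python
class Base:
    inherited = "from Base"


class Child(Base):
    on_class = "from Child"

    def __init__(self):
        self.on_instance = "from instance"


obj = Child()

instance_only = set(vars(obj))   # keys of obj.__dict__
everything = set(dir(obj))       # instance + class + superclasses

print(instance_only)
print({"on_instance", "on_class", "inherited"} <= everything)
print("on_class" in vars(Child), "inherited" in vars(Child))
```

vars(Child) works the same way one level up: it shows what Child itself defines, not what it inherits from Base.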
__mro__ stands for “method resolution order,” which provides the inheritance path from the current class all the way up to object. It is honestly the most handy tool I’ve learned from this session.
Note that __mro__ is a class attribute, not an instance attribute:
>>> sc.__mro__
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
AttributeError: 'SimpleClass' object has no attribute '__mro__'
>>> type(sc).__mro__
(<class '__main__.SimpleClass'>, <class 'object'>)
Magic Methods for Attributes
Now we can verify that dir(sc) returns the union of the names in vars(sc), vars(SimpleClass) and vars(object):
>>> vars(sc)
{'x': 42}
>>> vars(type(sc))
mappingproxy({'__module__': '__main__', '__init__': <function SimpleClass.__init__ at 0x7f2ce3b79dc0>, '__dict__': <attribute '__dict__' of 'SimpleClass' objects>, '__weakref__': <attribute '__weakref__' of 'SimpleClass' objects>, '__doc__': None})
>>> vars(object)
mappingproxy({'__repr__': <slot wrapper '__repr__' of 'object' objects>, '__hash__': <slot wrapper '__hash__' of 'object' objects>, '__str__': <slot wrapper '__str__' of 'object' objects>, '__getattribute__': <slot wrapper '__getattribute__' of 'object' objects>, '__setattr__': <slot wrapper '__setattr__' of 'object' objects>, '__delattr__': <slot wrapper '__delattr__' of 'object' objects>, '__lt__': <slot wrapper '__lt__' of 'object' objects>, '__le__': <slot wrapper '__le__' of 'object' objects>, '__eq__': <slot wrapper '__eq__' of 'object' objects>, '__ne__': <slot wrapper '__ne__' of 'object' objects>, '__gt__': <slot wrapper '__gt__' of 'object' objects>, '__ge__': <slot wrapper '__ge__' of 'object' objects>, '__init__': <slot wrapper '__init__' of 'object' objects>, '__new__': <built-in method __new__ of type object at 0x955f60>, '__reduce_ex__': <method '__reduce_ex__' of 'object' objects>, '__reduce__': <method '__reduce__' of 'object' objects>, '__subclasshook__': <method '__subclasshook__' of 'object' objects>, '__init_subclass__': <method '__init_subclass__' of 'object' objects>, '__format__': <method '__format__' of 'object' objects>, '__sizeof__': <method '__sizeof__' of 'object' objects>, '__dir__': <method '__dir__' of 'object' objects>, '__class__': <attribute '__class__' of 'object' objects>, '__doc__': 'The base class of the class hierarchy.\n\nWhen called, it accepts no arguments and returns a new featureless\ninstance that has no instance attributes and cannot be given any.\n'})
>>> type(sc)
<class '__main__.SimpleClass'>
>>> list(vars(sc).keys()) + list(vars(SimpleClass).keys()) + list(vars(object).keys())
['x', '__module__', '__init__', '__dict__', '__weakref__', '__doc__', '__repr__', '__hash__', '__str__', '__getattribute__', '__setattr__', '__delattr__', '__lt__', '__le__', '__eq__', '__ne__', '__gt__', '__ge__', '__init__', '__new__', '__reduce_ex__', '__reduce__', '__subclasshook__', '__init_subclass__', '__format__', '__sizeof__', '__dir__', '__class__', '__doc__']
>>> set(_) == set(dir(sc))
True
Why did we need to convert the two lists to sets when comparing them at the end?
>>> sorted(list(vars(sc).keys()) + list(vars(SimpleClass).keys()) + list(vars(object).keys()))
['__class__', '__delattr__', '__dict__', '__dir__', '__doc__', '__doc__', '__eq__', '__format__', '__ge__', '__getattribute__', '__gt__', '__hash__', '__init__', '__init__', '__init_subclass__', '__le__', '__lt__', '__module__', '__ne__', '__new__', '__reduce__', '__reduce_ex__', '__repr__', '__setattr__', '__sizeof__', '__str__', '__subclasshook__', '__weakref__', 'x']
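Two of the attributes, __init__ and __doc__, were overridden by SimpleClass, so they appear twice in the concatenated list. Comparing as sets ignores both the ordering and those duplicates. Both versions still exist: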
>>> SimpleClass.__init__
<function SimpleClass.__init__ at 0x7f2ce3b79dc0>
>>> object.__init__
<slot wrapper '__init__' of 'object' objects>
>>> SimpleClass.__doc__
>>> object.__doc__
'The base class of the class hierarchy.\n\nWhen called, it accepts no arguments and returns a new featureless\ninstance that has no instance attributes and cannot be given any.\n'
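The same check can be written without naming each class by walking __mro__. This is a sketch of how I understand the default dir() to behave for an ordinary instance; classes that define their own __dir__ may differ:

```python
class SimpleClass:
    def __init__(self, x):
        self.x = x


sc = SimpleClass(42)

# Start with the instance's own attributes, then add every class in the MRO.
names = set(vars(sc))
for klass in type(sc).__mro__:
    names |= set(vars(klass))

print(names == set(dir(sc)))
```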
Inheritance and __mro__
Noticing that I didn’t understand inheritance completely, gilch gave another example.
class SimpleClass:
    def __init__(self, x):
        self.x = x
    x = 42

class SimpleClass2:
    x = 24

class SimpleClass3(SimpleClass, SimpleClass2):
    pass
Here, SimpleClass3 inherits from SimpleClass and SimpleClass2. Both SimpleClass and SimpleClass2 define a class attribute x. Which one would SimpleClass3 use?
>>> SimpleClass3.x
42
>>> SimpleClass3.__mro__
(<class '__main__.SimpleClass3'>, <class '__main__.SimpleClass'>, <class '__main__.SimpleClass2'>, <class 'object'>)
However, this changes when we switch the order of inheritance:
class SimpleClass3(SimpleClass2, SimpleClass): # SimpleClass2 now comes first
    pass

>>> SimpleClass3.x
24
>>> SimpleClass3.__mro__
(<class '__main__.SimpleClass3'>, <class '__main__.SimpleClass2'>, <class '__main__.SimpleClass'>, <class 'object'>)
So the inheritance order decides which superclass takes precedence. The Python documentation on method resolution order as well as this talk gives more detailed explanations of the algorithm.
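To make the lookup rule concrete, here is a simplified sketch of what attribute lookup on a class does with __mro__. The helper first_definer and the classes A, B, AB and BA are hypothetical names of mine, and the sketch deliberately ignores descriptors and metaclasses:

```python
class A:
    x = "A"


class B:
    x = "B"


class AB(A, B):
    pass


class BA(B, A):
    pass


def first_definer(cls, name):
    """Return the first class in cls.__mro__ whose own __dict__ defines name."""
    for klass in cls.__mro__:
        if name in vars(klass):
            return klass
    raise AttributeError(name)


print(AB.x, first_definer(AB, "x").__name__)
print(BA.x, first_definer(BA, "x").__name__)
```

Swapping the base classes changes the order of __mro__, and the first class in that order that defines x wins.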
__slots__ is used for saving memory.
class SimpleClass4:
    __slots__ = ()

sc4 = SimpleClass4()
>>> sc4.__dict__
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
AttributeError: 'SimpleClass4' object has no attribute '__dict__'
>>> SimpleClass4.x = 42
>>> sc4.x = 0
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
AttributeError: 'SimpleClass4' object attribute 'x' is read-only
What happened here is that by defining __slots__ we have prevented instances of SimpleClass4 from getting a __dict__ attribute, so no new instance attributes can be added. Not carrying a per-instance __dict__ means less memory used.
>>> dir(sc4)
['__class__', '__delattr__', '__dir__', '__doc__', '__eq__', '__format__', '__ge__', '__getattribute__', '__gt__', '__hash__', '__init__', '__init_subclass__', '__le__', '__lt__', '__module__', '__ne__', '__new__', '__reduce__', '__reduce_ex__', '__repr__', '__setattr__', '__sizeof__', '__slots__', '__str__', '__subclasshook__', 'x']
>>> vars(sc4)
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
TypeError: vars() argument must have __dict__ attribute
>>> vars(type(sc4))
mappingproxy({'__module__': '__main__', '__slots__': (), '__doc__': None, 'x': 42})
As we can see, sc4 does not have a __dict__ attribute, so vars(sc4) has become invalid too.
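An empty __slots__ allows no instance attributes at all. The more common use is to name the attributes that instances may have. A minimal sketch, with a hypothetical class Slotted that was not part of the session:

```python
class Slotted:
    __slots__ = ("x",)          # only x may be stored on instances

    def __init__(self, x):
        self.x = x


s = Slotted(1)
s.x = 2                          # fine: x has a slot

try:
    s.y = 3                      # no slot for y, and no __dict__ to fall back on
except AttributeError as exc:
    error = exc

print(s.x, hasattr(s, "__dict__"), type(error).__name__)
```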
Accessing Attributes of a Superclass
Next, gilch provided an example of using the built-in super. First, we create a class NewTuple that inherits from tuple:
class NewTuple(tuple):
    def __init__(self, x):
        self.x = x
Then we can access the constructor of the superclass by calling super().__new__ and passing in the tuple class as the first argument:
class NewTuple(tuple):
    def __init__(self, x):
        self.x = x

    def __new__(cls, y):
        return super().__new__(tuple, [y])

>>> NewTuple(2)
(2,)
>>> type(_)
<class 'tuple'>
We get a plain tuple object when we call NewTuple(). However, this only works when the class passed in is tuple or a subtype of tuple. If we pass in list, which is not a subclass of tuple, we would get an error:
class NewTuple(tuple):
    def __init__(self, x):
        self.x = x

    def __new__(cls, y):
        return super().__new__(list, [y]) # passing in list instead of tuple
>>> NewTuple(2)
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "", line 8, in __new__
return super().__new__(list, [y])
TypeError: tuple.__new__(list): list is not a subtype of tuple
Of course, we can always pass in the current class to make the constructor return an instance of the current class:
class NewTuple(tuple):
    def __init__(self, x):
        self.x = x

    def __new__(cls, y):
        return super().__new__(cls, [y]) # passing in cls
>>> type(NewTuple(2))
<class '__main__.NewTuple'>
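The reason __new__ matters for tuple subclasses is that a tuple’s contents are fixed at creation time, before __init__ ever runs. Here is a sketch of the usual pattern, with a hypothetical Pair class of my own invention:

```python
class Pair(tuple):
    """A hypothetical two-element tuple with named accessors."""

    def __new__(cls, first, second):
        # The contents of an immutable tuple must be fixed here, in __new__.
        return super().__new__(cls, (first, second))

    @property
    def first(self):
        return self[0]

    @property
    def second(self):
        return self[1]


p = Pair(1, 2)
print(p, type(p).__name__, isinstance(p, tuple), p.first, p.second)
```

Passing cls rather than tuple keeps the result an instance of the subclass, so the extra properties are available on it.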
Next puzzle from gilch: make a @trace decorator that prints inputs and return values.
I came up with a first pass solution:
def trace(f):
    return lambda *args: print(*args, f(*args))

@trace
def addition(x, y):
    return x + y
>>> addition(2, 5)
2 5 7
Then gilch added a condition: the decorated function still needs to return the same value as the undecorated version.
I was pretty stumped on this one. It seemed that I’d need two different statements in the lambda function returned by the decorator for this to work, one to do the printing and the other one to return the value. So gilch gave me a hint: think about what the expression print('hi') or 1 + 2 evaluates to. Then it occurred to me that, since print returns None, I could use or to combine statements as long as every one before the last evaluates to something falsy. After an attempt, I also realized that the statement producing the value I wanted returned would need to come last, so that evaluation is not short-circuited before reaching it.
def trace(f):
    r = []
    return lambda *args, **kwargs: r.append(f(*args, **kwargs)) or print(*r, args, kwargs) or r.pop()

@trace
def addition(x, y):
    return x + y
>>> addition(2, 3)
5 (2, 3) {}
>>> addition(2, y=3)
5 (2,) {'y': 3}
>>> addition(x=2, y=3)
5 () {'x': 2, 'y': 3}
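The short-circuit behavior that this trick relies on can be checked in isolation. In this sketch, noisy is a hypothetical helper of mine that records each call so we can see which operands were actually evaluated:

```python
def noisy(value, log):
    """Record that we were evaluated, then return value unchanged."""
    log.append(value)
    return value


log = []
result = noisy(None, log) or noisy(0, log) or noisy("last", log)
# Every operand before the final one was falsy, so all three ran.

log2 = []
result2 = noisy("first", log2) or noisy("never", log2)
# The first operand is truthy, so the second is never evaluated.

print(result, log)
print(result2, log2)
```

One detail worth noting: when every earlier operand is falsy, or returns its last operand whatever its truthiness, so the traced function can return 0 or None and the decorator still passes that value through.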
Gilch asked me to write a function named progn that takes any number of parameters and only returns the last one. Using progn, we can get rid of the or operators:
def progn(*args):
    return args[-1]

def trace(f):
    r = []
    return lambda *args, **kwargs: progn(r.append(f(*args, **kwargs)), print(*r, args, kwargs), r.pop())

@trace
def addition(x, y):
    return x + y
>>> addition(2, 3)
5 (2, 3) {}
>>> addition(2, y=3)
5 (2,) {'y': 3}
Assignment Expression
Gilch introduced the assignment expression, and we rewrote the solution to use it:
def progn(*args):
    return args[-1]

def trace(f):
    return lambda *args, **kwargs: progn(
        (r := f(*args, **kwargs)),  # moved r inside of the lambda
        print(args, kwargs, r),
        r,
    )

@trace
def addition(x, y):
    return x + y

>>> addition(2, y=3)
(2,) {'y': 3} 5
Earlier I was stumped because I wanted to put two statements inside a lambda function but couldn’t. With progn and :=, it’s possible to combine multiple statements into one, effectively creating a lambda with multiple statements.
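For comparison, a more conventional way to write trace is a nested named function that closes over f. This is a sketch of that approach, not something we wrote during the session; the use of functools.wraps and the name wrapper are my additions:

```python
import functools


def trace(f):
    @functools.wraps(f)              # keep f's name and docstring on the wrapper
    def wrapper(*args, **kwargs):
        result = f(*args, **kwargs)
        print(args, kwargs, result)
        return result
    return wrapper


@trace
def addition(x, y):
    return x + y


total = addition(2, y=3)
```

A def body can hold as many statements as it needs, so there is no need for progn or := here.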
Nice! I always enjoy reading these logs :-)
NumPy also has the enormous advantage of implementing all the numeric operators in C (or Fortran, or occasionally assembly). (If you want hardware accelerators, interop is a promising work in progress.)
You can substantially reduce memory fragmentation and GC pressure with only standard library modules and the memoryview builtin type, if your data suits that pattern. This is particularly useful to implement zero-copy algorithms for IO processing; as soon as the buffer is in memory anywhere you just take pointers to slices rather than creating new objects.
JIT implementations of Python (PyPy, Pyjion, etc.) are also usually pretty good at reducing the perf impact of Python’s memory model, at least if your program is reasonably sensible about what and when it allocates.
Sounds like you’re partway to updating for Python 3!
For the avoidance of doubt, the “obvious way” to do this (for an acculturated Python programmer) is with a nested def, which makes the progn thing non-obvious and therefore unpythonic. I strongly hinted at the obvious approach here, but konstell latched onto using a lambda instead (probably because she didn’t realize that named functions could also be closures). I saw a teaching opportunity in this, so I rolled with it. I got to dispel the myth that lambdas can only have one line and also introduced assignment expressions. I was going to get around to the obvious way, but we ran out of time.
konstell is using the terminology a little imprecisely here. In Python, an “expression” evaluates to an object, while a “statement” is an instruction that does not evaluate to an object (not even None). Most statement types can contain expressions; however, expressions cannot contain statements (exec() doesn’t count).
One of the simplest types of statements in Python is the “expression statement”, which contains a single expression and discards its result. A progn expression can discard the results of subexpressions in a similar way, making them act like expression statements, but they are not technically Python statements. We also found an expression substitute for an assignment statement. It’s ultimately possible to use expressions for everything you’d normally use statements for, but this is not the “obvious way” to do it.
See my Drython and Hissp projects for more on “one-linerizing” Python.