Python and Memory Management, Part 1: Objects
Source: Dev.to
Value Equality – ==
== checks whether two objects contain the same data. When you write a == b, Python compares the values stored in the objects. This may involve calling special methods like __eq__.
# == compares values
a = [1, 2, 3]
b = [1, 2, 3]
print(a == b) # True – same content
Under the hood, == can trigger more complicated operations:
- For lists, it compares each element.
- For custom objects, it calls the object's __eq__ method.
class SensorValue:
    def __init__(self, value, tolerance):
        self.value = int(value)
        self.tolerance = int(tolerance)

    def __eq__(self, other):
        # Equality within tolerance
        return abs(self.value - other.value) <= self.tolerance
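To make the behaviour concrete, here is a self-contained, runnable version of the tolerance-based comparison together with some usage (the class mirrors the one above, with __eq__ returning a boolean):

```python
class SensorValue:
    def __init__(self, value, tolerance):
        self.value = int(value)
        self.tolerance = int(tolerance)

    def __eq__(self, other):
        # Two readings are "equal" if they differ by at most the tolerance
        return abs(self.value - other.value) <= self.tolerance


s1 = SensorValue(100, tolerance=2)
s2 = SensorValue(101, tolerance=2)
s3 = SensorValue(105, tolerance=2)

print(s1 == s2)  # True  – difference of 1 is within tolerance
print(s1 == s3)  # False – difference of 5 exceeds tolerance
```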
Note (Python 3.14+) – You can query whether an object is immortal via sys._is_immortal(obj) (a private API).
7. Recap
| Operator / Function | What it checks | Typical use case |
|---|---|---|
| == | Value equality (__eq__) | Comparing data |
| is | Identity (same memory address) | Singleton checks (is None) |
| id() | Object’s “address” (integer) | Debugging / introspection |
Understanding these distinctions reveals why:
- Small integers are cached and appear to be the same object.
- Large integers are only shared when you explicitly bind them to multiple names.
- Reference counting drives memory reclamation, but immortal objects bypass it.
Armed with this knowledge, you can write more predictable, memory‑aware Python code—and avoid the subtle bugs that arise from confusing equality with identity.
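The three tools from the recap table can be contrasted in a few lines:

```python
a = [1, 2, 3]
b = [1, 2, 3]  # equal content, but a separate object
c = a          # a second name bound to the same object

print(a == b)          # True  – same data
print(a is b)          # False – different objects in memory
print(a is c)          # True  – identical object
print(id(a) == id(c))  # True  – id() mirrors identity
print(a is None)       # False – the idiomatic singleton check
```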
String Interning
Python performs a special optimisation for some strings: it caches them and stores the cached objects as interned strings. Interned strings behave like “immortal” objects – they are never de‑allocated while the interpreter runs. See the official description in the sys.intern documentation.
Note – Python implicitly interns a number of strings (e.g., identifiers, short literals). Because of this, the is operator can give surprising results when comparing strings that look equal but are not the same object.
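A small experiment illustrates this. Strings assembled at run time are generally not interned, while sys.intern returns the canonical cached copy; note that the exact set of implicitly interned strings is an implementation detail:

```python
import sys

# Build two equal strings at run time so the compiler cannot merge them
s1 = "".join(["hello", " ", "world!"])
s2 = "".join(["hello", " ", "world!"])
print(s1 == s2)  # True – same content
print(s1 is s2)  # typically False – two separate objects

# sys.intern hands back one canonical object for equal strings
t1 = sys.intern(s1)
t2 = sys.intern(s2)
print(t1 is t2)  # True – both names point at the interned copy
```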
Small‑integer memory layout
id() returns the memory address of an object. When we look at the addresses of consecutive small integers we see a regular pattern:
for i in range(5):
    print(f"id({i+1}) - id({i}) = {id(i+1) - id(i)}")
The difference is 32 bytes. Is that the size of a Python integer object?
import sys
print(f"{sys.getsizeof(1)=}") # sys.getsizeof(1)=28
- sys.getsizeof(1) reports 28 bytes – the actual size of the object data.
- The 32‑byte gap you see when printing id differences comes from the way CPython allocates small integers in a contiguous array; each slot occupies 32 bytes, even though only 28 bytes are used for the object itself.
What occupies those 32 bytes?
| Offset | Size | Meaning |
|---|---|---|
| 0 – 7 | 8 B | Reference count (ob_refcnt). For “immortal” small integers this contains the magic value 0xC0000000. |
| 8 – 15 | 8 B | Pointer to the type object (ob_type). |
| 16 – 23 | 8 B | Size field – the number of digits that make up the integer (Python integers have arbitrary precision). |
| 24 – 27 | 4 B | The first digit (ob_digit[0]). Larger numbers allocate additional digits. |
| 28 – 31 | 4 B | Padding for 8‑byte alignment. |
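The variable-size digit array can be observed directly: on a typical 64-bit CPython build, each additional 30-bit digit grows sys.getsizeof by 4 bytes (the exact numbers are implementation details and may differ across versions and platforms):

```python
import sys

print(sys.getsizeof(1))      # e.g. 28 – fits in one 30-bit digit
print(sys.getsizeof(2**30))  # e.g. 32 – needs a second digit
print(sys.getsizeof(2**60))  # e.g. 36 – needs a third digit
```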
CPython object layout (C level)
Every Python object starts with a PyObject header:
typedef struct _object {
    Py_ssize_t ob_refcnt;    // reference count (or immortal flag)
    PyTypeObject *ob_type;   // pointer to type object
} PyObject;
Integer (long) object definition
In CPython source (cpython/Include/cpython/longintrepr.h):
typedef struct _PyLongValue {
    uintptr_t lv_tag;    // number of digits, sign, flags
    digit ob_digit[1];   // array of 30‑bit “digits” (uint32)
} _PyLongValue;

struct _longobject {         // the public integer object
    PyObject_HEAD            // expands to the PyObject header above
    _PyLongValue long_value; // the actual numeric data
};
This header is what the memory manager and garbage collector use to identify and track every object.
Reference counting
At the heart of CPython’s memory management is reference counting. Each object stores a counter that tracks how many references point to it. When the counter reaches zero, the memory is released.
You can observe the reference count with sys.getrefcount() (the function adds one extra reference for the argument you pass).
import sys
x = [1, 2, 3]
print(f"Initial refcount: {sys.getrefcount(x)=}") # 2 (x + argument)
y = x
print(f"Assigning y <- x: {sys.getrefcount(x)=}") # 3
z = x
print(f"Assigning z <- x: {sys.getrefcount(x)=}") # 4
del y
print(f"De‑assigning y: {sys.getrefcount(x)=}") # 3
z = None
print(f"De‑assigning z: {sys.getrefcount(x)=}") # 2
del x
print("Deleted x")
When the reference count drops to 0, CPython immediately frees the object.
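That immediacy is observable: for an object that is not part of a cycle, a finalizer runs the instant the last reference disappears, with no garbage-collector pass involved. A minimal sketch (the Loud class and the freed list are illustrative names):

```python
freed = []

class Loud:
    def __del__(self):
        # Runs as soon as the refcount hits zero
        freed.append("freed")

x = Loud()
print(freed)  # [] – object still alive
del x         # last reference gone → immediate deallocation
print(freed)  # ['freed'] – finalizer already ran
```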
Weak references and circular references
A weak reference does not increase the reference count, allowing objects to be reclaimed even if a weak reference to them still exists.
import sys
import weakref
class Node:
    def __init__(self, child):
        self.child = child
a = Node(None) # refcnt = 1
b = Node(None)
a_ref = weakref.ref(a) # weak reference, no refcnt increase
a.child = b
b.child = a # creates a cycle, refcnt = 2 for each
print(f"All refs a, b, and a_ref assigned: {sys.getrefcount(a_ref())=}") # 3 (2 + argument)
del a
print(f"After variable A deleted: {sys.getrefcount(a_ref())=}") # 2 (still alive because of the cycle: b.child + argument)
del b
print(f"After variable B deleted: {sys.getrefcount(a_ref())=}") # 2 (still alive until the cycle collector runs)
Why reference counting alone can’t break cycles
When objects reference each other (a cycle), their reference counts never reach zero, so they would leak memory. CPython therefore adds a generational garbage collector that can detect and collect such cycles.
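The collector can be triggered by hand to watch a cycle being reclaimed; a weak reference serves as a non-owning probe into the cycle (class and variable names here are illustrative):

```python
import gc
import weakref

class Node:
    def __init__(self):
        self.child = None

a = Node()
b = Node()
a.child = b
b.child = a             # reference cycle: refcounts can never reach zero
probe = weakref.ref(a)  # non-owning reference to observe the object

del a, b                # no strong names left, but the cycle remains
print(probe() is None)  # False – refcounting alone cannot free it

gc.collect()            # cycle detector finds and breaks the cycle
print(probe() is None)  # True – both Node objects reclaimed
```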
Generational garbage collector
The collector is based on the generational hypothesis:
- Most objects die young (≈ 80‑90 %).
- Old objects rarely reference young objects.
CPython maintains three generations:
| Generation | When it runs | Description |
|---|---|---|
| 0 | Every gc.get_threshold()[0] allocations (default 700) | New objects. |
| 1 | Every gc.get_threshold()[1] collections of generation 0 (default 10) | Objects that survived one Gen 0 collection. |
| 2 | Every gc.get_threshold()[2] collections of generation 1 (default 10) | Long‑living objects. |
import gc
print(f"GC generations thresholds: {gc.get_threshold()}")
# (700, 10, 10) by default
Because generation 0 collections happen only after a certain number of allocations, a newly created object (or a weak‑referenced object) may remain alive for a short time even after all strong references are gone. This explains why, in the example above, the weak reference a_ref() could still be accessed after a and b were deleted – the cyclic garbage had not yet been collected.
TL;DR
- String interning caches certain strings, making them effectively immortal.
- Small integers are stored in a pre‑allocated contiguous array; each slot occupies 32 bytes, though the object itself uses 28 bytes.
- Every CPython object starts with a PyObject header (ob_refcnt + ob_type).
- Reference counting automatically frees objects when their count drops to zero.
- Weak references let you hold a non‑owning pointer to an object.
- Circular references are handled by the generational garbage collector, which runs periodically based on allocation thresholds.