The Secret Life of Python: The Pickle Jar
Source: Dev.to
Why “Cannot Pickle” happens: The limits of Python serialization
When you try to copy or serialize certain objects, such as live database connections, file handles, or network sockets, Python raises a TypeError like “cannot pickle …”. This error stems from a fundamental limit of the pickle module: it can only serialize state that lives entirely inside Python’s memory, not resources owned by the operating system.
What is Pickling?
Pickling is Python’s term for serialization: converting an in‑memory object into a flat byte stream that can be written to a file, sent over a network, or used for a deep copy.
```python
import pickle

data = {"matches": 10, "active": True}
pickled_data = pickle.dumps(data)
print(pickled_data)
```
pickle.dumps() returns a bytes object (e.g., b'\x80\x04\x95\x11...') that represents the original data. pickle.loads() reverses the process, reconstructing a new object that is equivalent to the original.
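The byte stream records each object's type, its attributes, and its raw data. Classes and functions are recorded by name rather than by value, so the same definitions must be importable wherever the data is loaded. When unpickling, Python follows those recorded instructions to rebuild the object.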
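A quick round trip makes the "equivalent but new" point concrete. This is a minimal sketch that reuses the dictionary from the example above; the variable names are only illustrative.

```python
import pickle

data = {"matches": 10, "active": True}

# Serialize to bytes, then rebuild a fresh object from those bytes.
pickled_data = pickle.dumps(data)
restored = pickle.loads(pickled_data)

print(type(pickled_data))  # <class 'bytes'>
print(restored == data)    # True: same contents
print(restored is data)    # False: a separate object in memory
```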
Why Some Objects Can’t Be Pickled
Some objects depend on resources managed by the operating system:
- File descriptors (open files)
- Network sockets (live connections)
- Database connections
Their state is meaningful only while the OS holds the underlying resource. Serializing the raw descriptor (e.g., a socket's descriptor number) would be useless later: the number is valid only inside the process that opened it, and once the resource is closed the OS can reuse the same number for something unrelated. Consequently, these objects do not support pickling, and attempting it raises a TypeError.
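The failure is easy to reproduce. The sketch below opens os.devnull purely as a stand-in for any OS-backed handle, then tries both pickle.dumps() and copy.deepcopy(), which relies on the same __reduce_ex__ hook. The exact error wording may differ between Python versions.

```python
import copy
import os
import pickle

errors = []

with open(os.devnull, "rb") as handle:
    # Pickling and deep copying both ask the object how to reduce itself,
    # and an open file handle refuses in both cases.
    for operation in (pickle.dumps, copy.deepcopy):
        try:
            operation(handle)
        except TypeError as exc:
            errors.append(exc)
            print(f"{operation.__name__} failed: {exc}")
```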
Pickle vs. Other Formats
| Feature | Pickle | JSON |
|---|---|---|
| Language specificity | Python‑only | Language‑agnostic |
| Supported types | Almost any Python object (including custom classes) | Basic types: strings, numbers, booleans, null, lists, dicts |
| Human‑readable | No (binary) | Yes (text) |
| Security | Can execute arbitrary code on unpickling | Safe (no code execution) |
If you need to exchange data with programs written in JavaScript, Go, or other languages, JSON (or another interoperable format like MessagePack) is usually the better choice.
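To illustrate the trade-off in the table, the sketch below sends the same kind of dictionary through the standard json module and then shows where JSON's narrower type support becomes visible. The sample values are made up for illustration.

```python
import json

data = {"matches": 10, "active": True}

# JSON output is plain text that other languages can parse.
text = json.dumps(data)
print(text)                      # {"matches": 10, "active": true}
print(json.loads(text) == data)  # True

# Tuples have no JSON counterpart, so they come back as lists.
print(json.loads(json.dumps((1, 2))))  # [1, 2]

# Types outside the basic set, such as set, are rejected outright.
try:
    json.dumps({1, 2})
except TypeError as exc:
    set_error = exc
    print(f"json.dumps failed: {exc}")
```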
Security Considerations
Never unpickle data from an untrusted source.
During unpickling, Python executes the instructions stored in the byte stream. A malicious payload can embed code that runs automatically, leading to arbitrary code execution.
Only call pickle.loads() on data you have created yourself or received from a verified, secure source.
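The mechanism can be shown with a deliberately harmless payload. In this sketch, the hypothetical Payload class makes the unpickler call os.getcwd(). An attacker would substitute something destructive, and the receiving side would not need the class definition for the call to run.

```python
import os
import pickle


class Payload:
    """Hypothetical class used only to craft the byte stream."""

    def __reduce__(self):
        # Tell pickle: "to rebuild me, call os.getcwd()".
        # A real attack would name a dangerous callable instead.
        return (os.getcwd, ())


malicious_bytes = pickle.dumps(Payload())

# Loading the bytes calls os.getcwd() and returns its result;
# no Payload object is ever created on the receiving side.
result = pickle.loads(malicious_bytes)
print(result == os.getcwd())  # True
```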
Advanced Tip: Custom Serialization
For classes that require special handling, implement the __reduce__ (or __reduce_ex__) method. This method tells pickle exactly how to serialize and reconstruct the object.
```python
class MyClass:
    def __init__(self, value):
        self.value = value

    def __reduce__(self):
        # Return a callable and its arguments to recreate the object
        return (self.__class__, (self.value,))
```
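One way to apply __reduce__ is to make a wrapper around an unpicklable resource picklable by saving the recipe for the resource rather than the resource itself. The sketch below assumes a hypothetical LogReader class that holds an open file. It pickles only the path and reopens the file on load.

```python
import os
import pickle
import tempfile


class LogReader:
    """Hypothetical wrapper around an open file handle."""

    def __init__(self, path):
        self.path = path
        self.handle = open(path, "r")  # the part pickle cannot store

    def __reduce__(self):
        # Save only the path; unpickling calls LogReader(path),
        # which opens a fresh handle.
        return (self.__class__, (self.path,))


# Create a small file to read from.
with tempfile.NamedTemporaryFile("w", suffix=".log", delete=False) as tmp:
    tmp.write("first line\n")

reader = LogReader(tmp.name)
clone = pickle.loads(pickle.dumps(reader))

first_line = clone.handle.readline()
print(repr(first_line))               # 'first line\n'
print(clone.handle is reader.handle)  # False: a new handle was opened

# Clean up the handles and the temporary file.
reader.handle.close()
clone.handle.close()
os.remove(tmp.name)
```

The restored object starts reading from the beginning of the file, because the read position belongs to the original handle and is not part of what was saved. If the position matters, it would have to be passed along as an extra argument.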
Summary
- Pickling = serialization to a Python‑specific byte stream.
- Unpickling = deserialization back into a Python object.
- Boundaries: OS‑bound resources (files, sockets, DB connections) cannot be pickled.
- Alternatives: Use JSON for cross‑language data exchange.
- Security rule: Only unpickle trusted data.
- Customization: Define __reduce__ for fine‑grained control over serialization.