When slotted attrs classes are derived from a non-attrs base class, the attributes of the base class do not survive pickling.
What does this mean? Why would one do such a thing? What can one do about it?
For a contrived example, consider these two classes:
import attr


class MyMixin:
    def __init__(self):
        self.mixed = True


@attr.define
class A(MyMixin):
    spam: str = "eggs"

    def __attrs_post_init__(self):
        super().__init__()
Here, MyMixin is just some class that doesn’t depend on attrs.[1]
This generally works the way you probably expect; instances of A all get decorated with a mixed attribute.
>>> a = A()
>>> a
A(spam='eggs')
>>> a.mixed
True
Wonderful.
The problems don’t begin until later, when one discovers that one has cause to use Python’s pickle module to serialize an instance of the class A.[2]
It seems to work:
>>> a_prime = pickle.loads(pickle.dumps(a)) # Round-trip `a` through pickle
>>> a_prime
A(spam='eggs')
But the integration tests break violently! Other code depends on the mixin’s attributes surviving serialization, and they don’t:
>>> a_prime.mixed
AttributeError: 'A' object has no attribute 'mixed'
That’s funny. Where’d our attribute go?
The case of the missing member
If we start experimenting, we may notice a couple of things:
- mixed gets dropped if we decorate A with @attr.define. But in code where we used the old-style @attr.s instead of @attr.define, instances round-trip through pickle just fine, preserving their mixed attribute. 🧐
- If MyMixin is itself an attrs class, the mixed attribute always survives a round-trip, whether we use @attr.define or @attr.s.
If you’ve been following attrs development, you might recall that one of the key differences between the “cute” attr.s API and the “next generation” attr.define API is that the next-generation API defaults to creating slotted classes. The slotted data model for classes comes with some performance advantages and better hygiene: it prevents users from creating undeclared attributes on an instance at runtime. With attrs, it also comes with a long list of edge cases, mercifully summarized in the attrs docs. Could slots be to blame?
Indeed, redefining A with @attr.define(slots=False) restores our ability to round-trip the mixed attribute through pickling.
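A plain-Python sketch, without attrs, may help explain why a slotted A could hold a mixed attribute in the first place. It relies only on standard __slots__ behaviour: a slotted class whose base is not slotted still inherits an instance __dict__, so undeclared attributes land there. The class names Plain, Base and Derived are made up for illustration.

```python
class Plain:
    __slots__ = ("spam",)  # no __dict__ anywhere in the hierarchy


class Base:
    pass  # ordinary class: instances carry a __dict__


class Derived(Base):
    __slots__ = ("spam",)  # slotted, but still inherits Base's __dict__


plain = Plain()
try:
    plain.mixed = True
    plain_rejects = False
except AttributeError:
    plain_rejects = True

derived = Derived()
derived.mixed = True  # accepted: stored in the inherited __dict__

print(plain_rejects)     # True
print(derived.__dict__)  # {'mixed': True}
```

So the state of such an instance is split between its slots and its __dict__, and anything that serializes only one of the two will lose the other.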
One of the caveats in the attrs docs might stand out on review:
Slotted classes must implement __getstate__ and __setstate__ to be serializable with pickle protocol 0 and 1. Therefore, attrs creates these methods automatically for slotted classes.
Peeking at the implementation of these methods explains our observations: the __getstate__ method that attrs writes for slotted classes works from a list of attrs properties belonging to the type to be serialized and its superclasses. It doesn’t try to include any properties of an instance that weren’t defined as attrs attribs.
How does this interact with inheritance? Mostly, it doesn’t. There is no established mechanism to invoke any __getstate__ or __setstate__ methods associated with a class’s superclasses, and the attrs-generated methods do not try.
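The effect can be reproduced without attrs. The sketch below is a hand-written stand-in, not the code attrs actually generates: SlottedLike and its _declared tuple are hypothetical names, and the two methods only imitate the idea of working from a fixed list of declared fields.

```python
import pickle


class MyMixin:
    def __init__(self):
        self.mixed = True  # lands in the instance __dict__


class SlottedLike(MyMixin):
    # Hypothetical stand-in for a slotted class with generated pickling methods.
    __slots__ = ("spam",)
    _declared = ("spam",)  # the only names the methods below know about

    def __init__(self, spam="eggs"):
        super().__init__()
        self.spam = spam

    def __getstate__(self):
        # Collect declared fields only; the instance __dict__ is never consulted.
        return {name: getattr(self, name) for name in self._declared}

    def __setstate__(self, state):
        # Restore declared fields only.
        for name in self._declared:
            if name in state:
                object.__setattr__(self, name, state[name])


original = SlottedLike()
clone = pickle.loads(pickle.dumps(original))

print(clone.spam)               # eggs
print(hasattr(clone, "mixed"))  # False
```

Unpickling does not call __init__, so nothing ever recreates mixed on the clone.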
These united states
So, what can we do to help pickle find our missing properties?
A few options are:
- Make the base type (i.e. MyMixin) an attrs class. This may not be appropriate if the other children of the base type don’t want to use attrs, but it’s very easy and only requires a single line-of-code change.
- Somehow customize __setstate__ and __getstate__ on every leaf class we want to pickle. It’s not sufficient to define these on the base type; even though attrs will avoid overwriting these methods if they are already defined on a class, it ignores inherited implementations. You could imagine implementing this with a class decorator or something similarly cute, but that would still require changes at each point of use of the base class, which would make the example mixin much more annoying to use.
- Types can define a __reduce__ method which (empirically!) takes precedence over __getstate__ and __setstate__. We can implement __reduce__ on the base type and make it responsible for collecting and setting the right state about itself and any subtypes.
For both options 2 and 3, it’s awkward that MyMixin acquires responsibility for serializing subtypes that it doesn’t know or care about. One possibility is to delegate to any __(get|set)state__ methods that we find on the instance it receives, which must belong to subclasses, and can take responsibility for their own properties.
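The precedence claim in the third option can be checked with a small experiment. This is only a sketch using the standard library; Base, Child and the calls list are illustrative names. It records which hooks pickle invokes when a base class defines __reduce__ and a subclass defines __getstate__.

```python
import pickle

calls = []


class Base:
    def __reduce__(self):
        calls.append("reduce")
        # (callable, args, state): build an empty instance, then apply state.
        return (object.__new__, (type(self),), dict(self.__dict__))


class Child(Base):
    def __init__(self):
        self.value = 1

    def __getstate__(self):
        calls.append("getstate")
        return {"value": self.value}


clone = pickle.loads(pickle.dumps(Child()))

print(calls)        # ['reduce']
print(clone.value)  # 1
```

Because the inherited __reduce__ wins, the subclass’s __getstate__ is only consulted if __reduce__ chooses to call it, which is the delegation described above.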
Writing a reduce method for the base type
__reduce__ is expected to return a weird tuple, of length between 2 and 6. Quoting the pickle docs for each member, with annotations:
1. A callable object that will be called to create the initial version of the object.
This is usually object.__new__. That’ll work for us.
2. A tuple of arguments for the callable object. An empty tuple must be given if the callable does not accept any argument.
object.__new__ receives the type of the object to create. We can use (type(self),).
3. Optionally, the object’s state, which will be passed to the object’s __setstate__() method as previously described.
This is effectively where we override __getstate__. The object’s state is held in self.__dict__, and in any slots on either this type or its children. If we assume that all slotted children are attrs classes, we’ll know that any subtypes that use slots will have a customized __getstate__() method.
What we can do is:
- If __getstate__ exists, call it, and assume that it handles all of the subtype’s responsibilities.
- Otherwise, grab the instance __dict__.
- Add any state specific to the slots of the base class.
4. Optionally, an iterator (and not a sequence) yielding successive items. These items will be appended to the object either using obj.append(item) or, in batch, using obj.extend(list_of_items).
We don’t care about this; we can set it to None.
5. Optionally, an iterator (not a sequence) yielding successive key-value pairs. These items will be stored to the object using obj[key] = value.
We don’t care about this; we can set it to None.
6. Optionally, a callable with a (obj, state) signature. This callable allows the user to programmatically control the state-updating behavior of a specific object, instead of using obj’s static __setstate__() method. If not None, this callable will have priority over obj’s __setstate__().
This is where we override __setstate__. We get to provide a callable that accepts a brand new instance of the class, and feeds it the state that the instance needs to contain.
If the instance has a __setstate__ method, we can accomplish most of our work by calling it, and then slide in our base-class state on top of it.
Otherwise, we’re responsible for the whole kit and caboodle. This is a little bit subtle in the usual attrs ways; we need to circumvent any property setters and frozen class behaviours. We can do this by using object.__setattr__ to assign each of the properties.
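The mechanics of that sixth element can be tried in isolation before wiring it into the mixin. The following is a minimal sketch, assuming Python 3.8 or later (where the state-setter slot was added); Frozen and restore are hypothetical names, and Frozen merely imitates a read-only class by raising from __setattr__.

```python
import pickle


def restore(obj, state):
    # Hypothetical state setter: go through object.__setattr__ so that the
    # class's own __setattr__ (which refuses all writes) is never consulted.
    set_on = object.__setattr__.__get__(obj)
    for key, value in state.items():
        set_on(key, value)


class Frozen:
    def __init__(self, spam):
        object.__setattr__(self, "spam", spam)

    def __setattr__(self, name, value):
        raise AttributeError("instances are read-only")

    def __reduce__(self):
        # Six elements: callable, args, state, list items, dict items, setter.
        return (object.__new__, (type(self),), dict(self.__dict__), None, None, restore)


original = Frozen("eggs")
clone = pickle.loads(pickle.dumps(original))

try:
    clone.spam = "ham"
    still_frozen = False
except AttributeError:
    still_frozen = True

print(clone.spam)    # eggs
print(still_frozen)  # True
```

The setter has to be importable at module level, since pickle stores it by reference just like any other function.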
Denouement
A concrete implementation that rescues pickling round-trips for our MyMixin example is:
import pickle
from typing import Any

import attr


class MyMixin:
    def __init__(self):
        self.mixed = True

    def __reduce__(self) -> tuple:
        if hasattr(self, "__getstate__"):
            # Subtypes with slots should have customized their pickling behaviour.
            # Rely on the subtype to handle the things it's responsible for.
            state = self.__getstate__()
        else:
            # The subtype must not have slots. Grab its instance dictionary.
            state = self.__dict__
        # Sprinkle our own properties on top:
        if hasattr(self, "mixed"):
            state["mixed"] = self.mixed
        return (
            object.__new__,
            (type(self),),
            state,
            None,
            None,
            _setstate,
        )


def _setstate(obj: object, state: dict[str, Any]) -> None:
    """
    Inspired by https://github.com/python-attrs/attrs/blob/336ea8b691f148b81db62e24bb55c7ca11ddbc6c/src/attr/_make.py#L931
    """
    # Use object.__setattr__ to actually assign in state, to avoid any property setters
    # or frozen class behaviours.
    # I think the precise form of this (instead of repeatedly calling
    # object.__setattr__(obj, key, value)) is a performance microoptimization.
    _bound_setattr = object.__setattr__.__get__(obj)
    if hasattr(obj, "__setstate__"):
        # If the instance has a setstate method, rely on it to handle almost everything.
        obj.__setstate__(state)  # type: ignore
        # Restore our MyMixin state to the object
        if "mixed" in state:
            _bound_setattr("mixed", state["mixed"])
        return
    # The object doesn't know how to restore its state, so we're responsible for it all.
    for k, v in state.items():
        if k == "__weakref__":
            continue
        _bound_setattr(k, v)


@attr.define
class A(MyMixin):
    spam: str = "eggs"

    def __attrs_post_init__(self):
        super().__init__()
Finally, we can write:
>>> a = A()
>>> a_prime = pickle.loads(pickle.dumps(a))
>>> a_prime.mixed
True
Endnotes
1: A “mixin” class provides some functionality to child types, to which the mixin class may not be tightly coupled. It’s not important that the base class here is a mixin, instead of some other kind of base class, but it was the motivating example in the codebase where I encountered this strange circumstance.
2: To avoid slandering pickle: trouble serializing this data layout isn’t specific to pickle. A glance at the default cattrs representation of an instance of A is revealing:
>>> cattr.unstructure(a)
{'spam': 'eggs'}
There’s no attempt to represent the mixed attribute here, so it follows that you would have to similarly customize how cattrs structures and unstructures this type to preserve any interesting properties belonging to the mixin class.