Note: phiên bản Tiếng Việt của bài này ở link dưới.
https://duongnt.com/deep-copy-vie
Creating a deep copy of an object is not a simple task. In Python, the deepcopy
method in the copy
module can help simplify this process. Today, we will take a "deep dive" into this method and see how it works behind the scenes.
Deep copy makes simple
As its name suggests, the deepcopy
method can create a deep copy of an object. This means when copying a compound object, it recursively creates new objects instead of just copying their references.
from copy import deepcopy
inner = [1, 2, 3]
outer = [inner, 'a', 'b']
outer_copy = deepcopy(outer)
outer_copy[0] is outer[0] # False
However, when multiple attributes hold references to the same object, deepcopy
will only create one copy of that object.
inner = [1, 2, 3]
outer = [inner, inner]
outer[0] is outer[1] # True
outer_copy = deepcopy(outer)
outer_copy[0] is outer[0] # False
outer_copy[0] is outer_copy[1] # True
And it’s also smart enough to handle reference loops without creating infinite new objects.
a = [1,2,3]
b = [1, a]
a[0] = b
a[0][1] is a # True
a_copy = deepcopy(a) # No error
a_copy[0][1] is a_copy # True
How does deepcopy
do all this? Let’s take a look at its source code.
The inner workings of the deepcopy method
The steps to create a copy
From the docstring of the copy
module, we can get a clue how deepcopy
avoids issues when copying recursive objects.
Python's deep copy operation avoids these problems by:
a) keeping a table of objects already copied during the current
copying pass
b) letting user-defined classes override the copying operation or the
set of components copied
To have a deeper understanding, we need to look at thedeepcopy
method here. Broadly speaking, its steps are below.
- Initialize a dictionary to cache copied objects.
- Use the
id
method to calculate the identity of the target, then search the cache. If there is a cache hit then return the value from the cache. Otherwise, carry on.d = id(x) y = memo.get(d, _nil)
- Check another dictionary called
_deepcopy_dispatch
to see if we already know how to copy this object. If we get a cache hit, use the retrieved method to make the copy. Otherwise, carry on.cls = type(x) copier = _deepcopy_dispatch.get(cls)
- If the target defines a
__deepcopy__
method, use it to create the copy (this is how we override the copying operation). If there is no such method, carry on.copier = getattr(x, "__deepcopy__", None)
- If all else fails, we use the same logic as the pickling/unpickling process to create the copy.
- The last step is to store the new copy in the cache.
if y is not x: memo[d] = y
The copier method for built-in types
From this section, we can see that value types (and some special reference types) can be copied with the _deepcopy_atomic
method. A quick look at that method reveals that it simply returns the original object. This is because for a value type, we can’t distinguish between objects that have the same value. And modifying one object doesn’t change other objects with the same value.
def _deepcopy_atomic(x, memo):
return x
For list/tuple/dictionary
types, we create copy with the _deepcopy_list, _deepcopy_tuple, or _deepcopy_dict method. While the details differ, they all create a container, recursively call deepcopy
on each element of the original object, then put the copies into the new container.
Another interesting case is copying instance methods. As we all know, methods are first class objects in Python. This means we can put them into a list/tuple/dictionary
… If so, when we copy an instance method, what class instance should that method be bound to? The answer is that it will be bound to a copy of the original class object, created with, you guessed it, deepcopy
itself.
def _deepcopy_method(x, memo): # Copy instance methods
return type(x)(x.__func__, deepcopy(x.__self__, memo))
Here, we can see a rarely used feature of Python, creating methods directly with types.MethodType
.
What if we can’t find a copier?
What if the copy target is a custom type? In that case, deepcopy
will create a new copy by utilizing the same serialize/deserialize process that the pickle
module uses. Depend on the target object and Python version,
either pickle protocol version 1 or version 4 will be used. However, version 4 will be prioritized.
Let’s say we have the following object that supports pickle protocol version 4, how can we copy it?
class Person:
def __init__(self, name):
self.name = name
def introduce(self):
print(f'{id(self)}: My name is {self.name}')
p = Person('John Doe')
The first step is to call the __reduce_ex__
method on that object.
func, args, state, _, _ = d.__reduce_ex__(4)
The meaning of each returned value can be found here (this URL is for __reduce__
method, but __reduce_ex__
is almost identical). Because Person
doesn’t subclass list
nor dict
, we only care about the first three values.
func
<function __newobj__ at 0x000002392D419280>
args
(<class '__main__.Person'>,)
state
{'name': 'John Doe'}
We can use func
and args
to create a new object of type Person
.
p1 = func(*args)
Then we use state
to restore all attributes in the new copy. Because we are performing a deep copy, the state
itself must first be cloned.
state = deepcopy(state, memo) # Neat recursion
p1.__dict__.update(state)
All the steps above are executed inside a private method called _reconstruct of the copy
module.
Have fun with deepcopy
Deepcopy with memo
Since we rarely use the memo
parameter when calling deepcopy
, let’s create a small sample to see how it works.
first_inner = [1, 2, 3]
second_inner = [4, 5, 6]
outer = [first_inner, second_inner]
memo = { id(second_inner): second_inner }
outer_copy = deepcopy(outer, memo)
Remember that deepcopy
checks memo
to see if an object has already been copied before creating a new copy. Because we provide a memo
object that already has an entry for second_inner
, deepcopy
won’t create a copy for that list.
outer[0] is outer_copy[0] # False as expected
outer[1] is outer_copy[1] # Is actually True
outer[1].append(1989)
print(outer_copy[1]) # [4, 5, 6, 1989]
Implement deep copy by ourselves
As mentioned in an earlier section, if the target object defines a method called __deepcopy__
then Python will use that method instead. Let’s create an uncopyable object.
class Uncopyable:
def __init__(self, attributes):
self.attribute = attributes
def __deepcopy__(self, memo):
raise NotImplementedError('Cannot copy this.')
def show(self):
return self.attribute
We can create a shallow copy for Uncopyable
, but we can’t create a deep copy.
from copy import copy
u = Uncopyable([1,2,3,4,5])
u_shallow = copy(u) # OK
u_deep = deepcopy(u) # Exception has occurred: NotImplementedError: Cannot copy this.
Conclusion
I believe that reading the implementation of core modules is an excellent way to learn more about a language. Case in point, I didn’t expect that reading about deepcopy
would teach me something about pickle
.
Thank you so much for an excellent walkthough of this fascinating dark corner of Python!
Thank you. Please check out other articles as well.