Note: phiên bản Tiếng Việt của bài này ở link dưới.

https://duongnt.com/deep-copy-vie

Python deep copy in action

Creating a deep copy of an object is not a simple task. In Python, the deepcopy method in the copy module can help simplify this process. Today, we will take a "deep dive" into this method and see how it works behind the scenes.

Deep copy makes simple

As its name suggests, the deepcopy method can create a deep copy of an object. This means when copying a compound object, it recursively creates new objects instead of just copying their references.

from copy import deepcopy

inner = [1, 2, 3]
outer = [inner, 'a', 'b']

outer_copy = deepcopy(outer)
outer_copy[0] is outer[0] # False

However, when multiple attributes hold references to the same object, deepcopy will only create one copy of that object.

inner = [1, 2, 3]
outer = [inner, inner]
outer[0] is outer[1] # True

outer_copy = deepcopy(outer)
outer_copy[0] is outer[0] # False
outer_copy[0] is outer_copy[1] # True

And it’s also smart enough to handle reference loops without creating infinite new objects.

a = [1,2,3]
b = [1, a]
a[0] = b
a[0][1] is a # True

a_copy = deepcopy(a) # No error
a_copy[0][1] is a_copy # True

How does deepcopy do all this? Let’s take a look at its source code.

The inner workings of the deepcopy method

The steps to create a copy

From the docstring of the copy module, we can get a clue how deepcopy avoids issues when copying recursive objects.

Python's deep copy operation avoids these problems by:
 a) keeping a table of objects already copied during the current
    copying pass
 b) letting user-defined classes override the copying operation or the
    set of components copied

To have a deeper understanding, we need to look at thedeepcopy method here. Broadly speaking, its steps are below.

  • Initialize a dictionary to cache copied objects.
  • Use the id method to calculate the identity of the target, then search the cache. If there is a cache hit then return the value from the cache. Otherwise, carry on.
    d = id(x)
    y = memo.get(d, _nil)
    
  • Check another dictionary called _deepcopy_dispatch to see if we already know how to copy this object. If we get a cache hit, use the retrieved method to make the copy. Otherwise, carry on.
    cls = type(x)
    copier = _deepcopy_dispatch.get(cls)
    
  • If the target defines a __deepcopy__ method, use it to create the copy (this is how we override the copying operation). If there is no such method, carry on.
    copier = getattr(x, "__deepcopy__", None)
    
  • If all else fails, we use the same logic as the pickling/unpickling process to create the copy.
  • The last step is to store the new copy in the cache.
    if y is not x:
        memo[d] = y
    

The copier method for built-in types

From this section, we can see that value types (and some special reference types) can be copied with the _deepcopy_atomic method. A quick look at that method reveals that it simply returns the original object. This is because for a value type, we can’t distinguish between objects that have the same value. And modifying one object doesn’t change other objects with the same value.

def _deepcopy_atomic(x, memo):
    return x

For list/tuple/dictionary types, we create copy with the _deepcopy_list, _deepcopy_tuple, or _deepcopy_dict method. While the details differ, they all create a container, recursively call deepcopy on each element of the original object, then put the copies into the new container.

Another interesting case is copying instance methods. As we all know, methods are first class objects in Python. This means we can put them into a list/tuple/dictionary… If so, when we copy an instance method, what class instance should that method be bound to? The answer is that it will be bound to a copy of the original class object, created with, you guessed it, deepcopy itself.

def _deepcopy_method(x, memo): # Copy instance methods
    return type(x)(x.__func__, deepcopy(x.__self__, memo))

Here, we can see a rarely used feature of Python, creating methods directly with types.MethodType.

What if we can’t find a copier?

What if the copy target is a custom type? In that case, deepcopy will create a new copy by utilizing the same serialize/deserialize process that the pickle module uses. Depend on the target object and Python version, either pickle protocol version 1 or version 4 will be used. However, version 4 will be prioritized.

Let’s say we have the following object that supports pickle protocol version 4, how can we copy it?

class Person:
    def __init__(self, name):
        self.name = name

    def introduce(self):
        print(f'{id(self)}: My name is {self.name}')

p = Person('John Doe')

The first step is to call the __reduce_ex__ method on that object.

func, args, state, _, _ = d.__reduce_ex__(4)

The meaning of each returned value can be found here (this URL is for __reduce__ method, but __reduce_ex__ is almost identical). Because Person doesn’t subclass list nor dict, we only care about the first three values.

func
<function __newobj__ at 0x000002392D419280>
args
(<class '__main__.Person'>,)
state
{'name': 'John Doe'}

We can use func and args to create a new object of type Person.

p1 = func(*args)

Then we use state to restore all attributes in the new copy. Because we are performing a deep copy, the state itself must first be cloned.

state = deepcopy(state, memo) # Neat recursion
p1.__dict__.update(state)

All the steps above are executed inside a private method called _reconstruct of the copy module.

Have fun with deepcopy

Deepcopy with memo

Since we rarely use the memo parameter when calling deepcopy, let’s create a small sample to see how it works.

first_inner = [1, 2, 3]
second_inner = [4, 5, 6]
outer = [first_inner, second_inner]

memo = { id(second_inner): second_inner }
outer_copy = deepcopy(outer, memo)

Remember that deepcopy checks memo to see if an object has already been copied before creating a new copy. Because we provide a memo object that already has an entry for second_inner, deepcopy won’t create a copy for that list.

outer[0] is outer_copy[0] # False as expected
outer[1] is outer_copy[1] # Is actually True
outer[1].append(1989)
print(outer_copy[1]) # [4, 5, 6, 1989]

Implement deep copy by ourselves

As mentioned in an earlier section, if the target object defines a method called __deepcopy__ then Python will use that method instead. Let’s create an uncopyable object.

class Uncopyable:
    def __init__(self, attributes):
        self.attribute = attributes

    def __deepcopy__(self, memo):
        raise NotImplementedError('Cannot copy this.')

    def show(self):
        return self.attribute

We can create a shallow copy for Uncopyable, but we can’t create a deep copy.

from copy import copy

u = Uncopyable([1,2,3,4,5])
u_shallow = copy(u) # OK
u_deep = deepcopy(u) # Exception has occurred: NotImplementedError: Cannot copy this.

Conclusion

I believe that reading the implementation of core modules is an excellent way to learn more about a language. Case in point, I didn’t expect that reading about deepcopy would teach me something about pickle.

A software developer from Vietnam and is currently living in Japan.

3 Thoughts on “Python deep copy in action”

Leave a Reply