SoFunction
Updated on 2024-11-20

Python's Garbage Collection Mechanism Explained

reference count

In the Python source code, every object is represented as a structure with a count field.

typedef struct_object {
  int ob_refcnt;
  struct_typeobject *ob_type;
} PyObject;

PyObject is what every object must have, where ob_refcnt is used as a reference count. An object's ob_refcnt is increased when it has a new reference, and decreased when the object referencing it is deleted. Once an object's reference count reaches 0, the object is immediately recycled and the occupied space is freed.

vantage

  • simple and easy to use
  • Good real-time performance, released as soon as it is not referenced

drawbacks

  • Requires extra space to maintain reference counts
  • Cannot resolve circular references to objects

Circular references to objects
A circular reference is when two objects refer to each other and no external variable references either of them, causing the reference chain to form a ring.

>>> a = {}    # The reference count of object a is 1
>>> b = {}    # The reference count of object b is 1
>>> a['b'] = b  # b's reference count increased by 1
>>> b['a'] = a  # The reference count of a increases by 1
>>> del a     # a's reference count decreases by 1, and finally a's reference is 1
>>> del b     # bThe reference count of the1,ultimatebThe reference to the1

After the del operation is performed, there are no references to the a and b objects, but because the two objects each contain a reference to the other, the reference count always remains at one.

According to the principle of memory reclamation in reference counting, since the counts of a and b are not 0, these two objects will not be reclaimed when using reference counting method for memory management, they will always reside in memory, causing memory leaks.

Marker removal

The token removal mechanism is mainly used to solve circular reference problems.

Marker removal algorithm is a garbage collection algorithm based on the implementation of tracing GC (tracing GC) technology. It is mainly divided into two stages:

  • Marking phase, where the GC marks all active objects
  • Recycling of inactive objects that are not labeled

Distinguish between active and inactive objects

Objects are connected together by references, i.e. pointers, to form a directed graph, objects are the nodes of this directed graph, and reference relationships form the edges of this directed graph. Starting from the root object (root object), along the directed edge traversal of the object, reachable objects will be marked as active objects, unreachable objects is to be removed inactive objects.

Root objects are typically global variables, call stacks, registers, etc.

Scope of application

Marker removal algorithm as Python auxiliary garbage collection technique, mainly deal with container objects, because for strings, numerical objects, etc., it is not possible to cause circular references, Python will use a two-way chain table to organize these container objects.

There is a rather obvious drawback for the mark-and-clean algorithm: in order to remove inactive objects, the entire heap memory needs to be scanned, and even if there are only a small number of active objects left all objects need to be scanned.

Substitute recycling

Generational recycling is a space-for-time operation built on top of mark-and-clean techniques, and is a Python-assisted garbage collection technique that is primarily used to process container objects.

Python will divide the memory into different sets based on the survival time of the object, each set is called a generation, which will be mainly categorized into 3 generations: young generation. The middle generation and the old generation, which will correspond to 3 chained lists, and the corresponding garbage collection frequency decreases with the increase of the object's survival time.

Newly created objects are assigned to the young generation. When the total number of young generation links reaches the upper limit, Python's garbage collection mechanism will be triggered to recycle the recyclable objects, and those non-recyclable objects will be moved to the middle generation. By analogy, the oldest generation objects are the ones that survive the longest, and may even survive the entire system lifecycle.

This is the whole content of this article.