SoFunction
Updated on 2024-11-13

Python List Creation and Destruction and Cache Pool Mechanisms

List creation

To create lists, the only Python/C API provided by the underlying Python is PyList_New. This function takes a size parameter that allows us to specify the length of the underlying PyObject * array when creating a PyListObject object.

PyObject *
PyList_New(Py_ssize_t size)
{  
    // Declare a PyListObject * object.
    PyListObject *op;
#ifdef SHOW_ALLOC_COUNT
    static int initialized = 0;
    if (!initialized) {
        Py_AtExit(show_alloc);
        initialized = 1;
    }
#endif
  
    // If size is less than 0, throw an exception.
    if (size < 0) {
        PyErr_BadInternalCall();
        return NULL;
    }
    // whether the cache pool is available, and if so, what is it?
    if (numfree) {
        //Subtract the number of objects in the cache pool by one.
        numfree--;
        //from the cache pool
        op = free_list[numfree];
        //Set the reference count
        _Py_NewReference((PyObject *)op);
#ifdef SHOW_ALLOC_COUNT
        count_reuse++;
#endif
    } else {
        // Request memory when unavailable
        op = PyObject_GC_New(PyListObject, &PyList_Type);
        if (op == NULL)
            return NULL;
#ifdef SHOW_ALLOC_COUNT
        count_alloc++;
#endif
    }
    // if size equals 0, ob_item is set to NULL
    if (size <= 0)
        op->ob_item = NULL;
    else {
        // Otherwise, create an array of pointers of the specified capacity and then have ob_item point to it
        // So you create the PyListObject object first, then the pointer array.
        // Finally the connection is made through ob_item
        op->ob_item = (PyObject **) PyMem_Calloc(size, sizeof(PyObject *));
        if (op->ob_item == NULL) {
            Py_DECREF(op);
            return PyErr_NoMemory();
        }
    }
    // set ob_size and allocated, then return op
    Py_SIZE(op) = size;
    op->allocated = size;
    _PyObject_GC_TRACK(op);
    return (PyObject *) op;
}

We notice that there is a cache pool inside the source code, and yes, most Python objects have their own cache pools, just implemented differently.

Destruction of lists

When creating a PyListObject object, it will first check if there is an object available in the cache pool free_list, and use it directly if there is one, otherwise it will be requested on the heap through malloc. The list cache pool is implemented as an array, which maintains up to 80 PyListObject objects.

#ifndef PyList_MAXFREELIST
#define PyList_MAXFREELIST 80
#endif
static PyListObject *free_list[PyList_MAXFREELIST];

We know from previous experience that since we can get it from the cache pool when we create it, we also have to put the list inside the cache pool when we execute the destructor function.

static void
list_dealloc(PyListObject *op)
{
    Py_ssize_t i;
    PyObject_GC_UnTrack(op);
    Py_TRASHCAN_SAFE_BEGIN(op)
    // Release the underlying array first
    if (op->ob_item != NULL) {
        i = Py_SIZE(op);
        //But before releasing, there is one more important thing to do
        // To subtract 1 from the reference count of every object pointed to by a pointer in the underlying array
        // because they no longer hold a reference to the "object".
        while (--i >= 0) {
            Py_XDECREF(op->ob_item[i]);
        }
        // then free the memory occupied by the underlying array
        PyMem_FREE(op->ob_item);
    }
    // Judge the number of PyListObject objects in the buffer pool, if it is not full, add it to the cache pool
    //Note: We see that by the time we get to this point, the underlying array has already been released.
    if (numfree < PyList_MAXFREELIST && PyList_CheckExact(op))
    //Added to the cache pool is added to the tail
    //Fetching is also done from the end of the
        free_list[numfree++] = op;
    else
        // otherwise free the memory occupied by the PyListObject object.
        Py_TYPE(op)->tp_free((PyObject *)op);
    Py_TRASHCAN_SAFE_END(op)
}

We know that when creating a new PyListObject object, it's actually a two-step process, first creating the PyListObject object, then creating the underlying array, and finally having the ob_item member of the PyListObject object point to this underlying array.

Similarly, when destroying a PyListObject object, the underlying array maintained by ob_item is destroyed first, and then the PyListObject object itself is freed (if the cache pool is full).

Now it's clear to see that the empty cache pool is actually filled with dead PyListObject objects. When you create new PyListObject objects in the future, Python will first wake up these dead PyListObject objects, giving them a chance to clean up their act and start over. Note, however, that it is only the PyListObject objects that are cached here; for the underlying array, the ob_item no longer points to it.

From list_dealloc, we see that before the PyListObject object is put into the cache pool, the array pointed to by ob_item is released, and the reference counts of the objects pointed to by the pointers in the array are reduced by 1. So eventually, the objects pointed to by these pointers in the array also fly away, or survive or perish, and there is no longer any connection between them and the PyListObject. In any case, there is no longer any connection between the array and the PyListObject.

But why do that? Why not even maintain the underlying array as well? Think about it, if you continue to maintain it, the objects pointed to by the pointers in the array will never be released, and then the problem of dangling pointers is likely to arise, so the space occupied by the objects pointed to by these pointers has to be handed back to the system (provided there are no other pointers pointing to it).

In fact, however, it is possible to preserve the underlying array maintained by the PyListObject object, i.e., only reduce the reference count of the objects pointed to by the pointers in the array by 1, and then set the pointers in the array to NULL, no longer pointing to the previous object, but without freeing up the memory space occupied by the underlying array itself.

So this way, the freed memory is not given to the system heap, and then when it is allocated again, it is much faster. However, this creates a problem in that the memory will remain unused and will only be available for use by the underlying array pointed to by the ob_item of the PyListObject object. So Python still avoids consuming too much memory by returning the memory occupied by the underlying array to the system heap, choosing between time and space.

lst1 = [1, 2, 3]
print(id(lst1))  # 1243303086208
# Throw it in the cache pool at the end of the array #
del lst1
# Fetch from the cache pool, will also take from the end of the array
lst2 = [1, 2, 3]
print(id(lst2))  # 1243303086208
# So the printed address is the same

wrap-up

As a powerful data structure, more time is necessary.

This article on the Python list creation and destruction and cache pooling mechanism is introduced to this article, more related Python list content please search for my previous articles or continue to browse the following related articles I hope you will support me in the future!