Memory management is an important aspect of language design and a major factor in a language's performance. Whether it is manual management in C or garbage collection in Java, the memory model is one of a language's defining features. This article uses Python as an example of memory management in a dynamically typed, object-oriented language.
Memory Usage of Objects
Assignment is one of the most common statements in any language, yet even the simplest assignment can be very informative. Python's assignment statement is well worth studying.
a = 1
The integer 1 is an object, and a is a reference. The assignment statement makes the reference a point to the object 1. Python is a dynamically typed language (see Dynamic Typing) in which objects are separate from references. Python uses references like "chopsticks" to touch and turn the real food: the objects.
References and Objects
To explore how objects are stored in memory, we can turn to Python's built-in function id(), which returns the identity of an object. In CPython, this identity is the memory address of the object.
a = 1
print(id(a))
print(hex(id(a)))
On my computer, this prints the following:
11246696
0xab9c68
These are the decimal and hexadecimal representations of the memory address, respectively.
Python caches small integers and short strings for reuse. When we create multiple references equal to 1, we are actually making all of those references point to the same object.
a = 1
b = 1
print(id(a))
print(id(b))
The above program returns
11246696
11246696
It can be seen that a and b are actually two references to the same object.
To check whether two references point to the same object, we can use the is keyword.
a = 1
b = 1
print(a is b)  # True

a = "good"
b = "good"
print(a is b)  # True

a = "very good morning"
b = "very good morning"
print(a is b)  # False

a = []
b = []
print(a is b)  # False
The comments above show the result of each comparison. Because Python caches small integers and short strings, there is only one copy of each such object; for example, all references to the integer 1 point to the same object, and an assignment creates a new reference rather than a new object. Long strings and other objects, such as lists, can exist as several equal but distinct copies, in which case the assignment statement creates a new object. (These results are from an interactive session; when the same lines run as a script, CPython may merge identical string constants, so the long-string comparison can print True.)
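The difference between "the same object" and "an equal value" can be sketched with lists, which are never cached. This is only an illustration; the names first, second and alias are made up for the example.

```python
# Two separately built lists hold equal values but are distinct objects.
first = [1, 2, 3]
second = [1, 2, 3]
alias = first  # a new reference, not a new object

print(first == second)  # True: same contents
print(first is second)  # False: two objects
print(first is alias)   # True: one object, two references
```

The == operator compares values, while is compares identities, which is why only the is test distinguishes a copy from a second reference.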
In Python, each object has a total number of references to it, known as the reference count.
We can use getrefcount() from the sys module to see the reference count of a particular object. Note that passing a reference to getrefcount() as an argument creates a temporary reference, so the result is 1 more than expected.
from sys import getrefcount

a = [1, 2, 3]
print(getrefcount(a))

b = a
print(getrefcount(b))
For this reason, the two getrefcount() calls return 2 and 3 instead of the expected 1 and 2.
Objects Referencing Objects
A container object in Python, such as a list or a dictionary, can contain multiple objects. In fact, what the container holds is not the element objects themselves but references to them.
We can also define a custom object that holds references to other objects:
class from_obj(object):
    def __init__(self, to_obj):
        self.to_obj = to_obj

b = [1, 2, 3]
a = from_obj(b)
print(id(a.to_obj))
print(id(b))
The two identities are the same, so a references the object that b points to.
Object references are the most basic form of composition in Python. Even the assignment a = 1 actually makes the entry with the key "a" in a dictionary refer to the integer object 1. This dictionary keeps track of all global references, and we can view it with the built-in function globals().
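A minimal sketch of looking at that dictionary, assuming the code runs at module level (for example as a script). The variable name namespace is only illustrative.

```python
a = 1

# globals() returns the dictionary that holds module-level names.
namespace = globals()
print("a" in namespace)     # True
print(namespace["a"] is a)  # True: the entry and `a` refer to the same object

# Rebinding through the dictionary is equivalent to a global assignment.
namespace["a"] = 2
print(a)  # 2
```

Writing through the dictionary is shown only to make the relationship visible; ordinary assignment is the normal way to rebind a global name.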
When an object A is referenced by another object B, the reference count of A is increased by one.
from sys import getrefcount

a = [1, 2, 3]
print(getrefcount(a))

b = [a, a]
print(getrefcount(a))
As object b references a twice, the reference count of a is increased by 2.
References between container objects may form very complex topologies. We can plot them with the objgraph package, for example:
import objgraph

x = [1, 2, 3]
y = [x, dict(key1=x)]
z = [y, (x, y)]
objgraph.show_refs([z], filename='ref_topo.png')
objgraph is a third-party package for Python. You need to install xdot before installing it.
sudo apt-get install xdot
sudo pip install objgraph
Two objects may refer to each other, forming a so-called reference cycle.
a = []
b = [a]
a.append(b)
Even a single object that references itself forms a reference cycle.
from sys import getrefcount

a = []
a.append(a)
print(getrefcount(a))
Reference cycles cause a lot of trouble for the garbage collection mechanism, as described in more detail later.
Reducing References
The reference count of an object may also decrease. For example, a reference can be deleted with the del keyword:
from sys import getrefcount

a = [1, 2, 3]
b = a
print(getrefcount(b))

del a
print(getrefcount(b))
del can also be used to remove elements from a container object, e.g.:
a = [1, 2, 3]
del a[0]
print(a)
If a reference points to object A and is then redirected to some other object B, the reference count of object A is reduced by 1.
from sys import getrefcount

a = [1, 2, 3]
b = a
print(getrefcount(b))

a = 1
print(getrefcount(b))
Garbage Collection
When you eat too much, you get fat, and the same goes for Python. As more and more objects are created, they occupy more and more memory. But you don't have to worry too much about Python's size: at the right time it will "lose weight" by starting garbage collection and removing objects that are no longer useful. Many languages, such as Java and Ruby, have garbage collection mechanisms, and while the ultimate goal is always to slim down, the weight-loss programs of different languages differ significantly (compare this article with Java Memory Management and Garbage Collection).
Basically, when the reference count of an object drops to 0, no references point to the object any more and the object becomes garbage to be reclaimed. For example, when a new object is created and assigned to a reference, its reference count becomes 1. If that reference is deleted, the reference count becomes 0 and the object is ready to be garbage collected, as with the following list:
a = [1, 2, 3]
del a
After del a, there is no longer any reference to the previously created list [1, 2, 3], so the user cannot touch or use this object in any way. If it stayed in memory, it would be unhealthy fat. In CPython, an object whose reference count reaches 0 is reclaimed right away and the memory it occupied is released; the separate garbage collection pass described below exists to find objects that reference counting alone cannot free.
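One way to observe an object disappearing once its last reference is deleted is a weak reference, which watches an object without adding to its reference count. The sketch below assumes CPython, where an object is reclaimed as soon as its count reaches 0; other implementations may delay this. The class Node is a hypothetical placeholder, used because plain lists cannot be weakly referenced.

```python
import weakref


class Node:
    """Hypothetical placeholder class for the demonstration."""


obj = Node()
probe = weakref.ref(obj)   # a weak reference does not raise the reference count
print(probe() is obj)      # True: the object is alive

del obj                    # the last strong reference goes away
print(probe() is None)     # True on CPython: the object has been reclaimed
```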
However, losing weight is an expensive and laborious endeavor. Python cannot perform other tasks while garbage collection is running, so frequent garbage collection greatly reduces Python's efficiency. If there are not many objects in memory, there is no need to run garbage collection all the time. Therefore, Python starts garbage collection automatically only under certain conditions. While running, Python keeps track of the number of object allocations and deallocations, and garbage collection is started when the difference between the two exceeds a certain threshold.
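The counters that Python maintains can be inspected with gc.get_count(), which returns one counter per generation. The following is a small sketch; the exact numbers depend on the interpreter version and on what has already run, so no particular output is claimed.

```python
import gc

# get_count() returns the current collection counters for generations 0, 1 and 2.
before = gc.get_count()

# Allocate some container objects and keep them alive.
kept = [[] for _ in range(100)]

after = gc.get_count()
print(before, after)
```

The first counter typically grows as container objects are allocated and is reset when a collection runs.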
We can view this threshold with the get_threshold() function of the gc module:
import gc

print(gc.get_threshold())
This returns (700, 10, 10). The last two 10s are thresholds related to generational collection, described later. 700 is the threshold at which garbage collection is started. It can be changed with the set_threshold() function in gc.
We can also start garbage collection manually by calling gc.collect().
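A minimal sketch of a manual collection using gc.collect() from the gc module, which returns the number of unreachable objects it found. The self-referencing list is only there to give the collector something to find; the count shown in the comment assumes CPython.

```python
import gc

gc.collect()            # clear out anything left over from earlier code

cycle = []
cycle.append(cycle)     # a list that references itself
del cycle               # now unreachable, but its reference count is still 1

found = gc.collect()    # run a full collection by hand
print(found)            # at least 1 on CPython
```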
Generational Collection
Python also employs a generational collection strategy. The underlying assumption is that the longer an object has lived, the less likely it is to become garbage later in the program. Programs tend to produce a large number of objects, many of which are created and disappear quickly, while some are used for a long time. For reasons of trust and efficiency, we trust such "long-lived" objects to remain useful and scan them less frequently during garbage collection.
Python divides all objects into generations 0, 1, and 2. All newly created objects belong to generation 0. When an object survives a garbage collection of its generation, it is moved to the next generation. Whenever garbage collection starts, all generation 0 objects are scanned. After generation 0 has been collected a certain number of times, a scan of generations 0 and 1 is started. After generation 1 has also been collected a certain number of times, a scan of generations 0, 1, and 2, i.e. of all objects, is started.
These two counts are the two 10s in the (700, 10, 10) returned by get_threshold() above. That is, every 10 generation 0 collections are accompanied by 1 generation 1 collection, and every 10 generation 1 collections are accompanied by 1 generation 2 collection.
These can also be adjusted with set_threshold(), e.g. to scan generation 2 objects more frequently:
import gc

gc.set_threshold(700, 10, 5)
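The schedule described in this section can be sketched as a toy counter. This is not how the interpreter is implemented; the function simulate and its parameters are hypothetical and only model the "every 10 collections" rule as stated in the text.

```python
def simulate(gen0_runs, threshold1=10, threshold2=10):
    """Toy model: count how often each generation is collected."""
    runs = [0, 0, 0]
    since_gen1 = 0   # generation 0 collections since the last generation 1 collection
    since_gen2 = 0   # generation 1 collections since the last generation 2 collection
    for _ in range(gen0_runs):
        runs[0] += 1
        since_gen1 += 1
        if since_gen1 >= threshold1:
            since_gen1 = 0
            runs[1] += 1
            since_gen2 += 1
            if since_gen2 >= threshold2:
                since_gen2 = 0
                runs[2] += 1
    return runs


print(simulate(1000))                 # [1000, 100, 10]
print(simulate(1000, threshold2=5))   # [1000, 100, 20]
```

Lowering the last threshold from 10 to 5 doubles the number of full scans in this model, which is the effect of the set_threshold(700, 10, 5) call.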
Isolated Reference Cycles
Reference cycles make the garbage collection mechanism described above much harder. A cycle can consist of objects that are no longer usable but whose reference counts are not zero.
a = []
b = [a]
a.append(b)

del a
del b
Above, we first create two list objects that reference each other, forming a reference cycle. After the references a and b are deleted, the two objects can no longer be reached from the program and are no longer useful. However, because of the cycle, their reference counts never drop to 0, so reference counting alone will not reclaim them.
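The effect can be observed with a weak reference. The sketch below assumes CPython and uses a hypothetical Node class instead of lists, because lists cannot be weakly referenced. Automatic collection is switched off during the experiment so that only the explicit gc.collect() call can reclaim the cycle.

```python
import gc
import weakref


class Node:
    """Hypothetical container class for the demonstration."""

    def __init__(self):
        self.other = None


gc.disable()                 # keep the automatic collector out of the experiment
a = Node()
b = Node()
a.other = b
b.other = a                  # a and b now form a reference cycle
probe = weakref.ref(a)

del a
del b
alive_after_del = probe() is not None
print(alive_after_del)       # True: the cycle keeps both objects alive

gc.collect()                 # the cycle detector finds the isolated cycle
alive_after_collect = probe() is not None
print(alive_after_collect)   # False: both objects have been reclaimed
gc.enable()
```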
[Figure: an isolated reference cycle]
To reclaim such a reference cycle, Python makes a copy of each object's reference count, which we can call gc_ref. For each object i, this copy is gc_ref_i. Python then traverses all objects i and, for every object j referenced by object i, decrements the corresponding gc_ref_j by 1.
[Figure: the result after traversal]
At the end of the traversal, objects whose gc_ref is not 0 must be kept, along with the objects they reference and any objects referenced further downstream. All other objects are garbage collected.
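The procedure described above can be sketched as a toy model in plain Python. This is not the interpreter's actual code: the function find_garbage and the two dictionaries are hypothetical stand-ins for the interpreter's internal state, with objects represented by names.

```python
def find_garbage(refcounts, edges):
    """Toy model of the gc_ref procedure.

    refcounts: dict mapping object name -> total reference count
    edges:     dict mapping object name -> list of names it references
    """
    # Step 1: copy each reference count into gc_ref.
    gc_ref = dict(refcounts)

    # Step 2: for every reference i -> j between tracked objects, decrement gc_ref[j].
    for i, targets in edges.items():
        for j in targets:
            gc_ref[j] -= 1

    # Step 3: objects with gc_ref > 0 are referenced from outside; keep them
    # and everything reachable from them.
    keep = set()
    stack = [name for name, count in gc_ref.items() if count > 0]
    while stack:
        name = stack.pop()
        if name in keep:
            continue
        keep.add(name)
        stack.extend(edges.get(name, []))

    return set(refcounts) - keep


# x is referenced by one variable and references y; a and b only reference each other.
refcounts = {"x": 1, "y": 1, "a": 1, "b": 1}
edges = {"x": ["y"], "y": [], "a": ["b"], "b": ["a"]}
print(sorted(find_garbage(refcounts, edges)))  # ['a', 'b']
```

Here y ends up with a gc_ref of 0 just like a and b, but it is kept because it is reachable from x, whose gc_ref stays above 0.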
Summary
Python, as a dynamically typed language, separates objects from references. This is very different from earlier procedural languages. To free memory efficiently, Python has built-in support for garbage collection. It uses a relatively simple mechanism, reference counting, and as a result needs to deal with isolated reference cycles.
Python shares some traits with other languages and has some of its own. Understanding its memory management mechanism is an important step toward improving the performance of Python programs.