Generally speaking, Python uses object reference counting to prevent memory leaks, and its automatic garbage collection is built on reference counting.
Because Python has automatic garbage collection, many beginners mistakenly believe that they are living the good life and don't have to suffer from memory leaks anymore. But if you take a closer look at the Python documentation for the __del__() function, you'll see that there are clouds in the sky. Here's a little excerpt from the documentation:
Some common situations that may prevent the reference count of an object from going to zero include: circular references between objects (e.g., a doubly-linked list or a tree data structure with parent and child pointers); a reference to the object on the stack frame of a function that caught an exception (the traceback stored in sys.exc_traceback keeps the stack frame alive); or a reference to the object on the stack frame that raised an unhandled exception in interactive mode (the traceback stored in sys.last_traceback keeps the stack frame alive).
It can be seen that circular references between objects with __del__() functions are the main culprit for memory leaks.
One more clarification is needed: circular references between Python objects without a __del__() function are garbage collected automatically.
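The claim about objects without __del__() can be checked with a short sketch. It is written for Python 3 (the article's own listings are Python 2), and Node and cycle_is_collected are hypothetical names introduced only for this illustration. A weak reference is used as a probe because it does not keep its target alive.

```python
import gc
import weakref


class Node(object):
    """Plain object with no __del__() method (hypothetical example class)."""

    def __init__(self):
        self.partner = None


def cycle_is_collected():
    gc.collect()                 # start from a clean state
    a, b = Node(), Node()
    a.partner, b.partner = b, a  # build an a <-> b reference cycle
    probe = weakref.ref(a)       # weak reference: does not keep `a` alive
    del a, b                     # the cycle keeps both counts above zero
    gc.collect()                 # the cycle detector reclaims both objects
    return probe() is None       # True once `a` has really been freed


print(cycle_is_collected())
```

If the function returns True, the cycle was reclaimed by the collector even though reference counting alone could never have freed it.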
How do you know if an object has a memory leak?
Method 1: When you think an object should have been destroyed (i.e., nothing should refer to it any longer), you can get its reference count via sys.getrefcount(obj) and judge from the returned value whether there is a memory leak. Note that the returned count is generally one higher than you might expect, because passing obj as the argument creates a temporary reference. If the count shows that references other than your own still exist, the object obj cannot be reclaimed by the garbage collector at this moment.
Method 2: You can also use Python's gc module to see the details of objects that cannot be reclaimed.
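A minimal Python 3 sketch of reading reference counts follows. It assumes CPython and compares counts rather than relying on absolute values, since the absolute number reported by sys.getrefcount() includes temporary references and can differ between interpreter versions. The function name refcount_delta is hypothetical.

```python
import sys


def refcount_delta():
    obj = object()
    before = sys.getrefcount(obj)    # baseline, temporary references included
    holder = [obj]                   # one additional real reference
    after = sys.getrefcount(obj)
    del holder                       # drop that reference again
    restored = sys.getrefcount(obj)
    # (change caused by the extra reference, change after dropping it)
    return after - before, restored - before


print(refcount_delta())
```

Working with differences like this makes it easier to see which statement added a reference that was never released.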
First, look at a normal piece of test code:
#--------------- code begin --------------
# -*- coding: utf-8 -*-
# Note: this listing is Python 2 code (print statements, gc.DEBUG_INSTANCES).
import gc
import sys

class CGcLeak(object):
    def __init__(self):
        self._text = '#'*10

    def __del__(self):
        pass

def make_circle_ref():
    _gcleak = CGcLeak()
#   _gcleak._self = _gcleak # test_code_1
    print '_gcleak ref count0:%d' % sys.getrefcount(_gcleak)
    del _gcleak
    try:
        print '_gcleak ref count1:%d' % sys.getrefcount(_gcleak)
    except UnboundLocalError:
        print '_gcleak is invalid!'

def test_gcleak():
    # Enable automatic garbage collection.
    gc.enable()
    # Set the garbage collection debugging flags.
    gc.set_debug(gc.DEBUG_COLLECTABLE | gc.DEBUG_UNCOLLECTABLE | \
        gc.DEBUG_INSTANCES | gc.DEBUG_OBJECTS)

    print 'begin leak test...'
    make_circle_ref()

    print 'begin collect...'
    _unreachable = gc.collect()
    print 'unreachable object num:%d' % _unreachable
    print 'garbage object num:%d' % len(gc.garbage)

if __name__ == '__main__':
    test_gcleak()
#--------------- code end ----------------
In test_gcleak(), after the garbage collector debugging flags are set, gc.collect() runs a collection, and the code finally prints the number of unreachable garbage objects found by that collection and the number of garbage objects in the entire interpreter.
gc.garbage is, in the words of the documentation, "a list of objects which the collector found to be unreachable but could not be freed (uncollectable objects)".
Typically, these are objects in a reference cycle that define __del__(). Because Python doesn't know a safe order in which to call the __del__() functions of the objects in the cycle, the objects stay alive in the cycle forever, causing a memory leak. If you know a safe order, break the reference cycle yourself and then execute del gc.garbage[:] to clear the list of garbage objects.
The output of the above code is (the string after # is a comment added by the author):
#-----------------------------------------
begin leak test...
# The reference count of the variable _gcleak is 2.
_gcleak ref count0:2
# _gcleak has been deleted and is no longer a valid variable.
_gcleak is invalid!
# Start garbage collection.
begin collect...
# The number of unreachable garbage objects found by this garbage collection is 0.
unreachable object num:0
# The number of garbage objects in the entire interpreter is 0.
garbage object num:0
#-----------------------------------------
This shows that the reference count of the _gcleak object is correct and that no object has leaked.
If you uncomment the test_code_1 statement in make_circle_ref():
_gcleak._self = _gcleak
That is, let _gcleak form a circular reference to itself. Run the above code again and the output becomes:
#-----------------------------------------
begin leak test...
_gcleak ref count0:3
_gcleak is invalid!
begin collect...
# Found an uncollectable garbage object: address 012AA090, type CGcLeak.
gc: uncollectable <CGcLeak 012AA090>
gc: uncollectable <dict 012AC1E0>
unreachable object num:2
#!!! The number of garbage objects that cannot be reclaimed is 1, resulting in a memory leak!
garbage object num:1
#-----------------------------------------
It can be seen that the <CGcLeak 012AA090> object has leaked. The extra dict garbage is the attribute dictionary of the leaked _gcleak object; printing it gives:
{'_self': <__main__.CGcLeak object at 0x012AA090>, '_text': '##########'}
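Inspecting gc.garbage, breaking the cycles by hand and then emptying the list can be sketched as follows. One caveat: on Python 3.4 and later, cycles whose objects define __del__() are collected normally and gc.garbage usually stays empty, so the Python 2 behaviour shown above cannot be reproduced directly. This Python 3 sketch therefore uses gc.DEBUG_SAVEALL as a stand-in to make the collector park unreachable objects in gc.garbage. Ring and inspect_and_clear_garbage are hypothetical names.

```python
import gc


class Ring(object):
    """Hypothetical class whose instances refer to themselves."""

    def __init__(self):
        self.me = self                  # self-referencing cycle


def inspect_and_clear_garbage():
    gc.collect()                        # flush unrelated garbage first
    del gc.garbage[:]
    old_flags = gc.get_debug()
    gc.set_debug(gc.DEBUG_SAVEALL)      # save unreachable objects in gc.garbage
    try:
        Ring()                          # unreachable as soon as it is created
        gc.collect()
        rings = [o for o in gc.garbage if isinstance(o, Ring)]
        for ring in rings:
            ring.me = None              # break the cycle by hand
        found = len(rings)
        del gc.garbage[:]               # drop the references held by the list
    finally:
        gc.set_debug(old_flags)         # restore the previous debug flags
    return found, len(gc.garbage)


print(inspect_and_clear_garbage())
```

Emptying the list matters because objects listed in gc.garbage are kept alive by the list itself, even after their cycles have been broken.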
Besides an object that refers to itself, circular references between multiple objects can also lead to memory leaks. A simple example follows:
#--------------- code begin --------------
class CGcLeakA(object):
    def __init__(self):
        self._text = '#'*10

    def __del__(self):
        pass

class CGcLeakB(object):
    def __init__(self):
        self._text = '*'*10

    def __del__(self):
        pass

def make_circle_ref():
    _a = CGcLeakA()
    _b = CGcLeakB()
    _a._b = _b  # test_code_2
    _b._a = _a  # test_code_3
    print 'ref count0:a=%d b=%d' % \
        (sys.getrefcount(_a), sys.getrefcount(_b))
#   _b._a = None    # test_code_4
    del _a
    del _b
    try:
        print 'ref count1:a=%d' % sys.getrefcount(_a)
    except UnboundLocalError:
        print '_a is invalid!'
    try:
        print 'ref count2:b=%d' % sys.getrefcount(_b)
    except UnboundLocalError:
        print '_b is invalid!'
#--------------- code end ----------------
The output after this test is:
#-----------------------------------------
begin leak test...
ref count0:a=3 b=3
_a is invalid!
_b is invalid!
begin collect...
gc: uncollectable <CGcLeakA 012AA110>
gc: uncollectable <CGcLeakB 012AA0B0>
gc: uncollectable <dict 012AC1E0>
gc: uncollectable <dict 012AC0C0>
unreachable object num:4
garbage object num:2
#-----------------------------------------
It can be seen that both the _a and _b objects have leaked. Because they refer to each other in a cycle, the garbage collector doesn't know how to reclaim them, i.e., it doesn't know which object's __del__() function to call first.
Memory leaks can be avoided by using any one of the following methods to break the circular reference:
1. Comment out the test_code_2 statement in make_circle_ref();
2. Comment out the test_code_3 statement in make_circle_ref();
3. Uncomment the test_code_4 statement in make_circle_ref().
The corresponding output becomes:
#-----------------------------------------
begin leak test...
ref count0:a=2 b=3 # Note: the values on this line depend on which method is used.
_a is invalid!
_b is invalid!
begin collect...
unreachable object num:0
garbage object num:0
#-----------------------------------------
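The third method, clearing the back-reference before the names are dropped, can be sketched in Python 3 as follows. The sketch assumes CPython, where an object is finalized as soon as its reference count reaches zero. LeakA, LeakB, finalized and make_and_break_cycle are hypothetical names, and the __del__() methods only record that they ran.

```python
finalized = []  # records which __del__() methods have run


class LeakA(object):
    def __del__(self):
        finalized.append('A')


class LeakB(object):
    def __del__(self):
        finalized.append('B')


def make_and_break_cycle():
    del finalized[:]
    a, b = LeakA(), LeakB()
    a._b = b        # a -> b
    b._a = a        # b -> a, which closes the cycle
    b._a = None     # break the cycle (the role of test_code_4)
    del a, b        # counts reach zero, no cycle collector needed
    return sorted(finalized)


print(make_and_break_cycle())
```

Because the cycle is gone before the names are deleted, both finalizers run through plain reference counting and nothing is left for the collector to sort out.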
Conclusion: Python's gc module is powerful; for example, setting gc.set_debug(gc.DEBUG_LEAK) lets you check for memory leaks caused by circular references. If you check for memory leaks during development and make sure there are none at release time, you can lengthen Python's garbage collection interval or even disable the garbage collection mechanism altogether, thus improving runtime efficiency.
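Lengthening the collection interval or switching the collector off could look like the sketch below (Python 3). The helper names run_with_relaxed_gc and run_without_gc and the threshold value 100000 are assumptions chosen for illustration, not recommendations. Both helpers restore the previous settings afterwards.

```python
import gc


def run_with_relaxed_gc(work, threshold0=100000):
    """Run work() with a larger generation-0 threshold, then restore it."""
    old = gc.get_threshold()
    gc.set_threshold(threshold0, old[1], old[2])  # collect less often
    try:
        return work()
    finally:
        gc.set_threshold(*old)


def run_without_gc(work):
    """Run work() with the cycle collector disabled, then restore its state."""
    was_enabled = gc.isenabled()
    gc.disable()            # reference counting still frees acyclic objects
    try:
        return work()
    finally:
        if was_enabled:
            gc.enable()


print(run_without_gc(gc.isenabled))
```

Disabling the collector is only safe when the code creates no reference cycles that need reclaiming, which is exactly what the leak checks during development are meant to establish.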