The DHT protocol crawler that I optimized over the past few days is indeed much faster now, but a new problem appeared: memory usage shot straight up. With ten crawlers running, it occupied 800 MB. At first I thought it was tracking too many nodes, so I found a few small things to tweak, but that did not help. Then I searched the Internet for Python memory-analysis tools and, after a little reading, found that Python has the meliae library, which is very convenient to use. Analyzing with it showed that too many nodes was not the reason at all 0 0: the culprit was the dictionary that stores the t_id of each query I send, used to match the returned messages against their queries, and it had grown far too large.
From the analysis results it is very easy to see the count and total size of each kind of object, which makes the problem easy to locate. At first I thought it was because I send a lot of query messages whose peers never reply, so their entries in the dictionary are never released; I added an expiration time and deleted expired entries, as sketched below. That shrank it, but not significantly, maybe a few dozen MB, less than 100 MB. Later I reduced how often a random hash lookup is started: before, it ran once a minute, and I changed it to run only once, at startup. Presumably each lookup asks some nodes for the hash, then asks the nodes they return, and so on, so the pending entries pile up more and more; what I still do not understand is how one minute of running produces 600,000 entries. That means there were that many objects in memory that had not been released at that point. After reaching this footprint the memory basically stopped changing, apart from a very small, slow increase; since I had other programs open, I am not sure whether that increase was caused by objects growing in those programs. I will test it with dumps taken in stages.
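The crawler code itself is not shown here, but a minimal sketch of the expiry idea looks like the following; the names pending, remember_query, and purge_expired, and the 60-second TTL, are my assumptions, not the actual crawler code:

import time

TTL = 60          # assumed timeout in seconds
pending = {}      # t_id -> (query info, send time)

def remember_query(t_id, info):
    # Record an outgoing query so its response can be matched later
    pending[t_id] = (info, time.time())

def match_response(t_id):
    # Pop on match so answered queries are released immediately
    entry = pending.pop(t_id, None)
    return entry[0] if entry else None

def purge_expired():
    # Drop entries for queries whose peers never replied
    now = time.time()
    for t_id in [k for k, (_, ts) in pending.items() if now - ts > TTL]:
        del pending[t_id]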
Installing it is simply pip install meliae. I see the project has not been updated for a long time, and I do not know whether there is a good alternative, but it works fine.
Dump memory to file
import time
from meliae import scanner

# Dump a snapshot of every live object to a file
# (the timestamped filename here is illustrative)
scanner.dump_all_objects('/tmp/dump%d.json' % time.time())
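For the staged dumps mentioned above, one simple approach is to snapshot on a timer and compare successive summaries. This is only a sketch; the interval, the path, and the use of a background thread are my assumptions:

import threading
import time
from meliae import scanner

def dump_periodically(interval=300, path='/tmp/dump%d.json'):
    # Take a snapshot every `interval` seconds; comparing successive
    # dumps shows which object counts keep growing.
    while True:
        scanner.dump_all_objects(path % time.time())
        time.sleep(interval)

threading.Thread(target=dump_periodically, daemon=True).start()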
Analyze the dump file:
from meliae import loader

# Load the dump file (the filename under /opt/log/ is illustrative;
# the original path was truncated)
om = loader.load('/opt/log/dump.json')
# Compute the parent (referrer) relationships between objects
om.compute_parents()
# Fold each instance's __dict__ attribute into the instance itself
om.collapse_instance_dicts()
# Print a summary of memory usage by type
om.summarize()
The field meanings are as follows:
Index : row index
Count : total number of objects of this type
%(Count) : this type's object count as a percentage of all objects
Size : total number of bytes used by objects of this type
%(Size) : this type's total bytes as a percentage of the bytes used by all objects
Cum : cumulative %(Size) from the first row down to this row
Max : size in bytes of the largest single object of this type
Kind : the object type
Analyze an object to find its references
# Get all the POP3ClientProtocol objects
p = om.get_all('POP3ClientProtocol')
# Inspect the first object
p[0]
# List the objects this one references (its children)
p[0].c
# List the objects that reference this one (its parents)
p[0].p
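In my case the culprit was a dict rather than a protocol object, and the same approach works there too. A sketch of finding the largest dict in the dump and seeing what holds it (the variable name is mine):

# Find the single largest dict and see who keeps it alive
biggest = max(om.get_all('dict'), key=lambda o: o.size)
biggest.size   # bytes used by that dict
biggest.p      # the object(s) referencing it

Following the parent chain from there is what pointed me at the t_id dictionary.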