This article gives an example of a Python implementation that batch-analyzes XML annotation files and counts the number of objects in each category. It is shared here for your reference:
I needed a script to count the number of targets (labelled objects) in my data, and I also wanted to practice multiprocessing, so I wrote one for my own use. The code is as follows:
# -*- coding: utf-8 -*-
# @Time    : 2019/06/10 18:56
# @Author  : TuanZhangSama
import os
import xml.etree.ElementTree as ET
from multiprocessing import Pool, freeze_support, cpu_count
import imghdr  # note: deprecated since Python 3.11 and removed in 3.13
import logging


def get_all_xml_path(xml_dir: str, filter=['.xml']):
    # Collect the paths of all xml files under xml_dir (recursively)
    result = []
    # maindir is the directory currently being searched, subdir lists its
    # subfolders, and file_name_list lists the files in it
    for maindir, subdir, file_name_list in os.walk(xml_dir):
        for filename in file_name_list:
            ext = os.path.splitext(filename)[1]  # file extension
            if ext in filter:
                result.append(os.path.join(maindir, filename))
    return result


def analysis_xml(xml_path: str):
    # Count the objects of each category in a single xml file
    tree = ET.parse(xml_path)
    root = tree.getroot()
    result_dict = {}
    for obj in root.findall('object'):
        obj_name = obj.find('name').text
        obj_num = result_dict.get(obj_name, 0) + 1
        result_dict[obj_name] = obj_num
    jpg_path = xml_path.replace('.xml', '.jpg')
    # Check existence and completeness first: imghdr.what() raises if the file is missing
    if is_valid_jpg(jpg_path) and imghdr.what(jpg_path) != 'jpeg':
        print(jpg_path, 'is wrong')
        # logging.info(jpg_path)
    return result_dict


def analysis_xmls_batch(xmls_path_list: list):
    # Analyze a sublist of xml files; this is the unit of work for one process
    result_list = []
    for i in xmls_path_list:
        result_list.append(analysis_xml(i))
    return result_list


def collect_result(result_list: list):
    # Merge the per-file counts into one overall dictionary
    all_result_dict = {}
    for result_dict in result_list:
        for key, values in result_dict.items():
            obj_num = all_result_dict.get(key, 0) + values
            all_result_dict[key] = obj_num
    return all_result_dict


def main(xml_dir: str, result_save_path: str = None):
    r'''Count the number of all samples according to the xml files.
    Incomplete images, and samples that have an xml but no image, are deleted directly.
    By default all CPU cores are used.

    Parameters
    ----------
    xml_dir : str
        The folder where the xml files are located. The search is recursive,
        so the xml files only need to be somewhere below this directory.
        Each image and its xml should be in the same directory.
    result_save_path : str
        The path of the log file in which the analysis result is saved.
        Default None: no log file.
    '''
    if result_save_path is not None:
        assert isinstance(result_save_path, str), '{} is illegal path'.format(result_save_path)
        # result_save_path is used as the log file name
        logging.basicConfig(filename=result_save_path, filemode='w', level=logging.INFO)
    freeze_support()  # needed on Windows
    xmls_path = get_all_xml_path(xml_dir)
    worker_num = cpu_count()
    print('your CPU num is', cpu_count())
    length = float(len(xmls_path)) / float(worker_num)
    # Calculate indices that divide the list of input files as evenly as possible
    indices = [int(round(i * length)) for i in range(worker_num + 1)]
    # Generate the sublist of files to be processed by each process
    sublists = [xmls_path[indices[i]:indices[i + 1]] for i in range(worker_num)]
    pool = Pool(processes=worker_num)
    all_process_result_list = []
    for i in range(worker_num):
        all_process_result_list.append(pool.apply_async(analysis_xmls_batch, args=(sublists[i],)))
    pool.close()
    pool.join()
    print('analysis done!')
    _temp_list = []
    for i in all_process_result_list:
        _temp_list = _temp_list + i.get()
    result = collect_result(_temp_list)
    logging.info(result)
    print(result)


def is_valid_jpg(jpg_file):
    """Determine whether a JPG file is complete (e.g. fully downloaded)."""
    xml_file = jpg_file.replace('.jpg', '.xml')
    if not os.path.exists(jpg_file):
        print(jpg_file, 'does not exist')
        os.remove(xml_file)
        return False  # nothing to open, so stop here
    with open(jpg_file, 'rb') as fr:
        fr.seek(-2, 2)  # jump to the last two bytes of the file
        complete = fr.read() == b'\xff\xd9'  # JPEG end-of-image marker
    if complete:
        return True
    # Remove the truncated image and its annotation after the file is closed
    os.remove(jpg_file)
    os.remove(xml_file)
    print(jpg_file)
    logging.error('%s is imperfect img', jpg_file)
    return False


if __name__ == '__main__':
    test_dir = '/home/chiebotgpuhq/Share/winshare/origin'
    # save_path is used as the log file name, so it should point to a file
    save_path = '/home/chiebotgpuhq/MyCode/python/pytorch/mmdetection-master/'
    main(test_dir, save_path)
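To make the two central steps easier to follow in isolation, here is a minimal sketch of how a file list can be cut into near-equal slices and how per-file counts can be merged. It assumes Pascal VOC style annotations (an `object` element containing a `name`), which is what the script above appears to expect. The helper names `split_evenly`, `count_objects` and `merge_counts`, as well as the two sample XML strings, are made up for illustration, and `collections.Counter` is swapped in for the hand-written dictionary merge.

```python
import xml.etree.ElementTree as ET
from collections import Counter


def split_evenly(items, parts):
    """Cut items into `parts` contiguous, near-equal slices (hypothetical helper)."""
    step = len(items) / parts
    # Same index arithmetic as in main(): rounded multiples of the ideal slice length
    bounds = [int(round(i * step)) for i in range(parts + 1)]
    return [items[bounds[i]:bounds[i + 1]] for i in range(parts)]


def count_objects(xml_text):
    """Count the <object><name> values in one annotation document."""
    root = ET.fromstring(xml_text)
    return Counter(obj.find('name').text for obj in root.findall('object'))


def merge_counts(per_file_counts):
    """Add up the per-file counters into one overall tally."""
    total = Counter()
    for counts in per_file_counts:
        total.update(counts)  # update() adds counts instead of replacing them
    return total


# Made-up sample annotations, assumed to follow the Pascal VOC layout
SAMPLE_A = ("<annotation><object><name>cat</name></object>"
            "<object><name>dog</name></object></annotation>")
SAMPLE_B = "<annotation><object><name>cat</name></object></annotation>"

if __name__ == '__main__':
    print(split_evenly(list(range(10)), 4))
    print(merge_counts([count_objects(SAMPLE_A), count_objects(SAMPLE_B)]))
```

In a multiprocessing setting, each worker could return one such `Counter` for its slice, and the parent process would only need to merge as many counters as there are workers.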
PS: Here are a few online tools for working with XML, for your reference:
Online XML/JSON conversion tool:
http://tools./code/xmljson
Online XML formatting / online XML compression:
http://tools./code/xmlformat
Online XML compression/formatting tool:
http://tools./code/xml_format_compress
Online XML code formatting and beautification tool:
http://tools./code/xmlcodeformat
Readers interested in more Python-related content can check out this site's topics: 《Summary of Python XML data manipulation skills》, 《Python Data Structures and Algorithms Tutorial》, 《Python Socket Programming Tips Summary》, 《Summary of Python function usage tips》, 《Summary of Python string manipulation techniques》, 《Python introductory and advanced classic tutorials》 and 《Summary of Python file and directory manipulation techniques》.
I hope this article helps you with your Python programming.