SoFunction
Updated on 2024-12-15

Example of a python implementation that analyzes the number of categories in an xml tag in a batch.

In this paper, an example of the python implementation of the batch analysis of xml tags in the number of individual categories function. Shared for your reference, as follows:

Article Catalog

I need a script to analyze the number of targets and practice multiprocessing, for my own use, in code:

# -*- coding: utf-8 -*-
# @Time  : 2019/06/10 18:56
# @Author : TuanZhangSama
import os
import  as ET
from multiprocessing import Pool,freeze_support,cpu_count
import imghdr
import logging
def get_all_xml_path(xml_dir:str,filter=['.xml']):
  # Iterate over all xml in the folder
  result=[]
  #maindir is the currently searched directory subdir is the name of the folder in the current directory file is the name of the file in the directory
  for maindir,subdir,file_name_list in (xml_dir):
    for filename in file_name_list:
      ext=(filename)[1]# Returns the extension
      if ext in filter:
        ((maindir,filename))
  return result
def analysis_xml(xml_path:str):
  tree=(xml_path)
  root=()
  result_dict={}
  for obj in ('object'):
    obj_name = ('name').text
    obj_num=result_dict.get(obj_name,0)+1
    result_dict[obj_name]=obj_num
  if (xml_path.replace('.xml','.jpg')) != 'jpeg':
    print(xml_path.replace('.xml','.jpg'),'is worng')
    # (xml_path.replace('.xml','.jpg'))
  if is_valid_jpg(xml_path.replace('.xml','.jpg')):
    pass
  return result_dict
def analysis_xmls_batch(xmls_path_list:list):
  result_list=[]
  for i in xmls_path_list:
    result_list.append(analysis_xml(i))
  return result_list
def collect_result(result_list:list):
  all_result_dict={}
  for result_dict in result_list:
    for key,values in result_dict.items():
      obj_num=all_result_dict.get(key,0)+values
      all_result_dict[key]=obj_num
  return all_result_dict
def main(xml_dir:str,result_save_path:str =None):
  r'''Count the number of all samples according to the xml file. Incomplete images and samples with xml but no images are deleted directly. Default run all cpu cores
  Parameters
  ----------
  xml_dir : str
    The folder where the xml is located. It is recursive, so just make sure the xml is in a subdirectory of this directory. The corresponding image and its xml should be in the same directory.
  result_save_path : str
    The path to save the log of the result of the analysis. Default None No logs
  '''
  if result_save_path is not None:
    assert isinstance(result_save_path,str),'{} is illegal path'.format(result_save_path)
  else:
    (filename=result_save_path,filemode='w',level=)
  freeze_support()#windows
  xmls_path=get_all_xml_path(xml_dir)
  worker_num=cpu_count()
  print('your CPU num is',cpu_count())
  length=float(len(xmls_path))/float(worker_num)
  # Calculate subscripts to divide the list of input files as evenly as possible
  indices=[int(round(i*length)) for i in range(worker_num+1)]
  # Generate a list of subfiles to be processed by each process
  sublists=[xmls_path[indices[i]:indices[i+1]] for i in range(worker_num)]
  pool=Pool(processes=worker_num)
  all_process_result_list=[]
  for i in range(worker_num):
    all_process_result_list.append(pool.apply_async(analysis_xmls_batch,args=(sublists[i],)))
  ()
  ()
  print('analysis done!')
  _temp_list=[]
  for i in all_process_result_list:
    _temp_list=_temp_list+()
  result=collect_result(_temp_list)
  (result)
  print(result)
def is_valid_jpg(jpg_file):
  """Determining if a JPG file download is complete """
  if not (jpg_file):
    print(jpg_file,'is not existes')
    (jpg_file.replace('.jpg','.xml'))
  with open(jpg_file, 'rb') as fr:
    (-2, 2)
    if () == b'\xff\xd9':
      return True
    else:
      (jpg_file)
      (jpg_file.replace('.jpg','.xml'))
      print(jpg_file)
      (jpg_file,'is imperfect img')
      return False
if __name__=='__main__':
  test_dir='/home/chiebotgpuhq/Share/winshare/origin'
  save_path='/home/chiebotgpuhq/MyCode/python/pytorch/mmdetection-master/'
  main(test_dir,save_path)

PS: Here again for you to provide a few online tools on xml operation for your reference:

on-lineXML/JSON interconversion tool:
http://tools./code/xmljson

Online FormattingXML/Online CompressionXML
http://tools./code/xmlformat

XMLOnline compression/formatting tool:
http://tools./code/xml_format_compress

XMLCode online formatting beautification tool:
http://tools./code/xmlcodeformat

Readers interested in more Python related content can check out this site's topic: thePython manipulate xml data skills summary》、《Python Data Structures and Algorithms Tutorial》、《Python Socket Programming Tips Summary》、《Summary of Python function usage tips》、《Summary of Python string manipulation techniques》、《Python introductory and advanced classic tutorialsand theSummary of Python file and directory manipulation techniques

I hope that what I have said in this article will help you in Python programming.