Updated on 2024-11-21

Python 3 filecmp module: how file comparison works

Comparing files

The filecmp module provides a number of functions and a class to compare files and directories on a file system.

1.1 Example data

Use the following code to create a set of test files.

import os

def mkfile(filename, body=None):
  # Write the body (or the file's own name) into the file.
  with open(filename, 'w') as f:
    f.write(body or filename)
  return

def make_example_dir(top):
  if not os.path.exists(top):
    os.mkdir(top)
  curdir = os.getcwd()
  os.chdir(top)
  os.mkdir('dir1')
  os.mkdir('dir2')
  mkfile('dir1/file_only_in_dir1')
  mkfile('dir2/file_only_in_dir2')
  os.mkdir('dir1/dir_only_in_dir1')
  os.mkdir('dir2/dir_only_in_dir2')
  os.mkdir('dir1/common_dir')
  os.mkdir('dir2/common_dir')
  mkfile('dir1/common_file', 'this file is the same')
  # Hard-link the file so both names refer to the same data.
  os.link('dir1/common_file', 'dir2/common_file')
  mkfile('dir1/contents_differ')
  mkfile('dir2/contents_differ')
  # Update the access and modification times so most of the stat
  # results will match.
  st = os.stat('dir1/contents_differ')
  os.utime('dir2/contents_differ', (st.st_atime, st.st_mtime))
  mkfile('dir1/file_in_dir1', 'This is a file in dir1')
  os.mkdir('dir2/file_in_dir1')
  os.chdir(curdir)
  return

if __name__ == '__main__':
  os.chdir(os.path.dirname(__file__) or os.getcwd())
  make_example_dir('example')
  make_example_dir('example/dir1/common_dir')
  make_example_dir('example/dir2/common_dir')

Running this script generates a tree of files under the example directory.

The same directory structure is repeated under each common_dir directory to give interesting recursive comparison options.

1.2 Comparing files

cmp() is used to compare two files on a file system.

import filecmp
print('common_file  :', end=' ')
print(filecmp.cmp('example/dir1/common_file',
                  'example/dir2/common_file',
                  shallow=True),
      end=' ')
print(filecmp.cmp('example/dir1/common_file',
                  'example/dir2/common_file',
                  shallow=False))
print('contents_differ:', end=' ')
print(filecmp.cmp('example/dir1/contents_differ',
                  'example/dir2/contents_differ',
                  shallow=True),
      end=' ')
print(filecmp.cmp('example/dir1/contents_differ',
                  'example/dir2/contents_differ',
                  shallow=False))
print('identical   :', end=' ')
print(filecmp.cmp('example/dir1/file_only_in_dir1',
                  'example/dir1/file_only_in_dir1',
                  shallow=True),
      end=' ')
print(filecmp.cmp('example/dir1/file_only_in_dir1',
                  'example/dir1/file_only_in_dir1',
                  shallow=False))

The shallow argument tells cmp() whether to look at the contents of the file in addition to its metadata. By default, a shallow comparison is done using the information obtained by os.stat() (file type, size, and modification time). If the stat results are the same, the files are considered identical. Thus, files of the same size that were created at the same time are reported as identical even if their contents differ. When shallow is False, the contents of the files are always compared.
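The difference between the two modes can be reproduced without the example tree. The following is a minimal sketch, assuming two temporary files of the same size whose timestamps are copied with os.utime(); the function name shallow_vs_deep and the file names are made up for illustration.

```python
import filecmp
import os
import tempfile


def shallow_vs_deep():
    """Return (shallow, deep) results for two same-size files with different bytes."""
    with tempfile.TemporaryDirectory() as tmp:
        a = os.path.join(tmp, 'a.txt')
        b = os.path.join(tmp, 'b.txt')
        with open(a, 'w') as f:
            f.write('aaaa')
        with open(b, 'w') as f:
            f.write('bbbb')
        # Copy the timestamps so the os.stat() signatures
        # (file type, size, modification time) are equal.
        st = os.stat(a)
        os.utime(b, ns=(st.st_atime_ns, st.st_mtime_ns))
        # filecmp caches results, so start from a clean cache.
        filecmp.clear_cache()
        shallow = filecmp.cmp(a, b, shallow=True)
        deep = filecmp.cmp(a, b, shallow=False)
    return shallow, deep


if __name__ == '__main__':
    print(shallow_vs_deep())
```

The shallow call should report the files as equal because only the stat signature is consulted, while the deep call reads the bytes and reports a difference.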

To compare a set of files in two directories without recursing, use cmpfiles(). The arguments are the names of the two directories and a list of files to be checked in both locations. The list of common files passed in should contain only filenames (directories always result in a mismatch), and the files must be present in both locations. The next example shows a simple way to build the common list. Like cmp(), this comparison takes a shallow flag.

import filecmp
import os
# Determine the items that exist in both directories
d1_contents = set(os.listdir('example/dir1'))
d2_contents = set(os.listdir('example/dir2'))
common = list(d1_contents & d2_contents)
common_files = [
  f
  for f in common
  if os.path.isfile(os.path.join('example/dir1', f))
]
print('Common files:', common_files)
# Compare the directories
match, mismatch, errors = filecmp.cmpfiles(
  'example/dir1',
  'example/dir2',
  common_files,
)
print('Match    :', match)
print('Mismatch  :', mismatch)
print('Errors   :', errors)

cmpfiles() returns three lists of filenames: files that match, files that do not match, and files that could not be compared (because of permission problems or for any other reason).
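The three result lists can be seen in a small self-contained run. This is a sketch under assumed data: the directory and file names are invented, and missing.txt exists only on one side so that it cannot be compared.

```python
import filecmp
import os
import tempfile


def write(path, text):
    """Create a small text file."""
    with open(path, 'w') as f:
        f.write(text)


def classify():
    """Return (match, mismatch, errors) for three hand-made cases."""
    with tempfile.TemporaryDirectory() as tmp:
        left = os.path.join(tmp, 'left')
        right = os.path.join(tmp, 'right')
        os.mkdir(left)
        os.mkdir(right)
        # Same content on both sides.
        write(os.path.join(left, 'same.txt'), 'hello')
        write(os.path.join(right, 'same.txt'), 'hello')
        # Different content (and different size).
        write(os.path.join(left, 'diff.txt'), 'short')
        write(os.path.join(right, 'diff.txt'), 'much longer text')
        # Present only on the left, so it cannot be compared.
        write(os.path.join(left, 'missing.txt'), 'only here')
        names = ['same.txt', 'diff.txt', 'missing.txt']
        return filecmp.cmpfiles(left, right, names, shallow=False)


if __name__ == '__main__':
    match, mismatch, errors = classify()
    print('Match    :', match)
    print('Mismatch :', mismatch)
    print('Errors   :', errors)
```

A name that is absent from one directory lands in the errors list rather than raising an exception, which is why the input list should normally be restricted to files present in both locations.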

1.3 Comparing directories

The functions described so far are suitable for relatively simple comparisons. For recursive comparison of large directory trees, or for more complete analysis, the dircmp class is more useful. In the simplest use case, report() prints a report comparing two directories.

import filecmp
dc = filecmp.dircmp('example/dir1', 'example/dir2')
dc.report()

The output is a plain-text report showing the results for just the contents of the directories given, without recursing into subdirectories. Here, the contents_differ files are reported as identical because their contents are not compared: dircmp relies on the shallow os.stat() comparison. Before Python 3.13, dircmp had no way to compare file contents as cmp() does; Python 3.13 added a keyword-only shallow argument to the constructor for that purpose.
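A content check can still be layered on top of dircmp by handing its common_files list to cmpfiles() with shallow=False. The sketch below is one way to do that; the helper names deep_mismatches and demo, and the temporary files, are illustrative assumptions rather than part of filecmp.

```python
import filecmp
import os
import tempfile


def deep_mismatches(left, right):
    """Return common files whose bytes differ or that cannot be read (non-recursive)."""
    dc = filecmp.dircmp(left, right)
    _, mismatch, errors = filecmp.cmpfiles(
        left, right, dc.common_files, shallow=False)
    return sorted(mismatch + errors)


def demo():
    """Return (names dircmp calls the same, names that really differ)."""
    with tempfile.TemporaryDirectory() as tmp:
        left = os.path.join(tmp, 'left')
        right = os.path.join(tmp, 'right')
        os.mkdir(left)
        os.mkdir(right)
        for directory, text in ((left, 'AAAA'), (right, 'BBBB')):
            with open(os.path.join(directory, 'sneaky.txt'), 'w') as f:
                f.write(text)
        # Give both files the same timestamps so the stat signatures match.
        st = os.stat(os.path.join(left, 'sneaky.txt'))
        os.utime(os.path.join(right, 'sneaky.txt'),
                 ns=(st.st_atime_ns, st.st_mtime_ns))
        filecmp.clear_cache()
        shallow_same = sorted(filecmp.dircmp(left, right).same_files)
        return shallow_same, deep_mismatches(left, right)


if __name__ == '__main__':
    print(demo())
```

With default settings sneaky.txt should appear in same_files, while the content-based pass flags it as different.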

For more detail, and to complete a recursive comparison, use report_full_closure().

import filecmp
dc = filecmp.dircmp('example/dir1', 'example/dir2')
dc.report_full_closure()

The output includes comparisons of all parallel subdirectories.

1.4 Using differences in a program

In addition to producing printed reports, dircmp calculates lists of files that can be used directly in programs. Each of the following attributes is calculated only when requested, so creating a dircmp instance does not incur overhead for unused data.

import filecmp
import pprint
dc = filecmp.dircmp('example/dir1', 'example/dir2')
print('Left:')
pprint.pprint(dc.left_list)
print('\nRight:')
pprint.pprint(dc.right_list)

The files and subdirectories contained in the compared directories are listed in left_list and right_list, respectively.

The inputs can be filtered by passing a list of names to ignore to the constructor. By default, names such as RCS, CVS, and tags are ignored.

import filecmp
import pprint
dc = filecmp.dircmp('example/dir1', 'example/dir2',
                    ignore=['common_file'])
print('Left:')
pprint.pprint(dc.left_list)
print('\nRight:')
pprint.pprint(dc.right_list)

Here, common_file is removed from the list of files to compare.
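One detail worth noting is that an explicit ignore list replaces the default list instead of extending it. The sketch below illustrates this with assumed file names (keep.txt, notes.bak) and a CVS directory; filecmp.DEFAULT_IGNORES is the module's list of default names.

```python
import filecmp
import os
import tempfile


def visible_names(ignore=None):
    """Return dircmp.left_list for a small tree, using the given ignore list."""
    with tempfile.TemporaryDirectory() as tmp:
        left = os.path.join(tmp, 'left')
        right = os.path.join(tmp, 'right')
        for directory in (left, right):
            os.mkdir(directory)
            os.mkdir(os.path.join(directory, 'CVS'))
            for name in ('keep.txt', 'notes.bak'):
                with open(os.path.join(directory, name), 'w') as f:
                    f.write(name)
        return filecmp.dircmp(left, right, ignore=ignore).left_list


if __name__ == '__main__':
    # Default ignore list: CVS is hidden.
    print(visible_names())
    # Custom list replaces the defaults: CVS shows up again.
    print(visible_names(ignore=['notes.bak']))
    # Extend the defaults to keep both behaviors.
    print(visible_names(ignore=filecmp.DEFAULT_IGNORES + ['notes.bak']))
```

Concatenating filecmp.DEFAULT_IGNORES with the extra names is a simple way to keep the standard exclusions while adding project-specific ones.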

File names common to both input directories are kept in common, and files unique to each directory are listed in left_only and right_only.

import filecmp
import pprint
dc = filecmp.dircmp('example/dir1', 'example/dir2')
print('Common:')
pprint.pprint(dc.common)
print('\nLeft:')
pprint.pprint(dc.left_only)
print('\nRight:')
pprint.pprint(dc.right_only)

The "left" directory is the first argument to dircmp(), and the "right" directory is the second.
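As a small illustration of using these lists in a program, the following sketch builds two temporary directories and returns the names found on only one side. The function name one_sided_names and all file names are assumptions made for the example.

```python
import filecmp
import os
import tempfile


def one_sided_names():
    """Return (left_only, right_only) for a small hand-made pair of directories."""
    with tempfile.TemporaryDirectory() as tmp:
        left = os.path.join(tmp, 'left')
        right = os.path.join(tmp, 'right')
        os.mkdir(left)
        os.mkdir(right)
        # Files on the left side.
        for name in ('a.txt', 'shared.txt'):
            with open(os.path.join(left, name), 'w') as f:
                f.write(name)
        # Files and a directory on the right side.
        for name in ('b.txt', 'shared.txt'):
            with open(os.path.join(right, name), 'w') as f:
                f.write(name)
        os.mkdir(os.path.join(right, 'z_dir'))
        dc = filecmp.dircmp(left, right)
        return sorted(dc.left_only), sorted(dc.right_only)


if __name__ == '__main__':
    left_only, right_only = one_sided_names()
    print('Only in left :', left_only)
    print('Only in right:', right_only)
```

Both files and directories appear in left_only and right_only, so a caller that wants to copy missing items needs to check the type of each name first.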

The common members can be further broken down into files, directories, and "funny" items (anything that has a different type in the two directories, or for which os.stat() reports an error).

import filecmp
import pprint
dc = filecmp.dircmp('example/dir1', 'example/dir2')
print('Common:')
pprint.pprint(dc.common)
print('\nDirectories:')
pprint.pprint(dc.common_dirs)
print('\nFiles:')
pprint.pprint(dc.common_files)
print('\nFunny:')
pprint.pprint(dc.common_funny)

In the example data, the item named file_in_dir1 is a file in one directory and a subdirectory in the other, so it shows up in the "funny" list.
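The same three-way split can be reproduced on a tiny tree. This is a sketch with invented names: shared_dir is a directory on both sides, shared.txt is a file on both sides, and mixed is a file on the left but a directory on the right.

```python
import filecmp
import os
import tempfile


def split_common():
    """Return (common_dirs, common_files, common_funny) for a small tree."""
    with tempfile.TemporaryDirectory() as tmp:
        left = os.path.join(tmp, 'left')
        right = os.path.join(tmp, 'right')
        for directory in (left, right):
            os.mkdir(directory)
            os.mkdir(os.path.join(directory, 'shared_dir'))
            with open(os.path.join(directory, 'shared.txt'), 'w') as f:
                f.write('same text')
        # 'mixed' has a different type on each side.
        with open(os.path.join(left, 'mixed'), 'w') as f:
            f.write('a file on the left')
        os.mkdir(os.path.join(right, 'mixed'))
        dc = filecmp.dircmp(left, right)
        return dc.common_dirs, dc.common_files, dc.common_funny


if __name__ == '__main__':
    dirs, files, funny = split_common()
    print('Directories:', dirs)
    print('Files      :', files)
    print('Funny      :', funny)
```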

The differences between files are broken down in a similar way.

import filecmp
dc = filecmp.dircmp('example/dir1', 'example/dir2')
print('Same   :', dc.same_files)
print('Different :', dc.diff_files)
print('Funny   :', dc.funny_files)

The file contents_differ is compared using only os.stat(), and its contents are not examined, so it is included in the same_files list whenever the stat results match.

As a final point, subdirectories are also saved so that recursive comparisons can be done easily.

import filecmp
dc = filecmp.dircmp('example/dir1', 'example/dir2')
print('Subdirectories:')
print(dc.subdirs)

The attribute subdirs is a dictionary that maps directory names to new dircmp objects.
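Because each value in subdirs is itself a dircmp object, a recursive walk takes only a few lines. The sketch below gathers differences from a whole tree into one dictionary; the helper names collect_differences and demo_tree, the '/'-joined relative paths, and the sample files are assumptions made for illustration.

```python
import filecmp
import os
import tempfile


def collect_differences(dc, prefix=''):
    """Recursively gather one-sided names and differing files from a dircmp."""
    found = {
        'left_only': [prefix + n for n in dc.left_only],
        'right_only': [prefix + n for n in dc.right_only],
        'diff_files': [prefix + n for n in dc.diff_files],
    }
    # subdirs maps each common directory name to a new dircmp object.
    for name, sub in dc.subdirs.items():
        child = collect_differences(sub, prefix + name + '/')
        for key in found:
            found[key].extend(child[key])
    return found


def demo_tree():
    """Build a two-level tree with one difference of each kind and walk it."""
    with tempfile.TemporaryDirectory() as tmp:
        left = os.path.join(tmp, 'left')
        right = os.path.join(tmp, 'right')
        for directory in (left, right):
            os.makedirs(os.path.join(directory, 'sub'))
        files = {
            (left, 'sub', 'a.txt'): 'one',
            (right, 'sub', 'a.txt'): 'a different length',
            (left, 'sub', 'only_left.txt'): 'x',
            (right, 'top_only.txt'): 'y',
        }
        for parts, text in files.items():
            with open(os.path.join(*parts), 'w') as f:
                f.write(text)
        return collect_differences(filecmp.dircmp(left, right))


if __name__ == '__main__':
    for key, names in demo_tree().items():
        print(key, sorted(names))
```

The walk inherits dircmp's comparison rules, so files that differ only in content while sharing the same size and modification time would not be listed under diff_files with the default shallow comparison.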
