Versions:
Platform: Ubuntu 14 / i5 / 4 GB RAM
Python version: 2.7
OpenCV version: 2.13.4
Dependencies:
If your system does not have Python yet, install it together with the required packages:
sudo apt-get install python
sudo apt-get install python-dev
sudo apt-get install python-pip
sudo pip install numpy matplotlib
sudo apt-get install libcv-dev
sudo apt-get install python-opencv
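To confirm the bindings are in place, it can help to check that cv2 and numpy import cleanly. This is my own quick sanity check, not part of the original setup:

# Run in a Python 2.7 shell; both imports should succeed and print version strings.
import cv2
import numpy
print "OpenCV:", cv2.__version__
print "NumPy:", numpy.__version__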
Image de-duplication using a perceptual hashing algorithm
Principle: iterate over all the files and compare each image against the others, so the more images there are the slower it gets, but it saves a lot of manual work.
How the perceptual hash works:
1. Scale every image to be compared down to an 8 * 8 grayscale image.
2. Compare each pixel of the image against the image's mean value to obtain a 64-bit fingerprint.
3. Compute the Hamming distance between two fingerprints.
4. If at most 5 bits differ (the threshold used in the code below), the two images are treated as identical, or at least very similar, as in the sketch after this list.
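Put together, these steps amount to only a few lines. The following is a minimal sketch of the idea; the helper names ahash and hamming are mine and do not appear in the full script below:

import cv2

def ahash(path):
    # steps 1-2: shrink to 8x8, convert to grayscale, threshold each pixel against the mean
    img = cv2.imread(path)
    gray = cv2.cvtColor(cv2.resize(img, (8, 8)), cv2.COLOR_BGR2GRAY)
    mean = cv2.mean(gray)[0]
    return [1 if p >= mean else 0 for p in gray.flatten()]

def hamming(a, b):
    # step 3: count the positions where the two fingerprints differ
    return sum(1 for x, y in zip(a, b) if x != y)

# step 4: treat two images as duplicates when at most 5 bits differ
# print hamming(ahash("a.jpg"), ahash("b.jpg")) <= 5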
#!/usr/bin/python
# -*- coding: UTF-8 -*-
import cv2
import numpy as np
import os, sys, types
def cmpandremove2(path):
    dirs = os.listdir(path)
    dirs.sort()
    if len(dirs) <= 0:
        return
    dict = {}
    # first pass: compute the fingerprint of every image once and cache it
    for i in dirs:
        prepath = path + "/" + i
        preimg = cv2.imread(prepath)
        if type(preimg) is types.NoneType:
            # skip files OpenCV cannot read
            continue
        preresize = cv2.resize(preimg, (8, 8))
        pregray = cv2.cvtColor(preresize, cv2.COLOR_BGR2GRAY)
        premean = cv2.mean(pregray)[0]
        prearr = pregray.flatten()
        for j in range(0, len(prearr)):
            if prearr[j] >= premean:
                prearr[j] = 1
            else:
                prearr[j] = 0
        print "get", prepath
        dict[i] = prearr
    dictkeys = dict.keys()
    dictkeys.sort()
    index = 0
    # second pass: compare cached fingerprints and delete near-identical images
    while True:
        if index >= len(dictkeys):
            break
        curkey = dictkeys[index]
        dellist = []
        print curkey
        index2 = index
        while True:
            if index2 >= len(dictkeys):
                break
            j = dictkeys[index2]
            if curkey == j:
                index2 = index2 + 1
                continue
            arr1 = dict[curkey]
            arr2 = dict[j]
            diff = 0
            for k in range(0, len(arr2)):
                if arr1[k] != arr2[k]:
                    diff = diff + 1
            if diff <= 5:
                dellist.append(j)
            index2 = index2 + 1
        if len(dellist) > 0:
            for j in dellist:
                file = path + "/" + j
                print "remove", file
                os.remove(file)
                dict.pop(j)
            dictkeys = dict.keys()
            dictkeys.sort()
        index = index + 1
def cmpandremove(path):
    index = 0
    flag = 0
    dirs = os.listdir(path)
    dirs.sort()
    if len(dirs) <= 0:
        return 0
    while True:
        if index >= len(dirs):
            break
        prepath = path + dirs[index]
        print prepath
        index2 = 0
        preimg = cv2.imread(prepath)
        if type(preimg) is types.NoneType:
            index = index + 1
            continue
        # fingerprint of the reference image
        preresize = cv2.resize(preimg, (8, 8))
        pregray = cv2.cvtColor(preresize, cv2.COLOR_BGR2GRAY)
        premean = cv2.mean(pregray)[0]
        prearr = pregray.flatten()
        for i in range(0, len(prearr)):
            if prearr[i] >= premean:
                prearr[i] = 1
            else:
                prearr[i] = 0
        removepath = []
        while True:
            if index2 >= len(dirs):
                break
            if index2 != index:
                curpath = path + dirs[index2]
                # print curpath
                curimg = cv2.imread(curpath)
                if type(curimg) is types.NoneType:
                    index2 = index2 + 1
                    continue
                # fingerprint of the image being compared
                curresize = cv2.resize(curimg, (8, 8))
                curgray = cv2.cvtColor(curresize, cv2.COLOR_BGR2GRAY)
                curmean = cv2.mean(curgray)[0]
                curarr = curgray.flatten()
                for i in range(0, len(curarr)):
                    if curarr[i] >= curmean:
                        curarr[i] = 1
                    else:
                        curarr[i] = 0
                # count differing bits (Hamming distance)
                diff = 0
                for i in range(0, len(curarr)):
                    if curarr[i] != prearr[i]:
                        diff = diff + 1
                if diff <= 5:
                    print 'the same'
                    removepath.append(curpath)
                    flag = 1
            index2 = index2 + 1
        index = index + 1
        if len(removepath) > 0:
            for file in removepath:
                print "remove", file
                os.remove(file)
            dirs = os.listdir(path)
            dirs.sort()
            if len(dirs) <= 0:
                return 0
            #index = 0
    return flag

def main(argv):
    if len(argv) <= 1:
        print "command error"
        return -1
    if os.path.exists(argv[1]) is False:
        return -1
    path = argv[1]
    '''
    while True:
        if cmpandremove(path) == 0:
            break
    '''
    cmpandremove(path)
    return 0

if __name__ == '__main__':
    main(sys.argv)
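Of the two functions, cmpandremove2 reads and hashes every image only once and keeps the fingerprints in a dictionary, while cmpandremove re-hashes images inside its inner loop, so the former is usually the one to call. Both count differing bits with a 64-element Python loop; if that ever becomes a bottleneck, one possible variation (my own, not part of the original script) is to pack each fingerprint into a single integer so the Hamming distance is one XOR plus a bit count:

def pack_bits(bits):
    # pack a list of 64 0/1 values into one integer
    value = 0
    for b in bits:
        value = (value << 1) | int(b)
    return value

def hamming_packed(a, b):
    # differing bits = set bits in the XOR of the two packed fingerprints
    return bin(a ^ b).count('1')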
To save even more manual work, you can walk the whole directory tree and run the de-duplication on every subdirectory you want to clean up. In the shell script below, the bare "~/ $file/" line is where the path to your saved Python script goes.
#!/bin/bash
indir=$1
addcount=0
function intest()
{
    for file in $1/*
    do
        echo $file
        if test -d $file
        then
            ~/ $file/
            intest $file
        fi
    done
}
intest $indir
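If you prefer to stay in Python instead of shell, a rough equivalent of the walk above can use os.walk. This is only a sketch, assuming it lives in the same file as cmpandremove2; the name dedup_tree is mine:

import os

def dedup_tree(root):
    # visit every directory under root and de-duplicate the images in it
    for dirpath, dirnames, filenames in os.walk(root):
        print "processing", dirpath
        cmpandremove2(dirpath)

# dedup_tree("/path/to/photo/library")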
That is all I have to share about using Python and OpenCV to de-duplicate images in a directory. I hope it gives you a useful reference, and thank you for your support.