[NTLUG:Discuss] Any "duplicate files" utilities?

Steve Baker steve at sjbaker.org
Sun Apr 13 19:56:50 CDT 2008


I'm not going to write the script for you - but it's not difficult.  In 
outline: I would collect a list of filenames and checksums using 
'find'.   It's convenient because you can set its parameters to exclude 
things you don't want to test - not look in cross-mounted file systems, 
or in areas of the disk where system files live...etc.   You can give it 
an '-exec' parameter to run the 'cksum' tool on every file that it 
finds.   Do that right and you now have a L-O-N-G list of checksums and 
corresponding filenames - you can sort by whichever column the checksums 
ended up in using 'sort' and pipe that into 'uniq -d -sXX' to get a list 
of just those files that have duplicated checksums.  That gives you a 
very short list of files that are ALMOST guaranteed to be identical...if 
"ALMOST" is good enough then you're done - if not, use 'diff -s' to be 
absolutely certain.  I'm not sure how you intend to decide which of two 
identical files to remove - but the list will probably be a short enough 
one to deal with manually...or you could replace one of the files with a 
link to the other file.
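
For anyone who would rather not fight with column offsets in 'uniq', here
is a rough sketch of the same outline in Python. It is an illustration
only, not the script described above: the names file_crc32 and
find_duplicates are made up for this example, zlib.crc32 stands in for
'cksum' (it is a different CRC, so the numbers will not match cksum's
output), and filecmp.cmp plays the part of the final 'diff -s' check.

```python
import filecmp
import os
import zlib
from collections import defaultdict


def file_crc32(path, chunk_size=1 << 20):
    """Return the CRC-32 of a file's contents, reading it in chunks."""
    crc = 0
    with open(path, "rb") as handle:
        while True:
            chunk = handle.read(chunk_size)
            if not chunk:
                break
            crc = zlib.crc32(chunk, crc)
    return crc


def find_duplicates(root):
    """Return sorted groups of paths under root with identical contents."""
    # Step 1: collect checksum -> filenames (the 'find ... -exec cksum' part).
    by_checksum = defaultdict(list)
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            # Assumption: only regular files matter; symlinks are skipped.
            if os.path.islink(path) or not os.path.isfile(path):
                continue
            try:
                by_checksum[file_crc32(path)].append(path)
            except OSError:
                continue  # unreadable file: leave it out of the list

    # Step 2: keep only checksums seen more than once (the 'sort | uniq' part),
    # then confirm byte-for-byte, since a matching checksum is only "ALMOST".
    groups = []
    for paths in by_checksum.values():
        remaining = sorted(paths)
        while len(remaining) > 1:
            first, rest = remaining[0], remaining[1:]
            same = [first] + [
                p for p in rest if filecmp.cmp(first, p, shallow=False)
            ]
            if len(same) > 1:
                groups.append(same)
            remaining = [p for p in rest if p not in same]
    return sorted(groups)
```

Calling find_duplicates("/some/dir") returns a list of lists, each inner
list holding the paths that turned out to be identical. Deciding which
copy to delete (or replace with a link) is still left to you.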

geoffrey at justaweebitcloser.com wrote:
>> I'm sure there must be at least half a dozen such utilities out there, I
>> just can't think of any for Linux.
>>
>> I seek a simple script or utility or whatever to find duplicate files on
>> media so that I can trim some redundancies out.
>>
>> _______________________________________________
>> http://www.ntlug.org/mailman/listinfo/discuss
>>
>>     
>
> A quick search for "duplicate" on Freshmeat resulted in the following:
>
> http://freshmeat.net/projects/fdmf/
> http://freshmeat.net/projects/freedup/
> http://freshmeat.net/projects/dupseek/
> http://freshmeat.net/projects/fdupes/
> http://freshmeat.net/projects/dupefinder/
> http://freshmeat.net/projects/duper/
>
> There were more than that, but these were on the front page.
>
> --
> Geoffrey



