[NTLUG:Discuss] Any "duplicate files" utilities?
Steve Baker
steve at sjbaker.org
Sun Apr 13 19:56:50 CDT 2008
I'm not going to write the script for you, but it's not difficult. In
outline: I would collect a list of filenames and checksums using
'find'. It's convenient because you can set its parameters to exclude
things you don't want to test: don't look in cross-mounted file systems,
or in areas of the disk where system files live, etc. You can give it
an '-exec' parameter to run the 'cksum' tool on every file that it
finds. Do that right and you now have a L-O-N-G list of checksums and
corresponding filenames. Sort it by whichever column the checksums
ended up in using 'sort' and pipe that into 'uniq -d -sXX' to get a list
of just those files that have duplicated checksums. That gives you a
very short list of files that are ALMOST guaranteed to be identical. If
"ALMOST" is good enough then you're done; if not, use 'diff -s' to be
absolutely certain. I'm not sure how you intend to decide which of two
identical files to remove, but the list will probably be short enough
to deal with manually...or you could replace one of the files with a
link to the other file.
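Here is a rough sketch of that pipeline, for illustration only. It rests on a few assumptions that are not part of the outline above: GNU 'uniq' (for its '-w' and '-D' options), filenames without newlines or leading/trailing blanks, and keying on the byte count that 'cksum' prints alongside the CRC as well as on the CRC itself. The function names 'list_candidates' and 'confirm_identical' and the script name 'dupes.sh' are hypothetical, and 'cmp -s' stands in for 'diff -s' as the final byte-for-byte check.

```shell
#!/bin/sh
# Sketch of the find | cksum | sort | uniq pipeline, plus a cmp check.
# Assumes GNU uniq (-w, -D) and filenames without newlines or
# leading/trailing blanks.

# list_candidates DIR (hypothetical name): print every file under DIR whose
# CRC and byte count both match some other file. The CRC and size columns
# are right-aligned to widths 10 and 15, so 'uniq -w 26' compares exactly
# those two fields and none of the filename.
list_candidates() {
    find "$1" -xdev -type f -exec cksum {} + |
        awk '{
            name = $0
            sub(/^[0-9]+ [0-9]+ /, "", name)  # strip "CRC SIZE " so the name stays intact
            printf "%10s %15s %s\n", $1, $2, name
        }' |
        LC_ALL=C sort |
        uniq -w 26 -D
}

# confirm_identical (hypothetical name): read list_candidates output and
# byte-compare each file with the first file of its checksum group,
# printing "FIRST<TAB>OTHER" for every pair that really is identical.
confirm_identical() {
    prev_key= first=
    while read -r crc size name; do
        if [ "$crc $size" = "$prev_key" ]; then
            if cmp -s -- "$first" "$name"; then
                printf '%s\t%s\n' "$first" "$name"
            fi
        else
            prev_key="$crc $size"
            first=$name
        fi
    done
}

# Only run when a directory is given, e.g.: sh dupes.sh /some/dir
if [ "$#" -gt 0 ]; then
    list_candidates "$1" | confirm_identical
fi
```

'-xdev' keeps 'find' from wandering into other mounted file systems. Each output line names a pair of files that 'cmp' found identical; deciding which one to delete or replace with a link is still left to you. Note that every file is compared only against the first file in its checksum group.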
geoffrey at justaweebitcloser.com wrote:
>> I'm sure there must be at least half a dozen such utilities out there; I
>> just can't think of any for Linux.
>>
>> I seek a simple script or utility or whatever to find duplicate files on
>> media so that I can trim some redundancies out.
>> _______________________________________________
>> http://www.ntlug.org/mailman/listinfo/discuss
>
> A quick search for "duplicate" on Freshmeat resulted in the following:
>
> http://freshmeat.net/projects/fdmf/
> http://freshmeat.net/projects/freedup/
> http://freshmeat.net/projects/dupseek/
> http://freshmeat.net/projects/fdupes/
> http://freshmeat.net/projects/dupefinder/
> http://freshmeat.net/projects/duper/
>
> There were more than that, but these were on the front page.
>
> --
> Geoffrey
>