[NTLUG:Discuss] Any "duplicate files" utilities?
Daniel Hauck
daniel at yacg.com
Sun Apr 13 21:33:44 CDT 2008
Thanks! I have found some utilities, including some of the ones listed
below, but they carry so much overhead that they don't seem to work on
file collections as large as mine. I think these utilities aren't light
enough...a script will likely do the job better.
Steve Baker wrote:
> I'm not going to write the script for you - but it's not difficult. In
> outline: I would collect a list of filenames and checksums using
> 'find'. It's convenient because you can set its parameters to exclude
> things you don't want to test - not look in cross-mounted file systems,
> or in areas of the disk where system files live...etc. You can give it
> an '-exec' parameter to run the 'cksum' tool on every file that it
> finds. Do that right and you now have a L-O-N-G list of checksums and
> corresponding filenames - you can sort by whichever column the checksums
> ended up in using 'sort' and pipe that into 'uniq -d -sXX' to get a list
> of just those files that have duplicated checksums. That gives you a
> very short list of files that are ALMOST guaranteed to be identical...if
> "ALMOST" is good enough then you're done - if not, use 'diff -s' to be
> absolutely certain. I'm not sure how you intend to decide which of two
> identical files to remove - but the list will probably be a short enough
> one to deal with manually...or you could replace one of the files with a
> link to the other file.
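
For anyone following along, here is a minimal sketch of the outline above,
not a finished tool. The function name find_dupes is made up for
illustration. It uses awk in place of 'uniq -d -sXX', because cksum prints
its checksum and size as variable-width fields at the start of each line,
which makes a fixed character count awkward. It assumes file names contain
no newlines.

```shell
#!/bin/sh
# find_dupes DIR: hypothetical helper that lists every file under DIR
# whose cksum output (CRC and byte count) matches that of another file.
find_dupes() {
    dir=${1:-.}
    # -xdev keeps find on one file system; -type f limits it to regular files.
    # cksum prints one line per file: "CRC SIZE NAME".
    find "$dir" -xdev -type f -exec cksum {} + |
        # Sort numerically on CRC, then size, so equal pairs become adjacent.
        sort -k1,1n -k2,2n |
        # Print every line whose CRC and size equal those of a neighbour.
        awk '
            { key = $1 " " $2 }
            key == prev { if (!shown) print prevline; print; shown = 1 }
            key != prev { shown = 0 }
            { prev = key; prevline = $0 }
        '
}
```

Something like 'find_dupes /path/to/media > dupes.txt' would then produce
the short candidate list. A matching checksum and size is still only
"ALMOST" identical, so a byte-for-byte check such as 'cmp -s file1 file2'
(exit status 0 means identical) is worth running before removing or
linking anything.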
>
> geoffrey at justaweebitcloser.com wrote:
>>> I'm sure there must be at least half a dozen such utilities out there, I
>>> just can't think of any for Linux.
>>>
>>> I seek a simple script or utility or whatever to find duplicate files on
>>> media so that I can trim some redundancies out.
>>>
>>> _______________________________________________
>>> http://www.ntlug.org/mailman/listinfo/discuss
>>>
>>>
>> A quick search for "duplicate" on Freshmeat resulted in the following:
>>
>> http://freshmeat.net/projects/fdmf/
>> http://freshmeat.net/projects/freedup/
>> http://freshmeat.net/projects/dupseek/
>> http://freshmeat.net/projects/fdupes/
>> http://freshmeat.net/projects/dupefinder/
>> http://freshmeat.net/projects/duper/
>>
>> There were more than that, but these were on the front page.
>>
>> --
>> Geoffrey