01-16-2008, 09:46 PM
There are a couple of utilities for tracking duplicates. Two examples: http://www.apple.com/downloads/macosx/sy...idyup.html, http://www.apple.com/downloads/macosx/sy...eguru.html
I haven't used either.
Deduplication is one of the current buzzwords in the storage industry, as big biz suffers from the same problem, only way bigger. Pretty much all the big storage vendors have something. Implementations vary, but in principle they identify data using a hashing algorithm (SHA-1, MD5 or similar) and eliminate the duplicates. Because this is done at the sub-file level, they can deduplicate across files as well. Apart from saving storage, big biz also cuts network traffic considerably, because the duplicates can be identified before the data is sent to the backup device.
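To make the idea concrete, here's a rough sketch of sub-file deduplication in Python. It's a toy, not how any particular vendor does it. I've assumed fixed-size chunks (many real products use variable-size, content-defined chunking instead) and SHA-1 as the hash. The class and method names (`DedupStore`, `missing`, `put`, `get`) are made up for illustration. `missing` shows the network trick: the client can ask which chunk hashes the backup target lacks and send only those.

```python
import hashlib


class DedupStore:
    """Hypothetical toy deduplicating store (illustration only).

    Data is split into fixed-size chunks; each unique chunk is kept once,
    keyed by its SHA-1 digest. A file is remembered as a list of digests.
    """

    def __init__(self, chunk_size: int = 4096):
        self.chunk_size = chunk_size  # assumed default; real products vary
        self.chunks: dict[str, bytes] = {}  # digest -> chunk bytes

    def split(self, data: bytes) -> list[bytes]:
        """Cut data into fixed-size chunks (the last one may be shorter)."""
        return [data[i:i + self.chunk_size]
                for i in range(0, len(data), self.chunk_size)]

    def missing(self, data: bytes) -> list[str]:
        """Digests of chunks not yet stored, i.e. what a client would
        actually need to send across the network."""
        seen = set()
        result = []
        for chunk in self.split(data):
            d = hashlib.sha1(chunk).hexdigest()
            if d not in self.chunks and d not in seen:
                seen.add(d)
                result.append(d)
        return result

    def put(self, data: bytes) -> list[str]:
        """Store any new chunks and return the file's list of digests."""
        recipe = []
        for chunk in self.split(data):
            d = hashlib.sha1(chunk).hexdigest()
            self.chunks.setdefault(d, chunk)  # duplicate chunks are not stored again
            recipe.append(d)
        return recipe

    def get(self, recipe: list[str]) -> bytes:
        """Reassemble the original bytes from a list of digests."""
        return b"".join(self.chunks[d] for d in recipe)


if __name__ == "__main__":
    store = DedupStore(chunk_size=4)
    a = store.put(b"AAAABBBBCCCC")
    print(len(store.missing(b"AAAABBBBDDDD")))  # 1: only DDDD is new
    b = store.put(b"AAAABBBBDDDD")
    print(len(store.chunks))  # 4 unique chunks for 6 chunk references
    assert store.get(a) == b"AAAABBBBCCCC" and store.get(b) == b"AAAABBBBDDDD"
```

The two "files" share their first two chunks, so only four chunks get stored instead of six, and the second backup only has to ship one chunk. Deduplicating whole files is the same idea with one chunk per file, which is roughly what the duplicate-finder utilities above do.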