Busting the FUD on Hash Collisions

Author(s): John Sloan

Data deduplication has become a hot storage feature in disk-to-disk backup and even some primary disk storage. But not all dedupe works the same way. As competition heats up, so does the rhetoric about what works best. Separate the fear, uncertainty, and doubt (FUD) from the facts when it comes to using hash values in data deduplication.

Deduplication and Hashing

Deduplication – sometimes also called single instancing – eliminates redundant data on a given storage medium. As data is written to disk, duplicate blocks are identified. Instead of writing a duplicate block again, the system stores a much smaller pointer to the copy already on disk. An index of the data blocks is maintained so that the single stored instance can be retrieved whenever any of the files that share it is requested.
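The mechanism described above can be sketched in a few lines of Python. This is a minimal illustration, not any vendor's implementation: the fixed 4 KB block size, the choice of SHA-256 as the hash, the in-memory dictionaries, and the names `DedupeStore`, `write`, and `read` are all assumptions made for the example.

```python
import hashlib

BLOCK_SIZE = 4096  # assumed fixed block size; real products vary


class DedupeStore:
    """Toy in-memory block store (hypothetical, for illustration only)."""

    def __init__(self):
        self.blocks = {}  # index: hash value -> the single stored copy of a block
        self.files = {}   # file name -> ordered list of hash values (the "pointers")

    def write(self, name, data):
        pointers = []
        for offset in range(0, len(data), BLOCK_SIZE):
            block = data[offset:offset + BLOCK_SIZE]
            digest = hashlib.sha256(block).digest()
            # Keep the block only if the index has not seen this hash value.
            self.blocks.setdefault(digest, block)
            pointers.append(digest)
        self.files[name] = pointers

    def read(self, name):
        # Follow each pointer back to the single stored instance.
        return b"".join(self.blocks[d] for d in self.files[name])


store = DedupeStore()
shared = b"A" * BLOCK_SIZE
store.write("report_v1", shared + b"B" * BLOCK_SIZE)
store.write("report_v2", shared + b"C" * BLOCK_SIZE)
print(len(store.blocks))  # 3 unique blocks stored for 4 blocks written
```

Note that this sketch trusts the hash value alone to decide whether two blocks are identical and never compares them byte for byte. That design choice is exactly where the question of hash collisions arises.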
