Busting the FUD on Hash Collisions

Info-Tech Advisor: Research Note

Published: April 01, 2008


Data deduplication has become a hot storage feature in disk-to-disk backup and even some primary disk storage. But not all dedupe works the same way. As competition heats up, so does the rhetoric about what works best. Separate the fear, uncertainty, and doubt (FUD) from the facts when it comes to using hash values in data deduplication.

Deduplication and Hashing

Deduplication, sometimes called single instancing, eliminates redundant data on a given storage medium. As data is written to disk, duplicate blocks are identified. Instead of writing a duplicate block to disk again, the system inserts a much smaller pointer to the existing copy in its place. An index of the data blocks is maintained so that the single stored instance can be retrieved for any of the files that reference it.
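The mechanism described above can be sketched in a few lines of Python. This is a minimal illustration, not any vendor's implementation: the `DedupStore` class, the fixed 4 KB block size, and the choice of SHA-256 as the hash are all assumptions made for the example. Note that the sketch treats two blocks with the same hash value as identical, which is exactly the assumption at the center of the hash collision debate.

```python
import hashlib

BLOCK_SIZE = 4096  # assumed fixed block size; real products differ


class DedupStore:
    """Hypothetical in-memory store that keeps one copy of each unique block."""

    def __init__(self):
        self.blocks = {}  # index: hash value -> the single stored block
        self.files = {}   # file name -> ordered list of hash values (pointers)

    def write(self, name, data):
        pointers = []
        for start in range(0, len(data), BLOCK_SIZE):
            block = data[start:start + BLOCK_SIZE]
            digest = hashlib.sha256(block).digest()
            # Store the block only if its hash has not been seen before;
            # otherwise the existing copy is reused via the pointer.
            self.blocks.setdefault(digest, block)
            pointers.append(digest)
        self.files[name] = pointers

    def read(self, name):
        # Rebuild the file by following its pointers into the block index.
        return b"".join(self.blocks[d] for d in self.files[name])

    def stored_bytes(self):
        return sum(len(b) for b in self.blocks.values())


store = DedupStore()
payload = b"A" * BLOCK_SIZE + b"B" * BLOCK_SIZE
store.write("monday.bak", payload)
store.write("tuesday.bak", payload)  # identical data, no new blocks stored
print(store.stored_bytes(), "bytes stored for", 2 * len(payload), "bytes written")
```

Both files read back in full, yet only two unique blocks occupy storage.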

Significantly less disk capacity is required when multiple files share single instances of common data blocks. Vendors typically claim that a 20:1 ratio of total data to actual disk usage can be achieved. At that ratio, only 1TB of disk is needed to back up 20TB of data, a 95% reduction in disk requirement. For more information, refer to the Info-Tech Advisor research note, “Save on Storage Costs with Data De-Duplication Solutions.”
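The arithmetic behind the 95% figure is easy to check for any claimed ratio. The helper names below are hypothetical and exist only for this illustration.

```python
def disk_needed_tb(total_tb, ratio):
    """Disk actually consumed when total_tb of data dedupes at ratio:1."""
    if ratio <= 0:
        raise ValueError("ratio must be positive")
    return total_tb / ratio


def disk_reduction(ratio):
    """Fraction of disk capacity saved at a dedupe ratio of ratio:1."""
    if ratio <= 0:
        raise ValueError("ratio must be positive")
    return 1 - 1 / ratio


print(f"{disk_needed_tb(20, 20):.0f}TB needed, {disk_reduction(20):.0%} reduction")
```

The same calculation shows the diminishing returns of higher ratios: 10:1 already saves 90%, so doubling the ratio to 20:1 adds only five percentage points.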
