Data deduplication has become a hot storage feature in disk-to-disk backup and even some primary disk storage. But not all dedupe works the same way. As competition heats up, so does the rhetoric about what works best. Separate the fear, uncertainty, and doubt (FUD) from the facts when it comes to using hash values in data deduplication.
Deduplication and Hashing
Deduplication, sometimes called single instancing, eliminates redundant data on a given storage medium. As data is written to disk, duplicate blocks of data are identified. Instead of writing a duplicate block again, the system inserts a much smaller pointer to the copy already stored. An index of the data blocks is maintained so that the single stored instance can be retrieved for every file that references it.
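The mechanism described above can be illustrated with a minimal sketch. It is not any vendor's implementation: the class name `DedupStore`, the fixed block size, and the choice of SHA-256 as the hash are all assumptions made for illustration. The sketch also assumes that two blocks with the same hash value are identical and does no byte-by-byte comparison.

```python
import hashlib

BLOCK_SIZE = 4096  # assumed fixed block size; real products differ


class DedupStore:
    """Hypothetical in-memory block store with a hash-keyed index."""

    def __init__(self, block_size=BLOCK_SIZE):
        self.block_size = block_size
        self.blocks = {}  # index: hash value -> the single stored block
        self.files = {}   # file name -> ordered list of hash values (pointers)

    def write(self, name, data):
        pointers = []
        for offset in range(0, len(data), self.block_size):
            block = data[offset:offset + self.block_size]
            digest = hashlib.sha256(block).digest()
            # Store the block only if this hash value has not been seen;
            # otherwise the pointer alone is recorded.
            if digest not in self.blocks:
                self.blocks[digest] = block
            pointers.append(digest)
        self.files[name] = pointers

    def read(self, name):
        # Rebuild the file by following its pointers into the index.
        return b"".join(self.blocks[d] for d in self.files[name])

    def stored_bytes(self):
        return sum(len(b) for b in self.blocks.values())


store = DedupStore(block_size=4)
store.write("a.txt", b"AAAABBBBCCCC")
store.write("b.txt", b"AAAABBBBDDDD")
# 24 bytes were written, but only four unique 4-byte blocks are stored.
assert store.stored_bytes() == 16
assert store.read("b.txt") == b"AAAABBBBDDDD"
```

Both files share the blocks `AAAA` and `BBBB`, so each of those blocks is held once and referenced twice. The 32-byte pointers here are larger than the toy 4-byte blocks; with realistic block sizes of several kilobytes the pointer is far smaller than the data it replaces.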
Far less disk capacity is required when multiple files share single instances of common data blocks. Vendors typically claim a ratio of 20:1 between the total data protected and the disk capacity actually used. At that ratio, only 1TB of disk is needed to back up 20TB of data, a 95% reduction in disk requirement. For more information, refer to the Info-Tech Advisor research note, “Save on Storage Costs with Data De-Duplication Solutions.”
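The arithmetic behind the 20:1 claim generalizes to any ratio: the disk needed is the total data divided by the ratio, and the reduction is one minus the reciprocal of the ratio. A short sketch follows; the function names `disk_needed_tb` and `disk_reduction` are hypothetical and chosen only for this example.

```python
def disk_needed_tb(data_tb, ratio):
    """Disk capacity (TB) needed to hold data_tb at a ratio:1 dedupe ratio."""
    if ratio <= 0:
        raise ValueError("ratio must be positive")
    return data_tb / ratio


def disk_reduction(ratio):
    """Fraction of disk capacity saved at a ratio:1 dedupe ratio."""
    if ratio <= 0:
        raise ValueError("ratio must be positive")
    return 1 - 1 / ratio


# The vendor claim quoted above: 20TB of data at 20:1.
print(disk_needed_tb(20, 20))          # 1.0
print(f"{disk_reduction(20):.0%}")     # 95%
# A more modest 10:1 ratio still saves most of the disk.
print(f"{disk_reduction(10):.0%}")     # 90%
```

The reduction flattens quickly as the ratio grows: moving from 10:1 to 20:1 adds only five percentage points of savings.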