A storage strategy with the goal of saving everything all the time and forever is prohibitively expensive to execute, and probably unnecessary. An efficient and effective data protection strategy is based on a realistic understanding of what the enterprise can afford to lose.
Assessing tolerance for loss is critical to understanding the business value of data stored on primary storage tiers as well as in longer-term storage for archiving. Info-Tech sees increasing convergence between the management of backup and archiving; however, the two are not the same.
- Backup is the creation of a secondary copy of primary online and nearline data so that the data can be restored in the event of loss.
- Archiving is the creation of a primary copy of secondary data that needs to be kept for a period of time for historical, regulatory, or business continuity reasons (charter documents and licenses, for example).
Note that an archive, as the primary copy of archival data, will likely require its own secondary copy. In other words, the archive will require backup.
Loss Tolerance and Backup
Info-Tech recommends that all investigation of backup solutions be preceded by a careful assessment of the business value of what is being backed up, as well as the establishment of Recovery Time Objectives (RTO) and Recovery Point Objectives (RPO). For more information, refer to the McLean Report research note, “Backup Strategy: Start with a Business Needs Assessment.”
A starting point for assessing the short- and long-term business value of any data is to ask: What would happen if we were to lose this data? This question can be broken down further into permanent and temporary scenarios:
- What would be the cost of losing this data permanently?
- If we temporarily lose access to this data, how long can we tolerate its absence?
For primary storage, the question of loss tolerance figures prominently in business continuity and disaster recovery. A day's work in a personal folder, for example, might be an acceptable loss. Tolerance for loss of key transactional data that enables a core business process will be very low.
RTO relates to how long a temporary loss of data can be tolerated. RPO comes out of the assessment of how much permanent data loss can be tolerated. If data is copied via point-in-time snapshot twice daily, for example, the most recent restore point can be as much as 12 hours older than the moment of data loss.
For some high-value transactional data, this may not be sufficient. A more fine-grained copying method such as continuous data protection (CDP) allows for a much tighter restore point and less potential data loss. The decision to invest in CDP solutions for a given data set is therefore based on a low tolerance for loss.
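The restore-point arithmetic can be illustrated with a minimal sketch. It assumes copies are evenly spaced over a 24-hour day; the function names `worst_case_data_loss` and `meets_rpo` and the sample RPO values are hypothetical, chosen only for illustration.

```python
from datetime import timedelta


def worst_case_data_loss(copies_per_day: int) -> timedelta:
    """Longest possible gap between the last copy and the moment of loss.

    Assumption: copies are evenly spaced over a 24-hour day.
    """
    if copies_per_day < 1:
        raise ValueError("copies_per_day must be at least 1")
    return timedelta(days=1) / copies_per_day


def meets_rpo(copies_per_day: int, rpo: timedelta) -> bool:
    """True if the schedule's worst-case loss stays within the stated RPO."""
    return worst_case_data_loss(copies_per_day) <= rpo


# Twice-daily snapshots, as in the example in the text.
print(worst_case_data_loss(2))               # 12:00:00
# Hypothetical RPO values for two different data sets.
print(meets_rpo(2, timedelta(hours=24)))     # True
print(meets_rpo(2, timedelta(minutes=15)))   # False
```

A data set whose tolerance is measured in minutes fails the twice-daily check, which is the situation in which a finer-grained method such as CDP becomes worth evaluating.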
Loss Tolerance and Archiving
Archived data is stored safely for a reason. In recent years, with increased regulation and new rules for civil procedure, those reasons have multiplied. With that multiplication have come demands for ever more storage capacity. For more information, refer to the McLean Report research notes, “E-Discovery: What It Is and How It Affects IT” and “Use International Standards as a First Step Towards Records Management.”
An archive has been likened to a corporate memory bank. Here, however, as with backup, the key question is not what needs to be remembered but what the enterprise can afford to forget. Defining an instance of data or a document as a necessary record again poses the question: what is the potential cost of losing this?
In some cases – such as inappropriate personal files being stored on corporate assets – there may be little cost in “forgetting” but a large potential liability in “remembering.”
As with backup, there is also a question of time sensitivity. With backup, the question of time centers on how long we can do without the data. With archiving, the question is how long we can tolerate keeping the record. When a specific piece of archived data reaches its expiry date (say, after five to seven years), how do we go about losing it?
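As a minimal sketch of the expiry question, the check below flags a record once its retention period has elapsed. The function names, the seven-year period, and the sample dates are assumptions for illustration only; actual retention periods depend on the regulations and policies that apply to each record type.

```python
from datetime import date


def expiry_date(archived_on: date, retention_years: int) -> date:
    """Date on which a record's retention period ends."""
    target_year = archived_on.year + retention_years
    try:
        return archived_on.replace(year=target_year)
    except ValueError:
        # Archived on 29 February and the target year is not a leap year:
        # fall back to 28 February (an assumed convention).
        return archived_on.replace(year=target_year, day=28)


def is_due_for_disposal(archived_on: date, retention_years: int, today: date) -> bool:
    """True once the retention period has fully elapsed."""
    return today >= expiry_date(archived_on, retention_years)


# Hypothetical record archived on 1 March 2017 with a seven-year retention period.
print(expiry_date(date(2017, 3, 1), 7))                             # 2024-03-01
print(is_due_for_disposal(date(2017, 3, 1), 7, date(2024, 2, 29)))  # False
print(is_due_for_disposal(date(2017, 3, 1), 7, date(2024, 3, 1)))   # True
```

A check like this only identifies candidates for disposal; how the record is actually removed, and who signs off, remains a policy decision.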
Key Takeaways
- Knowing what needs to be saved includes knowing what can be lost. Assessing the business value of data involves asking the “what if” question about loss. Backup systems, for example, assume that there will be loss in certain instances. Investment level in data protection should be driven by loss tolerance; that is, the lower the tolerance for loss, the higher the availability and recoverability required.
- The business has to be involved. Data protection planning has to involve the business because the data owners know best what can be lost. For the long-term memory of a corporate archive, senior management and legal counsel have to be involved. In many cases, the question is avoided with the admonition to just keep everything. Senior decision makers need to understand the cost and potential liabilities of doing so.
- Disk is cheap but storage can be expensive. “Just keeping everything all the time and forever” can be seen as an easier alternative because disk is cheap and getting cheaper. However, ensuring enterprise-level availability and recoverability of primary data storage, and security and recoverability of archival records, carries significant software and labor costs. Controlling these costs requires careful consideration of what needs to be stored and for how long.
Bottom Line
Data protection strategy typically focuses on saving data from corruption, physical damage to media, and accidental or inappropriate deletion. However, data protection strategy is really about figuring out what the enterprise can afford to lose. Knowing what loss can be tolerated is key to assessing the business value of data in both primary storage and the archive.