Key Metrics for Measuring System Availability


As companies rely more heavily on IT systems, they become increasingly vulnerable to the massive costs and harmful impacts of system failures. It is therefore essential to measure, track, and improve the amount of time a system functions properly. By using key IT metrics to measure availability, companies can evaluate how well their systems currently withstand downtime, identify areas that require attention, and improve overall system efficiency.

Measuring Availability

Availability is the proportion of the required operating time during which a system is working at full functionality. The key metrics for measuring availability are Mean Time Between Failures (MTBF), which is sometimes used interchangeably with Mean Time to Failure (MTTF), and Mean Time to Repair (MTTR).
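
As a rough illustration, steady-state availability is commonly computed from these two metrics as MTBF / (MTBF + MTTR). The Python sketch below assumes a hypothetical outage log of non-overlapping (start, end) times, in hours, inside a fixed observation window; the `Outage` class and `availability_metrics` function are illustrative names, not part of any tool. It treats MTBF as mean operating time per failure, which is one of several conventions in use.

```python
from dataclasses import dataclass


@dataclass
class Outage:
    """One failure, in hours measured from the start of the observation window."""
    start_h: float  # when the system stopped working at full functionality
    end_h: float    # when full functionality was restored


def availability_metrics(outages: list[Outage], window_h: float) -> dict[str, float]:
    """Return MTBF, MTTR and availability for one observation window.

    Assumes the outages do not overlap and all lie within [0, window_h].
    MTBF is taken here as mean operating (up) time per failure.
    """
    if not outages:
        # No failures observed: fully available, MTBF undefined (treated as infinite).
        return {"mtbf_h": float("inf"), "mttr_h": 0.0, "availability": 1.0}
    failures = len(outages)
    downtime_h = sum(o.end_h - o.start_h for o in outages)
    uptime_h = window_h - downtime_h
    mtbf_h = uptime_h / failures    # mean time between failures
    mttr_h = downtime_h / failures  # mean time to repair
    availability = mtbf_h / (mtbf_h + mttr_h)  # equals uptime_h / window_h
    return {"mtbf_h": mtbf_h, "mttr_h": mttr_h, "availability": availability}


# Hypothetical 30-day month (720 h) with a 3 h outage and a 1 h outage.
log = [Outage(100.0, 103.0), Outage(500.0, 501.0)]
m = availability_metrics(log, window_h=720.0)
print(f"MTBF={m['mtbf_h']:.1f} h, MTTR={m['mttr_h']:.1f} h, availability={m['availability']:.4%}")
# MTBF=358.0 h, MTTR=2.0 h, availability=99.4444%
```

Because MTBF and MTTR are both divided by the same failure count, the ratio reduces to uptime over the window; keeping them separate shows whether availability is being lost to frequent failures (low MTBF) or to slow recovery (high MTTR).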

Search Code: 2505
Published: December 9, 2008
Last Revised: December 9, 2008

5 Comments

  • Gordon McCague | 01-10-2010

    Excellent article. Well written and easy to understand while providing important information.

  • Gordon McCague | 01-10-2010

    Unfortunately the above link does appear to be broken.

    • Info-Tech Research Group | 10-21-2011

      Thank you, Gordon. The link has been revised.

  • Daphne Rackley | 07-05-2016

    Are there any standards for deciding when to count an application as unavailable? For instance:
    1) A large application has multiple modules, and one small module or app is not functioning while the remaining modules are. How is availability best calculated in that case?
    2) An isolated network outage makes the application inaccessible for one small site, while the rest of the enterprise can still access it.

    • Info-Tech Research Group | 07-06-2016

      Thank you for your question. To begin, we recommend reviewing the blueprint Improve IT-Business Alignment Through an Internal SLA, which defines a realistic process for setting, reporting, and continually improving SLAs with the business, for example, by reporting true availability without up-front exclusions for scheduled downtime or business hours.

      Next, note that expectations for availability or uptime can refer either to availability or to reliability, but usually not both. Complicating matters, the figures reported by vendors or enterprises always include exclusions: they do not account for scheduled downtime, or they report only on business hours. As a result, there is no “apples-to-apples” comparison to draw from the data points published by most vendors or enterprises. The ambiguity between availability and reliability reporting can be illustrated as follows:
      - Availability expresses the total amount of time an end-to-end service or individual component in the service delivery chain (hardware, apps, etc.) was up.
      - Reliability expresses the number of times the end-to-end service or component went down or failed.

      These can seem like the same thing, but the distinction is nuanced and the strategies for addressing each are different. Consider this example:
      The infrastructure (or, more likely, a specific component of the end-to-end service delivery chain) has one big outage that lasts 24 hours. Reporting would indicate high reliability but low availability (something like 99.5%, depending on the reporting period).
      Alternatively, the infrastructure (or component) could experience 288 failures or outages lasting less than 4 seconds each (one every five minutes, if they all fall within a single day), and reporting would indicate high availability but poor reliability.
      In either scenario, of course, the business would be unable to use the infrastructure or component for an entire day. Again, the difference might seem nuanced, but as you can see, you would take different mitigation actions to address each scenario.
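
      As a rough illustration of the two scenarios, the Python sketch below reports availability and failure count side by side. It assumes a 365-day reporting window (the comment does not say which window its 99.5% figure uses) and treats each short outage as lasting the full 4 seconds; the function name `availability_and_failures` is hypothetical.

```python
# Availability vs. reliability for the two scenarios described above.
# Assumptions: both are reported over one 365-day window, and each short
# outage lasts the full 4 s (the comment says "less than 4 seconds").

SECONDS_PER_DAY = 24 * 3600
YEAR_S = 365 * SECONDS_PER_DAY  # assumed reporting window, in seconds


def availability_and_failures(outage_durations_s: list[float], window_s: float) -> tuple[float, int]:
    """Return (availability as a fraction of the window, number of failures)."""
    downtime_s = sum(outage_durations_s)
    availability = 1.0 - downtime_s / window_s
    return availability, len(outage_durations_s)


scenarios = {
    "one 24 h outage": [float(SECONDS_PER_DAY)],
    "288 outages of 4 s": [4.0] * 288,
}

for name, outages in scenarios.items():
    avail, failures = availability_and_failures(outages, YEAR_S)
    print(f"{name}: availability={avail:.4%}, failures={failures}")
# one 24 h outage: availability=99.7260%, failures=1
# 288 outages of 4 s: availability=99.9963%, failures=288
```

      The single long outage produces the lower availability figure but only one failure, while the short outages look almost perfect on availability despite 288 failures. The window matters too: over a 30-day month, the same 24-hour outage would show roughly 96.7% availability.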

      Other blueprints that may help include Create a Right-Sized Disaster Recovery Plan and Create Visual SOP Documents.
      Please contact your account manager if you'd like to set up a call with an analyst regarding this topic.
