- You need to determine the appropriate level of visibility and integrability needed across your different environments.
- However, having less control over cloud environments means you need a different approach to monitoring their signals and inferring the state of the system.
- People are discussing observability and end-to-end visibility, but it’s not clear if this level of monitoring maturity is necessary.
Our Advice
Critical Insight
Full observability isn’t always necessary or worth the expense. Some aspects of your environment may need a greater level of monitoring than others. The key is first to understand the problem you are trying to solve and then choose the tools that give you the right level of data, context, and insights.
Impact and Result
- Define observability in the context of a monitoring capability scale: from systems monitoring to full observability.
- Right-size your monitoring implementation by aligning the different areas of your environment to the correct level of monitoring. Do you need full observability capabilities? Or will APM tools suffice?
Plan Your IT Monitoring Journey
Assess your need for observability.
EXECUTIVE BRIEF
Analyst Perspective
Determine the right level of monitoring based on the kind of analysis and response you need.
Most environments have several service management/monitoring tools in place, with the goal of improving the visibility, reporting, and alignment of the monitoring data for the infrastructure services supporting the application environment. These tools generate boatloads of data.
All these data points, however, don’t necessarily lead to insight. General metrics about an individual infrastructure element/dependency (such as a database or a network segment) are not useful for performance monitoring or capacity planning unless this data is contextualized within the end-to-end infrastructure dependencies supporting critical business transactions. Still, too frequently, end users end up serving as human sensors that provide the alert to infrastructure/application failures.
In the end, the predictive capability comes from continued analysis and logging of performance metrics in a very targeted way. The ability to do this well requires something beyond the traditional approach to monitoring. Hence, the rising profile of something called observability: a set of capabilities that can achieve a level of insight and visibility that traditional systems monitoring does not. Your challenge is to understand where this depth of visibility and insight is necessary in your environment.
When an incident occurs, you need access to the right information to find issues quickly, before customers are affected. You need the ability to be proactive instead of reactive and to conduct the investigation needed for root cause analysis and prevention. To achieve this, you need to determine where each area of your environment should land on the monitoring capability scale: from basic system monitoring to observability.
Emily Sugerman
Research Analyst, Infrastructure & Operations
Info-Tech Research Group
Darin Stahl
Distinguished Analyst & Research Fellow
Info-Tech Research Group
Executive summary
Your Challenge
Common Obstacles
Info-Tech's Approach
Info-Tech Insight
Full end-to-end observability isn’t always necessary or worth the expense. Some aspects of your environment may need a greater level of monitoring than others: the key is first to understand the problem you are trying to solve and then choose the tools that give you the right level of data, context, and insights.
Selecting the correct level of monitoring is a challenge
What are you monitoring – and why?
To improve mean time to repair, you need to improve mean time to respond. Improving mean time to respond depends on having an appropriate level of visibility and integrability across your different environments. However, this can be a challenge because:
- You have a raft of monitoring tools that you spend time maintaining but still lack visibility where you need it to understand the state of the system: to see, measure, act, and improve.
- Too often, your end users end up being your “sensors” for application and infrastructure failures, which does not lead to a stellar employee or customer experience.
- Your cloud environments offer less control over what you can directly monitor, so you need a means of monitoring signals around them instead.
- You don’t know if you even need end-to-end visibility in the first place. People are talking about observability, but there’s no consensus on what it means and how it differs from other levels of monitoring.
“81% of [US] technology leaders say the time their teams spend maintaining monitoring tools and preparing data for analysis steals time from innovation.”
Source: “The State of Observability 2024,” Dynatrace; n=200 US respondents
Common obstacles
Finding the right level of monitoring is complicated by the following factors:
- Vendors promote observability, but it’s not clear how observability differs from APM or whether you need it.
- Full end-to-end observability poses a significant increase in costs, so you need to have a clear understanding of the value it provides.
- You may not need to monitor each aspect of your environment in the same way. Some areas may need observability, while a more basic level of monitoring may suffice for others.
- You risk cobbling together a bunch of open-source tools to try to build your own – but you're not in the software business.
- Without data, errors can appear with no clear cause. Having the correct data is necessary, but many tools just create noise, which makes the job harder when you lack the information you need. Well-intentioned alerts often add to that noise and do not help when the pager goes off at 2 a.m.
Common challenges with observability implementations
- 63% of respondents report challenges with data management/storage.
- 57% of respondents report challenges with complexity/data analysis.
Source: “The State of Observability 2024,” OpsRamp; n=603
Determine what level of monitoring will help meet your goals
Cut through the definitional confusion by placing observability on a broader monitoring capability scale
The capabilities on the scale range from system monitoring, which affords you basic visibility to achieve basic detection and response, to application performance management, which layers on more context to help achieve a more precise understanding of the issue, to observability, which enables the investigation needed for root cause analysis and prevention.
To understand what you need, you must first understand each level’s:
- Scope
- Benefits
- Pain points
Insight summary
Understand your environment, what you’re trying to achieve, and who you need to benefit
Full end-to-end observability isn’t always necessary or worth the expense. Some aspects of your environment may need a greater level of monitoring than others. The key is first to understand the problem you are trying to solve and then choose the tools that give you the right level of data, context, and insights.
Don't focus solely on tools
Observability should be understood more as a cultural change than as a change in technology, with the goal of getting to insights more directly. Observability should incorporate the experience and perspective of the users/customers in its design.
Understand where your capabilities fall short of your needs
To avoid assuming you are achieving observability when you are not, assess whether you can successfully reduce recurring incidents, identify pending failures, prevent issues from occurring in the first place, generate automated insights, and meet service-level objectives and agreements.
Benefits of planning your IT monitoring journey
IT Benefits
- The right monitoring should provide you with the information you need to find issues quickly and before customers are affected. It allows you to become proactive instead of reactive.
- Setting up useful alerts makes being on call less stressful. It’s never fun being paged that the database is on fire, but having the right data and dashboards helps solve the problem.
- The more automated your monitoring is, the better you can prevent issues or resolve them before users are affected.
Business Benefits
- When your IT department knows the state of its systems, it can find and solve performance issues that impact your external customers’ experience and satisfaction.
- Too frequently, your business users end up becoming the “human sensors” that inform IT when there is a problem. With better monitoring, issues are found before they impact business users, which improves employee experience and productivity.
- It’s easy to run up large bills for observability. Right-sizing your monitoring will avoid risky and potentially unnecessary costs.
Project deliverables
Each step of this storyboard is accompanied by supporting deliverables to help you accomplish your goals:
Monitoring Effectiveness Assessment
Use this tool in Activity 2 to identify the effectiveness of your current approach to monitoring and supporting up to five applications, systems, or services.
Key deliverable
Monitoring Capability Scale Template
Use this template to document the results of activities 1 to 3 and to communicate your current and target states on the monitoring capability scale.
Measure the value of planning your IT monitoring journey
IT operations should focus on the continuous improvement of the three core metrics that underpin all the other tactical IT service management metrics, demand metrics, and dashboards:
- Mean time to detect: How soon can I see the issue?
- Mean time to respond: How quickly am I responding to the issue?
- Mean time to repair: How quickly can I make the issue go away?
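As a rough illustration, the three core metrics can be computed from incident timestamps. The sketch below is not a prescribed method: the `Incident` record, its field names, and the sample timestamps are hypothetical, and it assumes detection is measured from when the issue started, response from when it was detected, and repair from when the response began. Your service management tool may define these intervals differently.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from statistics import mean


@dataclass
class Incident:
    # Hypothetical record; field names are assumptions, not from any tool.
    started: datetime    # when the issue actually began
    detected: datetime   # when monitoring (or a user) surfaced it
    responded: datetime  # when someone began working on it
    resolved: datetime   # when service was restored


def _mean_minutes(deltas):
    """Average a list of timedeltas and return the result in minutes."""
    return mean(d.total_seconds() for d in deltas) / 60


def core_metrics(incidents):
    """Return the three core metrics, in minutes, for a list of incidents."""
    if not incidents:
        raise ValueError("at least one incident is required")
    return {
        # Assumed interval: issue start -> detection
        "mean_time_to_detect": _mean_minutes(
            [i.detected - i.started for i in incidents]
        ),
        # Assumed interval: detection -> first response
        "mean_time_to_respond": _mean_minutes(
            [i.responded - i.detected for i in incidents]
        ),
        # Assumed interval: first response -> resolution
        "mean_time_to_repair": _mean_minutes(
            [i.resolved - i.responded for i in incidents]
        ),
    }


if __name__ == "__main__":
    base = datetime(2024, 1, 1, 2, 0)  # illustrative timestamp only
    sample = [
        Incident(base, base + timedelta(minutes=10),
                 base + timedelta(minutes=15), base + timedelta(minutes=45)),
        Incident(base, base + timedelta(minutes=20),
                 base + timedelta(minutes=35), base + timedelta(minutes=95)),
    ]
    for name, minutes in core_metrics(sample).items():
        print(f"{name}: {minutes:.1f} min")
```

Tracking these values over time, rather than as one-off figures, is what shows whether a change in monitoring capability is actually reducing them.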
Reported observability benefits
Source: “The State of Observability 2024,” OpsRamp; n=603
Plan Your IT Monitoring Journey
1. Identify Your Current State
- Identify how your ability to monitor systems supports business goals.
- Understand where your current approach falls on the monitoring capability scale.
- Understand the differences between systems monitoring, APM, and observability.
2. Assess Current-State Effectiveness
- Assess the effectiveness of your current approach to monitoring your applications, systems, or services. Identify which high-importance areas are being underserved by current monitoring capabilities.
3. Determine Your Target State
- Determine your target state on the monitoring capability scale.
Activity 1: Identify your current state
- Identify up to five areas of your environment you would like to assess. If you have already conducted a business impact analysis, draw from that categorization of your systems. Otherwise, brainstorm a list of the systems, applications, or services you would like to assess for their monitoring needs.
- For each system, discuss how the performance of this system affects the organization’s mission or business goals. Consider its impact on the organization’s value-generating activities, employee experience, and customer experience.
- Identify the current state of monitoring for each system. Use the monitoring capability scale to determine what your current approach is and which band it falls into. Review the slides on system monitoring, APM, and observability for more detail and to help define your current approach.
- Manually plot the current state for each system on the monitoring capability scale, using the Monitoring Capability Scale Template (pictured).
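If you prefer to capture the results of this activity in a structured form before transferring them to the template, the steps above can be sketched roughly as follows. This is only an illustrative data structure: the `MonitoringLevel` and `SystemAssessment` names, the three-band ordering as integer values, and the example systems are all assumptions made for the sketch, not part of the template.

```python
from dataclasses import dataclass
from enum import IntEnum


class MonitoringLevel(IntEnum):
    # The three bands of the monitoring capability scale, lowest to highest.
    SYSTEM_MONITORING = 1
    APM = 2
    OBSERVABILITY = 3


@dataclass(frozen=True)
class SystemAssessment:
    # Hypothetical record for one assessed area of the environment.
    name: str
    business_impact: str          # how performance affects business goals
    current_level: MonitoringLevel


def plot_current_state(assessments):
    """Group assessed systems by their current band on the scale."""
    if len(assessments) > 5:
        # The activity asks for up to five areas per assessment.
        raise ValueError("assess up to five areas at a time")
    bands = {level: [] for level in MonitoringLevel}
    for assessment in assessments:
        bands[assessment.current_level].append(assessment.name)
    return bands


if __name__ == "__main__":
    # Example systems are invented for illustration only.
    current = [
        SystemAssessment("Customer portal", "Drives renewals",
                         MonitoringLevel.APM),
        SystemAssessment("File server", "Internal productivity",
                         MonitoringLevel.SYSTEM_MONITORING),
    ]
    for level, names in plot_current_state(current).items():
        print(f"{level.name}: {', '.join(names) or '(none)'}")
```

Keeping the business impact alongside each system makes it easier to justify, in the later activities, why a given area should move up the scale or stay where it is.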
Connect IT metrics to organizational goals
What level of visibility and insight do you need to enable the organization’s mission and goals?
- Your range of monitoring capabilities should help achieve the continuous reduction of mean time to detect, mean time to respond, and mean time to repair.
- Similarly, automation and orchestration should allow you to resolve known problems without escalation.
However, stating these IT goals in isolation is only half the battle. How will achieving these goals help facilitate your organization’s core value streams? How will reducing these metrics translate into an improved end-user and customer experience? For example:
- Better employee retention and satisfaction, measured by employee engagement and pulse surveys
- Increased subscriptions and renewals, improved Net Promoter Score (likelihood of recommending the service to others)
Roles typically involved in this discussion
- IT Operations
- Applications Support
- Application Development
- Application owner (product owner)/line of business
- DevOps team
- Performance engineering/platform support/site reliability engineering (SRE)
Info-Tech Insight
Observability should be understood more as a cultural change than as a change in technology, with the goal of getting to insights more directly. Observability should incorporate the experience and perspective of the users/customers in its design.