Service Stack Analysis

Can you predict your next downtime?

By 2022, 40% of enterprises will overcome IT operations management (ITOM) tool integration challenges by centralizing the data exchange and interchange functions in artificial intelligence for IT operations (AIOps), an increase from fewer than 2% today.


Key KPIs for CIOs today include the uptime and availability of IT services and the mean time to repair (MTTR). These KPIs are driven largely by today’s customers, who want what they want, when they want it, and expect a great customer experience along the way. According to EMA research, 83% of consumers say that a positive customer experience with a brand is more important than the product itself. Because these customers also expect services to be always on, the pressure on uptime and MTTR grows even further.
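As a rough illustration of these two KPIs: MTTR is the average time it takes to restore a service across incidents, and availability is the fraction of an observation period during which the service was up. The sketch below computes both from a hypothetical list of incident (start, restored) timestamps; the incident data and the function names `mttr` and `availability` are illustrative assumptions, not part of any product.

```python
from datetime import datetime, timedelta

# Hypothetical incidents for one service: (outage start, service restored)
incidents = [
    (datetime(2024, 1, 3, 9, 0), datetime(2024, 1, 3, 9, 45)),     # 45 min
    (datetime(2024, 1, 17, 22, 10), datetime(2024, 1, 17, 23, 40)),  # 90 min
]

def total_downtime(incidents):
    """Sum of all outage durations."""
    return sum((end - start for start, end in incidents), timedelta())

def mttr(incidents):
    """Mean time to repair: average outage duration per incident."""
    return total_downtime(incidents) / len(incidents)

def availability(incidents, period):
    """Share of the observation period during which the service was up."""
    return 1 - total_downtime(incidents) / period

period = timedelta(days=30)
print(mttr(incidents))                            # 1:07:30
print(f"{availability(incidents, period):.4%}")   # 99.6875%
```

Even two incidents in a month keep availability below "three nines" here, which is why shortening each repair (lower MTTR) matters as much as preventing outages.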

Over the years, companies have invested in multiple fit-for-purpose tools to better understand their environments and gain deeper insight into them. Unfortunately, this approach did not bring the clarity we aimed for. These tools are typically implemented in siloed environments, run by siloed teams, and alert on isolated problems. As a result, we lost end-to-end visibility, a common understanding of KPIs, and the context needed to pinpoint a problem quickly. The outcome is a finger-pointing blame game instead of a shared focus on getting our services back up and running.

One way to avoid the problems of siloed monitoring is to build with observability in mind. Observability is the practice of designing systems and applications so that they emit metrics, logs, and traces, with the explicit aim of letting administrators watch over the system as a whole. This does not mean routing all monitoring through a single individual or team; rather, it means giving every role across the stack visibility into the entire system. When infrastructure, operations, and development teams understand how their roles relate to the performance of the whole system, channels of communication open up, allowing teams to solve problems more efficiently or prevent them altogether.
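A minimal sketch of what "building with observability in mind" can look like at the code level, assuming only the Python standard library: each unit of work emits one structured record that carries the service name, a trace identifier, its duration, and its outcome, so that telemetry collected by different teams can later be joined on the same fields. The field names (`service`, `trace_id`, `duration_ms`), the `traced` helper, and the `checkout` service are hypothetical; a production system would more likely rely on an instrumentation framework such as OpenTelemetry.

```python
import json
import time
import uuid
from contextlib import contextmanager

@contextmanager
def traced(service, operation, sink):
    """Record one span of work as a JSON event appended to `sink`.

    The shared `service` and `trace_id` fields are what allow logs, metrics,
    and traces from different tools to be correlated afterwards.
    """
    event = {
        "service": service,            # hypothetical service name
        "operation": operation,
        "trace_id": uuid.uuid4().hex,  # random id; real systems propagate it
        "status": "ok",
    }
    start = time.perf_counter()
    try:
        yield event                    # caller may attach extra context
    except Exception as exc:
        event["status"] = "error"
        event["error"] = repr(exc)
        raise
    finally:
        # Duration doubles as a latency metric for this operation
        event["duration_ms"] = round((time.perf_counter() - start) * 1000, 3)
        sink.append(json.dumps(event))

telemetry = []
with traced("checkout", "charge_card", telemetry) as ev:
    ev["amount"] = 42                  # domain context on the same record
print(telemetry[0])
```

Because failures are recorded with the same fields as successes, an operator looking at one service sees errors, latency, and context in a single place rather than across separate tools.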

This is why we’ve put the service back at the center of our solutions. At the base of our solution is a data lake that collects and analyses logs, metrics, and traces. This lets you see not only when a problem occurs but also, immediately, where it is located, so you can start investigating what exactly the problem is. In short, it gives you back the observability you always wanted. On top of that, Service Stack Analysis identifies issues in the context of the service being impacted, instead of taking a siloed, technology-by-technology approach.

Apart from giving you a better service-based view that helps you observe your environment and conduct impact analysis, our Service Stack Analysis solution offers intelligent event management. Today’s operations centers often deal with event storms, because every tool forwards, as verbosely as possible, every event that might be of interest. Our service-oriented solution clusters these events, relates them to the service they are impacting, and provides actionable insights that allow you to decide and act in a split second and reduce your MTTR.
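To make the clustering idea concrete, here is a minimal sketch, not the product's actual algorithm: each raw event is assumed to already carry the service it affects (in practice a service map or CMDB would supply this), and events for the same service that arrive within a fixed gap of each other are folded into one cluster, which is then summarized as a single line. The `Event` fields, the 5-minute gap, and the sample events are illustrative assumptions.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta

@dataclass
class Event:
    time: datetime
    service: str   # impacted service, assumed to be resolved upstream
    source: str    # tool or component that raised the event
    message: str

def cluster_events(events, gap=timedelta(minutes=5)):
    """Group events per service; a new cluster starts when the time since
    the previous event for that service exceeds `gap`."""
    clusters = []
    open_cluster = {}  # service -> cluster currently being extended
    for ev in sorted(events, key=lambda e: e.time):
        current = open_cluster.get(ev.service)
        if current is not None and ev.time - current[-1].time <= gap:
            current.append(ev)
        else:
            current = [ev]
            open_cluster[ev.service] = current
            clusters.append(current)
    return clusters

def summarize(cluster):
    """One actionable line per cluster instead of one alert per event."""
    sources = sorted({e.source for e in cluster})
    return (f"{cluster[0].service}: {len(cluster)} event(s) from "
            f"{', '.join(sources)} starting {cluster[0].time:%H:%M}")

t0 = datetime(2024, 1, 3, 9, 0)
events = [
    Event(t0, "webshop", "db-monitor", "replication lag"),
    Event(t0 + timedelta(minutes=1), "webshop", "apm", "slow checkout"),
    Event(t0 + timedelta(minutes=2), "webshop", "network", "packet loss"),
    Event(t0 + timedelta(minutes=3), "payroll", "apm", "timeout"),
    Event(t0 + timedelta(minutes=30), "webshop", "apm", "slow checkout"),
]
for c in cluster_events(events):
    print(summarize(c))
# webshop: 3 event(s) from apm, db-monitor, network starting 09:00
# payroll: 1 event(s) from apm starting 09:03
# webshop: 1 event(s) from apm starting 09:30
```

Five raw alerts from three tools become three service-level summaries; the first one already shows that database, APM, and network symptoms hit the same service at the same time, which is the context an operator needs to start on the root cause.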

"You can monitor a system using various instrumentation. But if the system doesn’t externalize its state well enough that you can figure out what’s actually going on in there, then you’re stuck."

Ernest Mueller, “Monitoring and Observability”