Nearly Half of IT Ops Professionals Say Preventing and Resolving IT Outages Are Their Biggest Challenges
BigPanda, Inc., provider of the first Event Correlation and Automation platform powered by AIOps, revealed the results of an IDG Research survey conducted in the early days of the pandemic. The study explores the challenges IT Ops, NOC, DevOps, and SRE teams face as their organizations race to capture the digital-led market. The results of the survey show that, in addition to managing complex and ever-changing IT environments with many different tools, teams are now plagued by an increasing volume of IT incidents and outages, which result in customer churn and costly service disruptions.
“An influx of data from multiple tools, coupled with low levels of automation, can have a paralyzing effect on IT incident management processes,” said Jen Garofalo, IDG’s Research Director. “More than 40% of respondents indicate IT incident remediation is handled with a mix of manual and automated processes, while another 20% report these processes are mostly manual.”
Complex Environments Lead to Longer Incident Management Cycles
Nearly one-quarter of respondents (22%) have 20 or more distinct IT teams supporting the different IT and business services at their organizations. On average, enterprises use 20 different monitoring and observability tools to detect potential issues with infrastructure, applications, and services. The average respondent reports that infrastructure is hosted in more than one location, including on-premises infrastructure (60%), public cloud (57%), private cloud (47%), and commercial data centers (24%).
Nearly half of IT Ops professionals (47%) said that coordinating IT incident or outage detection, analysis, and response across siloed IT teams is the biggest challenge they face. Contributing factors include:
- More than 14,000 alerts are generated from IT monitoring tools on average, and nearly two-thirds of respondents (65%) report that alerts have increased in frequency over the past 12 months.
- More than four in 10 alerts (44%) are caused by infrastructure or software changes made by someone in the organization who doesn’t have visibility across all systems to understand the impact of their change.
- Respondents report that it takes an average of 12 hours to determine the root cause of a P1 (major) incident.
The survey also uncovered the largest business impacts of IT incident management challenges: increased operating costs (43%), delays in time to market (42%), and decreased IT Ops productivity (41%).
While all of this is happening, more applications are being built and put into production — nearly three-quarters of respondents (74%) expect Development/DevOps workloads to increase over the next 12 months, with 30% expecting a significant increase.
“For a variety of reasons, the COVID-19 pandemic is accelerating the pace at which enterprises are digitally transforming. This, in turn, increases the challenge facing IT Operations teams to keep their companies running smoothly,” said Assaf Resnick, co-founder and Chief Executive Officer for BigPanda. “The IDG report clearly shows that corporate executives are not just driving business teams to increase their digital footprint – they are doubling-down on IT’s parallel effort to adopt AI and automation in order to support those new revenue-generating initiatives.”
A majority of respondents (79%) expect budgets for IT Operations to increase over the next 12 months (34% significantly, 45% somewhat). This will be reflected in multiple areas including automating IT incident management, increasing communication/knowledge sharing, and improving IT monitoring and event correlation, all of which were cited by more than 50% of respondents.
Meanwhile, most respondents have heard the term AIOps, and 44% are considering or already have a solution with AIOps in place. This group is most likely to use AIOps to automate IT incident response. Overall, respondents are most interested in the potential of AIOps to accelerate IT incident and outage resolution.