Automation and Full Integration for Service Management Keys to DevOps-driven Digital Transformation
Automation Remains Priority for More Than Half of DevOps, ITOps and SRE Pros as Downtime Costs Rise, Says Second Annual DevOps Automation Survey
Automation and full integration for Service Management have a significant impact on the DevOps lifecycle in digital transformation journeys. According to a recent report, automation can affect the outcomes of workflows linked to digital transformation.
Transposit, the company that delivers connected workflow for DevOps, today released its 2022 State of DevOps Automation Report, which reveals the increasing need for automation and site reliability engineering (SRE) practices as organizations continue to adopt hybrid work environments and execute digital transformation initiatives at a pace as accelerated as the prior year’s. Nearly half of respondents (48.2%) pointed to automation as a way to decrease Mean Time to Resolution/Repair (MTTR) and improve service management, which aligns with the results of the 2021 study, where 52.3% said they would implement new automation tools or applications. This year’s findings were collected from 1,046 engineering, IT Operations, DevOps and Site Reliability Engineering professionals in the United States with the role of VP, Director, Manager or individual contributor at organizations with over 300 employees.
The following survey results highlight the ongoing impact of downtime-producing incidents on organizations:
- 7% of organizations said their cost of downtime has increased during the last year (March 2021 to now).
- 2% reported that downtime (i.e., application outages, service degradation) cost their organization up to $499,999 per hour on average.
“Organizations need to deliver innovation faster and more efficiently than ever before. However, too many SRE, ITOps and DevOps teams are wasting time on disconnected, manual processes and playing a reactive game of whack-a-mole as they try to keep applications running. This has the opposite of the desired effect on organizations looking to accelerate digital transformation,” said Divanny Lamas, CEO of Transposit.
Lamas added, “This year’s study clearly points to the impact that automation can make on an organization’s resources and talent. Implementing automation that harnesses human judgment has the power to improve service reliability, boost innovation and ultimately improve customer experience.”
Full Integration for Service Management: The X-factor for Ops and Engineering Teams as the Push for Digital Transformation and Hybrid Work Persists
Digital transformation remains at the core of business strategy, with 9 in 10 organizations increasing their focus on these initiatives during the last year. This increase, combined with the rise of hybrid work, has driven Ops teams to modernize their tech stack, adding to the complexity of operations workflows. Despite the addition of new tools, organizations still lack full integration of the platforms and services used during incident response, which is having a dampening effect on customer experience and the business overall.
In fact, when asked how well integrated the tools they use during incident response are, only a quarter (24.7%) of respondents said their tools are integrated through one tool or platform. The remaining 75.3% of organizations lack full integration of incident management tools, so their teams run the risk of slow detection and analysis, delayed response and operational impact, and a decrease in service reliability. For the 3 in 4 organizations that have adopted a hybrid workforce model since the pandemic, implementing a continuous integration workflow for incident response in service management is key to minimizing downtime and resolving incidents faster.
Service Incidents Are Becoming More Complex and Challenging to Solve
Ops teams are encountering more service incidents and are experiencing challenges while trying to solve them, including difficulties reaching people with specialized knowledge, inadequate support from collaboration methods and tools, and lack of automation. A worrisome 62.9% of respondents reported an increase in the frequency of service incidents that have affected their customers over the course of the last year. Although this is a decrease (of 27.5%) from last year’s study—which means some organizations have adjusted to massive digital transformation brought by the pandemic—there’s still a significant majority reporting an increase in the frequency of service incidents, and they need more support solving them.
Service incidents are becoming more complex and taking longer to resolve. Survey respondents who reported an increase in the amount of time it takes to resolve incidents cited the following as the top three contributing factors:
- Lack of unified communication with teammates (people are collaborating using disparate tools) – 45.2%
- Processes have changed or are harder to follow while working remotely – 41.5%
- Lack of visibility into dependencies and what teams or people are responsible for the code or infrastructure – 38.8%
During a service incident, speed is of the essence and bringing the right team members together is critical. Almost half (48.6%) of all respondents said it takes between 15 and 30 minutes to tap the right people to solve an incident. The delay is longer for SRE respondents in VP/Director/Manager roles, nearly one-third (30%) of whom say it takes 31 minutes to one hour. Incident response must get faster at coordinating all stakeholders to successfully identify next steps and recover from an event.
The focus on SRE efforts directly impacts the number of service incidents an organization can experience in a given year. Interestingly, nearly 1 in 3 respondents (29.4%) from organizations that rely on the operations team for service reliability experienced 20 or more major incidents over the last 12 months. This is a stark contrast to the organizations that have increased their focus on SRE practices and have plans to expand SRE efforts in 2022, with 38.2% of teams having encountered fewer than five major incidents during the same timeframe.
Implementing Automation to SRE Practices Provides Relief for Service Incident Complexity
Better collaboration methods and tools as well as harnessing human judgment are key to successfully enhancing service reliability, resolving incidents faster and expanding automation. Specifically, the need for new tools is evident in the SRE roles to support organizations’ increased focus on site reliability practices. The study revealed that the need for site reliability is on the rise:
- 6% of respondents said there has been an increased focus on site reliability engineering practices in their organization in the past 12 months. And of those, 35.1% plan to expand SRE efforts in 2022.
- 1% of respondents plan to hire site reliability engineers in the next 12 months.
Despite the increasing demand for SRE practices, SREs are still largely leaning on time-intensive processes. Over half of SREs (56.5%) said they manually enter data into an ITSM system or other system of record to keep track of actions that were taken by humans during the resolution of an incident. As organizations scale, manual processes will no longer be able to keep up and teams must turn to automation. In fact, 100% of the respondents with a VP/Director/Manager SRE title who cited a decrease or no change in service incidents said it was because their organization implemented automation technology to help reduce the number of service incidents. However, many organizations still struggle to expand automation, citing as the top barriers inadequate documentation of institutional knowledge and existing processes (56.4%), lack of clarity about what to automate (55%) and insufficient knowledge sharing (51.8%).
“In year one, the pandemic brought about disruption as companies accelerated their digital transformation initiatives, many from a total standstill. Fast forward to today—more than two years after the start of the pandemic—hybrid work has pushed us into an era of hyper digitization and magnified the level of complexity development and operations teams are dealing with,” said Stephen O’Grady, RedMonk co-founder and principal analyst. “The shift highlights a need for tools and platforms, such as human-in-the-loop automation, that help TechOps professionals contend with the increased complexity.”
Automation can take different forms, depending on the needs of an organization. This year’s survey results illustrate that:
- A majority (63%) responded that their approach to automation was incremental automation, in which they begin by codifying processes and work up to more advanced, fully automated scenarios;
- 4% of respondents said automation should let humans use their judgment at critical decision points to be more reliable and effective;
- The top three tasks respondents would like automated were service requests (52.6%), change requests (42.9%) and user provisioning (39.8%).
Lamas observed, “Given the rising costs of downtime and incidents that affect customers, it’s no wonder that nearly half of organizations plan to implement automation to strengthen their service reliability processes and reduce MTTR. Having reliable automation in place can prevent organizations from experiencing a massive drain of resources, time and money and drive sustainable, fast-paced innovation.”