Emerging Tech Trends in AI: AI Eats the Data Center
Artificial intelligence (AI) has proven to be the rocket fuel that consumer-side technologies increasingly turn to, whether for generating bone-crushing hits in football simulation games, understanding and translating all forms of human speech, or generating that perfect playlist seeded by your unhealthy love of Steely Dan. As smart consumer applications become ubiquitous, it is not surprising that we are seeing learning systems explode into the enterprise IT landscape.
The injection of AI into many standard enterprise IT use cases has already revolutionized smart security operations, AI-enhanced IT operations (AIOps), robotic process automation, data modeling, learning data preparation and smart edge analytics. The spread of AI into other enterprise IT use cases will only accelerate in the coming year. But let’s look a bit deeper at how AI will fundamentally change the data center itself.
AI for Data Centers – We expect to see data centers couple AI with data center infrastructure management (DCIM) systems to provide smart data center operations. For example, in 2016, Google reported that DeepMind's machine learning had been used to observe and recommend control tweaks to fans, ventilation and cooling equipment across its data centers, cutting the energy used for cooling by up to 40% [1]. Then, in 2018, Google turned over full control to its self-taught algorithm to autonomously adjust data center environmental controls, observe the result, learn and get smarter [2]. The stage is now set for smart DCIM tools to migrate from the large cloud providers into the co-location data center hosting companies and from there into private data centers.
Next on the horizon, smart DCIM will virtually relocate heat-generating compute loads across row and rack locations for optimal temperature control. Smart products are emerging that vary local target temperatures based on evolving hardware tolerances, power consumption and cost trends, and transient workloads. With data center power and cooling consuming 1.8% of all power in the US alone [3], the cost savings realized from AI-driven power distribution and management are colossal.
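To make the idea of relocating heat-generating loads concrete, here is a minimal sketch of one simple heuristic: place the hottest workloads first, each onto the rack that currently carries the least heat. This is an illustrative greedy balancer, not how any particular DCIM product works. The function name, the workload names and the wattage figures are all hypothetical.

```python
import heapq


def place_workloads(workload_watts, rack_count):
    """Greedy heat balancing (illustrative only).

    Assigns each workload, largest heat output first, to the rack with
    the lowest running total. Returns (placement, totals), both keyed
    by rack index.
    """
    # Min-heap of (current_watts, rack_index) so the coolest rack pops first.
    heap = [(0, rack) for rack in range(rack_count)]
    heapq.heapify(heap)
    placement = {rack: [] for rack in range(rack_count)}

    hottest_first = sorted(workload_watts.items(), key=lambda kv: kv[1], reverse=True)
    for name, watts in hottest_first:
        load, rack = heapq.heappop(heap)
        placement[rack].append(name)
        heapq.heappush(heap, (load + watts, rack))

    totals = {
        rack: sum(workload_watts[name] for name in names)
        for rack, names in placement.items()
    }
    return placement, totals


# Hypothetical workloads and their heat output in watts.
demo = {"train-a": 900, "train-b": 700, "web": 400, "db": 300, "cache": 200}
placement, totals = place_workloads(demo, rack_count=2)
print(placement)
print(totals)
```

A real system would also have to weigh airflow, rack position, migration cost and hardware tolerances, which this sketch deliberately ignores.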
Looking further down the road, smart DCIM systems and sensors will not only monitor heat, airflow, vibration, ultrasound, power consumption, water and smoke to detect anomalous system behavior but will also determine the source and cause of the issue [4]. Future smart DCIM systems will not only say when, where and why something failed but will also predictively alert operators before things go awry [5] and, eventually, intervene autonomously.
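As a rough illustration of what detecting anomalous behavior from sensor data can mean at its simplest, the sketch below flags any reading that strays too far from the recent history of the same sensor, using a rolling z-score. It is a deliberately basic statistical stand-in, not the learning systems described above. The function name, window size, threshold and temperature readings are assumptions chosen for the example.

```python
from statistics import mean, stdev


def find_anomalies(readings, window=5, threshold=3.0):
    """Return indices of readings that deviate from the mean of the
    preceding `window` readings by more than `threshold` standard
    deviations (illustrative rolling z-score)."""
    anomalies = []
    for i in range(window, len(readings)):
        history = readings[i - window:i]
        mu = mean(history)
        sigma = stdev(history)
        if sigma == 0:
            # Perfectly flat history: flag any change at all.
            if readings[i] != mu:
                anomalies.append(i)
            continue
        if abs(readings[i] - mu) / sigma > threshold:
            anomalies.append(i)
    return anomalies


# Hypothetical inlet temperatures in degrees Celsius with one spike.
temps = [22.0, 22.1, 21.9, 22.0, 22.2, 22.1, 30.5, 22.0]
print(find_anomalies(temps))
```

Note that this only says when a single sensor misbehaved. Determining where and why, as the paragraph above anticipates, would require correlating many sensors and a model of the facility.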
Data Centers for AI – As the applications housed in the data center increasingly use AI models as the core of their decision-making functions, the software development lifecycle (SDLC) itself is changing. In traditional applications, programmatic changes are rigorously verified and deployed in a controlled, repeatable and unidirectional fashion from non-production into production. AI-based applications, however, do not rely solely on code changes or one-way deployment. Rather, AI apps evolve smarter and smarter models with very few changes to the underlying application. Additionally, many AI-based applications continue to train after deployment and propagate their learnings back into the development environment via a bi-directional pipeline. Today’s CI/CD pipelines are not designed for this type of train-heavy, test-light, two-way deployment, often referred to as MLOps, and will need to be rethought. The widespread adoption of MLOps will also change networking from a moat protecting production workloads into a more fluid and permeable micro-segmented mesh.
AI training can drink up an ocean of computation. This is increasingly provided by non-CPU-centric servers built on massive arrays of GPUs, FPGAs, custom ASICs or purpose-built deep learning processing units that greatly reduce training time. These compute engines have an unquenchable thirst for power, with many of today’s systems gobbling up 30-50 kW/rack and next-gen systems estimated to reach a staggering 100 kW/rack. This is simply not supportable at scale by most of today’s data centers without substantial power and cooling re-engineering.
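A quick back-of-the-envelope calculation shows why these densities strain existing facilities. The sketch below divides a fixed IT power budget by the per-rack draw. The 30, 50 and 100 kW/rack figures come from the paragraph above, while the 1 MW budget and the 10 kW/rack legacy density are illustrative assumptions, not figures from this article.

```python
def racks_supported(facility_kw, kw_per_rack):
    """Number of whole racks a fixed IT power budget can feed."""
    return int(facility_kw // kw_per_rack)


FACILITY_KW = 1000  # assumption: hypothetical 1 MW IT power budget

densities_kw = {
    "legacy rack (assumed)": 10,  # assumption, for comparison only
    "AI rack today, low end": 30,
    "AI rack today, high end": 50,
    "AI rack, next-gen estimate": 100,
}

for label, kw in densities_kw.items():
    count = racks_supported(FACILITY_KW, kw)
    print(f"{label}: {kw} kW/rack -> {count} racks")
```

Under these assumptions, the budget that feeds 100 legacy racks feeds only 10 next-gen AI racks, and that is before any extra cooling load is counted.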
These mega-computers operate only as fast as the training data provided to them, and AI algorithms require an inordinate amount of training data. This mountain of training data will need to be housed in near-line storage that is at once large, cheap and lightning-fast. This is triggering a new storage arms race of faster controllers, protocols (e.g., NVMe and NVMe-oF) and media (e.g., 3D XPoint and 3D NAND). These will equip training environments with storage an order of magnitude faster than that of their production counterparts. This is the complete opposite of how storage is allocated in data centers today.
It is clear that AI-centric non-production environments will be built with more compute and storage horsepower than production, reversing the time-honored tradition of building non-production environments from retired production hand-me-downs. Shiny new compute and storage platforms will be deployed directly into development and training environments, requiring a radical transformation of server, network and storage infrastructure topologies throughout the data center.
So, while AI-based applications are changing what is housed inside the data center, AI is also changing the operation and infrastructure of the data center itself at a very fundamental level. Far-sighted data center operators will transition to support AI-based applications while leveraging AI to streamline and automate data center operations.
Mark Campbell is the chief innovation officer at Trace3, where he combines an insider’s advantage from leading venture firms with his 25 years of real-world IT experience to help enterprises discover, vet and adopt emerging technologies. His “from the trenches” perspective gives him the material for his frequent articles and speaking engagements. Mark will speak about “Emerging Tech Trends in AI” at EVOLVE, Trace3’s annual leadership and technology conference, which takes place in Las Vegas, May 7-9.