In-Memory Computing: An Efficient, Cost-Effective Path to Continuous Learning Applications
An In-Memory Computing Platform Enables Companies to Drive Optimal Decision Making Powered by Integrated Machine Learning
To remain competitive in an increasingly digital world, companies must be born as, or become, digitally driven enterprises. They must be able to process, analyze and act on massive amounts of data in real time to drive business success. Doing so requires drastically shrinking the time from data processing to analysis to action by employing an in-memory computing platform with a continuous learning capability, which eliminates the delays introduced by today’s time-consuming extract, transform and load (ETL) process.
Consider the following applications:
- Financial fraud mitigation – Successfully thwarting fraudulent credit applications requires that banks detect a new fraud vector as quickly as possible. Major banks often request that the leading credit card companies issue hourly updates to their fraud detection models. Nightly retraining of the machine learning model, based on data moved to an analysis database via an ETL process, would leave the bank vulnerable to new fraud vectors for a full day, well outside the SLA required by the banks issuing the cards.
- Information security – Predictive analytics for network and data security is an effective strategy, but it requires a frequently updated model of normal network activity in order to detect a new anomalous threat. Since normal activity in large networks can evolve rapidly—due to the addition of new types of devices, endpoints, or protocols, for example—less frequent model updates translate to increased vulnerability.
- Recommendation engines – Improving the relevance of e-commerce and media recommendations requires machine learning models trained on the browsing history of thousands or even millions of site visitors, along with their purchase history, product information and availability data, and even trend information from social media sites. For optimal performance, the machine learning model driving the recommendation engine must be frequently updated with the latest data collected from these sources, as well as with new products and new web pages. Less frequent updates mean less relevant recommendations.
- Next-generation spam filters – A simple rules-based spam filter is easily outwitted by spammers who simply update their messages to get around the rules. A next-generation spam filter based on a frequently updated machine learning model can automatically adapt to the content of messages, the metadata, and user interactions to determine the parameters that allow it to identify and block messages that are likely spam.
Given the amount of data companies contend with and the speed of business today, increasing the frequency at which a machine learning model is updated can be critical to business success. Frequently updating a machine learning model may require a continuous learning framework composed of:
- A distributed in-memory computing platform that provides the speed and scalability needed to process, analyze and act on massive amounts of data in real time while also eliminating the need for ETL.
- Machine learning training algorithms running on the distributed in-memory computing platform so they can directly access the massive amounts of operational data and retrain the machine learning model at any time without impacting system performance.
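The second element above can be sketched in a few lines. The example below is purely illustrative, not a specific platform API: it shows an online perceptron that is updated incrementally as each new record arrives in memory, rather than being retrained nightly from an ETL'd snapshot. The class and stream are hypothetical stand-ins for the platform's data and compute layers.

```python
# Minimal sketch of continuous learning: the model is updated record by
# record as data lands in the in-memory store, so retraining can happen
# at any time without a bulk ETL pass. All names here are illustrative.

class OnlinePerceptron:
    def __init__(self, n_features):
        self.w = [0.0] * n_features   # one weight per feature
        self.b = 0.0                  # bias term

    def predict(self, x):
        s = sum(wi * xi for wi, xi in zip(self.w, x)) + self.b
        return 1 if s > 0 else 0

    def partial_fit(self, x, y, lr=0.1):
        # One incremental update; no full-dataset pass is required.
        err = y - self.predict(x)
        if err:
            self.w = [wi + lr * err * xi for wi, xi in zip(self.w, x)]
            self.b += lr * err

# Simulated stream of (features, label) records arriving in memory.
stream = [([1.0, 0.0], 1), ([0.0, 1.0], 0), ([1.0, 0.2], 1), ([0.1, 1.0], 0)]
model = OnlinePerceptron(n_features=2)
for x, y in stream:                   # learn from each record as it arrives
    model.partial_fit(x, y)
```

In a real deployment the update step would run colocated with the data on the cluster nodes, so the model can be refreshed continuously without moving the operational data.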
In-Memory Computing for Application Speed and Scalability
Most organizations today cannot achieve the speed and scalability required to reach their digital transformation goals because they still rely on a bifurcated infrastructure and database model. The ETL process required to periodically move data from an online transactional processing (OLTP) database to an online analytical processing (OLAP) database introduces delays that frustrate their requirements for real-time data ingest, analysis and action.
However, a new generation of in-memory computing platforms that can function as a hybrid transactional/analytical processing (HTAP) system eliminates the need for separate transactional and analytical databases. An in-memory computing platform deployed on a cluster of commodity servers pools the available CPUs and RAM and distributes data and compute across the cluster. The cluster can be deployed on-premises, in a public or private cloud, or in a hybrid environment.
This in-memory computing platform can be deployed as an in-memory data grid (IMDG) inserted between an existing application and database—without ripping out and replacing the existing database. Once the data from the underlying database loads into the IMDG, the IMDG processes all the reads and writes. New transactions are sent by the application layer to the IMDG, which then writes them to the underlying database, ensuring consistency and availability of the data. Because all data is held in memory and processed in massively parallel fashion across the distributed cluster, processes can run up to 1,000x faster than when the application must constantly interact directly with the underlying disk-based database.
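The read/write path just described follows a standard read-through/write-through caching pattern. The sketch below is a minimal, single-node illustration of that pattern only; the class names and store interface are hypothetical, not any vendor's API.

```python
# Illustrative read-through/write-through sketch: an in-memory grid
# fronting a slower backing store, so reads are served from RAM and
# writes are propagated to the underlying database for consistency.

class DiskStore:
    """Stand-in for the underlying disk-based database."""
    def __init__(self):
        self.rows = {}

    def read(self, key):
        return self.rows.get(key)

    def write(self, key, value):
        self.rows[key] = value

class InMemoryDataGrid:
    def __init__(self, backing_store):
        self.cache = {}               # data held in RAM
        self.store = backing_store

    def get(self, key):
        # Read-through: serve from memory, loading from disk on a miss.
        if key not in self.cache:
            value = self.store.read(key)
            if value is not None:
                self.cache[key] = value
        return self.cache.get(key)

    def put(self, key, value):
        # Write-through: update memory, then persist to the underlying
        # database so it stays consistent with the grid.
        self.cache[key] = value
        self.store.write(key, value)

db = DiskStore()
grid = InMemoryDataGrid(db)
grid.put("account:42", {"balance": 100})
```

The key property is that the application talks only to the grid, while the underlying database remains the durable system of record.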
The distributed architecture of the in-memory computing platform allows the CPU power and RAM of the cluster to be increased simply by adding nodes. The platform can automatically detect the additional nodes and redistribute data to ensure that all the cluster's CPU and RAM are used optimally. This unified architecture for transactions and analytics is referred to as HTAP, or as hybrid operational/analytical processing (HOAP). Gartner refers to an HTAP system with a continuous learning capability as “in-process HTAP.”
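The redistribution step can be illustrated with simple hash partitioning. This is a deliberately simplified sketch, not how any particular platform works: keys are assigned to nodes by a deterministic hash, and adding a node reassigns a share of them. Real platforms typically use consistent hashing or fixed partition maps to limit how much data moves on a topology change.

```python
# Illustrative hash-partitioning sketch: every node can compute the same
# key-to-node assignment, and adding a node triggers a redistribution.
import zlib

def owner(key, nodes):
    # Deterministic hash so placement is reproducible cluster-wide.
    return nodes[zlib.crc32(key.encode()) % len(nodes)]

def redistribute(data, nodes):
    # Assign every key-value pair to its owning node.
    placement = {n: {} for n in nodes}
    for key, value in data.items():
        placement[owner(key, nodes)][key] = value
    return placement

data = {f"key{i}": i for i in range(100)}
before = redistribute(data, ["node1", "node2"])
after = redistribute(data, ["node1", "node2", "node3"])  # scale out
```

Note that with naive modulo hashing, most keys change owners when the node count changes, which is exactly why production systems prefer schemes that move only a fraction of the data.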
Gartner has predicted that by 2020, in-memory computing will be incorporated into most mainstream products.