Machine Learning’s Vital Role in Malware Detection
The use of Machine Learning for threat detection is essential to counter the massive growth in malware. AV-Test, an independent research institute for IT security, reports registering an astonishing 350,000 new malware samples every day, and it has counted over 972 million malware specimens currently in circulation.
The cybersecurity industry invests billions of dollars in recognizing and defending against malware. Cybercriminals, in turn, keep tweaking their code and techniques in an effort to slip past those defenses.
Not all malware is equal, however. Simpler malware is easily detected by its signature, a known pattern such as a file hash or byte sequence. Some malware shares underlying code structures that suggest it belongs to a ‘family’ of malware, so it can usually be identified as well. More often, though, malware comes from sophisticated developers and is designed to evade simple signature detection. The increasing velocity, volume, and complexity of malware all pose significant challenges.
Modern anti-malware software uses behavioral heuristics to spot suspicious characteristics that indicate new viruses or modified versions of existing threats. These heuristics are generally layered over signature-based detection, which blocks the millions of known malware samples.
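As a rough illustration of this layering, the following Python sketch first checks a file's SHA-256 hash against a set of known-bad digests. If there is no match, it falls back to a simple heuristic score. The signature set, the marker strings, their weights, and the threshold are all hypothetical values chosen for the example. Real products use far larger databases and much richer behavioral rules.

```python
import hashlib

# Hypothetical signature database: SHA-256 digests of known-bad files.
KNOWN_BAD_SHA256 = {
    hashlib.sha256(b"known malicious payload").hexdigest(),
}

# Hypothetical heuristic rules: byte patterns loosely associated with
# suspicious behavior (process injection), each with an illustrative weight.
SUSPICIOUS_MARKERS = {
    b"VirtualAllocEx": 2,      # allocating memory in another process
    b"WriteProcessMemory": 2,  # writing into another process
    b"CreateRemoteThread": 3,  # starting a thread in another process
}
HEURISTIC_THRESHOLD = 5  # hypothetical cut-off


def scan(data: bytes) -> str:
    """Return 'malicious (signature)', 'suspicious (heuristic)', or 'clean'."""
    # Layer 1: exact signature match against known samples.
    if hashlib.sha256(data).hexdigest() in KNOWN_BAD_SHA256:
        return "malicious (signature)"
    # Layer 2: heuristic score for content the signature layer has never seen.
    score = sum(weight for marker, weight in SUSPICIOUS_MARKERS.items() if marker in data)
    if score >= HEURISTIC_THRESHOLD:
        return "suspicious (heuristic)"
    return "clean"
```

Appending even one byte to the known sample changes its hash, so `scan(b"known malicious payload!")` no longer matches the signature layer. That kind of gap is what the heuristic layer, and ultimately Machine Learning, is meant to cover.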
Machine Learning in cybersecurity goes further by enabling anti-malware software to learn which files are malicious and which are benign from patterns found in large corpora of known-good and known-bad files. The model decides whether analyzed code is potentially harmful based on learned features such as code structure, content, source, recency, entropy, and behavior. As is typical in Machine Learning, features do not carry equal weight, and the relevant features and their weights change over time. Success therefore depends on large, dynamic pools of data that are constantly updated.
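The sketch below illustrates the idea with two hypothetical features. The first is the Shannon entropy of a file's bytes, since packed or encrypted code tends to have high entropy. The second is whether a suspicious API name appears in the file. A logistic regression model, trained by plain gradient descent on a tiny toy corpus, learns how much weight each feature deserves. The feature choice, the marker string, and the training samples are assumptions made for illustration only. A production system would train on vastly more files and features.

```python
import math
from collections import Counter


def shannon_entropy(data: bytes) -> float:
    """Shannon entropy of the byte distribution, in bits per byte (0.0 to 8.0)."""
    if not data:
        return 0.0
    n = len(data)
    return -sum((c / n) * math.log2(c / n) for c in Counter(data).values())


def features(data: bytes) -> list:
    """Hypothetical 2-feature vector: normalized entropy and a suspicious-API flag."""
    return [
        shannon_entropy(data) / 8.0,
        1.0 if b"CreateRemoteThread" in data else 0.0,
    ]


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def train_logistic(X, y, lr=0.5, epochs=10000):
    """Fit logistic-regression weights and bias by batch gradient descent on mean log-loss."""
    w = [0.0] * len(X[0])
    bias = 0.0
    for _ in range(epochs):
        grad_w = [0.0] * len(w)
        grad_b = 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + bias) - yi
            grad_w = [g + err * xj for g, xj in zip(grad_w, xi)]
            grad_b += err
        w = [wj - lr * g / len(X) for wj, g in zip(w, grad_w)]
        bias -= lr * grad_b / len(X)
    return w, bias


def predict_proba(w, bias, data: bytes) -> float:
    """Estimated probability that data is malicious."""
    x = features(data)
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + bias)


# Toy labeled corpus (hypothetical): 0 = benign, 1 = malicious.
benign = [b"hello world, plain text document", b"A" * 64, b"config=1\nmode=fast\n"]
malicious = [bytes(range(256)), bytes(range(256)) + b"CreateRemoteThread", b"CreateRemoteThread"]

X = [features(s) for s in benign + malicious]
y = [0] * len(benign) + [1] * len(malicious)
w, bias = train_logistic(X, y)
```

Calling `predict_proba(w, bias, data)` on a new file returns a score between 0 and 1. With two features and six samples the learned weights carry no real-world meaning. Still, the same training loop applies with many features, and there the learned weights typically differ widely, which is the unequal weighting described above.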
Code that appears benign may still carry traits that the Machine Learning system considers possible indicators of malware. By comparing a file's code properties against those of the files in its database, the algorithm can recognize such traits even in code it has never seen before. This matters because previously unseen code may be a new threat such as a zero-day, one of the most dangerous types of malware.
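One simple way to compare a new file against a database of known samples is a nearest-neighbor lookup. The sketch below is only a stand-in for the learned features a real system would use. It represents each file as a set of byte 4-grams, scores similarity with the Jaccard index, and uses a hypothetical two-entry database and an assumed similarity threshold of 0.5.

```python
def ngrams(data: bytes, n: int = 4) -> set:
    """Return the set of all overlapping n-byte substrings of data."""
    return {data[i:i + n] for i in range(len(data) - n + 1)}


def jaccard(a: set, b: set) -> float:
    """Jaccard similarity |A & B| / |A | B| (defined as 1.0 for two empty sets)."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


# Hypothetical database of labeled samples: name -> (bytes, label).
KNOWN_SAMPLES = {
    "family_x_variant_1": (b"connect c2.example; download stage2; inject explorer.exe", "malicious"),
    "text_editor_macro": (b"open document; render page; save document", "benign"),
}


def nearest_known(data: bytes, db: dict = KNOWN_SAMPLES, n: int = 4):
    """Return (name, label, similarity) of the most similar known sample."""
    query = ngrams(data, n)
    best = (None, None, 0.0)
    for name, (sample, label) in db.items():
        score = jaccard(query, ngrams(sample, n))
        if score > best[2]:
            best = (name, label, score)
    return best


def classify(data: bytes, threshold: float = 0.5) -> str:
    """Label unseen data by its nearest known sample, or 'unknown' if nothing is close."""
    _, label, score = nearest_known(data)
    return label if score >= threshold else "unknown"
```

A variant that changes a single character, such as `b"connect c2.example; download stage3; inject explorer.exe"`, has a different hash but remains highly similar to `family_x_variant_1`. The sketch therefore labels it malicious. Unrelated input falls below the threshold and is reported as unknown.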
Zero-day refers to a newly discovered software vulnerability. The name reflects the fact that the developer has only just learned of the flaw and has had zero days to fix it, so no official patch or update has been released yet. The vendor may fail to release a patch before attackers manage to exploit the security hole, and zero-day attacks can rapidly affect millions of systems around the world.
One of the most infamous examples of a zero-day attack is Stuxnet, a computer worm that exploited several previously unknown Windows vulnerabilities and was used to sabotage the Iranian nuclear program. It was never intended to spread beyond the Iranian nuclear facility at Natanz. Because of its extremely sophisticated and aggressive design, however, it spread to systems all over the world.
Stuxnet was first identified in 2010, before Machine Learning came into widespread use in cybersecurity. Had Machine Learning been widely deployed at the time, it would likely have prevented the unintended Stuxnet infections of systems beyond its Iranian target. Despite the sophistication of Stuxnet's code, a modern Machine Learning model could weigh its unfamiliar traits against patterns learned from known benign code and conclude that it was malware.
Machine Learning malware detection is on the front lines of the effort to defend against malware. It helps identify and neutralize zero-day attacks, supported by traditional signature-based detection and behavioral analysis. This layered approach is increasingly important in a constantly evolving threat landscape where new malware and attack methods surface every day. It also allows cybersecurity teams to be more proactive, preventing threats and responding to active attacks in real time.