In this new series, we will look at the machine learning research taking place in the malware analysis domain and see where the individual pieces of research fit into the “big picture” of malware analysis. Look for new posts every Friday!
Machine learning is about computers learning from data. The classic illustrative example is a spam filter. A machine learning based spam filter is shown many examples of both spam and legitimate email. From these examples, the filter “learns” what spam and regular emails each look like and is thus able to distinguish between the two. Of course, the details of machine learning are much more complicated than this simple example, but it serves well to illustrate the basic idea.
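To make the spam filter idea concrete, here is a minimal sketch of one way such a filter could be built: a naive Bayes classifier with add-one smoothing, written in plain Python. The technique, the function names `train` and `classify`, and the four training messages in `EXAMPLES` are all assumptions made for illustration; real filters use far more data and more careful features.

```python
import math
from collections import Counter

# Hypothetical, hand-written training messages (illustration only).
EXAMPLES = [
    ("win free money now", "spam"),
    ("free prize claim now", "spam"),
    ("meeting agenda for monday", "ham"),
    ("lunch on monday with the team", "ham"),
]

def train(examples):
    """Count how often each word appears in each class of message."""
    word_counts = {}
    doc_counts = Counter()
    for text, label in examples:
        doc_counts[label] += 1
        word_counts.setdefault(label, Counter()).update(text.lower().split())
    vocab = set()
    for counts in word_counts.values():
        vocab.update(counts)
    return word_counts, doc_counts, vocab

def classify(model, text):
    """Return the label whose learned word statistics best explain the text."""
    word_counts, doc_counts, vocab = model
    total_docs = sum(doc_counts.values())
    scores = {}
    for label, counts in word_counts.items():
        # Start from the log prior: how common this class was in training.
        score = math.log(doc_counts[label] / total_docs)
        total_words = sum(counts.values())
        for word in text.lower().split():
            if word not in vocab:
                continue  # ignore words never seen during training
            # Add-one smoothing keeps unseen word/class pairs from zeroing the score.
            score += math.log((counts[word] + 1) / (total_words + len(vocab)))
        scores[label] = score
    return max(scores, key=scores.get)

model = train(EXAMPLES)
print(classify(model, "claim your free money"))
print(classify(model, "agenda for the team meeting"))
```

The "learning" here is nothing more than counting words per class; the counts are the model, and classification asks which class makes the new message most probable.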
There are two basic uses for machine learning in malware analysis: automation and knowledge discovery. The spam filter example above is a good example of using machine learning for automation. While it is conceivable that a human could act as a spam filter, a human simply doesn’t scale (not to mention the privacy concerns). Automation allows us to do things on a larger scale than would otherwise be possible. Other examples of automation in malware analysis include detecting malware, prioritizing samples for analysis, and automatically generating detection signatures.
The second use of machine learning in malware analysis is knowledge discovery. The way a machine learning algorithm “learns” is to construct a model of the concept it is learning based on the data it is examining. What this model looks like can be very informative in and of itself. For example, in one paper researchers used machine learning to learn what “bad” behavior looks like. These learned behavioral patterns can then help guide a human analyst when looking at new malware.
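As a rough illustration of reading knowledge out of a learned model, the sketch below ranks behaviors by how much more often they appear in “bad” samples than in the rest, using smoothed log-odds. This is not the method from the paper mentioned above; the scoring technique, the function name `top_indicators`, the behavior names, and the six samples are all hypothetical and chosen only to show the idea.

```python
import math
from collections import Counter

# Hypothetical behavior sets observed for six made-up samples.
SAMPLES = [
    ({"write_autorun_key", "open_file", "connect_network"}, "bad"),
    ({"write_autorun_key", "inject_process", "open_file"}, "bad"),
    ({"inject_process", "connect_network", "write_autorun_key"}, "bad"),
    ({"open_file", "read_config"}, "good"),
    ({"open_file", "connect_network", "read_config"}, "good"),
    ({"open_file", "show_window"}, "good"),
]

def top_indicators(samples, label, k=3):
    """Rank behaviors by smoothed log-odds of appearing in `label` samples."""
    inside, outside = Counter(), Counter()
    n_in = n_out = 0
    for behaviors, sample_label in samples:
        if sample_label == label:
            inside.update(set(behaviors))
            n_in += 1
        else:
            outside.update(set(behaviors))
            n_out += 1
    scores = {}
    for behavior in set(inside) | set(outside):
        # Add-one smoothing on the per-sample presence rates.
        p_in = (inside[behavior] + 1) / (n_in + 2)
        p_out = (outside[behavior] + 1) / (n_out + 2)
        scores[behavior] = math.log(p_in / p_out)
    # Highest score first; ties broken alphabetically for stable output.
    return sorted(scores, key=lambda b: (-scores[b], b))[:k]

print(top_indicators(SAMPLES, "bad", k=2))
```

On this toy data the two highest-ranked behaviors are `write_autorun_key` and `inject_process`, which is the kind of hint an analyst could use to decide where to look first in a new sample.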
Throughout this series, we will look at the various problems that researchers have solved, or are attempting to solve, using techniques and concepts from machine learning. Our particular focus will be on showing how solving these problems fits into the big picture of malware analysis. Next week we will take a look at what this “big picture” is.