There are several ways to analyze Android applications for suspicious behavior, typically categorized as static or dynamic analysis. Static analysis evaluates code without executing it, while dynamic analysis observes the behavior of code during execution. This article discusses current dynamic analysis techniques for Android applications and the open problems associated with them.
A significant challenge for dynamic analysis is input generation. Many malicious applications initiate suspicious activity regardless of user interaction, but stealthier attacks may wait for the user to perform a predefined action before activating. To detect this type of malware, or simply to test an application thoroughly, we need systems that can simulate human input.
These systems are evaluated by the code coverage they achieve. Code coverage is the proportion of a program's code that has been executed during testing. For example, a program tested with 100% code coverage has executed every part of its code at least once. This does not mean the test has considered every possible value for each variable, but code coverage is still a useful metric for comparing input generation systems.
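To make the metric concrete, here is a minimal Python sketch of line coverage, with Python standing in for app code. `toy_app` is a hypothetical function whose branch only runs for one specific input, and `run_with_coverage` uses the standard `sys.settrace` hook to record which of its lines execute. Real Android coverage tools instrument the app's bytecode instead; this only illustrates the ratio being measured.

```python
import dis
import sys


def toy_app(event: str) -> str:
    # Hypothetical app logic: this branch runs only for one specific input.
    if event == "tap_settings":
        payload = "open settings"
        return payload
    return "idle"


def executable_lines(func) -> set:
    """Source lines of func's body that start at least one bytecode instruction."""
    code = func.__code__
    return {
        line
        for _, line in dis.findlinestarts(code)
        # Skip the `def` line itself and entries with no source line (Python 3.13+).
        if line is not None and line != code.co_firstlineno
    }


def run_with_coverage(func, inputs) -> float:
    """Run func on each input and return the fraction of its lines that executed."""
    code = func.__code__
    executed = set()

    def tracer(frame, event, arg):
        if frame.f_code is not code:
            return None  # do not trace any other function
        if event == "line":
            executed.add(frame.f_lineno)
        return tracer  # keep receiving line events for this frame

    sys.settrace(tracer)
    try:
        for value in inputs:
            func(value)
    finally:
        sys.settrace(None)
    total = executable_lines(func)
    return len(executed & total) / len(total)


print(run_with_coverage(toy_app, ["swipe"]))                  # only the idle path
print(run_with_coverage(toy_app, ["swipe", "tap_settings"]))  # both paths
```

An input generator that never produces `"tap_settings"` leaves half of this function unexercised, which is exactly the gap a trigger-based attack would hide in.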
The Monkey is a tool in the Android SDK that helps developers test applications by producing pseudo-random streams of user input, such as taps and gestures, during program execution. However, the Monkey does not simulate system events such as an incoming phone call. This is a real gap: just as malware can wait for a user action, it can also wait for a system event.
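On a device, the Monkey is launched through adb, for example `adb shell monkey -p <package> -v 500` to send 500 events to one package, and the `-s` option sets a seed so a run can be repeated. The Python sketch below only imitates the idea of seeded pseudo-random UI input; the `UiEvent` type and its event kinds are hypothetical, and, like the Monkey, it never produces system events such as an incoming call.

```python
import random
from dataclasses import dataclass

# Hypothetical UI event kinds; note that no system events (calls, SMS) appear here.
UI_KINDS = ("tap", "long_press", "swipe", "back_key")


@dataclass(frozen=True)
class UiEvent:
    kind: str
    x: int
    y: int


def monkey_events(seed: int, count: int, width: int = 1080, height: int = 1920) -> list:
    """Generate `count` pseudo-random UI events; the same seed gives the same stream."""
    rng = random.Random(seed)
    return [
        UiEvent(rng.choice(UI_KINDS), rng.randrange(width), rng.randrange(height))
        for _ in range(count)
    ]


for event in monkey_events(seed=42, count=5):
    print(event)
```

Seeding matters for testing: if a random stream happens to trigger a crash or suspicious behavior, the same seed reproduces it.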
Dynodroid (Machiry, Tahiliani, Naik) is a more efficient and comprehensive input generation system. Dynodroid, human testers, and the Monkey achieved code coverage of 55%, 60%, and 53%, respectively, and the Monkey generated 20 times more events than Dynodroid, suggesting that much of the Monkey's input is redundant. Dynodroid simulates system events as well as user input, and it allows a human to assist with input generation during testing. An observer determines which events are relevant to the application's current state, which prevents wasted input by ensuring that acceptable input is generated in relevant locations. A selector then chooses among those relevant events and attempts to avoid redundant ones.
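The observe-select-execute cycle described above can be sketched in a few lines of Python. This is a hypothetical simplification, not Dynodroid's code: `observe` is assumed to return the events the app can currently handle (including system events such as an incoming SMS), and the selector picks the least frequently executed of them, breaking ties randomly. That is one simple way to avoid redundant input, not necessarily the strategy Dynodroid uses by default.

```python
import random
from collections import Counter
from typing import Callable, List


def select_least_frequent(relevant: List[str], history: Counter, rng: random.Random) -> str:
    """Pick a relevant event that has been executed the fewest times so far."""
    fewest = min(history[event] for event in relevant)
    candidates = [event for event in relevant if history[event] == fewest]
    return rng.choice(candidates)


def observe_select_execute(
    observe: Callable[[], List[str]],
    execute: Callable[[str], None],
    steps: int,
    seed: int = 0,
) -> List[str]:
    rng = random.Random(seed)
    history: Counter = Counter()
    trace: List[str] = []
    for _ in range(steps):
        relevant = observe()  # only events the app can respond to right now
        event = select_least_frequent(relevant, history, rng)
        execute(event)
        history[event] += 1
        trace.append(event)
    return trace


# Hypothetical app state: two buttons on screen plus a registered SMS receiver.
executed: List[str] = []
trace = observe_select_execute(
    observe=lambda: ["tap_ok", "tap_cancel", "sms_received"],
    execute=executed.append,
    steps=6,
)
print(Counter(trace))
```

Because the selector never repeats an event while a less-used relevant one is available, six steps exercise each of the three events exactly twice, whereas purely random selection could repeat one event many times.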
Several challenges for input generation remain. Although Dynodroid improves on the Monkey, it is less effective than human testing, and none of the three comes close to 100% code coverage. Dynodroid must run in an emulator, but malware can be designed to resist analysis by detecting the emulator and behaving differently inside it (Raffetseder, Kruegel, Kirda, 2007). Games are especially hard to exercise with automatically generated input: humans can follow instructions and solve puzzles, but current input generation systems cannot. An infected game could therefore evade dynamic analysis by waiting until the user beats the first level before doing anything malicious.
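To show why emulator-based analysis is fragile, here is a hypothetical Python sketch of the kind of check evasive malware might perform. The property names `ro.kernel.qemu`, `ro.hardware`, and `ro.build.fingerprint` are real Android system properties that are commonly cited as emulator indicators, but the specific values, the example devices, and the decision logic are illustrative assumptions.

```python
# Commonly cited emulator indicators (illustrative values, not an exhaustive list).
EMULATOR_HINTS = {
    "ro.kernel.qemu": {"1"},
    "ro.hardware": {"goldfish", "ranchu"},
}


def looks_like_emulator(props: dict) -> bool:
    """Return True if any system property matches a known emulator value."""
    if any(props.get(key) in values for key, values in EMULATOR_HINTS.items()):
        return True
    return props.get("ro.build.fingerprint", "").startswith("generic")


def choose_behavior(props: dict) -> str:
    # Evasive logic: stay quiet when analysis is likely, misbehave otherwise.
    return "benign" if looks_like_emulator(props) else "malicious"


# Hypothetical property snapshots.
emulator = {"ro.kernel.qemu": "1", "ro.hardware": "ranchu"}
phone = {"ro.hardware": "qcom", "ro.build.fingerprint": "acme/phone:user/release-keys"}
print(choose_behavior(emulator), choose_behavior(phone))
```

A check this cheap is enough to make an analysis environment observe only benign behavior, which is why emulator fidelity matters for dynamic analysis.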
Some dynamic analysis tools are designed to run alongside untrusted applications during everyday use. These tools can alert the user whenever suspicious behavior is detected, but the stakes are high: the untrusted applications run on the user's own device, which contains personal information and credentials. Real-time monitoring systems can also degrade a device's performance because they consume computational resources of their own. These additional resource costs are referred to as overhead.
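Overhead is usually reported as a relative slowdown: (monitored time - baseline time) / baseline time. This hypothetical Python sketch measures it for a toy workload, using a call-logging wrapper as a stand-in for a real-time monitoring hook; `workload`, `with_monitor`, and `time_calls` are invented for illustration, and measured numbers will vary from run to run.

```python
import time
from typing import Callable, List, Tuple


def relative_overhead(baseline_s: float, monitored_s: float) -> float:
    """Fractional slowdown caused by monitoring, e.g. 0.15 means 15% slower."""
    return (monitored_s - baseline_s) / baseline_s


def with_monitor(func: Callable[[int], int], log: List[Tuple[str, int]]) -> Callable[[int], int]:
    """Wrap func so each call is logged; a stand-in for a real-time monitoring hook."""
    def wrapper(arg: int) -> int:
        log.append((func.__name__, arg))  # the extra work done on every call
        return func(arg)
    return wrapper


def workload(n: int) -> int:
    return sum(i * i for i in range(n))


def time_calls(func: Callable[[int], int], calls: int, n: int) -> float:
    start = time.perf_counter()
    for _ in range(calls):
        func(n)
    return time.perf_counter() - start


log: List[Tuple[str, int]] = []
baseline = time_calls(workload, calls=10_000, n=50)
monitored = time_calls(with_monitor(workload, log), calls=10_000, n=50)
print(f"overhead: {relative_overhead(baseline, monitored):.1%}")
```

For example, a task that takes 200 ms normally and 230 ms under monitoring has an overhead of 30 / 200 = 15%.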
TaintDroid is a taint tracking tool that simultaneously tracks multiple sources of sensitive data. Data read from a sensitive source is marked as tainted, and any data derived from tainted data is also marked as tainted. TaintDroid tracks all tainted data, and if any of it leaves the Android device, the user receives a report logging which data was leaked, where it was sent, and which application leaked it.
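TaintDroid performs this tracking inside a modified Android runtime; the Python sketch below only models the propagation rule and the report generated at a network sink. `Tainted`, `from_source`, `concat`, and `NetworkSink` are hypothetical names, and a real taint tracker must propagate labels through every operation, not just concatenation.

```python
from dataclasses import dataclass, field
from typing import FrozenSet, List


@dataclass(frozen=True)
class Tainted:
    """A value together with the set of sensitive sources it was derived from."""
    value: str
    labels: FrozenSet[str] = frozenset()


def from_source(value: str, label: str) -> Tainted:
    # Data read from a sensitive source starts out tainted with that source's label.
    return Tainted(value, frozenset({label}))


def concat(a: Tainted, b: Tainted) -> Tainted:
    # Propagation rule: a derived value carries the union of its inputs' labels.
    return Tainted(a.value + b.value, a.labels | b.labels)


@dataclass
class NetworkSink:
    """Where data leaves the device; reports any tainted data that reaches it."""
    app: str
    reports: List[dict] = field(default_factory=list)

    def send(self, destination: str, data: Tainted) -> None:
        if data.labels:
            self.reports.append({
                "app": self.app,
                "destination": destination,
                "sources": sorted(data.labels),
                "data": data.value,
            })


device_id = from_source("123456789012345", "IMEI")  # hypothetical device identifier
message = concat(Tainted("id="), device_id)

sink = NetworkSink(app="com.example.flashlight")
sink.send("ads.example.net", Tainted("hello"))  # untainted: no report
sink.send("ads.example.net", message)           # tainted: reported
print(sink.reports)
```

The report carries the same three facts the paragraph describes: what leaked (its source label and value), where it went, and which application sent it.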
Aurasium (Xu, Saïdi, Anderson, 2012) was discussed as a benign use of repackaging in a previous article, but its main purpose is real-time monitoring of untrusted applications. Aurasium repackages an application so that its own monitoring components are embedded inside it, forming a parasitic bond with the host application. These components consume computational resources and closely observe the application's behavior. If the application attempts potentially malicious activity, Aurasium blocks the activity and asks for the user's permission before continuing. For example, if an application tries to send an SMS message, Aurasium shows the message contents and destination and asks whether to send it. This prevents the application from harming the user and tells the user which applications should be removed from the device. One could even run a malicious application safely, as long as it is never granted permission to do anything harmful. Figure 1 shows a notification provided by Aurasium.
Figure 1: Aurasium asking the user’s permission to allow an application to send an SMS message
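The ask-before-proceeding policy can be modeled as a wrapper that sits between the application and a sensitive API. In this hypothetical Python sketch, `SmsSender` stands in for the platform SMS API and `ask_user` for the permission dialog in Figure 1; Aurasium's actual interposition mechanism inside the repackaged application is not modeled here.

```python
from typing import Callable, List, Tuple


class SmsSender:
    """Hypothetical stand-in for the platform API an app would call to send SMS."""

    def __init__(self) -> None:
        self.sent: List[Tuple[str, str]] = []

    def send_text(self, destination: str, body: str) -> None:
        self.sent.append((destination, body))


class GuardedSmsSender:
    """Interposes on SMS sending and asks the user before letting the call through."""

    def __init__(self, real: SmsSender, ask_user: Callable[[str, str], bool]) -> None:
        self.real = real
        self.ask_user = ask_user

    def send_text(self, destination: str, body: str) -> bool:
        # Show the destination and contents, then proceed only on approval.
        if self.ask_user(destination, body):
            self.real.send_text(destination, body)
            return True
        return False


real = SmsSender()
guarded = GuardedSmsSender(real, ask_user=lambda dest, body: False)  # user always denies
guarded.send_text("+15550100", "SUBSCRIBE")  # hypothetical premium-rate message
print(real.sent)  # []
```

Because the application only ever talks to the guarded sender, a denial means the message is never sent, which is what lets a malicious application run without doing harm.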
There are a number of open problems in real-time monitoring of applications. Reducing the overhead of analysis systems is a significant concern: overhead shortens battery life and could make it impractical to run resource-intensive applications such as video games. Keeping these tools simple and secure to use is another concern. TaintDroid requires rooting one's device and installing a customized operating system; many users would be intimidated by such a task or may not trust the new operating system itself. Aurasium requires users to allow installation of applications from sources other than the official Android market, because the repackaged applications it returns to the user do not come from the market. Many users consider installing third-party applications an insecure practice. Aurasium also requires that applications be easy to repackage, so defenses against malicious repackaging, such as tamper-proofing, could make it impractical for Aurasium to repackage applications.
A sandbox is an isolated environment for running untrusted applications in which malicious behavior cannot influence anything outside the protected area. For example, an isolated virtual machine can act as a sandbox for testing malicious applications, letting researchers observe malicious activity without risking infection of other devices or leaking sensitive information.
DroidBox runs Android applications in a sandbox and reports any suspicious activities. Figure 2 shows a chart created by DroidBox of an application's activity over time. By comparing the activity expected of an application with the suspicious activity actually recorded, an analyst can often identify malicious intent.
Figure 2: Suspicious Activity Over Time
Image source: http://code.google.com/p/droidbox/
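A chart like Figure 2 is essentially a count of logged activities per time interval. This hypothetical Python sketch assumes the sandbox has already produced a log of `(timestamp, category)` pairs; the category names are invented and are not DroidBox's actual log format. It bins the log by time and lists recorded categories that fall outside an analyst's expected set.

```python
from collections import Counter, defaultdict
from typing import Dict, Iterable, List, Set, Tuple

Event = Tuple[float, str]  # (seconds since launch, activity category)


def activity_over_time(events: Iterable[Event], bin_s: int) -> Dict[int, Counter]:
    """Count events per category in fixed-width time bins, keyed by bin start."""
    bins: Dict[int, Counter] = defaultdict(Counter)
    for t, category in events:
        bins[int(t // bin_s) * bin_s][category] += 1
    return dict(sorted(bins.items()))


def unexpected_categories(events: Iterable[Event], expected: Set[str]) -> List[str]:
    """Categories that were recorded but are not expected for this kind of app."""
    return sorted({category for _, category in events} - expected)


log = [(1.0, "file_read"), (2.5, "network_send"), (12.0, "sms_send"), (14.2, "sms_send")]
print(activity_over_time(log, bin_s=10))
print(unexpected_categories(log, expected={"file_read", "network_send"}))  # ['sms_send']
```

For a simple game, for instance, a burst of `sms_send` activity ten seconds after launch is exactly the kind of mismatch between expected and recorded behavior that points to malicious intent.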
Andrubis is a similar sandbox-based tool that combines several analysis tools, including DroidBox and TaintDroid. It extends Anubis, an analysis service that performs a similar function for Windows executables. Users can upload an application, and Andrubis uses static and dynamic analysis techniques to generate a report about it.
Google uses an application verification system called Bouncer, which runs applications submitted to the official Android market in a sandbox to test them for malicious activity. Unfortunately, there are several ways for malicious software to evade Bouncer's detection. Oberheide and Miller demonstrated a few of them in a humorous yet thorough presentation (Oberheide, Miller, 2012). Overall, they used three strategies to avoid detection or mitigate its consequences. First, they used prepaid phones, prepaid credit cards, and Amazon EC2 micro instances to create inexpensive, anonymous developer accounts that could be abandoned if detected. Second, they observed that Bouncer ran each application for only 5 minutes, so applications that delayed malicious activity for more than 5 minutes were not detected; this is known as a delayed execution attack. Finally, they proposed multiple strategies for detecting the Bouncer environment: a malicious application that detected Bouncer would remain benign, and outside of Bouncer it would not restrain itself.
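A delayed execution attack needs nothing more than a timer check. In this hypothetical Python sketch, time is simulated rather than real, and `DelayedPayloadApp` and `observe` are invented for illustration. It shows why a fixed 5-minute observation window misses a payload that waits 6 minutes, while a longer run catches it at the cost of more testing time.

```python
from dataclasses import dataclass
from typing import List, Set

SANDBOX_WINDOW_S = 5 * 60  # Bouncer's observation window as reported by Oberheide and Miller


@dataclass
class DelayedPayloadApp:
    """Hypothetical app that behaves normally until `delay_s` seconds after launch."""
    delay_s: int

    def actions_at(self, t_s: int) -> List[str]:
        if t_s < self.delay_s:
            return ["show_ui"]
        return ["show_ui", "send_premium_sms"]


def observe(app: DelayedPayloadApp, duration_s: int, step_s: int = 10) -> Set[str]:
    """Sample the app's actions every `step_s` seconds on a simulated clock."""
    seen: Set[str] = set()
    for t in range(0, duration_s, step_s):
        seen.update(app.actions_at(t))
    return seen


app = DelayedPayloadApp(delay_s=6 * 60)
print("send_premium_sms" in observe(app, SANDBOX_WINDOW_S))  # False: window too short
print("send_premium_sms" in observe(app, 10 * 60))           # True: longer run exposes it
```

Doubling the observation window catches this app, but an attacker can simply lengthen the delay, so longer runs alone only raise the cost of testing.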
As Oberheide and Miller demonstrated, there are many ways to circumvent sandbox testing, and each of them points to an open problem in the field. Is it possible to build an emulator whose environment is indistinguishable from a physical device's? How can delayed execution attacks be detected with a minimal increase in the resources required for testing? Finally, many sandbox testing systems must generate user input to test applications thoroughly, but, as discussed earlier, input generation is itself a difficult problem.