Gowtham1729/Android-App-Malware-Detector

Android-App-Malware-Detector

Most antivirus products on the market rely on signature-based malware detection, so they fail to catch newly developed malware. The purpose of this project is to design a neural network that estimates the probability that an Android application contains malware. Because the network is trained on thousands of existing Android applications, it can flag a new application as likely malicious even when the malware itself is newly developed.

Feature Extraction

Every app project has an AndroidManifest.xml file (with precisely that name) at the root of the project source set. The manifest describes essential information about the app to the Android build tools, the Android operating system, and Google Play. It lists the permissions requested, hardware features used, activities, services, broadcast receivers, content providers, metadata, and so on. Most of this information is extracted from the XML file and used as features for training the artificial neural network. In addition, classes.dex, disassembled into .smali form, is used as a source of features. classes.dex contains all the API calls of the Android application. Harmful or potentially dangerous calls indicate possible malware, which makes them an important feature.
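The manifest part of this step can be sketched in Python as follows. This is a minimal illustration, not the project's actual extraction code: it assumes the manifest has already been decoded to plain-text XML, and the function name extract_manifest_features, the TAG_GROUPS mapping, and the sample manifest are all hypothetical.

```python
import xml.etree.ElementTree as ET

# Namespace prefix that ElementTree attaches to android:* attributes.
ANDROID_NS = "{http://schemas.android.com/apk/res/android}"

# Manifest tags mapped to feature groups (this grouping is an assumption).
TAG_GROUPS = {
    "uses-permission": "permissions",
    "uses-feature": "hardware_features",
    "activity": "activities",
    "service": "services",
    "receiver": "receivers",
    "provider": "providers",
}


def extract_manifest_features(manifest_xml):
    """Return a dict mapping each feature group to its sorted android:name values."""
    root = ET.fromstring(manifest_xml)
    features = {group: set() for group in TAG_GROUPS.values()}
    for element in root.iter():
        group = TAG_GROUPS.get(element.tag)
        if group is None:
            continue  # tag is not one of the tracked feature sources
        name = element.get(ANDROID_NS + "name")
        if name:
            features[group].add(name)
    return {group: sorted(names) for group, names in features.items()}


# Hypothetical decoded manifest used only to exercise the function.
SAMPLE_MANIFEST = """<manifest xmlns:android="http://schemas.android.com/apk/res/android" package="com.example.app">
    <uses-permission android:name="android.permission.INTERNET"/>
    <uses-permission android:name="android.permission.SEND_SMS"/>
    <uses-feature android:name="android.hardware.camera"/>
    <application>
        <activity android:name=".MainActivity"/>
        <service android:name=".SyncService"/>
        <receiver android:name=".BootReceiver"/>
    </application>
</manifest>"""

features = extract_manifest_features(SAMPLE_MANIFEST)
for group, names in features.items():
    print(group, names)
```

API calls from the .smali files would need a separate pass over the disassembled code, which this sketch does not cover.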

Feature Encoding

The features described above are extracted from AndroidManifest.xml and the .smali files and written to a TSV (tab-separated values) file. This TSV file is then encoded into a second TSV file consisting only of numbers, so that it can be fed directly to the neural network. With the available lists of permissions, intents, API calls, etc., 916 features are obtained from a single Android application. Permissions, intents, API calls, etc. that are not in the lists are counted as other permissions, other intents, other API calls, etc., respectively.
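One way to picture the encoding for a single feature group is shown below. It is a sketch under stated assumptions: the 0/1 flag per known item is an assumed representation, KNOWN_PERMISSIONS is a hypothetical three-item stand-in for the project's real reference lists, and encode_group and to_tsv_row are illustrative names.

```python
# Hypothetical reference list; the project's real lists yield 916 features in total.
KNOWN_PERMISSIONS = [
    "android.permission.INTERNET",
    "android.permission.READ_SMS",
    "android.permission.SEND_SMS",
]


def encode_group(found, known):
    """Encode one group as a 0/1 flag per known item plus a count of unlisted items."""
    found_set = set(found)
    flags = [1 if item in found_set else 0 for item in known]
    other_count = len(found_set - set(known))  # the "other permissions" column
    return flags + [other_count]


def to_tsv_row(values):
    """Join numeric values into one tab-separated line."""
    return "\t".join(str(value) for value in values)


# Permissions found in a hypothetical app; the last one is not in the known list.
app_permissions = [
    "android.permission.SEND_SMS",
    "android.permission.INTERNET",
    "com.example.permission.CUSTOM",
]

row = encode_group(app_permissions, KNOWN_PERMISSIONS)
line = to_tsv_row(row)
print(line)
```

Intents and API calls could be encoded the same way, with the per-group vectors concatenated into one row per application.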

Feature Selection

Among the 916 available features, some permissions, intents, API calls, etc. are commonly used and considered safe, so they are excluded from the ANN's input. After PCA (principal component analysis) and removal of these safe features, the total number of features dropped from 916 to 387, the input dimension of the network. This improved both the accuracy and the efficiency of malware prediction in Android applications.
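The removal of safe features amounts to dropping columns from the encoded table. The sketch below shows only that part; it does not reproduce the PCA step, and the column names, the safe_names set, and the function drop_safe_features are hypothetical.

```python
def drop_safe_features(header, rows, safe_names):
    """Remove every column whose name appears in safe_names."""
    keep = [index for index, name in enumerate(header) if name not in safe_names]
    new_header = [header[index] for index in keep]
    new_rows = [[row[index] for index in keep] for row in rows]
    return new_header, new_rows


# Hypothetical encoded data: one row per app, one column per feature.
header = ["INTERNET", "SEND_SMS", "VIBRATE", "other_permissions"]
rows = [
    [1, 1, 0, 2],
    [1, 0, 1, 0],
]
safe_names = {"INTERNET", "VIBRATE"}  # assumed to be commonly used and safe

new_header, new_rows = drop_safe_features(header, rows, safe_names)
print(new_header)
print(new_rows)
```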

Neural Network Architecture

A four-layer artificial neural network is implemented to detect malware in Android applications. The input dimension of the first layer is 387, one per feature. Every other layer has an input dimension of 128, and the output dimension of the final layer is one. The network outputs either one or zero, where one indicates no malware and zero indicates the presence of malware. The Adam optimizer is used for training, with ReLU as the activation function. A total of 4986 apps, with equal numbers of malware and benign (goodware) apps, are used to train the network.

Layer              Output Shape    Params
dense_1 (Dense)    (None, 128)     49664
dense_2 (Dense)    (None, 128)     16512
dense_3 (Dense)    (None, 128)     16512
dense_4 (Dense)    (None, 1)       129

Total params: 82,817
Trainable params: 82,817
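The parameter counts in the summary can be checked by hand: a dense layer with n inputs and m units has n × m weights plus m biases. The short sketch below redoes that arithmetic for the layer sizes stated above; the helper name dense_params is illustrative.

```python
def dense_params(inputs, units):
    """Weights (inputs * units) plus one bias per unit."""
    return inputs * units + units


# Input dimension followed by the width of each of the four layers.
layer_sizes = [387, 128, 128, 128, 1]

per_layer = [
    dense_params(n_in, n_out)
    for n_in, n_out in zip(layer_sizes, layer_sizes[1:])
]
total = sum(per_layer)

print(per_layer)  # [49664, 16512, 16512, 129]
print(total)      # 82817
```

The first-layer count of 49664 = (387 + 1) × 128 confirms the input dimension of 387.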

Results

The neural network architecture described above achieves an accuracy of 82.68%. For comparison, a Support Vector Machine achieves an accuracy of 62.359% and a Random Forest achieves 76.5%.