
maryam-tanha/AndroidMalwareDetection-TemporalBias-FeatureExtraction


Revisiting Temporal Inconsistency and Feature Extraction for Android Malware Detection

This repo includes code and data for our paper "Revisiting Temporal Inconsistency and Feature Extraction for Android Malware Detection", submitted to the IEEE CCECE 2024 conference. The input and output data of our experiments can be downloaded from our shared [Google Drive folder](https://drive.google.com/drive/folders/1BNkDPjbc2yRhr9mbOuueGT5iOygnMafd?usp=sharing).

The growing popularity and ease of access of Android applications have turned them into prime targets for malicious attackers. Within the security research community, machine learning has become an essential instrument for Android malware detection and analysis. However, there are potential threats to the validity of existing studies, mainly arising from the datasets they use. One of the primary issues is temporal inconsistency (also called temporal bias), which is caused by incorrect time splits of training and testing sets or by imprecise indicators of an app's release time. This paper investigates the use of an app's Google Play Store upload time as a precise indicator of its release time to address temporal bias in machine-learning-based Android malware detection. This approach is made possible by AndroZoo's December 2023 data release. Through a three-layer filtering process, we demonstrate the unreliability of the commonly used dex date as an app's release time and propose a more accurate approach for creating temporally consistent datasets based on an app's upload time. Additionally, we have open-sourced our data and feature extraction process for Android malware analysis, supporting both server-side and on-device extraction, to enhance research reproducibility and facilitate community access.
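To make the idea of a temporally consistent dataset concrete, here is a minimal sketch that splits apps at a single cutoff date so that every training app was uploaded before every test app. It is an illustration under stated assumptions, not the paper's pipeline: the `App` record, the `temporal_split` function, and the use of one fixed cutoff date are all hypothetical choices for this example, with `upload_date` standing in for the Google Play upload date collected in DataCollection.

```python
from datetime import date
from typing import List, NamedTuple, Tuple


class App(NamedTuple):
    """Hypothetical app record; upload_date is the Google Play upload date."""
    sha256: str
    upload_date: date
    is_malware: bool


def temporal_split(apps: List[App], cutoff: date) -> Tuple[List[App], List[App]]:
    """Train on apps uploaded before `cutoff`, test on apps uploaded on or after it.

    Every training app therefore predates every test app, so the classifier
    never learns from samples that lie "in the future" of the test period.
    """
    train = [app for app in apps if app.upload_date < cutoff]
    test = [app for app in apps if app.upload_date >= cutoff]
    return train, test
```

Splitting on the dex date instead would use the same code, but the split would inherit the dex date's unreliability as a release-time indicator.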

Directory Structure:

1. DataCollection

  • Code to extract the upload date of a Google Play app and use it as the app's release time.
  • Analysis of dex_date vs. upload date as estimators of an app's release time. We confirm that dex_date is not a reliable indicator of an app's release time.
  • Code related to data collection for Android malware analysis.
  • In particular, we provide code and instructions showing how we downloaded the malware and benign Android APKs.
  • Note: We obtained all our malware APKs from VirusShare and all benign APKs from AndroZoo.
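The sketch below illustrates the kind of check that exposes unreliable dex dates. It is a simplified example, not the three-layer filtering process from the paper: it flags only two hypothetical inconsistencies, a dex date earlier than a plausibility floor (for example, the 1980-01-01 placeholder, which is the earliest date a ZIP entry's DOS timestamp can hold) and a dex date later than the Google Play upload date. The `EARLIEST_PLAUSIBLE` floor and the function names are assumptions.

```python
from collections import Counter
from datetime import date

# Hypothetical plausibility floor: no Android app should predate 2008, so an
# earlier dex_date (e.g. the 1980-01-01 ZIP/DOS epoch) is treated as a placeholder.
EARLIEST_PLAUSIBLE = date(2008, 1, 1)


def dex_date_issue(dex_date, upload_date):
    """Return why dex_date cannot serve as the release time, or None if no issue is found."""
    if dex_date < EARLIEST_PLAUSIBLE:
        return "placeholder"
    if dex_date > upload_date:
        # The dex file inside an uploaded APK cannot be newer than the upload itself.
        return "after-upload"
    return None


def summarize(records):
    """Count issues over (dex_date, upload_date) pairs; "ok" means no issue was found."""
    return Counter(dex_date_issue(dex, upload) or "ok" for dex, upload in records)
```

A dex date that passes both checks is still not guaranteed to be the release time; these checks can only rule dates out.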

2. FeatureExtraction

  • Code and resources for extracting features from Android samples.
  • We provide three alternatives for feature extraction: Local IDE, Google Colab, and Java Mobile App.
  • Each alternative extracts the features of an app into four files, where {sha256} is the app's SHA-256 hash:
    • {sha256}-permissions.txt
    • {sha256}-api-calls.txt
    • {sha256}-hw-sw.txt
    • {sha256}-intent-action.txt
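A minimal sketch of loading the four feature files of one app back into Python, following the naming scheme listed above (the app's SHA-256 hash followed by the feature-type suffix). The file contents are assumed, hypothetically, to hold one feature per line; `load_app_features` and `FEATURE_SUFFIXES` are illustrative names, not part of the repository.

```python
import tempfile
from pathlib import Path

# Suffixes of the four per-app feature files produced by feature extraction.
FEATURE_SUFFIXES = ("permissions", "api-calls", "hw-sw", "intent-action")


def load_app_features(feature_dir, sha256):
    """Read one app's feature files into a dict mapping suffix -> set of features.

    Assumes one feature per line; blank lines are skipped and a missing
    file yields an empty set.
    """
    features = {}
    for suffix in FEATURE_SUFFIXES:
        path = Path(feature_dir) / f"{sha256}-{suffix}.txt"
        if path.exists():
            lines = path.read_text(encoding="utf-8").splitlines()
            features[suffix] = {line.strip() for line in lines if line.strip()}
        else:
            features[suffix] = set()
    return features


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        sha = "example-sha256"  # hypothetical placeholder, not a real hash
        Path(tmp, f"{sha}-permissions.txt").write_text("android.permission.INTERNET\n", encoding="utf-8")
        print(load_app_features(tmp, sha))
```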

3. FeatureVectorCreation

  • Code and resources for building input datasets using different sets of features.
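One common way to turn per-app feature sets into model inputs is binary (presence/absence) encoding over a shared vocabulary, with the feature groups chosen per experiment. The sketch below is an assumption about how such vectors could be built, not a description of the code in FeatureVectorCreation; `select_features`, `build_vocabulary`, and `to_binary_vector` are hypothetical names, and the group keys match the file suffixes from FeatureExtraction.

```python
def select_features(app_features, groups):
    """Merge the chosen feature groups of one app, prefixing each feature with its group.

    The prefix keeps identical strings from different groups distinct.
    """
    return {f"{group}:{feature}" for group in groups for feature in app_features.get(group, set())}


def build_vocabulary(feature_sets):
    """Map every distinct feature to a column index, in sorted order for reproducibility."""
    vocabulary = sorted(set().union(*feature_sets))
    return {feature: i for i, feature in enumerate(vocabulary)}


def to_binary_vector(features, index):
    """Set a column to 1 if the app has that feature; features missing from the index are ignored."""
    vector = [0] * len(index)
    for feature in features:
        column = index.get(feature)
        if column is not None:
            vector[column] = 1
    return vector
```

Building the vocabulary from the training split only, and ignoring features first seen in the test split, keeps information from the test period out of the feature space, which fits the temporally consistent setup described above.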

4. README.md

  • The file you are currently reading, which provides an overview of the repository and its contents.

Contribution

Contributions to improve the functionality, documentation, or any other aspect of this project are welcome. If you have suggestions, bug reports, or want to contribute code, feel free to create an issue or pull request.

License

This project is licensed under the MIT License.

Acknowledgments

Special thanks to all contributors for their valuable work on this project.

Happy Analyzing!
