skyInGitHub/Machine-Learning-Based-Cyber-Attacks-Targeting-on-Controlled-Information-A-Survey

Machine Learning Based Cyber Attacks Targeting on Controlled Information: A Survey

This repository collects the code sources, datasets, and data sources mentioned in the survey paper "Machine Learning Based Cyber Attacks Targeting on Controlled Information: A Survey".


  • Research articles by areas
  • Data and code sources used by attacks
    • Stealing controlled user activities information
    • Stealing controlled ML model related information
    • Stealing controlled authentication information

Research articles by areas


Code sources used by attacks

Related paper | Code resource | Introduction
[2] | ProcHarvester | Proof-of-concept implementation of ProcHarvester, a tool for fully automated analysis of procfs side-channel leaks on Android.
[7] | Model extraction attacks on Machine-Learning-as-a-Service platforms | Python implementation of extraction attacks against machine learning models, including the datasets used.
[9] | Model-Inversion-Attack | A TensorFlow implementation of the model inversion attack.
[11] | Membership Inference Attack | Python code for membership inference attacks against machine learning models.
[12] | Membership Privacy using Adversarial Regularization | Python code for machine learning with membership privacy using adversarial regularization, including dataset usage.
[20] | neural_network_cracking | A Python program that uses a neural network to guess passwords. The software is used and maintained by students for a research project and is likely to have many bugs and issues.

Dataset sources used by attacks


Stealing controlled user activities information

Related paper | Data source | Introduction
[1] | Unlock Pattern | Experimental dataset for the unlock pattern inference attack, enumerated in the appendix of paper [1].
[1][2][3] | Google Play | Official app store for the Android operating system.
[3] | Alexa Top Website | List of the top 500 global websites.
[3] | Moz Top Website | List of the top 500 registered domains (*.example.com) ranked by the number of linking root domains.
[6] | MPU6500 Sensor | MPU-6500 product specification, covering the gyroscope and accelerometer sensors.

Stealing controlled ML model related information

Related paper | Data source | Introduction
[7][9] | GSShappiness | Dataset used for model extraction, taken from the GSS survey.
[7] | steak | Dataset used for model extraction, taken from the Steak survey.
[7][11] | Adult(Income) | Predict whether income exceeds $50K/yr based on census data. Also known as the "Census Income" dataset.
[7][8] | Iris | One of the best-known databases in the pattern recognition literature.
[7] | Optical Recognition of Handwritten Digits Data Set | Normalized bitmaps of handwritten digits from a preprinted form.
[7] | Breast Cancer Wisconsin (Original) | Original Wisconsin Breast Cancer Database.
[7] | Mushroom | Mushrooms described in terms of physical characteristics; classification: poisonous or edible.
[7][8] | Diabetes | Diabetes patient records obtained from two sources: an automatic electronic recording device and paper records.
[8] | Geographical Original of Music | Instances in this dataset contain audio features extracted from 1059 wave files.
[8] | UJIIndoorLoc | A multi-building, multi-floor indoor localization database for testing indoor positioning systems that rely on WLAN/WiFi fingerprints.
[8] | Madelon | An artificial dataset that was part of the NIPS 2003 feature selection challenge.
[8] | Bank Marketing | Related to direct marketing campaigns (phone calls) of a Portuguese banking institution. The classification goal is to predict whether the client will subscribe to a term deposit (variable y).
[9] | FiveThirtyEight | Data and code behind some of FiveThirtyEight's articles and graphics.
[10][11] | MNIST | The benchmark dataset of choice in several deep learning applications. It consists of centered grayscale images of handwritten digits ranging from 0 to 9. The images are described here as 32 × 32 pixels; the standard MNIST distribution uses 28 × 28.
[10] | AT&T dataset of faces | Also known as the Olivetti dataset of faces. Consists of grayscale images of the faces of several persons taken in different positions.
[11][12] | CIFAR | CIFAR-10 and CIFAR-100 are benchmark datasets used to evaluate image recognition algorithms.
[11][12] | Purchase | Acquire Valued Shoppers: predict which shoppers will become repeat buyers. The data captures the process of offering incentives (a.k.a. coupons) to a large number of customers and forecasting those who will become loyal to the product.
[11] | Foursquare dataset | Set of mobile users' location "check-ins" in the Foursquare social network, restricted to the Bangkok area and collected from April 2012 to September 2013.
[11][12] | Texas hospital stays | Based on the Hospital Discharge Data public use files, with information about inpatient stays in several health facilities, released by the Texas Department of State Health Services from 2006 to 2009.
[13] | Criteo | Kaggle Display Advertising Challenge: the goal of this challenge is to benchmark the most accurate ML algorithms for click-through rate (CTR) estimation.
[13] | Criteo Full | A new dataset that is an extended version of the Kaggle click prediction dataset.
[13] | MovieLens | Rating data sets from the MovieLens web site (http://movielens.org), collected over various periods of time.

Stealing controlled authentication information

Related paper | Data source | Introduction
[14] | Acceleration dataset | Acceleration data processed to extract the data points relevant to movements between keystrokes.
[18] | Dodonew | Data obtained from the Chinese website Dodonew.com, containing 16M accounts. The data is in plaintext.
[18] | CSDN leaked password dataset | Summary of the passwords leaked from CSDN, a web service for programmers.
[18][19] | Rockyou | Rockyou leaked passwords dataset.
[18] | Rootkit | Rootkit.com database leaked by Anonymous hackers.
[18] | Yahoo Password Frequency Corpus | Sanitized password frequency lists collected from Yahoo in May 2011.
[18][20] | 000webhost | About 13 million passwords that appear to have leaked from this free web host.
[20] | PGS Training set | Training set of the Password Guessability Service (PGS), used in a research study. The set totals 33 million passwords and 5.9 million natural-language words.
[20] | 1class8 | Passwords collected for a research study in which passwords must be at least eight characters long.
[20] | 1class16 | Passwords collected for a research study in which passwords must be at least sixteen characters long.
[20] | 3class12 | Passwords collected for a research study in which passwords must contain at least three character classes (uppercase letters, lowercase letters, symbols, digits) and be at least twelve characters long.
[20] | 4class8 | Passwords collected for a research study in which passwords must contain all four character classes and be at least eight characters long.
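
The composition policies behind the 1class8, 1class16, 3class12, and 4class8 sets can be illustrated with a small checker. This is only a sketch and not code from [20]. The function names are hypothetical. It assumes a policy named NclassM requires at least N of the four character classes and a minimum length of M, and it assumes any non-alphanumeric character counts as a symbol.

```python
import re


def count_character_classes(password: str) -> int:
    """Count how many of the four classes appear in the password.

    Assumption: a "symbol" is any character that is not a letter or digit.
    """
    has_upper = any(c.isupper() for c in password)
    has_lower = any(c.islower() for c in password)
    has_digit = any(c.isdigit() for c in password)
    has_symbol = any(not c.isalnum() for c in password)
    return sum((has_upper, has_lower, has_digit, has_symbol))


def satisfies_policy(password: str, policy: str) -> bool:
    """Check a password against a policy name such as "3class12".

    Assumption: "NclassM" means at least N character classes and at
    least M characters.
    """
    match = re.fullmatch(r"(\d+)class(\d+)", policy)
    if match is None:
        raise ValueError(f"unrecognized policy name: {policy!r}")
    min_classes = int(match.group(1))
    min_length = int(match.group(2))
    return (
        len(password) >= min_length
        and count_character_classes(password) >= min_classes
    )
```

For example, `satisfies_policy("Password1234", "3class12")` holds because the password has twelve characters drawn from three classes, while `satisfies_policy("password1234", "3class12")` does not because only two classes are present.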
