Machine Learning Based Cyber Attacks Targeting on Controlled Information: A Survey

This repository collects the code sources, datasets, and data sources mentioned in the survey paper "Machine Learning Based Cyber Attacks Targeting on Controlled Information: A Survey".

Research articles by areas

[1] Stealing controlled user activities using kernel data - attack with timing analysis: No Pardon for the Interruption: New Inference Attacks on Android Through Interrupt Timing Analysis (S&P, 2016)
[2] Stealing controlled user activities using kernel data - attack with timing analysis: ProcHarvester: Fully Automated Analysis of Procfs Side-Channel Leaks on Android (ACM Asia CCS, 2018)
[3] Stealing controlled user activities using kernel data - iOS side-channel attack: OS-level Side Channels without Procfs: Exploring Cross-App Information Leakage on iOS (NDSS, 2018)
[4] Stealing controlled user activities using kernel data - protect using privacy mechanism: Mitigating Storage Side Channels Using Statistical Privacy Mechanisms (ACM CCS, 2015)
[5] Stealing controlled user activities using sensor data - sensor-based attack: Leave Your Phone at the Door: Side Channels that Reveal Factory Floor Secrets (ACM CCS, 2016)
[6] Stealing controlled user activities using sensor data - protect using context-aware sensor-based detector: 6thSense: A Context-aware Sensor-based Attack Detector for Smart Devices (USENIX, 2017)
[7] Stealing controlled ML model description - stealing parameters attack: Stealing Machine Learning Models via Prediction APIs (USENIX, 2016)
[8] Stealing controlled ML model description - stealing hyperparameters attack: Stealing Hyperparameters in Machine Learning (S&P, 2018)
[9] Stealing controlled ML model's training data - model inversion attack & defence: Model inversion attacks that exploit confidence information and basic countermeasures (ACM CCS, 2015)
[10] Stealing controlled ML model's training data - the GAN attack: Deep Models Under the GAN: Information Leakage from Collaborative Deep Learning (ACM CCS, 2017)
[11] Stealing controlled ML model's training data - membership inference attack: Membership Inference Attacks Against Machine Learning Models (S&P, 2017)
[12] Stealing controlled ML model's training data - protect with adversarial regularization: Machine Learning with Membership Privacy using Adversarial Regularization (ACM CCS, 2018)
[13] Stealing controlled ML model's training data - protect with count featurization: Pyramid: Enhancing Selectivity in Big Data Protection with Count Featurization (S&P, 2017)
[14] Stealing controlled keystroke data for authentication - keystroke inference attack: When Good Becomes Evil: Keystroke Inference with Smartwatch (ACM CCS, 2015)
[15] Stealing controlled keystroke data for authentication - video-assisted keystroke inference attack: VISIBLE: Video-Assisted Keystroke Inference from Tablet Backside Motion (NDSS, 2016)
[16] Stealing controlled secret keys for authentication - attack with TLB cache data: Translation Leak-aside Buffer: Defeating Cache Side-channel Protections with TLB Attacks (USENIX, 2018)
[17] Stealing controlled secret keys for authentication - attack & protect with CPU cache data: A software approach to defeating side channels in last-level caches (ACM CCS, 2016)
[18] Stealing controlled password data for authentication - online password guessing attack: Targeted Online Password Guessing: An Underestimated Threat (ACM CCS, 2016)
[19] Stealing controlled password data for authentication - attack with semantic pattern analysis: On the semantic patterns of passwords and their security impact (NDSS, 2014)
[20] Stealing controlled password data for authentication - protect with modeling password guessability: Fast, Lean, and Accurate: Modeling Password Guessability Using Neural Networks (USENIX, 2016)

Code source used by attacks

Related paper	Code resource	Introduction
[2]	ProcHarvester	Proof-of-concept implementation of ProcHarvester, a tool for fully automated analysis of procfs side-channel leaks on Android.
[7]	Model extraction attacks on Machine-Learning-as-a-Service platforms	Python implementation of extraction attacks against Machine Learning models including utilized datasets
[9]	Model-Inversion-Attack	A TensorFlow Implementation of the Model Inversion Attack
[11]	Membership Inference Attack	Python code for Membership Inference Attack against Machine Learning Models
[12]	Membership Privacy using Adversarial Regularization	Python code for Machine Learning with Membership Privacy using Adversarial Regularization with dataset usage
[20]	neural_network_cracking	Neural Network with passwords. This Python program uses a neural network to guess passwords. This is software used and maintained by students for a research project and likely will have many bugs and issues.

Dataset sources used by attacks

Stealing controlled user activities information

Related paper	Data source	Introduction
[1]	Unlock Pattern	Experimental dataset for the unlock pattern inference attack, enumerated in the appendix of paper [1]
[1][2][3]	Google Play	Official app store for the Android operating system
[3]	Alexa Top Website	List of top 500 global websites
[3]	Moz Top Website	List of the top 500 registered domains (∗.example.com) ranked by the number of linking root domains
[6]	MPU6500 Sensor	MPU-6500 Product Specification including Gyroscope and Accelerometer sensor

Stealing controlled ML model related information

Related paper	Data source	Introduction
[7][9]	GSShappiness	Dataset used for model extraction taken from GSS survey
[7]	steak	Dataset used for model extraction taken from Steak survey
[7][11]	Adult(Income)	Predict whether income exceeds $50K/yr based on census data. Also known as "Census Income" dataset.
[7][8]	Iris	One of the best-known databases in the pattern recognition literature.
[7]	Optical Recognition of Handwritten Digits Data Set	Normalized bitmaps of handwritten digits from a preprinted form
[7]	Breast Cancer Wisconsin (Original)	Original Wisconsin Breast Cancer Database
[7]	Mushroom	Mushrooms described in terms of physical characteristics; classification: poisonous or edible
[7][8]	Diabetes	Diabetes patient records were obtained from two sources: an automatic electronic recording device and paper records.
[8]	Geographical Original of Music	Instances in this dataset contain audio features extracted from 1059 wave files.
[8]	UJIIndoorLoc	A multi-building, multi-floor indoor localization database for testing indoor positioning systems that rely on WLAN/WiFi fingerprints.
[8]	Madelon	An artificial dataset, which was part of the NIPS 2003 feature selection challenge.
[8]	Bank Marketing	Related to direct marketing campaigns (phone calls) of a Portuguese banking institution. The classification goal is to predict whether the client will subscribe to a term deposit (variable y).
[9]	FiveThirtyEight	Data and code behind some of FiveThirtyEight's articles and graphics.
[10][11]	MNIST	The benchmark dataset of choice in several deep learning applications. It consists of handwritten grayscale images of digits ranging from 0 to 9. Each image is 28 × 28 pixels and centered.
[10]	AT&T dataset of faces	Also known as the Olivetti dataset of faces. Consists of grayscale images of the faces of several people taken in different positions.
[11][12]	CIFAR	CIFAR-10 and CIFAR-100 are benchmark datasets used to evaluate image recognition algorithms
[11][12]	Purchase	Acquire Valued Shoppers: predict which shoppers will become repeat buyers. This data captures the process of offering incentives (a.k.a. coupons) to a large number of customers and forecasting those who will become loyal to the product.
[11]	Foursquare dataset	Set of mobile users’ location “check-ins” in the Foursquare social network, restricted to the Bangkok area and collected from April 2012 to September 2013
[11][12]	Texas hospital stays	This dataset is based on the Hospital Discharge Data public use files, with information about inpatient stays in several health facilities, released by the Texas Department of State Health Services from 2006 to 2009.
[13]	Criteo Kaggle	Display Advertising Challenge: the goal of this challenge is to benchmark the most accurate ML algorithms for click-through rate (CTR) estimation.
[13]	Criteo Full	A new dataset which is an extended version of the Kaggle click prediction dataset.
[13]	MovieLens	Available rating data sets from the MovieLens web site (http://movielens.org). The data sets were collected over various periods of time.

Stealing controlled authentication information

Related paper	Data source	Introduction
[14]	Acceleration dataset	The acceleration data are processed to extract the data points relevant to movements between keystrokes
[18]	Dodonew	Data was obtained from the Chinese website known as Dodonew.com and contained 16M accounts. The data is plaintext.
[18]	CSDN leaked password dataset	Summary of passwords leaked from CSDN, a web service for programmers
[18][19]	Rockyou	Rockyou leaked passwords dataset
[18]	Rootkit	Rootkit.com database leaked by Anonymous Hackers
[18]	Yahoo Password Frequency Corpus	This dataset includes sanitized password frequency lists collected from Yahoo in May 2011.
[18][20]	000webhost	13 Million Passwords Appear To Have Leaked From This Free Web Host
[20]	PGS Training set	Training set of the Password Guessability Service (PGS) used in the research. This set totals 33 million passwords and 5.9 million natural-language words.
[20]	1class8	Passwords collected for a research study in which passwords must be at least eight characters long.
[20]	1class16	Passwords collected for a research study in which passwords must be at least sixteen characters long.
[20]	3class12	Passwords collected for a research study in which passwords must contain at least three character classes (uppercase letters, lowercase letters, symbols, digits) and be at least twelve characters long.
[20]	4class8	Passwords collected for a research study in which passwords must contain all four character classes and be at least eight characters long.
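
The composition policies behind the 1class8, 1class16, 3class12, and 4class8 sets can be illustrated with a small checker. This is only a sketch based on the descriptions in the table, not code from paper [20] or its repository. The names `POLICIES`, `count_classes`, and `satisfies` are hypothetical, every length threshold is assumed to be a minimum ("at least N characters"), and any character that is not an ASCII letter or digit is assumed to count as a symbol.

```python
import string

# Policy name -> (minimum number of character classes, minimum length).
# Thresholds are read off the table descriptions; treating each length
# as "at least N characters" is an assumption of this sketch.
POLICIES = {
    "1class8": (1, 8),
    "1class16": (1, 16),
    "3class12": (3, 12),
    "4class8": (4, 8),
}


def count_classes(password: str) -> int:
    """Count how many of the four character classes appear in the password."""
    has_upper = any(c in string.ascii_uppercase for c in password)
    has_lower = any(c in string.ascii_lowercase for c in password)
    has_digit = any(c in string.digits for c in password)
    # Simplifying assumption: anything that is not an ASCII letter or
    # digit is treated as a symbol.
    alphanumeric = string.ascii_letters + string.digits
    has_symbol = any(c not in alphanumeric for c in password)
    return sum([has_upper, has_lower, has_digit, has_symbol])


def satisfies(password: str, policy: str) -> bool:
    """Return True if the password meets the named composition policy."""
    min_classes, min_length = POLICIES[policy]
    return len(password) >= min_length and count_classes(password) >= min_classes
```

For example, `satisfies("Password1234", "3class12")` holds under these assumptions (twelve characters, three classes), while `satisfies("Passw0rd", "4class8")` does not, because the password contains no symbol.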
