Thanks for your interest.

Because I now focus on collaborative learning across multi-camera networks and do not have time to organize recent research on AI systems, I will no longer maintain this repository. Awesome-System-for-Machine-Learning, maintained by HuaizhengZhang, is a comprehensive list of recent work on AI systems, especially distributed machine learning systems. Research notes and code for AI-Systems.

AI-Systems

As discussed in [1, 2], there are three system-level concerns in real-world AI applications: deployment, cost, and accessibility.

[1] Stoica et al. A Berkeley View of Systems Challenges for AI.
[2] Ratner et al. MLSys: The New Frontier of Machine Learning Systems.

Deployment Concerns

Deployment concerns include robustness to adversarial influences or other spurious factors; safety more broadly considered; privacy and security, especially as sensitive data is increasingly used; interpretability, as is increasingly both legally and operationally required; fairness, as ML algorithms begin to have major effects on our everyday lives; and many other similar concerns.

Popular approaches (todo, summary)

Video
  1. Cryptography for Safe Machine Learning. In MLSys'20.: Shafi Goldwasser presented cryptographic techniques for safe machine learning.
Paper
  1. Telekine: Secure Computing with Cloud GPUs. In NSDI'20.: the authors address privacy concerns in recent GPU trusted execution environments (TEEs).
  2. Themis: Fair and Efficient GPU Cluster Scheduling. In NSDI'20.: a fair GPU scheduling algorithm. further reading
  3. Federated Optimization in Heterogeneous Networks. In MLSys'20.: they proposed a framework named FedProx to tackle heterogeneity in federated networks. Traditional federated learning frameworks address privacy in machine learning but struggle with systems heterogeneity and statistical heterogeneity (non-identical data distributions); a minimal sketch of the FedProx-style local update follows this list.
  4. FLEET: Flexible Efficient Ensemble Training for Heterogeneous Deep Neural Networks. In MLSys'20.: to handle the poor performance of data-sharing strategies on a heterogeneous set of DNNs, the authors introduced a flexible ensemble DNN training framework named FLEET.
  5. What is the State of Neural Network Pruning? In MLSys'20.: the authors proposed an open-source framework named ShrinkBench to evaluate pruning methods.
  6. Attention-based Learning for Missing Data Imputation in HoloClean. In MLSys'20.: they utilized an attention mechanism to analyze and interpret the missing data imputation problem.
  7. A Systematic Methodology for Analysis of Deep Learning Hardware and Software Platforms. In MLSys'20.: an evaluation tool for deep learning hardware and software platforms.
  8. MLPerf Training Benchmark. In MLSys'20.: a machine learning benchmark for training tasks.
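
To make the FedProx idea from item 3 concrete, below is a minimal NumPy sketch of a FedProx-style local update and averaging round. The least-squares local loss, the synthetic client data, and all hyperparameter values are illustrative assumptions; this is a sketch of the proximal-term idea, not the authors' implementation.

```python
# Minimal sketch of a FedProx-style round (illustration only, not the paper's code).
# Each client minimizes its local loss plus a proximal term
#   F_k(w) + (mu / 2) * ||w - w_global||^2,
# which limits how far heterogeneous clients drift from the global model.
# The least-squares loss and synthetic data below are assumptions for illustration.
import numpy as np

def fedprox_local_update(w_global, X, y, mu=0.1, lr=0.01, epochs=5):
    """Run proximal gradient descent on one client's data and return its local model."""
    w = w_global.copy()
    for _ in range(epochs):
        grad = X.T @ (X @ w - y) / len(y)   # gradient of the local least-squares loss
        grad += mu * (w - w_global)         # gradient of the proximal term
        w -= lr * grad
    return w

def fedprox_round(w_global, clients, mu=0.1):
    """One communication round: average the participating clients' local models."""
    local_models = [fedprox_local_update(w_global, X, y, mu) for X, y in clients]
    return np.mean(local_models, axis=0)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    d = 5
    # Two clients with non-identical data distributions (statistical heterogeneity).
    clients = [
        (rng.normal(size=(100, d)), rng.normal(size=100)),
        (rng.normal(loc=2.0, size=(100, d)), rng.normal(loc=1.0, size=100)),
    ]
    w = np.zeros(d)
    for _ in range(20):
        w = fedprox_round(w, clients)
    print("global model after 20 rounds:", w)
```

Setting mu to zero recovers a plain FedAvg-style local update, so the proximal coefficient directly controls how much client drift is tolerated.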

Cost

Costs of annotation, computation, latency, and power.

Popular approaches (todo, summary)

Video
  1. Theory & Systems for Weak Supervision. In MLSys'20.: Christopher Ré highlighted the importance of data in real-world deployments, since we usually cannot obtain enough high-quality labels for large training sets. To bridge this gap, he surveyed work ranging from theoretical analysis to real-world deployment of weakly supervised learning, which learns only from noisy, weak labels. He also introduced Snorkel, a popular weak supervision framework developed at Stanford; a simplified labeling-function sketch follows this item.
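
The sketch below illustrates the labeling-function idea behind programmatic weak supervision. It combines heuristic labeling functions by simple majority vote; Snorkel itself learns a generative label model instead of voting, and the function names and toy examples here are assumptions for illustration, not the Snorkel API.

```python
# Minimal sketch of programmatic weak supervision (illustration only, not the Snorkel API).
# Heuristic labeling functions vote on each example; abstentions are ignored and
# ties fall back to abstain. Snorkel replaces this majority vote with a learned label model.
ABSTAIN, NEG, POS = -1, 0, 1

def lf_contains_great(text):       # heuristic: positive sentiment keyword
    return POS if "great" in text.lower() else ABSTAIN

def lf_contains_terrible(text):    # heuristic: negative sentiment keyword
    return NEG if "terrible" in text.lower() else ABSTAIN

def lf_has_exclamation(text):      # weak, noisy signal
    return POS if "!" in text else ABSTAIN

LABELING_FUNCTIONS = [lf_contains_great, lf_contains_terrible, lf_has_exclamation]

def majority_vote(text):
    """Aggregate noisy labeling-function votes into one weak label."""
    votes = [v for v in (lf(text) for lf in LABELING_FUNCTIONS) if v != ABSTAIN]
    if not votes or votes.count(POS) == votes.count(NEG):
        return ABSTAIN
    return POS if votes.count(POS) > votes.count(NEG) else NEG

if __name__ == "__main__":
    for review in ["Great movie!", "Terrible plot", "It was fine."]:
        print(review, "->", majority_vote(review))
```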
Paper
  1. Improving Resource Efficiency of Deep Activity Recognition via Redundancy Reduction. In HotMobile'20.: they aim to reduce the computation and memory cost of deep human activity recognition (HAR) models. Note
  2. Distributed Hierarchical GPU Parameter Server for Massive Scale Deep Learning Ads Systems. In MLSys'20.: they proposed a distributed hierarchical GPU parameter server to fit terabyte-scale parameters for massive-scale deep learning ads systems.
  3. Resource Elasticity in Distributed Deep Learning. In MLSys'20.: to relax the hard assumption of fixed resource allocation throughout the lifetime of a distributed training job, they designed and implemented the first autoscaling engine for these workloads.
  4. SLIDE : In Defense of Smart Algorithms over Hardware Acceleration for Large-Scale Deep Learning Systems. In MLSys'20.: they proposed the Sub-Linear Deep Learning Engine (SLIDE) for fast training on large datasets with efficient utilization of current hardware. The engine blends smart randomized algorithms with multicore parallelism and workload optimization.
  5. Breaking the Memory Wall with Optimal Tensor Rematerialization. In MLSys'20.: they proposed a new system to accelerate training in memory-constrained environments.
  6. SkyNet: a Hardware-Efficient Method for Object Detection and Tracking on Embedded Systems. In MLSys'20.: the authors proposed SkyNet, a hardware-efficient method that delivers state-of-the-art detection accuracy and speed on embedded systems.
  7. Fine-Grained GPU Sharing Primitives for Deep Learning Applications. In MLSys'20.: they identified the importance of fine-grained GPU sharing when multiple DL workloads access the same GPU, but they only evaluated simple scheduling algorithms (FIFO, SRTF, PACK, and FAIR). From my perspective, scheduling methods can be customized for specific applications, because the application context helps us design or implement the most suitable scheduling algorithm. Note
  8. Improving the Accuracy, Scalability, and Performance of Graph Neural Networks with Roc. In MLSys'20.: they proposed a distributed multi-GPU framework for fast GNN training and inference on graphs.
  9. OPTIMUS: OPTImized matrix MUltiplication Structure for Transformer neural network accelerator. In MLSys'20.: they introduced an efficient inference accelerator for transformer networks to improve hardware resource utilization.
  10. PoET-BiN: Power Efficient Tiny Binary Neurons. In MLSys'20.: the authors proposed a look-up-table-based, power-efficient implementation for resource-constrained embedded devices.
  11. Memory-Driven Mixed Low Precision Quantization for Enabling Deep Network Inference on Microcontrollers. In MLSys'20.: they presented a novel end-to-end methodology for enabling the deployment of high-accuracy deep networks on microcontrollers through mixed low-bitwidth compression and integer-only operations.
  12. Trained Quantization Thresholds for Accurate and Efficient Fixed-Point Inference of Deep Neural Networks. In MLSys'20.: the authors presented a new, efficient quantization method; a minimal uniform quantization sketch follows this list.
  13. Riptide: Fast End-to-End Binarized Neural Networks. In MLSys'20.: they proposed a scheduled library for binarized linear algebra operations, based on their analysis of the underlying challenges of binarized neural networks.
  14. Searching for Winograd-aware Quantized Networks. In MLSys'20.: a search method for Winograd-aware quantized networks.
  15. Blink: Fast and Generic Collectives for Distributed ML. In MLSys'20.: authors introduced Blink, a collective communication library that dynamically generates optimal communication primitives by packing spanning trees.
  16. MotherNets: Rapid Deep Ensemble Learning. In MLSys'20.: to overcome the large resource demands of training deep ensemble networks, they proposed MotherNets to reduce training cost.
  17. Willump: A Statistically-Aware End-to-end Optimizer for Machine Learning Inference. In MLSys'20.: unlike many existing optimizers for ML inference, Willump is an end-to-end optimizer for machine learning inference pipelines built on two novel optimizations: cascading feature computation and approximating top-K queries. Note
  18. Server-Driven Video Streaming for Deep Learning Inference. In SIGCOMM'20.: they presented a new video streaming protocol to reduce cost of current video streaming systems.
  19. Reducto: On-Camera Filtering for Resource-Efficient Real-Time Video Analytics. In SIGCOMM'20.: they proposed a novel frame filtering approach that trades off resource usage against accuracy in real-time video analytics; a minimal filtering sketch also follows this list.
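
As a companion to the quantization papers above (items 11-14), here is a minimal NumPy sketch of symmetric uniform quantization for fixed-point inference. The bit width and the clipping threshold are illustrative assumptions; the trained-thresholds paper learns such thresholds rather than fixing them, and this sketch is not any of the papers' methods.

```python
# Minimal sketch of symmetric uniform quantization (illustration only, not the papers' methods).
# Floats in [-threshold, threshold] are mapped to signed integers with a single scale factor,
# which is the basic building block of fixed-point inference.
import numpy as np

def quantize(x, threshold, bits=8):
    """Map float values in [-threshold, threshold] to bits-wide signed integers."""
    qmax = 2 ** (bits - 1) - 1                     # e.g. 127 for 8 bits
    scale = threshold / qmax
    # int8 storage assumes bits <= 8.
    q = np.clip(np.round(x / scale), -qmax, qmax).astype(np.int8)
    return q, scale

def dequantize(q, scale):
    """Recover approximate float values from integers and the scale factor."""
    return q.astype(np.float32) * scale

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    w = rng.normal(scale=0.5, size=1000).astype(np.float32)
    q, scale = quantize(w, threshold=np.abs(w).max(), bits=8)
    w_hat = dequantize(q, scale)
    print("mean absolute quantization error:", np.mean(np.abs(w - w_hat)))
```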
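
Item 19's on-camera filtering can be illustrated with a frame-difference filter: send a frame only when a cheap low-level feature changes enough. The mean-pixel-difference feature and the fixed threshold below are assumptions for illustration; Reducto tunes thresholds per video segment to meet an accuracy target, and this is not the paper's implementation.

```python
# Minimal sketch of on-camera frame filtering (illustration only, not Reducto's implementation).
# A frame is forwarded to the server only when a cheap feature, here the mean absolute
# pixel difference from the last forwarded frame, exceeds a threshold.
import numpy as np

def filter_frames(frames, threshold=10.0):
    """Yield (index, frame) only for frames that differ enough from the last sent frame."""
    last_sent = None
    for idx, frame in enumerate(frames):
        if last_sent is None:
            last_sent = frame
            yield idx, frame                 # always send the first frame
            continue
        diff = np.mean(np.abs(frame.astype(np.float32) - last_sent.astype(np.float32)))
        if diff > threshold:
            last_sent = frame
            yield idx, frame

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Synthetic grayscale "video": mostly static frames with one scene change.
    base = rng.integers(0, 256, size=(120, 160), dtype=np.uint8)
    frames = [base.copy() for _ in range(30)]
    frames[10] = rng.integers(0, 256, size=(120, 160), dtype=np.uint8)
    sent = [idx for idx, _ in filter_frames(frames)]
    print("frames sent to the server:", sent)   # only the first frame and the changes
```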

Accessibility

Accessibility to developers and organizations without PhD-level machine learning and systems expertise. From my perspective, most distributed training work belongs to this area, because better distributed learning tools help developers deploy their machine learning algorithms quickly.

Popular approaches (todo, summary)

Paper
  1. A System for Massively Parallel Hyperparameter Tuning. In MLSys'20.: they proposed a new hyperparameter optimization algorithm named ASHA to solve large-scale hyperparameter optimization problems in distributed training; a successive halving sketch follows this list.
  2. PLink: Discovering and Exploiting Locality for Accelerated Distributed Training on the public Cloud. In MLSys'20.: they introduced a new optimized communication library called PLink to speed up distributed training in the public cloud.
  3. BPPSA: Scaling Back-propagation by Parallel Scan Algorithm. In MLSys'20.: they reformulated the commonly used back-propagation (BP) algorithm into a scan operation to handle the limitation of BP in a parallel computing environment.
  4. MNN: A Universal and Efficient Inference Engine. In MLSys'20.: they proposed Mobile Neural Network (MNN), a universal and efficient inference engine tailored to mobile applications.
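
For item 1, the core idea behind ASHA is successive halving: evaluate many configurations on a small budget and repeatedly promote only the best fraction to a larger budget. The synchronous sketch below illustrates that building block; ASHA's contribution is an asynchronous, distributed variant, and the toy objective, budgets, and reduction factor here are illustrative assumptions.

```python
# Minimal sketch of synchronous successive halving (illustration only, not the ASHA implementation).
# Many configurations start on a small budget; only the top 1/eta survive to the next,
# eta-times larger budget. ASHA runs this idea asynchronously across workers.
import random

def evaluate(config, budget):
    """Toy objective: loss shrinks with budget, and better configs have lower loss."""
    return config["quality"] / (budget ** 0.5) + random.random() * 0.01

def successive_halving(configs, min_budget=1, eta=3, rounds=3):
    budget = min_budget
    survivors = configs
    for _ in range(rounds):
        scored = sorted(survivors, key=lambda c: evaluate(c, budget))  # lower loss is better
        survivors = scored[:max(1, len(scored) // eta)]                # keep the top 1/eta
        budget *= eta                                                  # grow the budget
    return survivors[0]

if __name__ == "__main__":
    random.seed(0)
    configs = [{"id": i, "quality": random.random()} for i in range(27)]
    print("selected config:", successive_halving(configs))
```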

Useful external Resources

Books on Deep Learning (a popular learning approach in machine learning)

  1. Dive into Deep Learning
  2. Deep Learning
  3. 智能计算系统 (AI Computing Systems)
  4. Tutorial on hardware accelerators for deep neural networks (the Energy-Efficient Multimedia Systems group at MIT)

Course

  1. (UW)CSE 599W: Systems for ML: Low-level optimization in Deep Learning frameworks.
  2. (UCB)AI-Sys: Machine Learning Systems: a general course for AI systems.
  3. (UMich)EECS 598: Systems for AI (W'20): a general course for AI systems.

Conference

  1. MLSys (formerly SysML): the Conference on Machine Learning and Systems
  2. SOSP: ACM Symposium on Operating Systems Principles
  3. OSDI: USENIX Symposium on Operating Systems Design and Implementation

Tools

  1. TVM: End to End Deep Learning Compiler Stack
