LDO-CERT/malwareclustering

malwareclustering

MalwareClustering with ApiVector

Starting from pure Python, this study walks through multiprocessing, NumPy, Cython, and Dask, arriving at dask-cuda with CuPy, a NumPy-compatible array library accelerated by CUDA. The study also explored different places to store and retrieve data, such as Neo4j, MongoDB, and PostgreSQL, and different data formats, such as strings, NumPy vectors, and NumPy packbits vectors.
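To make the "packbits vector" format concrete, here is a minimal sketch of storing a binary API vector with `np.packbits` and comparing two packed vectors. The Jaccard similarity, the helper names (`pack`, `jaccard_packed`), and the 16-flag sample vectors are assumptions for illustration; this README does not state which metric or vector length the project uses.

```python
import numpy as np

# Lookup table: number of set bits for every possible byte value.
POPCOUNT = np.array([bin(i).count("1") for i in range(256)], dtype=np.uint8)

def pack(bits):
    """Pack a 0/1 vector into bytes (8 flags per byte)."""
    return np.packbits(np.asarray(bits, dtype=np.uint8))

def jaccard_packed(a, b):
    """Jaccard similarity of two packed bit vectors of equal length."""
    inter = int(POPCOUNT[a & b].sum())
    union = int(POPCOUNT[a | b].sum())
    # Convention (assumed): two empty vectors count as identical.
    return inter / union if union else 1.0

# Two hypothetical 16-flag API vectors (1 = API used by the sample).
v1 = pack([1, 1, 0, 0, 1, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 1])
v2 = pack([1, 0, 0, 0, 1, 0, 0, 0, 0, 0, 1, 1, 0, 0, 0, 1])
similarity = jaccard_packed(v1, v2)
```

Packing stores the 16 flags in 2 bytes instead of 16, and the comparison works directly on the packed bytes without unpacking them first.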

As of today, we get the best results using dask-cuda, CuPy, and Zarr.
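The following is a CPU-only sketch of a many vs many comparison written with plain NumPy. It is not the project's dask-cuda pipeline: chunking with Dask and storage in Zarr are left out, and the Jaccard metric and the function name `jaccard_many_vs_many` are assumptions. Because CuPy mirrors much of the NumPy API, expressions like these can often be ported to GPU arrays with few changes, but that is not verified here.

```python
import numpy as np

def jaccard_many_vs_many(X):
    """Pairwise Jaccard similarity between the rows of a 0/1 matrix X."""
    X = X.astype(np.float32)
    inter = X @ X.T                 # inter[i, j] = size of intersection of rows i and j
    counts = X.sum(axis=1)          # counts[i] = number of set flags in row i
    union = counts[:, None] + counts[None, :] - inter
    # Guard against division by zero; pairs of empty rows count as identical (assumed).
    safe_union = np.where(union > 0, union, 1)
    return np.where(union > 0, inter / safe_union, 1.0)

# Three hypothetical samples with four API flags each.
samples = np.array([
    [1, 1, 0, 0],
    [1, 0, 0, 0],
    [0, 0, 1, 1],
], dtype=np.uint8)
sim = jaccard_many_vs_many(samples)
```

A single matrix product replaces the nested loop over all pairs, which is the kind of operation that benefits from vectorised or GPU execution.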

[Figure: algorithm]

Presentation with benchmark and results is available here: https://ldo-cert.github.io/MISP-Summit-05/#/home

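| language | 1 vs 1 | 1 vs many | many vs many |
| --- | --- | --- | --- |
| python | x | x | x |
| numpy | x | x | x |
| numexpr | x | x | x |
| numba | x | x | x |
| pybind11 | x | x | x |
| cython | x | x | x |
| pythran | x | x | x |
| dask | x | x | x |
| tensorflow | x | x | x |
| dask-cuda with cupy | x | x | x |

| data source | size | times |
| --- | --- | --- |
| Neo4J | x | x |
| MongoDB | x | x |
| PostgreSQL | x | x |
| Zarr | x | x |

| data | data type | size | times |
| --- | --- | --- | --- |
| ApiScout | string | x | x |
| numpy vector | binary | x | x |
| numpy packbits vector | binary | x | x |
| zarr arrays | binary | x | x |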
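The three comparison modes named in the first table (1 vs 1, 1 vs many, many vs many) can be sketched in pure Python as a baseline. This is an illustration under assumptions: bit vectors are stored as Python integers, Jaccard similarity is the metric, and the function names are hypothetical; none of this is taken from the repository's code.

```python
def jaccard(a: int, b: int) -> float:
    """1 vs 1: Jaccard similarity of two bit vectors stored as ints."""
    union = bin(a | b).count("1")
    if union == 0:
        return 1.0  # assumed convention: two empty vectors are identical
    return bin(a & b).count("1") / union

def one_vs_many(query: int, others: list) -> list:
    """1 vs many: compare one vector against every vector in a list."""
    return [jaccard(query, other) for other in others]

def many_vs_many(vectors: list) -> dict:
    """many vs many: compare every unordered pair exactly once."""
    n = len(vectors)
    return {
        (i, j): jaccard(vectors[i], vectors[j])
        for i in range(n)
        for j in range(i + 1, n)
    }

# Three hypothetical 4-flag vectors.
vectors = [0b1100, 0b1000, 0b0011]
row = one_vs_many(vectors[0], vectors[1:])
pairs = many_vs_many(vectors)
```

The many vs many case grows quadratically with the number of samples, which is why the faster backends listed in the table matter most for that column.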
