Sparse HITS and SALSA implementations for highly parallel systems.

CHPS-HITS-SALSA/hits-salsa

HITS and SALSA

Implementation of the HITS and SALSA link-analysis ranking algorithms on sparse matrices for highly parallel systems.
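
As a point of reference, the HITS iteration can be sketched in a few lines of plain Python. This is a minimal sequential sketch, not the parallel implementation in this repository: the function name `hits`, the edge-list input, and the `iterations` and `tol` parameters are assumptions made for illustration. Adjacency lists stand in for the sparse matrix, so only existing links are stored and visited.

```python
import math


def hits(edges, n, iterations=50, tol=1e-10):
    """Hypothetical helper: HITS on a graph given as (src, dst) pairs over n nodes."""
    # Sparse representation: per-node lists of outgoing and incoming links.
    out_links = [[] for _ in range(n)]
    in_links = [[] for _ in range(n)]
    for src, dst in edges:
        out_links[src].append(dst)
        in_links[dst].append(src)

    hubs = [1.0] * n
    auths = [1.0] * n
    for _ in range(iterations):
        # Authority update: sum the hub scores of the nodes linking to v.
        new_auths = [sum(hubs[s] for s in in_links[v]) for v in range(n)]
        norm = math.sqrt(sum(a * a for a in new_auths)) or 1.0
        new_auths = [a / norm for a in new_auths]

        # Hub update: sum the (new) authority scores of the nodes v links to.
        new_hubs = [sum(new_auths[d] for d in out_links[v]) for v in range(n)]
        norm = math.sqrt(sum(h * h for h in new_hubs)) or 1.0
        new_hubs = [h / norm for h in new_hubs]

        # Stop once both score vectors have stopped moving.
        delta = sum(abs(a - b) for a, b in zip(new_auths, auths))
        delta += sum(abs(a - b) for a, b in zip(new_hubs, hubs))
        hubs, auths = new_hubs, new_auths
        if delta < tol:
            break
    return hubs, auths


if __name__ == "__main__":
    # Tiny example graph: 0 -> 2, 1 -> 2, 0 -> 3.
    hubs, auths = hits([(0, 2), (1, 2), (0, 3)], 4)
    print("hubs:", hubs)
    print("authorities:", auths)
```

Both score vectors are normalized to unit Euclidean length at every step, which keeps the values bounded. SALSA follows the same alternating structure but normalizes each contribution by node degree instead.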

Build

cmake -S . -B build -DCMAKE_BUILD_TYPE=Release
cmake --build build -j

Data Generation

  1. First download metadata.json.gz from http://jmcauley.ucsd.edu/data/amazon/links.html

  2. Extract the file:
    gunzip metadata.json.gz

  3. Convert it to valid JSON (replace the single quotes and wrap the entries in an array):
    scripts/clean.py metadata.json clean_metadata.json

Note: metadata.json contains 9.4 million entries, so you may prefer to work on a sample. For example, to extract the first 10000 entries:
    head -n 10000 metadata.json > sample.json
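
For readers curious what the cleaning step involves, here is a minimal sketch. It is not the contents of scripts/clean.py. It assumes that each line of metadata.json is a Python-style dict literal (single-quoted strings), which is why a plain JSON parser rejects the file. The function name `clean` is hypothetical.

```python
import ast
import json


def clean(src, dst):
    """Hypothetical helper: read one dict literal per line from src and
    write a single JSON array to dst. Returns the number of records written."""
    count = 0
    dst.write("[")
    for line in src:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        # literal_eval parses Python literals only; it does not execute code.
        record = ast.literal_eval(line)
        if count:
            dst.write(",\n")
        json.dump(record, dst)
        count += 1
    dst.write("]")
    return count


if __name__ == "__main__":
    import sys

    # Usage (assumed): python clean_sketch.py sample.json clean_sample.json
    with open(sys.argv[1], encoding="utf-8") as src, \
            open(sys.argv[2], "w", encoding="utf-8") as dst:
        print(clean(src, dst), "records written")
```

Records are written one at a time rather than collected in a list, so memory use stays flat even on the full 9.4 million entries.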