LSDM-AMS

Description

Distinct Hashtags

The implementation takes as input the path to the dataset, the length of the stream, and the total number of hash functions, divided into hash groups (numHashGroups * numHashFunctionsInGroup). Using multiple hash functions improves the estimate of the number of distinct items in the stream, following the approach described in section 4.4.3 of the Mining of Massive Datasets book.
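
The combining rule from section 4.4.3 can be sketched in Java as follows. This is a minimal, hypothetical sketch and not the project's code. The class name FlajoletMartinSketch, the Murmur3-style bit mixer used as the hash family, and the seed handling are all assumptions. For each hash function the sketch tracks R, the longest tail of 0 bits among the hashed elements. Each function's estimate is 2^R. Estimates are averaged within each group, and the median of the group averages is returned.

```java
import java.util.Arrays;
import java.util.Random;

/**
 * Hypothetical Flajolet-Martin sketch with grouped hash functions
 * (illustration only, not the project's implementation).
 */
class FlajoletMartinSketch {
    private final int numHashGroups;
    private final int numHashFunctionsInGroup;
    private final int[] seeds;   // one seed per hash function (assumed hash family)
    private final int[] maxTail; // R: longest tail of 0 bits seen per hash function

    FlajoletMartinSketch(int numHashGroups, int numHashFunctionsInGroup, long randomSeed) {
        this.numHashGroups = numHashGroups;
        this.numHashFunctionsInGroup = numHashFunctionsInGroup;
        int total = numHashGroups * numHashFunctionsInGroup;
        this.seeds = new int[total];
        this.maxTail = new int[total];
        Random rnd = new Random(randomSeed);
        for (int i = 0; i < total; i++) {
            seeds[i] = rnd.nextInt();
        }
    }

    // Murmur3 32-bit finalizer: a cheap bijective bit mixer.
    private static int mix(int h) {
        h ^= h >>> 16;
        h *= 0x85ebca6b;
        h ^= h >>> 13;
        h *= 0xc2b2ae35;
        h ^= h >>> 16;
        return h;
    }

    /** Feeds one stream element (for example a hashtag) to every hash function. */
    void add(String item) {
        int base = item.hashCode();
        for (int i = 0; i < seeds.length; i++) {
            int h = mix(base ^ seeds[i]);
            // Tail length = number of trailing 0 bits (32 when h == 0).
            int tail = Integer.numberOfTrailingZeros(h);
            if (tail > maxTail[i]) {
                maxTail[i] = tail;
            }
        }
    }

    /** Median over groups of the average of 2^R within each group. */
    double estimate() {
        double[] groupAverages = new double[numHashGroups];
        for (int g = 0; g < numHashGroups; g++) {
            double sum = 0.0;
            for (int j = 0; j < numHashFunctionsInGroup; j++) {
                sum += Math.pow(2, maxTail[g * numHashFunctionsInGroup + j]);
            }
            groupAverages[g] = sum / numHashFunctionsInGroup;
        }
        Arrays.sort(groupAverages);
        int mid = numHashGroups / 2;
        return (numHashGroups % 2 == 1)
                ? groupAverages[mid]
                : (groupAverages[mid - 1] + groupAverages[mid]) / 2.0;
    }
}
```

Because only the maximum tail length is stored, repeated occurrences of the same hashtag never change the estimate. With the example arguments `5 2` used for run2.sh, this sketch would be built as `new FlajoletMartinSketch(5, 2, seed)`, giving 10 hash functions in total.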

kth Moments

This implementation estimates the kth moment of the stream, given the path to the dataset, the length of the stream, the number of random variables to use, and the moment k to estimate. It uses the AMS (Alon-Matias-Szegedy) streaming algorithm described in section 4.5 of the Mining of Massive Datasets book.
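
A minimal Java sketch of the AMS estimator follows. It is illustrative only. The class and method names are hypothetical, positions are 0-based, and the per-variable estimates are combined by a plain average, which is an assumption about how the project combines them. Each random variable X starts at a chosen position, records X.element, and counts X.value. X.value is the number of occurrences of that element from the chosen position to the end of the stream. The variable then contributes the estimate n(v^k - (v-1)^k). All variables are updated in a single pass over the stream.

```java
import java.util.List;
import java.util.Random;

/** Hypothetical AMS estimator for the k-th moment of a stream of known length n. */
class AmsMomentEstimator {

    /** One AMS variable per (0-based) position in {@code positions}; single pass. */
    static double estimate(List<String> stream, int[] positions, int k) {
        int n = stream.size();
        int m = positions.length;
        String[] element = new String[m]; // X.element for each variable
        long[] value = new long[m];       // X.value for each variable
        int i = 0;
        for (String x : stream) {
            for (int j = 0; j < m; j++) {
                if (positions[j] == i) {
                    element[j] = x;       // variable starts at this position
                    value[j] = 1;
                } else if (positions[j] < i && element[j].equals(x)) {
                    value[j]++;           // later occurrence of X.element
                }
            }
            i++;
        }
        double sum = 0.0;
        for (int j = 0; j < m; j++) {
            // Per-variable estimate: n * (v^k - (v-1)^k).
            sum += n * (Math.pow(value[j], k) - Math.pow(value[j] - 1, k));
        }
        return sum / m; // plain average (assumed combining rule)
    }

    /** Picks numVariables positions uniformly at random (with replacement). */
    static double estimate(List<String> stream, int numVariables, int k, long seed) {
        Random rnd = new Random(seed);
        int[] positions = new int[numVariables];
        for (int j = 0; j < numVariables; j++) {
            positions[j] = rnd.nextInt(stream.size());
        }
        return estimate(stream, positions, k);
    }
}
```

For k = 1, every variable contributes exactly n, so the estimate equals the stream length. This is a quick sanity check for any implementation of the formula.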

Build Details

The project is compiled with Java 7.

Running the script

The parameters for the script are as follows:

Distinct Items

"1" to select estimation of distinct items
Path to the input dataset
Length of stream (n)
Number of hash groups
Number of hash functions in each hash group
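
Assuming the implementation combines estimates as section 4.4.3 describes, the two hash-group parameters enter the distinct-count estimate as shown here. G is the number of hash groups and m is the number of hash functions in each group. R_{g,j} is the longest tail of 0 bits seen by hash function j of group g. These symbols are introduced only for illustration.

```latex
\hat{d} = \operatorname{median}_{1 \le g \le G} \left( \frac{1}{m} \sum_{j=1}^{m} 2^{R_{g,j}} \right)
```

With G = 5 and m = 2, as in the run2.sh example, 10 hash functions are used. The result is the median of five averages of two estimates each.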

Following is an example:

./run2.sh 1 ebola.json.gz 2000 5 2

Alternatively, you can run the jar file directly:

java -jar ASM.jar ebola.json.gz 2000 5 2

kth Moment

"2" to select estimation of kth moment
Path to the input dataset
Length of stream (n)
Number of random variables
kth moment to estimate
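
Assuming the r random variables are combined by averaging, the AMS estimate of the kth moment is given below. Here n is the stream length, r is the number of random variables, and v_i is the count held by variable i. That count is the number of occurrences of the variable's element from its chosen position to the end of the stream. For k = 2, each term reduces to n(2v_i - 1).

As a small illustration, take the 15-element stream a, b, c, b, d, a, c, d, a, b, d, c, a, a, b with variables at positions 3, 8 and 13 (1-based). These start on c, d and a, with counts 3, 2 and 2:

```latex
\hat{F}_k = \frac{1}{r} \sum_{i=1}^{r} n \left( v_i^{k} - (v_i - 1)^{k} \right)

\hat{F}_2 = \frac{15(2 \cdot 3 - 1) + 15(2 \cdot 2 - 1) + 15(2 \cdot 2 - 1)}{3}
          = \frac{75 + 45 + 45}{3} = 55
```

The true second moment of this stream is 5^2 + 4^2 + 3^2 + 3^2 = 59.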

Following is an example:

./run2.sh 2 ebola.json.gz 10000 10 2

References

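Alon, N., Matias, Y., and Szegedy, M. The space complexity of approximating the frequency moments. STOC 1996 (AMS96).
Leskovec, J., Rajaraman, A., and Ullman, J. D. Mining of Massive Datasets, Chapter 4: Mining Data Streams.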
