Evidence-Data-Generator

Evidence-Data-Generator contains redesigned utility code for constructing an evidence dataset from tweets collected from Twitter. The resulting evidence can be used with Markov logic network (MLN) inference tools such as Tuffy.

Environment Setup

cd scripts && source setup.sh

Build the Code

sbt publishLocal && sbt clean assembly

Generate Evidence

Update conf/app.config with your Twitter API keys and keywords.

To collect tweets:

spark-submit --class edu.missouri.CollectTweets target/scala-2.11/Evidence-Data-Generator-assembly-0.1.jar <NO_OF_TWEETS> <TWEETS_OUT_FILE>

Example:

spark-submit --class edu.missouri.CollectTweets target/scala-2.11/Evidence-Data-Generator-assembly-0.1.jar 1000 /mydata/tweets.json

Note: The output may contain duplicate tweets. To keep one copy of each tweet, run sort -u <TWEETS_OUT_FILE> > <UNIQUE_TWEETS_OUT_FILE>
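The choice of flags matters when filtering duplicates, so here is a small sketch on a made-up three-line file. It assumes one tweet per line and only catches byte-identical lines; the file names and contents are illustrative, not project output. `uniq -u` prints only the lines that occur exactly once, so every copy of a repeated tweet is dropped, whereas `sort -u` keeps one copy of each distinct line.

```shell
#!/usr/bin/env bash
# Illustrative only: sample.json stands in for <TWEETS_OUT_FILE>.
workdir="$(mktemp -d)"
printf '%s\n' '{"id":1}' '{"id":2}' '{"id":1}' > "$workdir/sample.json"

# Keep one copy of every distinct line.
dedupe_keep_one() { sort -u "$1"; }

# Keep only lines that were never repeated (drops every copy of a duplicate).
dedupe_drop_repeats() { sort "$1" | uniq -u; }

dedupe_keep_one "$workdir/sample.json" > "$workdir/unique.json"
dedupe_drop_repeats "$workdir/sample.json" > "$workdir/never_repeated.json"

# unique.json holds both distinct tweets; never_repeated.json holds only id 2.
cat "$workdir/unique.json"
cat "$workdir/never_repeated.json"
```

Writing with `>` rather than `>>` also avoids appending to an older output file, which would reintroduce duplicates.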

To construct evidence:

spark-submit --class edu.missouri.GenerateEvidence --driver-memory <DRIVER_MEMORY> target/scala-2.11/Evidence-Data-Generator-assembly-0.1.jar <TWEETS_FILE> <EVIDENCE_OUT_FILE>

Example:

spark-submit --class edu.missouri.GenerateEvidence --driver-memory 30g target/scala-2.11/Evidence-Data-Generator-assembly-0.1.jar /mydata/tweets.json /mydata/evidence.db
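Because the evidence job can run for a while with a large driver heap, it may help to check the inputs before launching it. The function below is a hypothetical helper, not part of the repository: it verifies that the tweets file exists and is non-empty, refuses to overwrite an existing output file (a cautious choice of this sketch, not documented behaviour of GenerateEvidence), and prints the spark-submit command rather than running it. The 4g default for driver memory is an arbitrary placeholder.

```shell
#!/usr/bin/env bash
# Hypothetical helper, not part of the repository: validates the inputs and
# prints the spark-submit command instead of executing it.
JAR="target/scala-2.11/Evidence-Data-Generator-assembly-0.1.jar"

evidence_cmd() {
  local tweets_file="$1" evidence_out="$2" driver_memory="${3:-4g}"

  # Fail early if the tweets file is missing or empty.
  if [ ! -s "$tweets_file" ]; then
    echo "error: '$tweets_file' is missing or empty" >&2
    return 1
  fi

  # Cautious choice of this sketch: do not clobber an existing evidence file.
  if [ -e "$evidence_out" ]; then
    echo "error: '$evidence_out' already exists" >&2
    return 1
  fi

  printf 'spark-submit --class edu.missouri.GenerateEvidence --driver-memory %s %s %s %s\n' \
    "$driver_memory" "$JAR" "$tweets_file" "$evidence_out"
}
```

Calling `evidence_cmd /mydata/tweets.json /mydata/evidence.db 30g` from the repository root prints the command to run once both checks pass.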

To collect friends and followers:

cd scripts && bash collect.sh -tweets <TWEETS_FILE> -n <NO_OF_THREADS>

Example:

cd scripts && bash collect.sh -tweets /mydata/tweets.json -n 10

The constructed evidence for friends and followers is written to data/data_out/evidence.db
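A quick sanity check on a generated evidence file is to count the ground atoms per predicate. The sketch below assumes the usual Tuffy-style layout of one atom per line, such as `Pred(arg1, arg2)`, optionally prefixed with `!` for negation; lines that do not match are ignored. The predicate names in the usage note are hypothetical and are not taken from this project's output.

```shell
#!/usr/bin/env bash
# Count ground atoms per predicate in an evidence file.
# Assumes one atom per line, e.g. Pred(arg1, arg2); a leading "!" marks negation.
count_predicates() {
  sed -n 's/^[[:space:]]*!\{0,1\}\([A-Za-z_][A-Za-z0-9_]*\)(.*$/\1/p' "$1" \
    | sort \
    | uniq -c \
    | awk '{print $2 "\t" $1}'   # emit "predicate<TAB>count"
}
```

For a file containing `friend(1, 2)`, `friend(2, 3)` and `!follower(3, 1)`, `count_predicates` prints one tab-separated line per predicate: `follower` with a count of 1 and `friend` with a count of 2.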

References

Praveen Rao, Anas Katib, Charles Kamhoua, Kevin Kwiat, and Laurent Njilla. "Probabilistic Inference on Twitter Data to Discover Suspicious Users and Malicious Content." In the 2nd IEEE International Symposium on Security and Privacy in Social Networks and Big Data (SocialSec 2016), pages 407-414, Nadi, Fiji, December 2016.
Monica Senapati, Laurent Njilla, and Praveen Rao. "A Method for Scalable First-Order Rule Learning on Twitter Data." In Proc. of the 35th IEEE International Conference on Data Engineering Workshops (ICDEW), pages 274-277, Macau, China, 2019.
