GitHub - rahulpalamuttam/RCSB-PDB-SparkJava: Java implementation of the PubMedCentral-Spark project, which is written in Scala. The program mines journal articles for PDB IDs and scores the potential PDB IDs it finds.

Directions to Run:

Run Variables
$ARTICLES_DIRECTORY_PATH = the directory where all the PubMed articles are located
$SERIALIZED_ARTICLES_PATH = the directory where the serialized table of articles will be located (if implemented)
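$SPARK_MASTER_IP = set this to local by default. When connecting to a cluster, set it to the IP address of the master node followed by ":4040". If the connection fails, note that a Spark standalone master listens on port 7077 by default, while 4040 is the default port of the application web UI.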
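
As a minimal sketch, the three variables could be set in the shell like this before running either mode. The two paths are hypothetical placeholders, not locations the project requires.

```shell
# Hypothetical placeholder paths (assumptions); substitute your own locations.
ARTICLES_DIRECTORY_PATH="/data/pubmed/articles"
SERIALIZED_ARTICLES_PATH="/data/pubmed/serialized"

# "local" is the default described above; use the master address for a cluster.
SPARK_MASTER_IP="local"

# Show the values that will be passed on the command line.
echo "articles:   $ARTICLES_DIRECTORY_PATH"
echo "serialized: $SERIALIZED_ARTICLES_PATH"
echo "master:     $SPARK_MASTER_IP"
```

With these set in the current shell, the commands in the sections that follow can be pasted as written.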

Standalone mode
1. mvn clean package
2. run : java -jar target/PDB-Finder-Java-1.0-SNAPSHOT.jar $ARTICLES_DIRECTORY_PATH $SERIALIZED_ARTICLES_PATH $SPARK_MASTER_IP PDBID_FalsePositives.csv

Cluster Mode
1. Make sure Apache Spark is installed and running on a multi-node cluster
2. run spark-submit --class Main target/PDBFinder-Java-1.0-SNAPSHOT.jar $ARTICLES_DIRECTORY_PATH $SERIALIZED_ARTICLES_PATH $SPARK_MASTER_IP PDBID_FalsePositives.csv
3. Additional options for cluster mode
It is recommended to pass these options to spark-submit in cluster mode:

    --conf spark.driver.memory=10g    | Sets the memory for the driver (the program that launches Spark jobs)
    --conf spark.executor.memory=10g  | Sets the memory for the executors, the node-specific processes that do the bulk of the processing work
    --conf spark.akka.frameSize=25000 | The maximum size, in MB, of objects that are communicated to the driver node (specifically on collect tasks)
    --conf spark.akka.timeout=300     | The time, in seconds, to wait for objects to be transmitted to the driver before failing the task
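
Put together, the cluster-mode command from step 2 with all four recommended options might look like the sketch below. It only prints the assembled command (a dry run), so it can be checked without a cluster. The default paths are hypothetical placeholders, and the options are placed before the jar because spark-submit treats everything after the application jar as arguments to the program.

```shell
# Defaults are placeholders (assumptions); export real values before running.
ARTICLES_DIRECTORY_PATH="${ARTICLES_DIRECTORY_PATH:-/data/pubmed/articles}"
SERIALIZED_ARTICLES_PATH="${SERIALIZED_ARTICLES_PATH:-/data/pubmed/serialized}"
SPARK_MASTER_IP="${SPARK_MASTER_IP:-local}"

# Assemble the command from step 2 plus the four recommended options.
set -- spark-submit \
    --class Main \
    --conf spark.driver.memory=10g \
    --conf spark.executor.memory=10g \
    --conf spark.akka.frameSize=25000 \
    --conf spark.akka.timeout=300 \
    target/PDBFinder-Java-1.0-SNAPSHOT.jar \
    "$ARTICLES_DIRECTORY_PATH" \
    "$SERIALIZED_ARTICLES_PATH" \
    "$SPARK_MASTER_IP" \
    PDBID_FalsePositives.csv

# Dry run: print the command instead of executing it.
SUBMIT_CMD="$*"
echo "$SUBMIT_CMD"
```

To actually submit the job, replace the final `echo "$SUBMIT_CMD"` with `"$@"`, which runs the assembled command with its arguments intact.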

Additional Notes: This build uses Spark 1.1.0 and Java 8. I used Java 8 for its support for lambda expressions, which make writing map functions much simpler. However, I also use the retrolambda plugin, which compiles lambda expressions into bytecode that earlier versions of the JVM, i.e. Java 7, can run. Thus the program can be deployed on clusters running either Java 7 or Java 8.

Due to the use of retrolambda, always build with mvn clean package. The build only works when it starts from a clean state.
