Wrangler
We have created an allocation on TACC's Wrangler data system (https://www.tacc.utexas.edu/systems/wrangler). Wrangler includes 500TB of high-speed flash storage attached to 96 compute nodes, each with 24 cores and 128GB of RAM. Optimized versions of popular data analytics tools are pre-installed, including R, Python and Hadoop.
Please see David Walling in conf1 to get access to this system.
We have pre-staged datasets related to this hackathon at the following location: /data/shared/zika
c252-101.wrangler(20)# du -ksh /data/shared/zika/*
26G /data/shared/zika/austin_aerial
23M /data/shared/zika/github
47G /data/shared/zika/pubmed
In addition to the data available on GitHub, we have included a collection of aerial photography images of the Austin area, as well as a download of the open-access subset of PubMed.
For this hackathon, we have created a 10-node Hadoop cluster, available under the reservation ID: hadoop+Zika+1487
In order to submit jobs to this cluster, you must:
- create a TACC account.
- see David Walling to get added to the project allocation.
- ssh to Wrangler: $> ssh username@wrangler.tacc.utexas.edu
- create an interactive session: $> idev -r hadoop+Zika+1487
- interact with the cluster from the command line, e.g.: $> hadoop fs -ls /tmp/zika
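Once you are inside the interactive session, the last step above could be extended along the following lines to copy one of the pre-staged datasets into HDFS. This is only a rough sketch, not an official TACC workflow: the per-user destination directory under /tmp/zika, the helper names `run` and `stage_to_hdfs`, and the `DRY_RUN` switch are all assumptions made for illustration. By default the script only prints the `hadoop fs` commands it would run.

```shell
#!/usr/bin/env bash
# Sketch only: intended to be run inside an idev session on the Hadoop reservation.
set -euo pipefail

SRC=/data/shared/zika/github              # pre-staged dataset location from this page
DEST=/tmp/zika/${USER:-$(id -un)}         # HYPOTHETICAL per-user HDFS directory (assumption)

# Print the command when DRY_RUN is 1 (the default here); execute it otherwise.
run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

# Create the HDFS directory, copy the local data into it, then list the result.
stage_to_hdfs() {
  run hadoop fs -mkdir -p "$2"
  run hadoop fs -put "$1" "$2"
  run hadoop fs -ls "$2"
}

stage_to_hdfs "$SRC" "$DEST"
```

Check the printed commands first, then rerun with DRY_RUN=0 to actually execute them. Whether you have write permission under /tmp/zika on the cluster is not stated here, so confirm the destination before copying the larger datasets.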
RStudio, Jupyter and general VNC sessions are available on the Wrangler compute nodes from our visualization portal: http://vis.tacc.utexas.edu
After logging into the portal, select Wrangler under the 'Jobs' tab and follow the prompts for launching your sessions.