Rajeshwari-Rudra/rr-shell-data-processing

rr-shell-data-processing

Big data - shell data processing using PowerShell and Git Bash commands

PowerShell

  • mkdir - creates a new directory
  • cd - changes the directory
  • cd .. - changes to the parent directory
  • ni - creates a new item (alias for New-Item)
  • rm - removes an item
  • ls - lists the items in the current directory
  • Alt+Space, then C - closes the window

Git Bash

  • git clone repoUrl - clones the cloud repo to the local machine
  • git pull origin branchName - pulls the latest code from the given branch of the repo
  • git remote add origin repoUrl - registers the cloud repo as the remote named origin
  • git add . - stages all changes in the current directory
  • git commit -m "initial commit" - commits the staged changes with a message
  • git push origin branchName - pushes the commits to the specified branchName
  • cat - concatenates files and prints them to standard output
  • head -10 filename.txt - displays the first 10 lines of the file
  • tail -2 filename.txt - displays the last 2 lines of the file
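
The file-viewing commands above can be tried on a small file. The following is a minimal Bash sketch, assuming a hypothetical five-line sample.txt created in a temporary directory (the file name and its contents are illustrative and not part of this repo). It uses the -n form of head and tail, which is equivalent to the shorthand -10 and -2 shown above.

```shell
# Build a five-line sample file in a temporary directory (hypothetical data).
workdir=$(mktemp -d)
printf 'line1\nline2\nline3\nline4\nline5\n' > "$workdir/sample.txt"

# cat prints the whole file to standard output.
cat "$workdir/sample.txt"

# head -n 2 keeps the first two lines; tail -n 2 keeps the last two.
first_two=$(head -n 2 "$workdir/sample.txt")
last_two=$(tail -n 2 "$workdir/sample.txt")

echo "first two: $first_two"
echo "last two: $last_two"
```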

Creating a project

  • To start a new project, right-click the parent folder and select "Open PowerShell window here as administrator".
  • Create a new subfolder by running "mkdir rr-shell-data-processing", where rr-shell-data-processing is the subfolder name.
  • Change into the subfolder with "cd rr-shell-data-processing".
  • Create two empty files, README.md and .gitignore, with "ni README.md" and "ni .gitignore".
  • Find an interesting web page (for example http://shakespeare.mit.edu/julius_caesar/full.html) and copy its URL.
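  • Use curl to return the page text. Hint: curl "http://shakespeare.mit.edu/julius_caesar/full.html"
  • Save the page text to data.txt with curl. In Windows PowerShell, where curl is an alias for Invoke-WebRequest, spell out -OutFile, because the abbreviation -O is ambiguous with the common parameters -OutVariable and -OutBuffer:
   curl "http://shakespeare.mit.edu/julius_caesar/full.html" -OutFile "data.txt"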

To request content from an HTTPS URL, first enable TLS 1.2 for the current session with this command:

[Net.ServicePointManager]::SecurityProtocol = [Net.SecurityProtocolType]::Tls12
  • Then close the PowerShell window.
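
For comparison, the same project setup can be sketched in Git Bash. This is only an illustration under stated assumptions: it works inside a throwaway temporary directory, uses touch as the Bash counterpart of ni, and guards the download behind a hypothetical FETCH=1 switch (not part of the original steps) so that the rest runs without network access.

```shell
# Work inside a throwaway directory so the sketch does not touch real files.
base=$(mktemp -d)
cd "$base"

# Create the project folder and the two empty files
# (touch is the Bash counterpart of PowerShell's ni).
mkdir rr-shell-data-processing
cd rr-shell-data-processing
touch README.md .gitignore

# Download the page into data.txt. With the real curl program, -s silences
# progress output and -o names the output file. The download only runs when
# FETCH=1 is set (a hypothetical switch added for this sketch).
url="http://shakespeare.mit.edu/julius_caesar/full.html"
if [ "${FETCH:-0}" = "1" ]; then
  curl -s -o data.txt "$url"
fi

# List everything, including the hidden .gitignore file.
ls -a
```

Run it as `FETCH=1 bash setup.sh` (setup.sh is a hypothetical file name) to include the download.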

Processing the text data using Bash commands

  • Transform each space ' ' into a newline character '\12' (octal 12, the ASCII line feed) [2]
    tr ' ' '\12' < data.txt
  • Functionally, this "flat maps" each line into individual words. Pipe the output to sort (a pipe sends the results of one command as input into another command), which places identical words on adjacent lines
    tr ' ' '\12' < data.txt | sort
  • Pipe the sorted output to uniq -c, which collapses adjacent duplicates and prefixes each word with its count
    tr ' ' '\12' < data.txt | sort | uniq -c
  • Pipe the reduced output to sort with the -nr flags to order the words by count, highest first
    tr ' ' '\12' < data.txt | sort | uniq -c | sort -nr
  • Redirect the output to result.txt
    tr ' ' '\12' < data.txt | sort | uniq -c | sort -nr > result.txt
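
To see what the pipeline produces, it can help to run it on a tiny input first. The sketch below assumes a hypothetical two-line sample file standing in for data.txt; the variable names (sample, counts, top_word) are illustrative. It writes the newline as '\n', which tr treats the same as '\12'.

```shell
# Hypothetical sample input standing in for data.txt.
sample=$(mktemp)
printf 'the cat and the dog\nthe end\n' > "$sample"

# Split on spaces, sort so duplicates are adjacent, count them with uniq -c,
# then rank by count with the highest first.
counts=$(tr ' ' '\n' < "$sample" | sort | uniq -c | sort -nr)
echo "$counts"

# The most frequent word is the second field of the first line.
top_word=$(echo "$counts" | head -n 1 | awk '{print $2}')
echo "most frequent: $top_word"
```

Here "the" appears three times and every other word once, so it ranks first. On real HTML such as data.txt the counts will also include markup tokens, since the pipeline splits only on spaces.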

Text files:-
