Skip to content

evpirnia/naive-sql-processing

Repository files navigation

Naive-SQL-Processing

https://lipyeow.github.io/ics421s17/morea/queryproc/experience-hw2.html

Using your code from Part 1 as a template, write a program runSQL that executes a given SQL statement on a cluster of computers each running an instance of a DBMS. The input to runSQL consists of two filenames (stored in variables clustercfg and sqlfile) passed in as commandline arguments. The file clustercfg contains access information for the catalog DB. The file sqlfile contains the SQL terminated by a semi-colon to be executed. The runSQL program will execute the same SQL on the database instance of each of the computers on the cluster (that holds data fragments for the table) concurrently using threads. One thread should be spawned for each computer in the cluster. The runSQL programm should output the rows retrieved to the standard output on success or report failure.

Write a program loadCSV that loads data from a comma-separated (CSV) file into a distributed table on the cluster. The program takes two commandline arguments clustercfg and csvfile. The clustercfg file contains access information for the catalog DB, the name of the table to be loaded, and the partitioning information. The csvfile contains the data to be loaded. The catalog should be consulted for access information for the nodes in the cluster. Your program should also update the catalog with the partitioning information. The loader does NOT need to be multi-threaded. You should use a library for parsing CSV instead of writing your own from scratch.

===========================

LOCAL MACHINE (macOS)

Install Homebrew:
$ /usr/bin/ruby -e "$(curl -fsSL https://raw.githubusercontent.com/Homebrew/install/master/install)"

Install virtualbox & vagrant:
$ brew cask install virtualbox
$ brew cask install vagrant
$ brew cask install vagrant-manager

Install python3 if not already on machine:
$ brew install python3

============================

LOCAL MACHINE (Linux)

Install virtualbox & vagrant:
$ sudo apt install virtualbox
$ sudo apt install vagrant
$ sudo apt install vagrant-manager

Install python3 if not already on machine:
$ sudo apt install python3

==========================

Install PyMySQL if not already on machine:
$ pip3 install PyMySQL

Create a Directory for the Catalog and each Node:
$ mkdir catalog
$ mkdir machine1
$ mkdir machine2

Clone all necessary files from the repo:

  1. test.py
  2. cluster.cfg
  3. sqlfile.sql
  4. csv.py
  5. run.sh

Initialize a virtual machine in each directory:
$ vagrant init ubuntu/xenial64
$ vagrant up
$ vagrant ssh

Change to the /vagrant directory of each virtual machine
$ cd /vagrant

Open Vagrantfile in Each Directory:
Replace line 25 with:
config.vm.network "forwarded_port", guest: 3306, host: 3306, auto_correct: true
Replace line 29 with:
config.vm.network "private_network", ip: "ADDRESS_VALUE"
Note: ADDRESS_VALUE depends on Directory:
/machine2, address_value = localhost_network.20
/machine1, address_value = localhost_network.10
/catalog, address_value = localhost_network.30

Install MySQL in Each Directory:
$ sudo apt-get install mysql-server
Note: Password for MySQL-server: password
$ /usr/bin/mysql_secure_installation
Note: Respond No to everything but Remove test database and access to it, and Reload privilege tables

Run the following command then comment out the bind-address in the catalog and each machine:
$ sudo vim /etc/mysql/mysql.conf.d/mysqld.cnf
bind-address = 127.0.0.1
Note: save and exit vim $ sudo service mysql restart

Connect to MySQL in Each Directory:
$ mysql -u root -p;
Enter password: 'password'

Create a database in Each Directory:
NOTE: If a user already exists then drop it:
mysql> drop user 'username'
Create the database:
mysql> create database TESTDB;

Create Remote Users in Each Directory:
mysql> use TESTDB;
mysql> create user 'username';
mysql> grant all on TESTDB.* to username@'%' identified by 'password';
mysql> exit
$ exit

Create table for machine1 and insert some data:
mysql> use TESTDB;
mysql> create table candy (name char(80), chocolate(4), rating int (2));
mysql> insert into candy values ('Sour Patch Kids', 'No', '5');
mysql> insert into candy values ('Mike n Ikes', 'No', '1');

Create table for machine2 and insert some data:
mysql> use TESTDB;
mysql> create table movies (title char(80), released int(4), rating int (2));
mysql> insert into movies values ('Bad Boys 1', '1995', '4');
mysql> insert into movies values ('Bad Boys 2', '2003', '5');
mysql> insert into movies values ('Split', '2017', '5');

Insert the existing tables into dtables of the catalog machine:
mysql> use TESTDB;
mysql> insert into dtables values ('candy', NULL, 'jdbc:mysql://192.168.10.10:3306/TESTDB', 'evelynp', 'netflix', NULL, '1', NULL, NULL, NULL);
mysql> insert into dtables values ('movies', NULL, 'jdbc:mysql://192.168.10.20:3306/TESTDB', 'blakela', 'hulu', NULL, '2', NULL, NULL, NULL);
Note: The nodeurl has the specific user's ip address (ie. .10 or .20). nodeid, nodeuser, nodepasswd also correspond with the specific user.

Run the script from your local host repo:
$ ./run.sh ./cluster.cfg ./sqlfile.sql