The objective of this project is to analyze and query "News" database inorder to generate log results.
- Install either of the python versions from the URL provided
- Install the VirtualMachine(VM) to run an SQL database server using the tools provided*
- Download the database and unzip it
- Place the unziped file (newsdata.sql) in vagrant folder on local computer
- Bring the VM online
- Load the database by
cd
into thevagrant
directory - Set the path to the newsdata.sql inside the vagrant directory
- Execute the newsdatadb.py file using
python newsdatadb.py
- Log results can be found in results.txt on succesful execution .py file
- Use
CTRL+D
to exit the VM
- newsdatadb.py : Code to analyze and generate log results on "News" DB.
- results.txt : Output file to which analyzed log results are written
- newsdata.sql : News DB containing "articles" , "authors" and "log" tables
- Bring the VM online using
vagrant up
- Log into VM using
vagrant ssh
- Load the database using
psql -d news -f newsdata.sql
import psycopg2
is a module to connect to *** PostgreSQL *** , an open source RDBMS
- Part 1:
CREATE OR REPLACE VIEW popular_article as
SELECT articles.slug as most_popular_articles, COUNT(*) as views
FROM log, articles
WHERE CONCAT('/article/',articles.slug)=log.path
GROUP BY articles.slug
ORDER BY views DESC limit 3;
- Part 2:
CREATE OR REPLACE VIEW popular_article_authors as
SELECT authors.name,count(*) as views
FROM authors, articles,log
WHERE articles.author=authors.id
AND CONCAT('/article/',articles.slug)=log.path
GROUP BY authors.name
ORDER BY views DESC;
- Part 3:
CREATE OR REPLACE VIEW error_log as
SELECT (date(time)) as unique_date, count(*) as error
FROM log
WHERE status like '%4%'
GROUP BY unique_date
ORDER BY error DESC;
CREATE OR REPLACE VIEW total_log as
SELECT (date(time)) as unique_date, count(*) as status
FROM log
GROUP BY unique_date
ORDER BY status DESC;
CREATE OR REPLACE VIEW percent_error_requests as
SELECT total_log.unique_date,
ROUND((100.0/(total_log.status/error_log.error)),2) as error_percent
FROM total_log, error_log
WHERE total_log.unique_date=error_log.unique_date
GROUP BY total_log.unique_date, total_log.status, error_log.error
ORDER BY error_percent DESC;
News database (newsdata.sql) was provided by Udacity
Vagrant and VirtualBox are the tools to install and manage the VM*
Reference "newsdatadb.py" to find the usage of the created VIEWs**