You've been hired onto a team working on a newspaper site. The user-facing newspaper site frontend itself, and the database behind it, are already built and running. You've been asked to build an internal reporting tool that will use information from the database to discover what kind of articles the site's readers like.
In this project, you'll work with data that could have come from a real-world web application, with fields representing information that a web server would record, such as HTTP status codes and URL paths. The web server and the reporting tool both connect to the same database, allowing information to flow from the web server into the report.
- Install Vagrant and VirtualBox. Please check the instructions for installing the virtual machine.
- Download or clone this repository into the /vagrant directory. You must finish step 1 first.
- Use the virtual machine from step 1. If you need to bring the virtual machine back online, run
    $ vagrant up
  Then log into it with
    $ vagrant ssh
- Download the data from here.
- Unzip this file after downloading it. The file inside is called newsdata.sql.
- To run the reporting tool, you'll need to load the site's data into your local database. To load the data, use the command:
    psql -d news -f newsdata.sql
- The database includes three tables:
  - The authors table includes information about the authors of articles.
  - The articles table includes the articles themselves.
  - The log table includes one entry for each time a user has accessed the site.
- Creating Views:
  - Use
      psql -d news
    to connect to the database.
  - Create the view collect using:
      CREATE VIEW collect AS
      SELECT articles.title AS article_title,
             articles.author AS author_id,
             authors.name AS author_name,
             count(log.path) AS views
      FROM log, articles, authors
      WHERE log.path LIKE ('%' || articles.slug)
        AND authors.id = articles.author
      GROUP BY article_title, author_id, author_name
      ORDER BY views DESC;
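The join behind the collect view can be tried out without loading the full dataset. The sketch below is an illustration only: it uses Python's built-in sqlite3 module as a stand-in for the PostgreSQL news database, creates only the columns that the view references, and fills them with made-up sample rows. The `/article/<slug>` path format is an assumption for the demo.

```python
import sqlite3

# Stand-in for the PostgreSQL "news" database: SQLite in memory, with only
# the columns the view references. All rows below are made-up sample data.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE authors (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE articles (title TEXT, slug TEXT, author INTEGER);
CREATE TABLE log (path TEXT);

INSERT INTO authors VALUES (1, 'Author A'), (2, 'Author B');
INSERT INTO articles VALUES
    ('First Story', 'first-story', 1),
    ('Second Story', 'second-story', 2);
INSERT INTO log VALUES
    ('/article/first-story'),
    ('/article/first-story'),
    ('/article/second-story'),
    ('/'),
    ('/article/no-such-page');

-- Same definition as the collect view in this README.
CREATE VIEW collect AS
SELECT articles.title AS article_title,
       articles.author AS author_id,
       authors.name AS author_name,
       count(log.path) AS views
FROM log, articles, authors
WHERE log.path LIKE ('%' || articles.slug)
  AND authors.id = articles.author
GROUP BY article_title, author_id, author_name
ORDER BY views DESC;
""")

# Order explicitly in the outer query so the result order is deterministic.
rows = conn.execute(
    "SELECT article_title, author_name, views FROM collect ORDER BY views DESC"
).fetchall()

for title, author, views in rows:
    print(f"{title} by {author}: {views} views")
```

Only log paths that end in an article's slug are counted, so requests for `/` or for pages with no matching article do not add to any article's view total.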
- From the vagrant directory inside the virtual machine, run logs_analysis.py using:
    $ python3 logs_analysis.py
- Check the output file, output.txt, for the results.
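The contents of logs_analysis.py are not reproduced in this README. As a rough sketch only, rows read from the collect view could be formatted and written to a text file along these lines. The function names `format_report` and `write_report`, the report layout, and the sample rows are all hypothetical, and the database query itself is left out so that the example runs on its own.

```python
import tempfile
from pathlib import Path


def format_report(rows):
    """Format (article_title, author_name, views) rows as plain text."""
    lines = ["Most viewed articles:"]
    for title, author, views in rows:
        lines.append(f'  "{title}" by {author}: {views} views')
    return "\n".join(lines) + "\n"


def write_report(rows, path="output.txt"):
    """Write the formatted report to the given file path."""
    Path(path).write_text(format_report(rows), encoding="utf-8")


# Hypothetical rows, shaped like the result of
# SELECT article_title, author_name, views FROM collect;
sample_rows = [
    ("First Story", "Author A", 12),
    ("Second Story", "Author B", 7),
]

# Write into a temporary directory so the demo leaves no files behind.
with tempfile.TemporaryDirectory() as tmp:
    out_path = Path(tmp) / "output.txt"
    write_report(sample_rows, out_path)
    written = out_path.read_text(encoding="utf-8")

print(written, end="")
```

Keeping the formatting in its own function makes it possible to check the report text without touching the database or the file system.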