Skip to content

Analysis of NIPS SQLite database using sqlite3 and Julia

License

Notifications You must be signed in to change notification settings

vnegi10/SQL_data_analysis

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

3 Commits
 
 
 
 
 
 
 
 
 
 

Repository files navigation

SQL_data_analysis

This repository contains a sample SQLite database (NIPS - Neural Information Processing System papers data) obtained from Kaggle.

Practice queries are present as sql files. The sqlite3 command-line tool can be used to perform SQL queries on the database.

Additionally, the same database can also be analyzed by converting relevant tables to a Julia DataFrame. The relevant steps are discussed in this blog post.

Installation

SQLite command-line tool on Ubuntu

$ sudo apt install sqlite3
$ sqlite3 -version
## 3.37.2 2022-01-06 13:25:41 872ba256cbf61d9290b571c0e6d82a20c224ca3ad82971edc46b29818d5dalt1

Pluto.jl in Julia

using Pkg
Pkg.add("Pluto")
using Pluto
Pluto.run()

Once everthing is set up, clone this repository and open SQL_data_notebook.jl in your Pluto browser window. That's it, you should be good to go!

Examples for using sqlite3

Open a database and check the schema:

sqlite> .open nips_papers.sqlite
sqlite> .schema
CREATE TABLE papers (
    id INTEGER PRIMARY KEY,
    year INTEGER,
    title TEXT,
    event_type TEXT,
    pdf_name TEXT,
    abstract TEXT,
    paper_text TEXT);
CREATE TABLE authors (
    id INTEGER PRIMARY KEY,
    name TEXT);
CREATE TABLE paper_authors (
    id INTEGER PRIMARY KEY,
    paper_id INTEGER,
    author_id INTEGER);
CREATE INDEX paperauthors_paperid_idx ON paper_authors (paper_id);
CREATE INDEX paperauthors_authorid_idx ON paper_authors (author_id);

Select specific rows based on a condition:

sqlite> SELECT papers.year, papers.title
   ...> FROM papers
   ...> WHERE year = 2017
   ...> LIMIT 10;
2017|Wider and Deeper, Cheaper and Faster: Tensorized LSTMs for Sequence Learning
2017|Concentration of Multilinear Functions of the Ising Model with Applications to Network Data
2017|Deep Subspace Clustering Networks
2017|Attentional Pooling for Action Recognition
2017|On the Consistency of Quick Shift
2017|Breaking the Nonsmooth Barrier: A Scalable Parallel Method for Composite Optimization
2017|Dual-Agent GANs for Photorealistic and Identity Preserving Profile Face Synthesis
2017|Dilated Recurrent Neural Networks
2017|Hunt For The Unique, Stable, Sparse And Fast Feature Learning On Graphs
2017|Scalable Generalized Linear Bandits: Online Computation and Hashing

Change mode to table:

sqlite> .mode table
sqlite> .mode
current output mode: table

Perform JOINS while reading commands from a file:

> cat find_authors.sql
SELECT papers.year, papers.title, paper_authors.author_id, authors.name
FROM papers
INNER JOIN paper_authors ON papers.id = paper_authors.paper_id
INNER JOIN authors ON authors.id = paper_authors.author_id
WHERE year == 2012
LIMIT 10;

sqlite> .read find_authors.sql
+------+----------------------------------------------------------------------------------+-----------+--------------------+
| year |                                      title                                       | author_id |        name        |
+------+----------------------------------------------------------------------------------+-----------+--------------------+
| 2012 | Locally Uniform Comparison Image Descriptor                                      | 5507      | Andrew Ziegler     |
| 2012 | Locally Uniform Comparison Image Descriptor                                      | 5508      | Eric Christiansen  |
| 2012 | Locally Uniform Comparison Image Descriptor                                      | 5509      | David Kriegman     |
| 2012 | Locally Uniform Comparison Image Descriptor                                      | 2723      | Serge J. Belongie  |
| 2012 | Learning from Distributions via Support Measure Machines                         | 5654      | Krikamol Muandet   |
| 2012 | Learning from Distributions via Support Measure Machines                         | 1361      | Kenji Fukumizu     |
| 2012 | Learning from Distributions via Support Measure Machines                         | 5655      | Francesco Dinuzzo  |
| 2012 | Learning from Distributions via Support Measure Machines                         | 1472      | Bernhard Sch?lkopf |
| 2012 | Finding Exemplars from Pairwise Dissimilarities via Simultaneous Sparse Recovery | 4951      | Ehsan Elhamifar    |
| 2012 | Finding Exemplars from Pairwise Dissimilarities via Simultaneous Sparse Recovery | 3341      | Guillermo Sapiro   |
+------+----------------------------------------------------------------------------------+-----------+--------------------+

About

Analysis of NIPS SQLite database using sqlite3 and Julia

Topics

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Languages