dainantonio/Investigating-a-TMDB-5000-Movie-Dataset

Project Overview

In this project, I analyzed a dataset and communicated my findings about it. I used the Python libraries NumPy, pandas, and Matplotlib for the analysis. The dataset contains information about 10,000 movies collected from The Movie Database (TMDb), including user ratings and revenue. Certain columns, such as ‘cast’ and ‘genres’, contain multiple values separated by pipe (|) characters. The final two columns, whose names end in “_adj”, give each movie's budget and revenue in 2010 dollars, adjusted for inflation over time.
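As a minimal sketch of how a pipe-separated column can be handled in pandas, the example below splits a `genres` column and counts movies per genre. The rows are made up for illustration and are not values from the dataset; the `title` column name is also an assumption.

```python
import pandas as pd

# Toy rows for illustration only; these are not values from the real dataset.
movies = pd.DataFrame({
    "title": ["Movie A", "Movie B", "Movie C"],  # hypothetical column name
    "genres": ["Drama|Comedy", "Drama", "Action|Comedy|Drama"],
})

# Split each pipe-separated string into a list, then give every genre its own row.
genre_rows = movies.assign(genres=movies["genres"].str.split("|")).explode("genres")

# Count how many movies carry each genre label.
genre_counts = genre_rows["genres"].value_counts()
print(genre_counts)
```

The same pattern should apply to other multi-valued columns such as ‘cast’.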

I recommend installing Anaconda, which comes with all of the necessary packages, as well as IPython notebook. You can find installation instructions here.

What do I need to install? You will need an installation of Python, plus the following libraries:

  • pandas
  • NumPy
  • Matplotlib
  • csv (part of the Python standard library, so no separate install is needed)

What questions did I explore?

  1. What is the number of movies released each year?
  2. How long are the movies?
  3. What is the average budget and revenue for the movies?
  4. What is the relationship between movie budget and revenue?
  5. How many movies are in each genre?
  6. What is the most profitable movie?

What have I learned?

After completing the project, I learned the following:

  • All the steps involved in a typical data analysis process
  • Posing questions that can be answered with a given dataset, and then answering them
  • Investigating problems in a dataset and wrangling the data into a format I can use
  • Communicating the results of my analysis
  • Using vectorized operations in NumPy and pandas to speed up my data analysis code
  • Working with pandas' Series and DataFrame objects, which make the data more convenient to access
  • Using Matplotlib to produce plots that show the findings
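To illustrate what a vectorized operation looks like, here is a small sketch comparing an explicit Python loop with the equivalent NumPy expression. The arrays are invented values, and the variable names are chosen only for this example.

```python
import numpy as np

# Invented values for illustration only.
budget = np.array([10.0, 20.0, 30.0])
revenue = np.array([50.0, 10.0, 90.0])

# Loop version: one subtraction per element, executed in Python.
profit_loop = [r - b for r, b in zip(revenue, budget)]

# Vectorized version: a single expression applied to the whole array at once.
profit_vec = revenue - budget

print(profit_loop)
print(profit_vec)
```

Both versions give the same numbers; the vectorized form is shorter and typically much faster on large arrays.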

Findings

I observed that budget and revenue are positively correlated. Revenue has remained higher than budget throughout the years. Drama and Comedy are the most popular genres, as shown by the number of movies in each genre.
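One way to check a relationship like the one between budget and revenue is a Pearson correlation coefficient. The sketch below uses invented numbers and assumed column names (`budget_adj`, `revenue_adj`), so it shows only the mechanics and does not reproduce the result above.

```python
import pandas as pd

# Invented values for illustration only; column names are assumptions.
df = pd.DataFrame({
    "budget_adj": [1.0, 2.0, 3.0, 4.0],
    "revenue_adj": [2.0, 4.0, 6.0, 8.5],
})

# Pearson correlation: values near +1 indicate a strong positive linear relationship.
r = df["budget_adj"].corr(df["revenue_adj"])
print(r)
```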

About

Projects displaying the application of Excel, SQL, and Tableau to manipulate, analyze, and visualize data
