Project Overview

In this project, I analyzed a dataset and then communicated my findings about it. I used the Python libraries NumPy, pandas, and Matplotlib to make the analysis easier. This data set contains information about 10,000 movies collected from The Movie Database (TMDb), including user ratings and revenue. Certain columns, like ‘cast’ and ‘genres’, contain multiple values separated by pipe (|) characters. The final two columns ending with “_adj” show the budget and revenue of the associated movie in terms of 2010 dollars, accounting for inflation over time.

It is recommended to install Anaconda, which comes with all of the necessary packages, as well as IPython notebook.

I recommend installing Anaconda, which comes with all of the necessary packages, as well as IPython notebook. You can find installation instructions here.

What do I need to install? You will need an installation of Python, plus the following libraries:

pandas
NumPy
Matplotlib
csv

What questions I explored?

What is the number of movies released each year?
How long are the movies?
What is the average budget and revenue for the movies?
What is the relationship between movie budget and revenue?
How many movies in each genre?
What is the most profitable movie?

What have I learned?

After completing the project, I learned the following :

All the steps involved in a typical data analysis process
Being comfortable posing questions that can be answered with a given dataset and then answering those questions
Knowing how to investigate problems in a dataset and wrangle the data into a format I can use
Having practice communicating the results of my analysis
Being able to use vectorized operations in NumPy and pandas to speed up your data analysis code
familiarity with pandas' Series and DataFrame objects, which allows the data to be accessed more conveniently
how to use Matplotlib to produce plots showing the findings

Findings

I observed that Budget and Revenue have a positively correlated relationship.The revenue has remained higher than the budget throughout the years. Drama and Comedy are most popular evident by amount of movies within each genre.

Name		Name	Last commit message	Last commit date
Latest commit History 22 Commits
README.md		README.md
moviedbproject (1).ipynb		moviedbproject (1).ipynb
numpypandasmatplotlib.png		numpypandasmatplotlib.png
tmdb-movies.csv		tmdb-movies.csv

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

README.md

README.md

moviedbproject (1).ipynb

moviedbproject (1).ipynb

numpypandasmatplotlib.png

numpypandasmatplotlib.png

tmdb-movies.csv

tmdb-movies.csv

Repository files navigation

Project Overview

It is recommended to install Anaconda, which comes with all of the necessary packages, as well as IPython notebook.

What do I need to install? You will need an installation of Python, plus the following libraries:

What questions I explored?

What have I learned?

Findings

About

Releases

Packages

Languages

dainantonio/Investigating-a-TMDB-5000-Movie-Dataset

Folders and files

Latest commit

History

Repository files navigation

Project Overview

It is recommended to install Anaconda, which comes with all of the necessary packages, as well as IPython notebook.

What do I need to install? You will need an installation of Python, plus the following libraries:

What questions I explored?

What have I learned?

Findings

About

Resources

Stars

Watchers

Forks

Languages