<!DOCTYPE html>
<html>
<head>
<title>What is Data Science?</title>
<link rel="stylesheet" href="Stylesheets/TFN_SiteStylesheet.css">
<link rel="stylesheet" href="Stylesheets/TFN_DropDownMenu.css">
<link rel="stylesheet" href="https://use.fontawesome.com/releases/v5.7.0/css/all.css" integrity="sha384-lZN37f5QGtY3VHgisS14W3ExzMWZxybE1SJSEsQp9S+oqd12jhcu+A56Ebc1zFSJ" crossorigin="anonymous">
</head>
<body>
<div class="dropdown">
<button class="dropbtn">
<i class="fas fa-bars"></i>
</button>
<div class="dropdown-content">
<a href="TFN_WelcomePage.html">Home</a>
<a href="TFN_AboutUsPage.html">About Us</a>
<a href="TFN_DataSciencePage.html">What is Data Science?</a>
<a href="TFN_BSProjects.html">Brown Scholars' Projects</a>
<a href="TFN_PracticePage.html">Practice Data Science</a>
<a href="TFN_ResourcesPage.html">Resources</a>
</div>
</div>
<header>
<h1>What is Data Science?</h1>
</header>
<div class = "parallax"> </div>
<h3>Overview</h3>
<p>Data science is the process of using code to explore and analyze data in order to answer a question or test a hypothesis. It combines elements of computer science and statistics and is used in many fields.</p>
<div class = "parallax"> </div>
<h3>Real-World Applications</h3>
<ul>
<li>Companies use data science for advertising, especially targeted ads.</li>
<li>Apps and websites use data about their users to improve their products.</li>
<li>Organizations, such as the American Museum of Natural History (AMNH), use ticketing data to analyze visitor trends.</li>
<li>Search engines, such as Google, use large amounts of data to decide which search results to display.</li>
<li>Platforms such as Netflix, YouTube, and Instagram use user data to choose what to recommend.</li>
<li>Even delivery companies use data science to determine the best routes.</li>
</ul>
<p>Lots of people use data science without even realizing it! Some of the examples above are pretty intense, but data science doesn't have to be that complex. It ranges from making graphs in Excel to analyzing data in Python, and much more!</p>
<div class = "parallax"> </div>
<h3>Advantages of Using Data Science</h3>
<ul>
<li>Has a wide range of uses.</li>
<li>Can analyze huge datasets.</li>
<li>Can make predictions based on past data.</li>
</ul>
<div class = "parallax"> </div>
<h3>A Brief History</h3>
<p>Data science began as part of statistics. As technology has continued to develop, data science has grown to include elements of computer science, artificial intelligence, machine learning, and other topics. In 1962, John Tukey wrote about the merging of statistics and computers and what that meant for data analysis. Throughout the latter part of the 20th century, data science grew in popularity. Many people speculated about how to handle the growing amount of data, and in 2001, Software-as-a-Service was created (a precursor to cloud-based applications). In 2008, the term 'data scientist' came into wider use, and over the next few years, job listings for data scientists increased. Over the past 10 years, with the rise of more efficient technology and tools for data analysis, data science has spread into many fields, such as business, genetics, astronomy, the humanities, engineering, and even government.</p>
<div class = "parallax"> </div>
<h3>Our Process</h3>
<ol>
<li>Found datasets to analyze on a website called Kaggle. There are many other ways to get data, such as collecting it yourself, 'scraping' it from other websites, or using a pre-existing dataset from a different website.</li>
<li>Saved the dataset as a .csv file, then created a Jupyter Notebook and loaded the dataset file into it.</li>
<li>Explored the data using Pandas. We used different functions to learn more about the type of data in each column and about measurements such as the mean, median, and mode of different columns.</li>
<li>Cleaned the data by removing data points with null (missing) values and by replacing incorrect names in the dataset.</li>
<li>Asked questions about the data and tried to answer them using different functions and by creating new, smaller datasets from the original with more specific parameters.</li>
<li>Plotted graphs to visualize answers to certain questions.</li>
</ol>
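<p>Steps 2 through 4 might look roughly like the Python sketch below, where each section could be its own cell in a Jupyter Notebook. It is not our actual notebook: the tiny CSV text, its column names, and the misspelling "dgo" are made up, and the data is read from a string with <b>io.StringIO</b> so the example runs without a file.</p>
<pre><code>
```python
import io

import pandas as pd

# Stand-in for a .csv file downloaded from a site such as Kaggle.
# The columns and values are made up for illustration.
csv_text = """name,species,age
Bella,dog,3
Max,dgo,5
Luna,cat,
Charlie,cat,2
"""

# Load the data. With a real file you would pass its path instead,
# e.g. pd.read_csv("my_dataset.csv") (a hypothetical filename).
df = pd.read_csv(io.StringIO(csv_text))

# Explore: the data type of each column, then some basic measurements.
print(df.dtypes)
print("mean age:", df["age"].mean())  # missing values are skipped
print("median age:", df["age"].median())
print("most common species:", df["species"].mode()[0])

# Clean: fix a misspelled entry, then drop rows with missing values.
df["species"] = df["species"].replace({"dgo": "dog"})
df = df.dropna()
print(df)
```
</code></pre>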
<div class = "parallax"> </div>
<h3>Pandas</h3>
<h4>What is Pandas, and why did we use it?</h4>
<p>Pandas is a Python library that provides tools for data analysis. The library is not named after the animal but after the econometric term “panel data”. It allowed us to look at, describe, and visualize large datasets in an organized way. Some of the functions we used included:</p>
<ul>
<li><b>describe</b> - which summarizes a column with statistics such as the count, mean, standard deviation, minimum, quartiles (the 50% value is the median), and maximum;</li>
<li><b>value_counts</b> - which counts how many times each value appears in a column, sorted from most to least frequent;</li>
<li><b>groupby</b> - which groups the dataframe by a selected column or columns so that each group can be summarized;</li>
<li><b>info</b> - which lists the columns of a dataframe, how many non-null values each has, and what data type each contains;</li>
<li><b>plot</b> - which creates different types of graphs out of data.</li>
</ul>
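<p>Here is a minimal sketch of how these functions can be called, using a small made-up dataframe rather than one of our real datasets; the column names are hypothetical. The <b>plot</b> call is left commented out because it needs the matplotlib library to be installed.</p>
<pre><code>
```python
import pandas as pd

# A small made-up DataFrame standing in for a real dataset.
df = pd.DataFrame({
    "species": ["dog", "cat", "dog", "bird", "dog"],
    "age": [3, 2, 5, 1, 4],
})

# info: column names, non-null counts, and data types.
df.info()

# describe: count, mean, std, min, quartiles (50% is the median), max.
print(df["age"].describe())

# value_counts: how often each value appears, most frequent first.
counts = df["species"].value_counts()
print(counts)

# groupby: split the rows by species, then average the age in each group.
mean_age = df.groupby("species")["age"].mean()
print(mean_age)

# plot would draw a bar chart of the counts, but it needs matplotlib:
# counts.plot(kind="bar")
```
</code></pre>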
<p>...and many more. In addition, we learned how to create new dataframes from the originals that contain more specific data. Learning Pandas helped us clean up our data and better understand what it contained, and understanding our data helped us ask questions that it could actually answer. <a href="https://pandas.pydata.org">Click here to learn more</a></p>
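<p>Creating a new, more specific dataframe usually means keeping only the rows that meet some condition. The Python sketch below shows two common ways to do that in Pandas, a boolean mask and <b>DataFrame.query</b>, on made-up data with hypothetical column names.</p>
<pre><code>
```python
import pandas as pd

# Made-up data standing in for an original dataset.
df = pd.DataFrame({
    "species": ["dog", "cat", "dog", "bird", "dog"],
    "age": [3, 2, 5, 1, 4],
})

# Keep only the rows that match one condition (a boolean mask).
dogs = df[df["species"] == "dog"]

# Combine several conditions in a single query string.
older_dogs = df.query("species == 'dog' and age >= 4")

print(dogs)
print(older_dogs)
```
</code></pre>
<p>Both results are new dataframes, so the original <b>df</b> is left unchanged.</p>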
</body>
</html>