5 short projects involving Data Mining and Visualization. Coursework for Denver University Data Science MS

pogags/DataMiningPractice

DataMiningPractice

Data mining is often described as the art of combing through data to discover hidden patterns, connections, and trends. It uses methods at the intersection of machine learning, statistics, programming, and AI. Like any skill, it requires practice and knowledge, and these short forays into data mining served as exercises to build the muscles needed to undertake more complex projects later on. The purpose of this repository is twofold: to present my experience and skills with data mining, and to serve as a home for reusable code.

All notebooks come furnished with a data exploration section and conclusion, and some sport classes and functions created to make for easy reproduction of models.

Project completed in collaboration with Dblash, Joshua Dobbins, and DonnaMulkern

Using the infamous Iris dataset, this notebook explores techniques for initial data exploration. It determines how the Iris features relate to each other and obtains general summary statistics for each feature.
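As a rough illustration of this kind of first-pass exploration, here is a minimal sketch. It assumes the copy of Iris bundled with scikit-learn and uses pandas for the summaries; the notebook itself may load and inspect the data differently.

```python
from sklearn.datasets import load_iris

# Load Iris as pandas objects (bundled with scikit-learn, no download needed).
iris = load_iris(as_frame=True)
features = iris.data  # four numeric measurement columns
species = iris.target.map(dict(enumerate(iris.target_names)))

# General statistics per feature: count, mean, std, min, quartiles, max.
summary = features.describe()

# Pairwise Pearson correlations show how the features relate to each other.
correlations = features.corr()

# Per-species means hint at which features separate the classes.
species_means = features.groupby(species).mean()

print(summary.round(2))
print(correlations.round(2))
print(species_means.round(2))
```

Petal length and petal width are strongly correlated in this dataset, which is the sort of relationship a correlation matrix makes visible right away.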

This notebook applies multiple clustering techniques to the same Iris dataset, uses PCA (Principal Component Analysis), and visualizes the results. The clustering techniques are also evaluated against each other to determine which is best suited to this dataset.
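The workflow described here can be sketched as follows. The two clustering algorithms, the standardization step, and the two metrics (silhouette score and adjusted Rand index) are illustrative choices on my part, not necessarily the ones used in the notebook.

```python
from sklearn.cluster import AgglomerativeClustering, KMeans
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA
from sklearn.metrics import adjusted_rand_score, silhouette_score
from sklearn.preprocessing import StandardScaler

X, y = load_iris(return_X_y=True)
X_scaled = StandardScaler().fit_transform(X)

# Project onto two principal components so clusters can be drawn in 2-D.
pca = PCA(n_components=2)
X_2d = pca.fit_transform(X_scaled)

# Illustrative subset of clustering techniques (an assumption, not the
# notebook's actual list).
clusterers = {
    "KMeans": KMeans(n_clusters=3, n_init=10, random_state=0),
    "Agglomerative": AgglomerativeClustering(n_clusters=3),
}

scores = {}
for name, model in clusterers.items():
    labels = model.fit_predict(X_scaled)
    scores[name] = {
        # Silhouette needs no ground truth; ARI compares against true species.
        "silhouette": silhouette_score(X_scaled, labels),
        "ari": adjusted_rand_score(y, labels),
    }

for name, metrics in scores.items():
    print(name, {k: round(v, 3) for k, v in metrics.items()})
```

To visualize a result, the cluster labels can be passed as colors to a scatter plot of `X_2d[:, 0]` against `X_2d[:, 1]`, for example with matplotlib.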

Rather than clustering, this notebook explores classification techniques applied to the Iris dataset. It uses 17 classification methods from the sklearn library. The classifiers are again evaluated against each other, and each method is outlined along with its advantages and disadvantages.
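A comparison like this one can be sketched with cross-validation. The four classifiers below are an illustrative subset chosen for brevity, and 5-fold accuracy is an assumed evaluation scheme; the notebook covers 17 methods and may score them differently.

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)

# Illustrative subset of sklearn classifiers. Scale-sensitive models are
# wrapped in a pipeline so scaling is fit inside each CV fold.
classifiers = {
    "LogisticRegression": make_pipeline(
        StandardScaler(), LogisticRegression(max_iter=1000)
    ),
    "KNeighbors": make_pipeline(StandardScaler(), KNeighborsClassifier()),
    "SVC": make_pipeline(StandardScaler(), SVC()),
    "DecisionTree": DecisionTreeClassifier(random_state=0),
}

# Mean accuracy over 5 stratified folds for each classifier.
results = {
    name: cross_val_score(model, X, y, cv=5).mean()
    for name, model in classifiers.items()
}

# Rank classifiers from best to worst mean accuracy.
ranking = sorted(results.items(), key=lambda item: item[1], reverse=True)
for name, accuracy in ranking:
    print(f"{name}: {accuracy:.3f}")
```

Keeping the models in a dictionary makes it straightforward to add further estimators without changing the evaluation loop.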

This notebook investigates regression models using the California Housing dataset, with Median House Value as the target variable. 8 models and 5 methods are compared, again all from sklearn.
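A minimal sketch of such a comparison is shown below. The helper name `compare_regressors`, the three models, and the R² and RMSE metrics are assumptions for illustration. So that the sketch runs offline, the demo uses a synthetic stand-in dataset rather than the California Housing data.

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import LinearRegression, Ridge
from sklearn.metrics import mean_squared_error, r2_score
from sklearn.model_selection import train_test_split


def compare_regressors(X, y, random_state=0):
    """Fit a few regressors on one train/test split and report R^2 and RMSE.

    Hypothetical helper; the model list is an illustrative subset.
    """
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.2, random_state=random_state
    )
    models = {
        "LinearRegression": LinearRegression(),
        "Ridge": Ridge(alpha=1.0),
        "RandomForest": RandomForestRegressor(
            n_estimators=50, random_state=random_state
        ),
    }
    results = {}
    for name, model in models.items():
        model.fit(X_train, y_train)
        predictions = model.predict(X_test)
        results[name] = {
            "r2": r2_score(y_test, predictions),
            "rmse": mean_squared_error(y_test, predictions) ** 0.5,
        }
    return results


# Synthetic stand-in with 8 features (NOT the California Housing data).
X_demo, y_demo = make_regression(
    n_samples=500, n_features=8, noise=10.0, random_state=0
)
demo_results = compare_regressors(X_demo, y_demo)
for name, metrics in demo_results.items():
    print(name, {k: round(v, 3) for k, v in metrics.items()})
```

To run the same helper on the real data, `sklearn.datasets.fetch_california_housing(return_X_y=True)` returns the feature matrix and the median house value target; it downloads the dataset on first use.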

Using the data in [expanded.csv](https://github.com/pogags/DataMiningPractice/blob/main/expanded.csv), this notebook determines what a forager should look for when picking mushrooms to ensure a safe and appealing stew. Each record in the dataset represents a mushroom, described by 22 features and 1 binary target class indicating whether the mushroom is edible or poisonous. The notebook uses both data exploration and classification techniques to determine the best way to forage. It also includes a bonus section on picking safe mushrooms if the forager has lost their sense of smell, since scent generally had the highest feature importance.
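The feature-importance idea, including the "no sense of smell" variant, can be sketched as below. Everything named here is an assumption: the helper `rank_feature_importance`, the random forest model, the ordinal encoding, and the column names `odor`, `cap_color`, and `label`. The toy rows are made up for the demo and are not taken from expanded.csv.

```python
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.preprocessing import OrdinalEncoder


def rank_feature_importance(df, target_col, drop=(), random_state=0):
    """Rank categorical features by random-forest importance.

    Hypothetical helper. Pass column names in `drop` to exclude features,
    e.g. the scent column for a forager who cannot smell.
    """
    features = df.drop(columns=[target_col, *drop])
    # Tree models only need a consistent integer code per category.
    encoded = OrdinalEncoder().fit_transform(features.astype(str))
    model = RandomForestClassifier(n_estimators=100, random_state=random_state)
    model.fit(encoded, df[target_col])
    importances = pd.Series(model.feature_importances_, index=features.columns)
    return importances.sort_values(ascending=False)


# Made-up toy rows with hypothetical column names (not from expanded.csv).
toy = pd.DataFrame({
    "odor": ["none", "foul", "almond", "foul",
             "none", "pungent", "almond", "foul"] * 5,
    "cap_color": ["brown", "brown", "white", "white",
                  "red", "red", "brown", "white"] * 5,
    "label": ["edible", "poisonous", "edible", "poisonous",
              "edible", "poisonous", "edible", "poisonous"] * 5,
})

with_smell = rank_feature_importance(toy, "label")
without_smell = rank_feature_importance(toy, "label", drop=["odor"])
print(with_smell)
print(without_smell)
```

Refitting without the scent column, rather than just ignoring its importance score, shows which of the remaining features the model falls back on.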
