machine-learning-classification

part 3: README from Repo 1

ML-logistic-regression-notes

files

QUICKSTART-WIN-VSC-BASH.md from

CoderSales/machine-learning-classification

primary source for this README: machine-learning-classification

Repository for running jupyter notebooks and keeping relevant files in one place

secondary source for this README: jupyter-6-Supervised-Learning

Repository for running jupyter notebooks and keeping relevant files in one place

notes

notes made for previous plan to remove null values

check how to remove null values from dataframe

notes

pandas
.iloc[] - locate by row, col integer positions
.loc[] - locate by row index label and col NAME
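A minimal sketch of the difference, using a small made-up DataFrame (the column names and index labels are invented for illustration). Both are indexers used with square brackets rather than called like functions.

```python
import pandas as pd

# Hypothetical data, only for illustration
df = pd.DataFrame(
    {"age": [25, 32, 47], "glucose": [85.0, 140.0, 110.0]},
    index=["a", "b", "c"],
)

# .iloc: integer positions for both rows and columns
by_position = df.iloc[1, 0]      # second row, first column

# .loc: index label for the row, column NAME for the column
by_label = df.loc["b", "age"]    # same cell, addressed by labels

print(by_position, by_label)
```

Both expressions address the same cell, so the two printed values match.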

how to run python files from terminal

python3 main.py

Data Cleaning

2.13 Lecture

df.drop('Column name', axis=1) - where axis = 0 for rows, 1 for columns - drops referenced column from data frame - pass inplace=True (or assign the result back) so the column stays dropped.
df.drop(1, axis=0).reset_index() - new col with old indices
df = df.drop(1, axis=0).reset_index(drop=True) - old indices discarded; chaining inplace=True onto the dropped result would only modify a temporary copy and return None
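The drop and reset_index variants can be compared side by side. This is a small sketch on invented data, not the lecture's own example.

```python
import pandas as pd

df = pd.DataFrame({"a": [1, 2, 3], "b": [4, 5, 6]})  # hypothetical data

# Drop a column; without inplace=True the result must be assigned to a name
no_b = df.drop("b", axis=1)

# Drop the row labelled 1, then rebuild a clean 0..n-1 index
kept_old = df.drop(1, axis=0).reset_index()         # old labels kept in an 'index' column
clean = df.drop(1, axis=0).reset_index(drop=True)   # old labels discarded

print(list(no_b.columns), list(kept_old.columns), list(clean.index))
```

The original df is unchanged throughout, because none of the calls used inplace=True.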

df.copy() - returns a copy of the data frame, so changes to the copy do not affect the original

4.1 Lecture Data Sanity Checks - Part 1

df['columnname'].apply(type).value_counts() - maps each value in the column to its Python type, then counts how many values there are of each type

df['colname'] = df['colname'].replace(['missing','inf'],np.nan) - replaces our specified strings 'missing' and 'inf' - with np.nan

df['colname'] = df['colname'].astype(float) - convert values to float

Review note: np.nan is itself a float, so once the strings have been replaced with np.nan the column can be converted to float (provided all the other entries are numeric).
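The sanity-check steps above can be sketched end to end as follows. The column contents are made up; 'colname' is the same placeholder name used in the notes.

```python
import numpy as np
import pandas as pd

# Hypothetical column mixing numbers with placeholder strings
df = pd.DataFrame({"colname": [120.0, "missing", 95.5, "inf"]})

# Count the values by Python type before cleaning
print(df["colname"].apply(type).value_counts())

# Replace both placeholder strings with NaN, then convert to float
df["colname"] = df["colname"].replace(["missing", "inf"], np.nan)
df["colname"] = df["colname"].astype(float)

print(df["colname"].dtype)
```

Depending on the pandas version, the replace step may emit a deprecation warning about downcasting; the explicit astype(float) makes the final dtype unambiguous either way.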

df.info() - rerunning this after data cleaning shows the updated dtypes; a cleaned column may now be reported as, say, float64.

Check the non-null count of each column
Columns with fewer non-null values than the number of rows have missing values (empty cells)
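One way to do this check, sketched on invented data: count the non-missing values per column and compare against the number of rows.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"x": [1.0, np.nan, 3.0], "y": ["a", "b", "c"]})  # hypothetical

non_null = df.count()          # non-missing values per column
missing = len(df) - non_null   # same result as df.isna().sum()

print(missing)
```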

Alternative approach - clean while loading:

using na_values to tell pandas which values it should consider as NaN

data_new = pd.read_csv('/content/drive/MyDrive/Python Course/Melbourne_Housing.csv',na_values=['missing','inf'])

on load, the above line automatically converts all 'missing' and 'inf' entries to NaN, so running: data_new['BuildingArea'].dtype
gives dtype('float64'), because the remaining values are floats and NaN is itself a float
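A self-contained sketch of the effect of na_values, using an in-memory CSV in place of Melbourne_Housing.csv. The rows below are invented for illustration and are not taken from the real file.

```python
import io
import pandas as pd

# Stand-in for the CSV file; the rows are invented for illustration
csv_text = "Suburb,BuildingArea\nA,120.5\nB,missing\nC,inf\nD,98\n"

raw = pd.read_csv(io.StringIO(csv_text))
clean = pd.read_csv(io.StringIO(csv_text), na_values=["missing", "inf"])

print(raw["BuildingArea"].dtype)            # not numeric: the string 'missing' blocks conversion
print(clean["BuildingArea"].dtype)          # numeric once the placeholders are read as NaN
print(clean["BuildingArea"].isna().sum())   # number of placeholders converted
```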

Review note

data['BuildingArea'].unique()

above line run before cleaning gives unique values in column as a numpy array
so can inspect to find out which strings to remove.

setup steps

python3 -m venv .venv - in bash - and on Windows
source .venv/bin/activate - in bash
source .venv/Scripts/activate - on Windows - on VSCode Windows bash
/workspace/machine-learning-classification/.venv/bin/python -m pip install --upgrade pip - in GitPod
python3 -m pip install --upgrade pip - on Windows

.venv/Scripts/python.exe -m pip install --upgrade pip - in .venv

pip install --upgrade pip
pip install jupyter notebook
pip install matplotlib
pip install pandas
pip install seaborn
pip install numpy
pip install scipy
pip install statsmodels
pip install -U scikit-learn
pip install ipykernel
pip install nb-black

Ctrl Shift P
Create New Jupyter Notebook
Save and name notebook
Paste in necessary code

Ctrl Shift P
Python: Select Interpreter
use Python version in ./.venv/bin/python

pip freeze > requirements.txt

pip install -r requirements.txt

Add required files

pima-indians-diabetes.csv

Extensions

Extension: Excel Viewer - for viewing csv files in VSCode

Debug

jupyter cannot find modules

install modules from jupyter notebook

prelim

per above Python:Select Interpreter 3.10.9 (.venv)

ipykernel bug

after running pip install ipykernel, on running LinearRegression_HandsOn-1.ipynb a message appears saying that it is necessary to install ipykernel. Click OK to install ipykernel, then rerun LinearRegression_HandsOn-1.ipynb

pandas bug

after running pip install pandas, pandas is not found

Fix for previous 2 bugs

create new jupyter notebook using Ctrl Shift P Create New Jupyter Notebook
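Both bugs above are consistent with the notebook kernel running on a different interpreter than the one pip installed into. That is an assumption about the cause, not something the notes confirm. A quick diagnostic sketch, runnable in a notebook cell or a script (the helper name is_installed is made up here):

```python
import importlib.util
import sys

def is_installed(module_name: str) -> bool:
    """Return True if the module can be imported by THIS interpreter."""
    return importlib.util.find_spec(module_name) is not None

# The interpreter the notebook kernel (or script) is actually running on.
# If this is not the .venv interpreter, packages installed into .venv are invisible.
print(sys.executable)

for name in ["json", "pandas", "ipykernel"]:
    print(name, "installed" if is_installed(name) else "missing")

# Command that installs into this exact interpreter (printed, not executed here)
print(f'"{sys.executable}" -m pip install pandas')
```

If the printed path does not point inside .venv, selecting the interpreter again (Python: Select Interpreter) or creating a fresh notebook, as noted above, is the likely fix.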

Files

summary

summary-income.md
- high level summary of steps in income.ipynb notebook

References

previous repositories

jupyter-test jupyter-repo-2 jupyter-3

References Part2 / (MyGreatLearning, Colab, modules)

MyGreatLearning

pre scikit-learn

scikit-learn

Supervised Learning - Foundations / Week 1 - Lecture Video Materials
- auto-mpg.csv used in 1.9 Linear Regression Hands-on

Colab

Google Colab mount drive

modules

matplotlib

matplotlib figure dimensions

Set plot dimensions matplotlib

scipy

scipy - check version

References Part3 / (StackOverflow, Git, Tutorials and Repositories)

StackOverflow

https://stackoverflow.com/questions/46419607/how-to-automatically-install-required-packages-from-a-python-script-as-necessary

Git

git

gitignore

How to stop tracking and ignore changes to a file in Git?

Gitpod

Git in VSCode

search string: pause git tracking
Git source control in VS Code

Tutorials and Repositories

References Part4 / (environments, Packages, Statistics, python, ML, Stats for ML)

environments

local

Getting Full Directory Path in Python

Windows Anaconda
conda create --name .cenv
y
conda activate .cenv

python3

python3 was not installed, so the Windows Store opened; installed Python 3.10 from there

conda

virtual environment

conda.io

python environment

python3 -m venv .venv command was slow at first but self-resolved

search string: stuck on $ python3 -m venv .venv setting up environment in virtaulenv using python3 stuck on ...
search string: installing collected packages stuck why is the pip install process stuck on ''Installing collected packages" step?

Packages

NumPy

Pandas

matplotlib

search string: plotting fig from subplot returns Figure(1500x1000)
fig, ax = plt.subplots()
matplotlib docs fig, ax = plt.subplots()
search string: subplot
matplotlib.pyplot.subplot

subplots

colors

search string: fig.patch.set_facecolor('xkcd:blue')
xkcd.com/color/rgb/
search string: fig, axs = plt.subplots(2, 2)
Creating multiple subplots using plt.subplots >> Stacking subplots in two directions

other matplotlib

boxplot

histplot

matplotlib.pyplot.hist
search string matplotlib.pyplot histogram
Histogram with Boxplot above in Python
search string histogram_boxplot matplotlib

error

scipy

scipy.stats

statsmodels

scikit-learn

Documentation

search string: sklearn
scikit-learn | Machine Learning in Python
Getting Started -- scikit-learn
Citing scikit-learn
User Guide
Installing scikit-learn
Scikit-learn: Machine Learning in Python Scikit-learn: Machine Learning in Python, Pedregosa et al., JMLR 12, pp. 2825-2830, 2011.
redirects to https://scikit-learn.org/stable/ (link 2 in this section, above) Source code, binaries, and documentation

ipykernel

search string: ipykernel
pip install ipykernel ipykernel 6.19.2

colors for jupyter notebook charts

search string: pandas plot frame color -matplotlib
Pandas - Plotting
pandas.DataFrame.plot
search string: pandas plot
pandas.crosstab
search string pd crosstab
ANSWER to color: seaborn.set_style()
search string: sns seaborn color frame facecolor
seaborn.set_theme
search string: sns.set_theme(style="whitegrid")
seaborn.countplot | sns.set_theme(style="whitegrid")
search string: countplot sns perc
Actually, really change all of the background color | fig, ax = plt.subplots(facecolor='lightslategray'); | df.plot(ax=ax, color='white')
Change the facecolor of boxplot in pandas | stackoverflow
search string: pandas facecolor
jsfiddle | iterate through object properties
Recursively looping through an object to build a property list | stackoverflow
search string: how to recursively return all levels of an object
search string: is matplotlib. pyplot an object?
search string: matplotlib pyplot plt
pandas.crosstab | pandas | Documentation
search string: pd.crosstab color
saved search string: (autocomplete) [pd.crosstab df normalize='index').plot(kind="bar", figsize=(6,8),stacked=True)](link) -Creating Links in Markdown
[deprecated] | matplotlib.pyplot.figure | matplotlib | Documentation
Elegantly changing the color of a plot frame in matplotlib | fig, axes = plt.subplots(nrows=2); | axes[0].plot(range(10), 'r-'); | axes[1].plot(range(10), 'bo-'); | stackoverflow
search string: ply.figure frame color
How to change plot background color?
How do I plot two countplot graphs side by side in seaborn? | fig, ax =plt.subplots(1,2); | sns.countplot(df['batting'], ax=ax[0]); | sns.countplot(df['bowling'], ax=ax[1]); | fig.show() | stackoverflow
countplot sns subplot
How to prevent overlapping x-axis labels in sns.countplot | code: | plt.figure(figsize=(15,10)) #adjust the size of plot; | ax=sns.countplot(x=df['Location'],data=df,hue='label',palette='mako'); | stackoverflow
search string: countplot | recursively unpacck ax in sns countplot
Countplot using seaborn in Python | geeksforgeeks
search string: countplot sns ax frame
seaborn.countplot | content: | kwargs : key, value mappings | Other keyword arguments are passed through to matplotlib.axes.Axes.bar(). | Returns: | axmatplotlib Axes | Returns the Axes object with the plot drawn onto it. | seaborn | Documentation
search string: countplot sns
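The saved search string above chains pd.crosstab with normalize='index' into a stacked bar plot. A minimal sketch of the table-building half, on invented data (the column names are hypothetical):

```python
import pandas as pd

# Hypothetical categorical data
df = pd.DataFrame(
    {
        "education": ["HS", "HS", "BSc", "BSc", "BSc", "MSc"],
        "salary": ["<=50K", ">50K", "<=50K", ">50K", ">50K", ">50K"],
    }
)

# normalize='index' turns each row of counts into proportions that sum to 1
table = pd.crosstab(df["education"], df["salary"], normalize="index")
print(table)

# With matplotlib installed, the stacked bar chart from the saved search would be:
# table.plot(kind="bar", figsize=(6, 8), stacked=True)
```

Because every row sums to 1, the stacked bars all have the same height and show the class mix per category rather than raw counts.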

.venv error [Resolved]

find in page: | your path PermissionError: [Errno 13] Permission denied | terminal error trying to install preinstalled .venv | stackoverflow
search string: Error: [Errno 13] Permission denied: 'C:\Users\OneDrive\Documents\.venv\Scripts\python.exe'

0 Axes error [Resolved]

To get rid of the '<Figure size 720x576 with 0 Axes>' message, comment out the line: | plt.figure(facecolor='blue').set_facecolor('xkcd:cerulean blue') | I used matplotlib, but the error message '<Figure size 720x576 with 0 Axes>' appeared with graph
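A likely explanation, offered as a sketch rather than a confirmed diagnosis of the original notebook: plt.figure() on its own creates a new Figure with no Axes, and that empty Figure is what the message describes. Creating the Figure and its Axes together avoids the stray empty figure while still setting the face color.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt

# plt.figure() on its own creates a Figure that has no Axes yet
empty_fig = plt.figure(facecolor="blue")
print(len(empty_fig.axes))   # 0 -> this is what "with 0 Axes" refers to
plt.close(empty_fig)

# Creating the Figure and its Axes together, with the face color set up front
fig, ax = plt.subplots(figsize=(10, 8), facecolor="xkcd:cerulean blue")
ax.plot(range(10))
print(len(fig.axes))         # 1
```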

save Pandas dataframe/series data to figure then to file

Statistics

pandas print statement

turn off automatic pandas data type output on print statement

python

main.py (files 1 to 4) and script.sh in CoderSales/machine-learning-classification (repository reference below)

repository reference CoderSales/machine-learning-classification
slice strings in python
Check if Python Package is installed
pip install notebook
How to Execute Shell Commands with Python
How to print a string literally in Python
4 ways to add variables or values into Python strings
search string: percentage symbol pip bash
search string: python access "Option -c 4"
How to Execute Shell Commands with Python
import subprocess | subprocess.run('/path/to/script.sh', check=True) os.system() | run all shell commands with a single call
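A runnable sketch of the subprocess.run call quoted above. Since '/path/to/script.sh' is only a placeholder, the example launches the current Python interpreter as the child process instead; that substitution is an assumption made so the sketch works on any platform.

```python
import subprocess
import sys

# Run a child process; check=True raises CalledProcessError on a non-zero exit code.
# sys.executable stands in for '/path/to/script.sh' so the sketch runs on any platform.
result = subprocess.run(
    [sys.executable, "-c", "print('hello from child')"],
    check=True,
    capture_output=True,
    text=True,
)
print(result.returncode, result.stdout.strip())
```

Passing the command as a list avoids shell quoting issues, and capture_output=True makes the child's output available as result.stdout.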

storing variables

naming arbitrary number of variables

turn off pandas index output

concatenate

concatenate with +

String into variable

.update() a dictionary

print separate with no spaces

Print without space in python 3

function

ML

Linear Regression

Logistic Regression

Statistics for ML (Logistic Regression)

detailed confusion matrix Precision and recall
used for calculation of F1 score Harmonic mean
image Geometric proof without words that max (a,b) > root mean square (RMS) or quadratic mean (QM) > arithmetic mean (AM) > geometric mean (GM) > harmonic mean (HM) > min (a,b) of two distinct positive numbers a and b
image QM_AM_GM_HM_inequality_visual_proof.svg/2560px-QM_AM_GM_HM_inequality_visual_proof.svg.png
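The chain of inequalities in the caption above can be checked numerically, and the same harmonic mean gives the F1 score when applied to precision and recall. The numbers below are made up for illustration.

```python
import math
import statistics

a, b = 2.0, 8.0  # any two distinct positive numbers

qm = math.sqrt((a * a + b * b) / 2)        # quadratic mean (RMS)
am = statistics.mean([a, b])               # arithmetic mean
gm = statistics.geometric_mean([a, b])     # geometric mean
hm = statistics.harmonic_mean([a, b])      # harmonic mean

print(max(a, b) > qm > am > gm > hm > min(a, b))

# F1 is the harmonic mean of precision and recall (example values are made up)
precision, recall = 0.75, 0.60
f1 = 2 * precision * recall / (precision + recall)
print(round(f1, 4))
```

Because the harmonic mean sits closest to the smaller value, F1 is pulled toward whichever of precision or recall is worse.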

F-beta score: sklearn documentation

Search string: F-beta score
Search string: F-beta score is the weighted harmonic mean of precision and recall
Search string: f2 ml sklearn
fbeta_score sklearn.metrics.fbeta_score

F score

F score
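The F-beta score referenced above is the weighted harmonic mean of precision and recall. A plain-Python sketch of the formula follows; the function name fbeta and the confusion-matrix counts are made up here, and sklearn.metrics.fbeta_score computes the same quantity from label arrays.

```python
def fbeta(precision: float, recall: float, beta: float) -> float:
    """Weighted harmonic mean of precision and recall.

    beta > 1 weights recall more heavily, beta < 1 weights precision more heavily.
    """
    if precision == 0 and recall == 0:
        return 0.0
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)

# Made-up confusion-matrix counts for illustration
tp, fp, fn = 30, 10, 20
precision = tp / (tp + fp)   # 0.75
recall = tp / (tp + fn)      # 0.60

for beta in (0.5, 1.0, 2.0):
    print(beta, round(fbeta(precision, recall, beta), 4))
```

With recall lower than precision in this example, the score drops as beta grows, since a larger beta puts more weight on recall.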

References Part5 / (other, VSCODE workflow window views, HTML, CSS, IMG)

VSCODE workflow window views

Keyboard Shortcuts > workbench.action.duplicateWorkspaceInNewWindow Ctrl Shift Alt N (modified from suggested on site) VSCODE workflow window views

font

HTML

CSS

not used box-shadow: red
used change body tag background color behind image
search string: css font color
CSS Text

nb-black / jupyter notebook formatting

search string: add color using nb black
bar How to change color in markdown cells ipython/jupyter notebook? | stackoverflow

Images

IMG

not used to crop images in css

SVG

Repositories

ResidentMario/matplotlib

References Part 6 / (bash, shell scripting)

import subprocess | Python: How to script virtual environment building and activation?
Put this in main.py: | import yoursubfile | Treat it like a module: import file. | How can I make one python file run another? [duplicate] | Get one python file to run another, using python 2.7.3 and Ubuntu 12.10:

subprocess file calls

How to add images to README.md on GitHub?
The error is pretty clear. The file hello.py is not an executable file. You need to specify the executable: subprocess.call(['python.exe', 'hello.py', 'htmlfilename.htm']) OSError: [WinError 193] %1 is not a valid Win32 application
Python Exception : bufsize must be an integer
subprocess — Subprocess management | Using the subprocess Module | python 3.11.2
How can I make one python file run another? [duplicate]
How to call a shell script from python code?
Your best option would be to do it in a function
activate () { . ../.env/bin/activate; } | How to source virtualenv activate in a Bash script
def my_function(): Python Functions
Main result: If you want to ignore a file that you've committed in the past, you'll need to delete the file from your repository and then add a .gitignore rule for it. | search string: how to add files to gitignore
Ignoring a previously committed file
JavaScript function definition syntax (uses curly brackets like bash syntax) | Function.prototype.apply()
site to find out which language code is written in
Is there a website that can recognize and identify what programming language is being input (pasted)?
search string: '.' is not recognized as an internal or external command,
5 Ways to Fix the "Not Recognized as an Internal or External Command" Error in Windows
search string: subprocess.Popen() documentation
TypeError: got multiple values for argument
Python Exception : bufsize must be an integer
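The WinError 193 note above says that a .py file is not itself an executable and that the interpreter must be named explicitly. A portable variation on that advice, sketched here as an assumption rather than the quoted answer's exact code, is to pass sys.executable instead of hard-coding 'python.exe'. The script below is a throwaway created only for the demonstration.

```python
import os
import subprocess
import sys
import tempfile

# Write a throwaway script; 'hello.py' mirrors the name in the quoted answer
with tempfile.TemporaryDirectory() as tmp:
    script = os.path.join(tmp, "hello.py")
    with open(script, "w", encoding="utf-8") as fh:
        fh.write("import sys\nprint('args:', sys.argv[1:])\n")

    # Passing the interpreter explicitly avoids WinError 193 on Windows,
    # where a .py file is not itself a valid executable.
    # sys.executable is the interpreter running this code (e.g. the one in .venv).
    exit_code = subprocess.call([sys.executable, script, "htmlfilename.htm"])

print(exit_code)
```

Note that the arguments are passed as one list; passing them as separate positional arguments is a common cause of the "bufsize must be an integer" error listed above.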

venv location

shell

search string: chmod executable shell script
chmod +x Steps to write and execute a script
search string: how to start shell script
search string: run shell using source
The first line in Bash scripts is a character sequence known as the "shebang." The shebang is the program loader's first instruction when executing the file, and the characters indicate which interpreter to run when reading the script. | Add the following line to the file to indicate the use of the Bash interpreter: | #!/bin/bash How to Write a Bash Script with Examples | Writing a Bash Script | Adding the "shebang" | #!/usr/bin/env | Uses the env program to locate the interpreter. Use this shebang for other scripting languages, such as Perl, Python, etc.
search string: what does comment do at top of shell script
How to activate a Python virtual environment from a script file
search string: python file to start venv
search string: pass raw strings from shell or py file to terminal to run command in terminal
search string: Taking Linux Command as Raw String in Python
Taking Linux Command as Raw String in Python
search string: how to pass raw code to terminal
What are some ways to pass raw bytes to a program via the Linux terminal?
Pass bash argument to python script
search string: pass arg to function python through bash call
venv — Creation of virtual environments | An example of extending EnvBuilder
search string: try catch shell python venv
PermissionError: [Errno 13] Permission denied
How to assign the output of a Bash command to a variable? [duplicate]
search string: #!/bin/bash -x PWD=pwd
search string: how to activate venv in existing shell
search string: use shell to activate venv
Learn X in Y minutes
Writing shell scripts
search string: automate virtual env
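Several of the searches above are about activating a venv from a script. A script cannot activate an environment for the shell that launched it, because activation only changes the current process's environment (which is why activate has to be run with source). One common alternative, sketched here with hypothetical helper names (venv_python, create_env), is to create the environment with the standard venv module and then call its interpreter directly.

```python
import os
import tempfile
import venv

def venv_python(env_dir: str) -> str:
    """Path to the interpreter inside a venv (Scripts on Windows, bin elsewhere)."""
    if os.name == "nt":
        return os.path.join(env_dir, "Scripts", "python.exe")
    return os.path.join(env_dir, "bin", "python")

def create_env(env_dir: str) -> str:
    """Create a virtual environment and return the path to its interpreter."""
    # with_pip=False keeps this sketch fast; pass with_pip=True for real use
    venv.EnvBuilder(with_pip=False).create(env_dir)
    return venv_python(env_dir)

if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        python_path = create_env(os.path.join(tmp, ".venv"))
        print(python_path, os.path.exists(python_path))
```

The returned path can be handed to subprocess.run([python_path, "-m", "pip", "install", "-r", "requirements.txt"]) once the environment has been created with with_pip=True, which has the same effect as activating and then running pip.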



machine-learning-classification

primary source for this README: jupyter-6-Supervised-Learning

ML-logistic-regression-notes

All content below this point from documentation repository:

documentation

assembling:

part 1:

part 2: Repos used to compile this README.md :

ML-logistic-regression-notes