A practical toolbox for machine learning and statistical analysis
- currently supports macOS only
- download the whole folder and open a terminal inside it
- install pipenv, then run
pipenv install
and
pipenv shell
- then run
python app.py
- the app currently accepts two types of data:
- table data: csv file with header
- text data: txt file
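
The two input types above could be read along these lines. This is only a sketch: it assumes pandas is used for tables, and the helper names `load_table` and `load_text` are hypothetical, not taken from the app.

```python
import io

import pandas as pd


def load_table(source):
    """Read a CSV whose first row is the header into a DataFrame."""
    return pd.read_csv(source, header=0)


def load_text(source):
    """Read a plain-text file (path or file-like object) into one string."""
    if hasattr(source, "read"):
        return source.read()
    with open(source, encoding="utf-8") as fh:
        return fh.read()


# In-memory stand-ins for files on disk (illustrative data only).
table = load_table(io.StringIO("height,weight\n1.70,65\n1.82,80\n"))
text = load_text(io.StringIO("first line\nsecond line\n"))
```

Both helpers also accept a file path, e.g. `load_table("data.csv")`.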
- table view shows the original data
- large data frames are split into pages
- to get a more general view, you can shuffle the data
- basic statistics and correlations are also available
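
A rough pandas sketch of the table-view operations listed above (pagination, shuffling, summary statistics, correlation). The helper names and the page size of 50 are assumptions made for illustration; the app may implement these differently.

```python
import numpy as np
import pandas as pd


def get_page(df, page, page_size=50):
    """Return the rows belonging to a zero-indexed page."""
    start = page * page_size
    return df.iloc[start:start + page_size]


def shuffle_rows(df, seed=None):
    """Return all rows in random order with a fresh index."""
    return df.sample(frac=1, random_state=seed).reset_index(drop=True)


# Synthetic data, 120 rows, two numeric columns.
rng = np.random.default_rng(0)
df = pd.DataFrame({"a": rng.normal(size=120), "b": rng.normal(size=120)})

first_page = get_page(df, 0)         # rows 0..49
last_page = get_page(df, 2)          # the remaining 20 rows
shuffled = shuffle_rows(df, seed=1)  # same rows, different order
summary = df.describe()              # count, mean, std, min, quartiles, max
corr = df.corr()                     # pairwise Pearson correlation
```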
- plot view gives a higher-level picture of the dataset through plots
- you can select which features you want to plot
- currently supported: kde, histogram, line, box, bar and scatter matrix
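
As an illustration of plotting selected features, the sketch below uses pandas' built-in plotting on top of matplotlib. Which plotting library the app uses is not stated here, so treat this as an assumption; `plot_features` is a hypothetical helper. Note that the `kde` kind in pandas additionally requires SciPy.

```python
import matplotlib

matplotlib.use("Agg")  # non-interactive backend; not needed in a notebook

import numpy as np
import pandas as pd
from pandas.plotting import scatter_matrix

rng = np.random.default_rng(0)
df = pd.DataFrame(rng.normal(size=(100, 3)), columns=["x", "y", "z"])

selected = ["x", "y"]  # features chosen by the user


def plot_features(df, features, kind):
    """Plot the chosen columns; kind is a pandas plot kind or 'scatter_matrix'."""
    if kind == "scatter_matrix":
        return scatter_matrix(df[features], diagonal="hist")
    return df[features].plot(kind=kind)


hist_ax = plot_features(df, selected, "hist")
box_ax = plot_features(df, selected, "box")
grid = plot_features(df, selected, "scatter_matrix")  # 2x2 array of axes
```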
- available for text data only: read through the content
- we provide several scalers to scale the features
- popular encoders
- when the dataset is too large, you can drop columns, drop rows or just draw a sample
- different strategies to handle missing data
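
The preprocessing steps above (reducing size, handling missing data, scaling, encoding) might look like this with pandas and scikit-learn. The specific choices here (mean imputation, `StandardScaler`, `OneHotEncoder`) are examples picked for the sketch, not a statement of which scalers, encoders or strategies the app ships with.

```python
import numpy as np
import pandas as pd
from sklearn.impute import SimpleImputer
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Tiny illustrative table with one missing value.
df = pd.DataFrame({
    "age": [22.0, 35.0, np.nan, 58.0],
    "city": ["Oslo", "Rome", "Oslo", "Lima"],
    "id": [1, 2, 3, 4],
})

# Reduce size: drop a column, or draw a random sample of rows.
reduced = df.drop(columns=["id"])
sample = reduced.sample(n=2, random_state=0)

# Missing data: fill numeric gaps with the column mean.
age = SimpleImputer(strategy="mean").fit_transform(reduced[["age"]])

# Scaling: shift to zero mean and scale to unit variance.
age_scaled = StandardScaler().fit_transform(age)

# Encoding: one binary column per category (Lima, Oslo, Rome).
encoder = OneHotEncoder()
city_encoded = encoder.fit_transform(reduced[["city"]]).toarray()
```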
- build clusters using the following algorithms
- kmeans
- hierarchical methods
- affinity propagation
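
For reference, the three algorithm families listed above are all available in scikit-learn. The sketch below assumes scikit-learn is the backing library and uses synthetic blobs; the parameter values are illustrative only.

```python
from sklearn.cluster import AffinityPropagation, AgglomerativeClustering, KMeans
from sklearn.datasets import make_blobs

# Three well-separated synthetic clusters in two dimensions.
X, _ = make_blobs(n_samples=150, centers=3, cluster_std=0.6, random_state=0)

models = {
    "kmeans": KMeans(n_clusters=3, n_init=10, random_state=0),
    "hierarchical": AgglomerativeClustering(n_clusters=3),
    # Affinity propagation chooses the number of clusters itself.
    "affinity_propagation": AffinityPropagation(random_state=0),
}

# One array of cluster labels per algorithm.
labels = {name: model.fit_predict(X) for name, model in models.items()}
```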
- visualize the clusters using popular plot methods
- t-sne
- radviz
- top 2 principal components (PCA)
- hierarchical tree (only for hierarchical methods)
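
One possible way to produce the four cluster views above, assuming scikit-learn for the projections, `pandas.plotting.radviz` for RadViz and SciPy for the hierarchical tree. The data is synthetic and the layout is arbitrary; the app's own plots may differ.

```python
import matplotlib

matplotlib.use("Agg")  # non-interactive backend; not needed in a notebook

import matplotlib.pyplot as plt
import pandas as pd
from pandas.plotting import radviz
from scipy.cluster.hierarchy import dendrogram, linkage
from sklearn.cluster import AgglomerativeClustering
from sklearn.datasets import make_blobs
from sklearn.decomposition import PCA
from sklearn.manifold import TSNE

X, _ = make_blobs(n_samples=60, centers=3, n_features=4, random_state=0)
labels = AgglomerativeClustering(n_clusters=3).fit_predict(X)

# Two-dimensional coordinates for the scatter plots.
pca_xy = PCA(n_components=2).fit_transform(X)
tsne_xy = TSNE(n_components=2, perplexity=15, random_state=0).fit_transform(X)

fig, axes = plt.subplots(2, 2, figsize=(10, 8))
axes[0, 0].scatter(pca_xy[:, 0], pca_xy[:, 1], c=labels)
axes[0, 0].set_title("top 2 principal components")
axes[0, 1].scatter(tsne_xy[:, 0], tsne_xy[:, 1], c=labels)
axes[0, 1].set_title("t-SNE")

# RadViz expects a DataFrame with a class column.
frame = pd.DataFrame(X, columns=["f0", "f1", "f2", "f3"])
frame["cluster"] = labels.astype(str)
radviz(frame, "cluster", ax=axes[1, 0])

# Hierarchical tree built with Ward linkage, matching the clustering above.
tree = dendrogram(linkage(X, method="ward"), no_labels=True, ax=axes[1, 1])
```

Calling `fig.savefig("clusters.png")` afterwards writes the figure to disk.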
- supports both binary and multiclass problems
- many common classifiers
- train/test split and cross-validation
- basic metrics and confusion matrix
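
The classification workflow above (split, fit, metrics, confusion matrix, cross-validation) can be sketched with scikit-learn as follows. Logistic regression on the iris dataset is just a stand-in chosen for the example; it is not meant to indicate which classifiers the app offers.

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, confusion_matrix
from sklearn.model_selection import cross_val_score, train_test_split

X, y = load_iris(return_X_y=True)  # three classes, i.e. a multiclass problem

# Hold out 25% of the rows for testing, keeping class proportions.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

clf = LogisticRegression(max_iter=1000)
clf.fit(X_train, y_train)
y_pred = clf.predict(X_test)

accuracy = accuracy_score(y_test, y_pred)
matrix = confusion_matrix(y_test, y_pred)  # rows: true class, columns: predicted

# 5-fold cross-validation on the full dataset with a fresh model.
cv_scores = cross_val_score(LogisticRegression(max_iter=1000), X, y, cv=5)
```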
- linear regression with basic statistics
- GLM (gamma family)
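
A minimal sketch of the two regression options, using scikit-learn's `LinearRegression` and `GammaRegressor` (a gamma GLM with a log link). This is an assumption for illustration: the app may rely on another library such as statsmodels, and the synthetic data below is made up for the example.

```python
import numpy as np
from sklearn.linear_model import GammaRegressor, LinearRegression
from sklearn.metrics import r2_score

rng = np.random.default_rng(0)
X = rng.uniform(0, 2, size=(300, 1))

# Linear target: y = 3x + 1 plus Gaussian noise.
y_lin = 3 * X[:, 0] + 1 + rng.normal(scale=0.1, size=300)
ols = LinearRegression().fit(X, y_lin)
r2 = r2_score(y_lin, ols.predict(X))  # goodness of fit on the training data

# Positive, right-skewed target with mean exp(0.5 + x).
mu = np.exp(0.5 + X[:, 0])
y_gamma = rng.gamma(shape=5.0, scale=mu / 5.0)

# alpha=0.0 turns off the default L2 penalty.
glm = GammaRegressor(alpha=0.0).fit(X, y_gamma)
```

Because the gamma model uses a log link, `glm.coef_` and `glm.intercept_` are on the log scale of the mean.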