Brightics Studio v1.1-2020.07

gyu77hs released this 20 Jul 06:05

Release notes

New

Many new functions for regression, time series, and transformation have been added.
- PLS Regression Train
- PLS Regression Predict
- Time Series Distance
- SVD Model
- Linear Sampling
- Over Sampling (SMOTE)
- Under Sampling (Cluster Centroid)
- Correlation Filter
- Variance Filter
- Savitzky-Golay Filter
- Explode And Unexplode
- t-SNE
More text analytics tools are also available through the following new Brightics Studio functions.
- Document Summarizer (Korean)
- Topic Name Extraction
- GSDMM (Short text topic modeling)
- Dynamic Topic Modeling
- Document Influence Model
- Regular Expression
- NER (Named Entity Recognition)
- NER CRF Train
- NER CRF Predict

Enhancement

Some existing functions have been upgraded and released as separate functions. (The original functions will be deprecated.)
- Gaussian Mixture: A result table is added so that you can evaluate the performance of the predict function.
- Label Encoder, Label Encoder Model: They now support multiple input columns.
- Cross Table: The result is now shown in a table.
- Wilcoxon Test: You can perform the Wilcoxon test on numeric columns.
- Latent Dirichlet Allocation: The document-topic ratio matrix is available as a new output table. In addition, a result model page is added, containing the perplexity as an evaluation metric and a visualization of the topic model. The default values of Number of topics and Number of topic words are changed as well.
- SVD: It now produces a model output, which can be used in the new function SVD Model. Projected columns are also shown alongside the original columns.
- Split Sentences: You can now choose whether the original texts are repeated in the resulting table. It can also be applied to texts that mix English and Korean.
- Pivot, One Hot Encoder: Invalid column names are fixed so that they can be used as an input for another function.
- Tokenizer (Korean): You can set predefined compound words that will not be split during tokenization.
- Tokenizer (English): You can set predefined compound words that will not be split during tokenization. An option to convert all letters to lowercase is also added.
The Hold Columns option is removed from the functions listed below. From now on, the result table contains all columns from the input table. Existing models that were built with hold columns can still be used.
- Polynomial Expansion
- String Split
- Replace Missing String
- Distinct
- Documents Summarizer (English)
- Extract Sentimental Words
- Text Search
- Stopwords Remover
- Synonym Converter
Query Executor: More regular expression functions are added.
Naive Bayes Predict: Joint log likelihood can optionally be displayed.
Normalization, One Hot Encoder, Label Encoder: The model page is now cleaner and easier to read.
LDA: The number of components is now adjusted more accurately.
Columns To Array, Array To Columns: Now they support string type.
Classification Predict: Now it supports AdaBoost, MLP, SVM, and XGB.
Regression Predict: Now it supports AdaBoost, GLM, Isotonic, MLP, Random Forest, and XGB.
XGB Classification, XGB Regression: They now plot normalized feature importances, as Decision Tree, Random Forest, and AdaBoost do.
Delete Missing Data: You can delete rows or columns, and you can restrict deletion to those whose proportion of missing values exceeds a given value.
Text Search: The result table will keep its original column order.
Word2Vec: Evaluation metrics are now added to the model page.
Function names in the Palette are displayed on 3 lines for better visibility.
Graphviz is added to the Linux version.
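
The proportion-based deletion mentioned for Delete Missing Data can be illustrated with a small sketch. This is not the Brightics Studio implementation; the function names (`drop_sparse_columns`, `drop_sparse_rows`), the `max_missing` parameter, and the use of `None` for a missing value are assumptions made only for illustration.

```python
def missing_ratio(values):
    """Return the fraction of entries that are None (0.0 for an empty input)."""
    values = list(values)
    if not values:
        return 0.0
    return sum(v is None for v in values) / len(values)


def drop_sparse_columns(rows, max_missing):
    """Hypothetical helper: keep columns whose missing ratio is <= max_missing."""
    if not rows:
        return []
    keep = [
        col for col in rows[0]
        if missing_ratio(row[col] for row in rows) <= max_missing
    ]
    return [{col: row[col] for col in keep} for row in rows]


def drop_sparse_rows(rows, max_missing):
    """Hypothetical helper: keep rows whose missing ratio is <= max_missing."""
    return [row for row in rows if missing_ratio(row.values()) <= max_missing]


# Toy table: column "b" is missing in 2 of 3 rows, row 2 is missing 2 of 3 values.
rows = [
    {"a": 1.0, "b": None, "c": 3.0},
    {"a": 2.0, "b": None, "c": None},
    {"a": 3.0, "b": 6.0, "c": 9.0},
]

by_column = drop_sparse_columns(rows, max_missing=0.5)
by_row = drop_sparse_rows(rows, max_missing=0.5)
print(list(by_column[0]))  # column names that survive
print(len(by_row))         # number of rows that survive
```

With a threshold of 0.5, column "b" and the second row are dropped because more than half of their values are missing, while partially filled columns and rows are kept.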

Minor Fixes

An error that occurred when a dataframe contained a column of int64 dtype is fixed.
Outlier Detection (Tukey/Carling): Number of Outliers in a Row is removed from the result page because the option has been disabled.
GLM Train: A typo is fixed.
Gaussian Mixture Train: Renamed to Gaussian Mixture.
Gaussian Mixture Predict: Default value of Display Probability is set to False.
AdaBoost & Random Forest Classification/Regression Train: Some parameters are renamed for consistency.
Pivot, Statistic Summary, Statistic Deviation: Some parameters are renamed for consistency, and information on whether each value is of sample type or population type is added.
Unload: A bug that occurred when string type was chosen as a global variable is fixed.
Two Way ANOVA: Now you can set n to be 1.
Documents Summarizer (English): You can now summarize a document that is treated as a single sentence.
Stopwords Remover, Synonym Converter: Input column types are restricted to string array type only.
Hierarchical Clustering: A bug that caused mismatched lengths is fixed. A bug that occurred when Key Column was not located at the end of the table is also fixed.
EWMA: Wilder's ratio option now works as expected.
Read CSV: Invalid column names are fixed so that they can be used as an input for another function.
Collaborative Filtering Train: A bug in handling large models is fixed.
Evaluate Ranking Algorithm: A bug in handling null values is fixed.
Evaluate Classification: A bug that caused mismatched labels is fixed.
PCA: Number of Components now works as expected.
Fixed a bug where some JSON models were dropped when importing a project JSON file.
The double quotation mark (") is now prohibited when editing a function name.
Fixed a bug where function output was not displayed when zooming the screen in or out.
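
The sample type versus population type distinction noted for Pivot, Statistic Summary, and Statistic Deviation comes down to the divisor used for variance and standard deviation. The sketch below uses only Python's standard `statistics` module and made-up data; it does not show how Brightics Studio computes or labels these values.

```python
import statistics

# Made-up data for illustration: mean is 5, sum of squared deviations is 32.
values = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]

# Population type: divide the squared deviations by n.
population_variance = statistics.pvariance(values)
population_std = statistics.pstdev(values)

# Sample type: divide the squared deviations by n - 1 (Bessel's correction).
sample_variance = statistics.variance(values)
sample_std = statistics.stdev(values)

print(population_variance, population_std)
print(sample_variance, sample_std)
```

For the same data the sample-type values are always somewhat larger than the population-type values, which is why it helps to know which type a reported statistic uses.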
