Brightics Studio v1.1-2020.07

@gyu77hs released this 20 Jul 06:05
· 80 commits to master since this release

Release notes

New

  • Many new functions for regression, time series, and transformation have been added.
    • PLS Regression Train
    • PLS Regression Predict
    • Time Series Distance
    • SVD Model
    • Linear Sampling
    • Over Sampling (SMOTE)
    • Under Sampling (Cluster Centroid)
    • Correlation Filter
    • Variance Filter
    • Savitzky-Golay Filter
    • Explode And Unexplode
    • t-SNE
  • More text analytics tools are now available through the following new Brightics Studio functions.
    • Document Summarizer (Korean)
    • Topic Name Extraction
    • GSDMM (Short text topic modeling)
    • Dynamic Topic Modeling
    • Document Influence Model
    • Regular Expression
    • NER (Named Entity Recognition)
    • NER CRF Train
    • NER CRF Predict
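
As an illustration of one of the transformation functions listed above, the idea behind a variance filter can be sketched in a few lines: columns whose variance does not exceed a threshold carry little information and are dropped. This is only a minimal sketch, not the Brightics Studio implementation. The `variance_filter` name, the dict-of-lists table, and the use of population variance are all assumptions made for the example.

```python
from statistics import pvariance


def variance_filter(table, threshold=0.0):
    """Keep only the columns whose population variance exceeds `threshold`.

    `table` maps a column name to a list of numbers. It is a hypothetical
    stand-in for a table, not the actual Brightics data structure.
    """
    return {
        name: values
        for name, values in table.items()
        if pvariance(values) > threshold
    }


table = {
    "constant": [1.0, 1.0, 1.0, 1.0],  # variance 0, dropped
    "varying": [1.0, 2.0, 3.0, 4.0],   # variance 1.25, kept
}
filtered = variance_filter(table)
print(list(filtered))
```

Raising `threshold` makes the filter stricter. The actual function may use sample variance or different defaults.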

Enhancement

  • Some existing functions have been upgraded and released as separate functions. (The original functions will be deprecated.)
    • Gaussian Mixture: A table result is added so that you can evaluate the performance of the predict function.
    • Label Encoder, Label Encoder Model: They now support multiple input columns.
    • Cross Table: The result will be shown in a table.
    • Wilcoxon Test: You can perform Wilcoxon test on numeric columns.
    • Latent Dirichlet Allocation: A document-topic ratio matrix is available as a new output table. A result model page has also been added, showing perplexity as an evaluation metric and a visualization of the topic model. The default values of Number of topics and Number of topic words have changed as well.
    • SVD: It now produces a model output, which can be used in the new SVD Model function. Projected columns are also shown alongside the original columns.
    • Split Sentences: You can now choose whether the original texts are duplicated in the resulting table. It can also be applied to texts that mix English and Korean.
    • Pivot, One Hot Encoder: Invalid column names are fixed so that they can be used as an input for another function.
    • Tokenizer (Korean): You can set predefined compound words that will not be split during tokenization.
    • Tokenizer (English): You can set predefined compound words that will not be split during tokenization. An option to convert all letters to lowercase is also added.
  • The Hold Columns option has been removed from the functions below. The result table now contains all columns from the input table. Old models that were set up with hold columns can still be used.
    • Polynomial Expansion
    • String Split
    • Replace Missing String
    • Distinct
    • Documents Summarizer (English)
    • Extract Sentimental Words
    • Text Search
    • Stopwords Remover
    • Synonym Converter
  • Query Executor: More regular expression functions are added.
  • Naive Bayes Predict: Joint log likelihood can now be displayed optionally.
  • Normalization, One Hot Encoder, Label Encoder: The model page has been cleaned up.
  • LDA: Now it adjusts the number of components more correctly.
  • Columns To Array, Array To Columns: They now support the string type.
  • Classification Predict: Now it supports AdaBoost, MLP, SVM, and XGB.
  • Regression Predict: Now it supports AdaBoost, GLM, Isotonic, MLP, Random Forest, and XGB.
  • XGB Classification, XGB Regression: They now plot normalized feature importances, as Decision Tree, Random Forest, and AdaBoost do.
  • Delete Missing Data: You can delete rows or columns, and you can delete those whose proportion of missing values exceeds a given value.
  • Text Search: The result table will keep its original column order.
  • Word2Vec: Evaluation metrics are now added to the model page.
  • Function names in the Palette are displayed on up to 3 lines for visibility.
  • Graphviz is added for the Linux version.
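
The compound-word and lowercase options described above for the tokenizers can be illustrated with a small sketch. It assumes a plain whitespace tokenizer, which is far simpler than the actual Tokenizer (Korean) and Tokenizer (English) functions. The `tokenize` function and its `compound_words` and `lowercase` parameters are hypothetical names chosen for the example, not Brightics Studio API.

```python
def tokenize(text, compound_words=(), lowercase=False):
    """Whitespace tokenizer that keeps predefined compound words as one token.

    A simplified illustration only; real tokenizers do much more than
    splitting on whitespace.
    """
    if lowercase:
        text = text.lower()
        compound_words = [c.lower() for c in compound_words]
    words = text.split()
    # Pre-split each compound so it can be matched against the word sequence.
    # Longer compounds are tried first so they win over shorter overlaps.
    compounds = sorted((c.split() for c in compound_words), key=len, reverse=True)
    tokens, i = [], 0
    while i < len(words):
        for parts in compounds:
            if parts and words[i:i + len(parts)] == parts:
                tokens.append(" ".join(parts))  # emit the compound unsplit
                i += len(parts)
                break
        else:
            tokens.append(words[i])  # no compound starts here
            i += 1
    return tokens


tokens = tokenize(
    "Machine Learning with Brightics Studio",
    compound_words=["machine learning", "Brightics Studio"],
    lowercase=True,
)
print(tokens)  # ['machine learning', 'with', 'brightics studio']
```

Without `compound_words`, the same input is split into five separate tokens.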

Minor Fixes

  • An error that occurred when a dataframe contained a column of int64 dtype is fixed.
  • Outlier Detection (Tukey/Carling): Number of Outliers in a Row is removed from the result page because the option has been disabled.
  • GLM Train: A typo is fixed.
  • Gaussian Mixture Train: Renamed to Gaussian Mixture.
  • Gaussian Mixture Predict: Default value of Display Probability is set to False.
  • AdaBoost & Random Forest Classification/Regression Train: Some parameters are renamed for consistency.
  • Pivot, Statistic Summary, Statistic Deviation: Some parameters are renamed for consistency, and information on whether each value is of sample type or population type is added.
  • Unload: A bug that occurred when the string type was chosen for a global variable is fixed.
  • Two Way ANOVA: Now you can set n to be 1.
  • Documents Summarizer (English): You can now summarize a document that is treated as a single sentence.
  • Stopwords Remover, Synonym Converter: Input column types are restricted to string array type only.
  • Hierarchical Clustering: A bug of mismatching lengths is fixed. A bug that occurred when Key Column was not located at the end of the table is also fixed.
  • EWMA: Wilder's ratio option now works as expected.
  • Read CSV: Invalid column names are fixed so that they can be used as an input for another function.
  • Collaborative Filtering Train: A bug in handling large models is fixed.
  • Evaluate Ranking Algorithm: A bug in handling null values is fixed.
  • Evaluate Classification: A bug that caused mismatching labels is fixed.
  • PCA: Number of Components now works as expected.
  • Fixed a bug in which some JSON models were dropped when importing a project JSON.
  • The double quotation mark (") is now prohibited when editing a function name.
  • Fixed a bug in which function output was not displayed when zooming the screen in or out.