[Analyzer] Unsupervised Clustering #130

shahrukhx01 · 2021-06-07T17:01:07Z

@lalitpagaria for getting document vectors, we can use sentence-transformers:

https://github.com/UKPLab/sentence-transformers

shahrukhx01 · 2021-07-05T07:19:08Z

@lalitpagaria the following steps are involved in doing this:

Take n text documents and extract sentence/document embeddings using sentence-transformers.
Apply an unsupervised clustering algorithm from scikit-learn: https://scikit-learn.org/stable/modules/clustering.html
Show the actual raw texts grouped by cluster.
Alternatively, apply dimensionality reduction and show a visualization like this, linking each point of the visualization to its raw text (for example, shown on hover).
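
The steps above can be sketched roughly as follows. This is only a minimal illustration, not Obsei's implementation: the helper names (`embed_texts`, `cluster_embeddings`, `group_texts`, `project_2d`), the model name `all-MiniLM-L6-v2`, and the choice of KMeans and PCA are all assumptions picked for the example. Any other scikit-learn clustering or reduction method could be swapped in.

```python
from collections import defaultdict

import numpy as np
from sklearn.cluster import KMeans
from sklearn.decomposition import PCA


def embed_texts(texts, model_name="all-MiniLM-L6-v2"):
    """Step 1: turn texts into embeddings (model name is an assumption)."""
    # Imported lazily because it needs the sentence-transformers package
    # and downloads the model on first use.
    from sentence_transformers import SentenceTransformer

    model = SentenceTransformer(model_name)
    return np.asarray(model.encode(list(texts)))


def cluster_embeddings(embeddings, n_clusters=2, seed=0):
    """Step 2: assign each embedding to a cluster (KMeans chosen for illustration)."""
    kmeans = KMeans(n_clusters=n_clusters, n_init=10, random_state=seed)
    return kmeans.fit_predict(embeddings)


def group_texts(texts, labels):
    """Step 3: collect the raw texts under their cluster label."""
    groups = defaultdict(list)
    for text, label in zip(texts, labels):
        groups[int(label)].append(text)
    return dict(groups)


def project_2d(embeddings):
    """Step 4 (alternative): reduce embeddings to 2D points for plotting."""
    return PCA(n_components=2).fit_transform(embeddings)


# Hypothetical usage (requires sentence-transformers to be installed):
# texts = ["great battery life", "battery lasts long", "app keeps crashing"]
# embeddings = embed_texts(texts)
# labels = cluster_embeddings(embeddings, n_clusters=2)
# print(group_texts(texts, labels))
# points = project_2d(embeddings)  # one (x, y) row per text, e.g. for hover tooltips
```

Keeping the embedding step separate from the clustering step means the clustering part can be tried on any precomputed vectors, and the number of clusters stays a parameter that has to be chosen per dataset.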

Hope this helps.

lalitpagaria · 2021-07-05T13:32:47Z

@shahrukhx01 Thanks for the information. Let me read through them.
For the first version, would it be possible to build clusters on a list of texts?
For example, if Obsei fetches 200 reviews, can we generate clusters from these 200 texts and then tag each review with the cluster it belongs to?
Also, is it possible to get multiple categories?
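
As a rough illustration of tagging each fetched review with its cluster, here is a small self-contained sketch. To keep it runnable without a model download it swaps in TF-IDF vectors from scikit-learn instead of sentence embeddings. The function name `tag_reviews`, the record layout, and the sample reviews are made up for the example.

```python
from sklearn.cluster import KMeans
from sklearn.feature_extraction.text import TfidfVectorizer


def tag_reviews(reviews, n_clusters=2, seed=0):
    """Cluster a list of review texts and tag each one with its cluster id."""
    # TF-IDF is a stand-in here; sentence embeddings could be used instead.
    vectors = TfidfVectorizer().fit_transform(reviews)
    labels = KMeans(n_clusters=n_clusters, n_init=10, random_state=seed).fit_predict(vectors)
    # One record per review, in the same order as the input.
    return [{"text": text, "cluster": int(label)} for text, label in zip(reviews, labels)]


# Made-up sample data standing in for the fetched reviews.
sample_reviews = [
    "battery drains too fast",
    "battery drains too fast",
    "delivery was late again",
    "delivery arrived late and damaged",
]
for record in tag_reviews(sample_reviews):
    print(record["cluster"], record["text"])
```

Each review gets exactly one cluster id this way. The ids are arbitrary integers, not named categories.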

shahrukhx01 · 2021-07-05T14:23:47Z

@lalitpagaria that's where topic modelling comes into play: it assigns categories based on the content of the documents. We have a separate issue for that, #131.

lalitpagaria · 2021-07-07T18:31:41Z

Yeah, my bad. Then let's integrate topic modelling first.

shahrukhx01 · 2021-07-07T19:03:54Z

@lalitpagaria could you create a dataset of 200 posts as a CSV and host it on Kaggle? I'll take it up in the first week of August if no one else takes up these two issues.

shahrukhx01 added the enhancement label Jun 7, 2021

shahrukhx01 mentioned this issue Jun 7, 2021

[Analyzer] Topic Modeling #131

Open

lalitpagaria assigned shahrukhx01 Jun 7, 2021

shahrukhx01 changed the title ~~Unsupervised Clustering~~ [Analyzer] Unsupervised Clustering Jul 5, 2021

lalitpagaria added the analyzer label Jul 9, 2021
