Prognostic Studies and ASReview #1547

Answered by jteijema
Emanuel-1986 asked this question in Q&A
Hi @Emanuel-1986!

We are currently working on publishing such a paper! I will update you as soon as the preprint is available.

As for your question, how words cluster depends on your choice of feature extractor. While it is true that TF-IDF groups records by the words they share, this is not the case for doc2vec or sBERT. These models are much more context-dependent and have proven to be robust to divergent terminology.

To simplify a little (a lot), these models analyze how every word is used in context, compared to every other word. This means that if two different terms exist for the same concept and are used in similar contexts, they will end up close together in the embedding space. This is…
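
To make the idea concrete, here is a minimal sketch in plain Python. It is not doc2vec, sBERT, or anything ASReview runs internally; it only counts which words share a sentence, as a rough stand-in for "contextual usage". The toy corpus and the helper names (`cosine`, `context_vectors`) are made up for illustration.

```python
import math
from collections import Counter, defaultdict

# Hypothetical toy corpus: "prognostic" and "predictive" never appear
# together, but they are used in the same contexts.
corpus = [
    "prognostic model for patient survival",
    "predictive model for patient survival",
    "prognostic factors in cancer outcome",
    "predictive factors in cancer outcome",
    "randomized trial of drug dosage",
]


def cosine(a, b):
    """Cosine similarity between two sparse count vectors (dict-like)."""
    dot = sum(a[k] * b[k] for k in a if k in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def context_vectors(sentences):
    """Map each word to counts of the other words sharing its sentence."""
    vectors = defaultdict(Counter)
    for sentence in sentences:
        tokens = sentence.split()
        for i, word in enumerate(tokens):
            for j, other in enumerate(tokens):
                if i != j:
                    vectors[word][other] += 1
    return vectors


vectors = context_vectors(corpus)

# Term-matching view: every word is its own dimension, so two different
# words have nothing in common, even if they mean the same thing.
term_match_similarity = cosine(Counter(["prognostic"]), Counter(["predictive"]))

# Context view: words are compared by the company they keep.
synonym_similarity = cosine(vectors["prognostic"], vectors["predictive"])
unrelated_similarity = cosine(vectors["prognostic"], vectors["dosage"])

print(term_match_similarity, synonym_similarity, unrelated_similarity)
```

In this toy setup the two terms share no dimension under plain term matching, yet their context vectors are nearly identical, while an unrelated word such as "dosage" stays far away. Real embedding models learn much richer representations, but the intuition is the same.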

Answer selected by Emanuel-1986
Category: Q&A
Labels: question
2 participants