Representation: most_similar, closes #45 #147

Draft
wants to merge 3 commits into base: master

@henrifroese (Collaborator) commented Aug 9, 2020

  • add function most_similar in representation.py

Only a draft at the moment; no tests yet.

EDIT: I'd be interested in opinions on the function design. The current signature is:

most_similar(
    s: TextSeries,
    s_represented: Union[VectorSeries, RepresentationSeries],
    vector: List[float],
    max_number=None,
) -> TextSeries:

and works e.g. like this:

    >>> import texthero as hero
    >>> import pandas as pd
    >>> s = pd.Series(["I like football", "Hey, watch out", "I like sports", "Cool stuff"])
    >>> s_pca = s.pipe(hero.tokenize).pipe(hero.tfidf).pipe(hero.flatten).pipe(hero.pca) # TODO: remove flatten when pca is updated w.r.t. Representation Series
    >>> # want to find the two most similar to "I like football", which has index 0
    >>> s_most_similar = hero.most_similar(s, s_pca, s_pca[0], max_number=2)
    >>> s_most_similar
    0    I like football
    2      I like sports
    dtype: object
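
For discussion, here is a minimal sketch of how such a lookup could work with plain pandas and NumPy. It is not the code in this PR: the name `most_similar_sketch` is hypothetical, Euclidean distance is an assumed choice of metric, and `s_represented` is assumed to be a Series holding one list of floats per document, sharing its row order with `s`.

```python
from typing import List, Optional

import numpy as np
import pandas as pd


def most_similar_sketch(
    s: pd.Series,
    s_represented: pd.Series,
    vector: List[float],
    max_number: Optional[int] = None,
) -> pd.Series:
    """Return the texts in s whose vectors are closest to `vector` (hypothetical helper)."""
    # Turn the Series of per-document lists into an (n_documents, n_dims) matrix.
    matrix = np.array(s_represented.tolist(), dtype=float)
    # Euclidean distance from every document vector to the query vector
    # (an assumed metric, chosen only for illustration).
    distances = np.linalg.norm(matrix - np.asarray(vector, dtype=float), axis=1)
    # Stable sort so that ties keep their original order.
    order = np.argsort(distances, kind="stable")
    if max_number is not None:
        order = order[:max_number]
    # Assumption: s and s_represented are aligned row by row.
    return s.iloc[order]


# Toy vectors instead of a real tfidf + pca pipeline.
texts = pd.Series(["a", "b", "c"])
vectors = pd.Series([[0.0, 0.0], [5.0, 5.0], [1.0, 0.0]])
closest = most_similar_sketch(texts, vectors, [0.0, 0.0], max_number=2)
```

With these toy vectors, `closest` holds "a" and "c" (index 0 and 2), since the query vector itself is always at distance zero.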

- function `most_similar` in `representation.py`

Co-authored-by: Maximilian Krahn <maximilian.krahn@icloud.com>
@henrifroese
Collaborator Author

Put on hold until Series Types are figured out.

@henrifroese
Collaborator Author

  • maybe use cosine distance by default and add an argument to choose the distance function
  • maybe change the signature to most_similar(s: TokenSeries(?), text, representation_function, metric)
  • maybe prepare a tutorial for users instead of adding the function to TextHero
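
To make the first idea concrete, here is a rough sketch of a pluggable metric with cosine distance as the default. All names (`cosine_distance`, `euclidean_distance`, `rank_by_distance`) are hypothetical and not part of TextHero; treating a zero vector as maximally distant is also just an assumed convention.

```python
from typing import Callable, List, Sequence

import numpy as np

Metric = Callable[[Sequence[float], Sequence[float]], float]


def cosine_distance(a: Sequence[float], b: Sequence[float]) -> float:
    a_arr = np.asarray(a, dtype=float)
    b_arr = np.asarray(b, dtype=float)
    norm = np.linalg.norm(a_arr) * np.linalg.norm(b_arr)
    if norm == 0.0:
        # Assumed convention: a zero vector has no direction, so call it distant.
        return 1.0
    return 1.0 - float(np.dot(a_arr, b_arr) / norm)


def euclidean_distance(a: Sequence[float], b: Sequence[float]) -> float:
    return float(np.linalg.norm(np.asarray(a, dtype=float) - np.asarray(b, dtype=float)))


def rank_by_distance(
    vectors: Sequence[Sequence[float]],
    query: Sequence[float],
    metric: Metric = cosine_distance,
) -> List[int]:
    """Return positions of `vectors`, ordered from closest to farthest from `query`."""
    distances = [metric(v, query) for v in vectors]
    # sorted() is stable, so equal distances keep their original order.
    return sorted(range(len(vectors)), key=distances.__getitem__)


vectors = [[1.0, 0.0], [0.0, 1.0], [2.0, 0.1], [10.0, 0.0]]
by_cosine = rank_by_distance(vectors, [1.0, 0.0])
by_euclidean = rank_by_distance(vectors, [1.0, 0.0], metric=euclidean_distance)
```

The two rankings differ for `[10.0, 0.0]`: cosine distance ignores vector length and ranks it next to the query, while Euclidean distance ranks it last. That difference is the main reason to expose the metric as an argument.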

Labels: enhancement (New feature or request)