
remove_first_principal_component for Smooth Inverse Frequency in Simple Sentence Similarity.ipynb #10

Open
ziweiji opened this issue Aug 5, 2020 · 0 comments


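This concerns the Smooth Inverse Frequency (SIF) method in Simple Sentence Similarity.ipynb.

In your code, the embeddings of sentences1 and sentences2 are collected into a single array, and remove_first_principal_component is applied to all of them together: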

```python
    # (inside the loop over sentence pairs)
    embeddings.append(embedding1)
    embeddings.append(embedding2)
# (after the loop)
embeddings = remove_first_principal_component(np.array(embeddings))
```
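For reference, here is a minimal NumPy sketch of the joint variant quoted above. It assumes remove_first_principal_component follows the SIF recipe: take the first right singular vector u of the uncentered embedding matrix X and subtract each row's projection onto it, X - (X u) u^T. The helper below is an illustrative reimplementation, not the notebook's own code, and the random vectors are placeholders for real sentence embeddings.

```python
import numpy as np


def remove_first_principal_component(X: np.ndarray) -> np.ndarray:
    """Subtract each row's projection onto the first right singular vector of X.

    X is not centered first (scikit-learn's TruncatedSVD does not center either).
    """
    # vt[0] is the unit-norm first right singular vector: the "common component".
    _, _, vt = np.linalg.svd(X, full_matrices=False)
    u = vt[0]
    return X - np.outer(X @ u, u)


def cosine(a: np.ndarray, b: np.ndarray) -> float:
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))


rng = np.random.default_rng(0)
n_pairs, dim = 5, 50
# Hypothetical stand-ins for the SIF-weighted average embeddings of each pair.
sent1_embs = rng.normal(size=(n_pairs, dim))
sent2_embs = rng.normal(size=(n_pairs, dim))

# Joint variant, as in the excerpt: interleave (emb1, emb2, emb1, emb2, ...)
# and estimate ONE common component from all 2 * n_pairs vectors.
embeddings = []
for embedding1, embedding2 in zip(sent1_embs, sent2_embs):
    embeddings.append(embedding1)
    embeddings.append(embedding2)
stacked = np.array(embeddings)
cleaned = remove_first_principal_component(stacked)

# Even rows come from sentences1, odd rows from sentences2.
sims_joint = [cosine(a, b) for a, b in zip(cleaned[0::2], cleaned[1::2])]
```

The key property is that a single direction, estimated from both sides at once, is removed from every vector.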

However, in the original code for the paper "A Simple but Tough-to-Beat Baseline for Sentence Embeddings" (https://github.com/PrincetonML/SIF/blob/master/src/sim_algo.py), the authors compute embedding1 and embedding2 separately, each including its own principal-component removal step:

```python
emb1 = SIF_embedding.SIF_embedding(We, x1, w1, params)
emb2 = SIF_embedding.SIF_embedding(We, x2, w2, params)
```
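The separate variant can be sketched the same way: each side gets its own common component, estimated only from that side's embeddings. This is a NumPy sketch with random placeholder data, not the Princeton code, and the helper name remove_first_principal_component is reused here for illustration only. It computes both variants side by side so the scores can be compared.

```python
import numpy as np


def remove_first_principal_component(X: np.ndarray) -> np.ndarray:
    """Subtract each row's projection onto X's first right singular vector (no centering)."""
    _, _, vt = np.linalg.svd(X, full_matrices=False)
    u = vt[0]
    return X - np.outer(X @ u, u)


def cosine(a: np.ndarray, b: np.ndarray) -> float:
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))


rng = np.random.default_rng(0)
n_pairs, dim = 5, 50
# Hypothetical stand-ins for the weighted-average embeddings of x1 and x2.
sent1_embs = rng.normal(size=(n_pairs, dim))
sent2_embs = rng.normal(size=(n_pairs, dim))

# Separate variant: each set is cleaned with a component estimated from
# that set alone.
emb1 = remove_first_principal_component(sent1_embs)
emb2 = remove_first_principal_component(sent2_embs)
sims_separate = [cosine(a, b) for a, b in zip(emb1, emb2)]

# Joint variant for comparison. Row order does not change the right singular
# vectors, so stacking instead of interleaving yields the same component.
joint = remove_first_principal_component(np.vstack([sent1_embs, sent2_embs]))
sims_joint = [cosine(a, b) for a, b in zip(joint[:n_pairs], joint[n_pairs:])]

for s_sep, s_joint in zip(sims_separate, sims_joint):
    print(f"separate={s_sep:+.3f}  joint={s_joint:+.3f}")
```

Because the two variants subtract different directions, their scores will generally differ; how large the difference is on real embeddings is an empirical question that this toy example does not settle.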

I wonder whether this difference affects the results considerably.

I am working on a query task, so sentences1 contains only one sentence. Should I:

1. merge the query and the answers and apply remove_first_principal_component to all of them together,
2. compute embedding1 for the query and embedding2 for the answers separately, or
3. save the SVD of the answers (sentences2) and then remove the first principal component of sentences2 from the weighted embedding of the query (sentences1)?
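One detail bears on option (2): with a single query, the sentences1 matrix has one row, and the first principal component of one uncentered vector is just that vector's own direction, so removing it leaves a vector that is zero up to floating-point error. The sketch below shows this and also sketches option (3): estimate the component from the answers only, then subtract that same component from the query (and, as an assumption of this sketch, from the answers too). The helpers first_principal_component and remove_component are hypothetical, and the embeddings are random placeholders.

```python
import numpy as np


def first_principal_component(X: np.ndarray) -> np.ndarray:
    """Hypothetical helper: unit-norm first right singular vector of X (uncentered)."""
    _, _, vt = np.linalg.svd(X, full_matrices=False)
    return vt[0]


def remove_component(X: np.ndarray, u: np.ndarray) -> np.ndarray:
    """Hypothetical helper: subtract each row's projection onto the unit vector u."""
    X = np.atleast_2d(X)
    return X - np.outer(X @ u, u)


rng = np.random.default_rng(0)
dim = 50
query_emb = rng.normal(size=dim)          # placeholder query embedding
answer_embs = rng.normal(size=(20, dim))  # placeholder answer embeddings

# Option (2) with a single query: its own first component is its direction,
# so removing it leaves an (almost) all-zero vector.
query_alone = remove_component(query_emb, first_principal_component(query_emb[None, :]))
residual_norm = float(np.linalg.norm(query_alone))  # ~0 up to floating-point error

# Option (3): estimate the component on the answers (this could be computed
# once and stored), then remove that same component from query and answers.
u_answers = first_principal_component(answer_embs)
query_clean = remove_component(query_emb, u_answers)[0]
answers_clean = remove_component(answer_embs, u_answers)

# Rank answers by cosine similarity to the cleaned query.
scores = answers_clean @ query_clean / (
    np.linalg.norm(answers_clean, axis=1) * np.linalg.norm(query_clean)
)
ranking = np.argsort(-scores)
```

Storing u_answers means new queries can be cleaned without recomputing the SVD, as long as the answer set does not change.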
