
Position Emb and Chunk size #40

Open
liyucheng09 opened this issue May 5, 2024 · 0 comments

Comments

@liyucheng09

Great job! I found two problems when trying to reproduce the paper's results.

  1. The paper explains that the same positional embedding is used for all context memory units, but in the code implementation there seems to be no positional embedding applied to the cached Ks at all. Is that right? (See the first sketch below.)

  2. Why is a chunk size needed? The proposed method does attention block by block, which (I think) should not cause OOM errors even without the chunking trick during decoding. But I found it fails to process 100K-token text unless a chunk size is set, while flash-attn handles the same input without any problem. (See the second sketch below.)
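To make question 1 concrete, here is a minimal, self-contained sketch of the two behaviours I am comparing. The names and the RoPE helper are my own, not taken from this repo: option (a) rotates the cached keys with one shared position id, which is how I read the paper; option (b) caches the raw keys with no positional embedding, which is what the implementation looks like to me.

```python
import torch

def apply_rope(x, pos, base=10000.0):
    """Interleaved rotary embedding. x: (seq, heads, dim), pos: (seq,) integer positions."""
    dim = x.shape[-1]
    inv_freq = 1.0 / (base ** (torch.arange(0, dim, 2, dtype=torch.float32) / dim))
    angles = pos[:, None].float() * inv_freq          # (seq, dim/2)
    cos = angles.cos()[:, None, :]                    # broadcast over heads
    sin = angles.sin()[:, None, :]
    x1, x2 = x[..., 0::2], x[..., 1::2]
    out = torch.empty_like(x)
    out[..., 0::2] = x1 * cos - x2 * sin
    out[..., 1::2] = x1 * sin + x2 * cos
    return out

seq_len, n_heads, head_dim = 128, 8, 64
cached_k = torch.randn(seq_len, n_heads, head_dim)

# (a) every token in the cached memory units gets the *same* (arbitrary) position id,
#     so all units look equidistant to the current query
shared_pos = torch.full((seq_len,), 42)               # 42 is just a placeholder position
cached_k_shared_pe = apply_rope(cached_k, shared_pos)

# (b) keys are cached exactly as produced, with no rotation applied at all
cached_k_no_pe = cached_k
```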
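To make question 2 concrete, here is roughly the kind of chunking I assumed would be unnecessary: splitting the long input into query chunks so that only a (heads, chunk, kv_len) score tensor is ever materialised. This is just a minimal sketch with made-up names, not the code in this repo, and causal masking is omitted for brevity.

```python
import torch

def chunked_attention(q, k, v, chunk_size=1024):
    """Process queries in chunks so only a (heads, chunk, kv_len) score tensor
    exists at any time. q: (q_len, heads, dim); k, v: (kv_len, heads, dim)."""
    scale = q.shape[-1] ** -0.5
    outs = []
    for start in range(0, q.shape[0], chunk_size):
        q_blk = q[start:start + chunk_size]                      # (chunk, heads, dim)
        scores = torch.einsum("qhd,khd->hqk", q_blk, k) * scale  # (heads, chunk, kv_len)
        probs = scores.softmax(dim=-1)
        outs.append(torch.einsum("hqk,khd->qhd", probs, v))
    return torch.cat(outs, dim=0)                                # (q_len, heads, dim)

q_len, kv_len, n_heads, head_dim = 4096, 4096, 8, 64
q = torch.randn(q_len, n_heads, head_dim)
k = torch.randn(kv_len, n_heads, head_dim)
v = torch.randn(kv_len, n_heads, head_dim)
out = chunked_attention(q, k, v, chunk_size=512)
```

If the implementation materialises the full score matrix for the whole 100K input at once, that alone could explain the OOM, but I would have expected the block-by-block design to avoid that already, which is why I'm asking.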
