Multi Input/output transformers #236
Conversation
- working on multi input/output; created the base along with multi-output only so far (not tested yet)
- need to use memory tokens; will probably do that later
- reverting x_transformers to prevent conflicts (I had used DataSpell's auto-format, but it seems to have deleted some code); I also improved the naming of the attention layers
```python
    cache = None,
    **kwargs
):
    global intermediates_model, cache_pre_attn_layers, cache_model, cache_post_attn_layers, intermediates_pre_attn_layers, intermediates_pre_attn_layer, mem_packed_shape, mem_every
```
are you using a `global` here for the caches?
Yes, though I don't think it's completely necessary; I just did it because my IDE wanted me to make them global. It could probably be deleted without causing many issues.
i see, want to delete it from the PR?
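A minimal sketch of the alternative being discussed: keeping the caches as instance attributes instead of module-level globals, so two wrapper instances never share state. The class and method below are hypothetical stand-ins for illustration; only the cache variable names come from the diff above.

```python
from torch import nn

class MultiIOWrapper(nn.Module):  # hypothetical stand-in for the PR's wrapper
    def __init__(self, model: nn.Module):
        super().__init__()
        self.model = model
        # per-instance cache state, replacing the former globals
        self.cache_pre_attn_layers = None
        self.cache_model = None
        self.cache_post_attn_layers = None

    def forward(self, x, cache = None, **kwargs):
        # read from / write to `self.*` rather than globals
        if cache is not None:
            self.cache_model = cache
        return self.model(x, cache = self.cache_model, **kwargs)
```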
@RyanKim17920 hey Ryan, this looks like a good effort. are you using all the features present in this wrapper? i would suggest just removing everything except for the multi-io portion
I believe most of the features should be working properly, but I wasn't exactly sure how to implement the memory-based features, so those may not be implemented well.
@RyanKim17920 yea, you can just remove the memory-based features, as well as anything that isn't related to what you are working on. just let the multi-io logic shine through. could you also add an example execution, in the same style as the other examples in the readme?
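A self-contained sketch of what such an example could demonstrate: per-stream token embeddings feeding a shared trunk, with one projection head per output stream. This illustrates the multi input/output pattern only; every name below is an assumption, not the PR's actual interface, and the plain-PyTorch trunk stands in for the x_transformers attention layers.

```python
import torch
from torch import nn

class TinyMultiIO(nn.Module):  # hypothetical, for illustration only
    def __init__(self, vocab_sizes, dim = 64):
        super().__init__()
        # one embedding per input stream
        self.embeds = nn.ModuleList([nn.Embedding(v, dim) for v in vocab_sizes])
        # shared trunk standing in for the attention layers
        self.trunk = nn.TransformerEncoder(
            nn.TransformerEncoderLayer(d_model = dim, nhead = 4, batch_first = True),
            num_layers = 2,
        )
        # one logit head per output stream
        self.heads = nn.ModuleList([nn.Linear(dim, v) for v in vocab_sizes])

    def forward(self, xs):
        # sum the per-stream embeddings, run the shared trunk once,
        # then project back out to each stream's vocabulary
        h = sum(embed(x) for embed, x in zip(self.embeds, xs))
        h = self.trunk(h)
        return [head(h) for head in self.heads]

model = TinyMultiIO(vocab_sizes = [256, 1000])
x1 = torch.randint(0, 256, (2, 16))
x2 = torch.randint(0, 1000, (2, 16))
logits1, logits2 = model([x1, x2])  # one logit tensor per output stream
```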
@RyanKim17920 how well does it work for your project? have you trained anything with it?
Ah, I haven't worked on it in a while, so I haven't tested the model system yet. I was trying to generalize the transformers first so that the training process itself would go more smoothly.
- there are too many issues in the io wrapper currently; restarting from scratch (tested features too)
- memory system does not work and needs fixes, but basic training can function (without memory)
Fixed errors around this
- it was actually broken before; the earlier IO tests gave a false impression that it worked
padding applied for autoregressive XL; the loss is masked on padding values, since they would otherwise cause NaN loss
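A minimal sketch of that padding fix using standard PyTorch, assuming padded target positions are marked with a sentinel id: passing `ignore_index` to cross entropy makes it skip those positions entirely, so the loss stays finite even when a sequence is mostly padding. The sentinel value and shapes below are assumptions for illustration.

```python
import torch
import torch.nn.functional as F

PAD_ID = -100  # sentinel for padded target positions (assumption)

logits  = torch.randn(2, 8, 1000)        # (batch, seq, vocab)
targets = torch.randint(0, 1000, (2, 8))
targets[:, 5:] = PAD_ID                  # mark the padded tail

loss = F.cross_entropy(
    logits.transpose(1, 2),  # cross_entropy expects (batch, vocab, seq)
    targets,
    ignore_index = PAD_ID,   # padded positions contribute nothing to the loss
)
```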
Refer to #235; I fixed the naming conventions and all merge conflicts. I still haven't tested all the features completely, so there may be errors.