Multi Input/output transformers #236

Open
wants to merge 32 commits into base: main
Conversation

RyanKim17920

Referring to #235, I fixed the naming conventions and all merge conflicts. I still haven't tested all the features completely, so there may be errors.

 - working on multi input/output along with multi input; created the base along with multi output only (not tested yet)
 - need to use memory tokens, probably will do later
 - reverting x_transformers to prevent conflicts (I had used DataSpell's auto-format but it seems to have deleted some code)
 - improved the naming of the attention layers
cache=None,
**kwargs
):
global intermediates_model, cache_pre_attn_layers, cache_model, cache_post_attn_layers, intermediates_pre_attn_layers, intermediates_pre_attn_layer, mem_packed_shape, mem_every
Owner

are you using a global here for the caches?

Author

Yes, I don't think it's completely necessary though; I just did it because my IDE wanted me to make them global. It could probably be deleted without causing many issues.
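
For context, state like this can usually be threaded through arguments and return values (or stored on the wrapper instance) rather than module-level globals. Below is a minimal sketch, assuming hypothetical sub-modules that each accept and return a cache; the names are taken loosely from the diff above and this is not the PR's actual code.

```python
# A sketch only, not this PR's code: the cache travels through arguments and
# return values instead of module-level globals, so each forward call owns its state.
import torch
from torch import nn

class MultiIOWrapperSketch(nn.Module):
    def __init__(self, pre_attn_layers, model, post_attn_layers):
        super().__init__()
        # hypothetical sub-modules, each assumed to accept and return a cache
        self.pre_attn_layers = pre_attn_layers
        self.model = model
        self.post_attn_layers = post_attn_layers

    def forward(self, x, cache = None, **kwargs):
        # unpack the previous call's cache (or start empty) instead of reading globals
        cache_pre, cache_mid, cache_post = cache if cache is not None else (None, None, None)

        x, cache_pre = self.pre_attn_layers(x, cache = cache_pre)
        x, cache_mid = self.model(x, cache = cache_mid)
        x, cache_post = self.post_attn_layers(x, cache = cache_post)

        # hand the updated cache back to the caller for the next step
        return x, (cache_pre, cache_mid, cache_post)
```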

Owner

i see, want to delete it from the PR?

@lucidrains
Owner

lucidrains commented Feb 7, 2024

@RyanKim17920 hey Ryan, this looks like good effort

are you using all the features present in this wrapper? i would suggest just removing everything except for the multi-io portion

@RyanKim17920
Author

I believe most of the features should be working properly, but I wasn't entirely sure how to implement the memory-based features, so those may not be implemented well

@lucidrains
Owner

lucidrains commented Feb 10, 2024

@RyanKim17920 yea, you can just remove the memory-based features, as well as anything that isn't related to what you are working on. just allow the multi-io logic to shine through

could you also add an example execution, in the same style as the other examples in the readme?
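
For what a readme-style example execution could look like, here is a toy sketch built only from existing x-transformers pieces (`Decoder` plus plain embeddings and linear heads); the `ToyMultiIO` class and its layout are illustrative placeholders, not the wrapper added in this PR.

```python
import torch
from torch import nn
from x_transformers import Decoder

# toy multi input / multi output: two token streams, each with its own embedding,
# concatenated along the sequence dimension, run through a shared core,
# then projected to two separate sets of logits
class ToyMultiIO(nn.Module):
    def __init__(self):
        super().__init__()
        self.embed_a = nn.Embedding(256, 512)
        self.embed_b = nn.Embedding(1024, 512)
        self.core = Decoder(dim = 512, depth = 6, heads = 8)
        self.to_logits_a = nn.Linear(512, 256)
        self.to_logits_b = nn.Linear(512, 1024)

    def forward(self, tokens_a, tokens_b):
        len_a = tokens_a.shape[1]
        x = torch.cat((self.embed_a(tokens_a), self.embed_b(tokens_b)), dim = 1)
        x = self.core(x)
        return self.to_logits_a(x[:, :len_a]), self.to_logits_b(x[:, len_a:])

model = ToyMultiIO()
tokens_a = torch.randint(0, 256, (1, 128))
tokens_b = torch.randint(0, 1024, (1, 64))
logits_a, logits_b = model(tokens_a, tokens_b)  # (1, 128, 256) and (1, 64, 1024)
```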

@lucidrains
Owner

@RyanKim17920 how well does it work for your project? have you trained anything with it?

@RyanKim17920
Author

RyanKim17920 commented Feb 13, 2024

@RyanKim17920 how well does it work for your project? have you trained anything with it?

Ah, I haven't worked on it in a while, so I haven't tested the model system yet. I was trying to generalize the transformers first so that the training process itself would work out more smoothly

 - there are too many issues in the io wrapper currently, restarting from scratch (tested features too)
 - memory system does not work and needs fixes, though basic training can function (without memory)
 - fixed errors around this; it was actually broken before, the earlier IO testing gave a false impression
 - padding applied for autoregressive xl
 - added a method to prevent loss on padding values, since it would otherwise cause a NaN loss (see the sketch after this list)
 - errors in early breaking/module list fixed
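
On the padding/NaN point above: one standard way to keep padded positions out of the loss is cross entropy with `ignore_index`, roughly as sketched below; the pad id and shapes are illustrative, not this PR's actual values.

```python
import torch
import torch.nn.functional as F

PAD_ID = 0  # assumed padding token id, purely illustrative

logits = torch.randn(2, 8, 100)           # (batch, seq, num_tokens)
targets = torch.randint(1, 100, (2, 8))   # (batch, seq)
targets[:, 5:] = PAD_ID                   # simulate right padding

# ignore_index drops padded targets from the average, so padding
# cannot contribute garbage values (or NaNs) to the loss
loss = F.cross_entropy(
    logits.transpose(1, 2),  # cross_entropy expects (batch, classes, seq)
    targets,
    ignore_index = PAD_ID
)
```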