
Transformer Day: Create Transformer Exercises #178

Open
ramon-astudillo opened this issue Jun 6, 2023 · 3 comments
@ramon-astudillo
Member

Objective: Create exercises where students have to complete miniGPT code

  • Decide on exercise structure. Normally we have:
    1. the warm-up: an easy example to complete that builds knowledge about the internals, e.g. a separate snippet of attention / feed-forward code. Very light coding or none needed; just run it and look at plots/numbers.
    2. the main exercise: complete some missing part of the minGPT code (attention?).
    3. the extra exercise (this usually comes already completed), e.g. fine-tuning, if we have time and manage to add it.

We can start the discussion here, @israfelsr @robertodessi, based on https://github.com/karpathy/minGPT/blob/master/mingpt/model.py

  • After completion of Transformer Day: Get miniGPT up and running #177, extend the notebook created in that issue to hold the exercises decided above.
    • Note that if we remove some of minGPT, Ex 1 can be just running GPT, but we can think of running some simpler part.
    • See other labs/notebooks/ as examples.

Branch: https://github.com/LxMLS/lxmls-toolkit/tree/transformer-day-student

NOTE: ⬆️ Since we are going to delete parts of the code, this work goes in a separate branch that is merged into student. You can pull updates from transformer-day.

Expected finishing dates:

  • Deciding on exercises: ideally before the June 12 meeting; if not, at the June 19 meeting, the next one.
  • Implementation: let's aim for June 26, or July 3 at the latest. We should aim at closing something that works; then we can improve it with the extra time.
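For reference, the core of what students would fill in for the main exercise (the attention block in minGPT's model.py) boils down to scaled dot-product attention with a causal mask. A framework-agnostic numpy sketch, assuming a single head and square projection matrices (names are placeholders, not minGPT's):

```python
import numpy as np

def causal_self_attention(x, Wq, Wk, Wv):
    """Single-head causal self-attention (numpy sketch).
    x: (T, d) input sequence; Wq, Wk, Wv: (d, d) projection matrices."""
    T, d = x.shape
    q, k, v = x @ Wq, x @ Wk, x @ Wv
    scores = q @ k.T / np.sqrt(d)                      # (T, T) similarities
    mask = np.triu(np.ones((T, T), dtype=bool), k=1)
    scores[mask] = -np.inf                             # hide future positions
    # row-wise softmax (numerically stabilized)
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ v, weights
```

In the exercise, students would write the PyTorch equivalent inside minGPT's attention module; the dimension bookkeeping is the same.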
@israfelsr
Contributor

Hi guys! I've been exploring some resources and brainstorming exercise ideas. I like the concept of having three levels. Here are my thoughts:

  1. Warm-up exercise: I suggest plotting trained attention weights over a sentence. The exercise can involve trying different sentences or explaining the behavior of specific attention heads on different tokens. I've written some small code using BERT and transformers (HF) to plot the heatmaps. We can improve the plot to make it more intuitive!

  2. Core exercise: I propose coding the complete attention module. We can begin with some theoretical exercises, such as determining the dimensions of Q, K, and V for different input scenarios, or discussing the importance of this projection. For the coding part, we can provide dimension hints to guide the implementation. Additionally, we can ask them to calculate the number of parameters for a given configuration.

  3. Training and prompting: Rather than only loading a pre-existing architecture, maybe we could start by creating a small network and training it. We can give instructions to build a network with specific layers and parameters, and then train it on provided data. We can structure this exercise step by step with open instructions: loading the data, creating the model, and training it. We can compare this small model with a pretrained one and even ask about the relation between performance and the number of weights.
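The parameter-count question from the core exercise has a closed form. A minimal sketch, assuming a GPT-style attention block with three input projections (Q, K, V) and one output projection, each a dense n_embd × n_embd layer (the function name is a placeholder):

```python
def attention_param_count(n_embd, bias=True):
    """Parameters in one multi-head self-attention block (GPT-style).

    Four dense n_embd x n_embd projections: Q, K, V, and the output.
    Multi-head splitting does not change the total, since the heads
    partition the same n_embd dimensions.
    """
    per_projection = n_embd * n_embd + (n_embd if bias else 0)
    return 4 * per_projection
```

For GPT-2's n_embd = 768 with biases this gives 4 × (768² + 768) = 2,362,368 parameters per attention block.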

Let me know what you think!
I have starting code for the first one that we can improve; for the second one I think we can use the miniGPT module; and for the last one I can start looking into some small architectures if you think it's a good idea.
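A minimal sketch of the warm-up heatmap idea, assuming the (T, T) weight matrix for one head has already been extracted (e.g. from a HF model run with output_attentions=True); the function name and file path are placeholders:

```python
import numpy as np
import matplotlib
matplotlib.use("Agg")  # headless backend, so this also runs on a server
import matplotlib.pyplot as plt

def plot_attention(weights, tokens, out_path="attention.png"):
    """Heatmap of one attention head over a sentence.

    weights: (T, T) row-stochastic matrix (rows sum to 1);
    tokens: list of T token strings for the axis labels.
    """
    fig, ax = plt.subplots()
    ax.imshow(weights, cmap="viridis")
    ax.set_xticks(range(len(tokens)), tokens, rotation=90)
    ax.set_yticks(range(len(tokens)), tokens)
    ax.set_xlabel("attended-to token")
    ax.set_ylabel("query token")
    fig.tight_layout()
    fig.savefig(out_path)
    plt.close(fig)
    return out_path
```

Students could then compare heatmaps across heads and sentences, which is the "explain the behavior of specific heads" part of the exercise.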

@pedrobalage
Contributor

Here are some suggestions from my view on the topic:

Context

  • How much time will the students have?
    • ~2.5 hours
  • On which day of the school will we place the exercises?
    • Students know how to work with dimensions in numpy.
    • Students know sequence models (factor models and non-linear ones, i.e. LSTMs).
    • Some students may not fully understand the toolkit code.
  • What are the levels of the students?
    • Most just follow the guide and check the solutions, but some try to implement them.
    • We should challenge the students to "code" the mathematical model as part of the "learning" process.
    • Running and comparing models is okay, but we need to make sure they get the ideas behind comparing models.

I'd like to step back to what we want the students to learn from the exercises. My suggestions are:

  • Understand the limitations of a pure LSTM or RNN model. How can attention improve on them?
  • If we work only with attention, what are the pros and cons? Speed and positional information.
  • We can add complexity with multiple heads. What impact do they have?
  • How this architecture can lead to sophisticated models (many parameters; large models with good performance).

Suggestions for the exercises:

  • Implement the positional encoding or any other block
  • Extend the number of heads from 1 to n
  • Bonus: run an external model (Hugging Face)
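The positional-encoding suggestion is well scoped for students who know numpy dimensions. A sketch of the fixed sinusoidal variant, assuming an even model dimension (function name is a placeholder):

```python
import numpy as np

def sinusoidal_positional_encoding(max_len, d_model):
    """Fixed sinusoidal position encodings (original Transformer recipe).

    PE[pos, 2i]   = sin(pos / 10000^(2i / d_model))
    PE[pos, 2i+1] = cos(pos / 10000^(2i / d_model))
    Assumes d_model is even. Returns an array of shape (max_len, d_model).
    """
    positions = np.arange(max_len)[:, None]            # (max_len, 1)
    dims = np.arange(0, d_model, 2)[None, :]           # (1, d_model/2)
    angles = positions / np.power(10000.0, dims / d_model)
    pe = np.zeros((max_len, d_model))
    pe[:, 0::2] = np.sin(angles)
    pe[:, 1::2] = np.cos(angles)
    return pe
```

Completing this block, then extending attention from 1 to n heads, exercises exactly the dimension bookkeeping the students already know from numpy.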

What we are not going to cover:

  • Model fine-tuning
  • Efficient operations in PyTorch
  • Data requirements for model training (tokenization, cleaning, loss optimization)
  • Training large models

@robertodessi
Contributor

Can we close this?
