
Modifying the model's hyperparameters #124

Open
benjamin27315k opened this issue Feb 16, 2024 · 7 comments
@benjamin27315k

Hello there,

I'm new to Neural Speed, coming from llama-cpp-python, and I've encountered some problems (probably due to a misunderstanding on my side).

I don't want to flood you with issues, so I'll start with my two main questions:

  • Is there a way to change the model hyperparameters (mostly the temperature)?
  • Is there a way to avoid using a tokenizer from HF and instead, like llama-cpp, use the tokenizer included in the .gguf file? (In my use case, I'd rather not depend on an external lib.)

Thank you !

@Zhenzhong1
Contributor

Zhenzhong1 commented Feb 18, 2024

@benjamin27315k Hi

  1. Is there a way to change the model hyperparameters (mostly the temperature)?

Yes, you can change the model hyperparameters directly in this file: https://github.com/intel/neural-speed/blob/main/neural_speed/convert/convert_llama.py#L1159-L1176. Taking llama as an example, just modify and re-run this script to get a customized GGUF file.

  2. Is there a way to avoid using a tokenizer from HF and instead, like llama-cpp, use the tokenizer included in the .gguf file? (In my use case, I'd rather not depend on an external lib.)

Yes. Once you have a GGUF model file, you can run this script:

```shell
python scripts\inference.py --model_name llama2 -m ggml-model-q4_0.gguf -n 512 -p "Building a website can be done in 10 simple steps:"
```

This script will use the tokenizer embedded in the GGUF file.

@Zhenzhong1 Zhenzhong1 self-assigned this Feb 18, 2024
@benjamin27315k
Author

Thank you very much, @Zhenzhong1 !
I'll try that and keep you posted if anything goes wrong 👍

@benjamin27315k
Author

benjamin27315k commented Feb 21, 2024

Hi @Zhenzhong1

So I tried what you advised:

Yes, you can change the model hyperparameters directly in this file: https://github.com/intel/neural-speed/blob/main/neural_speed/convert/convert_llama.py#L1159-L1176. Taking llama as an example, just modify and re-run this script to get a customized GGUF file.

I saw all the parameters for loading the model, but not the inference parameters (temperature, top_p, etc.).

Yes. Once you have a GGUF model file, you can run this script: python scripts\inference.py --model_name llama2 -m ggml-model-q4_0.gguf -n 512 -p "Building a website can be done in 10 simple steps:"

Do I need a customized .gguf file for this command line to run, or should one downloaded as-is work? It currently doesn't: error loading model: unrecognized tensor type 10. (Ah, sorry, I'm trying to use this model: llama-2-7b-chat.Q2_K.gguf, taken from TheBloke on HF.)

@benjamin27315k
Author

Never mind my second question, I forgot it had to be Q4 ^^
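For reference, the unrecognized tensor type 10 in the error above corresponds to the Q2_K K-quant in the GGML tensor-type enumeration, which is why a Q2_K file fails to load while a Q4_0 one works. The mapping below is transcribed from the ggml type enum as I understand it; it is a sketch for illustration, so verify the values against the ggml/gguf version you are using:

```python
# Partial map of GGML tensor-type IDs to quantization names, transcribed
# from the ggml type enum (an assumption; verify against your ggml version).
GGML_TYPE_NAMES = {
    0: "F32",
    1: "F16",
    2: "Q4_0",
    3: "Q4_1",
    6: "Q5_0",
    7: "Q5_1",
    8: "Q8_0",
    9: "Q8_1",
    10: "Q2_K",  # K-quant: the "unrecognized tensor type 10" in the error above
    11: "Q3_K",
    12: "Q4_K",
    13: "Q5_K",
    14: "Q6_K",
}

def explain_tensor_type_error(type_id: int) -> str:
    """Turn the numeric tensor-type ID from the loader error into a readable hint."""
    name = GGML_TYPE_NAMES.get(type_id, "unknown")
    return f"tensor type {type_id} = {name}"

print(explain_tensor_type_error(10))  # prints "tensor type 10 = Q2_K"
```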

@benjamin27315k
Author

Ah, and while I'm at it: when using the CLI, if I get an error, it freezes the terminal. Do you have a trick to avoid that?

@Zhenzhong1
Contributor

Zhenzhong1 commented Feb 22, 2024

Ah, and while I'm at it: when using the CLI, if I get an error, it freezes the terminal. Do you have a trick to avoid that?

@benjamin27315k Just type `stty echo` in the command line and the terminal will be restored.
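A small addition on that tip: `stty echo` only re-enables character echo, which is the symptom described above. The standard POSIX `stty` tool can also restore all settings at once, which helps when a crash leaves the terminal in raw mode as well. A minimal sketch, guarded so it is safe to run non-interactively:

```shell
# Run only when attached to a terminal, so the snippet is safe inside scripts.
if [ -t 0 ]; then
  stty echo   # re-enable character echo (the symptom described above)
  stty sane   # or: restore all terminal settings to sensible defaults
  # Heavier option: `reset` reinitializes the terminal entirely (clears the screen).
fi
```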

@Zhenzhong1
Contributor

Zhenzhong1 commented Feb 22, 2024

I saw all the parameters for loading the model, but not the inference parameters (temperature, top_p, etc.)

@benjamin27315k

You don't need a customized GGUF file if you only want to modify inference parameters.

Inference parameters are input arguments; just set them on the command line when you run the script. Please check https://github.com/intel/neural-speed/blob/main/docs/advanced_usage.md for the full list of inference parameters.
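To make the effect of those arguments concrete, here is a small self-contained sketch of how temperature and top-p (nucleus) sampling reshape a next-token distribution before a token is drawn. This is plain illustrative Python under my own assumptions, not Neural Speed's implementation, and the flag names (`--temp`, `--top_p`) should be checked against the linked advanced_usage.md:

```python
import math
import random

def softmax(logits):
    """Convert raw logits to a probability distribution."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    s = sum(exps)
    return [e / s for e in exps]

def top_p_filter(probs, top_p):
    """Keep the smallest set of tokens whose cumulative probability reaches top_p,
    zero out the rest, then renormalize."""
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    keep, cum = set(), 0.0
    for i in order:
        keep.add(i)
        cum += probs[i]
        if cum >= top_p:
            break
    filtered = [p if i in keep else 0.0 for i, p in enumerate(probs)]
    z = sum(filtered)
    return [p / z for p in filtered]

def sample(logits, temp=0.8, top_p=0.95, rng=random.random):
    """Temperature < 1 sharpens the distribution (more deterministic),
    temperature > 1 flattens it (more random)."""
    probs = softmax([x / temp for x in logits])
    probs = top_p_filter(probs, top_p)
    r, cum = rng(), 0.0
    for i, p in enumerate(probs):
        cum += p
        if r < cum:
            return i
    return len(probs) - 1

# A very low temperature makes sampling nearly greedy:
print(sample([2.0, 1.0, 0.5, -1.0], temp=0.05, top_p=1.0))  # picks index 0
```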
