
/research using perplexity api #866

Open
0x4007 opened this issue Oct 21, 2023 · 19 comments · May be fixed by #875

Comments

@0x4007
Member

0x4007 commented Oct 21, 2023

https://blog.perplexity.ai/blog/introducing-pplx-api

Perplexity is optimized for Q&A and live web research, so perhaps it's a better backend for the ask command.

I use their consumer facing product and it's very effective.

@Keyrxng
Contributor

Keyrxng commented Oct 22, 2023

/start

@ubiquibot

ubiquibot bot commented Oct 22, 2023

Skipping /start since no time labels are set to calculate the timeline

@Keyrxng
Contributor

Keyrxng commented Oct 22, 2023

@pavlovcik I'll get this done this afternoon

@Keyrxng
Contributor

Keyrxng commented Oct 22, 2023

We currently support Mistral 7B, Llama 13B, Code Llama 34B, Llama 70B, and Replit 3b. The API is conveniently OpenAI client-compatible for easy integration with existing applications.

These are the models available through the API; the Replit one is the only one that isn't a chat model.

What model do you want to run? Looking at their UI you have Claude, GPT-4, and Perplexity to choose from, but from the API reference docs it's not clear right off the bat which of those models "Perplexity" is. Perhaps the Perplexity model isn't available through the API? Or is it a white-labeled Llama 70B?

It defaults to Mistral 7B, so should I assume it's that and just run with the default?
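
Since the API is OpenAI client-compatible, wiring it up could look roughly like the sketch below. This is a minimal sketch assuming the openai v4 Node client and Perplexity's https://api.perplexity.ai base URL; the env var name is illustrative, not the bot's actual config.

import OpenAI from "openai";

// Minimal sketch: Perplexity exposes an OpenAI-compatible chat completions API,
// so the stock OpenAI client can be pointed at it via baseURL.
const perplexity = new OpenAI({
  apiKey: process.env.PERPLEXITY_API_KEY, // illustrative env var name
  baseURL: "https://api.perplexity.ai",
});

export async function askPerplexity(question: string, model = "mistral-7b-instruct"): Promise<string> {
  const res = await perplexity.chat.completions.create({
    model, // defaults to mistral-7b-instruct, per the discussion above
    messages: [{ role: "user", content: question }],
  });
  return res.choices[0]?.message?.content ?? "";
}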

@Keyrxng
Contributor

Keyrxng commented Oct 22, 2023

I should have looked into this first, but their context limit isn't capable of handling our needs, at least not until they increase the limits.


Where possible, we try to match the Hugging Face implementation. We are open to adjusting the API, so please reach out with feedback regarding these details.

| Model | Context Length | Special Tokens | Model Type |
| --- | --- | --- | --- |
| codellama-34b-instruct | 4096 [2] | [INST], [/INST], <<SYS>>, <</SYS>> | Chat Completion |
| llama-2-13b-chat | 4096 | [INST], [/INST], <<SYS>>, <</SYS>> | Chat Completion |
| llama-2-70b-chat | 4096 | [INST], [/INST], <<SYS>>, <</SYS>> | Chat Completion |
| mistral-7b-instruct | 4096 [1][2] | [INST], [/INST] | Chat Completion |
| replit-code-v1.5-3b | 4096 | No special tokens | Text Completion |

[1] We drop any added system messages. For system prompting, per Mistral's recommendation, you can concatenate the system prompt with the first user message.

[2] We will be increasing the context length of codellama-34b-instruct to 16k tokens, and increasing the context length of mistral-7b-instruct to 32k tokens.
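
Footnote [1] means any system prompt has to be folded into the first user message before sending to mistral-7b-instruct. A minimal sketch of that, with an assumed ChatMessage shape:

type ChatMessage = { role: "system" | "user" | "assistant"; content: string };

// Sketch of footnote [1]: added system messages are dropped, so concatenate the
// system prompt with the first user message, per Mistral's recommendation.
export function foldSystemPrompt(messages: ChatMessage[]): ChatMessage[] {
  const system = messages.filter((m) => m.role === "system").map((m) => m.content).join("\n");
  const rest = messages.filter((m) => m.role !== "system");
  if (!system || rest.length === 0) return rest;
  const [first, ...tail] = rest;
  return [{ ...first, content: `${system}\n\n${first.content}` }, ...tail];
}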

@0x4007
Member Author

0x4007 commented Oct 23, 2023

What if, on the issue view, we make /ask use Perplexity? This could make sense because issues should be more research focused (refining the specification). Given the limited context window, we can pass in the specification and the sender comment only.

On the pull request review, we should use GPT-4 (with code interpreter?) so that we can pass in the diff and the conversation, and it can suggest direct code adjustments.

Perplexity Pros:

  • Perplexity produces fast, high-quality results from searching the internet.
  • It seems to be optimized specifically for asking questions, whereas ChatGPT has specializations including calculations and working with code.

GPT-4 Pros:

  • Large context length, which means we can include the entire conversation as context.

I'm using the free version of perplexity so I only have used the perplexity model. It seems to work quite well.
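
A rough sketch of that split, purely illustrative: the routing shape and the askPerplexity/askGpt4 helpers below are placeholders, not existing ubiquibot functions.

// Illustrative routing only: issues get a trimmed context sent to Perplexity,
// pull request reviews get the full diff and conversation sent to GPT-4.
declare function askPerplexity(prompt: string): Promise<string>;
declare function askGpt4(prompt: string): Promise<string>;

type AskContext =
  | { kind: "issue"; spec: string; senderComment: string }
  | { kind: "pull_request"; diff: string; conversation: string };

export async function routeAsk(ctx: AskContext, question: string): Promise<string> {
  if (ctx.kind === "issue") {
    // Small context window: specification plus the comment that invoked /ask.
    return askPerplexity(`${ctx.spec}\n\n${ctx.senderComment}\n\n${question}`);
  }
  // Large context window: include the diff and the whole conversation.
  return askGpt4(`${ctx.diff}\n\n${ctx.conversation}\n\n${question}`);
}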

@Keyrxng
Contributor

Keyrxng commented Oct 23, 2023

I hear what you are saying Pav and I think until they up the context limit our hands are tied.

Review currently doesn't care about linked context, the conversation, etc.

Ask does consider all of the linked context as well as the current issue context, which in my demos with minuscule issues, convos, and PRs was eating up 4k tokens like it was nothing. The original scope of ask was that it would take as much context as possible in order to provide better responses for research/issue brainstorming/planning.

What I tried was just replacing the askGPT core API call with Perplexity; I also swapped out the gptContextCall for Perplexity, but I couldn't get any decent responses due to the context window and formatting.

I think GPT-3.5 will perform better with the additional context than Perplexity with its reduced context window but improved model. As soon as that 16k window hits, I think switching it out would be the best idea, although pricing adds a matter of perplexity to the AI feature suite. We'd need to allow for a switch of sorts, so that if no Perplexity API key is provided but an OpenAI one is, then we use the right model.
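
That switch could be as simple as keying off whichever API key is configured. A minimal sketch, with illustrative env var and model names rather than the bot's real config:

// Sketch of the backend switch: choose a provider based on the configured key.
// Env var names and model choices here are illustrative.
type Backend = { provider: "perplexity" | "openai"; model: string };

export function pickBackend(env: NodeJS.ProcessEnv = process.env): Backend {
  if (env.PERPLEXITY_API_KEY) {
    return { provider: "perplexity", model: "mistral-7b-instruct" };
  }
  if (env.OPENAI_API_KEY) {
    return { provider: "openai", model: "gpt-3.5-turbo-16k" };
  }
  throw new Error("No LLM API key configured for /ask");
}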

I took the 7-day free trial of the annual plan and did a bit of playing around myself; pretty good, I must say.

GPT-4 (with code interpreter?)

Isn't code interpreter just a Python plugin with a custom text splitter/parser?

I'm using the free version of perplexity so I only have used the perplexity model.

From what I gathered, I think they are using Mistral as their main model.

@0x4007
Member Author

0x4007 commented Oct 23, 2023

I hear what you are saying Pav and I think until they up the context limit our hands are tied.

Add the spec, and if the token counter is too high, then perhaps just the sender comment. I'll only know for sure how valuable the feature is when testing with real issues, but intuitively, the more context we provide, the more relevant the results I would expect.

@Keyrxng
Contributor

Keyrxng commented Oct 23, 2023

I'll give it a try and open the draft

What I'm troubled with is:

  • The current issue body is minimal and the actual relevant context lives within the linked 'original comment', or is spread over multiple issues/PRs. The context could be the issue/PR body or it may be a comment within it (both scenarios are not uncommon).

I'll add the spec, count the links in the body, and determine the token count of the spec and question; if that's 1/3 of the window or more, that'll do? If it's less than 1/3, grab whatever the body of the linked context is and fire? Roughly like the sketch below.
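
A minimal sketch of that heuristic, assuming a 4096-token window and a hypothetical countTokens helper backed by whatever tokenizer we settle on:

declare function countTokens(text: string): number; // hypothetical helper

const CONTEXT_WINDOW = 4096;

export function buildPrompt(spec: string, question: string, linkedBodies: string[]): string {
  let prompt = `${spec}\n\n${question}`;
  // If spec + question already eat a third of the window or more, send them alone.
  if (countTokens(prompt) >= CONTEXT_WINDOW / 3) return prompt;
  // Otherwise pull in linked issue/PR bodies until we run out of budget.
  for (const body of linkedBodies) {
    const next = `${body}\n\n${prompt}`;
    if (countTokens(next) > CONTEXT_WINDOW - 512) break; // leave room for the answer
    prompt = next;
  }
  return prompt;
}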

Telegrammed you my API key for Perplexity.

@ishaan-jaff

Hi @pavlovcik @Keyrxng - I believe we can make this easier.
I'm the maintainer of LiteLLM - we allow you to deploy an LLM proxy to call 100+ LLMs in one format - Perplexity, Bedrock, OpenAI, Anthropic, etc.: https://github.com/BerriAI/litellm/tree/main/openai-proxy.

If this looks useful (we're used in production), please let me know how we can help.

Usage

Perplexity request

curl http://0.0.0.0:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
     "model": "perplexity/mistral-7b-instruct",
     "messages": [{"role": "user", "content": "Say this is a test!"}],
     "temperature": 0.7
   }'

gpt-3.5-turbo request

curl http://0.0.0.0:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
     "model": "gpt-3.5-turbo",
     "messages": [{"role": "user", "content": "Say this is a test!"}],
     "temperature": 0.7
   }'

claude-2 request

curl http://0.0.0.0:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
     "model": "claude-2",
     "messages": [{"role": "user", "content": "Say this is a test!"}],
     "temperature": 0.7
   }'

@ubiquibot

ubiquibot bot commented Oct 23, 2023

Available commands

- /start: Assign the origin sender to the issue automatically.
- /stop: Unassign the origin sender from the issue automatically.
- /help: List all available commands.
- /autopay: Toggle automatic payment for the completion of the current issue.
- /query: Comments the user's multiplier and address
- /multiplier: Set the bounty payout multiplier for a specific contributor, and provide the reason for why. 
  example usage: "/wallet @user 0.5 'Multiplier reason'"
- /allow: Set access control. (Admin Only)
- /wallet: <WALLET_ADDRESS | ENS_NAME>: Register the hunter's wallet address. 
  ex1: /wallet 0x0000000000000000000000000000000000000000
  ex2: /wallet vitalik.eth


@Keyrxng
Contributor

Keyrxng commented Oct 24, 2023

I believe we can make this easier

I appreciate you taking the time, Ishaan, and while I'm but a lowly grunt, I do think it's more than what we need at the moment, although if there is a need for more than a couple of models then it may be considered at that point.

For me personally, I'll likely make use of it in personal projects, so again, appreciate the shout.


I'm working on improving tokenization before the call on our end, as Mistral's approach is unique and not provided by Tiktoken by default. It was overestimating by nearly double in most cases or underestimating by half in the others.

@0x4007
Member Author

0x4007 commented Oct 24, 2023

Agreed with @Keyrxng, but thanks for letting us know about your product. I'm also curious to know how you found this issue, @ishaan-jaff.

I'm working on improving tokenization before the call on our end

Not sure if you're using the code I shared in the other thread under github-agents, but that is specifically for GPT tokenization. Different models, I guess, have different encoders.

@Keyrxng
Contributor

Keyrxng commented Oct 24, 2023

I am, yes, or at least I drew from that initially.

I'm hoping I can just string the entire convo together as shown in the Perplexity and Mistral docs, using the special characters it's been trained with; then it's just a case of either:

    1. Extending an encoder to include the new special chars
    2. Running with the encoder that returns a count closest to the token count returned by Mistral itself (see the sketch below)
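
For option 2, a minimal sketch, assuming the js-tiktoken package and the OpenAI-style usage object that Perplexity returns (prompt_tokens is only known after the call):

import { getEncoding, TiktokenEncoding } from "js-tiktoken";

// Sketch of option 2: compare local estimates from a few encodings against the
// prompt_tokens the API reports in its usage object, and keep whichever tracks closest.
const CANDIDATES: TiktokenEncoding[] = ["cl100k_base", "p50k_base", "r50k_base"];

export function closestEncoding(prompt: string, reportedPromptTokens: number): TiktokenEncoding {
  let best = CANDIDATES[0];
  let bestError = Number.POSITIVE_INFINITY;
  for (const name of CANDIDATES) {
    const estimate = getEncoding(name).encode(prompt).length;
    const error = Math.abs(estimate - reportedPromptTokens);
    if (error < bestError) {
      bestError = error;
      best = name;
    }
  }
  return best;
}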

@0x4007
Member Author

0x4007 commented Oct 24, 2023

I presume that all these commercial models have solutions for token counting, like OpenAI's tiktoken.

@Keyrxng
Contributor

Keyrxng commented Oct 24, 2023

Well, yeah, it tends to vary from model to model depending on how that model was trained, what special characters were used, etc.

For instance, in the context of Mistral instruct:

Chat template
The template used to build a prompt for the Instruct model is defined as follows:

<s>[INST] Instruction [/INST] Model answer</s>[INST] Follow-up instruction [/INST]

Note that <s> and </s> are special tokens for beginning of string (BOS) and end of string (EOS) while [INST] and [/INST] are regular strings.

NOTE
This format must be strictly respected, otherwise the model will generate sub-optimal outputs.

As reference, here is the format used to tokenize instructions during fine-tuning:

[START_SYMBOL_ID] + 
tok("[INST]") + tok(USER_MESSAGE_1) + tok("[/INST]") +
tok(BOT_MESSAGE_1) + [END_SYMBOL_ID] +
…
tok("[INST]") + tok(USER_MESSAGE_N) + tok("[/INST]") +
tok(BOT_MESSAGE_N) + [END_SYMBOL_ID]

NOTE
The function tok should never generate the EOS token, however FastChat (used in vLLM) sends the full prompt as a string which might lead to incorrect tokenization of the EOS token and prompt injection. Users are encouraged to send tokens instead as described above.
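
Following that template literally, a sketch of turning a chat history into a single prompt string. The note above warns that sending BOS/EOS as literal strings can mis-tokenize, so this only illustrates the shape of the prompt:

// Sketch of the <s>[INST] ... [/INST] answer</s> template quoted above.
// BOS/EOS appear here as literal strings purely for illustration.
type Turn = { role: "user" | "assistant"; content: string };

export function toMistralPrompt(turns: Turn[]): string {
  let prompt = "<s>";
  for (const turn of turns) {
    if (turn.role === "user") {
      prompt += `[INST] ${turn.content} [/INST]`;
    } else {
      prompt += ` ${turn.content}</s>`;
    }
  }
  return prompt;
}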

@Keyrxng
Contributor

Keyrxng commented Oct 24, 2023

The above was taken from the Mistral docs, whereas the example below is from Perplexity, and there are clear differences between the two. I'm inclined to believe the Mistral docs over the Perplexity docs, but it still leaves me wondering slightly.

The system message is prepended to the first user message:


<bos>[INST] <<SYS>>
System prompt
<</SYS>>

Instruction [/INST]
mistral-7b-instruct
Example chat:


[
  {
    "role": "user",
    "content": "Instruction"
  },
  {
    "role": "assistant",
    "content": "Model answer"
  },
  {
    "role": "user",
    "content": "Follow-up instruction"
  }
]
The tokenized chat:


<bos>[INST] Instruction [/INST]Model answer<eos> [INST] Follow-up instruction [/INST]

@Keyrxng
Contributor

Keyrxng commented Oct 24, 2023

Reading your comment again, I may have misunderstood at first.

Perplexity uses the same API structure as OpenAI, so it returns the tokens used for input, output, and both, but obviously that's after the fact.

some shit QA:

@Keyrxng
Contributor

Keyrxng commented Oct 25, 2023

So the underlying tokenizer isn't Tiktoken, it's Google's SentencePieceProcessor; I tried to get something close with Tiktoken but no joy. I've had to get the JS wrapper for SentencePiece, but the prompt tokenization is just about spot on.
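
Something along these lines, as a sketch only: the SentencePieceProcessor binding shown here (load/encodeIds) is an assumed wrapper API for illustration, not a verified signature.

// Sketch only: count prompt tokens with a SentencePiece JS wrapper instead of Tiktoken.
// The load/encodeIds methods are assumed; check the wrapper's actual API.
declare class SentencePieceProcessor {
  load(modelPath: string): Promise<void>;
  encodeIds(text: string): number[];
}

export async function countMistralTokens(prompt: string, modelPath = "tokenizer.model"): Promise<number> {
  const sp = new SentencePieceProcessor();
  await sp.load(modelPath); // Mistral ships a SentencePiece tokenizer.model file
  return sp.encodeIds(prompt).length;
}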

some more shit QA:

@Keyrxng Keyrxng linked a pull request Oct 25, 2023 that will close this issue
@0x4007 0x4007 changed the title /ask using perplexity api /research using perplexity api Jan 29, 2024