Enable conditional sending of schema to OpenAI in prompt preamble #586

Open
Tracked by #584
asimpson opened this issue Jun 9, 2023 · 1 comment
Labels: datasource/ADX, enhancement (New feature or request)

Comments

@asimpson
Contributor

asimpson commented Jun 9, 2023

Currently the OpenAI integration (#577) does not send any details about the user's schema or data to OpenAI in the prompt. We should explore sending the schema, or at least the database and table names, along with the prompt, which should result in more relevant KQL queries. This should be conditional and off by default, so users have to opt in. The UX is still unsolved, but a checkbox in the header to include schema details may work well enough.
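A minimal sketch of what an opt-in schema preamble could look like, assuming the schema is available as a mapping of database names to table names. `build_prompt`, `include_schema`, the mapping shape, and the preamble wording are all hypothetical; they are not part of the plugin or the OpenAI API.

```python
from typing import Mapping, Optional, Sequence


def build_prompt(
    user_prompt: str,
    schema: Optional[Mapping[str, Sequence[str]]] = None,
    include_schema: bool = False,
) -> str:
    """Return the text sent to OpenAI, optionally prefixed with a schema preamble.

    `schema` maps database names to table names (a hypothetical shape).
    The preamble is only added when the user has explicitly opted in.
    """
    if not include_schema or not schema:
        # Default path: send only the user's request, no schema details.
        return user_prompt

    lines = ["Generate a KQL query. The available databases and tables are:"]
    for database in sorted(schema):
        tables = ", ".join(sorted(schema[database]))
        lines.append(f"- {database}: {tables}")
    lines.append("")  # blank line between the preamble and the request
    lines.append(f"Request: {user_prompt}")
    return "\n".join(lines)


if __name__ == "__main__":
    # Example schema; the names are illustrative only.
    example_schema = {"Samples": ["StormEvents", "PopulationData"]}
    print(build_prompt("count storms by state", example_schema, include_schema=True))
```

Sending only database and table names keeps the preamble short; column names and types would make queries more precise but add tokens, which feeds directly into the cost and limit concerns below.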

Potential issues

  1. Cost
    The OpenAI API charges per 1K tokens, counting both the prompt and the completion, at a higher rate for GPT-4. Including the schema, or even just the table names, in the prompt can add many more tokens than the user expects. We need to avoid surprise charges from API use: at a minimum, warn the user about this possibility; ideally, estimate how many tokens will be sent before the request is made.

  2. Token limits
    Each model has a maximum context length (4,096 tokens for gpt-3.5-turbo), and that limit covers the prompt and the completion together. Along with ☝️, we should estimate the number of tokens before sending and alert the user if the request would exceed the limit.

@asimpson
Contributor Author

For JavaScript, OpenAI recommends the gpt-3-encoder package (Node.js) to count tokens, like so:

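// Pass the text to measure as the first command-line argument
const { encode } = require('gpt-3-encoder');

const text = process.argv[2] ?? ''; // avoid calling encode(undefined) when no argument is given
console.log(text);
const encoded = encode(text); // array of BPE token ids
console.log('# of tokens:', encoded.length);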

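Compare the results with OpenAI's tiktoken Python module. gpt-3-encoder implements the GPT-2/GPT-3 vocabulary, while gpt-3.5-turbo and gpt-4 use the cl100k_base encoding, so the two counts can differ: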

import tiktoken

# cl100k_base is the encoding used by gpt-3.5-turbo and gpt-4
enc = tiktoken.get_encoding("cl100k_base")
e = enc.encode("hi there bob")  # list of token ids
print(len(e))

Projects
Status: Feature Requests