SQL queries on graphs are very flexible and powerful. With large language models like OpenAI's Codex we could make this more accessible.
The idea is to create a prompt of the following format:
We have these tables:
- VERTICES with columns ID (long), X (number), Y (string), Z (timestamp)
- EDGE_ATTRIBUTES with columns SRC_ID (long), DST_ID (long), FOO (number), BAR (string), ...
- EDGES with columns SRC_ID (long), DST_ID (long), SRC_X (number), DST_X ..., EDGE_FOO (number), EDGE_BAR (string), ...
Write a query to find [WHAT THE USER WANTS]:
SELECT
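As a starting point, assembling this prompt from a schema description might look like the sketch below. The names `Table`, `describe`, and `build_prompt` are made up for illustration and are not part of LynxKite. The header text is a parameter so that other phrasings can be swapped in.

```python
from dataclasses import dataclass


@dataclass
class Table:
    """A table name plus (column, type) pairs. Hypothetical helper type."""
    name: str
    columns: list[tuple[str, str]]


def describe(table: Table) -> str:
    """Render one table as a single bullet line of the prompt."""
    cols = ", ".join(f"{col} ({typ})" for col, typ in table.columns)
    return f"- {table.name} with columns {cols}"


def build_prompt(tables: list[Table], request: str,
                 header: str = "We have these tables:") -> str:
    """Build the prompt; it ends with SELECT so the model continues the query."""
    lines = [header]
    lines += [describe(t) for t in tables]
    lines.append(f"Write a query to find {request}:")
    lines.append("SELECT")
    return "\n".join(lines)


vertices = Table("VERTICES", [("ID", "long"), ("X", "number"),
                              ("Y", "string"), ("Z", "timestamp")])
prompt = build_prompt([vertices], "the ten vertices with the largest X")
print(prompt)
```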
The task is to experiment to find the best prompt. Please create 10 different use cases with made-up tables and user queries. (Some queries can share the same tables, but make sure there is good variety.)
Imaginary tables are very flexible and easy to work with. But let's use a real dataset as one of the examples. That way we can try the generated queries for real.
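One possible way to try generated queries for real is sketched below. It uses an in-memory SQLite database as a stand-in, which is an assumption: the SQL dialect that LynxKite runs may differ, so a query that passes here is only a rough signal. The tiny tables, the `run_generated` helper, and the example completion are all invented for illustration.

```python
import sqlite3

# Toy graph data, made up for this sketch.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE VERTICES (ID INTEGER PRIMARY KEY, NAME TEXT, AGE REAL);
CREATE TABLE EDGES (SRC_ID INTEGER, DST_ID INTEGER,
                    SRC_NAME TEXT, DST_NAME TEXT, EDGE_WEIGHT REAL);
INSERT INTO VERTICES VALUES (1, 'Adam', 20.3), (2, 'Eve', 18.2), (3, 'Bob', 50.3);
INSERT INTO EDGES VALUES (1, 2, 'Adam', 'Eve', 1.0),
                         (2, 1, 'Eve', 'Adam', 2.0),
                         (3, 1, 'Bob', 'Adam', 3.0);
""")


def run_generated(conn, completion):
    """Prepend SELECT (the prompt ends with it) and run the query.

    Returns (rows, None) on success or (None, error message) on failure,
    so a broken query is recorded instead of stopping the whole suite.
    """
    query = "SELECT " + completion.strip().rstrip(";")
    try:
        return conn.execute(query).fetchall(), None
    except sqlite3.Error as e:
        return None, str(e)


# A hand-written completion standing in for model output.
completion = "DST_NAME, COUNT(*) AS N FROM EDGES GROUP BY DST_NAME ORDER BY N DESC LIMIT 1;"
rows, error = run_generated(conn, completion)
print(rows, error)
```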
For all test cases try the effects of the following:
- Adding one or more exemplars. An exemplar helps the model understand the format. It can also be useful for detecting the end of the generated query.
- Changing the set of tables. LynxKite has vertices, edges, and edge_attributes. But it never makes sense to use all three in a query. What if we only mention vertices and edges? Or if we only mention edge_attributes?
- Try GPT-3 and Codex. You can drive these from code, so you can easily run a full experimental suite on them. Try a few examples manually on ChatGPT for comparison.
- Experiment with the specific phrasing. Find the best alternative to "We have these tables:" and the best format for specifying the schema. For GPT-3/ChatGPT I think an instruction ought to work well (like "Write a query that"). For Codex I expect a comment would work better (like "-- This query is going to").
- Check the effects of unrelated columns. Try adding 100 unrelated columns to your imaginary table.
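The variations above multiply quickly, so it may help to enumerate them as a grid. The sketch below is one way to do that. The axis names and values are assumptions about how to encode the ideas above, and the model entries are plain labels, not API model identifiers. `cut_at_stop` illustrates the end-of-query detection idea, assuming each exemplar in the prompt is followed by a blank line.

```python
import itertools

# Every axis below is an assumed encoding of the variations listed above.
AXES = {
    "model": ["gpt-3", "codex"],  # labels only, not API model IDs
    "style": ["instruction", "comment"],
    "tables": [
        ("VERTICES", "EDGES"),
        ("EDGE_ATTRIBUTES",),
        ("VERTICES", "EDGE_ATTRIBUTES", "EDGES"),
    ],
    "num_exemplars": [0, 1, 3],
    "num_unrelated_columns": [0, 100],
}


def variations(axes=AXES):
    """Yield one dict per combination of axis values."""
    keys = list(axes)
    for combo in itertools.product(*(axes[k] for k in keys)):
        yield dict(zip(keys, combo))


def unrelated_columns(n):
    """Generate n filler (name, type) pairs to pad a made-up table."""
    return [(f"UNRELATED_{i}", "number") for i in range(n)]


def cut_at_stop(completion, stop="\n\n"):
    """Keep only the text before the first stop marker."""
    return completion.split(stop, 1)[0].strip()


all_variations = list(variations())
print(len(all_variations), all_variations[0])
```

Running every use case against every combination gives a full table of results, and each generated query can be stored next to the variation dict that produced it.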
Please submit the experimental suite in a PR. I don't want to clutter the main branch with this experimental stuff, but we can merge it to a research branch for posterity. Once we are past the prototyping phase, this code can become the basis of the real implementation.
Please write the code so that the generated queries are saved to a file and include this file in the PR too. You don't have to include the output for every variation, but do include what you want to show. The good stuff! 😊 (Make sure you don't include the API key though!)
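A rough sketch of the saving part follows. The JSON Lines layout, the record fields, and the function names are assumptions, not a required format. The key is read from the `OPENAI_API_KEY` environment variable, so it never has to appear in the code or in the saved file.

```python
import json
import os
import tempfile


def get_api_key():
    """Read the key from the environment so it never lands in the repo."""
    key = os.environ.get("OPENAI_API_KEY")
    if not key:
        raise RuntimeError("Set OPENAI_API_KEY before running the suite.")
    return key


def save_results(path, records):
    """Write one JSON object per line; only prompts and outputs are stored."""
    with open(path, "w", encoding="utf-8") as f:
        for record in records:
            f.write(json.dumps(record, ensure_ascii=False) + "\n")


def load_results(path):
    """Read the records back, skipping blank lines."""
    with open(path, encoding="utf-8") as f:
        return [json.loads(line) for line in f if line.strip()]


# Made-up record; a real run would fill these fields from the model output.
records = [{
    "use_case": "most popular vertex",
    "model": "codex",
    "prompt": "We have these tables:\n- EDGES with columns SRC_ID (long), DST_ID (long)\nWrite a query to find the most popular vertex:\nSELECT",
    "query": "SELECT DST_ID, COUNT(*) AS N FROM EDGES GROUP BY DST_ID ORDER BY N DESC LIMIT 1",
}]
out_path = os.path.join(tempfile.mkdtemp(), "generated_queries.jsonl")
save_results(out_path, records)
print(load_results(out_path) == records)
```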
Thank you!