Proposal: --context directive in %%ai cell magics via command line arguments to include local directory file contents and file names and URL names and plain text contents #773

Open
jaanli opened this issue May 5, 2024 · 2 comments
Labels
enhancement New feature or request

Comments

@jaanli

jaanli commented May 5, 2024

Problem

I am constantly copying and pasting context into large language models to reduce hallucinations and to ground them using techniques such as in-context learning (appending positively and negatively labeled examples to prompts).

This is similar to prompt optimization methods such as those implemented in DSPy (https://github.com/stanfordnlp/dspy).

I currently use this bash script, written by claude.ai, to copy the contents of the current directory whenever I need help with software engineering, machine learning, writing, or research tasks for non-profit and teaching work:

https://gist.github.com/jaanli/5def01b7bd674efd6d9008cf1125986d

Usage of this script:

  1. Copy and paste this into /usr/local/bin/copy.sh
  2. Make it executable and run copy.sh
  3. Paste the contents into claude.ai, chat.openai.com or another LLM
  4. Write the prompt

Proposed Solution

Add a directive or argument, called --context or similar, that reads a JSON object passed with syntax like this:

ray job submit --address="http://<head-node-ip>:8265" --runtime-env-json='{"working_dir": "/data/my_files", "pip": ["emoji"]}' -- python my_ray_script.py

(Example from https://docs.ray.io/en/latest/ray-core/handling-dependencies.html#specifying-a-runtime-environment-per-job)
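To make the proposal concrete, here is a minimal sketch of how a `--context` JSON argument could be pulled out of a magic's argument line. It is not how Jupyter AI parses `%%ai` today; the function name `parse_context_args`, the positional `model_id`, and the `url` key are assumptions made up for illustration.

```python
import argparse
import json
import shlex


def parse_context_args(line: str) -> dict:
    """Parse a hypothetical `--context` JSON argument from a magic's argument line."""
    parser = argparse.ArgumentParser(prog="%%ai", add_help=False)
    parser.add_argument("model_id")  # stand-in for whatever the magic already accepts
    # json.loads turns the quoted JSON string into a dict; invalid JSON is reported
    # by argparse as a normal argument error.
    parser.add_argument("--context", type=json.loads, default={})
    args = parser.parse_args(shlex.split(line))
    return {"model_id": args.model_id, "context": args.context}


# Same quoting style as the `ray job submit --runtime-env-json=...` example.
example = parse_context_args(
    """chatgpt --context='{"working_dir": "/data/my_files", "url": ["https://docs.ray.io/"]}'"""
)
print(example["context"]["working_dir"])
```

Reusing shell-style quoting (via `shlex`) keeps the syntax identical to the Ray example, so the JSON can be copied between a terminal and a notebook cell unchanged.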

The JSON object might support keys like working_dir in a similar manner, letting the user pass a gitignore-style list of paths to include in or exclude from the context sent to the LLM by the %%ai cell magic.
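As a rough sketch of the working_dir idea, the function below walks a directory and concatenates file names and contents, filtered by glob patterns. The `include` and `exclude` keys, the `collect_context` name, and the `max_chars` cap are hypothetical, and `fnmatch` only approximates gitignore semantics (no negation, no directory-only rules).

```python
import fnmatch
import tempfile
from pathlib import Path


def collect_context(spec: dict, max_chars: int = 20_000) -> str:
    """Concatenate names and contents of files under spec["working_dir"]."""
    root = Path(spec["working_dir"])
    include = spec.get("include", ["*"])  # hypothetical key: patterns to keep
    exclude = spec.get("exclude", [])     # hypothetical key: patterns to drop
    parts = []
    for path in sorted(root.rglob("*")):
        if not path.is_file():
            continue
        rel = path.relative_to(root).as_posix()
        if not any(fnmatch.fnmatch(rel, pat) for pat in include):
            continue
        if any(fnmatch.fnmatch(rel, pat) for pat in exclude):
            continue
        # Truncate each file so one large file cannot swamp the prompt.
        text = path.read_text(encoding="utf-8", errors="replace")[:max_chars]
        parts.append(f"--- {rel} ---\n{text}")
    return "\n\n".join(parts)


# Small self-contained demo in a temporary directory.
with tempfile.TemporaryDirectory() as tmp:
    Path(tmp, "notes.md").write_text("# Notes\n", encoding="utf-8")
    Path(tmp, "secret.env").write_text("TOKEN=abc\n", encoding="utf-8")
    demo = collect_context({"working_dir": tmp, "exclude": ["*.env"]})
print(demo)
```

A real implementation would probably want proper gitignore matching and a total size budget tied to the model's context window; both are left out here.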

The JSON object might also support keys like url in a similar manner: the contents of documentation pages or patterns that are outside the pre-training data for LLMs (or that cannot be included in web-scale datasets because of federal laws like HIPAA or EU laws like GDPR) could then be passed as context.
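A similar sketch for the url key, using only the standard library. The names `fetch_text`, `url_context`, and `build_prompt` are hypothetical, as is the choice to accept either one URL or a list; the fetcher is injectable so the demo runs without network access.

```python
from typing import Callable
from urllib.request import urlopen


def fetch_text(url: str, timeout: float = 10.0) -> str:
    """Download a URL and decode the body as text (no HTML stripping)."""
    with urlopen(url, timeout=timeout) as resp:
        charset = resp.headers.get_content_charset() or "utf-8"
        return resp.read().decode(charset, errors="replace")


def url_context(
    spec: dict,
    fetch: Callable[[str], str] = fetch_text,
    max_chars: int = 20_000,
) -> str:
    """Concatenate URL names and their (truncated) plain-text contents."""
    urls = spec.get("url", [])
    if isinstance(urls, str):  # allow a single URL as well as a list
        urls = [urls]
    parts = [f"--- {url} ---\n{fetch(url)[:max_chars]}" for url in urls]
    return "\n\n".join(parts)


def build_prompt(context: str, prompt: str) -> str:
    """Prepend the gathered context to the user's cell prompt."""
    return f"{context}\n\n{prompt}" if context else prompt


# Demo with a fake fetcher (a dict lookup) instead of a real HTTP request.
fake_pages = {"https://example.org/docs": "Example documentation text."}
demo_prompt = build_prompt(
    url_context({"url": "https://example.org/docs"}, fetch=fake_pages.__getitem__),
    "Summarize the documentation above.",
)
print(demo_prompt)
```

Whether context should be prepended to the prompt or sent some other way is an open design question; prepending is simply the closest match to the copy-and-paste workflow described above.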

Additional context

I'm happy to help prototype this and have some spare cycles for open source development. This feature would accelerate my work in health equity (https://onefact.github.io/synthetic-healthcare-data/ & https://jaanli.github.io/american-community-survey/new-york-area/income-by-race & https://jaanli.github.io/new-york-real-estate/) and my ability to teach courses where I develop materials like this: https://colab.research.google.com/github/onefact/datathinking.org-codespace/blob/main/notebooks/princeton-university/week-1-visualizing-33-million-phone-calls-in-new-york-city.ipynb (these are sometimes used as advertising by for-profit companies; e.g., this dataset was reused to advertise MotherDuck here: https://motherduck.com/blog/introducing-column-explorer/).

What would be the next steps for assessing whether such an argument, which passes a JSON object describing local directory file names and contents as well as URLs and their plain-text contents, is feasible?

(This could then be extended to handle URLs that point to PDF files, etc., with standard Python tools!)

@jaanli jaanli added the enhancement New feature or request label May 5, 2024
@jaanli
Author

jaanli commented May 5, 2024

@JasonWeill
Collaborator

See also #434, which concerns making "learn" and "ask" functionality available in the magic commands.
