llm-prepare

Version 1.0.13 License: MIT Built with Node.js

llm-prepare converts complex project directory structures and files into a single flat file, or a set of flat files, to facilitate processing using In-Context Learning (ICL) with AI models such as ChatGPT, Claude, Gemini, or Mistral.

This Node.js tool recursively scans a project directory based on provided arguments (at least a directory and file inclusion pattern). Then, it constructs a simplified layout view that includes all directories and file matches. The tool then combines the layout view with the aggregated text file content of the entire project. The aggregated file content is stripped of comments and unnecessary whitespace by default. Output compression is also supported to reduce token use, and llm-prepare can handle large projects by chunking the output. Example prompts are included.
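The comment and whitespace stripping described above can be pictured with a minimal sketch. This is not llm-prepare's actual implementation: the function names `stripComments` and `collapseWhitespace` are hypothetical, the sketch assumes C-style comment syntax, and it only removes block comments and full-line `//` comments.

```javascript
// Naive sketch (assumption: C-style comments). It does not parse string or
// regex literals, so a "/* ... */" sequence inside a string would be removed too.
function stripComments(source) {
  return source
    .replace(/\/\*[\s\S]*?\*\//g, "") // block comments
    .replace(/^[ \t]*\/\/.*$/gm, ""); // lines that contain only a // comment
}

// Remove trailing whitespace and drop lines that are empty afterwards.
// Leading indentation is kept so the code stays readable.
function collapseWhitespace(source) {
  return source
    .split("\n")
    .map((line) => line.replace(/\s+$/, ""))
    .filter((line) => line.length > 0)
    .join("\n");
}

const input = "/* header */\nconst a = 1;   \n\n  // note\nconst b = 2;\n";
console.log(collapseWhitespace(stripComments(input)));
```

A real tool would need language-aware handling (for example, Python uses `#` comments), which is why this is only an illustration of the idea.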

In-Context Learning (ICL)

"While finetuning with full datasets is still a powerful option if the data vastly exceeds the context length, our results suggest that long-context ICL is an effective alternative, trading finetuning-time cost for increased inference-time compute. As the effectiveness and efficiency of using very long model context lengths continues to increase, we believe long-context ICL will be a powerful tool for many tasks."

In-Context Learning (ICL) allows a Large Language Model (LLM) to perform tasks by interpreting the context provided within the prompt without additional training or fine-tuning. This approach differs significantly from previous methods where models were explicitly trained on a specific task using vast datasets. Instead, ICL leverages the model's pre-trained knowledge base—a comprehensive understanding accumulated during its initial extensive training phase.

As the context window (the number of tokens an LLM can process and generate in a single request) has grown dramatically, the value of ICL has become even more significant. A larger context window lets LLMs handle longer and more complex inputs and outputs, which enhances their ability to understand and generate sophisticated text.

Features

  • Layout View: Provides an ASCII layout view of your project.
  • Directory Traversal: Recursively scan through the project directory.
  • Custom File Filtering: Include files based on specified patterns.
  • Ignore Support: Automatically respects .ignore files to exclude specific files or directories.
  • Output Consolidation: Generates a single flat file consolidated view of file contents and directory structure.
  • Multifile Output: Generates multiple flat files from a consolidated view of file contents and directory structure based on a provided chunk size.
  • Optionally Remove Layout View: Optionally remove the layout view from the output.
  • Optionally Include Comments: Optionally include comments in the output.
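The directory traversal and layout view features can be sketched roughly as follows. The names `scan` and `renderLayout` are hypothetical, the tree-drawing characters are an assumption about what an ASCII layout might look like, and this sketch only lists directories that contain a matching file; llm-prepare's real output format may differ.

```javascript
const fs = require("node:fs");
const path = require("node:path");

// Recursively collect files under `root` whose names match `pattern` (a RegExp).
// Returns paths relative to `root`, using "/" as the separator.
function scan(root, pattern, dir = root, found = []) {
  const entries = fs
    .readdirSync(dir, { withFileTypes: true })
    .sort((a, b) => a.name.localeCompare(b.name));
  for (const entry of entries) {
    const full = path.join(dir, entry.name);
    if (entry.isDirectory()) {
      scan(root, pattern, full, found);
    } else if (pattern.test(entry.name)) {
      found.push(path.relative(root, full).split(path.sep).join("/"));
    }
  }
  return found;
}

// Render a list of relative paths as an ASCII tree.
function renderLayout(paths) {
  const tree = {};
  for (const p of paths) {
    let node = tree;
    for (const part of p.split("/")) {
      node = node[part] ??= {}; // create the child node on first visit
    }
  }
  const lines = [];
  const walk = (node, prefix) => {
    const names = Object.keys(node).sort();
    names.forEach((name, i) => {
      const last = i === names.length - 1;
      lines.push(prefix + (last ? "└── " : "├── ") + name);
      walk(node[name], prefix + (last ? "    " : "│   "));
    });
  };
  walk(tree, "");
  return lines.join("\n");
}
```

For example, `renderLayout(scan("/path/to/project", /\.js$/))` would produce a tree of the JavaScript files under that directory.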

Example Prompts

  • Code Review: Interactive code review with a simulated senior software engineer.
  • Generate MySQL Create Table: Generate a MySQL CREATE TABLE statement based on your provided CSV content.
  • Question and Answer: Interactive question and answer session powered by your project code.
  • Readme Generation: A simulated senior technical writer generates a README.md based on your project code.
  • Simple Add Comments: A set of simple prompts that generate comments based on your project code (C#, JavaScript, PHP, Python, Ruby, Rust, and TypeScript).
  • Technical Document Generation: A simulated senior technical writer generates technical documentation based on your project code.
  • Test Generation: Interactive test generation with a simulated senior software engineer and simulated QA.

Plus many more (including new CSV-oriented prompts). All example prompts have been tested with ChatGPT GPT-4.

Dependencies

  • Node.js: The script runs in a Node.js environment.
  • fs-extra: An extension of the standard Node.js fs module, providing additional methods and promise support.
  • ignore: Used to handle .ignore files similar to .gitignore.
  • istextorbinary: Determines whether a given file contains text or binary data.
  • open: Opens URLs in your default browser.
  • yargs: Helps in building interactive command line tools, by parsing arguments and generating an elegant user interface.
  • yargs/helpers: Provides utility methods for yargs.

Installing Node.js

Before installing, ensure you have Node.js and npm (Node Package Manager) installed on your system. You can download and install Node.js from the official Node.js website.

Installing llm-prepare

To install and use llm-prepare, follow these steps:

Clone the Repository: Begin by cloning the llm-prepare repository to your local machine.

git clone https://github.com/samestrin/llm-prepare/

Navigate to the root of the cloned llm-prepare directory and run:

npm install

To make llm-prepare available from any location on your system, you need to install it globally. You can do this using npm.

Run the following command in the llm-prepare directory:

npm link

This will create a global symlink to your script. Now, you can run the script using llm-prepare from anywhere in your terminal.

Platform-Specific Installation Instructions

macOS and Linux

The provided installation steps should work as-is for both macOS and Linux platforms.

Windows

For Windows, ensure that Node.js is added to your PATH during the installation. The npm link command should also work in Windows PowerShell or Command Prompt, allowing you to run the script globally.

Usage

To run the script, you need to provide two mandatory arguments: the path to the project directory (--path-name) and the pattern of files to include (--file-pattern).

Example:

llm-prepare --path-name "/path/to/project" --file-pattern "*.js"

This will process all JavaScript files in the specified project directory, respecting any .ignore files, and output the consolidated content and structure to your console.
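If you want to call llm-prepare from another Node.js script, one possible approach is to build the argument list and spawn the CLI. The helpers `buildArgs` and `runLlmPrepare` below are hypothetical and not part of llm-prepare; only the flag names come from this README. The sketch assumes `llm-prepare` is on your PATH (for example after `npm link`).

```javascript
const { spawnSync } = require("node:child_process");

// Translate an options object into CLI arguments, using flags from this README.
function buildArgs({ pathName, filePattern, outputFilename, compress, chunkSize }) {
  const args = ["--path-name", pathName, "--file-pattern", filePattern];
  if (outputFilename) args.push("--output-filename", outputFilename);
  if (compress) args.push("--compress");
  if (chunkSize) args.push("--chunk-size", String(chunkSize));
  return args;
}

// Run the CLI and return whatever it printed to stdout.
// On Windows, npm-linked commands are .cmd shims, so `shell: true` may be needed.
function runLlmPrepare(options) {
  const result = spawnSync("llm-prepare", buildArgs(options), { encoding: "utf8" });
  if (result.error) throw result.error;
  return result.stdout;
}

console.log(buildArgs({ pathName: "/path/to/project", filePattern: "*.js" }).join(" "));
```

Passing arguments as an array avoids shell quoting problems with patterns such as `*.js`.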

Options

      --help                 Show help                                 [boolean]
  -p, --path-name            Path to the project directory   [string] [required]
  -f, --file-pattern         Pattern of files to include, e.g., '\.js$' or '*'
                             for all files                   [string] [required]
  -o, --output-filename      Output filename                            [string]
  -i, --include-comments     Include comments? (Default: false)        [boolean]
  -c, --compress             Compress? (Default: false)                [boolean]
      --chunk-size           Maximum size (in kilobytes) of each file   [number]
  -s, --suppress-layout      Suppress layout in output (Default: false)[boolean]
      --default-ignore       Use a custom default ignore file           [string]
      --show-default-ignore  Show default ignore file                   [string]
      --show-prompts         Show example prompts in your browser      [boolean]
  -v, --version              Display the version number                [boolean]
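To illustrate what `--chunk-size` implies, here is a minimal sketch of splitting text into pieces no larger than a given number of kilobytes. It is an assumption, not llm-prepare's actual algorithm: the function name `chunkBySize` is hypothetical, 1 kilobyte is taken as 1024 bytes, and splitting happens only at line boundaries, so a single line longer than the limit still ends up in its own oversized chunk.

```javascript
// Split `text` into chunks of at most `maxKilobytes` KB, breaking between lines.
function chunkBySize(text, maxKilobytes) {
  const limit = maxKilobytes * 1024;
  const chunks = [];
  let current = [];
  let size = 0;
  for (const line of text.split("\n")) {
    // Count one extra byte per line for the newline (a slight overestimate).
    const lineBytes = Buffer.byteLength(line, "utf8") + 1;
    if (current.length > 0 && size + lineBytes > limit) {
      chunks.push(current.join("\n"));
      current = [];
      size = 0;
    }
    current.push(line);
    size += lineBytes;
  }
  if (current.length > 0) chunks.push(current.join("\n"));
  return chunks;
}

const sample = ["a".repeat(600), "b".repeat(600), "c".repeat(600)].join("\n");
console.log(chunkBySize(sample, 1).length);
```

Measuring bytes with `Buffer.byteLength` rather than string length keeps the limit accurate for non-ASCII content.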

Contribute

Contributions to this project are welcome. Please fork the repository and submit a pull request with your changes or improvements.

License

This project is licensed under the MIT License - see the LICENSE file for details.
