
AI Novel Prompter

AI Novel Prompter can generate writing prompts for novels based on user-specified characteristics.


Technologies Used

  • Frontend:
    • React
    • TypeScript
    • Axios
    • React Router
    • React Toastify
  • Backend:
    • Go
    • Gin Web Framework
    • GORM (Go ORM)
    • PostgreSQL

Prerequisites

Before running the application, make sure you have the following installed:

  • Node.js (v18 or higher)
  • Go (v1.18 or higher)
  • PostgreSQL
  • Docker
  • Docker Compose

Getting Started

  1. Clone the repository:
    git clone https://github.com/danielsobrado/ainovelprompter.git
    
  2. Navigate to the project directory:
    cd ainovelprompter
    
  3. Set up the backend:
  • Navigate to the server directory:

    cd server
    
  • Install the Go dependencies:

    go mod download
    
  • Update the config.yaml file with your database configuration.

  • Run the database migrations:

    go run cmd/main.go migrate
    
  • Start the backend server:

    go run cmd/main.go
    
  4. Set up the frontend:
  • Navigate to the client directory:

    cd ../client
    
  • Install the frontend dependencies:

    npm install
    
  • Start the frontend development server:

    npm start
    
  5. Open your web browser and visit http://localhost:3000 to access the application.

Getting Started (Docker)

  1. Clone the repository:

    git clone https://github.com/danielsobrado/ainovelprompter.git

  2. Navigate to the project directory:

    cd ainovelprompter

  3. Update the docker-compose.yml file with your database configuration.

  4. Start the application using Docker Compose:

    docker-compose up -d

  5. Open your web browser and visit http://localhost:3000 to access the application.

Configuration

  • Backend configuration can be modified in the server/config.yaml file.
  • Frontend configuration can be modified in the client/src/config.ts file.

Build

To build the frontend for production, run the following command in the client directory:

npm run build

The production-ready files will be generated in the client/build directory.

Installation and Management Guide for PostgreSQL on WSL

This small guide provides instructions on how to install PostgreSQL on the Windows Subsystem for Linux (WSL), along with steps to manage user permissions and troubleshoot common issues.


Prerequisites

  • Windows 10 or higher with WSL enabled (or a native Ubuntu installation).
  • Basic familiarity with Linux command line and SQL.

Installation

  1. Open WSL Terminal: Launch your WSL distribution (Ubuntu recommended).

  2. Update Packages:

    sudo apt update
  3. Install PostgreSQL:

    sudo apt install postgresql postgresql-contrib
  4. Check Installation:

    psql --version
  5. Set PostgreSQL User Password:

    sudo passwd postgres

Database Operations

  1. Create Database:

    createdb mydb
  2. Access Database:

    psql mydb
  3. Import Tables from SQL File:

    psql -U postgres -q mydb < /path/to/file.sql
  4. List Databases and Tables:

    \l  # List databases
    \dt # List tables in the current database
  5. Switch Database:

    \c dbname

User Management

  1. Create New User:

    CREATE USER your_db_user WITH PASSWORD 'your_db_password';
  2. Grant Privileges:

    ALTER USER your_db_user CREATEDB;

Troubleshooting

  1. Role Does Not Exist Error: Switch to the 'postgres' user:

    sudo -i -u postgres
    createdb your_db_name
  2. Permission Denied to Create Extension: Login as 'postgres' and execute:

    CREATE EXTENSION IF NOT EXISTS pg_trgm;
  3. Unknown User Error: Make sure you are using a recognized system user, or refer to the PostgreSQL role from within the SQL environment rather than via sudo.


Generating Custom Training Data to Fine-Tune a Language Model (Manual Steps)

To generate custom training data for fine-tuning a language model to emulate the writing style of George MacDonald, the process begins by obtaining the full text of one of his novels, "The Princess and the Goblin," from Project Gutenberg. The text is then broken down into individual story beats or key moments using a prompt that instructs the AI to generate a JSON object for each beat, capturing the author, emotional tone, type of writing, and the actual text excerpt.

Next, GPT-4 is used to rewrite each of these story beats in its own words, generating a parallel set of JSON data with unique identifiers linking each rewritten beat to its original counterpart. To simplify the data and make it more useful for training, the wide variety of emotional tones is mapped to a smaller set of core tones using a Python function. The two JSON files (original and rewritten beats) are then used to generate training prompts, where the model is asked to rephrase the GPT-4 generated text in the style of the original author. Finally, these prompts and their target outputs are formatted into JSONL and JSON files, ready to be used for fine-tuning the language model to capture MacDonald's distinctive writing style.
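For illustration, here is a minimal Python sketch of the last two steps: collapsing tones into a core set and pairing original and rewritten beats into JSONL training examples. The field names (id, author, tone, text) and tone categories are assumptions, not the exact schema used in this repository:

import json

# Hypothetical core-tone mapping; the real script may use different categories.
CORE_TONES = {
    "melancholic": "sad", "sorrowful": "sad", "gloomy": "sad",
    "joyful": "happy", "cheerful": "happy",
    "tense": "suspenseful", "ominous": "suspenseful",
}

def map_tone(tone: str) -> str:
    """Collapse a fine-grained emotional tone into a smaller set of core tones."""
    return CORE_TONES.get(tone.lower(), "neutral")

def build_training_examples(original_path, rewritten_path, output_path):
    """Pair each GPT-4 rewritten beat with its original counterpart by id and
    emit JSONL prompts asking the model to restore the author's style."""
    with open(original_path, encoding="utf-8") as f:
        originals = {beat["id"]: beat for beat in json.load(f)}
    with open(rewritten_path, encoding="utf-8") as f:
        rewritten = json.load(f)

    with open(output_path, "w", encoding="utf-8") as out:
        for beat in rewritten:
            source = originals.get(beat["id"])
            if source is None:
                continue
            example = {
                "prompt": (
                    f"Rewrite the following passage in the style of "
                    f"{source['author']} (tone: {map_tone(source['tone'])}):\n\n"
                    f"{beat['text']}"
                ),
                "completion": source["text"],
            }
            out.write(json.dumps(example) + "\n")

build_training_examples("original_beats.json", "rewritten_beats.json", "train.jsonl")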


Generating Custom Training Data to Fine-Tune a Language Model (Automated)

In the previous example, the process of generating paraphrased text using a language model involved some manual tasks. The user had to manually provide the input text, run the script, and then review the generated output to ensure its quality. If the output did not meet the desired criteria, the user would need to manually retry the generation process with different parameters or make adjustments to the input text.

However, with the updated version of the process_text_file function, the entire process has been fully automated. The function takes care of reading the input text file, splitting it into paragraphs, and automatically sending each paragraph to the language model for paraphrasing. It incorporates various checks and retry mechanisms to handle cases where the generated output does not meet the specified criteria, such as containing unwanted phrases, being too short or too long, or consisting of multiple paragraphs.

The automation process includes several key features (a simplified sketch follows this list):

  1. Resuming from the last processed paragraph: If the script is interrupted or needs to be run multiple times, it automatically checks the output file and resumes processing from the last successfully paraphrased paragraph. This ensures that progress is not lost and the script can pick up where it left off.

  2. Retry mechanism with random seed and temperature: If a generated paraphrase fails to meet the specified criteria, the script automatically retries the generation process up to a specified number of times. With each retry, it randomly changes the seed and temperature values to introduce variation in the generated responses, increasing the chances of obtaining a satisfactory output.

  3. Progress saving: The script saves the progress to the output file every specified number of paragraphs (e.g., every 500 paragraphs). This safeguards against data loss in case of any interruptions or errors during the processing of a large text file.

  4. Detailed logging and summary: The script provides detailed logging information, including the input paragraph, generated output, retry attempts, and reasons for failure. It also generates a summary at the end, displaying the total number of paragraphs, successfully paraphrased paragraphs, skipped paragraphs, and the total number of retries.
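The sketch below is a simplified, illustrative version of such a loop, not the exact process_text_file used here. The paraphrase argument stands in for whatever LLM call you use, and the acceptance checks are deliberately minimal:

import random

def process_text_file(paraphrase, input_path, output_path,
                      max_retries=5, save_every=500,
                      banned_phrases=("as an ai",)):
    """Sketch of the automated loop: resume from the last processed paragraph,
    retry with a fresh random seed/temperature, and save progress periodically.
    `paraphrase` is any callable(text, seed=..., temperature=...) -> str backed by your LLM."""
    with open(input_path, encoding="utf-8") as f:
        paragraphs = [p for p in f.read().split("\n\n") if p.strip()]

    # Resume: count paragraphs already present in the output file.
    try:
        with open(output_path, encoding="utf-8") as f:
            done = len([p for p in f.read().split("\n\n") if p.strip()])
    except FileNotFoundError:
        done = 0

    retries = skipped = 0
    with open(output_path, "a", encoding="utf-8") as out:
        for i, paragraph in enumerate(paragraphs[done:], start=done):
            for _ in range(max_retries):
                candidate = paraphrase(paragraph,
                                       seed=random.randint(0, 2**31 - 1),
                                       temperature=random.uniform(0.5, 1.0))
                ok = (0.5 * len(paragraph) < len(candidate) < 2 * len(paragraph)
                      and "\n\n" not in candidate
                      and not any(b in candidate.lower() for b in banned_phrases))
                if ok:
                    out.write(candidate + "\n\n")
                    break
                retries += 1
            else:
                skipped += 1  # give up on this paragraph after max_retries
            if (i + 1) % save_every == 0:
                out.flush()  # safeguard progress on long runs

    print(f"total={len(paragraphs)} done={len(paragraphs) - skipped} "
          f"skipped={skipped} retries={retries}")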


Generating Custom Training Data to Fine-Tune a Language Model with Local LLM and LM Studio using ORPO

This step generates ORPO (Odds Ratio Preference Optimization) custom training data for fine-tuning a language model to emulate the writing style of George MacDonald.

The input data should be in JSONL format, with each line containing a JSON object that includes the prompt and the chosen response (taken from the previous fine-tuning step). To use the script, set up the OpenAI client with your API key and specify the input and output file paths. Running the script processes the JSONL file and generates a CSV file with columns for the prompt, the chosen response, and a generated rejected response. The script saves progress every 100 lines and can resume from where it left off if interrupted. Upon completion, it provides a summary of the total lines processed, written lines, skipped lines, and retry details.
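As a rough sketch of the core loop (not the exact script), the following shows the idea; the base_url assumes LM Studio's default local server address, and the file names, model name, and system prompt are placeholders:

import csv
import json
import os
from openai import OpenAI

# LM Studio exposes an OpenAI-compatible server, by default at this URL
# (adjust base_url, api_key, and model name to your local setup).
client = OpenAI(base_url="http://localhost:1234/v1", api_key="lm-studio")

INPUT_JSONL = "orpo_input.jsonl"   # lines with {"prompt": ..., "chosen": ...}
OUTPUT_CSV = "orpo_pairs.csv"      # columns: prompt, chosen, rejected

def generate_rejected(prompt: str) -> str:
    """Ask the local model for a plainer, style-free answer to use as the 'rejected' response."""
    response = client.chat.completions.create(
        model="local-model",  # whatever model is currently loaded in LM Studio
        messages=[
            {"role": "system", "content": "Answer plainly, without any particular authorial style."},
            {"role": "user", "content": prompt},
        ],
        temperature=0.7,
    )
    return response.choices[0].message.content.strip()

# Resume: skip rows already written to the CSV (retry handling omitted in this sketch).
done = 0
if os.path.exists(OUTPUT_CSV):
    with open(OUTPUT_CSV, newline="", encoding="utf-8") as f:
        done = max(sum(1 for _ in csv.reader(f)) - 1, 0)  # minus header row

with open(INPUT_JSONL, encoding="utf-8") as src, \
     open(OUTPUT_CSV, "a", newline="", encoding="utf-8") as dst:
    writer = csv.writer(dst)
    if done == 0:
        writer.writerow(["prompt", "chosen", "rejected"])
    for i, line in enumerate(src):
        if i < done:
            continue
        record = json.loads(line)
        writer.writerow([record["prompt"], record["chosen"], generate_rejected(record["prompt"])])
        if (i + 1) % 100 == 0:
            dst.flush()  # save progress every 100 lines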


Fine-Tuning lessons

  • Dataset Quality Matters: 95% of outcomes depend on dataset quality. A clean dataset is essential since even a little bad data can hurt the model.

  • Manual Data Review: Cleaning and evaluating the dataset can greatly improve the model. This is a time-consuming but necessary step because no amount of parameter adjusting can fix a defective dataset.

  • Training Parameters: Tune parameters to prevent model degradation rather than to chase improvements. With a robust dataset, the goal is to steer the model while avoiding negative side effects; there is no single optimal learning rate.

  • Model Scale and Hardware Limitations: Larger models (33B parameters) may enable better fine-tuning but require at least 48GB of VRAM, making them impractical for most home setups.

  • Gradient Accumulation and Batch Size: Gradient accumulation helps reduce overfitting by enhancing generalisation across different datasets, but it may lower quality after a few batches.

  • Dataset Size: Dataset size matters more when fine-tuning a base model than when fine-tuning an already well-tuned model. Overloading a well-tuned model with excessive data can degrade its earlier fine-tuning.

  • Learning Rate Schedule: An ideal schedule starts with a warmup phase, holds steady for an epoch, and then gradually decreases following a cosine curve (a small sketch follows this list).

  • Model Rank and Generalisation: The amount of trainable parameters affects the model's detail and generalisation. Lower-rank models generalise better but lose detail.

  • LoRA's Applicability: Parameter-Efficient Fine-Tuning (PEFT) is applicable to large language models (LLMs) and systems like Stable Diffusion (SD), demonstrating its versatility.
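As an illustration of that warmup/hold/cosine shape, here is a minimal PyTorch sketch; the optimizer, step counts, and learning rate are made-up values, not recommendations from this project:

import math
import torch

def warmup_hold_cosine(optimizer, warmup_steps, hold_steps, total_steps):
    """Warm up linearly, hold the base LR for a while (e.g. one epoch), then decay on a cosine curve."""
    def lr_lambda(step):
        if step < warmup_steps:
            return step / max(1, warmup_steps)
        if step < warmup_steps + hold_steps:
            return 1.0
        progress = (step - warmup_steps - hold_steps) / max(1, total_steps - warmup_steps - hold_steps)
        return 0.5 * (1.0 + math.cos(math.pi * min(progress, 1.0)))
    return torch.optim.lr_scheduler.LambdaLR(optimizer, lr_lambda)

# Example wiring: 10k total steps, short warmup, hold for roughly one epoch's worth of steps.
model = torch.nn.Linear(8, 8)  # stand-in for the fine-tuned model
optimizer = torch.optim.AdamW(model.parameters(), lr=2e-4)
scheduler = warmup_hold_cosine(optimizer, warmup_steps=300, hold_steps=2000, total_steps=10_000)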


Finetuning Llama 3 issues as of May 2024

The Unsloth community has helped resolve several issues with finetuning Llama3. Here are some key points to keep in mind:

  1. Double BOS tokens: Double BOS tokens during finetuning can break things. Unsloth automatically fixes this issue.

  2. GGUF conversion: GGUF conversion is broken. Be careful of double BOS and use CPU instead of GPU for conversion. Unsloth has built-in automatic GGUF conversions.

  3. Buggy base weights: Some of Llama 3's base (not instruct) weights are "buggy" (untrained): <|reserved_special_token_{0->250}|> <|eot_id|> <|start_header_id|> <|end_header_id|>. This can cause NaNs and buggy results. Unsloth automatically fixes this.

  4. System prompt: According to the Unsloth community, adding a system prompt makes finetuning of the Instruct version (and possibly the base version) much better.

  5. Quantization issues: Quantization issues are common. See this comparison, which shows that you can get good performance with Llama 3, but the wrong quantization can hurt it. For finetuning, use bitsandbytes nf4 to boost accuracy (a loading sketch appears at the end of this section). For GGUF, use the I-quant versions as much as possible.

  6. Long context models: Long context models are poorly trained. They simply extend the RoPE theta, sometimes without any further training, and then train on oddly concatenated data to fake a long dataset. This approach does not work well; a smooth, continuous scaling from 8K up to 1M context length would have been much better.

To resolve some of these issues, use Unsloth for finetuning Llama3.
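For point 5, a minimal sketch of loading Llama 3 with bitsandbytes nf4 quantization via Hugging Face transformers might look like the following (the model name and compute dtype are illustrative; Unsloth wraps similar settings for you):

import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

# nf4 4-bit quantization, as recommended above for finetuning accuracy.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_use_double_quant=True,
    bnb_4bit_compute_dtype=torch.bfloat16,
)

model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Meta-Llama-3-8B-Instruct",
    quantization_config=bnb_config,
    device_map="auto",
)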


Evaluation Metrics

When fine-tuning a language model for paraphrasing in an author's style, it's important to evaluate the quality and effectiveness of the generated paraphrases.

The following evaluation metrics can be used to assess the model's performance:

  1. BLEU (Bilingual Evaluation Understudy):

    • BLEU measures the n-gram overlap between the generated paraphrase and the reference text, providing a score between 0 and 1.
    • To calculate BLEU scores, you can use the sacrebleu library in Python.
    • Example usage: from sacrebleu import corpus_bleu; bleu_score = corpus_bleu(generated_paraphrases, [original_paragraphs])
  2. ROUGE (Recall-Oriented Understudy for Gisting Evaluation):

    • ROUGE measures the overlap of n-grams between the generated paraphrase and the reference text, focusing on recall.
    • To calculate ROUGE scores, you can use the rouge library in Python.
    • Example usage: from rouge import Rouge; rouge = Rouge(); scores = rouge.get_scores(generated_paraphrases, original_paragraphs)
  3. Perplexity:

    • Perplexity quantifies the uncertainty or confusion of the model when generating text.
    • To calculate perplexity, you can use the fine-tuned language model itself (one concrete way is sketched after this list).
    • Example usage: perplexity = model.perplexity(generated_paraphrases)
  4. Stylometric Measures:

    • Stylometric measures capture the writing style characteristics of the target author.
    • To extract stylometric features, you can use the stylometry library in Python.
    • Example usage: from stylometry import extract_features; features = extract_features(generated_paraphrases)
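The model.perplexity(...) call above is shorthand; most causal LM APIs do not expose it directly. One concrete way to compute it, assuming a Hugging Face transformers checkpoint (the model name below is a placeholder for your fine-tuned model):

import math
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

def perplexity(texts, model_name="gpt2"):
    """Average perplexity of a list of texts under a causal LM (swap in your fine-tuned checkpoint)."""
    tokenizer = AutoTokenizer.from_pretrained(model_name)
    model = AutoModelForCausalLM.from_pretrained(model_name)
    model.eval()
    losses = []
    with torch.no_grad():
        for text in texts:
            inputs = tokenizer(text, return_tensors="pt", truncation=True)
            loss = model(**inputs, labels=inputs["input_ids"]).loss
            losses.append(loss.item())
    return math.exp(sum(losses) / len(losses))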

Integration with Axolotl

To integrate these evaluation metrics into your Axolotl pipeline, follow these steps:

  1. Prepare your training data by creating a dataset of paragraphs from the target author's works and splitting it into training and validation sets.

  2. Fine-tune your language model using the training set, following the approach discussed earlier.

  3. Generate paraphrases for the paragraphs in the validation set using the fine-tuned model.

  4. Implement the evaluation metrics using the respective libraries (sacrebleu, rouge, stylometry) and calculate the scores for each generated paraphrase.

  5. Perform human evaluation by collecting ratings and feedback from human evaluators.

  6. Analyze the evaluation results to assess the quality and style of the generated paraphrases and make informed decisions to improve your fine-tuning process.

Here's an example of how you can integrate these metrics into your pipeline:

from sacrebleu import corpus_bleu
from rouge import Rouge
from stylometry import extract_features

# train_model, generate_paraphrases, collect_human_evaluations and analyze_results are
# placeholders for your own pipeline code; original_paragraphs, training_data and
# validation_data come from your dataset split.

# Fine-tune the model using the training set
fine_tuned_model = train_model(training_data)

# Generate paraphrases for the validation set
generated_paraphrases = generate_paraphrases(fine_tuned_model, validation_data)

# Calculate evaluation metrics
bleu_score = corpus_bleu(generated_paraphrases, [original_paragraphs])
rouge = Rouge()
rouge_scores = rouge.get_scores(generated_paraphrases, original_paragraphs)
perplexity = fine_tuned_model.perplexity(generated_paraphrases)
stylometric_features = extract_features(generated_paraphrases)

# Perform human evaluation
human_scores = collect_human_evaluations(generated_paraphrases)

# Analyze and interpret the results
analyze_results(bleu_score, rouge_scores, perplexity, stylometric_features, human_scores)

Remember to install the necessary libraries (sacrebleu, rouge, stylometry) and adapt the code to fit your implementation in Axolotl or similar.


AI Writing Model Comparison

In this experiment, I explored the capabilities and differences between various AI models in generating a 1,500-word text based on a detailed prompt. I tested models from https://chat.lmsys.org/, ChatGPT-4, Claude 3 Opus, and some local models in LM Studio. Each model generated the text three times so I could observe the variability in its outputs. I also created a separate prompt for evaluating the writing of the first iteration from each model and asked ChatGPT-4 and Claude 3 Opus to provide feedback.

Through this process, I observed that some models exhibit higher variability between runs, while others tend to reuse similar wording. There were also significant differences in the number of words generated and in the amount of dialogue, description, and paragraphs each model produced. The evaluation feedback revealed that ChatGPT suggests more "refined" prose, while Claude recommends less purple prose. Based on these findings, I compiled a list of takeaways to incorporate into the next prompt, focusing on precision, varied sentence structures, strong verbs, unique twists on fantasy motifs, consistent tone, a distinct narrator voice, and engaging pacing. Another technique to consider is asking for feedback and then rewriting the text based on that feedback.

I'm open to collaborating with others to further fine-tune prompts for each model and explore their capabilities in creative writing tasks.

Prompting Small LLMs

  • Direct Instructions:
    • Use clean, specific, and direct commands.
    • Avoid verbosity and unnecessary phrases.
  • Adjective Management:
    • Be cautious with adjectives; they may influence the model's response inappropriately.
  • Delimiters and Markdown:
    • Use backticks, brackets, or markdown to separate distinct parts of the text.
    • Markdown helps structure and segregate sections effectively.
  • Structured Formats:
    • Utilize JSON, markdown, HTML, etc., for input and output.
    • Constrain output using a JSON schema when necessary (see the example prompt after this list).
  • Few-shot Examples:
    • Provide few-shot examples from various niches to avoid overfitting.
    • Use these examples to "teach" the model steps in a process.
  • Chain-of-Thought:
    • Implement chain-of-thought prompts to improve reasoning and procedural understanding.
    • Break down tasks into steps and guide the model through them.
  • Description Before Completion:
    • Prompt the model to describe entities before answering.
    • Ensure that description doesn’t bleed into completion unintentionally.
  • Context Management:
    • Provide essential context only, avoid unstructured paragraph dumps.
    • Direct the model towards the desired answer with sufficient but concise context.
  • Testing and Verification:
    • Test prompts multiple times to catch unexpected outputs.
    • Use completion ranking for relevance, clarity, and coherence.
  • Use Stories:
    • Control output with storytelling techniques.
    • For example, write a narrative that includes the desired output format.
  • GBNF Grammars:
    • Explore GBNF grammars to constrain and control model output.
  • Read and Refine:
    • Review and refine generated prompts to remove unnecessary phrases and ensure clarity.
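To make several of these points concrete, here is an illustrative prompt template (the wording and schema are invented, not taken from this repository) that combines direct instructions, backtick delimiters, and a JSON-constrained output:

# Hypothetical prompt template applying the tips above.
PROMPT_TEMPLATE = """You are a fiction editor.

Task: Rewrite the passage between the backticks in the style described below.
Style: terse, concrete, no more than two adjectives per noun.

Passage:
```{passage}```

Respond ONLY with JSON matching this schema:
{{"rewritten": "<string>", "changes": ["<string>", ...]}}
"""

def build_prompt(passage: str) -> str:
    """Fill the template with the passage to rewrite."""
    return PROMPT_TEMPLATE.format(passage=passage)

print(build_prompt("The old, ancient, dusty castle loomed over the tiny village."))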

Prompting Llama 3 8b

Models have inherent formatting biases. Some models prefer hyphens for lists, others asterisks. When using these models, it's helpful to mirror their preferences for consistent outputs.

Key Points for Llama 3 Prompting:

  • Formatting Tendencies:

    • Llama 3 prefers lists with bolded headings and asterisks.

    • Example of the preferred structure (as raw markdown):

      **Bolded Title Case Heading**

      * List items with asterisks after two newlines
      * List items separated by one newline

      **Next List**

      * More list items
      * Etc...

  • Few-shot Examples:

    • Llama 3 follows both system prompts and few-shot examples.
    • It is flexible with prompting methods but may quote few-shot examples verbatim.
  • System Prompt Adherence:

    • Llama 3 responds well to system prompts with detailed instructions.
    • Combining system prompts and few-shot examples yields better results.
  • Context Window:

    • The current context window is small, limiting the use of extensive few-shot examples.
    • This may be addressed in future updates.
  • Censorship:

    • The instruct version has some censorship but is less restricted than previous versions.
  • Intelligence:

    • Performs well in zero-shot chain-of-thought reasoning.
    • Capable of understanding and adapting to varied inputs.
  • Consistency:

    • Generally consistent but may directly quote examples.
    • Performance can degrade with higher temperatures.

Usage Recommendations:

  • Lists and Formatting:

    • Use the preferred list format for better accuracy.
    • Explicitly instruct Llama 3 on desired output formats if different from its default.
  • Chat Settings:

    • Suitable for tasks requiring intelligence and instruction following.
    • Limited by context window for large tasks.
  • Pipeline Settings:

    • Effective for GPT-4 style pipelines using system prompts.
    • Context window limitations restrict some tasks.

Llama 3 is flexible and intelligent but has context and quoting limitations. Adjust prompting methods accordingly.

Contributing

All comments are welcome. Open an issue or send a pull request if you find any bugs or have recommendations for improvement.

License

This project is licensed under the Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International (CC BY-NC-ND 4.0) license. See: https://creativecommons.org/licenses/by-nc-nd/4.0/deed.en