
Intro to LaraChain


You can watch an introductory video here. This project was inspired by https://docs.langchain.com/docs/. As a long-time PHP/Laravel developer, I was concerned about the place of PHP and Laravel in this new age of AI, LLMs, and Python.

After reading much of the documentation for LangChain, I realized that Laravel, as a framework, already has a solid foundation to create this type of system for PHP developers.

Looking at their opening statement:

LangChain is a framework for developing applications powered by language models. We believe that the most powerful and differentiated applications will not only call out to a language model via an api, but will also:

  • Be data-aware: connect a language model to other sources of data
  • Be agentic: Allow a language model to interact with its environment

As such, the LangChain framework is designed with the objective in mind to enable those types of applications.

I would like LaraChain to be a framework that grows into this as well. By creating a pluggable foundation for taking in data from different sources, passing them through various transformers, and then easily creating outbound resources, we can enable companies of any size to get the most out of their data and benefit from the LLM/AI breakthrough.

Tasks

https://github.com/users/alnutile/projects/2

Setting up

It is a pretty standard Laravel app; Sail is needed only to run Postgres.

Using LaraChain

Just fork the repo and merge as work gets added. The project will strive not to interfere with your additions, especially as it matures.

Contributing

Make a PR, and make sure all the tests and linting pass.

Terminology

Model

For the most part this will refer to Eloquent models.

Team

Using Laravel Jetstream as the foundation, LaraChain can allow application builders to segment users if needed or disable it if not. It also means numerous features are in place to quickly build out an interface.

Project

This is a Model that will be the top level for all imported data. For example, you can have a Project named "Historical Collection," then, using Sources, import all the data and make it searchable under that project. When a user is searching within a different Project, they will not see that data.
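
As a rough sketch of what this top-level scoping might look like in Eloquent (the relationship and fillable fields here are illustrative assumptions, not necessarily LaraChain's actual model):

```php
<?php

namespace App\Models;

use Illuminate\Database\Eloquent\Model;
use Illuminate\Database\Eloquent\Relations\HasMany;

class Project extends Model
{
    protected $fillable = ['name', 'team_id'];

    // All imported data hangs off a Project through its Sources,
    // so scoping a search to one Project hides every other Project's data.
    public function sources(): HasMany
    {
        return $this->hasMany(Source::class);
    }
}
```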

Source

This will be what gets attached to a Project. A Project can have one or more Sources. A Source follows a pluggable pattern: as with all the top-level features, LaraChain will ship with some built in, and you can install others as needed with Composer, or add your own custom ones. I will go into each one in its own section, but here is an example:

Source: Web PDF

This Source, when used, will download a PDF file from the web. It will have some settings in the Create process, like a URL or auth info. Some Sources might not have a setup step – just add and go. You will see more of that in the Transformers area.

A Source belongs to a Project.
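
Here is a hypothetical sketch of that pluggable pattern for the Web PDF Source; the class, the meta_data column, and the handle() signature are illustrative guesses, not LaraChain's actual API:

```php
<?php

namespace App\Sources;

use App\Models\Document;
use App\Models\Source;
use Illuminate\Support\Facades\Http;
use Illuminate\Support\Facades\Storage;

class WebPdfSource
{
    // Download the PDF from the configured URL and record it as a Document.
    public function handle(Source $source): Document
    {
        $url = $source->meta_data['url'];

        $response = Http::get($url);

        $fileName = basename($url);
        Storage::put($fileName, $response->body());

        // Only the file name is stored here; parsing the content
        // is left to the Transformers (see below).
        return Document::create([
            'source_id' => $source->id,
            'file_path' => $fileName,
        ]);
    }
}
```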

Document

This is what a Source saves the data to – a Document. It will reference the Source and, therefore, the Project. In the above example, the Web PDF Source downloads a PDF and saves only the filename to the Document, not the content. More on that in Transformers.

Document Chunk

This is the result of breaking a Document's data into smaller chunks. Continuing the above example: when the Web PDF Source downloads the PDF and the Transformer kicks in to parse it, the Transformer will save the pages (or smaller chunks) into the document_chunks table, with a reference to the PDF page and the Document.
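
A rough sketch of what such a document_chunks migration might look like, assuming the pgvector extension and OpenAI's 1536-dimension embeddings (the column names follow the description in this page but are otherwise my guess):

```php
<?php

use Illuminate\Database\Migrations\Migration;
use Illuminate\Database\Schema\Blueprint;
use Illuminate\Support\Facades\DB;
use Illuminate\Support\Facades\Schema;

return new class extends Migration
{
    public function up(): void
    {
        // Requires the pgvector extension to be available in Postgres.
        DB::statement('CREATE EXTENSION IF NOT EXISTS vector');

        Schema::create('document_chunks', function (Blueprint $table) {
            $table->id();
            $table->foreignId('document_id')->constrained();
            $table->text('content');
            $table->integer('sort_order')->default(0); // e.g. the PDF page number
            $table->integer('token_count')->nullable();
            $table->timestamps();
        });

        // Laravel has no native vector column type, so add it with raw SQL.
        // 1536 matches OpenAI's ada-002 embedding size.
        DB::statement('ALTER TABLE document_chunks ADD COLUMN embedding vector(1536)');
    }
};
```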

Transformer

Now that we have a Source or Sources and have run them, what do we want to do with the data? This is where Transformers come in. Just like Sources, you can easily add them. Some might have settings, and some might simply let you add them, like the "Embedding Transformer."
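
One way the pluggable contract could look (this interface is an assumption for illustration, not LaraChain's published API):

```php
<?php

namespace App\Transformers;

use App\Models\Document;

interface TransformerContract
{
    // Each Transformer takes a Document and enriches it in some way:
    // parsing pages into chunks, embedding chunks, and so on.
    public function transform(Document $document): Document;
}
```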


In this WebPDF example, here's how they work:

PDF Parser

LaraChain can have numerous parsers like JSON, CSV, etc. This one will iterate over all the PDF pages and place the content into Document Chunks.
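
As a sketch, here is how that page-by-page parsing could be done with the smalot/pdfparser package – one possible implementation, not necessarily the one LaraChain uses:

```php
<?php

use App\Models\Document;
use App\Models\DocumentChunk;
use Smalot\PdfParser\Parser;

function parsePdfIntoChunks(Document $document): void
{
    $pdf = (new Parser())->parseFile(
        storage_path('app/'.$document->file_path)
    );

    // One Document Chunk per PDF page, keeping a reference to the page.
    foreach ($pdf->getPages() as $pageNumber => $page) {
        DocumentChunk::create([
            'document_id' => $document->id,
            'content'     => $page->getText(),
            'sort_order'  => $pageNumber + 1,
        ]);
    }
}
```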

Embed Transformer

This will use the LLM of your choice. It will vectorize the data, which will be saved to the document_chunks.embedding field in the database with a token_count.

Token Count will come into play later to track cost and usage.
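
A sketch of that embedding step using the openai-php/client package; the config key and the fields being updated follow the description above but are otherwise assumptions:

```php
<?php

use App\Models\DocumentChunk;

function embedChunk(DocumentChunk $chunk): void
{
    $client = OpenAI::client(config('services.openai.key'));

    $response = $client->embeddings()->create([
        'model' => 'text-embedding-ada-002',
        'input' => $chunk->content,
    ]);

    $chunk->update([
        // pgvector accepts the "[0.1,0.2,...]" literal, which json_encode produces.
        'embedding'   => json_encode($response->embeddings[0]->embedding),
        // Recorded so cost and usage can be tracked later.
        'token_count' => $response->usage->totalTokens,
    ]);
}
```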

So now we have all the pages in document_chunks. How do we offer them back out to the user?

Thanks to Horizon, we can parse a lot of data in the background.
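
For example, each chunk can be fanned out to its own queued job, which is where Horizon earns its keep; the job class, relationship, and queue name below are hypothetical:

```php
<?php

use App\Jobs\EmbedDocumentChunkJob;
use App\Models\Document;

function queueEmbeddings(Document $document): void
{
    foreach ($document->chunks as $chunk) {
        // Each chunk becomes its own background job, so hundreds of PDF
        // pages can be embedded in parallel without blocking a request.
        EmbedDocumentChunkJob::dispatch($chunk)->onQueue('transformers');
    }
}
```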

Outbound

This is how LaraChain offers the data back to the user, or to systems that want the results. In this case, there is a "Playground" in the Project area itself. Right now, there is ONE working pluggable Outbound, ChatUI, which is what the Project page uses to integrate with, in this case, OpenAI Chat (with Google Chat and others coming soon). The API Outbound will be available soon, and more options will be added after that.

Let's see how this works (the video shows it being used, by the way). Once you create an Outbound, you attach "Response Types" to it. These are how we do something with the incoming request. Let's list them out in this example:


API Outbound type

This lays the foundation for creating an auth-based API. You can learn more about it here. As with the ChatUI Outbound, you can add any of the response types needed to filter and process the data.

ChatUI Outbound

This lays the foundation for creating a chat UI, giving you an interface to chat with an LLM. You can see that in action here.

Embed Question

We need to use the LLM to make an embedding from the question. The embedding is saved to the Message Model (more on this shortly) and passed to the next Response Type.

Vector Search

This will take the embedding from the step above and search the database for related data. This uses the power (🦄) of vector databases, in this case a PostgreSQL plugin. The results are then passed on to the next Response Type.
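
A sketch of what that lookup might look like with pgvector's cosine-distance operator (<=>), using the table layout described earlier; the query shape is an assumption, not LaraChain's actual search code:

```php
<?php

use Illuminate\Support\Facades\DB;

function searchChunks(array $questionEmbedding, int $projectId, int $limit = 5): array
{
    // pgvector accepts the "[0.1,0.2,...]" literal that json_encode produces.
    $vector = json_encode($questionEmbedding);

    return DB::select(
        'SELECT dc.id, dc.content, dc.embedding <=> CAST(? AS vector) AS distance
           FROM document_chunks dc
           JOIN documents d ON d.id = dc.document_id
           JOIN sources s ON s.id = d.source_id
          WHERE s.project_id = ?
          ORDER BY distance
          LIMIT ?',
        [$vector, $projectId, $limit]
    );
}
```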

NOTE: The question from the user might be search-related or context-aware. One goal will be to learn how LangChain handles this and then add Response Types to help with it.

Trim Text

This is being worked on and is inspired by a similar package in Python. It will trim the response from the vector search so we have more room for tokens (more on this shortly).

Combine Text

This will take all the results and combine them down to fit into a specified size, saving room for the LLM response (token limitations are a big part of this).
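
A back-of-the-envelope sketch of that combine step: keep appending chunk text until an approximate token budget is hit, leaving headroom for the LLM's reply. The four-characters-per-token ratio is a common rough estimate, not an exact count:

```php
<?php

// $chunks is assumed to be a list of DocumentChunk models (or similar
// objects with a ->content property), ordered by relevance.
function combineText(array $chunks, int $maxTokens = 3000): string
{
    $combined = '';
    $usedTokens = 0;

    foreach ($chunks as $chunk) {
        // Rough token estimate: ~4 characters per token for English text.
        $chunkTokens = (int) ceil(strlen($chunk->content) / 4);

        if ($usedTokens + $chunkTokens > $maxTokens) {
            break; // budget exhausted; save the rest of the window for the reply
        }

        $combined .= $chunk->content."\n\n";
        $usedTokens += $chunkTokens;
    }

    return $combined;
}
```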

ResponseType

These are what an Outbound uses to process the request. So if you create an Outbound of type ChatUI (as seen above) or API, the request/question will go through these classes. See above for a list of them.

Scheduler Model

[COMING SOON] This will be a strategy for looking for new data.

Source Listeners Model

[COMING SOON] When a Source comes in, like a Webhook, you might want to trigger a Listener to react to it. For example, an API Listener would go get the results. A Listener is a Model that relates to a Source, since we already have an API Source.

Message Model

This is the "Memory" of our system, recording what a User is asking. It gives us the ability to link Questions back to Projects, so the user can see the project along with their questions and the answers. This is key for chatting with an LLM as well.
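
A guess at the shape of that Message model (the field names are illustrative, not LaraChain's actual schema):

```php
<?php

namespace App\Models;

use Illuminate\Database\Eloquent\Model;
use Illuminate\Database\Eloquent\Relations\BelongsTo;

class Message extends Model
{
    // role distinguishes the user's question from the LLM's answer;
    // embedding holds the question's vector from the Embed Question step.
    protected $fillable = ['project_id', 'role', 'content', 'embedding'];

    // Linking back to the Project lets users review their Q&A per project.
    public function project(): BelongsTo
    {
        return $this->belongsTo(Project::class);
    }
}
```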

Source Events

This will be an Event triggered with every incoming Source so Listeners can react. See "Source Listeners" above.

The Tech Stack

Laravel

This will be updated soon to Laravel 10. Right now, I default to Jetstream and Inertia.

Horizon

Key here for helping us scale: processing numerous PDF pages in the background, streaming results to the UI, etc.

Postgres

For now, LaraChain uses Postgres with the pgvector plugin to make vector searches possible.

Sail (just a touch of it)

This is only here to run Postgres with the vector plugin. I know Brew could work too, but on my M1 I had some issues. ¯\_(ツ)_/¯

Links