⚡FastInference - The Ultra-Fast LLM Querying Manager (OpenAI, HuggingFace, Ollama, ...)

Query any LLM API and get responses fast with a robust, distributed library.
All LLM providers can be used with FastInference (OpenAI, Hugging Face, VertexAI, TogetherAI, Azure, etc.).

Features

  • High Performance: Get high inference speed thanks to intelligent asynchronous and distributed querying.
  • Robust Error Handling: Advanced mechanisms to handle exceptions, ensuring robust querying.
  • Ease of Use: A simplified API designed to work the same way across every LLM provider.
  • Scalability: Optimized for large datasets and high concurrency.

The workflow

Diagram of the workflow

Usage

pip install fastinference-llm
from fastinference import FastInference

prompt = """
            You will be provided with a tweet, and your task is to classify its sentiment as positive, neutral, or negative.
            
            Tweet: {tweet_content}
        """

api_key = "your-api-key"
model_name = "modelprovider/model_name"

results = FastInference(file_path="your-dataset-file-path", 
                        main_column="your-main-feature", 
                        prompt=prompt, 
                        api_key=api_key,
                        model_name=model_name, 
                        only_response=True).run()
print(results)

The Parameters

Here are the required parameters for initializing the FastInference object.

  • file_path (string): path to your dataset (csv, xlsx, json, parquet)
  • main_column (string): name of the main column (explained below in detail)
  • prompt (string): the prompt with the variable in it (explained below in detail)
  • api_key (string): your API key
  • model_name (string): has the format provider/model_name (for example "huggingface/meta-llama/Meta-Llama-3-70B")
  • only_response (bool): if True, you get a list containing the LLM responses as strings; otherwise, you get the full response objects normalized to the OpenAI API format

The Prompt

One of the parameters of the FastInference library is the prompt. The prompt must be a string. Between curly brackets, it contains the names of the columns from your dataset whose values you want inserted into the prompt.

Example Usage

To understand how to use the prompt parameter in the FastInference library, we'll provide an example based on a tweet sentiment classification task. Consider a dataset with the following structure:

tweet_content                                            | related_entities
"Just had the best day ever at the NeurIPS Conference!" | "NeurIPS"
"Traffic was terrible this morning in Paris."            | "Paris"
"Looking forward to the new Star Wars movie!"            | "Star Wars"


Here's how you could set up your prompt for classifying the sentiment of tweets based on their content and related entities:

prompt = """
          You will be provided with a tweet, and your task is to classify its sentiment as positive, neutral, or negative.
          You must consider the related identified entities in order to make a good decision.
          
          Tweet: {tweet_content}
          Related Entities: {related_entities}
          """

The main_column Parameter

The main_column parameter identifies the most important piece of information for inference. It is a string containing the name of the most important column in your data. It does not influence the LLM during inference, since the prompt does not create hierarchical relationships between the data columns.
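
For instance, with the tweet dataset shown above, a natural (hypothetical) choice is the column holding the tweet text:

# Hypothetical choice for the example tweet dataset: the column with the tweet text.
main_column = "tweet_content"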

Output format

If only_response is True, the library returns a list of strings containing the LLM responses.

Here is the structure of the return data if only_response=True:

["response 1", "response 2", ..., "response n"]

If only_response is False, it returns a list of Datablock items. Each Datablock contains: content (str), metadata (dict), content_with_prompt (a PromptTemplate object), and response (a ModelResponse following the OpenAI API format). You can retrieve the text generated by the model from the response's "choices" attribute.

Here is the structure of the return data if only_response=False:

[
        Datablock(content: str, content_with_prompt: PromptTemplate, metadata: dict, response: ModelResponse),
        ...
        Datablock(content: str, content_with_prompt: PromptTemplate, metadata: dict, response: ModelResponse)
]

only_response=False is the default and the advised setting. The Datablock item keeps track of the data correctly through the distribution steps, ensuring it stays reliable and consistent throughout the process.
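
As a sketch of how you might read the generated text from a Datablock when only_response=False: the response field is an OpenAI-style ModelResponse, so the text sits under its choices attribute. The exact attribute path below (choices[0].message.content) assumes the standard chat-completion layout and is not taken verbatim from the FastInference docs:

# results is the list returned by run() with only_response=False (the default).
for block in results:
    # block.response follows the OpenAI ModelResponse format; the generated text
    # is assumed to live at choices[0].message.content (chat-completion layout).
    text = block.response.choices[0].message.content
    print(block.content, "->", text)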

Supported Providers (Docs)

FastInference is built on the open-source LiteLLM library. All LLMs supported by LiteLLM are also supported by FastInference.

  • openai
  • azure
  • aws - sagemaker
  • aws - bedrock
  • google - vertex_ai [Gemini]
  • google - palm
  • google AI Studio - gemini
  • mistral ai api
  • cloudflare AI Workers
  • cohere
  • anthropic
  • huggingface
  • replicate
  • together_ai
  • openrouter
  • ai21
  • baseten
  • vllm
  • nlp_cloud
  • aleph alpha
  • petals
  • ollama
  • deepinfra
  • perplexity-ai
  • Groq AI
  • anyscale
  • IBM - watsonx.ai
  • voyage ai
  • xinference [Xorbits Inference]
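
Since FastInference delegates querying to LiteLLM, the model_name string follows the provider/model_name convention described above. The identifiers below are illustrative examples, not an exhaustive or guaranteed list; check the LiteLLM documentation for the exact model names each provider accepts:

# Illustrative model_name values (provider/model_name); exact identifiers vary by provider.
model_name_hf = "huggingface/meta-llama/Meta-Llama-3-70B"               # Hugging Face
model_name_ollama = "ollama/llama2"                                      # local Ollama server
model_name_together = "together_ai/togethercomputer/llama-2-70b-chat"    # Together AI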

Contributing

To contribute: Clone the repo locally -> Make a change -> Submit a PR with the change.

Here's how to modify the repo locally. First, clone the repo:

git clone https://github.com/blefo/FastInference.git

Make your changes, then submit a PR! 🚀 Push your fork to your GitHub repo and open the PR from there.

Some ideas for contributions:

  • Add a new method for data loading
  • Load the API key and model information via environment variables
  • Optimize the DataBlock structure
  • Leverage LiteLLM's API and key rotation feature to avoid exceptions
