⚡FastInference - The Ultra-Fast LLM Querying Manager (OpenAI, HuggingFace, Ollama, ...)

Query any LLM API and get responses fast with a robust, distributed library.
All LLM providers can be used with FastInference (OpenAI, Hugging Face, VertexAI, TogetherAI, Azure, etc.).

Features

  • High Performance: Get high inference speed thanks to intelligent asynchronous and distributed querying.
  • Robust Error Handling: Advanced mechanisms to handle exceptions, ensuring robust querying.
  • Ease of Use: A simplified API designed to work the same way across every LLM provider.
  • Scalability: Optimized for large datasets and high concurrency.

The workflow

Diagram of the workflow

Usage

pip install fastinference-llm
from fastinference import FastInference

prompt = """
            You will be provided with a tweet, and your task is to classify its sentiment as positive, neutral, or negative.
            
            Tweet: {tweet_content}
        """

api_key = "your-api-key"
model_name = "modelprovider/model_name"

results = FastInference(file_path="your-dataset-file-path", 
                        main_column="your-main-feature", 
                        prompt=prompt, 
                        api_key=api_key,
                        model_name=model_name, 
                        only_response=True).run()
print(results)

The Parameters

Here are the required parameters for initializing the FastInference object.

  • file_path (string): path to your dataset (csv, xlsx, json, parquet)
  • main_column (string): name of the main column (explained below in detail)
  • prompt (string): the prompt with the variable in it (explained below in detail)
  • api_key (string): your API key
  • model_name (string): has the format provider/model_name (for example "huggingface/meta-llama/Meta-Llama-3-70B")
  • only_response (bool): if True, you get a list containing the LLM responses as strings; otherwise, you get the full response objects normalized to the OpenAI API format

The Prompt

One of the parameters of the FastInference library is the prompt. The prompt must be a string. Between curly brackets, it contains the names of the columns from your dataset whose values you want inserted into the prompt.

Example Usage

To understand how to use the prompt parameter in the FastInference library, we'll provide an example based on a tweet sentiment classification task. Consider a dataset with the following structure:

tweet_content                                            | related_entities
"Just had the best day ever at the NeurIPS Conference!" | "NeurIPS"
"Traffic was terrible this morning in Paris."            | "Paris"
"Looking forward to the new Star Wars movie!"            | "Star Wars"


Here's how you could set up your prompt for classifying the sentiment of tweets based on their content and related entities:

prompt = """
          You will be provided with a tweet, and your task is to classify its sentiment as positive, neutral, or negative.
          You must consider the related identified entities in order to make a good decision.
          
          Tweet: {tweet_content}
          Related Entities: {related_entities}
          """

The main_column Parameter

The main_column parameter identifies the most important piece of information for inference. It is a string containing the name of the most important column in your data. It does not influence the LLM during inference, since the prompt does not create hierarchical relationships between the data columns.
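
For instance, with the tweet dataset shown above, a natural (hypothetical) choice is the column holding the tweet text:

# Hypothetical choice for the example tweet dataset: the column with the tweet text.
main_column = "tweet_content"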

Output format

If only_response is True, the library returns a list of strings containing the LLM responses.

Here is the structure of the return data if only_response=True:

["response 1", "response 2", ..., "response n"]

If only_response is False, it returns a list of Datablock items. Each Datablock contains: content (str), metadata (dict), content_with_prompt (a PromptTemplate object), and response (a ModelResponse following the OpenAI API format). You can retrieve the text generated by the model from the response's "choices" attribute.

Here is the structure of the return data if only_response=False:

[
        Datablock(content: str, content_with_prompt: PromptTemplate, metadata: dict, response: ModelResponse),
        ...
        Datablock(content: str, content_with_prompt: PromptTemplate, metadata: dict, response: ModelResponse)
]

only_response=False is the default and the advised setting. The Datablock item keeps track of the data correctly through the distribution steps, ensuring it stays reliable and consistent throughout the process.
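
As a sketch of how you might read the generated text from a Datablock when only_response=False: the response field is an OpenAI-style ModelResponse, so the text sits under its choices attribute. The exact attribute path below (choices[0].message.content) assumes the standard chat-completion layout and is not taken verbatim from the FastInference docs:

# results is the list returned by run() with only_response=False (the default).
for block in results:
    # block.response follows the OpenAI ModelResponse format; the generated text
    # is assumed to live at choices[0].message.content (chat-completion layout).
    text = block.response.choices[0].message.content
    print(block.content, "->", text)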

Supported Providers (Docs)

FastInference is built on the open-source LiteLLM library. All LLMs supported by LiteLLM are also supported by FastInference.

  • openai
  • azure
  • aws - sagemaker
  • aws - bedrock
  • google - vertex_ai [Gemini]
  • google - palm
  • google AI Studio - gemini
  • mistral ai api
  • cloudflare AI Workers
  • cohere
  • anthropic
  • huggingface
  • replicate
  • together_ai
  • openrouter
  • ai21
  • baseten
  • vllm
  • nlp_cloud
  • aleph alpha
  • petals
  • ollama
  • deepinfra
  • perplexity-ai
  • Groq AI
  • anyscale
  • IBM - watsonx.ai
  • voyage ai
  • xinference [Xorbits Inference]
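
Since FastInference delegates querying to LiteLLM, the model_name string follows the provider/model_name convention described above. The identifiers below are illustrative examples, not an exhaustive or guaranteed list; check the LiteLLM documentation for the exact model names each provider accepts:

# Illustrative model_name values (provider/model_name); exact identifiers vary by provider.
model_name_hf = "huggingface/meta-llama/Meta-Llama-3-70B"               # Hugging Face
model_name_ollama = "ollama/llama2"                                      # local Ollama server
model_name_together = "together_ai/togethercomputer/llama-2-70b-chat"    # Together AI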

Contributing

To contribute: Clone the repo locally -> Make a change -> Submit a PR with the change.

Here's how to modify the repo locally. First, clone the repo:

git clone https://github.com/blefo/FastInference.git

Make your changes, then submit a PR! 🚀 Push your fork to your GitHub repo and open the PR from there.

Some ideas for contributions:

  • Add a new method for data loading
  • Load the API key and model information via environment variables
  • Optimize the DataBlock structure
  • Leverage LiteLLM's API and key rotation feature to avoid exceptions
