Does Ollama currently plan to support multiple acceleration frameworks? #4501

glide-the opened this issue May 17, 2024
glide-the commented May 17, 2024

Requirements

Does Ollama currently plan to support multiple acceleration frameworks?
We understand that Ollama currently relies on Llama.cpp for inference acceleration, which targets Llama-style model architectures; the GLM series makes some modifications to that architecture.

We are very keen to see the GLM ecosystem gain C++ inference capabilities. To this end, we have drafted the following design proposal and would like to ask whether Ollama plans to take this work forward.

Ollama Project Integration with ChatGLM and CogVLM

The Ollama project is built on the Llama.cpp acceleration framework and packages it into a one-click run experience. It leverages the inference and conversational capabilities of Llama.cpp, and layers a service-distribution and execution system on top: users pull quantized models from a remote registry and run them with a local client. The project currently supports Linux, macOS, and Windows, and the Llama.cpp inference code accelerates inference on mainstream hardware.

Objective

To use Ollama's service-distribution mechanism to ship models such as ChatGLM and CogVLM from the server side, and to support running them on multiple platforms (Linux/macOS/Windows).

Ollama Project Design Description

The Ollama framework relies on Go's CGO capabilities to build a local client runtime. The system compiles the Llama.cpp executables via CGO and exposes the HTTP server that Llama.cpp provides. By bridging Go and C through .h header files, it supports model quantization. On top of this, a command module receives user instructions and calls the HTTP service from Go to maintain model instances (a sample request follows the call-relationship diagram below). The Go side additionally includes code for model management and retrieval.
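To make the CGO bridge concrete, here is a minimal sketch of the pattern, with an inline C stub standing in for Llama.cpp's entry points; the `start_server` function and the model path are placeholders of ours, not Ollama's actual symbols (in the real project the cgo preamble would include the Llama.cpp headers instead):

```go
package main

/*
#include <stdlib.h>

// Inline C stub standing in for a llama.cpp entry point; the real cgo
// preamble would #include the llama.cpp headers instead.
static int start_server(const char* model_path) {
    // Placeholder for starting the llama.cpp HTTP server.
    return model_path == 0 ? -1 : 0;
}
*/
import "C"

import (
	"fmt"
	"unsafe"
)

func main() {
	// Hypothetical model path; in Ollama this would come from the model
	// management layer.
	path := C.CString("/models/llama-7b-q4_0.gguf")
	defer C.free(unsafe.Pointer(path))

	if rc := C.start_server(path); rc != 0 {
		fmt.Println("server failed to start")
		return
	}
	fmt.Println("server started")
}
```

The same bridging mechanism is what would let the command module drive a compiled chatglm.cpp executor from Go as well.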

  • Call Relationship Diagram
```mermaid
graph TD;
    A[cgo Layer] --> B[llama.cpp server];
    B --> C[httpserver Service];
    C --> D[.h Files];
    D --> E[Quantization Support];
    A --> F[Command Module];
    F --> C;
    F --> G[Go Execution];
    G --> H[Model Management];
    G --> I[Model Retrieval];
    G --> J[Task push];
    E --> C;

    subgraph Ollama Framework
        A;
        B;
        C;
        D;
        E;
        F;
        G;
        H;
        I;
    end

    K[Compilation Adaptation];
    K --> L[llama.cpp server];
    L --> M[Task Scheduling];
```
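As a follow-up to the "calls the HTTP service from Go" step above, here is a sketch of what a request from the Go command module to the published llama.cpp HTTP server could look like. The `/completion` endpoint and field names follow the llama.cpp server example, but the exact schema, host, and port are assumptions that should be checked against the version in use:

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
	"net/http"
)

// completionRequest mirrors the JSON body accepted by llama.cpp's
// /completion endpoint (schema assumed from the server example).
type completionRequest struct {
	Prompt   string `json:"prompt"`
	NPredict int    `json:"n_predict"`
}

func main() {
	body, err := json.Marshal(completionRequest{Prompt: "Hello", NPredict: 32})
	if err != nil {
		panic(err)
	}

	// Assumes a llama.cpp server is already running locally; the port is
	// illustrative.
	resp, err := http.Post("http://127.0.0.1:8080/completion",
		"application/json", bytes.NewReader(body))
	if err != nil {
		fmt.Println("request failed:", err)
		return
	}
	defer resp.Body.Close()
	fmt.Println("status:", resp.Status)
}
```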

Design Proposal

Based on the design in the diagram above, we investigated the ChatGLM.cpp repository, which provides quantization support on top of the GGML inference stack. On this basis, we can write a GLM server executor, perform some adaptation at the compilation layer, check the compatibility of the Llama.cpp and ChatGLM.cpp .h header files, and schedule tasks to the appropriate backend (a minimal dispatch sketch follows the diagram below).

```mermaid
graph TD;
    A[cgo Layer];
    A --> F[Command Module];
    F --> C[llama.cpp & chatglm.cpp header];
    C --> D[.h Files];
    D --> E[Quantization Support];
    F --> G[Go Execution];
    G --> H[Model Management];
    G --> I[Model Retrieval];
    G --> J[Task push];
    E --> C;

    subgraph Ollama Framework
        A;
        C;
        D;
        E;
        F;
        G;
        H;
        I;
    end

    K[Compilation Adaptation];
    K --> L[llama.cpp & chatglm.cpp server];
    L --> M[Task Scheduling];
```
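To illustrate the task-scheduling idea, here is a minimal Go sketch that dispatches a model to either a llama.cpp or chatglm.cpp executor based on its architecture tag. Every interface and type name here is hypothetical, chosen only to show the shape of the dispatch, not Ollama's actual design:

```go
package main

import (
	"errors"
	"fmt"
)

// Backend abstracts a compiled server executor (llama.cpp or chatglm.cpp).
// The names are illustrative, not Ollama's real interfaces.
type Backend interface {
	Name() string
	Run(modelPath string) error
}

type llamaBackend struct{}

func (llamaBackend) Name() string { return "llama.cpp" }
func (llamaBackend) Run(modelPath string) error {
	fmt.Println("llama.cpp server loading", modelPath)
	return nil
}

type glmBackend struct{}

func (glmBackend) Name() string { return "chatglm.cpp" }
func (glmBackend) Run(modelPath string) error {
	fmt.Println("chatglm.cpp server loading", modelPath)
	return nil
}

// schedule picks an executor from the model's architecture tag, which the
// scheduler could read from the quantized model's metadata at pull time.
func schedule(arch, modelPath string) error {
	backends := map[string]Backend{
		"llama":   llamaBackend{},
		"chatglm": glmBackend{},
	}
	b, ok := backends[arch]
	if !ok {
		return errors.New("no backend for architecture: " + arch)
	}
	return b.Run(modelPath)
}

func main() {
	// Hypothetical model file; a GLM model routes to the chatglm.cpp executor.
	_ = schedule("chatglm", "/models/chatglm3-6b-q4_0.bin")
}
```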

glide-the added the feature request label on May 17, 2024