Does Ollama currently plan to support multiple acceleration frameworks?
We understand that Ollama currently relies on llama.cpp for inference acceleration, which supports only Llama-architecture models. The GLM family makes some modifications to that architecture, so its models cannot run as-is.
We are very keen on seeing the GLM ecosystem supported by C++ inference capabilities. To this end, we have developed the following design proposal and would like to ask whether Ollama has plans to advance this work.
Ollama Project Integration with ChatGLM and CogVLM
The Ollama project is built on the llama.cpp acceleration framework and packages it as a one-click run tool. It leverages llama.cpp's inference and conversational capabilities and, on top of them, adds a service-distribution and execution layer: users pull quantized models from a remote registry and run them with a local client. The project currently supports Linux, macOS, and Windows, and the llama.cpp inference code accelerates inference on mainstream hardware.
Objective
To use Ollama's service-distribution mechanism to distribute models such as ChatGLM and CogVLM from the server side, supporting execution on multiple platforms (Linux/macOS/Windows).
Ollama Project Design Description
The Ollama framework relies on Go's cgo feature to build a local client runtime. It compiles the llama.cpp executables via cgo and exposes the HTTP server that llama.cpp provides. Go and C are connected through .h header files, which also enables model-quantization support. On top of this, a command module receives user commands and invokes the HTTP service from Go to maintain model instances. The Go layer additionally contains code for model management and retrieval.
Call Relationship Diagram
```mermaid
graph TD;
  A[cgo Layer] --> B[llama.cpp server];
  B --> C[httpserver Service];
  C --> D[.h Files];
  D --> E[Quantization Support];
  A --> F[Command Module];
  F --> C;
  F --> G[Go Execution];
  G --> H[Model Management];
  G --> I[Model Retrieval];
  G --> J[Task Push];
  E --> C;
  subgraph Ollama Framework
    A;
    B;
    C;
    D;
    E;
    F;
    G;
    H;
    I;
  end
  K[Compilation Adaptation];
  K --> L[llama.cpp server];
  L --> M[Task Scheduling];
```
Design Proposal
Based on the design shown above, we investigated the ChatGLM.cpp repository, which provides quantization support on top of the GGML inference stack. On this basis, we can write a GLM server executor, perform some adaptation at the compilation layer, check the compatibility of the llama.cpp and ChatGLM.cpp .h headers, and schedule the corresponding task allocation.
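A minimal sketch of the task-scheduling idea at the Go layer: dispatch each model to the server executor compiled for its architecture. The architecture strings and binary names below are illustrative assumptions, not actual Ollama registry metadata.

```go
package main

import (
	"fmt"
	"strings"
)

// serverBinary picks which compiled server executor should host a model,
// based on its architecture string. Names here are hypothetical.
func serverBinary(arch string) string {
	switch {
	case strings.HasPrefix(arch, "chatglm"), strings.HasPrefix(arch, "glm"):
		return "chatglm-server" // built from ChatGLM.cpp
	default:
		return "llama-server" // built from llama.cpp
	}
}

func main() {
	for _, arch := range []string{"llama", "chatglm2", "glm4"} {
		fmt.Printf("%s -> %s\n", arch, serverBinary(arch))
	}
}
```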
```mermaid
graph TD;
  A[cgo Layer];
  A --> F[Command Module];
  F --> C[llama.cpp & chatglm.cpp header];
  C --> D[.h Files];
  D --> E[Quantization Support];
  F --> G[Go Execution];
  G --> H[Model Management];
  G --> I[Model Retrieval];
  G --> J[Task Push];
  E --> C;
  subgraph Ollama Framework
    A;
    C;
    D;
    E;
    F;
    G;
    H;
    I;
  end
  K[Compilation Adaptation];
  K --> L[llama.cpp & chatglm.cpp server];
  L --> M[Task Scheduling];
```