forked from containers/podman-desktop-extension-ai-lab
ai.json
{
"recipes": [
{
"id": "chatbot",
"description" : "This is a Streamlit chat demo application.",
"name" : "ChatBot",
"repository": "https://github.com/containers/ai-lab-recipes",
"ref": "96555a1a8dd517b499933b66ba09ac4a248a0bb6",
"icon": "natural-language-processing",
"categories": [
"natural-language-processing"
],
"basedir": "recipes/natural_language_processing/chatbot",
"readme": "# Chat Application\n\nThis model service is intended be used as the basis for a chat application. It is capable of having arbitrarily long conversations\nwith users and retains a history of the conversation until it reaches the maximum context length of the model.\nAt that point, the service will remove the earliest portions of the conversation from its memory.\n\nTo use this model service, please follow the steps below:\n\n* [Download Model](#download-models)\n* [Build Image](#build-the-image)\n* [Run Image](#run-the-image)\n* [Interact with Service](#interact-with-the-app)\n* [Deploy on Openshift](#deploy-on-openshift)\n\n## Build and Deploy Locally\n\n### Download model(s)\n\nThe two models that we have tested and recommend for this example are Llama2 and Mistral. The locations of the GGUF variants\nare listed below:\n\n* Llama2 - https://huggingface.co/TheBloke/Llama-2-7B-Chat-GGUF/tree/main\n* Mistral - https://huggingface.co/TheBloke/Mistral-7B-Instruct-v0.1-GGUF/tree/main\n\n_For a full list of supported model variants, please see the \"Supported models\" section of the\n[llama.cpp repository](https://github.com/ggerganov/llama.cpp?tab=readme-ov-file#description)._\n\nThis example assumes that the developer already has a copy of the model that they would like to use downloaded onto their host machine and located in the `/models` directory of this repo. \n\nThis can be accomplished with:\n\n```bash\ncd models\nwget https://huggingface.co/TheBloke/Llama-2-7B-Chat-GGUF/resolve/main/llama-2-7b-chat.Q5_K_S.gguf\ncd ../\n```\n\n## Deploy from Local Container\n\n### Build the image\n\nBuild the `model-service` image.\n\n```bash\ncd chatbot/model_services\npodman build -t chatbot:service -f base/Containerfile .\n```\n\nAfter the image is created it should be run with the model mounted as volume, as shown below.\nThis prevents large model files from being loaded into the container image which can cause a significant slowdown\nwhen transporting the images. 
If it is required that a model-service image contains the model,\nthe Containerfiles can be modified to copy the model into the image.\n\nWith the model-service image, in addition to a volume-mounted model file, an environment variable, $MODEL_PATH,\nshould be set at runtime. If not set, the default location where the service expects a model is \n`/locallm/models/llama-2-7b-chat.Q5_K_S.gguf` inside the running container. This file can be downloaded from the URL\n`https://huggingface.co/TheBloke/Llama-2-7B-Chat-GGUF/resolve/main/llama-2-7b-chat.Q5_K_S.gguf`.\n\n### Run the image\n\nOnce the model service image is built, it can be run with the following command,\nassuming that we want to mount the model `llama-2-7b-chat.Q5_K_S.gguf`:\n\n```bash\nexport MODEL_FILE=llama-2-7b-chat.Q5_K_S.gguf\npodman run --rm -d -it \\\n    -v /local/path/to/$MODEL_FILE:/locallm/models/$MODEL_FILE:Z \\\n    --env MODEL_PATH=/locallm/models/$MODEL_FILE \\\n    -p 7860:7860 \\\n    chatbot:service\n```\n\n### Interact with the app\n\nYou can now interact with the service by going to `0.0.0.0:7860` in your browser.\n\n![](https://raw.githubusercontent.com/containers/ai-lab-recipes/main/assets/app.png)\n\n\nYou can also use the example [chatbot/ai_applications/ask.py](ask.py) to interact with the model-service in a terminal.\nIf the `--prompt` argument is left blank, it will default to \"Hello\".\n\n```bash\ncd chatbot/ai_applications\n\npython ask.py --prompt <YOUR-PROMPT>\n```\n\nOr, you can build `ask.py` into a container image and run it alongside the model-service container, like so:\n\n```bash\ncd chatbot/ai_applications\npodman build -t chatbot -f builds/Containerfile .\npodman run --rm -d -it -p 8080:8080 chatbot # then interact with the application at 0.0.0.0:8080 in your browser\n```\n\n## Deploy on OpenShift\n\nNow that we've developed an application locally that leverages an LLM, we'll want to share it with a wider audience.\nLet's get it off our machine and run it on OpenShift.\n\n### 
Rebuild for x86\n\nIf you are on a Mac, you'll need to rebuild the model-service image for the x86 architecture for most use cases outside of Mac.\nSince this is an AI workload, you may also want to take advantage of Nvidia GPUs available outside your local machine.\nIf so, build the model-service with a base image that contains CUDA and builds llama.cpp specifically for a CUDA environment.\n\n```bash\ncd chatbot/model_services\npodman build --platform linux/amd64 -t chatbot:service-cuda -f cuda/Containerfile .\n```\n\nThe CUDA environment significantly increases the size of the container image.\nIf you are not utilizing a GPU to run this application, you can create an image\nwithout the CUDA layers for an x86 architecture machine with the following:\n\n```bash\ncd chatbot/model_services\npodman build --platform linux/amd64 -t chatbot:service-amd64 -f base/Containerfile .\n```\n\n### Push to Quay\n\nOnce you log in to [quay.io](https://quay.io), you can push your newly built version of this LLM application to your repository\nfor use by others.\n\n```bash\npodman login quay.io\n```\n\n```bash\npodman push localhost/chatbot:service-amd64 quay.io/<YOUR-QUAY_REPO>/<YOUR_IMAGE_NAME:TAG>\n```\n\n### Deploy\n\nNow that your model lives in a remote repository, we can deploy it.\nGo to your OpenShift developer dashboard and select \"+Add\" to use the OpenShift UI to deploy the application.\n\n![](https://raw.githubusercontent.com/containers/ai-lab-recipes/main/assets/add_image.png)\n\nSelect \"Container images\"\n\n![](https://raw.githubusercontent.com/containers/ai-lab-recipes/main/assets/container_images.png)\n\nThen fill out the form on the Deploy page with your [quay.io](https://quay.io) image name and make sure to set the \"Target port\" to 7860.\n\n![](https://raw.githubusercontent.com/containers/ai-lab-recipes/main/assets/deploy.png)\n\nHit \"Create\" at the bottom and watch your application start.\n\nOnce the pods are up and the application is working, navigate to the 
\"Routes\" section and click on the link created for you\nto interact with your app.\n\n![](https://raw.githubusercontent.com/containers/ai-lab-recipes/main/assets/app.png)",
"models": [
"hf.TheBloke.mistral-7b-instruct-v0.1.Q4_K_M",
"hf.NousResearch.Hermes-2-Pro-Mistral-7B.Q4_K_M",
"hf.ibm.merlinite-7b-Q4_K_M",
"hf.froggeric.Cerebrum-1.0-7b-Q4_KS",
"hf.TheBloke.openchat-3.5-0106.Q4_K_M",
"hf.TheBloke.mistral-7b-openorca.Q4_K_M",
"hf.MaziyarPanahi.phi-2.Q4_K_M",
"hf.llmware.dragon-mistral-7b-q4_k_m",
"hf.MaziyarPanahi.MixTAO-7Bx2-MoE-Instruct-v7.0.Q4_K_M"
]
},
{
"id": "summarizer",
"description" : "This is a Streamlit demo application for summarizing text.",
"name" : "Summarizer",
"repository": "https://github.com/containers/ai-lab-recipes",
"ref": "96555a1a8dd517b499933b66ba09ac4a248a0bb6",
"icon": "natural-language-processing",
"categories": [
"natural-language-processing"
],
"basedir": "recipes/natural_language_processing/summarizer",
"readme": "# Summarizer\n\nThis model service is intended be be used for text summarization tasks. This service can ingest an arbitrarily long text input. If the input length is less than the models maximum context window it will summarize the input directly. If the input is longer than the maximum context window, the input will be divided into appropriately sized chunks. Each chunk will be summarized and a final \"summary of summaries\" will be the services final output. ",
"models": [
"hf.TheBloke.mistral-7b-instruct-v0.1.Q4_K_M",
"hf.NousResearch.Hermes-2-Pro-Mistral-7B.Q4_K_M",
"hf.ibm.merlinite-7b-Q4_K_M",
"hf.froggeric.Cerebrum-1.0-7b-Q4_KS",
"hf.TheBloke.openchat-3.5-0106.Q4_K_M",
"hf.TheBloke.mistral-7b-openorca.Q4_K_M",
"hf.MaziyarPanahi.phi-2.Q4_K_M",
"hf.llmware.dragon-mistral-7b-q4_k_m",
"hf.MaziyarPanahi.MixTAO-7Bx2-MoE-Instruct-v7.0.Q4_K_M"
]
},
{
"id": "codegeneration",
"description" : "This is a code-generation demo application.",
"name" : "Code Generation",
"repository": "https://github.com/containers/ai-lab-recipes",
"ref": "96555a1a8dd517b499933b66ba09ac4a248a0bb6",
"icon": "generator",
"categories": [
"natural-language-processing"
],
"basedir": "recipes/natural_language_processing/codegen",
"readme": "# Code Generation\n\nThis example will deploy a local code-gen application using a llama.cpp model server and a python app built with langchain. \n\n### Download Model\n\n- **codellama**\n\n - Download URL: `wget https://huggingface.co/TheBloke/CodeLlama-7B-Instruct-GGUF/resolve/main/codellama-7b-instruct.Q4_K_M.gguf` \n\n```\n\ncd ../models\n\nwget <Download URL>\n\ncd ../\n\n```\n\n### Deploy Model Service\n\nTo start the model service, refer to [the playground model-service document](../playground/README.md). Deploy the LLM server and volumn mount the model of choice.\n\n```\n\npodman run --rm -it -d \\ \n\n -p 8001:8001 \\ \n\n -v Local/path/to/locallm/models:/locallm/models:ro,Z \\ \n\n -e MODEL_PATH=models/<model-filename> \\ \n\n -e HOST=0.0.0.0 \\ \n\n -e PORT=8001 \\ \n\n playground:image\n\n```\n\n### Build Container Image\n\nOnce the model service is deployed, then follow the instruction below to build your container image and run it locally. \n\n- `podman build -t codegen-app code-generation -f code-generation/builds/Containerfile`\n\n- `podman run -it -p 8501:8501 codegen-app -- -m http://10.88.0.1:8001/v1` ",
"models": [
"hf.TheBloke.mistral-7b-code-16k-qlora.Q4_K_M",
"hf.TheBloke.mistral-7b-codealpaca-lora.Q4_K_M",
"hf.TheBloke.mistral-7b-instruct-v0.1.Q4_K_M",
"hf.NousResearch.Hermes-2-Pro-Mistral-7B.Q4_K_M",
"hf.ibm.merlinite-7b-Q4_K_M",
"hf.froggeric.Cerebrum-1.0-7b-Q4_KS",
"hf.TheBloke.openchat-3.5-0106.Q4_K_M",
"hf.TheBloke.mistral-7b-openorca.Q4_K_M",
"hf.MaziyarPanahi.phi-2.Q4_K_M",
"hf.llmware.dragon-mistral-7b-q4_k_m",
"hf.MaziyarPanahi.MixTAO-7Bx2-MoE-Instruct-v7.0.Q4_K_M"
]
},
{
"id": "audio_to_text",
"description" : "This is an audio to text demo application.",
"name" : "Audio to Text",
"repository": "https://github.com/containers/ai-lab-recipes",
"ref": "96555a1a8dd517b499933b66ba09ac4a248a0bb6",
"icon": "generator",
"categories": [
"audio"
],
"basedir": "recipes/audio/audio_to_text",
"readme": "# Audio to Text Application\n\n This sample application is a simple recipe to transcribe an audio file.\n This provides a simple recipe to help developers start building out their own custom LLM enabled\n audio-to-text applications. It consists of two main components; the Model Service and the AI Application.\n\n There are a few options today for local Model Serving, but this recipe will use [`whisper-cpp`](https://github.com/ggerganov/whisper.cpp.git)\n and its included Model Service. There is a Containerfile provided that can be used to build this Model Service within the repo,\n [`model_servers/whispercpp/Containerfile`](/model_servers/whispercpp/Containerfile).\n\n Our AI Application will connect to our Model Service via it's API endpoint.\n\n<p align=\"center\">\n<img src=\"../../../assets/whisper.png\" width=\"70%\">\n</p>\n\n# Build the Application\n\nIn order to build this application we will need a model, a Model Service and an AI Application. \n\n* [Download a model](#download-a-model)\n* [Build the Model Service](#build-the-model-service)\n* [Deploy the Model Service](#deploy-the-model-service)\n* [Build the AI Application](#build-the-ai-application)\n* [Deploy the AI Application](#deploy-the-ai-application)\n* [Interact with the AI Application](#interact-with-the-ai-application)\n * [Input audio files](#input-audio-files)\n\n### Download a model\n\nIf you are just getting started, we recommend using [ggerganov/whisper.cpp](https://huggingface.co/ggerganov/whisper.cpp).\nThis is a well performant model with an MIT license.\nIt's simple to download a pre-converted whisper model from [huggingface.co](https://huggingface.co)\nhere: https://huggingface.co/ggerganov/whisper.cpp. There are a number of options, but we recommend to start with `ggml-small.bin`. 
\n\nThe recommended model can be downloaded using the code snippet below:\n\n```bash\ncd models\nwget https://huggingface.co/ggerganov/whisper.cpp/resolve/main/ggml-small.bin \ncd ../\n```\n\n_A full list of supported open models is forthcoming._ \n\n\n### Build the Model Service\n\nThe Model Service can be built from the root directory with the following code snippet:\n\n```bash\ncd model_servers/whispercpp\npodman build -t whispercppserver .\n```\n\n### Deploy the Model Service\n\nThe local Model Service relies on a volume mount to the localhost to access the model files. You can start your local Model Service using the following Podman command:\n```\npodman run --rm -it \\\n -p 8001:8001 \\\n -v Local/path/to/locallm/models:/locallm/models \\\n -e MODEL_PATH=models/<model-filename> \\\n -e HOST=0.0.0.0 \\\n -e PORT=8001 \\\n whispercppserver\n```\n\n### Build the AI Application\n\nNow that the Model Service is running we want to build and deploy our AI Application. Use the provided Containerfile to build the AI Application\nimage from the `audio-to-text/` directory.\n\n```bash\ncd audio-to-text\npodman build -t audio-to-text . 
-f builds/Containerfile \n```\n\n### Deploy the AI Application\n\nMake sure the Model Service is up and running before starting this container image.\nWhen starting the AI Application container image we need to direct it to the correct `MODEL_SERVICE_ENDPOINT`.\nThis could be any appropriately hosted Model Service (running locally or in the cloud) using a compatible API.\nThe following Podman command can be used to run your AI Application:\n\n```bash\npodman run --rm -it -p 8501:8501 -e MODEL_SERVICE_ENDPOINT=http://0.0.0.0:8001/inference audio-to-text \n```\n\n### Interact with the AI Application\n\nOnce the Streamlit application is up and running, you should be able to access it at `http://localhost:8501`.\nFrom here, you can upload audio files from your local machine and transcribe them.\n\nBy using this recipe and getting this starting point established,\nusers should now have an easier time customizing and building their own LLM-enabled applications. \n\n#### Input audio files\n\nWhisper.cpp requires 16-bit WAV audio files as input.\nTo convert your input audio files to 16-bit WAV format you can use `ffmpeg` like this:\n\n```bash\nffmpeg -i <input.mp3> -ar 16000 -ac 1 -c:a pcm_s16le <output.wav>\n```\n",
"models": [
"hf.ggerganov.whisper.cpp"
]
},
{
"id": "object_detection",
"description" : "This is an object detection demo application.",
"name" : "Object Detection",
"repository": "https://github.com/containers/ai-lab-recipes",
"ref": "96555a1a8dd517b499933b66ba09ac4a248a0bb6",
"icon": "generator",
"categories": [
"computer-vision"
],
"basedir": "recipes/computer_vision/object_detection",
"readme": "# Object Detection\n\nThis recipe provides an example for running an object detection model service and its associated client locally. \n\n## Build and run the model service\n\n```bash\ncd object_detection/model_server\npodman build -t object_detection_service .\n```\n\n```bash\npodman run -it --rm -p 8000:8000 object_detection_service\n```\n\nBy default the model service will use [`facebook/detr-resnet-101`](https://huggingface.co/facebook/detr-resnet-101), which has an apache-2.0 license. The model is relatively small, but it will be downloaded fresh each time the model server is started unless a local model is provided (see additional instructions below). \n\n\n## Use a different or local model\n\nIf you'd like to use a different model hosted on huggingface, simply use the environment variable `MODEL_PATH` and set it to the correct `org/model` path on [huggingface.co](https://huggingface.co/) when starting your container. \n\nIf you'd like to download models locally so that they are not pulled each time the container restarts, you can use the following python snippet to a model to your `models/` directory. \n\n```python\nfrom huggingface_hub import snapshot_download\n\nsnapshot_download(repo_id=\"facebook/detr-resnet-101\",\n revision=\"no_timm\",\n local_dir=\"<PATH_TO>/locallm/models/vision/object_detection/facebook/detr-resnet-101\",\n local_dir_use_symlinks=False)\n\n```\n\nWhen using a model other than the default, you will need to set the `MODEL_PATH` environment variable. 
Here is an example of running the model service with a local model:\n\n```bash\npodman run -it --rm -p 8000:8000 -v <PATH/TO>/locallm/models/vision/:/locallm/models -e MODEL_PATH=models/object_detection/facebook/detr-resnet-50/ object_detection_service\n```\n\n## Build and run the client application\n\n```bash\ncd object_detection/client\npodman build -t object_detection_client .\n```\n\n```bash\npodman run -p 8501:8501 -e MODEL_ENDPOINT=http://10.88.0.1:8000/detection object_detection_client\n```\n\nOnce the client is up and running, you should be able to access it at `http://localhost:8501`. From here you can upload images from your local machine and detect objects in the image as shown below. \n\n<p align=\"center\">\n<img src=\"../../../assets/object_detection.png\" width=\"70%\">\n</p>\n\n\n",
"models": [
"hf.facebook.detr-resnet-101"
]
}
],
"models": [
{
"id": "hf.TheBloke.mistral-7b-instruct-v0.1.Q4_K_M",
"name": "TheBloke/Mistral-7B-Instruct-v0.1-GGUF",
"description": "The Mistral-7B-Instruct-v0.1 Large Language Model (LLM) is a instruct fine-tuned version of the [Mistral-7B-v0.1](https://huggingface.co/mistralai/Mistral-7B-v0.1) generative text model using a variety of publicly available conversation datasets. For full details of this model please read our [release blog post](https://mistral.ai/news/announcing-mistral-7b/)",
"hw": "CPU",
"registry": "Hugging Face",
"license": "Apache-2.0",
"url": "https://huggingface.co/TheBloke/Mistral-7B-Instruct-v0.1-GGUF/resolve/main/mistral-7b-instruct-v0.1.Q4_K_M.gguf",
"memory": 4370129224
},
{
"id": "hf.NousResearch.Hermes-2-Pro-Mistral-7B.Q4_K_M",
"name": "NousResearch/Hermes-2-Pro-Mistral-7B-GGUF",
"description": "This is the GGUF version of the model, made for the llama.cpp inference engine.\n If you are looking for the transformers/fp16 model, it is available here: [https://huggingface.co/NousResearch/Hermes-2-Pro-Mistral-7B](https://huggingface.co/NousResearch/Hermes-2-Pro-Mistral-7B)\n Hermes 2 Pro on Mistral 7B is the new flagship 7B Hermes!\n Hermes 2 Pro is an upgraded, retrained version of Nous Hermes 2, consisting of an updated and cleaned version of the OpenHermes 2.5 Dataset, as well as a newly introduced Function Calling and JSON Mode dataset developed in-house.\n This new version of Hermes maintains its excellent general task and conversation capabilities - but also excels at Function Calling, JSON Structured Outputs, and has improved on several other metrics as well, scoring a 90% on our function calling evaluation built in partnership with Fireworks.AI, and an 81% on our structured JSON Output evaluation.\n Hermes Pro takes advantage of a special system prompt and multi-turn function calling structure with a new chatml role in order to make function calling reliable and easy to parse. Learn more about prompting below.\nThis work was a collaboration between Nous Research, @interstellarninja, and Fireworks.AI\n Learn more about the function calling on our github repo here: [https://github.com/NousResearch/Hermes-Function-Calling/tree/main](https://github.com/NousResearch/Hermes-Function-Calling/tree/main)",
"hw": "CPU",
"registry": "Hugging Face",
"license": "Apache-2.0",
"url": "https://huggingface.co/NousResearch/Hermes-2-Pro-Mistral-7B-GGUF/resolve/main/Hermes-2-Pro-Mistral-7B.Q4_K_M.gguf",
"memory": 4370129224
},
{
"id": "hf.ibm.merlinite-7b-Q4_K_M",
"name": "ibm/merlinite-7b-GGUF",
"description": "## Merlinite 7b - GGUF\n4-bit quantized version of [ibm/merlinite-7b](https://huggingface.co/ibm/merlinite-7b)",
"hw": "CPU",
"registry": "Hugging Face",
"license": "Apache-2.0",
"url": "https://huggingface.co/ibm/merlinite-7b-GGUF/resolve/main/merlinite-7b-Q4_K_M.gguf",
"memory": 4370129224
},
{
"id": "hf.TheBloke.mistral-7b-codealpaca-lora.Q4_K_M",
"name": "TheBloke/Mistral-7B-codealpaca-lora-GGUF",
"description": "## Mistral 7B CodeAlpaca Lora - GGUF\n- Model creator: [Kamil](https://huggingface.co/Nondzu)\n- Original model: [Mistral 7B CodeAlpaca Lora](https://huggingface.co/Nondzu/Mistral-7B-codealpaca-lora)\n### Description\nThis repo contains GGUF format model files for [Kamil's Mistral 7B CodeAlpaca Lora](https://huggingface.co/Nondzu/Mistral-7B-codealpaca-lora).\nThese files were quantised using hardware kindly provided by [Massed Compute](https://massedcompute.com/).",
"hw": "CPU",
"registry": "Hugging Face",
"license": "Apache-2.0",
"url": "https://huggingface.co/TheBloke/Mistral-7B-codealpaca-lora-GGUF/resolve/main/mistral-7b-codealpaca-lora.Q4_K_M.gguf",
"memory": 4370129224
},
{
"id": "hf.TheBloke.mistral-7b-code-16k-qlora.Q4_K_M",
"name": "TheBloke/Mistral-7B-Code-16K-qlora-GGUF",
"description": "## Mistral 7B Code 16K qLoRA - GGUF\n- Model creator: [Kamil](https://huggingface.co/Nondzu)\n- Original model: [Mistral 7B Code 16K qLoRA](https://huggingface.co/Nondzu/Mistral-7B-code-16k-qlora)\n## Description\nThis repo contains GGUF format model files for [Kamil's Mistral 7B Code 16K qLoRA](https://huggingface.co/Nondzu/Mistral-7B-code-16k-qlora).",
"hw": "CPU",
"registry": "Hugging Face",
"license": "Apache-2.0",
"url": "https://huggingface.co/TheBloke/Mistral-7B-Code-16K-qlora-GGUF/resolve/main/mistral-7b-code-16k-qlora.Q4_K_M.gguf",
"memory": 4370129224
},
{
"id": "hf.froggeric.Cerebrum-1.0-7b-Q4_KS",
"name": "froggeric/Cerebrum-1.0-7b-GGUF",
"description": "GGUF quantisations of [AetherResearch/Cerebrum-1.0-7b](https://huggingface.co/AetherResearch/Cerebrum-1.0-7b)\n## Introduction\nCerebrum 7b is a large language model (LLM) created specifically for reasoning tasks. It is based on the Mistral 7b model, fine-tuned on a small custom dataset of native chain of thought data and further improved with targeted RLHF (tRLHF), a novel technique for sample-efficient LLM alignment. Unlike numerous other recent fine-tuning approaches, our training pipeline includes under 5000 training prompts and even fewer labeled datapoints for tRLHF.\nNative chain of thought approach means that Cerebrum is trained to devise a tactical plan before tackling problems that require thinking. For brainstorming, knowledge intensive, and creative tasks Cerebrum will typically omit unnecessarily verbose considerations.\nZero-shot prompted Cerebrum significantly outperforms few-shot prompted Mistral 7b as well as much larger models (such as Llama 2 70b) on a range of tasks that require reasoning, including ARC Challenge, GSM8k, and Math.\nThis LLM model works a lot better than any other mistral mixtral models for agent data, tested on 14th March 2024.",
"hw": "CPU",
"registry": "Hugging Face",
"license": "Apache-2.0",
"url": "https://huggingface.co/froggeric/Cerebrum-1.0-7b-GGUF/resolve/main/Cerebrum-1.0-7b-Q4_KS.gguf",
"memory": 4144643441
},
{
"id": "hf.TheBloke.openchat-3.5-0106.Q4_K_M",
"name": "TheBloke/openchat-3.5-0106-GGUF",
"description": "## Openchat 3.5 0106 - GGUF\n- Model creator: [OpenChat](https://huggingface.co/openchat)\n- Original model: [Openchat 3.5 0106](https://huggingface.co/openchat/openchat-3.5-0106)\n## DescriptionThis repo contains GGUF format model files for [OpenChat's Openchat 3.5 0106](https://huggingface.co/openchat/openchat-3.5-0106).\nThese files were quantised using hardware kindly provided by [Massed Compute](https://massedcompute.com/).",
"hw": "CPU",
"registry": "Hugging Face",
"license": "Apache-2.0",
"url": "https://huggingface.co/TheBloke/openchat-3.5-0106-GGUF/resolve/main/openchat-3.5-0106.Q4_K_M.gguf",
"memory": 4370129224
},
{
"id": "hf.TheBloke.mistral-7b-openorca.Q4_K_M",
"name": "TheBloke/Mistral-7B-OpenOrca-GGUF",
"description": "## Mistral 7B OpenOrca - GGUF- Model creator: [OpenOrca](https://huggingface.co/Open-Orca)\n- Original model: [Mistral 7B OpenOrca](https://huggingface.co/Open-Orca/Mistral-7B-OpenOrca)\n## Description\nThis repo contains GGUF format model files for [OpenOrca's Mistral 7B OpenOrca](https://huggingface.co/Open-Orca/Mistral-7B-OpenOrca).",
"hw": "CPU",
"registry": "Hugging Face",
"license": "Apache-2.0",
"url": "https://huggingface.co/TheBloke/Mistral-7B-OpenOrca-GGUF/resolve/main/mistral-7b-openorca.Q4_K_M.gguf",
"memory": 4370129224
},
{
"id": "hf.MaziyarPanahi.phi-2.Q4_K_M",
"name": "MaziyarPanahi/phi-2-GGUF",
"description": "## [MaziyarPanahi/phi-2-GGUF](https://huggingface.co/MaziyarPanahi/phi-2-GGUF)\n- Model creator: [microsoft](https://huggingface.co/microsoft)\n- Original model: [microsoft/phi-2](https://huggingface.co/microsoft/phi-2)\n## Description\n[MaziyarPanahi/phi-2-GGUF](https://huggingface.co/MaziyarPanahi/phi-2-GGUF) contains GGUF format model files for [microsoft/phi-2](https://huggingface.co/microsoft/phi-2).",
"hw": "CPU",
"registry": "Hugging Face",
"license": "Apache-2.0",
"url": "https://huggingface.co/MaziyarPanahi/phi-2-GGUF/resolve/main/phi-2.Q4_K_M.gguf",
"memory": 1739461755
},
{
"id": "hf.llmware.dragon-mistral-7b-q4_k_m",
"name": "llmware/dragon-mistral-7b-v0",
"description": "## Model Card for Model ID\ndragon-mistral-7b-v0 part of the dRAGon ('Delivering RAG On ...') model series, RAG-instruct trained on top of a Mistral-7B base model.\nDRAGON models have been fine-tuned with the specific objective of fact-based question-answering over complex business and legal documents with an emphasis on reducing hallucinations and providing short, clear answers for workflow automation.",
"hw": "CPU",
"registry": "Hugging Face",
"license": "Apache-2.0",
"url": "https://huggingface.co/llmware/dragon-mistral-7b-v0/resolve/main/dragon-mistral-7b-q4_k_m.gguf",
"memory": 4370129224
},
{
"id": "hf.MaziyarPanahi.MixTAO-7Bx2-MoE-Instruct-v7.0.Q4_K_M",
"name": "MaziyarPanahi/MixTAO-7Bx2-MoE-Instruct-v7.0-GGUF",
"description": "## [MaziyarPanahi/MixTAO-7Bx2-MoE-Instruct-v7.0-GGUF](https://huggingface.co/MaziyarPanahi/MixTAO-7Bx2-MoE-Instruct-v7.0-GGUF)\n- Model creator: [zhengr](https://huggingface.co/zhengr)\n- Original model: [zhengr/MixTAO-7Bx2-MoE-Instruct-v7.0](https://huggingface.co/zhengr/MixTAO-7Bx2-MoE-Instruct-v7.0)\n## Description\n[MaziyarPanahi/MixTAO-7Bx2-MoE-Instruct-v7.0-GGUF](https://huggingface.co/MaziyarPanahi/MixTAO-7Bx2-MoE-Instruct-v7.0-GGUF) contains GGUF format model files for [zhengr/MixTAO-7Bx2-MoE-Instruct-v7.0](https://huggingface.co/zhengr/MixTAO-7Bx2-MoE-Instruct-v7.0).",
"hw": "CPU",
"registry": "Hugging Face",
"license": "Apache-2.0",
"url": "https://huggingface.co/MaziyarPanahi/MixTAO-7Bx2-MoE-Instruct-v7.0-GGUF/resolve/main/MixTAO-7Bx2-MoE-Instruct-v7.0.Q4_K_M.gguf",
"memory": 7784628224
},
{
"id": "hf.ggerganov.whisper.cpp",
"name": "ggerganov/whisper.cpp",
"description": "# OpenAI's Whisper models converted to ggml format\n\n[Available models](https://huggingface.co/ggerganov/whisper.cpp/tree/main)\n",
"hw": "CPU",
"registry": "Hugging Face",
"license": "Apache-2.0",
"url": "https://huggingface.co/ggerganov/whisper.cpp/resolve/main/ggml-small.bin",
"memory": 1073741824
},
{
"id": "hf.facebook.detr-resnet-101",
"name": "facebook/detr-resnet-101",
"description": "# DETR (End-to-End Object Detection) model with ResNet-101 backbone\n\nDEtection TRansformer (DETR) model trained end-to-end on COCO 2017 object detection (118k annotated images). It was introduced in the paper [End-to-End Object Detection with Transformers](https://arxiv.org/abs/2005.12872) by Carion et al. and first released in [this repository](https://github.com/facebookresearch/detr). \n\nDisclaimer: The team releasing DETR did not write a model card for this model so this model card has been written by the Hugging Face team.\n\n## Model description\n\nThe DETR model is an encoder-decoder transformer with a convolutional backbone. Two heads are added on top of the decoder outputs in order to perform object detection: a linear layer for the class labels and a MLP (multi-layer perceptron) for the bounding boxes. The model uses so-called object queries to detect objects in an image. Each object query looks for a particular object in the image. For COCO, the number of object queries is set to 100. \n\nThe model is trained using a \"bipartite matching loss\": one compares the predicted classes + bounding boxes of each of the N = 100 object queries to the ground truth annotations, padded up to the same length N (so if an image only contains 4 objects, 96 annotations will just have a \"no object\" as class and \"no bounding box\" as bounding box). The Hungarian matching algorithm is used to create an optimal one-to-one mapping between each of the N queries and each of the N annotations. Next, standard cross-entropy (for the classes) and a linear combination of the L1 and generalized IoU loss (for the bounding boxes) are used to optimize the parameters of the model.\n\n![model image](https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/transformers/model_doc/detr_architecture.png)\n\n## Intended uses & limitations\n\nYou can use the raw model for object detection. 
See the [model hub](https://huggingface.co/models?search=facebook/detr) to look for all available DETR models.",
"hw": "CPU",
"registry": "Hugging Face",
"license": "Apache-2.0",
"url": "https://huggingface.co/facebook/detr-resnet-101/resolve/main/pytorch_model.bin",
"memory": 1073741824
}
],
"categories": [
{
"id": "natural-language-processing",
"name": "Natural Language Processing",
"description" : "Models that work with text: classify, summarize, translate, or generate text."
},
{
"id": "computer-vision",
"description" : "Process images, from classification to object detection and segmentation.",
"name" : "Computer Vision"
},
{
"id": "audio",
"description" : "Recognize speech or classify audio with audio models.",
"name" : "Audio"
},
{
"id": "multimodal",
"description" : "Stuff about multimodal models goes here omg yes amazing.",
"name" : "Multimodal"
}
]
}