Sorry, I thought I posted more relevant details.
My machine is ancient. Core i7 2700K. GTX 670 (6GB). 👴
I installed the NVIDIA tools, which let me install CUDA 12.x, but I don't know if my GPU supports that.
The command I'm trying to run is very slightly different from yours (StarCoder-1B instead of 3B), taken from the Windows Installation documentation:
.\tabby.exe serve --model StarCoder-1B --chat-model Qwen2-1.5B-Instruct --device cuda
But after counting up for about 10 seconds it starts spitting out messages like the following every few seconds:
10.104 s Starting... 2024-10-18T08:32:25.331474Z  WARN llama_cpp_server::supervisor: crates\llama-cpp-server\src\supervisor.rs:98: llama-server <embedding> exited with status code -1073741795, args: `Command { std: "D:\\Apps\\tabby_x86_64-windows-msvc-cuda122\\llama-server.exe" "-m" "C:\\Users\\Deozaan\\.tabby\\models\\TabbyML\\Nomic-Embed-Text\\ggml\\model.gguf" "--cont-batching" "--port" "30888" "-np" "1" "--log-disable" "--ctx-size" "4096" "-ngl" "9999" "--embedding" "--ubatch-size" "4096", kill_on_drop: true }
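For what it's worth, that negative status code looks like the signed form of a Windows NTSTATUS value. A quick sketch of the conversion (my reading of the resulting code as STATUS_ILLEGAL_INSTRUCTION is worth double-checking; that code typically means the binary used a CPU instruction the processor doesn't support):

```python
# Convert the signed 32-bit exit status from the log into a Windows NTSTATUS code.
code = -1073741795
print(hex(code & 0xFFFFFFFF))  # 0xc000001d (STATUS_ILLEGAL_INSTRUCTION)
```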
I left it running for about 2 hours and it just kept doing that.
More help would be appreciated.
-Deozaan
First, it may be handy to list the specifications of the computer I use with Tabby (for reference):
CPU: Intel i3 10100F (bog standard cooling, no tweaks of any kind)
GPU: MSI GeForce 1650 (4 GB of VRAM, 128-bit bus, 75 Watt version (without extra power connectors on the card))
RAM: Kingston 32 GByte (DDR4, 3200 MHz). As of this week, I added another 16 GB RAM stick, just to see whether dual channel was an improvement or not. So far I haven't noticed a difference.
SSD: Crucial 500 GByte (via SATA interface)
All inside a cheap, no-name "gamer" case
There are 3 things to try:
- Tabby without GPU support
- Tabby with GPU support
- No Tabby
Tabby without GPU support: You will need to download the (much smaller) CPU-only build of Tabby, extract it, and make sure you have as much of the fastest RAM your motherboard supports to make this the best experience possible.
Start it with:
.\tabby.exe serve --model StarCoder-1B --chat-model Qwen2-1.5B-Instruct
Less than ideal, but you would still be able to (patiently) see how it works.
Tabby with GPU support: This overview from NVIDIA shows which CUDA compute capability their GPUs have.
My GPU is a 1650, which has compute capability 7.5.
Your GPU is a 670, which has compute capability 3.0.
This overview shows that you need NVIDIA driver 450.xx or later for your card if you use the 11.x CUDA development software. You can get v11.7 of the CUDA development tools here. As far as I know, you can go to the NVIDIA website and download a tool that identifies your card and the maximum driver version it supports. If that number isn't 450 or higher, then I'm pretty sure the Tabby version for CUDA devices won't work.
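If it helps, here is a minimal sketch of that driver check. The helper name is made up for illustration; the version string is whatever the NVIDIA tool (or `nvidia-smi`) reports:

```python
# Compare an NVIDIA driver version string against the 450.xx minimum for CUDA 11.x.
def driver_supports_cuda11(version: str, minimum_major: int = 450) -> bool:
    major = int(version.split(".")[0])  # "452.06" -> 452
    return major >= minimum_major

print(driver_supports_cuda11("452.06"))  # True
print(driver_supports_cuda11("391.35"))  # False: the CUDA build of Tabby won't work
```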
In case you can download a sufficient driver for your GPU, download the 'tabby_x86_64-windows-msvc-cuda117.zip' archive, extract it and start it with:
.\tabby.exe serve --model StarCoder-1B --chat-model Qwen2-1.5B-Instruct --device cuda
In case you cannot download a sufficient driver for your GPU, you could still try a Tabby version that supports Vulkan devices. Vulkan is supported by NVIDIA/AMD/Intel GPUs and is often used to make Windows games work on Linux. As your GPU is from around 2012 and Vulkan was introduced relatively recently, I don't know how far back Vulkan support goes. You might be lucky, though. Anyway, download the 'tabby_x86_64-windows-msvc-vulkan.zip' archive, extract it and start it with:
.\tabby.exe serve --model StarCoder-1B --chat-model Qwen2-1.5B-Instruct --device vulkan
No Tabby: If this also doesn't work, then you have to come to the conclusion that your GPU is simply too old for use with LLMs/AI and that you are relegated to software that provides CPU-only access to LLMs/AI. In that case, I recommend another free tool: LM Studio. This tool, in combination with the LLM 'bartowski/StableLM Instruct 3B', is my advice. The software is very versatile and takes about 1 GB of RAM, and the LLM takes about 4 GB of RAM, so you'll need 8 GByte or more in your computer for LM Studio to work halfway decently.
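A rough back-of-the-envelope check of those RAM numbers (the 1 GB and 4 GB figures are from above; the 3 GB of OS headroom is my own assumption):

```python
# Rough RAM sizing for a CPU-only LM Studio setup.
APP_GB = 1          # LM Studio itself (figure from the text)
MODEL_GB = 4        # StableLM Instruct 3B loaded in RAM (figure from the text)
OS_HEADROOM_GB = 3  # assumed headroom for Windows and background apps

def enough_ram(total_gb: int) -> bool:
    return total_gb >= APP_GB + MODEL_GB + OS_HEADROOM_GB

print(enough_ram(8))  # True: matches the "8 GByte or more" advice
print(enough_ram(4))  # False
```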