SYCL: Anyone using TrueNas Apps to launch a compose version of any of the images? #17849
Replies: 11 comments
-
Adding some logs. The prompt was working and then started outputting gibberish.
prompt + response:
prompt:
It went on, but I deleted over 80% of the text after this point.
logs:
-
@the-bort-the
FP32 has less accuracy impact than FP16, but in fact both have similar performance.
Thank you!
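For reference, in llama.cpp's SYCL Docker builds the FP16 path is a build-time option and FP32 is the default. A minimal compose sketch of how that toggle is commonly passed when building locally, assuming the upstream .devops/intel.Dockerfile and its GGML_SYCL_F16 build arg (names follow docs/backend/SYCL.md and may change between versions):

```yaml
services:
  llama-server:                       # hypothetical service name
    build:
      context: ./llama.cpp            # assumes a local checkout of ggml-org/llama.cpp
      dockerfile: .devops/intel.Dockerfile
      target: server                  # the llama-server image target
      args:
        GGML_SYCL_F16: "OFF"          # "OFF" (the default) keeps compute in FP32; "ON" enables FP16
```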
-
I can see
-
@the-bort-the
-
@NeoZhangJianyu I made the following change and built the image locally anyway. I'm now pulling this image in via the TrueNAS docker-compose UI, and it spins up with the logs below. Is there any other way to confirm that FP32 is truly enabled, or anything else I can do? I'm seeing GPU use, but also heavy CPU use.
Logs:
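Not an authoritative diagnosis, but two things that often matter for the GPU vs. CPU split in this kind of setup are whether the render node is passed through to the container and how many layers llama-server is asked to offload (-ngl / --n-gpu-layers); layers left on the CPU show up as heavy CPU use. A sketch of the relevant compose fragment, where the service name, image tag, and host paths are all assumptions:

```yaml
services:
  llama-server:                        # hypothetical service name
    image: llama-cpp-sycl-f32:local    # hypothetical tag for the locally built image
    devices:
      - /dev/dri:/dev/dri              # the render nodes must be visible inside the container for SYCL
    group_add:
      - render                         # or the numeric GID of the host's render group
    volumes:
      - /mnt/tank/models:/models       # hypothetical TrueNAS dataset holding the GGUF files
    # Assuming the image's entrypoint is llama-server, these are passed as its arguments.
    command: >
      -m /models/model.gguf
      --n-gpu-layers 99
      --host 0.0.0.0
      --port 8080
```

The server's startup log should also report how many layers were offloaded to the GPU, which is another quick sanity check.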
-
@the-bort-the I'm not familiar with llama-server. Thank you!
-
@NeoZhangJianyu - Hi! The other models seem to tax only the CPU; it reaches 35-50%. This happens both with the image I built and with the image I originally obtained from the repo. I have tried the models listed below, obtained from Hugging Face. Attached are CLI results and metrics: llama_cli_chicago_prompts.txt
Also, I've since disabled this in docker-compose:
Current docker-compose:
-
@the-bort-the
Thank you!
-
@NeoZhangJianyu I think I'm just trying to understand how best to use my Intel GPU with ggml, because there isn't official support through Ollama. ollama/ollama#11160 (comment)
-
@the-bort-the The Ollama build enhanced by the SYCL backend is still using an old version of llama.cpp, since there are no longer many users. Anyway, try the new LLMs with llama.cpp first. Thank you!
-
You mean continue using this project because it's actively being developed, yes? I feel this is the best shot at getting things working and making use of the hardware I currently have. Currently the version of Ollama in the official Ollama project is stuck on 0.9.3, which is pretty old. Perhaps I should just look at AMD or NVIDIA and ditch Intel hardware 😄
-
Currently running ghcr.io/ggml-org/llama.cpp:server-intel with some success within TrueNAS Apps. Wondering if anyone is doing the same and has any progress to share (compose settings, models to use, etc.). I have tried several models from Hugging Face and they seem to work for 1-8 prompts before needing a new chat session or a restart. I've had more success with the packaged UI than with trying to force it to use Open WebUI. I'm not sure whether I have a bug or not, but I'm just not seeing things live before needing to deploy the container once more.
intel-gpu-top shows Blitter activity during prompts.
Current image sha: sha256:c32e17454cc730656a7c245a67c8eb06dc65ce8e39f83969df1753150fff3cb4
GPU: Intel Arc A750 Graphics
CPU: AMD Ryzen 5 5600XT 6-Core Processor
OS: TrueNAS v25.04.0 Community Edition
Models tried:
Compose.yml:
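The compose file attached to the original post isn't reproduced in this capture. Purely as an illustration of the setup being described, a minimal service definition for the server-intel image might look like the sketch below; the service name, port, model path, and device mapping are assumptions, not the poster's actual configuration:

```yaml
services:
  llama-cpp:                                    # hypothetical service name
    image: ghcr.io/ggml-org/llama.cpp:server-intel
    restart: unless-stopped
    ports:
      - "8080:8080"                             # llama-server's API and built-in web UI share this port
    devices:
      - /dev/dri:/dev/dri                       # pass the Arc's render node through to the container
    volumes:
      - /mnt/tank/models:/models                # hypothetical TrueNAS dataset holding GGUF files
    # Assuming the image's entrypoint is llama-server, these are passed as its arguments.
    command: >
      -m /models/model.gguf
      --n-gpu-layers 99
      --host 0.0.0.0
      --port 8080
```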