- cross-posted to:
- technology@lemmy.world
cross-posted from: https://sh.itjust.works/post/61139432
I seriously can’t believe how much progress he’s made for the FOSS community. He actually might take a bite out of the big 3’s profits with this



How many GPUs do you even need to have a usable, self-hosted AI? It looks like he has 6 on his rig. Probably each costs 2k or something. That’s not peanuts. I have a 12GB VRAM card. It probably can’t generate anything in any meaningful amount of time. Which brings me to the question: who is this for?
Regardless, impressive what he vibe-coded there.
I use a 6700 XTX and it’s working perfectly fine, depending on the model. Gemma4 takes a long time to generate answers, but the Qwen series is quick and starts generating answers in ~10 seconds.
What’s the quality of the answers though? And how much context can it hold? I imagine it’s only good for small, short questions, but have no concept of what is needed for that.
I’m assuming you’re using a 12b or 24b qwen model. The ones from deepseek go up to hundreds of billions of params and I can’t tell if bigger number is better or just meaningless posturing.
I’m using the 35b models.
Quality for Qwen is mostly fine. Sometimes it does hallucinate some shit while thinking, but it corrects itself almost every time. The answers themselves are, for the most part, precise and useful. Not what you know from the cloud models, obviously, but it’s absolutely fine for everyday use. What is actually annoying is the web search: not sure if that’s a Qwen problem or a problem with Open WebUI, but it takes a long time to finish the search.
I once had a situation where a model ran into an “infinite loop” while thinking, repeating the same line over and over again. And once, Qwen just started outputting Chinese halfway through the answer lol.
When it comes to context, I’m gonna be very honest - I don’t know. I have never hit any kind of problems or limits because of that since I’m not using AI over a long term project. I use it for small, concise cases and that’s it.
Didn’t downvote. I use AI, and not ashamed of it. I don’t write huge programs and I damn sure don’t release anything to the public mainly because, in the back of my mind, I can just see some poor chap using my code and now smoke is coming out of his server. It works for me. Usually it’s ‘write a script that does _________’ or Docker compose files. It seems pretty accurate for those uses and if I need a bash command sequence explained, it’s good for that too.
I also use AI when I master my audio tracks before I upload them. I am clinically deaf and there are some frequencies that I just can’t hear well enough to make a judgement call. It’s pretty good at that too.
Thanks for the response. It’s interesting to read about the experience of others.
My MacBook Air with 24GB of unified RAM is enough to run something simple and useful.
That’s like what, 5 or 6k?
Like 1k
Reasonable price!
I have an RX 5600 XT (6 GB), 32 GB RAM, and a Ryzen 3600. The system hasn’t been updated since I built it during COVID. QwenV3-vl35B is the heftiest thing I can run; it gets around 2 tokens/sec in LM Studio. It’s easier than most people seem to think.
How do you not run out of VRAM? Does it offload to system RAM?
Yes, it offloads to system RAM. Oh, and I forgot to mention that’s with the context set to around 25k. That can vary greatly per model though; it’s taken some experimentation to figure that out.
Thank you. That’s good to know.
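For anyone wanting to sanity-check the offloading described above, here is a rough back-of-the-envelope sketch. It only estimates the memory taken by the weights, and it assumes 4-bit quantization, which nobody in this thread actually stated. The function names are made up for illustration. Context (the KV cache) and runtime overhead come on top and are not modeled here.

```python
def weight_memory_gb(params_billion: float, bits_per_weight: float) -> float:
    """Approximate memory for the model weights alone, in GB (1 GB = 1e9 bytes)."""
    total_bytes = params_billion * 1e9 * bits_per_weight / 8
    return total_bytes / 1e9


def split_across_memory(total_gb: float, vram_gb: float) -> tuple[float, float]:
    """Return (GB that fit in VRAM, GB that spill over to system RAM)."""
    in_vram = min(total_gb, vram_gb)
    return in_vram, total_gb - in_vram


if __name__ == "__main__":
    # Assumption: a 35B-parameter model quantized to 4 bits per weight.
    weights = weight_memory_gb(35, 4)
    # Assumption: all 6 GB of VRAM is usable, which is optimistic in practice.
    in_vram, offloaded = split_across_memory(weights, 6)
    print(f"weights: {weights:.1f} GB, VRAM: {in_vram:.1f} GB, system RAM: {offloaded:.1f} GB")
```

Under those assumptions most of the model sits in system RAM, which would be consistent with the low tokens/sec reported above, though the real split depends on the quantization and the runtime's settings.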
I think in one video it looked like 16 cards. I think he did multiple bifurcations of the PCIe lanes. I think he is / was using it for protein folding as well.
That’s definitely not my level of disposable wealth/income. I can barely afford one card.