Another reason to self host your own AI

SuspiciousCarrot78@aussie.zone · edit-2 7 days ago

Another reason to self host your own AI

irmadlad@lemmy.world · 7 days ago

‘People will buy intelligence from us on a meter’

We have governmental surveillance and we have surveillance capitalism. Surveillance capitalism works so well that governments are now very interested in the data it collects, which is alarming. Unfounded conspiracy theory: It’s probably one of the reasons that governments don’t seem interested in regulating AI. If I had the proper equipment to run AI entirely locally, and efficiently enough that the expenditure would justify it, I would.

SuspiciousCarrot78@aussie.zone · 7 days ago

You probably could. A Tesla P4 or P40 (old data centre cards) is more than up to the job. My Lenovo Tiny hosts a P4 (card cost $100 on eBay; the Lenovo itself was $200ish) and runs Qwen3.5-35B-A3B at about 20 tok/s. Smaller models are even faster.

https://www.youtube.com/watch?v=8F_5pdcD3HY

If you’re not bound by the one liter shoebox design, then the P40 is still a great and inexpensive card.

I think I mentioned elsewhere but right now I’m trying to figure out if I can use a magic packet from the Raspberry Pi to wake up the Lenovo as needed rather than leaving it on all the time.
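
For anyone wondering what the magic packet actually is: 6 bytes of 0xFF followed by the target's MAC address repeated 16 times, usually sent as a UDP broadcast. A minimal sketch in Python, assuming Wake-on-LAN is enabled in the Lenovo's BIOS and NIC, that it's on wired Ethernet on the same LAN segment as the Pi, and that port 9 is acceptable (a common convention, not a requirement). The MAC address in the usage line is a placeholder.

```python
import socket


def build_magic_packet(mac: str) -> bytes:
    """Build a Wake-on-LAN magic packet for a MAC like 'AA:BB:CC:DD:EE:FF'."""
    hex_digits = mac.replace(":", "").replace("-", "")
    if len(hex_digits) != 12:
        raise ValueError(f"expected a 6-byte MAC address, got {mac!r}")
    mac_bytes = bytes.fromhex(hex_digits)  # raises ValueError on non-hex input
    # 6 bytes of 0xFF, then the MAC repeated 16 times (102 bytes total).
    return b"\xff" * 6 + mac_bytes * 16


def send_magic_packet(mac: str, broadcast: str = "255.255.255.255", port: int = 9) -> None:
    """Send the magic packet as a UDP broadcast on the local network."""
    packet = build_magic_packet(mac)
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        # Broadcasting has to be explicitly allowed on the socket.
        sock.setsockopt(socket.SOL_SOCKET, socket.SO_BROADCAST, 1)
        sock.sendto(packet, (broadcast, port))
```

On the Pi you would call something like `send_magic_packet("AA:BB:CC:DD:EE:FF")` with the Lenovo's real MAC whenever a request comes in. Putting the machine back to sleep afterwards is a separate problem.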

irmadlad@lemmy.world · 6 days ago

Thing is, if I were going to do in-house AI, I’d want to do it up right, and from what I can gather, a system like that is going to cost me some jack.

pogmommy@lemmy.ml · 7 days ago

My issue with the orphan-crushing machine isn’t only that it’s not in my children’s bedroom

sobchak@programming.dev · 6 days ago

I think they know it’s a somewhat viable option and is part of the reason they’re doing the hardware cartel/circlejerk thing.

Decronym@lemmy.decronym.xyz · edit-2 4 days ago

Acronyms, initialisms, abbreviations, contractions, and other phrases which expand to something larger, that I’ve seen in this thread:

Fewer Letters	More Letters
ARP	Address Resolution Protocol, translates IPs to MAC addresses
IP	Internet Protocol
RPi	Raspberry Pi brand of SBC
SBC	Single-Board Computer

[Thread #321 for this comm, first seen 30th May 2026, 09:50]

Auli@lemmy.ca · 6 days ago

Sure, but all these self-hosted AIs are still made by companies that used massive amounts of power and water to train them.

KatherinaReichelt@feddit.org · 6 days ago

Which is an interesting dilemma: Those AIs are already trained. That power and water was used. If you use them, you will not pollute anything more. But you may encourage those companies to train another AI.

Noxy@pawb.social · 7 days ago

not gonna self host bullshit that wastes resources and makes me dumber.

toor@lemmy.world · 7 days ago

Me, looking at my Jellyfin server…

Oh. Ok.

Noxy@pawb.social · 7 days ago

NO that makes you dumber in a GOOD WAY THO.

somegeek@programming.dev · edit-2 5 days ago

I started working toward self-hosting an LLM for my small company, using ollama and opencode as the agent. But I realized a good model like GLM 5 requires 250GB of RAM and 24GB VRAM with a 4080?? I don’t know, this is what the LLM told me itself.

I ended up using qwen-code2.7-7b-16k.

Currently the best thing I have is my laptop: 16GB RAM, i7 9750H, GTX 1650.

How do you guys selfhost? What models do you use that are actually good?

SuspiciousCarrot78@aussie.zone · edit-2 4 days ago

I mean…that entirely depends on your use case - and I hate saying that. For me and what I do, Qwen SLM (esp Qwen3-4B 2507 instruct and Qwen3.5-2B) are exceptional. But I’m not trying to do Claude at home.

Best bet? Spend $10 on OpenRouter and try different models. In a head to head with ChatGPT 5.4 mini (excellent for coding BTW), I’ve found Qwen 3.5 27B more than able to hold its own for coding tasks…IF you narrowly gate it/confine it. The last batch of Qwens really are something. Dunno about the 3.7 series.

Having said ALL that, I’m really tempted to go back in time and code myself a deterministic expert system, with a user-updatable knowledge cascade, tool calling and a minimal amount of Markov chain word garnish for flavour. I think we used to just call that “a program” lol.

Really tempted actually, because if 50% of llm use case is basically Super Google but not shit…well, I can make that myself. I just need to point my autism at it.
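
To make the "deterministic expert system" idea concrete, here is a toy sketch in Python. It is not the commenter's design, just one guess at the simplest possible version: rules are keyword sets mapped to canned answers, the user can add rules at runtime, and the most specific matching rule wins. The class name, method names and example rules are all hypothetical, and tool calling and the Markov garnish are left out.

```python
class KnowledgeBase:
    """Deterministic lookup: the same question always gets the same answer."""

    def __init__(self):
        # Each rule is (required keywords, canned answer).
        self.rules = []

    def add_rule(self, keywords, answer):
        """User-updatable knowledge: append a rule at runtime."""
        self.rules.append((frozenset(k.lower() for k in keywords), answer))

    def ask(self, question):
        words = set(question.lower().replace("?", " ").split())
        # Cascade: try rules with the most keywords first, so a specific
        # rule beats a general one. sorted() is stable, so ties keep
        # insertion order.
        for keywords, answer in sorted(self.rules, key=lambda r: -len(r[0])):
            if keywords <= words:
                return answer
        return "I don't know."


kb = KnowledgeBase()
kb.add_rule(["docker"], "Docker is a container runtime.")
kb.add_rule(["restart", "docker"], "sudo systemctl restart docker")
print(kb.ask("How do I restart docker?"))
```

The obvious weakness is that exact keyword matching is brittle, which is the gap an LLM front end (or a lot of hand-written synonyms) would have to cover.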

PS: this might help

https://www.youtube.com/watch?v=0AqpaFm11oI

somegeek@programming.dev · 4 days ago

Qwen 3.5 24B is way too large for my specs. I’m barely running qwen2.5 7B

SuspiciousCarrot78@aussie.zone · edit-2 4 days ago

Hmm…it runs on a 1060…it’s a MoE not a dense. 24B is even lighter. Worth a shot.

https://www.youtube.com/watch?v=8F_5pdcD3HY

Else, if you’re looking for a coding model (??), something like Sara or Fara might suit

https://huggingface.co/microsoft/Fara-7B

somegeek@programming.dev · 4 days ago

Thanks. I will look into it.

superglue@lemmy.dbzer0.com · 7 days ago

Does anyone have a recommendation for a local model that can run well on a 5070 12GB? It pretty much would only get used for help with homelabbing and simple scripts.

monoboy@lemmy.zip · 7 days ago

Qwen 3.6-35B-A3B (which OP mentioned) would work great as long as you have some system RAM to offload it.

SuspiciousCarrot78@aussie.zone · 7 days ago

There’s an argument to be had regarding a MoE versus a small dense model. I guess it depends on what exactly you need it to do. I would be tempted to run a smaller dense model (like a Qwen 3-14B or a Qwen 3.5 9B) because, at a reasonable quant, it might fit mostly or entirely on the GPU, thereby giving you excellent speeds.
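
A rough way to check the "fits on the GPU" part is back-of-envelope arithmetic: weight size is roughly parameter count times bits per weight, divided by 8. The sketch below assumes about 4.5 bits per weight for a 4-bit-ish quant and a flat 2 GB allowance for KV cache and runtime overhead. Both numbers are illustrative assumptions, not measurements, and real usage depends on context length and the runtime.

```python
def weights_gb(params_billion: float, bits_per_weight: float) -> float:
    """Approximate size of the model weights in GB (1 GB = 1e9 bytes)."""
    return params_billion * bits_per_weight / 8


def fits_on_gpu(params_billion: float, bits_per_weight: float,
                vram_gb: float, overhead_gb: float = 2.0) -> bool:
    """True if weights plus an assumed overhead fit in VRAM."""
    return weights_gb(params_billion, bits_per_weight) + overhead_gb <= vram_gb


# A 12 GB card with the assumed ~4.5 bits per weight:
for name, size_b in [("9B dense", 9), ("14B dense", 14), ("35B MoE", 35)]:
    print(name, round(weights_gb(size_b, 4.5), 2), "GB,",
          "fits" if fits_on_gpu(size_b, 4.5, 12) else "needs offload")
```

Under these assumptions the 9B and 14B dense models fit on a 12 GB card, while a 35B MoE needs part of its weights offloaded to system RAM.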

PS: I’m actually in the process of designing an expert system (not an LLM) for pretty much the task you described. The intention is that you would still interact with it like a large language model, but the actual brains underneath it would be something more traditional.

GreenBottles@lemmy.world · 6 days ago

P100s are dirt cheap on ebay fyi

SuspiciousCarrot78@aussie.zone · 6 days ago

Huh - cheaper than the P40s (though less VRAM) but larger bandwidth due to HBM2. Good looking out

GreenBottles@lemmy.world · 6 days ago

They rip

surewhynotlem@lemmy.world · 6 days ago

I was looking at that. Does it end up faster than something like a 1080?

SuspiciousCarrot78@aussie.zone · 6 days ago

Going by spec-sheet memory bandwidth, about 2x. The 16GB P100 is rated around 732 GB/s; the 1080 is around 320 GB/s. HBM2 simply has more bandwidth. The 1080 was a gaming card; the P100 is a server / number cruncher.
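
Why bandwidth matters so much here: when generating one token at a time, the card has to read roughly all the active weights from memory for every token, so memory bandwidth sets a ceiling on tokens per second. A rule-of-thumb sketch, assuming decoding is purely memory-bound and ignoring KV cache reads, compute limits and any offloading (so real speeds will be lower). The figures in the example are made-up round numbers, not specs for any particular card.

```python
def max_tokens_per_second(bandwidth_gb_s: float,
                          active_params_billion: float,
                          bits_per_weight: float) -> float:
    """Upper bound on decode speed if every token reads all active weights once."""
    gb_read_per_token = active_params_billion * bits_per_weight / 8
    return bandwidth_gb_s / gb_read_per_token


# Hypothetical cards at 600 GB/s and 300 GB/s, running a model with
# 3B active parameters at 4 bits per weight (1.5 GB read per token).
fast = max_tokens_per_second(600, 3, 4)
slow = max_tokens_per_second(300, 3, 4)
print(fast, slow, fast / slow)
```

For the same model and quant, the ratio between two cards comes out as simply the ratio of their bandwidths. It also shows why a MoE with few active parameters can be quick on old hardware: only the active experts are read per token.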

heartSagan5@lemmy.zip · 5 days ago

And are you sure you’re self-hosting or is it a plugin (that you’re self-hosting)? Also, I don’t invite SkyNet into my perimeter.

litchralee@sh.itjust.works · 7 days ago

I’d like to draw a comparison: a cozy wood fire versus central heating. In the right time and place (eg camping in the woods), a wood fire is both very practical and very useful. Meanwhile, most homes built in the past 70+ years in the USA have central heating (or are somewhere that doesn’t need heating at all) and the benefits are quite obvious: automatic temperature regulation, supplied by a utility, and low or no local emissions. And yet, there will still be rural homes that are heated exclusively by a wood stove, located in the middle of the living room, whose iron construction stores and radiates heat well after the fire has gone out.

Do I bemoan individual homes that use a wood fire? No, not really. The reality is that a grand, overwhelming majority of people don’t have wood fires anymore. Even when air quality is poor, prohibiting wood fires in a few rural homes isn’t exactly what would clear up the air.

Now, it would be a vastly different story if city-dwellers all had wood fires. When every home in a neighborhood is building and burning a wood fire, the results are disastrous: horrific PM2.5 in the air, soot coating everything, substantially reduced energy efficiency, and mass logging just to keep the wood supply. A mole-hill quickly becomes a mountain of problems when it’s at scale.

So to that end, I would very much like to see commercial-scale AI reined in, as the external costs have already gotten out of hand. What they have built is more correctly called a wildfire, not a wood fire. But where does that leave small-scale AI/LLM users? They can weigh the costs and benefits for themselves, provided that they don’t harm other people or resources in the process.

But that brings us back to a cozy wood fire versus central heating: at small scale, a wood fire struggles to heat an entire modern American home (ie 2500 sq ft; or 232 sq m). Yet central heating does it with ease. Who then will be interested in this endeavor? Probably only those with a love for the camping aesthetic, and other enthusiasts.

At this point, it has become clearer what the utility of small LLMs is, and they do pale in comparison to larger LLMs. If small LLMs are what sensibly survives into the future, then that’s essentially a cap on their capabilities, given a want to avoid burning the planet to run anything larger. The only way out would be substantial developments in the energy efficiency of small LLMs, but that’s not where the interest is.

No one is seeking to build a more efficient wood fire.

irmadlad@lemmy.world · 7 days ago

(ie 2500 sq ft; or 232 sq m)

Damn, y’all livin’ lavvy.

pound_heap@lemmy.dbzer0.com · 7 days ago

People are downvoting you, but I like your idea of drawing an analogy with heating, because it is something most of us rely on, and if LLMs and related technology keep evolving as they do, most of us will probably rely on them more or less, sooner or later. Regardless of what AI haters would say.

But your wood fire/central heating analogy is bad. I would compare large LLM vendors to the hot-water heating utilities common in Eastern Europe, and small LLMs to various heating devices. Utility companies can set prices, decide who gets connected to the hot water pipe, and set the water temperature. There are regulations that limit the power of such utility companies, allow customers to choose the supplier, etc. The same should happen with LLM providers: competition and anti-monopoly laws should protect customers who choose to use them.

Alternatively, customers may choose not to use utility-supplied heating. They can purchase space heaters, hand warmers, install split systems, burn wood: they are free to pick the technology, power source, size and appearance of such devices. They can take responsibility for heating their homes, willing to invest their time and money in order to be independent of the central heating utility. Small LLMs are like that: people can run their own, with capabilities dependent on investment, or they can pay smaller providers or resellers to get more flexibility and/or privacy and avoid capital investments. They could spend time tuning small models and harnesses to do some simple tasks, and they wouldn’t need to “buy intelligence” from OpenAI and others.
