I’ve been looking into self-hosting LLMs, and it seems a $10k GPU is practically a requirement to run a decently sized model at a reasonable tokens/s rate. There’s CPU and SSD offloading, but I’d imagine that would be frustratingly slow to use. I even find cloud-based AI like GH Copilot annoyingly slow. Even so, GH Copilot is about $20 a month per user, and I’d be curious what the actual cost per user is once you account for hardware and electricity.
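For what it’s worth, here’s a back-of-envelope sketch of the per-user cost question. Every number here (GPU price, lifespan, power draw, electricity rate, how many users one GPU can serve concurrently via batching) is an illustrative assumption, not a measured figure:

```python
# Rough monthly cost per user for a shared inference GPU.
# All parameters are assumptions for illustration only.

def monthly_cost_per_user(
    gpu_price_usd=10_000.0,       # assumed GPU purchase price
    amortization_months=36,       # assumed useful life of the card
    power_draw_w=700.0,           # assumed average draw under load
    electricity_usd_per_kwh=0.15, # assumed electricity rate
    concurrent_users=50,          # assumed users served via batching
):
    hardware = gpu_price_usd / amortization_months
    kwh_per_month = power_draw_w / 1000.0 * 24 * 30
    electricity = kwh_per_month * electricity_usd_per_kwh
    return (hardware + electricity) / concurrent_users

print(f"~${monthly_cost_per_user():.2f} per user per month")
```

Under these made-up numbers it lands well under $20/month, but the result is extremely sensitive to the batching assumption: at 5 concurrent users instead of 50, the same math gives roughly $70/month.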
What we have now is clearly an experimental first generation of the tech, but the industry is building out data centers as though running these things will always require massive GPUs/NPUs with wicked quantities of VRAM. If each user prompt really does require minutes of compute time on a $10k GPU in a huge data center full of expensive hardware, then it can’t possibly be profitable to charge a nominal monthly fee for the service, but maybe there are optimizations I’m unaware of.
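One way to sanity-check the “minutes of compute per prompt” intuition is to price a single prompt that monopolizes the whole GPU, with no batching at all. Again, the amortization period, power draw, and electricity rate below are assumptions I picked for illustration:

```python
# Cost of one prompt that occupies a $10k GPU exclusively for N minutes.
# Assumptions: 3-year amortization, 700 W draw, $0.15/kWh, no batching.

def cost_per_prompt_usd(
    minutes,
    gpu_price_usd=10_000.0,
    life_hours=3 * 365 * 24,  # assumed 3-year useful life
    power_draw_w=700.0,
    usd_per_kwh=0.15,
):
    hardware_per_hour = gpu_price_usd / life_hours
    power_per_hour = power_draw_w / 1000.0 * usd_per_kwh
    return (hardware_per_hour + power_per_hour) * minutes / 60.0

print(f"2-minute prompt ≈ ${cost_per_prompt_usd(2):.3f}")
```

Even in this worst case it comes out to a couple of cents per prompt, so whether a $20/month fee covers it mostly depends on prompts per user and how well requests are batched.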
Then again, if the tech does evolve and it becomes a lot cheaper to host these things, will all these new data centers still be needed? On the other hand, if the hardware requirements don’t drop by an order of magnitude, it won’t be cost effective to offer LLMs as a service, in which case I don’t imagine the new data centers will be needed either.
As I am told, there is no way these LLMs ever make their investments back. It’s like Tesla at this point. Whoever is paying the actual money to build this stuff is going to get hosed if they can’t offload it onto some other sucker, that ultimate sucker probably being the US taxpayer.
Honestly, just jump in with whatever hardware you have available and a small 1.5B/7B model. You’ll figure out all the difficult uncertainties as you go and try to improve things.
I’m hosting a few lighter models that are somewhat useful and fun without even using a dedicated GPU, just a lot of RAM and fast NVMe so the models don’t take forever to spin up.
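For anyone wondering how much RAM the “no dedicated GPU” route takes, the weights alone are roughly parameter count times bytes per weight, which depends on the quantization level. A quick sketch (ignores KV cache and runtime overhead, so treat the results as lower bounds):

```python
# Approximate memory needed just for model weights at a given quantization.
# Assumption: memory ≈ params × bits_per_weight / 8; KV cache and
# runtime overhead are ignored, so real usage is somewhat higher.

def weights_gb(params_billion, bits_per_weight):
    bytes_total = params_billion * 1e9 * bits_per_weight / 8
    return bytes_total / 1e9  # decimal gigabytes

for params, bits in [(1.5, 4), (7, 4), (7, 8)]:
    print(f"{params}B model @ {bits}-bit ≈ {weights_gb(params, bits):.2f} GB")
```

So a 4-bit 7B model fits comfortably in a machine with 16 GB of RAM, which matches the “just a lot of RAM” experience above.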
Of course I’ve got an upgrade path in mind for the hardware, including adding a GPU, but there are other places I’d rather put the money atm, and I do appreciate that it all currently runs on a 250 W PSU.
AI failed, and now they’re doing this to capture the compute market and then make their profit back through unscrupulous means.