Recent post re: AI as utility

https://www.tomsguide.com/ai/people-will-buy-intelligence-from-us-on-a-meter-chatgpts-ceo-sam-altman-has-critics-worried-with-his-ai-vision

Myself, I’m a fan of local LLM / self hosted ML… but if you ever needed a clarion call that a hard pivot is coming (soon) for online/ cloud based AI…Altman et al are making some concerning mouth noises (to say nothing of broader concerns with OAI, Anthropic etc).

Right now, I’m sketching out a plan where my Raspberry Pi (always on, 2-3w) uses a magic packet to wake up my modest AI server (Lenovo P330 with Tesla P4) if/when needed (Qwen 3.6-35B-A3B); no point in chugging down 80-100w, 24/7 for no good reason.

If the trend continues the direction it appears to be (increasing costs, environmental impacts etc) then I’d feel a lot better hosting my own as port of first call and replacing simpler tasks with more traditional programs. YMMV.

  • superglue@lemmy.dbzer0.com
    link
    fedilink
    English
    arrow-up
    5
    arrow-down
    1
    ·
    9 days ago

    Does anyone have a recommendation for a local model that can run well on a 5070 12GB? It pretty much would only get used for help with homelabbing and simple scripts.

    • monoboy@lemmy.zip
      link
      fedilink
      English
      arrow-up
      7
      arrow-down
      1
      ·
      9 days ago

      Qwen 3.6-35B-A3B (which OP mentioned) would work great as long as you have some system RAM to offload it.

    • SuspiciousCarrot78@aussie.zoneOP
      link
      fedilink
      English
      arrow-up
      5
      arrow-down
      1
      ·
      9 days ago

      There’s an argument to be had regarding a MoE versus a small dense model. I guess it depends on what exactly you need doing with it. I would be tempted to run a smaller dense model (like a Qwen 3-14B or a Qwen 3.5 9B) as at a reasonable quant, it might fit mostly or entirely on the GPU, thereby giving you excellent speeds.

      PS: I’m actually in the process of designing an expert system (not a LLM) for pretty much the task you described. The intention is that you would still interact with it like a large language model, but the actual brains underneath it would be something more traditional.