• Kairos@lemmy.today · 2 days ago

    So can a lot of other models.

    “This load can be towed by a single vehicle”

    • Irdial@lemmy.sdf.org · 2 days ago

      “the Chinese AI lab also released a smaller, ‘distilled’ version of its new R1, DeepSeek-R1-0528-Qwen3-8B, that DeepSeek claims beats comparably sized models on certain benchmarks”

      Most models come in 1B, 7-8B, 12-14B, and 27+B parameter variants. According to the docs, they benchmarked the 8B model on an NVIDIA H20 (96 GB VRAM) and got between 144 and 1,198 tokens/sec. Most consumer GPUs probably aren’t going to be able to keep up with that.

      • brucethemoose@lemmy.world · 1 day ago (edited)

        Depends on the quantization.

        7B is small enough to run in FP8 or a Marlin quant with SGLang/vLLM/TensorRT, so you can probably get very close to the H20 on a 3090 or 4090 (or even a 3060) if you know a little Docker.
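
        For the curious, a minimal sketch of what that looks like with vLLM’s offline Python API. The Hugging Face repo id and the memory settings here are illustrative assumptions, not something from this thread:

        ```python
        # Sketch: loading the distilled 8B model with FP8 weight
        # quantization in vLLM (on Ampere cards like the 3090, which lack
        # native FP8, vLLM falls back to a Marlin weight-only kernel).
        from vllm import LLM, SamplingParams

        llm = LLM(
            model="deepseek-ai/DeepSeek-R1-0528-Qwen3-8B",  # assumed repo id
            quantization="fp8",        # weight quantization to fit 24 GB VRAM
            max_model_len=8192,        # trim context to leave room for KV cache
            gpu_memory_utilization=0.90,
        )

        params = SamplingParams(temperature=0.6, max_tokens=2048)
        out = llm.generate(["Prove that sqrt(2) is irrational."], params)
        print(out[0].outputs[0].text)
        ```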

      • Avid Amoeba@lemmy.ca · 2 days ago

        It proved sqrt(2) irrational at 40 tokens/sec on a 3090 here. The 32B R1 did it at 32 tokens/sec, but it thought for a lot longer.
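
        (For reference, the standard argument the models are being asked to reproduce, sketched in LaTeX:)

        ```latex
        % Classic proof by contradiction.
        Suppose $\sqrt{2} = p/q$ with $p, q$ coprime integers, $q \neq 0$.
        Then $p^2 = 2q^2$, so $p^2$ is even, hence $p$ is even; write $p = 2k$.
        Substituting gives $4k^2 = 2q^2$, i.e.\ $q^2 = 2k^2$, so $q$ is even
        too, contradicting the coprimality of $p$ and $q$.
        Hence $\sqrt{2}$ is irrational.
        ```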

        • Irdial@lemmy.sdf.org · 2 days ago (edited)

          On my Mac mini running LM Studio, it managed 1,702 tokens at 17.19 tok/sec and thought for 1 minute. If accurate, high-performance models were better able to run on consumer hardware, I would use my 3060 as a dedicated inference device.
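
          A quick sketch of how you might script that kind of throughput measurement against LM Studio’s local OpenAI-compatible server (it listens on http://localhost:1234/v1 by default; the model id below is a placeholder for whatever you have loaded):

          ```python
          # Sketch: timing completion throughput via LM Studio's local server.
          import time
          from openai import OpenAI

          client = OpenAI(base_url="http://localhost:1234/v1", api_key="lm-studio")

          start = time.time()
          resp = client.chat.completions.create(
              model="deepseek-r1-0528-qwen3-8b",  # placeholder model id
              messages=[{"role": "user",
                         "content": "Prove that sqrt(2) is irrational."}],
          )
          elapsed = time.time() - start

          tokens = resp.usage.completion_tokens
          print(f"{tokens} tokens in {elapsed:.1f} s -> "
                f"{tokens / elapsed:.2f} tok/sec")
          ```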

    • LainTrain@lemmy.dbzer0.com · 2 days ago

      I’m genuinely curious what you do that makes a 7B model “trash” to you. Like, yeah, sure, a gippity now tends to beat out a Mistral 7B, but I’m pretty happy with my Mistral most of the time, if I ever even need AI at all.

    • TropicalDingdong@lemmy.world · 2 days ago

      Yeah, idk. I did some work with DeepSeek early on. I wasn’t impressed.

      HOWEVER…

      Some other things they’ve developed, like DeepSite, are holy-shit impressive.

    • T156@lemmy.world · 1 day ago

      The censorship only exists on the version they host, which is fair enough. If they’re running it themselves in China, they can’t just break the law.

      If you run it yourself, the censorship isn’t there.

      • jaschen@lemm.ee · 18 hours ago

        Untrue; I downloaded the vanilla version, and the censorship is hardcoded in.

      • MonkderVierte@lemmy.ml · 1 day ago (edited)

        Yeah, I think the censoring in the LLM data itself would be pretty vulnerable to circumvention.