• T156@lemmy.world
    3 days ago

    That’s basically model routing, and it has existed for a while. OpenAI’s GPT-5 and llama-swap do that, for example. If the task is simple, it uses a smaller, less intensive model, and only uses the slower, larger one if the task is more complex.

    Though most tend to route between models on the same device/service, rather than to a model run elsewhere.
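
    The idea above can be sketched in a few lines. This is a minimal, illustrative heuristic router; the model names and the complexity check are made up for the example, not how GPT-5 or llama-swap actually decide.

```python
# Minimal sketch of heuristic model routing: send simple prompts to a
# cheap/fast model, complex ones to a slower/larger model.
# Model names and the complexity heuristic are hypothetical.

SMALL_MODEL = "small-local-model"   # hypothetical cheap, fast model
LARGE_MODEL = "large-remote-model"  # hypothetical slow, capable model

def route(prompt: str) -> str:
    """Return the name of the model that should handle this prompt."""
    # Crude complexity signal: long prompts or "reasoning-style"
    # keywords go to the bigger model; everything else stays small.
    hard_keywords = ("prove", "derive", "step by step", "debug")
    complex_task = len(prompt.split()) > 50 or any(
        k in prompt.lower() for k in hard_keywords
    )
    return LARGE_MODEL if complex_task else SMALL_MODEL

print(route("What's the capital of France?"))            # small-local-model
print(route("Please debug this stack trace step by step"))  # large-remote-model
```

    Real routers typically use a trained classifier or the small model’s own confidence instead of keyword rules, but the control flow is the same.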