• graphito@sopuli.xyz (OP)
    3 months ago

    Good luck checking for hallucinations using this approach.

    I used to use an LLM to fill out forms with personal data. The LLM always tries to invent things: new people, amalgamations of real names from the DB, new forms, imaginary places of birth, data that doesn't exist. Weeding out these errors is hard, and it usually happens late, well into production. To catch them, I have to build all sorts of pipelines and checks, which is an insane amount of complexity and maintenance burden for a job as simple as “fill out a form”.
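
To give an idea of what one of those checks looks like, here is a minimal sketch, not my actual pipeline. It assumes the form is a plain dict and the source of truth is a single DB record; `Person` and `find_ungrounded_fields` are made-up names for illustration.

```python
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class Person:
    # Hypothetical DB record; real schemas will have more fields.
    name: str
    place_of_birth: str


def find_ungrounded_fields(filled: dict[str, str], record: Person) -> list[str]:
    """Return the form fields that cannot be traced back to the record."""
    expected = asdict(record)
    problems = []
    for field, value in filled.items():
        if field not in expected:
            problems.append(field)  # the model invented a field
        elif value.strip() != expected[field]:
            problems.append(field)  # the value was not copied from the record
    # Fields the model silently dropped are problems too.
    problems.extend(f for f in expected if f not in filled)
    return problems


record = Person(name="Ada Lovelace", place_of_birth="London")
filled = {"name": "Ada Lovelace", "place_of_birth": "Paris", "nickname": "Addie"}
print(find_ungrounded_fields(filled, record))
```

Even this toy version only covers exact copies. Anything the model is allowed to rephrase or reformat needs its own rule, which is where the maintenance burden comes from.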


    An AI-hyped coworker responded to this problem with: oh, so it’s just a quality problem; you can have the AI check the result 10 times and, if it’s flaky, hand it to a human to check.
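
As far as I understand the proposal, it boils down to something like the sketch below. This is my reading, not his code: `check` is a hypothetical callable standing in for one LLM review of the result, and the verdict labels are made up.

```python
from collections import Counter
from typing import Callable


def check_with_escalation(check: Callable[[], bool], runs: int = 10) -> str:
    """Run the same check several times; escalate if the verdicts disagree."""
    verdicts = Counter(check() for _ in range(runs))
    if len(verdicts) > 1:
        return "needs_human"  # flaky: some runs passed, some failed
    return "pass" if verdicts[True] else "fail"


# Stand-in checker that alternates verdicts, to show the escalation path.
answers = iter([True, False] * 5)
print(check_with_escalation(lambda: next(answers)))
```

Note what this does not cover: if the checker makes the same mistake on all 10 runs, the result is a unanimous "pass" and no human ever sees it.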

    He created a system where LLMs wrote code, checked the resulting code, and verified it against requirements written by a nontechnical human. I mean, it’s impressive, but I can’t imagine that system being “hired” for high-stakes projects.