• Blue_Morpho@lemmy.world
    English
    arrow-up
    173
    arrow-down
    3
    ·
    3 months ago

    The title of the article is so extraordinarily wrong that it amounts to clickbait.

    There is no “yes to copilot”

    It is only a formalization of what Linux said before: All AI is fine but a human is ultimately responsible.

    “AI agents cannot use the legally binding ‘Signed-off-by’ tag, requiring instead a new ‘Assisted-by’ tag for transparency”

    The only mention of copilot was this:

    “developers using Copilot or ChatGPT can’t genuinely guarantee the provenance of what they are submitting”

    This remains a problem that the new guidelines don’t resolve. Because even using AI as a tool and having a human review it still means the code the LLM output could have come from non-GPL sources.

    • marlowe221@lemmy.world
      English
      arrow-up
      44
      ·
      edit-2
      3 months ago

      Yeah, that’s also my question. Partially because I am a former-lawyer-turned-software-developer… but, yeah. How are the kernel maintainers supposed to evaluate whether a particular PR contains non-GPL code?

      Granted, this was potentially an issue before LLMs too, but nowhere near the scale it will be now.

      (In the interests of full disclosure, my legal career had nothing to do with IP law or software licensing - I did public interest law).

      • Alex@lemmy.ml
        English
        arrow-up
        23
        ·
        3 months ago

        They don’t, just like they don’t with human-submitted stuff. The point of the Signed-off-by is that the author attests they have the rights to submit the code.

        • ell1e@leminal.space
          English
          arrow-up
          2
          ·
          3 months ago

          Which I’m guessing they cannot attest, if LLMs truly have the 2-10% plagiarism rate that multiple studies seem to claim. It’s an absurd rule, if you ask me. (Not that I would know, I’m not a lawyer.)

          • Alex@lemmy.ml
            English
            arrow-up
            3
            ·
            3 months ago

            Where are you seeing the 2-10% figure?

            In my experience code generation is most affected by the local context (i.e. the codebase you are working on). On top of that a lot of code is purely mechanical - code generally has to have a degree of novelty to be protected by copyright.

      • wonderingwanderer@sopuli.xyz
        English
        arrow-up
        7
        arrow-down
        1
        ·
        3 months ago

        If it’s flagged as “assisted by <LLM>” then it’s easy to identify where that code came from. If a commercial LLM is trained on proprietary code, that’s on the AI company, not on the developer who used the LLM to write code. Unless they can somehow prove that the developer had access to said proprietary code and was able to personally exploit it.

        If AI companies are claiming “fair use,” and it holds up in court, then there’s no way in hell open-source developers should be held accountable when closed-source snippets magically appear in AI-assisted code.

        Granted, I am not a lawyer, and this is not legal advice. I think it’s better to avoid using AI-written code in general. At most use it to generate boilerplate, and maybe add a layer to security audits (not as a replacement for what’s already being done).

        But if an LLM regurgitates closed-source code from its training data, I just can’t see any way how that would be the developer’s fault…

        • sem@piefed.blahaj.zone
          English
          arrow-up
          5
          arrow-down
          1
          ·
          3 months ago

          Pretty convenient.

          This is how copyleft code gets laundered into closed source programs.

          All part of the plan.

          • wonderingwanderer@sopuli.xyz
            English
            arrow-up
            1
            ·
            3 months ago

            How would they launder it? Just declare it their own property because a few lines of code look similar? When there’s no established connection between the developers and anyone who has access to the closed-source code?

            That makes no sense. Please tell me that wouldn’t hold up in court.

            • lagoon8622@sh.itjust.works
              English
              arrow-up
              3
              ·
              3 months ago

              Please tell me that wouldn’t hold up in court.

              First tell us how much money you have. Then we’ll be able to predict whether the courts will find in your favor or not.

            • sem@piefed.blahaj.zone
              English
              arrow-up
              2
              ·
              3 months ago

              First of all, who is going to discover the closed-source use of GPL code and file a lawsuit anyway?

              Second, the LLM ingests the code, and then spits it back out, with maybe a few changes. That is how it benefits from copyleft code while stripping the license.

              Maybe a human could do the same thing, but it would take much longer.

              • wonderingwanderer@sopuli.xyz
                English
                arrow-up
                1
                ·
                3 months ago

                Wait, did you just move the goalposts? I thought the issue we were talking about was open-source developers who use LLM-generated code and unwittingly commit changes that contain allegedly closed-source snippets from the LLM’s training data.

                Now you want to talk about LLM training data that uses open-source code, and then closed-source developers commit changes that contain snippets of GPL code? That’s fine. It’s a change of topic, but we can talk about that too.

                Just don’t expect what I said before about the previous topic of discussion to apply to the new topic. If we’re talking about something different now, I get to say different things. That’s how it works.

                • sem@piefed.blahaj.zone
                  English
                  arrow-up
                  1
                  ·
                  3 months ago

                  I was responding specifically to this part

                  But if an LLM regurgitates closed-source code from its training data, I just can’t see any way how that would be the developer’s fault…

                  showing what would happen when the LLM regurgitates open-source code into closed-source projects.

                  Sorry if you didn’t like that.

    • lechekaflan@lemmy.world
      English
      arrow-up
      5
      ·
      3 months ago

      The title of the article is so extraordinarily wrong that it amounts to clickbait.

      That’s the pain in the ass with some of those fucking tech/video/showbiz news outlets, combined with rules in some fora that forbid “editorialized” post titles, even though it’s so tempting to correct the awful titling.

    • Fmstrat@lemmy.world
      English
      arrow-up
      3
      ·
      3 months ago

      Because even using AI as a tool and having a human review it still means the code the LLM output could have come from non-GPL sources.

      I get why they are passing this by though, since you don’t know the provenance of that Stack Overflow snippet, either.

    • scarabic@lemmy.world
      English
      arrow-up
      1
      ·
      3 months ago

      That’s probably why they say “a human is responsible” not “a human must validate it.” I certainly agree that validation is not always possible. And this problem will get worse in time.

    • TheOctonaut@piefed.zip
      English
      arrow-up
      4
      arrow-down
      4
      ·
      3 months ago

      the LLM output could have come from non-GPL sources

      Fundamentally not how LLMs work, it’s not a database of code snippets.

  • theherk@lemmy.world
    English
    arrow-up
    90
    arrow-down
    4
    ·
    3 months ago

    Seems like a reasonable approach. Make people be accountable for the code they submit, no matter the tools used.

    • ell1e@leminal.space
      English
      arrow-up
      18
      arrow-down
      1
      ·
      3 months ago

      If the accountability cannot be practically fulfilled, the reasonable policy becomes a ban.

      What good is it to say “oh yeah you can submit LLM code, if you agree to be sued for it later instead of us”? I’m not a lawyer and this isn’t legal advice, but sometimes I feel like that’s what the Linux Foundation policy says.

  • sonofearth@lemmy.world
    English
    arrow-up
    39
    arrow-down
    2
    ·
    3 months ago

    I am the c/fuck_ai person, but at this point I have made peace with the fact that we can’t avoid it. I still don’t want it doing artsy stuff (image gen, video gen), and I don’t want it used blindly in critical stuff, because humans are the ones who should be doing that or at least providing constant oversight. I think the team’s logic is correct here, because there is no way to know whether code is from an LLM or a human unless something in it screams LLM or the contributor explicitly mentions it. Mandating the latter seems like a reasonable move for now.

  • CanIFishHere@lemmy.ca
    English
    arrow-up
    42
    arrow-down
    7
    ·
    3 months ago

    AI is here, another tool to use…the correct way. Very reasonable approach from Torvalds.

  • NewNewAugustEast@lemmy.zip
    English
    arrow-up
    29
    arrow-down
    4
    ·
    3 months ago

    Copilot? You mean the AI with terms of service that are in bold and explicit: “for entertainment purposes only”?

    Which is why it’s in the title and not the article? EntertainBait?

    • truthfultemporarily@feddit.org
      English
      arrow-up
      15
      arrow-down
      4
      ·
      3 months ago

      Where does slop start? If you use autocomplete and it is just adding a semicolon or some braces, is it slop? Is producing character by character what you would have written yourself slop?

      How about using it for debugging?

      • ell1e@leminal.space
        English
        arrow-up
        7
        arrow-down
        1
        ·
        3 months ago

        If you would have written it yourself the same way, why not write it yourself? (And there was autocomplete before the age of LLMs, anyway.)

        The big problems start with situations where it doesn’t match what you would have written, but rather what somebody else has written, character by character.

      • BoxOfFeet@lemmy.world
        English
        arrow-up
        5
        arrow-down
        3
        ·
        3 months ago

        To me, it starts at anything beyond correcting spelling for individual words or adding punctuation. I don’t even want it suggesting quick reply phrases.

        Is producing character by character what you would have written yourself slop?

        Yes.

      • badgermurphy@lemmy.world
        English
        arrow-up
        2
        ·
        3 months ago

        There’s the rub. When establishing laws and guidelines, every term must be explicitly defined. Lack of specificity in these definitions is where bad-faith actors hide their misdeeds by technically obeying the letter of the law due to its vagueness, while flagrantly violating its spirit.

        It’s why today, in the USA, corporations are legally people when it’s convenient, and not when it’s not, and the expenditure of money is government-protected “free speech”.

      • FauxLiving@lemmy.world
        English
        arrow-up
        5
        arrow-down
        8
        ·
        edit-2
        3 months ago

        There is a certain brand of user (who may or may not be a human) who draws the Venn diagram of ‘AI slop’ and ‘AI output’ as a single circle.

        They’ve taken the extremist position that AI should be uninvented and any use of AI is the worst thing that could possibly happen to any project and they’ll have an entire grab bag of misinformation-based memes to shotgun at you. Engaging with these people is about as productive as trying to convince a vaccine denier that vaccines don’t cause autism.

        I’m not saying that the user you replied to believes this, but the comment they wrote is indistinguishable from the comments of such a user.

        e: I’d also like to point out that these users are very much attracted to low-effort activism. This is why you see comments like mine being heavily downvoted but not many actual replies. They want to try to influence the discussion but don’t have the capability or motivation to step into the ring, so to speak, and defend their opinions.

        • ell1e@leminal.space
          English
          arrow-up
          5
          ·
          3 months ago

          It’s less extremist if you look at how easily these LLMs will just plagiarize 1:1, apparently:

          https://github.com/mastodon/mastodon/issues/38072#issuecomment-4105681567

          Some see “AI slop” as “identified by the immediate problems of it that I can identify right away”.

          Many others see “AI slop” as bringing many more problems beyond the immediate ones. Then seeing LLM output as anything but slop becomes difficult.

          • FauxLiving@lemmy.world
            English
            arrow-up
            4
            arrow-down
            2
            ·
            3 months ago

            It’s extremist to take the fact that you CAN get plagiaristic output and to conclude that all other output is somehow tainted.

            You personally CAN quote copyrighted music and screenplays. If you’re an artist then you also CAN produce copyright-violating works. None of these facts taint any of the other things that you produce that are not copyright-infringing or plagiarized.

            In this situation, and in the current legal environment, the responsibility to not produce illegal and unlicensed code is on the human. The fact that the tool that they use has the capability to break the law does not mean that everything generated by it is tainted.

            Photoshop can be used to plagiarize and violate copyright too. It would be just as absurd to declare all images created with Photoshop are somehow suspect or unusable because of the capability of the tool to violate copyright laws.

            The fact that AI can, when specifically prompted, produce memorized segments of the training data has essentially no legal weight in any of the cases where it has been argued. It is a fact that is of interest to scientists who study how AI represents knowledge internally and not any kind of foundation for a legal argument against the use of AI.

            • badgermurphy@lemmy.world
              English
              arrow-up
              2
              ·
              3 months ago

              Sure, but if they can be demonstrated to ever plagiarize without attribution, and the default user behavior is to pencil-whip the output, which it is, then it becomes statistically certain that users are unwittingly plagiarizing other works.

              It’s like using a tool that usually bakes cookies, but every once in a great while, it knocks over the building it’s in. It almost never does that, though.

              • FauxLiving@lemmy.world
                English
                arrow-up
                1
                arrow-down
                1
                ·
                3 months ago

                Plagiarism and copyright violation are two different things, one is ethical and the other is legal.

                Copyright has a body of case law which helps determine when a work significantly infringes on the copyrighted work of another. Plagiarism has no body of law at all, it is an ethical construct and not a legal one.

                You can plagiarize something that has no copyright protection and you can infringe on copyright protection without plagiarizing. They’re not interchangeable concepts.

                In your example, some institutions would not allow such a device to operate on their property but it would not be illegal to operate and the liability would be on the person and not on the oven.

                To further strain the metaphor, Linus is saying that you can use (possibly) exploding ovens, because he isn’t taking a moral stance on the topic, but you are responsible for the damages if they cause any because the legal systems require that this be the case.

    • MoogleMaestro@lemmy.zip
      English
      arrow-up
      11
      arrow-down
      3
      ·
      3 months ago

      Microsoft needs to try to ruin Linux somehow. It can’t just hurt Windows 11 with AI slop code; it needs to expand its efforts to other systems.

  • null@lemmy.org
    English
    arrow-up
    23
    arrow-down
    3
    ·
    3 months ago

    Ah, the solution that recognizes there’s no way to eliminate AI from the supply chain after it’s already been introduced.

    • sunbeam60@feddit.uk
      English
      arrow-up
      7
      arrow-down
      8
      ·
      3 months ago

      You make it sound as if there were another choice, if only people had better principles. Pray tell us, what would you have done, now? Not in the past, now.

      • null@lemmy.org
        English
        arrow-up
        10
        ·
        3 months ago

        That wasn’t my intent. This is me saying, “of course that’s what they’re going to do because there’s nothing else they can do.”

  • stylusmobilus@aussie.zone
    English
    arrow-up
    23
    arrow-down
    4
    ·
    3 months ago

    any resulting bugs or security flaws firmly onto the shoulders of the human submitting it.

    Watch Americans and their companies pull some mad gymnastics on apportioning blame for this.

  • Seth Taylor@lemmy.world
    English
    arrow-up
    32
    arrow-down
    14
    ·
    edit-2
    3 months ago

    Bad actors submitting garbage code aren’t going to read the documentation anyway, so the kernel should focus on holding human developers accountable rather than trying to police the software they run on their local machines.

    “Guns don’t kill people. People kill people”

    Torvalds and the maintainers are acknowledging reality: developers are going to use AI tools to code faster, and trying to ban them is like trying to ban a specific brand of keyboard.

    The author should elaborate on how exactly AI is like “a specific brand of keyboard”. Last I checked a keyboard only enters what I type, without hallucinating 50 extra pages. And if AI, a tool that generates content, is like “a specific brand of keyboard”, does that mean my brain is also a “specific brand of keyboard”?

    I get their point. If you want to create good code by having AI create bad code and then spending twice the time to fix it, feel free to do that. But I’m in favor of a complete ban.

    • ede1998@feddit.org
      English
      arrow-up
      4
      ·
      3 months ago

      Last I checked a keyboard only enters what I type

      I’ve had (broken) keyboards “hallucinate” extra keystrokes before, because of stuck keys. Or ignore keypresses. But yeah, that means the keyboard is broken.

    • Electricd@lemmybefree.net
      English
      arrow-up
      4
      arrow-down
      1
      ·
      3 months ago

      You’re the one comparing AI and guns/killing people, and then saying their metaphorical comparison isn’t accurate? Lol

    • ziproot@lemmy.ml
      English
      arrow-up
      1
      ·
      3 months ago

      Last I checked a keyboard only enters what I type

      I’m assuming the author is talking about mobile keyboards, which have autocomplete and autocorrect.

    • alyth@lemmy.world
      English
      arrow-up
      4
      arrow-down
      4
      ·
      3 months ago

      Out of curiosity how much code have you contributed to the Linux kernel?

  • ell1e@leminal.space
    English
    arrow-up
    18
    arrow-down
    1
    ·
    edit-2
    3 months ago

    Ultimately, the policy legally anchors every single line of AI-generated code

    How would that even be possible? Given the state of things:

    https://dl.acm.org/doi/10.1145/3543507.3583199

    Our results suggest that […] three types of plagiarism widely exist in LMs beyond memorization, […] Given that a majority of LMs’ training data is scraped from the Web without informing content owners, their reiteration of words, phrases, and even core ideas from training sets into generated texts has ethical implications. Their patterns are likely to exacerbate as both the size of LMs and their training data increase, […] Plagiarized content can also contain individuals’ personal and sensitive information.

    https://www.theatlantic.com/technology/2026/01/ai-memorization-research/685552/

    Four popular large language models—OpenAI’s GPT, Anthropic’s Claude, Google’s Gemini, and xAI’s Grok—have stored large portions of some of the books they’ve been trained on, and can reproduce long excerpts from those books. […] This phenomenon has been called “memorization,” and AI companies have long denied that it happens on a large scale. […] The Stanford study proves that there are such copies in AI models, and it is just the latest of several studies to do so.

    https://www.twobirds.com/en/insights/2025/landmark-ruling-of-the-munich-regional-court-(gema-v-openai)-on-copyright-and-ai-training

    The court confirmed that training large language models will generally fall within the scope of application of the text and data mining barriers, […] the court found that the reproduction of the disputed song lyrics in the models does not constitute text and data mining, as text and data mining aims at the evaluation of information such as abstract syntactic regulations, common terms and semantic relationships, whereas the memorisation of the song lyrics at issue exceeds such an evaluation and is therefore not mere text and data mining

    https://www.sciencedirect.com/science/article/pii/S2949719123000213#b7

    In this work we explored the relationship between discourse quality and memorization for LLMs. We found that the models that consistently output the highest-quality text are also the ones that have the highest memorization rate.

    https://arxiv.org/abs/2601.02671

    recent work shows that substantial amounts of copyrighted text can be extracted from open-weight models. However, it remains an open question if similar extraction is feasible for production LLMs, given the safety measures […]. We investigate this question […] our work highlights that, even with model- and system-level safeguards, extraction of (in-copyright) training data remains a risk for production LLMs.

    How does merely tagging the apparently stolen content make it less problematic, given I’m guessing it still won’t have any attribution of the actual source (which for all we know, might often even be GPL incompatible)?

    But I’m not a lawyer, so I guess what do I know. But even from a non-legal angle, what is this road the Linux Foundation seems to embrace of just ignoring the license of projects? Why even have the kernel be GPL then, rather than CC0?

    I don’t get it. And the article calling this “pragmatism” seems absurd to me.

    • FauxLiving@lemmy.world
      English
      arrow-up
      7
      arrow-down
      3
      ·
      edit-2
      3 months ago

      Given the research that you’ve done here I’m going to assume that you’re looking for an answer and not simply taking us on a gish gallop.

      Your premise, and what appears to be the primary source of confusion, is built on the idea that this is ‘stolen’ work which, from a legal point of view, is untrue. If you want to dig into why that is, look into the precedent-setting case of Authors Guild, Inc. v. Google, Inc. (2015). The TL;DR is that training AI on copyrighted works falls under the Fair Use exemptions in copyright law, i.e. it is legal, not stealing.

      The case you linked from Munich shows that other countries’ legal systems are interpreting AI training in the same way. Training AI isn’t about memorization and plagiarism of existing work, it’s using existing work to learn the underlying patterns.

      That isn’t to say that memorization doesn’t happen, but it is more of a point of interest to AI scientists who are working on understanding how AI represents knowledge internally than a point that lands in a courtroom.

      We all memorize copyrighted data as part of our learning. You, too, can quote Disney movies or Stephen King novels if prompted in the right way. This doesn’t make any work you create automatically become plagiarism, it just means that you have viewed copyrighted work as part of your learning process. In the same way, artists have the capability to create works which violate the copyright of others and they consumed copyrighted works as part of their learning process. These facts don’t taint all of their work, either morally or legally… only the output that literally violates copyright laws.

      The pragmatism here is recognizing that these tools exist and that people use them. The current legal landscape is such that the output of these tools is as if they were the output of the users. If an image generator generates a copyrighted image then the rightsholder can sue the person, not the software. If a code generator generates licensed code then the tool user is responsible.

      This is much like how we don’t restrict the usage of Photoshop despite the fact that it can be used to violate copyright. We, instead, put the burden on the person who operates the tool.

      That’s what is happening here. Linus isn’t using his position to promote/enforce/encourage LLM use, nor is he using his position to prevent/restrict/disallow any AI use at all. He is recognizing that this is a tool that exists in the world in 2026 and that his project needs to have procedures that acknowledge this while also ensuring that a human is the one responsible for their submissions.

      This is the definition of pragmatism (def: action or policy dictated by consideration of the immediate practical consequences rather than by theory or dogma).

      e: precedent, not president (I’m blaming the AI/autocorrect on this one)

      • mimavox@piefed.social
        English
        arrow-up
        4
        ·
        3 months ago

        Training AI isn’t about memorization and plagiarism of existing work, it’s using existing work to learn the underlying patterns.

        Thank you. This is exactly what people misunderstand. LLMs aren’t gigantic databases that just shuffle information that they’ve copied from the internet.

  • 404found@lemmy.zip
    English
    arrow-up
    7
    arrow-down
    2
    ·
    3 months ago

    I don’t understand the full picture here, but the person who is submitting AI slop will be held accountable. Never a company.

    So if a company is pushing staff to use AI to complete projects faster and their code ends up being AI slop when submitted, only the person working for the company will be held responsible.

    I’m not sure what the repercussions are here but hopefully it’s not a large fine. Those fines could add up quick if the person is submitting code all the time and doesn’t know they are messing up.

    • Wispy2891@lemmy.world
      English
      arrow-up
      13
      ·
      3 months ago

      Which fines? This is just an internal rule in an organization.

      At most, someone can be rightfully banned from contributing.

      If someone is contributing code that they don’t really understand, then they shouldn’t contribute.

      • 404found@lemmy.zip
        English
        arrow-up
        3
        ·
        3 months ago

        Ah okay, got it now. Thanks. I didn’t understand it all the way. My comment is irrelevant.

  • Venia Silente@lemmy.dbzer0.com
    English
    arrow-up
    5
    ·
    3 months ago

    How is this all supposed to work, when AI code cannot be copyrighted and thus those submissions to the Linux kernel cannot be, e.g., GPLv{number}?

    • MoogleMaestro@lemmy.zip
      English
      arrow-up
      8
      ·
      3 months ago

      It’s definitely financially motivated. Linus said himself that AI has been very lucrative for Linux as it has expanded investment from companies that normally wouldn’t give a fuck (he name dropped NVidia specifically) on that one LTT video.