  • If it averages several instances, with enough signal you could decompose a linear combination (e.g. average) of different patterns back out into its constituent parts.

    A smarter system won’t just take the mean of the votes from different instances but rather discard outliers as invalid input (flagging repeat offenders to be ignored in the future) and use the median or mode of the remainder. The results should also be quantized to avoid leaking details about sources or internal algorithms; only the larger trends need to be reported.

    Of course you could always just keep the collected data private and only provide it to customers willing to pay $$$ for access, which handily limits instance operators’ ability to reverse-engineer the source of the data. And nothing prevents you from using separate instances for public and private data sets.
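The outlier-rejecting aggregation described above could look roughly like the following. This is only a minimal sketch under my own assumptions: the function name `robust_aggregate`, the median-absolute-deviation cutoff `k`, and the quantization `step` are all hypothetical choices for illustration, not anything a real system is known to use.

```python
from collections import Counter
from statistics import median


def robust_aggregate(votes, step=5.0, k=3.0):
    """Aggregate per-instance votes without trusting any single instance.

    votes: dict mapping a (hypothetical) instance id to a numeric value.
    Returns (quantized_value, sorted list of outlier instance ids).
    """
    values = list(votes.values())
    center = median(values)
    # Median absolute deviation: a spread estimate that outliers cannot skew.
    mad = median(abs(v - center) for v in values)
    kept = {i: v for i, v in votes.items() if abs(v - center) <= k * mad}
    outliers = sorted(i for i in votes if i not in kept)
    # Use the median of the remaining votes, then quantize to a coarse grid
    # so the published figure reveals only the larger trend.
    result = median(kept.values())
    return round(result / step) * step, outliers


# Track repeat offenders so they can be ignored in later rounds.
offenses = Counter()
value, outliers = robust_aggregate(
    {"a": 10.2, "b": 9.8, "c": 10.1, "d": 10.4, "e": 55.0}
)
offenses.update(outliers)
```

Here the instance reporting 55.0 is discarded and counted as an offender, and the remaining votes are reported only to the nearest multiple of `step`.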

  • I’d settle for just the limits, personally.

    The part that makes me the most paranoid is the outbound data. They set every VM up with a 5 Gbps symmetric link, which is cool and all, but then you get charged based on how much data you send. When everything’s working properly that’s not an issue as the data size is predictable, but if something goes wrong you could end up with a huge bill before you even find out about the problem. My solution, for my own peace of mind, was to configure traffic shaping inside the VM to throttle the uplink to a more manageable speed and then set alarms which will automatically shut down the instance after observing sustained high traffic, either short-term or long-term. That’s still reliant on correct configuration, however, and consumes a decent chunk of the free-tier alarms. I’d prefer to be able to set hard spending limits for specific services like CPU time and network traffic and not have to worry about accidentally running up a bill.
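To illustrate the "sustained high traffic, either short-term or long-term" logic, here is a rough sketch of a watchdog that keeps a byte budget over two sliding windows. It is not the alarm configuration described above: the class name `TrafficWatchdog`, the window sizes, and the limits are hypothetical, and the actual shutdown action (a provider API call or similar) is deliberately left out.

```python
from collections import deque


class TrafficWatchdog:
    """Trips when outbound bytes exceed a budget over a short or long window.

    Each call to observe() represents one fixed sampling interval
    (for example one minute); the interval length is an assumption.
    """

    def __init__(self, short_n, short_limit, long_n, long_limit):
        self.short_n = short_n          # samples in the short window
        self.short_limit = short_limit  # byte budget for the short window
        self.long_limit = long_limit    # byte budget for the long window
        self.samples = deque(maxlen=long_n)  # oldest samples fall off

    def observe(self, bytes_sent):
        """Record one interval's outbound bytes; return True to shut down."""
        self.samples.append(bytes_sent)
        recent = list(self.samples)[-self.short_n:]
        short_trip = sum(recent) > self.short_limit
        long_trip = sum(self.samples) > self.long_limit
        return short_trip or long_trip


# A burst trips the short window even though the long-term total is small.
burst = TrafficWatchdog(short_n=3, short_limit=300, long_n=10, long_limit=600)
burst_results = [burst.observe(b) for b in (50, 50, 50, 200, 200)]

# A steady moderate rate eventually trips the long window instead.
steady = TrafficWatchdog(short_n=3, short_limit=1000, long_n=10, long_limit=600)
steady_results = [steady.observe(100) for _ in range(7)]
```

Like the alarms, this still depends on the limits being configured correctly; it only bounds how long a runaway sender keeps running, not the bill itself.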

  • Section 6 of the GPLv3, which the LGPLv3 includes by reference as one of the required distribution terms in paragraph 4.d.0:

    Convey the Minimal Corresponding Source under the terms of this License, and the Corresponding Application Code in a form suitable for, and under terms that permit, the user to recombine or relink the Application with a modified version of the Linked Version to produce a modified Combined Work, in the manner specified by section 6 of the GNU GPL for conveying Corresponding Source.

    (emphasis added) There is the alternative of following 4.d.1 instead, but that’s only if the application links against a shared library already present on the user’s computer system—it couldn’t be distributed with the program.

    GPLv3 section six offers five alternative methods of satisfying the obligation to provide source code. The first (6.a) applies only to physical distribution and must include source code with the physical media. The second (6.b) also requires physical distribution plus a written offer to provide the source code to anyone possessing the object code. The third (6.c) is the one I mentioned that applies only “occasionally and noncommercially” for those who received a written offer themselves under the previous clause. The fourth option (6.d) allows for the source to be provided through a network server:

    If the place to copy the object code is a network server, the Corresponding Source may be on a different server (operated by you or a third party) that supports equivalent copying facilities, provided you maintain clear directions next to the object code saying where to find the Corresponding Source. Regardless of what server hosts the Corresponding Source, you remain obligated to ensure that it is available for as long as needed to satisfy these requirements.

    The fifth and final alternative (6.e) pertains to object code provided through P2P distribution, with the same requirements as the fourth method for the source code.

  • The GPL in most cases only requires that derivative work must also be shipped with the same license. The source code from providers doesn’t have to be distributed by Unity, it has to be distributed by the provider.

    This is incorrect. The distributor of derivative works in binary form is responsible for providing the source code. They can refer to a server operated by a third party, but if that third party stops providing the source code the distributor remains obligated to ensure that it is still available. The only exception is for binaries which were originally received with a written offer of source code, where the offer can be passed on as-is, but that is allowed only “occasionally and noncommercially”, which wouldn’t work here.

  • Sure, they don’t rule the world. They only have the power to ban you (either the company per se or its individual owners, officers, and/or employees) from ever again doing any business in the EU. Which naturally includes business with any individuals or companies either based in the EU (as a seller or a buyer) or wanting to do business in the EU. Or from traveling to the EU, whether for business or personal reasons. Little things like that. Nothing too inconvenient. (/s)

    They haven’t taken things quite that far—yet. But they could. It’s dangerous to assume that you can ignore them without consequences just because your company doesn’t currently depend on revenue from EU customers. The world is more interconnected than that, and the consequences may not be limited to your company.

  • Geoblocking in such cases would not be sufficient. For one thing your geo-IP database will never be perfectly accurate, even without considering that “data subjects who are in the Union” can connect to your site via proxies or VPNs with non-EU IP addresses. For another you still need to respond to GDPR requests e.g. to remove data collected on a data subject currently residing in the EU, even if the data was collected while they were outside the EU, and you can’t do that if you’re blocking their access to the site. For a newspaper in particular the same would apply to any EU data subject they happened to report on, whether they had previously visited the site or not.

  • They never should have made opt-in an option in the first place. All the legitimate reasons to store data are already permitted without asking permission (required for the site to function, or storing data the user specifically asked the site to store such as settings). All that’s left is things no one would reasonably choose to consent to if they fully understood the question, so they should have just legislated that the answer is always “no”. That plus a bit more skepticism about what sites really “need” to perform their function properly. (As that function is understood by the user—advertising is not a primary function of most sites, or desired by their users, so “needed for advertising to work” does not make a cookie “functional” in nature. Likewise for “we need this ad revenue to offer the site for free”; you could use that line to justify any kind of monetization of private user data.)

  • So you're not remapping the source ports to be unique? There's no mechanism to avoid collisions when multiple clients use the same source port? Full Cone NAT implies that you have to remember the mapping (potentially indefinitely—if you ever reassign a given external IP:port combination to a different internal IP or port after it's been used you're not implementing Full Cone NAT), but not that the internal and external ports need to be identical. It would generally only be used when you have a large enough pool of external IP addresses available to assign a unique external IP:port for every internal IP:port. Which usually implies a unique external IP for each internal IP, as you can't restrict the number of unique ports used by each client. This is why most routers only implement Symmetric NAT.

    (If you do have sufficient external IPs the Linux kernel can do Full Cone NAT by translating only the IP addresses and not the ports, via SNAT/DNAT prefix mapping. The part it lacks, for very practical reasons, is support for attempting to create permanent unique mappings from a larger number of unconstrained internal IP:port combinations to a smaller number of external ones.)
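The mapping requirement can be made concrete with a toy model. This is a sketch for illustration only: `FullConeNat` and its methods are hypothetical names, not how any real router or the Linux kernel implements NAT. Each internal IP:port gets a permanent, unique external IP:port (not necessarily the same port number), and the pool simply runs out when there are more internal endpoints than external ones.

```python
class FullConeNat:
    """Toy Full Cone NAT table: one permanent external endpoint per internal one."""

    def __init__(self, external_ips, port_range=range(1024, 65536)):
        # Lazily enumerate every unused external (ip, port) combination.
        self.free = ((ip, port) for ip in external_ips for port in port_range)
        self.out = {}   # (internal ip, port) -> (external ip, port)
        self.back = {}  # (external ip, port) -> (internal ip, port)

    def map_outbound(self, internal_ip, internal_port):
        key = (internal_ip, internal_port)
        if key not in self.out:
            ext = next(self.free, None)
            if ext is None:
                # Mappings are never recycled, so the pool can run dry.
                raise RuntimeError("external IP:port pool exhausted")
            self.out[key] = ext
            self.back[ext] = key
        return self.out[key]

    def map_inbound(self, external_ip, external_port):
        # Full Cone: any remote host may reach the mapped endpoint, so the
        # lookup ignores who is sending. Returns None if unmapped.
        return self.back.get((external_ip, external_port))


# Two clients using the same source port get distinct external ports.
nat = FullConeNat(["203.0.113.1"], port_range=range(40000, 40002))
a = nat.map_outbound("10.0.0.2", 5000)
b = nat.map_outbound("10.0.0.3", 5000)
```

With only two external ports in the pool, a third internal endpoint raises the exhaustion error, which is the practical problem with many unconstrained clients sharing a few external addresses.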

  • What "increased risks as far as csam"? You're not hosting any yourself, encrypted or otherwise. You have no access to any data being routed through your node, as it's encrypted end-to-end and your node is not one of the endpoints. If someone did use I2P or Tor to access CSAM and your node was randomly selected as one of the intermediate onion routers there is no reason for you to have any greater liability for it than any of the ISPs who are also carrying the same traffic without being able to inspect the contents. (Which would be equally true for CSAM shared over HTTPS—I2P & Tor grant anonymity but any standard password-protected web server with TLS would obscure the content itself from prying eyes.)

  • No, that's not how I2P works.

    First, let's start with the basics. An exit node is a node which interfaces between the encrypted network (I2P or Tor) and the regular Internet. A user attempting to access a regular Internet site over I2P or Tor would route their traffic through the encrypted network to an exit node, which then sends the request over the Internet without the I2P/Tor encryption. Responses follow the reverse path back to the user. Nodes which only establish encrypted connections to other I2P or Tor nodes, including ones used for internal (onion) routing, are not exit nodes.

    Both I2P and Tor support the creation of services hosted directly through the encrypted network. In Tor these are referred to as onion services and are accessed through *.onion hostnames. In I2P these internal services (*.i2p or *.b32.i2p) are the only kind of service the protocol directly supports, though you can configure a specific I2P service linked to an HTTP/HTTPS proxy to handle non-I2P URLs in the client configuration. There are only a few such proxy services as this is not how I2P is primarily intended to be used.

    Tor, by contrast, has built-in support for exit nodes. Routing traffic anonymously from Tor users to the Internet is the original model for the Tor network; onion services were added later. There is no need to choose an exit node in Tor—the system maintains a list and picks one automatically. Becoming a Tor exit node is a simple matter of enabling an option in the settings, whereas in I2P you would need to manually configure a proxy server, inform others about it, and have them adjust their proxy configuration to use it.

    If you set up an I2P node and do not go out of your way to expose an HTTP/HTTPS proxy as an I2P service then no traffic from the I2P network can be routed to non-I2P destinations via your node. This is equivalent to running a Tor internal, non-exit node, possibly hosting one or more onion services.

  • It is not true that every node is an exit node in I2P. The I2P protocol does not officially have exit nodes—all I2P communication terminates at some node within the I2P network, encrypted end-to-end. It is possible to run a local proxy server and make it accessible to other users as an I2P service, creating an "exit node" of sorts, but this is something that must be set up deliberately; it's not the default or recommended configuration. Users would need to select a specific I2P proxy service (exit node) to forward non-I2P traffic through and configure their browser (or other network-based programs) to use it.