You don’t think nearly 1/6th is statistically significant? What’s the lower bound on significance as you see things?
To be clear, it’s obviously dumb for their generative system to be overrepresenting turbans like this, although it’s likely to be a bias in the inputs rather than something the system came up with itself. I just think that 5% is generally enough to be considered significant, and calling three times that not significant confuses me.
5/6 not wearing them seems more statistically significant.
The fact that fewer people in that group wear it than don’t is significant when you want an average sample. When categorizing a collection of images, then, the traditional garments of a group are naturally associated more with that group than with any other: 1/6 is a bigger share than for any other race.
So if there was a country where 1 in 6 people had blue skin, you would consider that insignificant because 5 out of 6 didn’t?
For a caricature of the population? Yes, that’s not what the algorithm should be optimising for.
The algorithm is optimizing for results that are rated as precise, not just frequent.
For statistics’ sake? Yes.
For the LLM bias? No.
It’s not an LLM, it’s a GAN, and its inner workings are very different.
If that 1/6th has the most positive feedback in recognizability, for the GAN it becomes a highly weighted part of the standard. These models’ categorizing flow favors unique features of images.