That’s perhaps why image generators are comparatively better than text generators*. But there’s still something off: going by your example, it seems that the model cannot reliably use clues like position to understand “this is a «leg»”. And I don’t know much about image generators, but I think that they’re still statistics- and probability-based.