Here’s what we’re working on in Firefox | The Mozilla Blog

petsoi@discuss.tchncs.de · 3 months ago

sweng@programming.dev · edited 3 months ago

What do you mean “full set of data”?

Obviously you cannot train on 100% of the material ever created, so you pick a subset. There is a lot of permissively licensed content (e.g. Wikipedia) and content you can license (e.g. Reddit). While that is not sufficient for an advanced LLM, it certainly is for smaller models that do not need wide knowledge.