• 2xsaiko@discuss.tchncs.de
    8 months ago

    None of these examples parse English sentences; they parse entirely different formal languages. That the input is text is irrelevant, since regex almost always operates on text.

    You cannot write a regex that gives you, for example, “the subject of an English sentence”, just as you can’t write one that gives you “the contents of a complete div tag”, because neither of those is a regular language. (HTML is context-free; I’m not sure about English, but my guess is it would be considered recursively enumerable.)

    You can’t even write a regex that just consumes <div> repeated exactly n times followed by </div> repeated exactly n times, because that is already a context-free language rather than a regular one. In fact, it is the classic example of a context-free language that is not regular (a^n b^n), and the one Wikipedia also uses.
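
    To make that concrete, here is a rough Python sketch. The names `LOOSE` and `balanced` are made up for illustration. The pattern uses only regular operators, so the best it can say is "some opens, then some closes"; the function gets the counts to agree by remembering n, which is exactly the memory a finite automaton does not have.

```python
import re

OPEN, CLOSE = "<div>", "</div>"

# Regular operators only: any number of opens, then any number of closes.
# Nothing here can force the two counts to be equal.
LOOSE = re.compile(r"(?:<div>)*(?:</div>)*")


def balanced(s: str) -> bool:
    """Accept exactly OPEN repeated n times followed by CLOSE repeated n times."""
    n = 0
    # Count the leading opens; n is the unbounded counter a regex lacks.
    while s.startswith(OPEN, n * len(OPEN)):
        n += 1
    # Whatever is left must be exactly n closes.
    return s[n * len(OPEN):] == CLOSE * n


if __name__ == "__main__":
    good = "<div><div></div></div>"
    bad = "<div></div></div>"
    print(bool(LOOSE.fullmatch(good)), balanced(good))
    print(bool(LOOSE.fullmatch(bad)), balanced(bad))
```

    The loose pattern accepts both strings, while the counting version rejects the unbalanced one. Some engines add non-regular extensions such as recursion that can handle this case, but at that point the pattern is no longer a regular expression in the formal sense.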