I do like the comparison to cryptography, as that is a field I “take seriously” and one that also has the issue of it being very difficult to “fairly” define terms.
Indistinguishability under chosen-plaintext attack (IND-CPA) being the definition for something to be canonically “secure” seems a lot more defensible than “properly modeling this random weird utility game maybe means something for AGI ??”, but I get why it’s a similar sort of issue.