More examples beyond CycleGAN:
‘non-robust features’ in image classification: they exist, and predict out of sample, but it’s difficult to say what they are
stylometrics: in natural language analysis, author identification can be done surprisingly well just by looking at usage of function words like ‘the’ or ‘an’. We find it difficult or impossible to notice subtle shifts in the frequencies of hundreds of common words, but statistical models can integrate them and identify authors in cases where humans fail (a minimal classifier sketch follows this list).
degenerate completions/the repetition trap: aaaaaaaaaaaaaaaaa -! (a greedy-decoding sketch follows below)
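A minimal sketch of what such a stylometric classifier looks like, assuming scikit-learn; the function-word list and the four-line ‘corpus’ are made-up placeholders for illustration, not real stylometric data:

```python
# Toy sketch: author attribution from function-word frequencies alone.
# The word list and texts below are illustrative placeholders, not real data.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

FUNCTION_WORDS = ["the", "a", "an", "of", "and", "to", "in", "that",
                  "it", "is", "was", "for", "on", "with", "as", "but"]

texts = [
    "the cat sat on the mat and it was pleased with that",
    "it is a truth that the house was cold in the winter",
    "a man walked to a town with a dog and a stick",
    "to see a world in a grain of sand is to see it all",
]
authors = ["A", "A", "B", "B"]

# Restrict the vocabulary to function words so content words carry no signal;
# the classifier then integrates many small frequency differences at once.
clf = make_pipeline(
    CountVectorizer(vocabulary=FUNCTION_WORDS),
    LogisticRegression(max_iter=1000),
)
clf.fit(texts, authors)

print(clf.predict(["the mat was warm and the cat was on it"]))  # likely 'A'
```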
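And a hedged sketch of the repetition trap itself, assuming the HuggingFace transformers library and GPT-2 as a convenient small model; pure greedy (argmax) decoding with no sampling or repetition penalty is the setting where completions tend to collapse into loops:

```python
# Sketch: greedy decoding often degenerates into repetition.
# Assumes `pip install transformers torch`.
from transformers import AutoModelForCausalLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")

inputs = tok("I am", return_tensors="pt")
out = model.generate(**inputs, max_new_tokens=60, do_sample=False)  # greedy argmax
print(tok.decode(out[0]))
# Typically ends up looping a phrase over and over; with worse prompts or models,
# the fixed point can shrink to a single repeated token ("aaaaaaaaa...").
```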
Ah yes, aaaaaaaaaaaaaaaaa, the most agentic string
You have to admit, in terms of the Eliezeresque definition of ‘agency/optimization power’ as ‘steering future states towards a small region of state-space’, aaa is the most agentic prompt of all! (aaaaaaaah -!)
Now I want a “who would win” meme, with something like “agentic misaligned deceptive mesa optimizer scheming to take over the world” on the left side, and “one screamy boi” on the right.