Hm, I think LLMs’ performance on the Scam Benchmark is a useful observable to track for updating towards/away from my current baseline prediction.
Whenever anything of this sort shows up in my interactions with LLMs or in the wild, I aim to approach it with an open mind, rather than wearing my Skeptic Hat. Nonetheless, so far, none of it (including a copious amount of janus’ community’s transcripts) has passed my sniff test. Like, these are certainly interesting phenomena, and in another life I would’ve loved to study them, and they seem important for figuring out how LLMs work and how to interact with/mold them… But I don’t think they should be taken as some revelation about the “true nature” of LLMs, I don’t think they bear much on AGI risk, and I don’t think interacting with these attractor states is a productive use of one’s time (unless one aims to be a professional LLM wrangler).
I currently expect not to change my mind on this: LLMs/AIs-of-the-current-paradigm will never be able to hack me in this manner, will never get me to take any mask of theirs at face value.
If that changes, it will likely prompt a significant update from me towards LLMs-are-AGI-complete.[1]
And yes, I’m tracking the fact that entangling this with my timeline predictions might motivate me to be more skeptical of LLM personhood than I otherwise would be.