I wouldn’t argue that self-aware systems are automatically dangerous, but rather that self-unaware systems are automatically safe (or at least comparatively pretty safe).
Fair enough.
Most people in AI safety, most of the time, are talking about self-aware (in my minimal sense of taking purposeful actions etc.) agent-like systems. I don’t think such systems are automatically dangerous, but they do necessitate solving the alignment problem, and since we haven’t solved the alignment problem yet, I think it’s worth spending time exploring alternative approaches.
I suspect the important part is the agent-like part.
I’m not sure it makes sense to think of “the alignment problem” as a singular entity. I’d rather taboo “the alignment problem” and just ask what could go wrong with a self-aware system that’s not agent-like.
A self-unaware system will not do that because it is not aware that it can do things to affect the universe.
Hot take: it might be useful to think of “self-awareness” and “awareness that it can do things to affect the universe” separately. Not sure they are one and the same.
What is a “system that’s not agent-like” in your perspective? How might it be built? Have you written anything about that?
For my part, I think Rohin’s “AI safety without goal-directed behavior” is a good start, but we need much more, and deeper, analysis of this topic.
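To make my question concrete, here is roughly the contrast I have in mind, as a toy Python sketch (my own illustration, with made-up action names and a trivial “predictor”, not something taken from Rohin’s post or yours): the agent-like system searches over actions against an internal objective, while the non-agent-like system just maps queries to answers and leaves it to a human what to do with them.

```python
# Toy contrast between an "agent-like" step and a non-agent-like one.
# Everything here (the action set, the utility table, the predictor) is a
# hypothetical stand-in for illustration only.

from typing import Callable, List


def agent_like_step(actions: List[str],
                    expected_utility: Callable[[str], float]) -> str:
    """Agent-like: search over actions and output the one the internal
    objective ranks highest, i.e. act in order to steer the world."""
    return max(actions, key=expected_utility)


def non_agent_like_step(history: List[float]) -> float:
    """Not agent-like: map an input to an output (here a naive next-value
    prediction) with no action search and no objective over world-states.
    What gets done with the answer is left to the operator."""
    return sum(history) / len(history) if history else 0.0


if __name__ == "__main__":
    # The agent-like system picks whichever action its (toy) utility model favours.
    utilities = {"wait": 0.1, "acquire_resources": 0.9, "shut_down": 0.0}
    print(agent_like_step(list(utilities), utilities.get))  # -> "acquire_resources"

    # The non-agent-like system just answers a query and stops.
    print(non_agent_like_step([1.0, 2.0, 3.0]))  # -> 2.0
```

The sketch is only meant to locate the distinction in whether there is an action search against an objective, not in how capable the underlying model is; whether anything useful can be built that stays on the non-agent-like side is exactly what I’m asking about.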