Regarding 3, yeah, I definitely don’t want to say that the LLM in the thought experiment is itself power-seeking. Telling someone how to power-seek is not power-seeking.
Regarding 1 and 2, I agree that the problem here is producing an LLM that refuses to give dangerous advice to another agent. I’m pretty skeptical that this can be done in a way that scales, but that could very well be a lack of imagination on my part.