Regarding 3, yeah, I definitely don’t want to say that the LLM in the thought experiment is itself power-seeking. Telling someone how to power-seek is not power-seeking.
Regarding 1 and 2, I agree that the problem here is producing an LLM that refuses to give dangerous advice to another agent. I’m pretty skeptical that this can be done in a way that scales, but that could very well be a lack of imagination on my part.