This could be facilitated by employing AI systems in red-team roles: systems that plan malign behaviors as hypothetical challenges to detection and defense strategies. In this way, worst-case misaligned plans can contribute to achieving aligned outcomes.
I doubt we should even try this. Our experience so far with gain-of-function research suggests it has been a net negative rather than a net positive.