I think you are overestimating how aligned these models are right now, and very much overestimating how aligned they will be in the future absent massive regulation forcing people to pay hefty alignment taxes. They won't be aligned to any users, or to any corporations either. Current methods like RLHF will not work on situationally aware, agentic AGIs.
I agree that IF all we had to do to get alignment was the sort of stuff we are currently doing, the world would be as you describe. But instead there will be a significant safety tax.