Ben Smith comments on Looking back on my alignment PhD

Ben Smith 3 Jul 2022 1:26 UTC
3 points
I'm not sure exactly what I was thinking, but probably this: as I understand it, “safe AGI via AUP” would have to penalize the agent for learning to achieve anything not directly related to the end goal, and that might make it too difficult to actually achieve the end goal when, e.g., that goal turns out to need tangentially related behavior.
Your “social dynamics” section encouraged me to be bolder about sharing my own ideas on this forum, and I wrote up some stuff today that I’ll post soon, so thank you for that!