Thanks for writing this! I strongly appreciate a well-thought-out post in this direction.
My own level of worry depends heavily on my belief that we understand how to shape NN behaviors much better than we understand how to shape values, goals, motivations, or desires (although I don’t think e.g. ChatGPT has any of the latter in the first place). Do you have thoughts on the distinction between behaviors and goals? In particular, do you feel you have any evidence that we know how to shape, create, or guide goals and values, rather than just behaviors?