In contrast, the sorts of things that we humans end up valuing are usually the sorts of things that are easy to form abstractions around. Thus, we are not doomed by the same difficulty that likely prevented evolution from aligning humans to inclusive genetic fitness.
Many humans (especially me) value normativity (doing what is right) and philosophy, which are mysterious and/or contentious even to professional philosophers. Do you think it will be easy to align AIs to these values? If so, can you please go into some detail about this? For example, if there are many competing ideas for what normativity and philosophy really are, how can we ensure that AI will learn to value the right ones?
Edit: since writing this post, I’ve learned a lot more about inductive biases and what deep learning theory we currently have, so my relative weightings have shifted quite a lot towards “current results in machine learning”.
I’m curious what learnings you’re referring to. If you’ve written about them somewhere, could you add a link to the post?