Do you have a good intuition for why it would make interpretability less valuable, or at least lower its value relative to the increased risk of failure?
Just discussed this above with Quintin Pope—it’s more a question of interpretability cost scaling unfavorably with complexity from RSO.
IRL/Value Learning is far more difficult than it first appears; see #2.
That’s not immediately clear to me. Could you elaborate?
First off, value learning is essential for successful alignment. It really has two components, though: learning the utility/value functions of external agents, and then substituting those as the agent's own utility/value function. The first part we get for free; the second part is difficult because it conflicts with intrinsic motivation: somehow the agent needs to transition from intrinsic motivation (which is what drives the value learning in the first place) to the learned external motivation. Getting this right was difficult for biological evolution, and it seems far more difficult for future ANNs with a far more liquid RSO architecture. I have a half-written future post going deeper into this. I do think we can and should learn a great deal more about how altruism/empathy/value learning work in the brain.
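To make the two components concrete, here is a minimal toy sketch, not the mechanism I have in mind for the post, just an illustration under strong simplifying assumptions: the external agent is Boltzmann-rational over three abstract actions, component 1 is Bayesian inference over a small finite set of candidate utility functions (a stand-in for IRL), and component 2 is a hand-off rule where the agent's effective reward blends intrinsic and inferred-external reward weighted by posterior confidence. All names (`CANDIDATE_UTILITIES`, `INTRINSIC_REWARD`, the confidence-weighted hand-off) are hypothetical choices made purely for illustration.

```python
import math
import random

# Three abstract actions; "utilities" are fixed payoffs per action.
ACTIONS = ["a", "b", "c"]

# Component 1 hypothesis space: candidate utility functions the external
# agent might have (an assumed, deliberately tiny stand-in for IRL).
CANDIDATE_UTILITIES = {
    "likes_a": {"a": 1.0, "b": 0.0, "c": 0.0},
    "likes_b": {"a": 0.0, "b": 1.0, "c": 0.0},
    "likes_c": {"a": 0.0, "b": 0.0, "c": 1.0},
}

# The learner's intrinsic motivation (curiosity/novelty stand-in).
INTRINSIC_REWARD = {"a": 0.2, "b": 0.2, "c": 0.9}


def softmax_choice(utility, temperature=0.3):
    """Sample an action the way a noisily-rational external agent would."""
    weights = [math.exp(utility[a] / temperature) for a in ACTIONS]
    r = random.random() * sum(weights)
    for a, w in zip(ACTIONS, weights):
        r -= w
        if r <= 0:
            return a
    return ACTIONS[-1]


def update_posterior(posterior, observed_action, temperature=0.3):
    """Component 1: Bayesian update over candidate external utilities."""
    new_post = {}
    for name, utility in CANDIDATE_UTILITIES.items():
        weights = [math.exp(utility[a] / temperature) for a in ACTIONS]
        likelihood = math.exp(utility[observed_action] / temperature) / sum(weights)
        new_post[name] = posterior[name] * likelihood
    norm = sum(new_post.values())
    return {name: p / norm for name, p in new_post.items()}


def effective_reward(posterior):
    """Component 2: blend intrinsic and inferred-external reward.

    The hand-off weight is posterior confidence (max probability), so the
    agent runs on intrinsic motivation early and shifts toward the learned
    external values as inference converges. This rule is an assumption made
    only to make the "transition" concrete.
    """
    confidence = max(posterior.values())
    best = max(posterior, key=posterior.get)
    external = CANDIDATE_UTILITIES[best]
    return {
        a: (1 - confidence) * INTRINSIC_REWARD[a] + confidence * external[a]
        for a in ACTIONS
    }


if __name__ == "__main__":
    random.seed(0)
    true_external_utility = CANDIDATE_UTILITIES["likes_b"]
    posterior = {name: 1 / 3 for name in CANDIDATE_UTILITIES}

    for _ in range(20):
        observed = softmax_choice(true_external_utility)
        posterior = update_posterior(posterior, observed)

    print("posterior:", {k: round(v, 3) for k, v in posterior.items()})
    print("effective reward:", {k: round(v, 2) for k, v in effective_reward(posterior).items()})
    # The effective reward now tracks the inferred external utility ("b"),
    # even though intrinsic motivation alone favored "c".
```

The hard part the comment points at is exactly what this toy glosses over: in the sketch the hand-off is a hand-coded mixing weight, whereas in a real agent the intrinsic drives and the learned external values live in the same optimization process, so there is no clean dial to turn.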