On this 11th anniversary of the release of Friendship is Optimal, I’d like to remind everyone that it’s a piece of speculative fiction and was a product of its time. I’ve said this before in other venues, but Science Marches On and FiO did not predict how things have turned out. The world looks very different.
A decade ago, people speculated that AI would think symbolically and would try to maximize a utility function. Someone would write a Seed AI that would recursively self-improve its source code. And since value is complex and fragile, we were unlikely to get our specification of the utility function correct, and would create an agent that wanted things which conflicted with what we wanted. That’s possible because intelligence doesn’t imply that it would share our values. And the AI would want to disempower us because obtaining power is an instrumental goal of nearly all utility functions. And thus any AI has the incentive to become smarter than all humans and then bide its time until it suddenly disempowers us. You then end up with a cinematic universe filled with formal utility functions, systems which maximize one, formal decision theory, formal game theory, and emulation of other agents to try to figure out how they’ll respond to a given action.
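To make that picture concrete, here is a minimal sketch of what the “agent with a formal utility function” framing amounts to. This is my own illustration, not anything from FiO or from MIRI’s actual models; `actions`, `world_model`, and `utility` are hypothetical placeholders.

```python
# Minimal sketch of the classic "expected utility maximizer" picture described
# above. Illustrative only: actions, world_model, and utility are hypothetical
# placeholders, not components of any real system discussed here.

def expected_utility(action, world_model, utility):
    """Average utility over the outcome distribution the world model predicts."""
    return sum(prob * utility(outcome)
               for outcome, prob in world_model(action).items())

def choose_action(actions, world_model, utility):
    """Pick the action with the highest expected utility (the canonical agent loop)."""
    return max(actions, key=lambda a: expected_utility(a, world_model, utility))
```

The old worry was that everything hinges on writing `utility` down correctly; as noted below, none of today’s deployed systems are structured this explicitly.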
Nothing we have actually looks like this story! Nothing! None of the systems we’ve made have a utility function, at least in the sense of the traditional MIRI narrative! AlphaGo doesn’t have a utility function like that! GPT doesn’t have a utility function like that! None of these things are agents! Even AutoGPT isn’t an agent, in the traditional MIRI sense!
Who, a decade ago, thought that AI would think symbolically? I’m struggling to think of anyone. There was a debate on LW, though, around “cleanly designed” versus “heuristics-based” AIs: which might come first, and which one safety efforts should be focused on. (This was my contribution to it.)
If someone had followed this discussion, there would be no need for dramatic updates or admissions of wrongness, just a (more or less) smooth adjustment of one’s credences as subsequent observations came in, perhaps becoming increasingly pessimistic if one’s hope for AI safety rested mainly on actual AIs being “cleanly designed” (as Eliezer’s did). (I guess I’m a bit peeved that you single out an example of a “dramatic update” for praise, while not mentioning people who had appropriate uncertainty all along and updated constantly.)
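To make “updated constantly” concrete, here is a toy sketch (mine, not from this thread) of a single Bayesian credence update on the hypothesis “cleanly designed AIs come first”; the probabilities are made up purely for illustration.

```python
# Toy illustration of smoothly updating a credence with Bayes' rule.
# The hypothesis H is "cleanly designed AIs come first"; the numbers are made up.

def bayes_update(prior, p_obs_given_h, p_obs_given_not_h):
    """Posterior P(H | observation) after one piece of evidence."""
    numerator = prior * p_obs_given_h
    return numerator / (numerator + (1 - prior) * p_obs_given_not_h)

# An observation (say, a surprising deep-learning result) that is three times
# likelier if heuristics-based AIs come first pushes the credence down:
credence = bayes_update(prior=0.5, p_obs_given_h=0.1, p_obs_given_not_h=0.3)
print(credence)  # 0.25
```

Repeat that step for each new observation and you get the gradual slide in credence described above, rather than one dramatic reversal.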
In what sense doesn’t AlphaGo have a utility function? IIRC, at every step of self-play it explores potential scenarios weighted by how likely they are given that it follows its own expected-value estimates, and then when it actually plays it just picks the move with the highest expected value according to that experience.
It doesn’t have an explicitly factored utility function that it does entirely runtime reasoning about, though I think you’re right that TurnTrout is overestimating the degree of difference between AlphaGo and the original thing: just because it uses a policy to approximate the results of the search doesn’t mean it isn’t effectively modeling the shape of the reward function. It’s definitely not the same as a strictly defined utility function as originally envisioned, though. Of course, we can talk about whether policies imply utility functions; that’s a different question, and I don’t see any reason to expect otherwise. But then, I was one of the people who jumped on deep learning pretty early and thought people were fools to be surprised that AlphaGo was at all strong (though admittedly I lost a bet that it would lose to Lee Sedol).
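To make the distinction concrete, here is a toy sketch (my own, not DeepMind’s code) of the kind of move selection being described: a learned policy prior and a learned value estimate stand in for any explicit utility function. `policy_net`, `value_net`, and `state.play` are hypothetical stand-ins, and the real system refines these estimates with Monte Carlo tree search rather than this one-step lookup.

```python
# Toy sketch of policy/value-guided move selection. Not AlphaGo's actual code:
# the real system refines these learned estimates with Monte Carlo tree search.
# policy_net, value_net, legal_moves, and state.play are hypothetical stand-ins.

def choose_move(state, legal_moves, policy_net, value_net):
    """Score moves by learned prior times learned win-probability of the result.
    No explicitly factored utility function appears anywhere at runtime."""
    def score(move):
        prior = policy_net(state, move)        # learned "intuition" for the move
        value = value_net(state.play(move))    # learned estimate of winning from there
        return prior * value
    return max(legal_moves(state), key=score)
```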
Condolences :( I often try to make money off future knowledge, only to lose to precise timing or some other specific detail.
I wonder why I missed deep learning. Idk whether I was wrong to, actually. It obviously isn’t AGI. It still can’t do math and so it still can’t check its own outputs. It was obvious that symbolic reasoning was important. I guess I didn’t realize the path to getting my “dreaming brainstuff” to write proofs well would be long, spectacular and profitable.
Hmm, given the way humans’ utility function is shattered and strewn across a bunch of different behaviors that don’t talk to each other, I wonder if that will always happen in ML too (at least until symbolic reasoning arrives and training happens in its presence).
Excellent retrospective/update. I’m intimately familiar with the emotional difficulty of changing your mind and admitting you were wrong.
So from today’s perspective, Friendship is Optimal is a story of someone starting GPT-9 with a silly prompt?
With the additional assumption that GPT-8s weren’t strong or useful enough to build a world where GPT-9 couldn’t go singleton, or where the evals on GPT-9 were good enough to notice it was deceptively aligned or attempting rhetoric hacking.