In what sense doesn’t AlphaGo have a utility function? IIRC, at every step of self-play it explores potential scenarios, weighted by how likely they are if it keeps following its own expected-value estimates, and then when it actually plays it just follows the expected values learned from that experience.
It doesn’t have an explicitly factored utility function that it does entirely runtime reasoning about. That said, I think you’re right that TurnTrout is overestimating the degree of difference between AlphaGo and the original conception: just because it uses a policy to approximate the results of the search doesn’t mean it isn’t effectively modeling the shape of the reward function. It’s definitely not the same as a strictly defined utility function as originally envisioned, though. Whether policies imply utility functions is a separate question, and I don’t see any reason to expect otherwise. But then, I was one of the people who jumped on deep learning pretty early and thought people were fools to be surprised that AlphaGo was at all strong (though admittedly I lost a bet that it would lose to Lee Sedol).
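To gesture at what I mean by the policy approximating the search: in the AlphaGo Zero / AlphaZero setup, MCTS picks moves by an expected-value-plus-exploration rule, and the policy network is then trained to imitate the visit-count distribution the search produces. A minimal sketch of those two pieces (the names `puct_score` and `mcts_policy_target` are mine for illustration, not DeepMind’s API):

```python
import numpy as np

def puct_score(prior, value_estimate, parent_visits, child_visits, c_puct=1.5):
    """PUCT selection rule: exploit the value estimate, but explore moves
    where the policy prior is high and visits are still few."""
    exploration = c_puct * prior * np.sqrt(parent_visits) / (1 + child_visits)
    return value_estimate + exploration

def mcts_policy_target(visit_counts, temperature=1.0):
    """Turn root visit counts into the distribution the policy head is
    trained to match: pi(a) proportional to N(a)^(1/temperature)."""
    counts = np.asarray(visit_counts, dtype=np.float64) ** (1.0 / temperature)
    return counts / counts.sum()

# Toy example: the search sharpens a raw prior into the target the
# policy learns to imitate, which is why the trained policy ends up
# encoding the shape of what the search (and hence the reward) favors.
priors = np.array([0.5, 0.3, 0.2])
visits = np.array([70, 25, 5])
print(mcts_policy_target(visits))  # -> roughly [0.70, 0.25, 0.05]
```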
Condolences :( I often try to make money off future knowledge only to lose to precise timing or some other specific detail.
I wonder why I missed deep learning. Idk whether I was wrong to, actually. It obviously isn’t AGI. It still can’t do math, and so it still can’t check its own outputs. It was obvious that symbolic reasoning was important. I guess I just didn’t realize the path to getting my “dreaming brainstuff” to write proofs well would be so long, spectacular, and profitable.
Hmm, given the way humans’ utility function is shattered and strewn across a bunch of different behaviors that don’t talk to each other, I wonder if that will always happen in ML too (at least until symbolic reasoning arrives and training happens in its presence).