Yudkowsky & I would of course agree that that is very improbable. It’s just an example.
The point I was making with this quote is that the question you are asking is a Big Old Unsolved Problem in the literature. If we had any idea what sort of utility function the system would end up with, that would be great and an improvement over the status quo. Yudkowsky’s point in the quote is that it’s a complicated multi-step process we currently don’t have a clue about; it’s not nearly as simple as “the system will maximize reward.” A much better story would be “The system will maximize some proxy, which will gradually evolve via SGD to be closer and closer to reward, but at some point it’ll get smart enough to go for reward directly for instrumental-convergence reasons, and at that point its proxy goal will crystallize.” But this story is also way too simplistic. And it doesn’t tell us much at all about what the proxy will actually look like, because so much depends on the exact order in which various things are learned.
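To make the “proxy tracks reward during training, then behaves differently off-distribution” piece of that story slightly more concrete, here is a toy sketch of my own (not anything from Yudkowsky’s quote, and only a loose analogy: it’s plain supervised regression, with nothing modeling crystallization or instrumental convergence). A linear model is trained by SGD to predict reward from two made-up features, `cause` (which actually determines reward) and `proxy` (which merely correlates with it during training); partway through training the model still leans on the proxy, which is harmless on-distribution and costly once the correlation is broken.

```python
# Toy sketch of "the proxy tracks reward during training, then diverges."
# Only an analogy: supervised regression, not RL; all names and numbers
# here are my own invention for illustration.

import numpy as np

rng = np.random.default_rng(0)

def make_data(n, mix):
    """True reward depends only on `cause`; `proxy` = mix*cause + (1-mix)*noise."""
    cause = rng.normal(size=n)
    noise = rng.normal(size=n)
    proxy = mix * cause + (1 - mix) * noise
    reward = cause                       # ground-truth reward signal
    features = np.stack([cause, proxy], axis=1)
    return features, reward

# Training distribution: the proxy is almost a perfect stand-in for the cause.
X_train, r_train = make_data(10_000, mix=0.95)

# Plain SGD on squared error between predicted and observed reward.
w = np.zeros(2)
lr = 0.01
for _ in range(5_000):
    i = rng.integers(len(X_train))
    err = X_train[i] @ w - r_train[i]
    w -= lr * 2 * err * X_train[i]

print("learned weights [cause, proxy]:", np.round(w, 2))

train_mse = np.mean((X_train @ w - r_train) ** 2)
X_test, r_test = make_data(10_000, mix=0.0)    # deployment: correlation broken
test_mse = np.mean((X_test @ w - r_test) ** 2)
print(f"on-distribution MSE:  {train_mse:.3f}")
print(f"off-distribution MSE: {test_mse:.3f}")
```

With these settings the weights end up splitting credit between the two features rather than converging to the “true” [1, 0], so the off-distribution error is much larger than the (near-zero) on-distribution error. Run it much longer and the proxy weight slowly washes out, which is the “gradually evolve via SGD to be closer and closer to reward” part; the point of the sketch is just how slowly that happens when the proxy is a near-perfect stand-in during training.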
I should have made it just a comment, not an answer.
“because so much depends on the exact order in which various things are learned.”
I actually doubt that claim in its stronger forms. I think there’s some substantial effect, but e.g. whether a child loves their family doesn’t depend strongly on the precise curriculum in grade school.
Yet whether a child grows up to work on x-risk reduction vs. homeless shelters vs. voting Democrats out of office vs. voting Republicans out of office does often depend on the precise curriculum in college+high school.
(I think we are in agreement here. I’d be interested to hear if you can point to any particular value AGI will probably have, or (weaker) any particular value such that, if AGI does end up with it, that fact doesn’t depend strongly on the curriculum, the order in which concepts are learned, etc.)