Possible post on suspicious multidimensional pessimism:
I think MIRI people (specifically Soares and Yudkowsky but probably others too) are more pessimistic than the alignment community average on several different dimensions, both technical and non-technical: morality, civilizational response, takeoff speeds, probability of easy alignment schemes working, and our ability to usefully expand the field of alignment. Some of this is implied by technical models, and MIRI is not more pessimistic in every possible dimension, but it’s still awfully suspicious.
I strongly suspect that one of the following is true:
1. The MIRI “optimism dial” is set too low.
2. Everyone else’s “optimism dial” is set too high. (Yudkowsky has said this multiple times in different contexts.)
3. There are common generators beyond MIRI’s models that I don’t know about, which are not just an “optimism dial”.
I’m only going to actually write this up if there is demand; the full post will have citations which are kind of annoying to find.
After working at MIRI (loosely advised by Nate Soares) for a while, I now have more nuanced views and also takes on Nate’s research taste. It seems kind of annoying to write up so I probably won’t do it unless prompted.
Edit: this is now up
I would be genuinely curious to hear your more nuanced views and takes on Nate’s research taste. This is really quite interesting to me, and even a single paragraph would be valuable!
I really want to see the post on multidimensional pessimism.
As for why, I’d argue 1 is happening.
A good example of 1 is FOOM probabilities. I think MIRI hasn’t updated on the evidence that FOOM is likely impossible for classical computers, which ought to lower their FOOM probability to roughly the chance that quantum/reversible computers appear.
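To spell out the update being suggested, here is a rough total-probability sketch; the decomposition and the labels (“classical only”, “quantum/reversible appears”) are my own illustration, not MIRI’s math or the commenter’s.

```latex
% Rough sketch of the suggested update (my own decomposition, assumed for illustration).
% If FOOM is (nearly) impossible on classical hardware, P(FOOM) ends up bounded by the
% chance that quantum/reversible hardware shows up at all.
\begin{align*}
P(\mathrm{FOOM})
  &= P(\mathrm{FOOM} \mid \text{classical only})\, P(\text{classical only}) \\
  &\quad + P(\mathrm{FOOM} \mid \text{quantum/reversible appears})\, P(\text{quantum/reversible appears}) \\
  &\le \underbrace{P(\mathrm{FOOM} \mid \text{classical only})}_{\approx\, 0 \text{ (premise above)}}
       \;+\; P(\text{quantum/reversible appears}).
\end{align*}
```

On that premise the first term is near zero, so the FOOM estimate should sit no higher than the probability that such hardware arrives.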
Another good example is the emphasis on pivotal acts like “burn all GPUs.” I think MIRI puts too much probability mass on such an act being necessary, primarily because they are biased by fiction, where problems must be solved by heroic acts, while in the real world more boring things are what’s needed. In other words, it’s too exciting, which should be suspicious.
However, that doesn’t mean alignment is much easier. We can still fail; there’s no rule that says we make it through. My claim is just that MIRI is systematically irrational here about doom probabilities and alignment.
Edit: I now think alignment is way, way easier than my past self did, so I disendorse the sentence “However, that doesn’t mean alignment is much easier.”
What constitutes pessimism about morality, and why do you think that one fits Eliezer? He certainly appears more pessimistic across a broad area, and has hinted at concrete arguments for being so.
Value fragility / value complexity: how close do you need to get to human values to get 50% of the value of the universe, and how complicated must the representation be? Also, in the past orthogonality was a point of disagreement, but it’s now widely believed.
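To make the first question concrete, here is one hedged way to formalize it; the notation (U*, the approximation Û, the policies π, the retention ratio R, the distance d) is my own sketch, not how Yudkowsky or Soares put it.

```latex
% A sketch of the fragility-of-value question (my notation, assumed for illustration).
% U* is idealized human value, \hat{U} an approximation, \pi_U the policy that optimizes U,
% and d(.,.) some distance between value representations.
\[
  R(d) \;=\; \min_{\hat{U} \,:\, d(\hat{U},\, U^{*}) \,\le\, d}\;
             \frac{\mathbb{E}\big[\, U^{*}(\pi_{\hat{U}}) \,\big]}
                  {\mathbb{E}\big[\, U^{*}(\pi_{U^{*}}) \,\big]}
\]
% The 50% question asks how small d must be before R(d) >= 1/2; the complexity question asks
% how many bits it takes to write down a \hat{U} that achieves it.
```

Value fragility is then the claim that R(d) falls off steeply in d, and value complexity the claim that hitting a small d requires a long description.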
I think the distance from human values or the complexity of values is not a crux, as the web/books corpus overdetermines them in great detail (for corrigibility purposes). It’s mostly about alignment by default: whether human values in particular can be noticed in there, or whether correctly specifying how to find them is much harder than finding some other deceptively human-value-shaped thing. If they can be found easily once there are tools to go looking for them at all, then it doesn’t matter how complex they are or how important it is to get everything right; that happens by default.
But there is also a pervasive assumption that values can be formulated in closed form, as tractable finite data, which occasionally fuels arguments. Value is said to be complex, but of finite complexity. In an open environment this doesn’t need to be the case: a code/data distinction is only salient when we can draw important conclusions by looking only at the code and not at the data. In an open environment, data is unbounded and can’t be demonstrated all at once. So it doesn’t make much sense to talk about the complexity of values at all; without corrigibility, alignment can’t work out anyway.
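As a hedged rendering of the contrast being drawn here (the symbols p, x, f, and the eval/stream framing are mine and purely illustrative):

```latex
% My illustrative rendering of the closed-form assumption vs. the open-environment view.
% Closed form: values are fixed by a finite description p, so their complexity is bounded
% by |p| and everything relevant can, in principle, be read off the "code".
\[
  V \;=\; \mathrm{eval}(p), \qquad K(V) \,\le\, |p| \,<\, \infty .
\]
% Open environment: what the values say at time t also depends on an unbounded data stream
% x_{1:t}, so no finite description alone pins them down and the code/data split blurs.
\[
  V_t \;=\; f\!\left(p,\, x_{1:t}\right), \qquad t \;\text{unbounded}.
\]
```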
See, MIRI in the past has sounded dangerously optimistic to me on that score. While I thought EY sounded more sensible than the people pushing genetic enhancement of humans, it’s only now that I find his presence reassuring, thanks in part to the ongoing story he’s been writing. Otherwise I might be yelling at MIRI to be more pessimistic about fragility of value, especially with regard to people who might wind up in possession of a corrigible ‘Tool AI’.
I’d be very interested in a write-up, especially if you have receipts for pessimism that seems poorly calibrated, e.g. based on evidence contrary to prior predictions.
I think they Pascal’s-mugged themselves, and being able to prove they were wrong efficiently would be helpful.