(Btw, is this point in any of the papers? Do people agree it should be?)
Please clarify: Do you mean that since even a slow-takeoff AGI will eventually explode and become unfriendly by default, we have to work on FAI theory whether there will be a fast or a slow takeoff?
Yes, that seems straightforward, though I don’t know if it has been said explicitly.
But the question is whether we should also work on other approaches as stopgaps, whether during a slow takeoff or before a takeoff begins.
Statements to the effect that it’s necessary to argue that hard takeoff is probable/possible in order to motivate FAI research appear regularly; even your post left that same impression. I don’t think it’s particularly relevant, so having this argument written up somewhere might be useful.
since even a slow-takeoff AGI will eventually explode
It doesn’t need to explode; gradual growth into a global power strong enough to threaten humans is sufficient. With WBE value drift, there doesn’t even need to be any conflict or any AGI; humanity as a whole might lose its original values.
Statements to the effect that it’s necessary to argue that hard takeoff is probable/possible in order to motivate FAI research appear regularly; even your post left that same impression.
No, I didn’t want to give that impression. SI’s research direction is the most important one, regardless of whether we face a fast or slow takeoff. The question raised was whether other approaches are needed too.
The latter is not necessarily a bad thing though.
It is a bad thing, in the sense that “bad” is whatever I (normatively) value less than the other available alternatives, and value-drifted WBEs won’t be optimizing the world in a way that I value. The property of valuing the world in a different way, and correspondingly of optimizing the world in a different direction which I don’t value as much, is the “value drift” I’m talking about. In other words, if it’s not bad, there isn’t much value drift; and if there is enough value drift, it is bad.
You’re right in the sense that we’d like to avoid it, but if it occurs gradually, it feels much more like “we just changed our minds” (we definitely don’t value “honor” as much as the ancient Greeks did, etc.), as compared to “we and our values were wiped out”.
The problem is not with “losing our values”; it’s about the future being optimized toward something other than our values. The details of the process that leads to the incorrectly optimized future are immaterial; it’s the outcome that matters. When I say “our values”, I’m referring to a fixed idea, one that doesn’t depend on what happens in the future; in particular, it doesn’t depend on whether there are people with these or different values in the future.
I think one reason why people (including me, in the past) have difficulty accepting the way you present this argument is that you’re speaking in overly abstract terms, while many of the values we’d actually like to preserve are ones we appreciate most when we consider them in “near” mode. It might work better if you gave concrete examples of ways in which there could be a catastrophic value drift, like naming Bostrom’s all-work-and-no-fun scenario where
what will maximize fitness in the future will be nothing but non-stop high-intensity drudgery, work of a drab and repetitive nature, aimed at improving the eighth decimal of some economic output measure
or some similar example.