… I’m not sure why I used the word “we” in the sentence you quoted. (Maybe I was thinking about a group of value-aligned agents? Maybe I was imagining that “reasonable reflection process” meant that we were in a post-scarcity world, everyone agreed that we should be doing reflection, everyone was already safe? Maybe I didn’t want the definition to sound like I would only care about what I thought and not what everyone else thought? I’m not sure.)
In any case, I think you can change that sentence to “whatever I decide based on some ‘reasonable’ reflection process is good”, and that’s closer to what I meant.
I am much more uncertain about multiagent interactions. Like, suppose we give every person access to a somewhat superintelligent AI assistant that is legitimately trying to help them. Are things okay by default? I lean towards yes, but I’m uncertain. I did read through those two articles, and I broadly buy the theses they advance; I still lean towards yes because:
Things have broadly become better over time, despite the effects that the articles above highlight. The default prediction is that they continue to get better. (And I very uncertainly think people from the past would agree, given enough time to understand our world?)
In general, we learn reasonably well from experience; we try things and they go badly, but then things get better as we learn from that.
Humans tend to be quite risk-averse at trying things, and groups of humans seem to be even more risk-averse. As a result, it seems unlikely that we try a thing that ends up having a “direct” existentially bad effect.
You could worry about an “indirect” existentially bad effect, along the lines of Moloch, where no single human’s optimization causes bad things to happen, but selection pressure causes problems. Selection pressure has existed for a long time and hasn’t caused an existentially bad outcome yet, so the default is that it won’t in the future.
Perhaps AI accelerates the rate of progress in a way where we can’t adapt fast enough, and this is why selection pressures can now cause an existentially bad effect. But this didn’t happen with the Industrial Revolution. (That said, I do find this more plausible than the other scenarios.)
But in fact I usually don’t aim to make claims about these sorts of scenarios; as I mentioned above, I’m more optimistic about social solutions (that being how we have solved such problems in the past).