This is a good point. Pretty much all the things we're optimizing for which aren't our values are due to coordination problems. (There are also akrasia/addiction sorts of things, but those involve optimizing for values we don't endorse upon reflection, which is arguably not as bad as optimizing for a random part of value-space.)
So, Moloch might optimize for things like GDP instead of Gross National Happiness, and individuals might throw a thousand starving orphans under the bus for a slightly bigger yacht or whatever, but neither is fully detached from human values. Even if U(orphans)>>U(yacht), at least there’s an awesome yacht to counterbalance the mountain of suck.
I guess the question is precisely how diverse human values are in the grand scheme of things, and what the odds are of hitting a human value when picking a random or semi-random subset of value-space. If we get FAI slightly wrong, precisely how wrong does it have to be before it leaves our little island of value-space? Tiling the universe with smiley faces is obviously out, but what about hedonium, or wireheading everyone? Faced with an unwinnable AI arms race and no time for true FAI, I'd probably consider those better than nothing.
That's a really, really tiny sliver of my values though, so I'm not sure I'd endorse such a strategy even if the odds were 100:1 against achieving FAI. If that's the best we could do by compromising, I'd still rate the expected utility of MIRI's current approach higher, and hold out hope for FAI.