I do explicitly flag the loss of control over the future in that same sentence.
In your initial comment you talked a lot about AI respecting the preferences of weak agents (spending 1/trillion of its resources on them), which implies handing a lot of resources back to humans. From the selfish or scope-insensitive perspective of a typical human, that probably seems almost as good as never losing control in the first place.
I don’t think the much worse outcomes are closely tied to unaligned AI in particular, so they don’t seem very relevant to my comment or to Nate’s post.
If people think that (conditional on unaligned AI) everyone dies in 50% of worlds, and the other 50% of worlds typically look like small utopias where existing humans get to live out long and happy lives (because of 1/trillion kindness), then they will naturally conclude that aligned AI can only be better than that. So even if s-risks apply almost equally to aligned and unaligned AI, I still want people to mention them when talking about unaligned AI, or take some other measure to ensure that readers aren’t misled in this way.
(It could be that I’m just worrying too much here, and that empirically people who read your top-level comment won’t come away with the impression that close to 50% of worlds with unaligned AI will look like small utopias. If that’s what you think, I guess we could try to find out, or just leave the discussion here.)
“where is the upside to the AI from spite during training?”
Maybe the AI develops it naturally from multi-agent training (intended to make the AI more competitive in the real world), or the AI developer tries to train some kind of morality (e.g., a sense of fairness or justice) into the AI and spite emerges as a side effect.