We don’t know how to pick which quantity would be maximized by a would-be strong consequentialist maximizer
Yeah, so I think this is the crux of it. My point is that if we find some training approach that leads to a model that cares about the world itself rather than hacking some reward function, that’s a sign that we can in fact guide the model in important ways, and there’s a good chance this includes being able to tell it not to kill everyone.
We don’t know what a strong consequentialist maximizer would look like, if we had one around, because we don’t have one around (because if we did, we’d be dead)
This is just a way of saying “we don’t know what AGI would do”. I don’t think this point pushes us toward x-risk any more than it pushes us toward not-x-risk.
I think this goes to Matthew Barnett’s recent article arguing that actually, yes, we do. And regardless, I don’t think this point is a big part of Eliezer’s argument. https://www.lesswrong.com/posts/i5kijcjFJD6bn7dwq/evaluating-the-historical-value-misspecification-argument