So a powerful agent (or a mass of tiny agents with large total power) needs a different utility function on future worlds than that of a lone rationalist observer, due to the need to avoid exploits. Well… which should I pick, then?
Looks like we’ve run into another of those nasty recursive problems: I choose my utility function depending on what every other agent could do to exploit me, and everyone else does the same. The only natural solution might well turn out to be everyone caring about their own welfare and no one else’s, to avoid “mugging by suffering”. Let’s model the problem mathematically and look for other solutions—I love this stuff.
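To make the worry concrete, here is a minimal toy model of the "mugging by suffering" exploit (my own construction, not anything stated in the thread): an extortionist threatens to create suffering S unless the agent pays a cost c, and the agent simply compares the expected utilities of paying and refusing. The only thing that varies is how much weight the agent's utility function puts on others' suffering; all numbers are illustrative.

```python
# Toy model of "mugging by suffering" (illustrative assumptions, not from the thread):
# an extortionist threatens to create suffering S unless the agent pays cost c.
# `altruism_weight` is how much the agent's utility function counts others'
# suffering; all numbers below are made up.

def eu_pay(cost):
    # Paying: lose the cost, the threat is not carried out.
    return -cost

def eu_refuse(altruism_weight, suffering, p_carry_out):
    # Refusing: with probability p_carry_out the suffering is created,
    # which only registers for an agent whose utility function counts it.
    return -altruism_weight * p_carry_out * suffering

def pays_up(cost, altruism_weight, suffering, p_carry_out):
    # A naive expected-utility maximizer pays whenever paying looks better.
    return eu_pay(cost) > eu_refuse(altruism_weight, suffering, p_carry_out)

# A purely self-regarding agent (weight 0) never pays; the altruist pays
# whenever weight * p * S exceeds the cost -- the pressure toward
# "everyone caring about their own welfare and no one else's".
print(pays_up(cost=10, altruism_weight=0.0, suffering=1000, p_carry_out=0.5))  # False
print(pays_up(cost=10, altruism_weight=1.0, suffering=1000, p_carry_out=0.5))  # True
```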
No, it needs a different method of maximizing expected utility, not a different utility function. Avoiding moral sabotage doesn't reflect a preference; it's purely instrumental.
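One way to cash this out (a sketch under my own assumptions, not necessarily what the reply has in mind): hold the altruistic utility function fixed and change only the decision procedure. The naive procedure compares the two acts after the threat has been made; the alternative compares whole policies, under the further assumption that the extortionist only issues threats against agents it predicts will pay.

```python
# Same altruistic utility function, two maximization methods (a sketch under
# assumed conditions). Extra assumption: the extortionist only issues its
# threat against an agent whose policy it predicts is "pay"; numbers are illustrative.

ALTRUISM_WEIGHT = 1.0
COST = 10
SUFFERING = 1000
P_CARRY_OUT = 0.5

def utility(paid, suffering_created):
    # One fixed utility function, shared by both methods.
    return -(COST if paid else 0) - ALTRUISM_WEIGHT * (SUFFERING if suffering_created else 0)

def act_after_threat():
    # Naive method: once the threat is on the table, compare the two acts.
    eu_pay = utility(paid=True, suffering_created=False)
    eu_refuse = P_CARRY_OUT * utility(paid=False, suffering_created=True)
    return "pay" if eu_pay > eu_refuse else "refuse"

def policy_eu(policy):
    # Policy-level method: a predicted refuser is never threatened at all,
    # so nothing is paid and no suffering is created.
    if policy == "pay":
        return utility(paid=True, suffering_created=False)
    return utility(paid=False, suffering_created=False)

print(act_after_threat())                     # "pay"    -> exploitable
print(max(["pay", "refuse"], key=policy_eu))  # "refuse" -> not exploitable
```

On that assumption the policy of refusing scores higher than paying, so the same preferences stop being exploitable without being replaced by selfish ones.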
Thanks, this clicked.