[habryka] The way humans think about the question of “preferences for weak agents” and “kindness” feels like the kind of thing that will come apart under extreme optimization, in a similar way to how I expect the idea that “having a continuous stream of consciousness with a good past and good future is important” to come apart once humans can make copies of themselves, change their memories, instantiate slightly changed versions of themselves, etc.
I think there will be options that are good under most of the things that “preferences for weak agents” would likely come apart into under close examination. If you’re trying to fulfill the preferences of fish, you might argue about whether the exact thing you should care about is maximizing their hedonic state vs ensuring that they exist in an ecological environment which resembles their niche vs minimizing “boundary-crossing actions”… but you can probably find an action that is better than “kill the fish” by all of those possible metrics.
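To make the shape of that argument concrete, here is a toy Pareto-dominance check in Python. Everything in it (the candidate metrics, the actions, the scores) is made up for illustration; the only point is that when one option beats another under every plausible disambiguation of the preference, the dispute over which disambiguation is “really” right doesn’t change the verdict.

```python
# Toy sketch: candidate interpretations of "fish preferences", each scoring
# an action. The actions and numbers are invented purely for illustration.

def hedonic_score(action):    # "maximize their hedonic state"
    return {"kill the fish": 0.0, "maintain the reef": 0.9, "pave the ocean": 0.1}[action]

def niche_score(action):      # "keep them in something like their ecological niche"
    return {"kill the fish": 0.0, "maintain the reef": 1.0, "pave the ocean": 0.0}[action]

def boundary_score(action):   # "minimize boundary-crossing actions"
    return {"kill the fish": 0.0, "maintain the reef": 0.8, "pave the ocean": 0.2}[action]

METRICS = [hedonic_score, niche_score, boundary_score]

def dominates(a, b, metrics=METRICS):
    """True if action `a` is at least as good as `b` under every candidate
    metric, and strictly better under at least one."""
    scores = [(m(a), m(b)) for m in metrics]
    return all(sa >= sb for sa, sb in scores) and any(sa > sb for sa, sb in scores)

if __name__ == "__main__":
    # "maintain the reef" beats "kill the fish" under every candidate metric,
    # so the argument about which metric is the right one doesn't matter here.
    print(dominates("maintain the reef", "kill the fish"))  # True
```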
I think that some people have an intuition that any future agent must pick exactly one utility function over the physical configuration of matter in the universe, and that any agent with a deontological constraint like “don’t take any actions which are 0.00001% better under my current interpretation of my utility function but which are horrifyingly bad to every other agent” will be outcompeted in the long term. I personally don’t see it, and in particular I don’t see how there’s an available slot for an arbitrary outcome-based utility function that is not “reproduce yourself at all costs” but no available slot for process-based preferences like “and don’t be an asshole for minuscule gains while doing that”.
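In the same illustrative spirit, here is a toy sketch (invented thresholds, utilities, and action names, not a real proposal) of how a process-level constraint like that composes with an arbitrary outcome-based utility function: the agent still maximizes its utility, it just skips actions whose edge over the next-best alternative is tiny while their cost to everyone else is severe.

```python
# Toy sketch: a "don't be an asshole for minuscule gains" filter on top of an
# arbitrary outcome-based utility. All numbers below are made up.

TINY_GAIN = 0.001          # stand-in for a "0.00001% better" sized margin
SEVERE_HARM = -1_000_000.0 # stand-in for "horrifyingly bad to every other agent"

def choose(actions, my_utility, others_utility):
    """Maximize my_utility, but skip any action whose edge over the next-best
    alternative is tiny while its cost to others is severe."""
    ranked = sorted(actions, key=my_utility, reverse=True)
    for i, action in enumerate(ranked):
        alternatives = ranked[i + 1:]
        if not alternatives:
            return action
        edge = my_utility(action) - my_utility(alternatives[0])
        if edge <= TINY_GAIN and others_utility(action) <= SEVERE_HARM:
            continue  # the deontological clause: not worth it for this little
        return action

if __name__ == "__main__":
    actions = ["grab the fish's atoms", "route around the fish"]
    my_u = {"grab the fish's atoms": 100.0001, "route around the fish": 100.0}
    others_u = {"grab the fish's atoms": -1e9, "route around the fish": 0.0}
    print(choose(actions, my_u.get, others_u.get))  # "route around the fish"
```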