You seem to assume that AGI will likely be designed to judge every action against a strict utility function. That is a very special kind of AGI design: one with a rigid utility function that the AGI then cares to satisfy exactly as it was initially hardcoded.
Hmm. Actually, I’m not making any assumptions about the AGI’s decision-making process (or at least I’m trying not to): it could have a formal utility function, but it could also have e.g. a more human-like system with various instincts that pull it in different directions, or pretty much any decision-making system that might be reasonable.
You make a good point that this probably needs to be clarified. Could you point out the main things that give the impression that I’m presuming utility-function-based decision making?
I am not sure what AGI designs exist, other than utility-function-based decision makers, where it would make sense to talk about “friendly” and “unfriendly” goal architectures. If we’re talking about behavior executors or AGI designs with malleable goals, then we’re talking about hardcoded tools in the former case and unpredictable systems in the latter case, no?
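To make the contrast under discussion concrete, here is a minimal Python sketch of the two kinds of decision procedure being talked about: a rigid utility maximizer versus a more human-like agent driven by several competing instincts. The action names, drives, and weights are invented purely for illustration and are not anyone’s proposed design.

```python
import random

# Sketch 1: a "strict utility-function" agent. It scores every candidate
# action with one hardcoded function and always picks the maximizer.
# (The numbers here are placeholders, not a real specification.)
def utility(action):
    return {"make_paperclips": 10, "help_humans": 3, "do_nothing": 0}[action]

def utility_maximizer(actions):
    return max(actions, key=utility)

# Sketch 2: a more "human-like" agent with several drives that pull it in
# different directions. There is no single fixed utility function; behavior
# emerges from competing drives whose weights shift with context.
def multi_drive_agent(actions, context_weights):
    drives = {
        "curiosity": lambda a: 1.0 if a == "explore" else 0.0,
        "safety":    lambda a: 1.0 if a == "do_nothing" else 0.2,
        "social":    lambda a: 1.0 if a == "help_humans" else 0.1,
    }

    def score(action):
        return sum(context_weights[d] * f(action) for d, f in drives.items())

    # A small amount of noise: the agent is not a strict argmax machine.
    return max(actions, key=lambda a: score(a) + random.uniform(0, 0.05))

actions = ["make_paperclips", "help_humans", "do_nothing", "explore"]
print(utility_maximizer(actions[:3]))
print(multi_drive_agent(actions, {"curiosity": 0.2, "safety": 0.5, "social": 0.3}))
```

The point of the sketch is only that questions about goals and their friendliness can be asked of both designs, even though the second one has no explicit, hardcoded utility function to inspect.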