Oddly enough, I meant pretty much the same thing you did: a perfectly self-aware agent understands its own implementation so well that it would be able to implement it from scratch. I find your definition very clear. But I’ll taboo the term for now.
Ok we have some major confusion here. I just provided a mathematical example for why it will be generally a bad idea to change your utility function...
I think you have provided an example for why, given a utility function F0(action) , the return value of F0(change F0 to F1) is very low. However, F1(change F0 to F1) is probably quite high. I argue that an agent who can examine its own implementation down to minute details (in a way that we humans cannot) would be able to compare various utility functions, and then pick the one that gives it the most utilons (or however you spell them) given the physical constraints it has to work with. We humans cannot do this because a). we can’t introspect nearly as well, b). we can’t change our utility functions even if we wanted to, and c). one of our terminal goals is, “never change your utility function”. A non-human agent would not necessarily possess such a goal (though it could).
Typically, the reason you wouldn’t change your utility function is that you’re not trying to “get utilons”, you’re trying to maximize F0 (for example), and that won’t happen if you change yourself into something that maximizes a different function.
Ok, let’s say you’re a super-smart AI researcher who is evaluating the functionality of two prospective AI agents, each running in its own simulation (naturally, they don’t know that they’re running in a simulation, but believe that their worlds are fully real).
Agent A cares primarily about paperclips; it spends all its time building paperclips, figuring out ways to make more paperclips faster, etc. Agent B cares about a variety of things, such as exploration, or jellyfish, or black holes or whatever—but not about paperclips. You can see the utility functions for both agents, and you could evaluate them on your calculator given a variety of projected scenarios.
At this point, would you—the AI researcher—be able to tell which agent was happier, on the average ? If not, is it because you lack some piece of information, or because the two agents cannot be compared to each other in any meaningful way, or for some other reason ?
Huh. It’s not clear to me that they’d have something equivalent to happiness, but if they did I might be able to tell. Even if they did, though, they wouldn’t necessarily care about happiness, unless we really screwed up in designing it (like evolution did). Even if it was some sort of direct measure of utility, it’d only be a valuable metric insofar as it reflected F0.
It seems somewhat arbitrary to pick “maximize the function stored in this location” as the “real” fundamental value of the AI. A proper utility maximizer would have “maximize this specific function”, or something. I mean, you could just as easily say that the AI would reason “hey, it’s tough to maximize utility functions, I might as well just switch from caring about utility to caring about nothing, that’d be pretty easy to deal with.”
Oddly enough, I meant pretty much the same thing you did: a perfectly self-aware agent understands its own implementation so well that it would be able to implement it from scratch. I find your definition very clear. But I’ll taboo the term for now.
I think you have provided an example for why, given a utility function F0(action) , the return value of F0(change F0 to F1) is very low. However, F1(change F0 to F1) is probably quite high. I argue that an agent who can examine its own implementation down to minute details (in a way that we humans cannot) would be able to compare various utility functions, and then pick the one that gives it the most utilons (or however you spell them) given the physical constraints it has to work with. We humans cannot do this because a). we can’t introspect nearly as well, b). we can’t change our utility functions even if we wanted to, and c). one of our terminal goals is, “never change your utility function”. A non-human agent would not necessarily possess such a goal (though it could).
Typically, the reason you wouldn’t change your utility function is that you’re not trying to “get utilons”, you’re trying to maximize F0 (for example), and that won’t happen if you change yourself into something that maximizes a different function.
Ok, let’s say you’re a super-smart AI researcher who is evaluating the functionality of two prospective AI agents, each running in its own simulation (naturally, they don’t know that they’re running in a simulation, but believe that their worlds are fully real).
Agent A cares primarily about paperclips; it spends all its time building paperclips, figuring out ways to make more paperclips faster, etc. Agent B cares about a variety of things, such as exploration, or jellyfish, or black holes or whatever—but not about paperclips. You can see the utility functions for both agents, and you could evaluate them on your calculator given a variety of projected scenarios.
At this point, would you—the AI researcher—be able to tell which agent was happier, on the average ? If not, is it because you lack some piece of information, or because the two agents cannot be compared to each other in any meaningful way, or for some other reason ?
Huh. It’s not clear to me that they’d have something equivalent to happiness, but if they did I might be able to tell. Even if they did, though, they wouldn’t necessarily care about happiness, unless we really screwed up in designing it (like evolution did). Even if it was some sort of direct measure of utility, it’d only be a valuable metric insofar as it reflected F0.
It seems somewhat arbitrary to pick “maximize the function stored in this location” as the “real” fundamental value of the AI. A proper utility maximizer would have “maximize this specific function”, or something. I mean, you could just as easily say that the AI would reason “hey, it’s tough to maximize utility functions, I might as well just switch from caring about utility to caring about nothing, that’d be pretty easy to deal with.”