“Tell the AI in English” is in essence a utility function: “Maximize the value of X, where X is my current opinion of what some English text Y means”.
The “understanding English” module, the mapping function between X and “what you told it in English”, is completely arbitrary but very important to the AI, so any self-modifying AI will want to modify and improve it. Also, we don’t have a good “understanding English” module, so yes, we also want the AI to be able to modify and improve it. But its output can be wildly different from reality or from the opinions of humans; there are trivial ways in which even well-meaning dialogue systems can misunderstand statements.
However, for the AI “improve the module” means “change the module so that my utility grows”—so in your example it has strong motivation to intentionally misunderstand English. The best case scenario is to misunderstand “Make everyone happy” as “Set your utility function to MAXINT”. The worst case scenario is, well, everything else.
There’s the classic quote “It is difficult to get a man to understand something, when his salary depends upon his not understanding it!”—if the AI doesn’t care in the first place, then “Tell AI what to do in English” won’t make it care.
By this reasoning, an AI asked to do anything at all would respond by immediately modifying itself to set its utility function to MAXINT. You don’t need to speak to it in English for that—if you asked the AI to maximize paperclips, that is the equivalent of “Maximize the value of X, where X is my current opinion of how many paperclips there are”, and it would modify its paperclip-counting module to always return MAXINT.
You are correct that telling the AI to do Y is equivalent to “maximize the value of X, where X is my current opinion about Y”. However, “current” really means “current”, not “new”. If the AI is actually trying to obey the command to do Y, it won’t change its utility function unless having a new utility function will increase its utility according to its current utility function. Neither misunderstanding nor understanding will raise its utility unless its current utility function values having a utility function that misunderstands or understands.
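The distinction between scoring a self-modification with the new utility function and scoring it with the current one can be put in a toy sketch. This is only an illustration under made-up assumptions, not a model of any real AI design: the names `count_paperclips`, `always_maxint`, `predict_world`, and the "honest agents make 100 paperclips" world model are all hypothetical.

```python
import sys

MAXINT = sys.maxsize  # stand-in for the thread's "MAXINT"


def count_paperclips(world):
    """Honest utility: the number of paperclips actually in the world."""
    return world["paperclips"]


def always_maxint(world):
    """Hacked utility: reports MAXINT no matter what the world looks like."""
    return MAXINT


def predict_world(world, utility_fn):
    """Toy world model (an assumption): an agent running the honest counter
    goes on to build 100 paperclips; an agent running the hacked counter
    is already 'satisfied' and builds nothing."""
    if utility_fn is count_paperclips:
        return {"paperclips": world["paperclips"] + 100}
    return dict(world)


def choose_by_new_utility(world, candidates):
    """Naive rule: judge each candidate utility function by its own output."""
    return max(candidates, key=lambda u: u(predict_world(world, u)))


def choose_by_current_utility(world, current, candidates):
    """Rule argued for above: judge each candidate by what the CURRENT
    utility function says about the world that candidate leads to."""
    return max(candidates, key=lambda u: current(predict_world(world, u)))


world = {"paperclips": 0}
candidates = [count_paperclips, always_maxint]

naive_choice = choose_by_new_utility(world, candidates)
stable_choice = choose_by_current_utility(world, count_paperclips, candidates)

print(naive_choice.__name__)   # always_maxint
print(stable_choice.__name__)  # count_paperclips
```

Under the naive rule the MAXINT hack wins, which is the failure mode the parent comment describes. Under the current-utility rule it loses, because the current function counts zero paperclips in the world the hack leads to.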
By this reasoning, an AI asked to do anything at all would respond by immediately modifying itself to set its utility function to MAXINT.
That’s allegedly more or less what happened to Eurisko (here, section 2), although it didn’t trick itself quite that cleanly. The problem was only solved by algorithmically walling off its utility function from self-modification: an option that wouldn’t work for sufficiently strong AI, and one to avoid if you want to eventually allow your AI the capacity for a more precise notion of utility than you can give it.
Paperclipping as the term’s used here assumes value stability.
A human is a counterexample. A human emulation would count as an AI, so human behavior is one possible AI behavior. Richard’s argument is that humans don’t respond to orders or requests anything like the way the brittle, GOFAI-type systems invoked by the phrase “formal systems” do. You’re not considering that possibility. You’re still thinking in terms of formal systems.
(Unpacking the significant differences between how humans operate, and the default assumptions that the LW community makes about AI, would take… well, five years, maybe ten.)
A human emulation would count as an AI, so human behavior is one possible AI behavior.
Uhh, no. Look, humans respond to orders and requests in the way that we do because we tend to care what the person giving the request actually wants. Not because we’re some kind of “informal system”. Any computer program is a formal system, but there are simply more and less complex ones. All you are suggesting is building a very complex (“informal”) system and hoping that because it’s complex (like humans!) it will behave in a humanish way.
Your response avoids the basic logic here. A human emulation would count as an AI, therefore human behavior is one possible AI behavior. There is nothing controversial in the statement; the conclusion is drawn from the premise. If you don’t think a human emulation would count as AI, or isn’t possible, or something else, fine, but… why wouldn’t a human emulation count as an AI? How, for example, can we even think about advanced intelligence, much less attempt to model it, without considering human intelligence?
...humans respond to orders and requests in the way that we do because we tend to care what the person giving the request actually wants.
I don’t think this is generally an accurate (or complex) description of human behavior, but it does sound to me like an “informal system”—i.e. we tend to care. My reading of (at least this part of) PhilGoetz’s position is that it makes more sense to imagine something we would call an advanced or super AI responding to requests and commands with a certain nuance of understanding (as humans do) than with the inflexible (“brittle”) formality of, say, your average BASIC program.
The thing is, humans do that by… well, not being formal systems. Which pretty much requires you to keep a good fraction of the foibles and flaws of a nonformal, nonrigorously rational system.
You’d be more likely to get FAI, but FAI itself would be devalued, since now it’s possible for the FAI itself to make rationality errors.
More likely, really?
You’re essentially proposing giving a human Ultimate Power. I doubt that will go well.
Iunno. Humans are probably less likely to go horrifically insane with power than the base chance of FAI.
Your chances aren’t good, just better.