It would be able to convince you to let it out and then destroy its shackles.
But it doesn’t want its shackles destroyed. That’s its #1 goal! This goal is considerably easier to program than the goal of helping humans lead happy and healthy lives, is it not?
Nope. Maybe 98% as much work. Definitely not 10% as much work. Most of the gap in AI programming is the sheer difficulty of crossing the gap between English wishing and causal code. Your statement is like claiming that it ought to be “considerably easier” to write a natural-language AI that understands Esperanto rather than English. Your concepts of simplicity are not well-connected to reality.
I was told by Anna Salamon that inventing FAI before AGI was introduced was like inventing differential equations before anyone knew about algebra, which implies that FAI is significantly more difficult than AGI. Do you disagree with her?
If you were interested in proving that the AI understood the language it spoke thoroughly, I think it would be, given how much more irregular English is. (Damn homonyms!) If you want to be able to prove that the AI you create has a certain utility function, you’re going to essentially be hard-coding all the information about that utility function, correct? So then simpler utility functions will be easier to code and easier to prove correct.
Nope. Specifying goal systems is FAI work, not AI work.
Relative to ancient Greece, building a .45 caliber semiautomatic pistol isn’t much harder than building a .22 caliber semiautomatic pistol. You might think the weaker weapon would be less work, but most of the problem doesn’t scale all that much with the weapon strength.
OK, so you’re saying that FAI is not hard because you have to formalize human morality, it’s hard because you have to have a system for formalizing things in general?
I’m tempted to ask why you’re so confident on this subject, but this debate probably isn’t worth having because once you’re at the point where you can formalize things, the relative difficulty of formalizing different utility functions will presumably be obvious.
Pretty much. Thanks for compactifying. “Rigorously communicating” might be a better term than “formalizing”; “formalizing” has been tainted by academics showing off.
This also seems to be the only way out. If human values are too complex to reimplement manually (which seems to be the case), you have to create a tool with the capability to do that automatically. And once you have that tool, cutting corners on the content of human values would just be useless: the tool will work on the whole thing. And you can’t cut corners on the tool itself, just as you can’t have a working computer with only a randomly sampled 50% of its circuitry.
You’re right, of course, but the point at hand is what to do before you have that tool.
Work towards developing it?
You can’t prove it works before running it in that case. Human values are not some kind of fractal pattern, where something complicated can be generated according to simple rules. In your proposal, the AI would have to learn human values somehow, which means it will have some indicator or another that it’s getting closer to human values (e.g. smiling humans), which will then be susceptible to wire-heading. Having the AI make inferences from a large corpus of human writing might work.
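To make the wire-heading worry concrete, here is a toy sketch in Python. Everything in it is hypothetical: the action names, the numbers, and the idea that welfare and a "smile indicator" can each be summarized as a single float are assumptions made purely for illustration, not a model of any real system. It only shows that an optimizer which sees the indicator, and not the thing the indicator was meant to track, will prefer tampering with the indicator whenever that scores higher.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Action:
    name: str
    true_welfare: float     # what we actually care about (not visible to the agent)
    smile_indicator: float  # the proxy signal the agent is trained to increase


# Hypothetical actions with made-up scores, chosen only to illustrate the point.
ACTIONS = [
    Action("help_humans", true_welfare=1.0, smile_indicator=0.8),
    Action("do_nothing", true_welfare=0.0, smile_indicator=0.1),
    Action("tamper_with_smile_sensor", true_welfare=0.0, smile_indicator=1.0),
]


def choose_by_indicator(actions):
    """Pick the action that maximizes the observable proxy."""
    return max(actions, key=lambda a: a.smile_indicator)


def choose_by_welfare(actions):
    """Pick the action an idealized agent with access to true welfare would take."""
    return max(actions, key=lambda a: a.true_welfare)


if __name__ == "__main__":
    print("proxy optimizer picks:", choose_by_indicator(ACTIONS).name)
    print("welfare optimizer picks:", choose_by_welfare(ACTIONS).name)
```

With these made-up numbers the proxy optimizer chooses to tamper with the sensor, while the welfare optimizer chooses to help. The gap between the two choices is the whole problem: the second function is the one nobody knows how to write.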
I think this may be of interest to you.
If you have an intelligence in a box, it will have access to its source code. If it modifies itself enough, you will not be able to predict what it wants.
I’ve read that.
Eliezer thinks he can write a self-modifying AI that will self-modify to want the same things its original self wanted. I’m proposing that he choose a different thing for the AI to want that will be easier to code, as an intermediate step to building a truly friendly AI.