To be better able to respond to your comment, please let me know in what way you disagree with the following comparison between narrow AI and general AI:
Narrow artificial intelligence will be denoted NAI and general artificial intelligence GAI.
(1) Is it in principle capable of behaving in accordance with human intention to a sufficient degree?
NAI: True
GAI: True
(2) Under what circumstances does it fail to behave in accordance with human intention?
NAI: If it is broken, where broken stands for a wide range of failure modes such as incorrectly managing memory allocations.
GAI: In all cases in which it is not mathematically proven to be tasked with the protection of, and equipped with, a perfect encoding of all human values or a safe way to obtain such an encoding.
(3) What happens when it fails to behave in accordance with human intention?
NAI: It crashes, freezes or halts. It generally fails in a way that is harmful to its own functioning. If for example an autonomous car fails at driving autonomously it usually means that it will either go into safe-mode and halt or crash.
GAI: It works perfectly well. Superhumanly well. All its intended capabilities are intact except that it completely fails at working as intended in such a way as to destroy all human value in the universe. It will be able to improve itself and capable of obtaining a perfect encoding of human values. It will use those intended capabilities in order to deceive and overpower humans rather than doing what it was intended to do.
(4) What happens if it is bound to use a limited amount of resources, use a limited amount of space or run for a limited amount of time?
NAI: It will only ever do what it was programmed to do. As long as there is no fatal flaw, harming its general functionality, it will work within the defined boundaries as intended.
GAI: It will never do what it was programmed to do and always remove or bypass its intended limitations in order to pursue unintended actions such as taking over the universe.
Please let me also know where you disagree with the following points:
(1) The abilities of systems are part of human preferences as humans intend to give systems certain capabilities and, as a prerequisite to build such systems, have to succeed at implementing their intentions.
(2) Error detection and prevention is such a capability.
(3) Something that is not better than humans at preventing errors is no existential risk.
(4) Without a dramatic increase in the capacity to detect and prevent errors it will be impossible to create something that is better than humans at preventing errors.
(5) A dramatic increase in the human capacity to detect and prevent errors is incompatible with the creation of something that constitutes an existential risk as a result of human error.
GAI: It will never do what it was programmed to do and always remove or bypass its intended limitations in order to pursue unintended actions such as taking over the universe.
GAI is a program. It always does what it’s programmed to do. That’s the problem—a program that was written incorrectly will generally never do what it was intended to do.
FWIW, I find your statements 3, 4, and 5 also highly objectionable, on the grounds that you are lumping a large class of things under the blanket label “errors”. Is an “error” doing something that humans don’t want? Is it doing something the agent doesn’t want? Is it accidentally mistyping a letter in a program, causing a syntax error, or thinking about something heuristically and coming to the wrong conclusion, then making a carefully planned decision based on that mistake? Automatic proof systems don’t save you if what you think you need to prove isn’t actually what you need to prove.
GAI is a program. It always does what it’s programmed to do. That’s the problem—a program that was written incorrectly will generally never do what it was intended to do.
So self-correcting software is impossible. Is self improving software possible?
Self-correcting software is possible if there’s a correct implementation of what “correctness” means, and the module that has the correct implementation has control over the modules that don’t have the correct implementation.
Self-improving software is likewise possible if there’s a correct implementation of the definition of “improvement”.
Right now, I’m guessing that it’d be relatively easy to programmatically define “performance improvement” and difficult to define “moral and ethical improvement”.
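To make the point about self-correction concrete, here is a minimal sketch, not anything from an actual system. It assumes a toy task (sorting), and every name in it (`meets_spec`, `weak_spec`, `supervised`, `buggy_sorter`, `fallback_sorter`) is hypothetical. A checker module decides which implementation is allowed to run. With a correct definition of “correctness” the buggy module gets overridden; with a mistaken definition it passes unnoticed.

```python
from typing import Callable, List, Sequence

Sorter = Callable[[List[int]], List[int]]
Spec = Callable[[Sorter, Sequence[List[int]]], bool]


def meets_spec(sorter: Sorter, cases: Sequence[List[int]]) -> bool:
    """Correct definition of correctness: output equals the sorted input."""
    return all(sorter(list(case)) == sorted(case) for case in cases)


def weak_spec(sorter: Sorter, cases: Sequence[List[int]]) -> bool:
    """Mistaken definition: only checks that the output is in order."""
    return all(sorter(list(c)) == sorted(sorter(list(c))) for c in cases)


def buggy_sorter(xs: List[int]) -> List[int]:
    # Written incorrectly: silently drops duplicates. It still does
    # exactly what it was programmed to do.
    return sorted(set(xs))


def fallback_sorter(xs: List[int]) -> List[int]:
    return sorted(xs)


def supervised(primary: Sorter, fallback: Sorter, spec: Spec,
               cases: Sequence[List[int]]) -> Sorter:
    """The module holding the spec controls which implementation runs."""
    return primary if spec(primary, cases) else fallback


CASES = [[3, 1, 2], [2, 2, 1], []]

# With the correct spec, the buggy module is replaced by the fallback.
corrected = supervised(buggy_sorter, fallback_sorter, meets_spec, CASES)

# With the mistaken spec, the buggy module is accepted as "correct".
uncorrected = supervised(buggy_sorter, fallback_sorter, weak_spec, CASES)
```

The supervisor is only as good as its spec. That is easy to get right for sorting and presumably much harder for “moral and ethical improvement”.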
First list:
1) The terms “human intention” and “sufficient” are poorly defined.
2) Possibly under any circumstances whatsoever, if it’s anything like other non-trivial software, which always has some bugs.
3) Anything from “you may not notice” to “catastrophic failure resulting in deaths”. The claim that software which fails to work as humans intend will “generally fail in a way that is harmful to its own functioning” is unsupported. E.g. a spreadsheet works fine if the floating point math is off in the 20th bit of the mantissa. The answers will be wrong, but there is nothing about that that the spreadsheet could be expected to care about.
4) Not necessarily. GAI may continue to try to do what it was programmed to do, and only unintentionally destroy a small city in the process :)
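The spreadsheet example in point 3 can be sketched as follows. This is only an illustration under stated assumptions: the helper `flip_mantissa_bit` and the `prices` values are made up, and “the 20th bit” is taken to mean bit index 19 counted from the low end of an IEEE 754 double’s mantissa. The program runs to completion without any crash or halt; its answer is simply wrong.

```python
import struct


def flip_mantissa_bit(x: float, bit: int) -> float:
    """Return x with one mantissa bit flipped (bit 0 = least significant)."""
    (raw,) = struct.unpack("<Q", struct.pack("<d", x))
    (out,) = struct.unpack("<d", struct.pack("<Q", raw ^ (1 << bit)))
    return out


# Hypothetical spreadsheet column.
prices = [19.99, 5.25, 102.50]

exact_total = sum(prices)

# Simulate faulty floating point math: every value is off in one low bit.
off_total = sum(flip_mantissa_bit(p, 19) for p in prices)

# Nothing fails here. The "broken" sum is computed just as happily.
error = abs(off_total - exact_total)
```

Here `error` is tiny but nonzero, and nothing in the program’s own functioning is harmed by it.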
Second list:
1) Wrong. The abilities of sufficiently complex systems are a huge space of events humans haven’t thought about yet, and so do not yet have preferences about. There is no way to know what their preferences would or should be for many, many outcomes.
2) Error as failure to perform the requested action may take precedence over error as failure to anticipate hypothetical objections from some humans to something they hadn’t expected. For one thing, it is more clearly defined. We already know human-level intelligences act this way.
3) Asteroids and supervolcanoes are not better than humans at preventing errors. It is perfectly possible for something stupid to be able to kill you. Therefore something with greater cognitive and material resources than you, but still with the capacity to make mistakes, can certainly kill you. For example, a government.
4) It is already possible for a very fallible human to make something that is better than humans at detecting certain kinds of errors.
5) No. Unless by dramatic you mean “impossibly perfect, magical and universal”.
(3) What happens when it fails to behave in accordance with human intention?
NAI: It crashes, freezes or halts. It generally fails in a way that is harmful to its own functioning. If for example an autonomous car fails at driving autonomously it usually means that it will either go into safe-mode and halt or crash.
GAI: It works perfectly well. Superhumanly well. All its intended capabilities are intact except that it completely fails at working as intended in such a way as to destroy all human value in the universe. It will be able to improve itself and capable of obtaining a perfect encoding of human values. It will use those intended capabilities in order to deceive and overpower humans rather than doing what it was intended to do.
Firstly, “fails in a way that is harmful to its own functioning” appears to be tautological.
Secondly, you seem to be listing things that apply to any kind of AI in the NAI section—is this intentional? (This happens throughout your comment, in fact.)