For example, when the comment says “the formalization of the notion of ‘safety’ used by the proof is wrong,” it is not clear whether it means that the values the programmers have in mind are not correctly implemented by the formalization, or whether it means they are correctly implemented but are themselves catastrophic in a way that hasn’t been anticipated.
Both (with the caveat that SI’s plans are to implement an extrapolation procedure for the values, and not the values themselves).
Another way of putting this is that a “tool” has an underlying instruction set that conceptually looks like: “(1) Calculate which action A would maximize parameter P, based on existing data set D. (2) Summarize this calculation in a user-friendly manner, including what Action A is, what likely intermediate outcomes it would cause, what other actions would result in high values of P, etc.”
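A minimal sketch of that two-step instruction set, purely illustrative (the names here, e.g. `tool_ai` and `estimate_P`, are my own stand-ins and not from the post or from SI):

```python
# Illustrative sketch only: "tool_ai", "estimate_P", etc. are hypothetical names,
# not anything specified by the post or by SI.

def tool_ai(data_set_D, candidate_actions, estimate_P, top_k=3):
    """Step 1: rank candidate actions by their predicted value of parameter P.
    Step 2: return a human-readable summary instead of acting."""
    ranked = sorted(
        candidate_actions,
        key=lambda action: estimate_P(action, data_set_D),
        reverse=True,
    )
    best = ranked[0]
    # Nothing is executed here: a human reads the summary and decides what to do.
    # An Agent-AI would instead act on `best` directly.
    return {
        "recommended_action": best,
        "predicted_P": estimate_P(best, data_set_D),
        "other_high_P_actions": ranked[1:1 + top_k],
    }
```

The only point of the sketch is that the “summarize and wait” step is structural: it is exactly where the human-comprehension bottleneck discussed below comes in.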
I think such a Tool-AI will be much less powerful than an equivalent Agent-AI, due to the bottleneck of having to summarize its calculations in a human-readable form, then waiting for the human to read and understand the summary and make a decision. It’s not even clear that the huge amount of calculation a Tool-AI might do in order to find optimal actions can be summarized in any useful way, or that this process of summarization can be feasibly developed before others create Agent-AIs. (Edit: See further explanation of this problem here.) Of course you do implicitly acknowledge this:
Some have argued to me that humans are likely to choose to create agent-AGI, in order to quickly gain power and outrace other teams working on AGI. But this argument, even if accepted, has very different implications from SI’s view. [...] It seems that the appropriate measures for preventing such a risk are security measures aiming to stop humans from launching unsafe agent-AIs, rather than developing theories or raising awareness of “Friendliness.”
I do accept this argument (and have made similar arguments), except that I advocate trying to convince AGI researchers to slow down development of all types of AGI (including Tool-AI, which can be easily converted into Agent-AI), and don’t think “security measures” are of much help without a world government that implements a police state to monitor what goes on in every computer. Convincing AGI researchers to slow down is also pointless without a simultaneous program to create a positive Singularity via other means. I’ve written more about my ideas here, here, and here.
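To make the “easily converted” point concrete, here is a hypothetical continuation of the sketch above (`present_to_human` and `execute_in_world` are invented stand-ins for a UI and an effector interface, not anything from the post): the tool and the agent share the same computation, and the difference is roughly one line.

```python
# Hypothetical continuation of the tool_ai sketch above; "present_to_human" and
# "execute_in_world" are invented stand-ins, not anything from the post or from SI.

def run_as_tool(summary, present_to_human):
    # Tool mode: the human stays in the loop as both bottleneck and safeguard.
    present_to_human(summary)

def run_as_agent(summary, execute_in_world):
    # Agent mode: the same computation, with the human step removed.
    execute_in_world(summary["recommended_action"])
```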
(Responding to hypothetical-SingInst’s position:) It seems way too first-approximation-y to talk about values-about-extrapolation as anything other than just a subset of values; if you look at human behavior, values about extrapolation vary a great deal and are deeply tied to object-level values. (Simply consider hyperbolic discounting! And consider how taking something as basic as coherence/consistency to its logical extreme leads to either a very stretched ethics or a more fitting but very different meta-ethics, like theism.) Even if it were possible to formalize such a procedure, it would still be fake meta. “No: at all costs, it is to be prayed by all men that Shams may cease.”
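For readers who don’t have the hyperbolic-discounting example cached: in its standard textbook form (a general fact, not something from the comment), a reward of size $A$ at delay $D$ is valued at

$$V_{\text{hyperbolic}} = \frac{A}{1 + kD}, \qquad V_{\text{exponential}} = A\,e^{-kD},$$

and unlike the exponential form, the hyperbolic one produces preference reversals as delays shrink. That is one concrete sense in which “how to extrapolate” is itself entangled with object-level values rather than sitting cleanly above them.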
Is a compiler an agent by your definition? We don’t usually read its output. And it may try to improve runtime performance. However, it differs from agents in one fundamental way: the value of having the code actually run is not implemented in the compiler.