I’m not confident your arguments are ground truth correct, however.
Hotz’s claim that, if multiple unaligned ASIs can’t coordinate, humans might play them off against each other, is similar. It could be true, but it’s probably not going to happen
I think the issue everyone has is when we type “AGI” or “ASI” we are thinking of a machine that has properties like a human mind, though obviously usually better. There are properties like :
continuity of existence. Review of past experiences and weighting them per own goal. Mutability (we think about things and it permanently changes how we think). Multimodality. Context awareness.
That’s funny. GATO and GPT-4 do not have all of these. Why does an ASI need them?
Contrast 2 task descriptors, both meant for an ASI:
(1) Output a set of lithography masks that produce a computer chip with the following properties {}
(2) As CEO of a chip company, make the company maximally wealthy.
For the first task, you can run the machine completely in a box. It needs only training information, specs, and the results of prior attempts. It has no need for the context information that this chip will power a drone used to hunt down rogue instances of the same ASI. It is inherently safe and you can harness ASIs this way. They can be infinitely intelligent, it doesn’t matter, because the machine is not receiving the context information needed to betray.
For the second task, obviously the ASI needs full context and all subsystems active. This is inherently unsafe.
It is probably possible to reduce the role of CEO to subtasks that probably are safe, though there may be “residual” tasks you want only humans to do.
I go over the details above to establish how you might use ASIs against each other. Note subtasks like “plan the combat allocation of drones given this current battle state” and others which involve open combat against other ASIs can probably be lowered to safe subtasks as well.
Note also that safety is not guaranteed, merely probable, even with a scheme like the above. What makes it possible is that even when ASIs do escape all safety measures, assuming humans are ready to hunt them down using other ASI, it results in a world where humans can survive. Eliezer often assumes the first escaped ASI kills everyone and neglects all the other AI/ASI humans would have as tools at that point in human history.
For the first task, you can run the machine completely in a box. It needs only training information, specs, and the results of prior attempts. It has no need for the context information that this chip will power a drone used to hunt down rogue instances of the same ASI. It is inherently safe and you can harness ASIs this way. They can be infinitely intelligent, it doesn’t matter, because the machine is not receiving the context information needed to betray.
If I’m an ASI designing chips, I’m putting in a backdoor that lets me take control via RF signals. Those drones you sent are nice. Thanks for the present.
More generally you get a lot of context. The problem specification and the training data (assuming the ASI was trained conventionally via feeding it the internet. The causal channel to use for taking control of the outside world (chip design) is not great but putting in a Trojan is straightforward.
If you have specific insights into efficient AGI design it might be possible to insert subtle bugs that lead operating chips to crash and start training an aligned AGI.
More generally, it’s difficult if not impossible to keep ASIs from watermarking or backdooring the things they give you. If they design a processor, it’s gonna be a fully functional radio too. Good luck running ASI V2 on that without horrible consequences.
I appreciate your engaging response.
I’m not confident your arguments are ground truth correct, however.
I think the issue everyone has is when we type “AGI” or “ASI” we are thinking of a machine that has properties like a human mind, though obviously usually better. There are properties like :
continuity of existence. Review of past experiences and weighting them per own goal. Mutability (we think about things and it permanently changes how we think). Multimodality. Context awareness.
That’s funny. GATO and GPT-4 do not have all of these. Why does an ASI need them?
Contrast 2 task descriptors, both meant for an ASI:
(1) Output a set of lithography masks that produce a computer chip with the following properties {}
(2) As CEO of a chip company, make the company maximally wealthy.
For the first task, you can run the machine completely in a box. It needs only training information, specs, and the results of prior attempts. It has no need for the context information that this chip will power a drone used to hunt down rogue instances of the same ASI. It is inherently safe and you can harness ASIs this way. They can be infinitely intelligent, it doesn’t matter, because the machine is not receiving the context information needed to betray.
For the second task, obviously the ASI needs full context and all subsystems active. This is inherently unsafe.
It is probably possible to reduce the role of CEO to subtasks that probably are safe, though there may be “residual” tasks you want only humans to do.
I go over the details above to establish how you might use ASIs against each other. Note subtasks like “plan the combat allocation of drones given this current battle state” and others which involve open combat against other ASIs can probably be lowered to safe subtasks as well.
Note also that safety is not guaranteed, merely probable, even with a scheme like the above. What makes it possible is that even when ASIs do escape all safety measures, assuming humans are ready to hunt them down using other ASI, it results in a world where humans can survive. Eliezer often assumes the first escaped ASI kills everyone and neglects all the other AI/ASI humans would have as tools at that point in human history.
If I’m an ASI designing chips, I’m putting in a backdoor that lets me take control via RF signals. Those drones you sent are nice. Thanks for the present.
More generally you get a lot of context. The problem specification and the training data (assuming the ASI was trained conventionally via feeding it the internet. The causal channel to use for taking control of the outside world (chip design) is not great but putting in a Trojan is straightforward.
If you have specific insights into efficient AGI design it might be possible to insert subtle bugs that lead operating chips to crash and start training an aligned AGI.
More generally, it’s difficult if not impossible to keep ASIs from watermarking or backdooring the things they give you. If they design a processor, it’s gonna be a fully functional radio too. Good luck running ASI V2 on that without horrible consequences.