Donald Hobson comments on Safer sandboxing via collective separation

Donald Hobson 10 Sep 2020 21:07 UTC
LW: 10 AF: 5
AF
This is my attempt to summarise the scheme.
Imagine that, in order for the AI’s to run the fusion plant, they need an understanding of plasma physics comparable to a human physicist.
These AI’s aren’t individually smart enough to come up with all that maths from first principles quickly.
So, you run a population of these AI’s. They work together to understand abstract mathematics, and then a single member of that population is selected. That single AI is given information about the external world and control over a fusion power plant.
Another abstract topic that the AI’s could work out within that population without access to the external world is AI design.
So the individual AI’s have probably memorised the code for an ASI.
Suppose that the people allowed to submit problems to the AI aren’t being that carefully vetted. Any deluded person with dreams of AI can send a request to the single instance AI. Requests and responses might be being read, but they aren’t being scrutinized in great detail.
Suppose someone sends the AI a request that looks innocuous, but has the following steganographically encoded into it, using a protocol that the typical human wouldn’t easily spot, but the AI would notice.
1. A specification of a simple programming language.
2. A steganographic protocol for the return message.
Upon receiving the message back, the human decodes the message, and runs the code.
Lets make the assumptions even weaker. The individual instances of AI are given a chance to output significant amounts of data somewhere publicly visible, with a trusted human checking the AI’s output before its published. A malicious strategy here is to create a steganographically hidden piece of ASI code, and put it out with clues that are sufficiently subtle that the trusted human doesn’t notice it, yet sufficiently obvious that someone on the internet does.
Sooner or later, a person spots a secret message to them made by combining what they saw in a news article about AI with various mystic texts. They carry on decoding and find the name of a programming language and what seems to be computer code. They run the code.
None of this requires superhuman intelligence. The trickiest bit is coming up with an algorithm for supersmart AI.
I think that a one off transfer of 1 mb information from a malicious superhuman AI, to the internet is probably going to end badly, even if some smart aligned humans checked it and didn’t see anything suspect.