> 1ms is ~2^134 Planck times. 3 bits in a millisecond is “actually” 2^134 (possible arrival times) × 2^3 (bit values) possibilities, or ~2^137 possibilities. This is ~137 bits of information.
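(The arithmetic in the quoted claim does roughly check out; a quick sanity check in Python, using the standard Planck time of about 5.39e-44 s:)

```python
import math

planck_time = 5.39e-44          # seconds, standard value of the Planck time
slots = 1e-3 / planck_time      # number of Planck-time slots in one millisecond
timing_bits = math.log2(slots)  # information carried by *when* the bits arrive

print(timing_bits)              # ~133.8, i.e. roughly 2^134 slots
print(timing_bits + 3)          # ~136.8, i.e. ~137 bits once the 3 bit-values are included
```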
I think you’re mostly just denying the premise here.
If we actually control the communications channel, we can do something like this (a rough sketch in code follows below):

- Set up a bot that reads input along the channel.
- Interpret the first 1-3 bits received as a Tic-Tac-Toe move.
- When we reach the 10-second mark, send on to us the resulting move.
- If no input is received in those 10 seconds, send us a random move.
I am aware that we currently don’t know how to do this in a way that is secure against a superhuman AI. Right now an AI-in-a-box likely doesn’t even need to hack the mind of its Gatekeeper to get out, since it can plausibly e.g. abuse computer architecture to hack its way out, or hack and bypass the bot we set up along the communications channel to communicate more than we wanted it to. I’m premising this post on ‘imagine that we have solved the problems inherent in boxing an AI, and are only facing the issues in talking to it once it is boxed’.
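Ignoring those (very real) security issues, a minimal sketch of the bot's logic might look like this; the `read_bit` and `send_move` hooks are hypothetical stand-ins for whatever the channel hardware would actually expose:

```python
import random
import time

def run_channel_bot(read_bit, send_move, window_seconds=10):
    """Forward exactly one Tic-Tac-Toe move to the humans at a fixed time,
    no matter what (or whether) the boxed AI sends.  `read_bit` and
    `send_move` are hypothetical hooks into the channel hardware."""
    deadline = time.monotonic() + window_seconds

    # Read at most 3 bits before the deadline (as in the list above; note
    # that 3 bits can't quite distinguish all 9 squares, so a real design
    # would pick a cleaner move encoding).
    bits = []
    while time.monotonic() < deadline and len(bits) < 3:
        b = read_bit(timeout=deadline - time.monotonic())  # returns 0, 1, or None on timeout
        if b is None:
            break
        bits.append(b)

    if bits:
        move = int("".join(map(str, bits)), 2) % 9  # interpret the bits as a square index 0-8
    else:
        move = random.randrange(9)                  # no input received: send a random move

    # Wait until the 10-second mark before sending, so the timing of our
    # reply leaks nothing beyond the move itself.
    time.sleep(max(0.0, deadline - time.monotonic()))
    send_move(move)
```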
> This a) uses a terrible encoding (8 bits / character) and b) is nowhere near as compressed as it could be even given the encoding.
Absolutely! On the other hand, we can’t use any encoding we please, since we need the 1850 physicists to be able to decode it. A modern physicist knows that ‘c’ stands for ‘speed of light’ - but I don’t think 1850 physicists had a similar concept of the importance of the speed of light, and if they were writing an optimized encoding for us it might well not contain a term for ‘speed of light’.
Similarly, we can’t actually let the superintelligence specify its encoding—we have to specify the encoding for it, and it has to try to fit a message into our encoding. If we write a detailed encoding for the AI to tell us how to solve physics problems, and the valuable info the AI actually wants to communicate doesn’t fit well into that encoding, we lose ground.
So what kind of encoding could we write today that would let a hypothesized superintelligent AI communicate as much value to us as possible in as few bits as possible?
For example, one idea might be:
‘Here is a list of 200 different plausible avenues of inquiry that we gathered from scientists. Send back 55 bits (log2(200C10)) indicating which 10 of those we should throw more resources at.’
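As a sanity check on that bit count, and as one concrete way to pack the answer into a single integer, here is a small sketch; the combinatorial-number-system encoding is a standard trick used for illustration here, not something specified in the scheme above:

```python
from math import comb, log2, ceil

n, k = 200, 10
print(ceil(log2(comb(n, k))))   # 55: bits needed to name any 10-element subset of 200

def encode_subset(indices):
    """Map a k-element subset of range(n) to a single integer
    (the standard combinatorial number system)."""
    return sum(comb(c, i + 1) for i, c in enumerate(sorted(indices)))

# Example: the AI wants to point at these 10 avenues of inquiry.
chosen = [3, 17, 42, 55, 81, 99, 120, 150, 180, 199]
code = encode_subset(chosen)
assert code < comb(n, k)        # always fits in ceil(log2(C(200, 10))) = 55 bits
print(code, code.bit_length())
```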
> I think you’re mostly just denying the premise here.
I’m arguing that the premise is itself unrealistic, yes. If you assume false, then you can ‘prove’ anything.
It’s like saying “assume you had a Halting oracle”. It can be a useful tool in some cases where you can e.g. show that even with something unrealistic you can’t do X, but that’s about it.
> On the other hand, we can’t use any encoding we please, since we need the 1850 physicists to be able to decode it.
Fair. That being said, you can still do significantly better than 520 bits.
> So what kind of encoding could we write today that would let a hypothesized superintelligent AI communicate as much value to us as possible in as few bits as possible?
I mean, we already kind of have a general answer here from Kolmogorov complexity / Minimum Description Length / friends. “Give me a Binary Lambda Calculus[1] program that when run gives us as much value as possible”.
Don’t split into multiple separate queries. Figure out your risk level in # of bits, and ask for a single program that is that many bits.
This requires no more than O(1) additional bits beyond the optimal for any possible output[2].
Of course, said O(1) potentially hides a large constant, but empirically BLC is fairly compact. There's still a few hundred bits of overhead, however, which isn't great.
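For concreteness, the wrapper on our side of the channel is then very simple; `blc_eval` below is a hypothetical stand-in for an actual Binary Lambda Calculus interpreter (which we would have to write and audit ourselves), not an existing library:

```python
MAX_PROGRAM_BITS = 1_000       # our pre-committed "risk level" in bits

def blc_eval(program_bits, fuel):
    """Hypothetical Binary Lambda Calculus interpreter: run the self-delimiting
    program encoded by `program_bits` for at most `fuel` reduction steps and
    return its output (None if it doesn't finish in time)."""
    raise NotImplementedError  # placeholder; a real, audited interpreter goes here

def single_query(read_bit):
    # One query, fixed length, no retries: the AI's total influence on us
    # is bounded by MAX_PROGRAM_BITS bits, however it chooses to spend them.
    bits = [read_bit() for _ in range(MAX_PROGRAM_BITS)]
    return blc_eval(bits, fuel=10**9)
```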
[1] https://web.archive.org/web/20161019165606/https://en.wikipedia.org/wiki/Binary_lambda_calculus [3]
[3] The deletionists have won :-(