TLW comments on Interacting with a Boxed AI

TLW 2 Apr 2022 18:07 UTC
1 point
We give it only a very short window of time to make its move, so that it can’t manipulate move timings.
You’ve already lost.
1ms is roughly $2^{134}$ Planck times. A 3-bit move that can arrive at any one of those instants is “actually” one of $2^{134} \times 2^{3} = 2^{137}$ possibilities. This is ~137 bits of information.
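A quick sanity check of those figures, as a rough sketch. It assumes a Planck time of about 5.391e-44 s (that constant is not from the thread) and reads the count as one 3-bit move sent at one of the available instants; the variable names are illustrative only.

```python
import math

# Assumed constant: the Planck time in seconds (rounded reference value).
PLANCK_TIME_S = 5.391e-44
WINDOW_S = 1e-3   # the 1 ms move window under discussion
MOVE_BITS = 3     # bits in the move itself

# Number of Planck-time instants inside the window, expressed in bits.
timing_bits = math.log2(WINDOW_S / PLANCK_TIME_S)

# One move sent at one instant: timing choices multiplied by move values,
# which is an addition once everything is measured in bits.
total_bits = timing_bits + MOVE_BITS

print(f"timing: {timing_bits:.1f} bits, total: {total_bits:.1f} bits")
```

Both values round to the 134 and 137 used here.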
Now the question becomes ‘are any of those ~ $2^{137}$ possibilities dangerous’… which we could answer if we knew the exact laws of physics, but we don’t. Even if we make a Newtonian approximation: Newtonian mechanics being chaotic, it’s entirely possible that one of said possibilities e.g. causes an asteroid on a chaotic tumble to hit the Earth in a couple thousand years.
‘c=spd of light. y=1/sqrt(1-v^2/c^2). all lengths scale down by y’ is 65 characters long, or 520 bits.
This a) uses a terrible encoding (8 bits / character) and b) is nowhere near as compressed as it could be even given the encoding.
The information content of human text is “only” ~9.8 bits / word^[1]. A ballpark guess, then, is that said 137 bits can encode ~14 words’ worth of information.
Is there a series of 14 words that can encode the essence of e.g. Relativity? I would not be surprised.
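The two ballpark numbers in this part of the comment are simple divisions and multiplications. A minimal sketch, taking every input figure from the thread as given rather than re-deriving it:

```python
BITS_PER_WORD = 9.8    # per-word estimate cited in the comment
CHANNEL_BITS = 137     # timing-channel estimate from earlier in the comment
CHARS = 65             # character count quoted for the Lorentz-factor message
BITS_PER_CHAR = 8      # one byte per character, the "terrible encoding"

words = CHANNEL_BITS / BITS_PER_WORD
naive_bits = CHARS * BITS_PER_CHAR

print(f"{words:.1f} words fit in the channel; naive message is {naive_bits} bits")
```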
1. https://www.sciencedirect.com/science/article/pii/S0019995864903262
- aphyer 2 Apr 2022 18:22 UTC
  7 points
  You’ve already lost.
  1ms is roughly $2^{134}$ Planck times. A 3-bit move that can arrive at any one of those instants is “actually” one of $2^{134} \times 2^{3} = 2^{137}$ possibilities. This is ~137 bits of information.
  I think you’re mostly just denying the premise here.
  If we actually control the communications channel, we can do something like this:
  - Set up a bot that reads input along the channel.
  - Interpret the first 1-3 bits received as a Tic-Tac-Toe move.
  - When we reach the 10-second mark, forward the resulting move to us.
  - If no input is received in those 10 seconds, send us a random move.
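The relay steps listed above could look roughly like the sketch below. It models only the decision logic: arrival times are never passed in, so they cannot affect the output, and the caller is assumed to invoke the function exactly once at the fixed 10-second mark. The function name, the `legal_moves` list, and the wrap-around rule for out-of-range values are hypothetical choices, not something the post specifies.

```python
import random
from typing import Optional, Sequence

MAX_MOVE_BITS = 3  # "the first 1-3 bits" from the list above


def relay_move(received_bits: Sequence[int],
               legal_moves: Sequence[int],
               rng: Optional[random.Random] = None) -> int:
    """Turn whatever arrived during the window into exactly one move."""
    rng = rng or random.Random()
    if not received_bits:
        # Nothing arrived before the deadline: send a random legal move.
        return rng.choice(list(legal_moves))
    # Only the first few bits count; everything after them is discarded.
    bits = received_bits[:MAX_MOVE_BITS]
    index = int("".join(str(b & 1) for b in bits), 2)
    # Assumption: out-of-range values wrap around instead of raising.
    return legal_moves[index % len(legal_moves)]


# Five bits arrive, but only the first three (binary 101 = 5) are used.
print(relay_move([1, 0, 1, 1, 1], legal_moves=[0, 2, 4, 6, 7, 8]))
```

This says nothing about whether such a relay can be made secure, which the post treats as a separate problem.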
  I am aware that we currently don’t know how to do this in a way that is secure against a superhuman AI. Right now an AI-in-a-box likely doesn’t even need to hack the mind of its Gatekeeper to get out, since it can plausibly e.g. abuse computer architecture to hack its way out, or hack and bypass the bot we set up along the communications channel to communicate more than we wanted it to. I’m premising this post on ‘imagine that we have solved the problems inherent in boxing an AI, and are only facing the issues in talking to it once it is boxed’.
  This a) uses a terrible encoding (8 bits / character) and b) is nowhere near as compressed as it could be even given the encoding.
  Absolutely! On the other hand, we can’t use any encoding we please, since we need the 1850 physicists to be able to decode it. A modern physicist knows that ‘c’ stands for ‘speed of light’ - but I don’t think 1850 physicists had a similar concept of the importance of the speed of light, and if they were writing an optimized encoding for us it might well not contain a term for ‘speed of light’.
  Similarly, we can’t actually let the superintelligence specify its encoding—we have to specify the encoding for it, and it has to try to fit a message into our encoding. If we write a detailed encoding for the AI to tell us how to solve physics problems, and the valuable info the AI actually wants to communicate doesn’t fit well into that encoding, we lose ground.
  So what kind of encoding could we write today that would let a hypothesized superintelligent AI communicate as much value to us as possible in as few bits as possible?
  For example, one idea might be:
  ‘Here is a list of 200 different plausible avenues of inquiry that we gathered from scientists. Send back 55 bits (log2(200C10)) indicating which 10 of those we should throw more resources at.’
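One way the 55-bit reply could be mapped to a concrete choice of 10 avenues is the combinatorial number system, which ranks every 10-element subset of 200 items as a single integer. This is a sketch of that standard technique, not an encoding the post prescribes; the function names are made up for illustration.

```python
import math
from typing import List

N, K = 200, 10  # avenues on the list, avenues to pick

# Bits needed to name one of C(200, 10) subsets.
bits_needed = (math.comb(N, K) - 1).bit_length()


def decode_subset(rank: int, k: int = K) -> List[int]:
    """Map an integer in [0, C(n, k)) to k distinct indices."""
    chosen = []
    for size in range(k, 0, -1):
        # Find the largest c with C(c, size) <= rank.
        c = size - 1
        while math.comb(c + 1, size) <= rank:
            c += 1
        chosen.append(c)
        rank -= math.comb(c, size)
    return sorted(chosen)


def encode_subset(indices: List[int]) -> int:
    """Inverse of decode_subset: rank a set of distinct indices."""
    return sum(math.comb(c, i + 1) for i, c in enumerate(sorted(indices)))


picks = [3, 17, 42, 57, 88, 101, 120, 150, 177, 199]
rank = encode_subset(picks)
print(bits_needed, decode_subset(rank) == picks)
```

Because every integer below C(200, 10) decodes to a valid subset, almost none of the 55 bits are wasted on unused codewords.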
  - TLW 2 Apr 2022 21:46 UTC
    2 points
    I think you’re mostly just denying the premise here.
    I’m arguing that the premise is itself unrealistic, yes. If you assume false, then you can ‘prove’ anything.
    It’s like saying “assume you had a Halting oracle”. It can be a useful tool in some cases where you can e.g. show that even with something unrealistic you can’t do X, but that’s about it.
    On the other hand, we can’t use any encoding we please, since we need the 1850 physicists to be able to decode it.
    Fair. That being said, you can still do significantly better than 520 bits.
    So what kind of encoding could we write today that would let a hypothesized superintelligent AI communicate as much value to us as possible in as few bits as possible?
    I mean, we already kind of have a general answer here from Kolmogorov complexity / Minimum Description Length / friends. “Give me a Binary Lambda Calculus^[1] program that when run gives us as much value as possible”.
    Don’t split into multiple separate queries. Figure out your risk level in # of bits, and ask for a single program that is that many bits.
    This requires no more than O(1) additional bits beyond the optimal for any possible output^[2].
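To make "a program that is that many bits" concrete, here is a minimal sketch of how a lambda term is measured under the Binary Lambda Calculus encoding as I understand it: an abstraction is `00` followed by its body, an application is `01` followed by both subterms, and de Bruijn index i is i ones followed by a zero. The class names and the `within_budget` helper are hypothetical, and the sketch only encodes terms; it does not evaluate them.

```python
from dataclasses import dataclass
from typing import Union


@dataclass(frozen=True)
class Var:
    index: int  # de Bruijn index, starting at 1


@dataclass(frozen=True)
class Lam:
    body: "Term"


@dataclass(frozen=True)
class App:
    fn: "Term"
    arg: "Term"


Term = Union[Var, Lam, App]


def encode(term: Term) -> str:
    """Return the bit string for a term, as text made of '0' and '1'."""
    if isinstance(term, Var):
        return "1" * term.index + "0"
    if isinstance(term, Lam):
        return "00" + encode(term.body)
    return "01" + encode(term.fn) + encode(term.arg)


def within_budget(term: Term, budget_bits: int) -> bool:
    """Check a candidate program against the agreed risk budget."""
    return len(encode(term)) <= budget_bits


identity = Lam(Var(1))      # \x. x
k_comb = Lam(Lam(Var(2)))   # \x. \y. x
print(encode(identity), encode(k_comb), within_budget(k_comb, 8))
```

The identity function costs 4 bits and the K combinator 7, which gives a feel for how far a budget of a hundred-odd bits might stretch.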
    1. https://web.archive.org/web/20161019165606/https://en.wikipedia.org/wiki/Binary_lambda_calculus^[3]
    2. Of course, said O(1) potentially hides a large constant, but empirically BLC is fairly compact. There’s still a few hundred bits of overhead, however, which isn’t great.
    3. The deletionists have won :-(