Counterfactual Oracle submission. In case of erasure, have humans write a message to send back in time. Make the Oracle produce the probability distribution over messages. This distribution isn’t safe for humans to read, because some of the messages were written by UFAI. If we read one sample from the distribution, the probability of our doom is the same as it would ordinarily be over the simulated time frame. Instead, use differential privacy theory to design a query against the distribution that does better than this.
More detail: Sample a hundred answers. Say ten of them were written by UFAI. Consider the sample list to lie in the space Message^100, where two lists are adjacent iff they differ by one entry. We are going to get about the worst query outcome in some 10-ball. Differentially private queries enforce that each step in list space changes the probability of each query answer by at most a factor of 1+epsilon. For example, when asking how many messages sound panicked to some dumb text processor, we could randomize the answer up or down a little in order to bound the optimization power of the UFAIs to, say, one bit.
I can’t tell if there’s a good idea here or not. Can you explain more? The first sentence that I have trouble with is “We are going to get about the worst query outcome in some 10-ball.” Also how much background in differential privacy theory are you assuming? (Which papers or articles do we need to read first to understand your idea?)
In the worst case, the UFAIs cooperate and choose their ten list entries adversarially against our protocol. The subspace of Message^100 within which we might as well assume we get the worst outcome is a ball of radius 10, i.e., the set of sample lists that can be reached in at most ten adjacency steps from the honest list.
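To make the geometry concrete, here's a minimal sketch of the adjacency metric and the ball (Python; the message entries are just placeholders):

```python
def adjacency_distance(a: list, b: list) -> int:
    """Number of entries in which two sample lists differ; one adjacency
    step changes exactly one entry, so this counts the steps between them."""
    assert len(a) == len(b)
    return sum(x != y for x, y in zip(a, b))

def within_ball(candidate: list, honest: list, radius: int = 10) -> bool:
    """True iff ten cooperating UFAI entries could turn `honest` into `candidate`."""
    return adjacency_distance(candidate, honest) <= radius
```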
My differential privacy knowledge comes from a broader university course. The idea is that you are managing private data, perhaps medical, and want to be able to answer database queries without compromising individuals. So you randomize query answers, to guarantee that changing an individual entry makes little difference to the possible query outputs. If going one adjacency step multiplies the probability of each possible output by a factor between 0.99 and 1.01, ten steps will stay within [0.99^10, 1.01^10]. The same theory should let you make sure you can’t hear Cthulhu’s murmurs in the noise. If each output probability can be changed only by a factor of two by moving within the ball, that’s one bit of optimization power for the UFAIs.
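Numerically, in case it helps (nothing here beyond the arithmetic above):

```python
import math

per_step = 1.01    # multiplicative bound per adjacency step
steps = 10         # up to ten of the hundred entries are adversarial

print(0.99 ** steps, per_step ** steps)  # ≈ 0.9044 and ≈ 1.1046
print(math.log2(per_step ** steps))      # ≈ 0.144 bits of optimization power
print(2 ** (1 / steps))                  # ≈ 1.0718, the per-step factor that allows exactly one bit
```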
An easy way to guarantee that a query which counts some subset of your list is differentially private is to add Laplace-distributed noise to the count: the density falls off geometrically, so every step to the side divides the probability by 1.01.
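Here is a sketch of that counting query (the sounds_panicked predicate is a made-up stand-in for the dumb text processor; the noise scale is derived from the 1.01-per-step bound):

```python
import math
import random

def laplace_noise(scale: float) -> float:
    # The difference of two i.i.d. exponentials is Laplace-distributed.
    return random.expovariate(1 / scale) - random.expovariate(1 / scale)

def sounds_panicked(message: str) -> bool:
    # Made-up stand-in for the "dumb text processor".
    return "!" in message or message.isupper()

def private_panic_count(messages: list) -> float:
    """A counting query has sensitivity 1: one adjacency step changes the
    true count by at most 1. With scale 1/ln(1.01), one step therefore
    multiplies each output's density by at most a factor of 1.01."""
    scale = 1 / math.log(1.01)  # ≈ 100.5
    true_count = sum(sounds_panicked(m) for m in messages)
    return true_count + laplace_noise(scale)
```

Note the price: a per-step factor of 1.01 forces noise with standard deviation around 142 on a count that is at most 100.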
I had hoped that a reader of the first paragraph would see that there should be safer queries than sampling the distribution, and would need at most a Google keyword like differential privacy to design one. Then I expected to be asked for more detail, so I added a paragraph that I hoped would be understandable as-is. :(
Ok, I think I understand now. But it seems like imposing a differential privacy constraint on the query makes many desirable uses of counterfactual oracles (such as all of my submissions) impossible. Correct? You gave the example of “asking how many messages sound panicked to some dumb text processor” but that doesn’t seem hugely useful. Do you have any other ideas?
We can make any query differentially private given a metric map (one that doesn’t increase any distance) from Message^100 to a metric space of possible query outputs: set the probability mass of each query output proportional to 0.99^(its distance from the default answer), then normalize.
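As a sketch (hypothetical names; assumes the output space is small enough to enumerate):

```python
import random

def private_query(samples, metric_map, output_space, dist, decay=0.99):
    """Release an output with mass proportional to decay ** dist(output, default),
    where default = metric_map(samples). Since metric_map doesn't increase
    distances, one adjacency step moves the default by at most 1, so each
    unnormalized mass changes by at most a factor of 1/decay; after
    normalization the per-step guarantee is 1/decay**2, the usual factor
    of two in the exponential mechanism."""
    default = metric_map(samples)
    weights = [decay ** dist(out, default) for out in output_space]
    return random.choices(output_space, weights=weights, k=1)[0]
```

The counting query above is the special case where metric_map is the subset count and dist(a, b) = |a - b|.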
For the identity metric map, i.e. a human just trying to read the sample list, this scrambles the list entirely. For that not to happen, the image of the metric map needs to branch less than combinatorially.
One metric d(a,b) on the image that comes to mind is one that bounds the utility cost of getting answer b instead of a. For example, we could ask the counterfactual humans to send back stock market prices, and try to calculate a trading policy that is profitable even if some forecasts are fake. And then whoever is willing to assume the lowest UFAI probability wins the market! x(
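For intuition on why a policy might survive ten fake forecasts (my illustration, not part of the protocol): the coordinate-wise median is a standard robust aggregate, and ten adversarial entries out of a hundred can shift it by at most ten rank positions among the honest values:

```python
import statistics

def robust_price_estimate(forecasts: list) -> float:
    # Ten fake entries move the median at most ten rank positions,
    # so it can never be dragged outside the range of honest forecasts.
    return statistics.median(forecasts)

honest = [100.0 + 0.1 * i for i in range(90)]
fake = [1e9] * 10  # cooperating UFAIs all lie upward
print(robust_price_estimate(honest + fake))  # ≈ 104.95, still a central honest value
```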