SamEisenstat comments on Open question: are minimal circuits daemon-free?

SamEisenstat 8 May 2018 20:01 UTC
LW: 10 AF: 7
AF
I’m having trouble thinking about what it would mean for a circuit to contain daemons such that we could hope for a proof. It would be nice if we could find a simple such definition, but it seems hard to make this intuition precise.
For example, we might say that a circuit contains daemons if it displays more optimization than necessary to solve a problem. Minimal circuits could have daemons under this definition though. Suppose that some function $f$ describes the behaviour of some powerful agent, a function $\tilde{f}$ is like $f$ with noise added, and our problem is to predict the function $\tilde{f}$ sufficiently well. Then, the simplest circuit that does well won’t bother to memorize a bunch of noise, so it will pursue the goals of the agent described by $f$ more efficiently than $\tilde{f}$ does, and thus more efficiently than necessary.
- paulfchristiano 9 May 2018 5:44 UTC
  LW: 6 AF: 4
  AF Parent
  I don’t know what the statement of the theorem would be. I don’t really think we’d have a clean definition of “contains daemons” and then have a proof that a particular circuit doesn’t contain daemons.
  Also I expect we’re going to have to make some assumption that the problem is “generic” (or else be careful about what daemon means), ruling out problems with the consequentialism embedded in them.
  (Also, see the comment thread with Wei Dai above; clearly the plausible version of this involves something more specific than daemons.)
  - Ofer 7 Dec 2019 22:48 UTC
    LW: 3 AF: 2
    AF Parent
    
    > Also I expect we’re going to have to make some assumption that the problem is “generic” (or else be careful about what daemon means), ruling out problems with the consequentialism embedded in them.
    
    I agree. The following is an attempt to show that if we don’t rule out problems with the consequentialism embedded in them then the answer is trivially “no” (i.e. minimal circuits may contain consequentialists).
    
    Let $c$ be a minimal circuit that takes as input a string of length $10^{100}$ that encodes a Turing machine, and outputs a string that is the concatenation of the first $10^{100}$ configurations in the simulation of that Turing machine (each configuration is encoded as a string).
    
    Now consider a string $x'$ that encodes a Turing machine that simulates some consequentialist (e.g. a human upload). For the input $x'$, the computation of the output of $c$ simulates a consequentialist; and $c$ is a minimal circuit.
- interstice 8 May 2018 23:31 UTC
  1 point
  Parent
  By “predict sufficiently well” do you mean “predict such that we can’t distinguish their output”?
  Unless the noise is of a special form, can’t we distinguish $f$ and $\tilde{f}$ by how well they do on $f$’s goals? It seems like for this not to be the case, the noise would have to be of the form “occasionally do something weak which looks strong to weaker agents”. But then we could get this distribution by using a weak (or intermediate) agent directly, which would probably need less compute.
  - paulfchristiano 9 May 2018 1:38 UTC
    2 points
    Parent
    Suppose “predict well” means “guess the output with sufficiently high probability,” and the noise is just to replace the output with something random 5% of the time.
  - SamEisenstat 9 May 2018 1:52 UTC
    1 point
    Parent
    Yeah, I had something along the lines of what Paul said in mind. I wanted not to require that the circuit implement exactly a given function, so that we could see if daemons show up in the output. It seems easier to define daemons if we can just look at input-output behaviour.
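
The noise model discussed above (replace the output with something random 5% of the time) can be made concrete with a small simulation. This is only an illustrative sketch, not anything proposed in the thread: the function `f`, the output alphabet size `NUM_OUTPUTS`, and the helper names are hypothetical stand-ins, and "something random" is assumed here to mean a uniform draw from the output alphabet. It shows that a predictor which simply outputs $f(x)$ agrees with $\tilde{f}(x)$ slightly more than 95% of the time without storing any of the noise.

```python
import random

NOISE_RATE = 0.05   # from the thread: output is replaced 5% of the time
NUM_OUTPUTS = 10    # hypothetical size of the output alphabet (assumption)


def f(x: int) -> int:
    """Hypothetical stand-in for the agent's noise-free behaviour."""
    return (7 * x + 3) % NUM_OUTPUTS


def f_tilde(x: int, rng: random.Random) -> int:
    """f with noise: with probability NOISE_RATE, return a uniform random output."""
    if rng.random() < NOISE_RATE:
        return rng.randrange(NUM_OUTPUTS)
    return f(x)


def agreement_rate(predictor, trials: int, seed: int = 0) -> float:
    """Fraction of trials where predictor(x) equals a fresh sample of f_tilde(x)."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        x = rng.randrange(1000)
        if predictor(x) == f_tilde(x, rng):
            hits += 1
    return hits / trials


def expected_agreement() -> float:
    """Exact agreement probability when predicting f(x).

    The random replacement can still land on f(x) by chance, which adds
    NOISE_RATE / NUM_OUTPUTS on top of the noise-free mass.
    """
    return (1 - NOISE_RATE) + NOISE_RATE / NUM_OUTPUTS


if __name__ == "__main__":
    print("empirical:", agreement_rate(f, trials=20000))
    print("expected: ", expected_agreement())
```

Under these assumptions, predicting $f$ directly already meets a "guess the output with sufficiently high probability" bar set anywhere below roughly 95%, which is the sense in which a small predictor has no reason to memorize the noise.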