Wei Dai comments on Can corrigibility be learned safely?

Wei Dai 26 Apr 2018 2:28 UTC
LW: 4 AF: 1

> Instead, when something goes wrong, you add the data to whatever dataset of experiences you are maintaining (or use amplification to decide how to update some small sketch), and then trust the mechanism that makes decisions from that database.

By “mechanism that makes decisions from that database” are you thinking of some sort of linguistics mechanism, or a mechanism for general scientific research?

The reason I ask is, what if what went wrong was that H is missing some linguistics concept, for example the concept of implicature? Since we can’t guarantee that H knows all useful linguistics concepts (the field of linguistics may not be complete), it seems that in order to “make fewer errors than the RL agent (in the infinite computing limit)” IDA has to be able to invent linguistics concepts that H doesn’t know, and if IDA can do that then presumably IDA can do science in general?

If the latter (mechanism for general scientific research) is what you have in mind, we can’t really show that meta-execution is hopeless by pointing to some object-level task that it doesn’t seem able to do, because if we run into any difficulties we can always say “we don’t know how to do X with meta-execution, but if IDA can learn to do general scientific research, then it will invent whatever tools are needed to do X”.

Does this match your current thinking?
- paulfchristiano 5 May 2018 19:20 UTC
  LW: 4 AF: 2
  There is some mechanism the RL agent uses, which doesn’t rest on scientific research. IDA should use the same mechanism.
  This may sometimes involve “heuristic X works well empirically, but has no detectable internal structure.” In those cases IDA needs to be able to come up with a safe version of that procedure (i.e. a version that wouldn’t leave us at a disadvantage relative to people who just want to maximize complexity or whatever). I think the main obstacle to safety is if heuristic X itself involves consequentialism. But in that case there seems to necessarily be some internal structure. (This is the kind of thing that I have been mostly thinking about recently.)
  - Wei Dai 5 May 2018 23:43 UTC
    LW: 6 AF: 3
    > There is some mechanism the RL agent uses, which doesn’t rest on scientific research. IDA should use the same mechanism.
    How does IDA find such a mechanism, if not by scientific research? RL does it by searching for weights that do well empirically, and William and I were wondering if that idea could be adapted to IDA, but you said “Searching for trees that do well empirically is scary business, since now you have all the normal problems with ML.” (I had interpreted you to mean that we should avoid doing that. Did you actually mean that we should try to figure out a safe way to do it?)
    - paulfchristiano 6 May 2018 0:27 UTC
      LW: 5 AF: 3
      I think you need to do some trial and error; I was saying we should be scared of it (be careful about it, minimize it, though it’s subtle why minimization might help).
      For example, suppose that I put a random 20-gate circuit in a black box and let you observe its input-output behavior. At some point you don’t have any options other than guess-and-check, and no amount of cleverness about alignment could possibly avoid the need to sometimes use brute force.
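
The black-box circuit example can be made concrete with a small sketch. This is an illustration only and not something from the thread: the gate set, the wiring encoding, and the names `random_circuit`, `all_circuits`, and `guess_and_check` are all assumptions, and the circuit is scaled down from 20 gates to 3 so that exhaustive search finishes quickly. The searcher sees only input-output behavior and simply enumerates candidate circuits until one matches on every input.

```python
import itertools
import random

# Hypothetical gate set; the thread does not specify one.
GATES = {
    "AND": lambda a, b: a & b,
    "OR": lambda a, b: a | b,
    "XOR": lambda a, b: a ^ b,
    "NAND": lambda a, b: 1 - (a & b),
}
GATE_NAMES = tuple(GATES)


def random_circuit(n_inputs, n_gates, rng):
    """Build a random circuit; gate i reads any two earlier wires."""
    circuit = []
    for i in range(n_gates):
        n_wires = n_inputs + i  # inputs plus outputs of earlier gates
        circuit.append(
            (rng.choice(GATE_NAMES), rng.randrange(n_wires), rng.randrange(n_wires))
        )
    return circuit


def run(circuit, inputs):
    """Evaluate the circuit; the last gate's output is the circuit's output."""
    wires = list(inputs)
    for op, a, b in circuit:
        wires.append(GATES[op](wires[a], wires[b]))
    return wires[-1]


def all_circuits(n_inputs, n_gates):
    """Enumerate every circuit in the same space random_circuit samples from."""
    per_gate = [
        [(op, a, b)
         for op in GATE_NAMES
         for a in range(n_inputs + i)
         for b in range(n_inputs + i)]
        for i in range(n_gates)
    ]
    return itertools.product(*per_gate)


def guess_and_check(black_box, n_inputs, n_gates):
    """Brute force: try candidates until one agrees with the box on all inputs."""
    all_inputs = list(itertools.product((0, 1), repeat=n_inputs))
    target = [black_box(bits) for bits in all_inputs]  # observed behavior only
    for tries, guess in enumerate(all_circuits(n_inputs, n_gates), start=1):
        # all() short-circuits at the first mismatching input.
        if all(run(guess, bits) == out for bits, out in zip(all_inputs, target)):
            return list(guess), tries
    return None, 0


rng = random.Random(0)
hidden = random_circuit(3, 3, rng)            # the contents of the black box
black_box = lambda bits: run(hidden, bits)    # the searcher only calls this
found, tries = guess_and_check(black_box, 3, 3)
print(f"matched the black box after {tries} candidate circuits")
```

Under this encoding, gate `i` has `4 * (n_inputs + i) ** 2` possible configurations, so the number of candidates multiplies with every added gate. Three gates over three inputs already give 230,400 candidates, which suggests why a 20-gate version leaves little room for anything cleverer than guess-and-check without some exploitable internal structure.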