johnswentworth comments on Fixing The Good Regulator Theorem

johnswentworth 10 Feb 2021 17:22 UTC
LW: 4 AF: 3
Your bullet points are basically correct. In practice, applying the theorem to any particular neural network (NN) would require some careful setup to make the causal structure match, i.e. we have to designate the right things as “system”, “regulator”, “map”, “inputs X & Y”, and “outcome”, and that will vary from architecture to architecture. But I expect it can be applied to most architectures used in practice.
I’m probably not going to turn this into a paper myself soon. At the moment, I’m pursuing threads which I think are much more promising—in particular, thinking about when a “regulator’s model” mirrors the structure of the system/environment, not just its black-box functionality. This was just a side-project within that pursuit. If someone else wants to turn this into a paper, I’d be happy to help, and there’s enough technical work to be done in applying it to NNs that you wouldn’t just be writing up this post.
- Daniel Kokotajlo 10 Feb 2021 17:38 UTC
  LW: 2 AF: 1
  Doesn’t sound like a job for me, but would you consider e.g. getting a grant to hire someone to coauthor this with you? I think the “getting a grant” part would not be the hard part.
  - johnswentworth 10 Feb 2021 20:02 UTC
    LW: 4 AF: 3
    Yeah, “get a grant” is definitely not the part of that plan which is a hard sell. Hiring people is a PITA. If I ever get to a point where I have enough things like this, which could relatively-easily be offloaded to another person, I’ll probably do it. But at this point, no.