jessicata comments on The Learning-Theoretic AI Alignment Research Agenda

jessicata 2 Jul 2018 7:25 UTC
0 points
AF
That captures part of it, but I also don’t think the advisor takes sane actions when the AI is doing things that change the environment. E.g. the AI is implementing some plan to create a nuclear reactor, and the advisor doesn’t understand how nuclear reactors work.

I guess you could have the AI first write the nuclear reactor plan in the diary, but this is essentially the same thing as transparency.
- Vanessa Kosoy 2 Jul 2018 19:35 UTC
  0 points
  AF Parent
  Well, you could say it is the same thing as transparency. What is interesting about it is that, in principle, you don’t have to put in transparency by hand using some completely different techniques. Instead, transparency arises naturally from the DRL paradigm and some relatively mild assumptions (that there is a “diary”). The idea is that the advisor would not build a nuclear reactor without seeing an explanation of nuclear reactors, so the AI won’t do it either.