Maybe the ‘actions → nats’ mapping can be sharpened if it’s not an AI but a very naive search process?
Say the controller can sample k outcomes at random before choosing one to actually achieve. I think that lets it get ~ln(k) extra nats of surprise, right? Then you can talk about the AI’s ability to control things in terms of ‘the number of random samples you’d need to draw to achieve this much improvement’.
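The ~ln(k) estimate can be checked numerically. The sketch below assumes ‘surprise’ means the negative log tail probability of the chosen outcome, i.e. −ln P(an outcome at least this good by chance); the exact expectation for best-of-k is then the harmonic number H_k = 1 + 1/2 + … + 1/k ≈ ln(k) + 0.577, which matches the ~ln(k) scaling:

```python
import math
import random

def avg_surprise(k, trials=20000, seed=0):
    """Average surprisal (in nats) of the best of k random outcomes.

    Outcomes are represented by their quantile u in [0, 1); the surprisal
    of an outcome is -ln(1 - u), the log-odds of doing at least that well
    by pure chance. For the max of k uniforms this averages to H_k.
    """
    rng = random.Random(seed)
    total = 0.0
    for _ in range(trials):
        best = max(rng.random() for _ in range(k))
        total += -math.log(1.0 - best)
    return total / trials

# Compare the simulated surprise to ln(k) + Euler-Mascheroni constant.
for k in (1, 10, 100):
    print(k, round(avg_surprise(k), 2), round(math.log(k) + 0.5772, 2))
```

So a single sample already buys 1 nat (H_1 = 1), and each tenfold increase in k adds roughly ln(10) ≈ 2.3 nats, which is what makes ‘samples needed to match this improvement’ a natural unit of optimization power.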
This sounds right to me! In particular, I just (re-)discovered this old post by Yudkowsky and this newer post by Alex Flint that both go a lot deeper on the topic. I think the optimal control perspective is a nice complement to those posts, and if I find the time to look into this more, that work is probably the right direction.
Thanks for clarifying!