David Scott Krueger (formerly: capybaralet) comments on Open question: are minimal circuits daemon-free?

David Scott Krueger (formerly: capybaralet) 27 Jun 2019 17:13 UTC
1 point
A concrete vision:
Suppose the best a system can do without a daemon is 97% accuracy.
The daemon can figure out how to get 99% accuracy.
But in order to outperform other systems, it can just provide 98% accuracy, and use 1% of inputs to pursue its own agenda.
This all happens on-distribution.

If there are multiple daemon-containing systems competing for survival (with selection happening according to accuracy), this might force them to maximize accuracy, instead of just beating a “non-daemon baseline”.
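
The arithmetic above can be sketched in a few lines. This is only an illustrative toy, not a model anyone in the thread proposed: the function name `divertible_share` is hypothetical, accuracy is counted in whole percentage points, and it assumes each diverted input costs exactly one point of accuracy and that selection requires strictly beating the best rival.

```python
def divertible_share(capability: int, best_rival: int) -> int:
    """Largest share of inputs (in percentage points) a daemon could divert
    to its own agenda while still strictly beating its best rival.

    Assumption (not from the thread): every diverted input costs exactly
    one percentage point of delivered accuracy.
    """
    # To win selection, the daemon must deliver at least one point more
    # than the best competing system.
    required = best_rival + 1
    # Whatever capability is left over is slack; it can never be negative.
    return max(0, capability - required)


# Only a non-daemon baseline at 97%: the daemon delivers 98% and keeps 1%.
slack_vs_baseline = divertible_share(capability=99, best_rival=97)

# A rival daemon already delivering 98%: the slack disappears.
slack_vs_rival_daemon = divertible_share(capability=99, best_rival=98)
```

Under these assumptions, the slack is 1 point against the non-daemon baseline and 0 once a competing daemon delivers 98%, which is the sense in which competition might force accuracy maximization.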
- Liam Donovan 27 Jun 2019 21:31 UTC
  1 point
  Parent
  This is all only relevant to downstream daemons, right? If so, I don’t understand why the DD would ever provide 98% accuracy; I’d expect it to provide 99% accuracy until it sees a chance to provide [arbitrarily low]% accuracy and start pursuing its agenda directly. As you say, this might happen due to competition between daemon-containing systems, but I think a DD would want to maximize its chances of survival by maximizing its accuracy either way.
  - David Scott Krueger (formerly: capybaralet) 28 Jun 2019 4:47 UTC
    2 points
    Parent
    I think it’s relevant for either kind (actually, I’m not sure I like the distinction, or find it particularly relevant).
    If there aren’t other daemons to compete with, then 98% is sufficient for survival, so why not use the extra 1% to begin pursuing your own agenda immediately and covertly? This seems to be how principal-agent problems often play out in real life with humans.