HoldenKarnofsky comments on A Playbook for AI Risk Reduction (focused on misaligned AI)

HoldenKarnofsky 14 Jul 2023 23:48 UTC
2 points
I’m not convinced it requires a huge compute tax to reliably avoid being caught. (If I were, I would in fact probably be feeling a lot more chill than I am.)
The analogy to humans seems important. Humans are capable of things like going undercover, and pulling off coups, and also things like “working every day with people they’d fire if they could, without clearly revealing this.” I think they mostly pull this off with:
- Simple heuristics like “Be nice, unless you’re in the very unusual situation where hostile action would work well.” (I think the analogy to how AIs might behave is straightforward.)
- The fact that they don’t need to be perfect: lots of behaviors and situations are just ambiguous. (I think this also applies to AI, and maybe more so: there could be lots of weird little incidents that get chalked up to bugs without causing enough concern for the problem to be reliably addressed.)
- Also, humans generally need to do a lot of reasoning along the lines of “X usually works, but I do need to notice the rare situations when something radically different is called for.” So if this kind of reasoning is expensive, it is an expensive thing they already need to be doing a lot.