From my point of view, you are making an important point that I agree with: corrigibility isn’t uniformly safe for all use cases; it must be used carefully, and only in the use cases it is safe for. I’ve discussed this point with Max a bunch. The key aspect of corrigibility is that it keeps the operator empowered, and it is therefore necessarily unsafe in the hands of foolish or malicious operators.
Examples of good use:
further AI alignment research
monitoring the web for rogue AGI
operating and optimizing a factory production line
medical research
helping with mundane aspects of government action, like smoothing out a part of a specific bureaucratic process that needs well-described, bounded decision-making (e.g. being a DMV assistant, or a tax-evasion investigator who takes no action other than filing reports on suspected misbehavior)
Examples of bad use:
asking the AI to convince you of something, or even just to explain a concept persistently until it’s sure you understand
trying to carry out a dramatic, highly world-affecting, and irreversible act, such as a pivotal act
trying to implement a value-aligned agent, a PCEV agent, or anything similar. In fact, trying to create any agent that isn’t just an exact copy of the known-safe current corrigible agent.
trying to research and create particularly dangerous technology, such as self-replicating tech that might get out of hand (e.g. synthetic biology, bioweapons). This is a case where the AI succeeding safely at the task is itself a dangerous result! Now you’ve got a potential Bostrom-esque ‘black ball’ technology in hand, even though the AI didn’t malfunction in any way.