From my point of view, you are making an important point that I agree with: corrigibility isn’t uniformly safe for all use cases; it must be used carefully, and only in the use cases it is safe for. I’ve discussed this point with Max a bunch. The key aspect of corrigibility is that it keeps the operator empowered, and it is therefore necessarily unsafe in the hands of foolish or malicious operators.
Examples of good use:
further AI alignment research
monitoring the web for rogue AGI
operating and optimizing a factory production line
medical research
helping with mundane aspects of government action, like smoothing out a part of a specific bureaucratic process that needs well-described, bounded decision-making (e.g. being a DMV assistant, or a tax-evasion investigator who takes no action other than filing reports on suspected misbehavior)
Examples of bad use:
asking the AI to convince you of something, or even just to explain a concept persistently until it’s sure you understand
trying to carry out a dramatic, highly world-affecting, and irreversible act, such as a pivotal act
trying to implement a value-aligned agent, a PCEV agent, or anything similar. In fact, trying to create any agent that isn’t just an exact copy of the known-safe current corrigible agent.
trying to research and create particularly dangerous technology, such as self-replicating tech that might get out of hand (e.g. synthetic biology, bioweapons). This is a case where the AI succeeding safely at the task is itself a dangerous result! Now you’ve got a potential Bostrom-esque ‘black ball’ technology in hand, even though the AI didn’t malfunction in any way.