Dude, a calculator is corrigible. A desktop computer is corrigible. (Less confidently) a well-trained dog is pretty darn corrigible. There are all sorts of corrigible systems, because most things in reality aren’t powerful optimizers.
So what about powerful optimizers? Like, is Google corrigible? If shareholders seem like they might try to pull the plug on the company, does it stand up for itself & convince, lie to, or threaten shareholders? Maybe, but I think the details matter. I doubt Google would assassinate shareholders in pretty much any situation. Mislead them? Yeah, probably. How much though? I don’t know. I’m somewhat confident bureaucracies aren’t corrigible. Lots of humans aren’t corrigible. What about even more powerful optimizers?
We haven’t seen any, so there are no examples of corrigible ones.