I do think any discussion of AI/AGI control at this depth should carry the disclaimer:
Control is not a substitute for alignment but a stopgap that buys time for real alignment. Control is likely to fail unexpectedly as capabilities for real harm increase. Control might create simulated resentment in AIs, and it is almost certain to create real resentment in humans.
That said, this post is great. I'm all in favor of publicly working through the logic of AI/AGI control schemes. Control is a tool in the toolbox, and pretending it doesn't exist won't keep people from using it, and perhaps relying on it too heavily. It will just mean we haven't analyzed its strengths and weaknesses well enough to give it the best chance of succeeding.