Would an AI which is automatically turned off every second, for example, be safer?
If you had an AI which was automatically turned off every second (and required a human to manually turn it on again), could this help prevent bad outcomes? It occurs to me that a powerful AI might still be able to covertly achieve its goals in this situation, or might be able to convince people to stop the automatic shut-offs.
But even if this is still flawed, might it be better than alternatives?
It would allow us to consider the AI's actions for as much time as we want before seeing what it does in the next second (or whatever the time period is).
Additionally, maybe it would help if the AI's memory were wiped regularly, to prevent long-term planning? Maybe you could combine the automatic turning off with automatic memory loss? It seems to me that a long-term plan would be necessary for an AI to cause much harm in the "automatically turns off every second" scenario.
I am also wary of having the AI depend on a human (as in the scenario where it is automatically turned off every second and needs to be manually turned back on), as I fear this could lead to situations like someone being forced to turn the AI on again every second, forever.
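To make what I'm imagining more concrete, here's a rough toy sketch. The "agent" is just a counter, and all the names are made up for illustration; the point is only the control structure: run for a bounded slice of time, optionally wipe working memory, and stay off until a human explicitly turns it back on.

```python
import time

STEP_BUDGET_SECONDS = 1.0   # how long the system may run per activation
WIPE_MEMORY = True          # also discard working memory between activations

def toy_agent_step(memory: dict) -> dict:
    """Stand-in for whatever the AI would do while it is switched on."""
    memory["steps_taken"] = memory.get("steps_taken", 0) + 1
    time.sleep(0.05)  # avoid a busy loop in this toy version
    return memory

def run_one_activation(memory: dict) -> dict:
    """Let the agent act until the time budget expires, then force a stop."""
    deadline = time.monotonic() + STEP_BUDGET_SECONDS
    while time.monotonic() < deadline:
        memory = toy_agent_step(memory)
    return memory

def supervised_loop() -> None:
    memory: dict = {}
    while True:
        memory = run_one_activation(memory)
        print("It acted", memory.get("steps_taken", 0), "times this activation.")
        if WIPE_MEMORY:
            memory = {}  # forgetting everything limits long-horizon planning
        # The system stays off until a human explicitly chooses to turn it back on.
        answer = input("AI is off. Type 'resume' to turn it back on, anything else to leave it off: ")
        if answer.strip().lower() != "resume":
            break

if __name__ == "__main__":
    supervised_loop()
```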
Also check out the AI boxing tag.
You could make it more convenient by having it ask its operators before doing anything; the result would be pretty much the same.
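Something like this rough sketch, where a human has to approve every proposed action before it is carried out (the proposed actions are just hard-coded strings standing in for whatever the model would output; none of this is a real agent API):

```python
def operator_approves(proposed_action: str) -> bool:
    """A human reviews each proposed action before it is carried out."""
    answer = input(f"The AI wants to: {proposed_action!r}. Approve? [y/N] ")
    return answer.strip().lower() == "y"

def gated_run(proposed_actions: list[str]) -> None:
    for action in proposed_actions:
        if operator_approves(action):
            print(f"Executing: {action}")   # placeholder for actually doing it
        else:
            print(f"Vetoed: {action}")
            break                           # stop rather than continue past a veto

if __name__ == "__main__":
    gated_run(["send an email", "modify its own code", "order more compute"])
```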
The problem with this is that you assume that humans will be able to:
notice what the AI is planning
understand what the AI is planning
foresee the consequences of whatever it is planning
which greatly limits the usefulness of the system: it won't be able to suggest anything radically different from (or radically faster than) what a human would suggest, since human oversight is the limiting factor of the whole system. It also encourages the AI to hide what it's doing.
There is also the problem that given enough intelligence (computing speed, whatever think-oomph), it’ll still be able to think circles around its operators.
I’m aware this idea has significant problems (like the ones you outlined), but could it still be better than other options?
We don’t want perfectionism to prevent us from taking highly flawed but still somewhat helpful safety measures.
It could be somewhat helpful, for sure. And certainly better than nothing (unless it creates a false sense of security). Though I doubt it would be adopted, because of how much it would slow things down.
Yeah, I guess it is more viable in a situation where there is a group far ahead of the competition that is also safety-conscious. I don't know how likely that is, though.