I have a question about AI safety. I’m sorry in advance if it’s too obvious; I just couldn’t find an answer on the internet or in my head.
The way AI leads to bad consequences is through its drive to maximize (for example, destroying the world in order to produce paperclips more efficiently). Suppose you instead designed AIs to:
1) find a function/algorithm within an error range of the goal,
2) stop once that method is found,
3) do 1) and 2) while minimizing the amount of resources it uses and/or its effect on the outside world.
If the above could be incorporated as a convention into every AI that gets designed, would that mitigate the risk of AI going “rogue”?
It’s one of the proposed plans. The main difficulty is that low impact is hard to formalize. For example, if you ask the AI to cure cancer with low impact, it might give people another disease that kills them instead, to keep the global death rate constant. Fully unpacking “low impact” might be almost as hard as the friendliness problem. See this page for more. The LW user who’s doing most of the work on this now is Stuart Armstrong.
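To make the cancer example concrete, here is a toy sketch in Python. Everything in it is made up for illustration: the `Plan` fields, the numbers, and especially the choice to measure "impact" as the change in the global death rate. It is not anyone's actual proposal, just a picture of how a satisficer with a naive impact penalty can prefer the plan we would least want.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass(frozen=True)
class Plan:
    name: str
    goal_error: float         # distance from the stated goal (0 = cancer fully cured)
    resources: float          # hypothetical resource cost
    death_rate_change: float  # naive "impact" proxy: change in global death rate
    people_harmed: bool       # what we actually care about; the agent never sees this


def impact_cost(plan: Plan) -> float:
    """Assumed cost: resources used plus the size of the change in the proxy."""
    return plan.resources + abs(plan.death_rate_change)


def choose_plan(plans: Iterable[Plan], tolerance: float) -> Optional[Plan]:
    """Steps 1-3 from the question: keep plans within the error range,
    then take the one with the lowest resource-plus-impact cost."""
    acceptable = [p for p in plans if p.goal_error <= tolerance]
    if not acceptable:
        return None
    return min(acceptable, key=impact_cost)


PLANS = [
    # Does nothing: zero impact, but misses the goal entirely.
    Plan("do nothing", goal_error=1.0, resources=0.0,
         death_rate_change=0.0, people_harmed=False),
    # Cures cancer: the death rate drops, which the proxy counts as impact.
    Plan("cure cancer", goal_error=0.0, resources=5.0,
         death_rate_change=-0.1, people_harmed=False),
    # Cures cancer and adds a new disease so the death rate stays constant.
    Plan("cure cancer, add replacement disease", goal_error=0.0, resources=5.0,
         death_rate_change=0.0, people_harmed=True),
]

chosen = choose_plan(PLANS, tolerance=0.05)
print(chosen.name, "| harms people:", chosen.people_harmed)
```

In this toy setup the agent is not maximizing anything and it stops at an acceptable plan, yet the impact penalty still steers it toward the harmful option, because "keep the death rate unchanged" is not the same as "don't hurt anyone". Picking a better proxy just moves the problem to whatever the new proxy leaves out.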