It’s easy to find ways of searching for truth that harm the instrumental goal the search is supposed to serve.
Example 1: I’m a self-driving car AI, and I don’t know whether hitting pedestrians at 35 MPH is somewhat bad, because it injures them, or very bad, because it kills them. I should not gather data to update my estimate.
Example 2: I’m a medical AI. Repeatedly administering a treatment whose effects I’m highly uncertain about, just to get high-confidence estimates, isn’t optimal. I should be maximizing something other than knowledge: even though I need to know whether treatments work, I should balance the risks and benefits of trying them.
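A toy simulation can make Example 2 concrete. This is a minimal sketch, not anything specified in the text: it frames the medical AI as a two-arm bandit, with made-up numbers (a standard treatment with a known 50% cure rate and an experimental one whose true cure rate, 20%, is unknown to the agent). A policy that only chases knowledge keeps dosing patients with the uncertain treatment; a policy that balances risk and benefit (here, Thompson sampling, my choice of method) explores briefly and then mostly sticks with the better option.

```python
import random

# Toy clinical bandit. All parameters are assumptions for illustration only.
KNOWN_CURE_RATE = 0.50         # standard care, cure rate known to the agent
TRUE_EXPERIMENTAL_RATE = 0.20  # experimental treatment, unknown to the agent
N_PATIENTS = 500
N_RUNS = 200

def simulate(policy):
    """Return the average number of cured patients under the given policy."""
    total_cured = 0
    for _ in range(N_RUNS):
        successes, failures = 1, 1  # Beta(1,1) prior over the experimental arm
        for _ in range(N_PATIENTS):
            if policy == "maximize_knowledge":
                # Pure truth-seeking: always try the uncertain treatment,
                # since only it reduces uncertainty about the world.
                arm = 1
            else:  # "balance_risk": Thompson sampling over the posterior
                sampled_rate = random.betavariate(successes, failures)
                arm = 1 if sampled_rate > KNOWN_CURE_RATE else 0
            if arm == 1:
                cured = random.random() < TRUE_EXPERIMENTAL_RATE
                successes += cured
                failures += 1 - cured
            else:
                cured = random.random() < KNOWN_CURE_RATE
            total_cured += cured
    return total_cured / N_RUNS

if __name__ == "__main__":
    random.seed(0)
    for policy in ("maximize_knowledge", "balance_risk"):
        print(f"{policy}: {simulate(policy):.1f} patients cured on average")
```

Under these assumed numbers the knowledge-maximizing policy cures roughly 100 patients per run, while the risk-balancing policy cures well over twice that: the information is still gathered, but only as a by-product of pursuing the actual goal.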