At the risk of providing a non-answer, I’ll say: operant conditioning.
The test problem, the solving of it, and getting an answer correspond to a light coming on, pressing a lever, and getting food.
We’ve long since been trained that solving problems in that context builds up token points that will pay out later in praise and promises of money.
Presumably this training translates fairly well to real world problems.
Indeed, that’s the conclusion I came to. What I wonder now is how we operant-condition ourselves without just reinforcing reinforcement itself. Which, I suppose, is more or less precisely what the Friendly AI problem is.