A recent post used the example of a self-driving car that “wants” (perhaps) to get its passengers to their destination safely.
In one case the “want” was clearly a metaphor; the author explicitly stated the car cannot want. Still, there was a failure mode such that even if the car did want passenger safety, it was going to fail. The problem was the training used.
Then there was the case where the car AI could have such a desire but didn’t. It did act that way for a while, though, and then would fail (this seemed more of an alignment problem than an AI-type one) once whatever constraint was forcing the AI to comply with the externally imposed want, rather than its own wants, no longer held.
In the first case we need the scare quotes to indicate a non-factual working assumption. In the second case we don’t need the quotes (even if the situation is still only theoretical).
Now, like I said, I’m not critiquing the OP and I liked reading it. However, in the first case we might say that framing the story as “the car wants” hides a bit. Yes, we still got to the view that the failure was introduced in the training, but we never got to why that might have occurred.
So what if we dig in deeper? Perhaps the designers were also thinking along the same metaphor: the car “wants” safety (though it actually lacks that capacity), so they never actually wanted safety themselves while designing the car, the AI, or the training; they mostly assumed the car AI would accomplish that.
If we move the “want” to the human designers, scenario one starts looking a little more like scenario two, and the solutions are found not within the car per se but within the thinking and choices of the designers (or the organization where the designers work). But if we just accept that “want” is a useful rhetorical shortcut, do we ever start looking very far outside the car?