By “caring about truth” here do you mean wanting systems to make explicit utterances that accurately reflect their actual motives? E.g., if X is a chess-playing AI that doesn’t talk about what it wants at all, just plays chess, would a person who “cares about truth” also be motivated to give X the ability and inclination to talk about its goals (and do so accurately)?
Or wanting systems not to make explicit utterances that inaccurately reflect their actual motives? E.g., a person who “cares about truth” might also be motivated to remove muflax’s AI’s ability to report on its goals at all? (This would also prevent it from winning Diplomacy games, but we’ve already stipulated that isn’t a showstopper.)
I intended both (i.e., that they want accurate statements to be uttered and no inaccurate ones), but the distinction isn’t important to my argument, which was just that they want what they want.