Unless the evaluation mechanism is extremely biased, it seems unlikely that it would give biased answers to these questions.
But there’s now a question of “what is the AI trying to do?” If the truth-evaluation method is politically biased (even if not “extremely”), then the AI is very likely no longer “trying to tell the truth”. I can imagine two other possibilities:
1. It might be “trying to advance a certain political agenda”. In this case I can imagine it selectively and unpredictably manipulating answers to especially important questions. For example, it might insert backdoors into infrastructure-like software when users ask it coding questions and then tell other users how to exploit those backdoors to take power; damage an important person’s or group’s reputation by subtly manipulating many answers that influence how others view that person or group; or push people’s moral views in a certain direction through the same kind of subtle manipulation.
2. It might be “trying to tell the truth using a very strange prior or reasoning process”, which also seems likely to have unpredictable and dangerous consequences down the line, but it’s harder for me to imagine specific examples, as I have little idea what that prior or reasoning process would be.
Do you have another answer to “what is the AI trying to do?”, or see other reasons to be less concerned about this than I am?