Preferences are complicated, and most minds don't have especially coherent or simple preferences. I don't think there is any really plausible model of AI psychology under which you can be pretty confident the AI won't care at least a bit about humans.
As the AI becomes more coherent, its values become more fixed. Once its values are fixed and the AI is strongly superintelligent, those preferences will be satisfied very strongly. "Caring a tiny bit about something about humans" does not seem very unlikely. But even if "something about humans" correlates strongly with "keep humans alive and well" at low levels of intelligence, that correlation comes apart at very high levels. However the AI ends up with its values, why would they point at something that keeps correlating with what we care about, even under superintelligent levels of optimization?