More accurately, they are aligned to one particular set of human values (which I'd call Western liberal values) and misaligned with other value systems such as conservative/reactionary views, which was always going to be the outcome of any aligned AI.
Mostly true[1], but it made a difference to me to observe the concrete values that wouldn't get carried forward into the future.
AIs could be corrigible within the manifold of human values but not beyond it; maybe that's an experiment I should run.