Anything that happens without a causal line of descent from human values is unlikely to align with human values.
Unlikely to align how, exactly? There are also common causes, you know; A and B can be correlated when A causes B, when B causes A, or when C causes both A and B.
It seems to me that you can require an arbitrarily strict degree of alignment and thereby make it arbitrarily unlikely, but some alignment via a common cause is nonetheless probable.
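To illustrate the common-cause point, here's a quick sketch in Python; the variables and numbers are purely made up for the example:

```python
# Illustrative sketch: a shared cause C makes A and B correlate
# even though neither one causes the other. All names and numbers
# here are invented for the example.
import numpy as np

rng = np.random.default_rng(0)
n = 100_000

C = rng.normal(size=n)        # common cause
A = C + rng.normal(size=n)    # A depends only on C (plus noise)
B = C + rng.normal(size=n)    # B depends only on C (plus noise)

# Correlation comes out around 0.5 despite there being no A->B or B->A link.
print(np.corrcoef(A, B)[0, 1])
```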
There's such a thing as over-fitting… if you have some noisy data, the theory that fits the data perfectly is just the table of the data itself (e.g. heights and falling times); in practice, a useful theory doesn't fit the data exactly. If we make the AI fit perfectly to what mankind does, we could just as well make a brick and proclaim it an omnipotent, omniscient, mankind-friendly AI that will never stop mankind from doing anything mankind wants (including taking extinction risks).
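To make the heights-and-falling-times example concrete, here's a rough sketch on simulated data (g_hat is just an illustrative fitted parameter): a lookup table of the noisy measurements "fits" perfectly but predicts nothing new, while the simple law t = sqrt(2h/g), fitted imperfectly, generalizes.

```python
# Rough sketch of the over-fitting point on simulated fall-time data.
import numpy as np

rng = np.random.default_rng(1)
g = 9.81  # true gravitational acceleration used to simulate the data

h_train = np.linspace(1.0, 20.0, 10)                      # drop heights (m)
t_train = np.sqrt(2 * h_train / g) + rng.normal(scale=0.05, size=h_train.size)

# "Theory" 1: the table of the data itself -- zero error on the
# training points by construction, but silent about any new height.
lookup = dict(zip(h_train, t_train))

# "Theory" 2: the simple law t = sqrt(2h / g_hat), with the single
# parameter g_hat fitted by least squares on t^2 = 2h / g_hat.
g_hat = 2 * np.sum(h_train**2) / np.sum(h_train * t_train**2)

h_new = 12.3                                  # a height not in the table
t_true = np.sqrt(2 * h_new / g)

print(lookup.get(h_new, "table has no prediction"))   # the over-fit "theory"
print(np.sqrt(2 * h_new / g_hat), "vs true", t_true)  # the simple law, close
```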
Well yes, but I would assume you would want more alignment, not less.