Claim: the degree to which the future is hard to predict has no bearing on the outer alignment problem.
By outer alignment I was referring to “providing well-specified rewards” (https://arxiv.org/abs/2209.00626). Following this definition, I still think that if one cannot disentangle what is relevant for predicting the future, one cannot carefully tailor a reward function that teaches an agent how to predict the future. The agent therefore cannot be consequentialist, or at least it will have to deal with a large amount of uncertainty when forecasting over timescales longer than the predictable horizon. I think this reasoning rests on the basic premise you mentioned (“one can construct a desirability tree over various possible future states”).
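To make the compounding-uncertainty point concrete, here is a minimal sketch (my own toy model, not something from the linked paper): it assumes each step down a branch of the desirability tree is predicted correctly with some probability p, and that one wrong step makes the rest of that branch uninformative. The value estimate for a branch then washes out once its length exceeds the predictable horizon.

```python
import random

# Minimal sketch (my own illustration): how per-step prediction error
# compounds along one branch of a desirability tree.
# Assumption: each step is predicted correctly with probability p_correct;
# a single misprediction makes the rest of the branch pure noise.

def branch_estimate(depth: int, p_correct: float, true_value: float = 1.0) -> float:
    """Estimated leaf value after `depth` prediction steps."""
    for _ in range(depth):
        if random.random() > p_correct:
            # Misprediction: from here on, the estimate is uninformative.
            return random.uniform(-1.0, 1.0)
    return true_value

def mean_estimate(depth: int, p_correct: float, n: int = 10_000) -> float:
    return sum(branch_estimate(depth, p_correct) for _ in range(n)) / n

for horizon in (1, 5, 10, 20, 40):
    print(f"horizon {horizon:>2}: mean estimate ~ {mean_estimate(horizon, 0.9):.3f}")
# With p_correct = 0.9 the mean estimate decays roughly as 0.9**horizon toward
# zero (pure noise), i.e. branches longer than the predictable horizon carry
# little usable signal for a consequentialist evaluation.
```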
All we do with consequentialism is evaluate a particular terminal state. The complexity of how we got there doesn’t matter.
Oh, but it does matter! If your desirability tree consists of weak branches (i.e., wrong predictions), what’s it good for?
We can’t blame this on outer alignment, can we? This would be better described as goal misspecification.
I believe it may have been a mistake on my side: I had assumed that the definition I was using for outer alignment was the standard/default one! I think this would match goal misspecification, yes (and my working definition, as stated above).
If one subscribes to deontological ethics, then the problem becomes even easier. Why? One wouldn’t have to reason probabilistically over various future states at all. The goodness of an action only has to do with the nature of the action itself.
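A rough sketch of that contrast (my own framing, not a standard formulation): a deontological-style reward can be computed from the action alone, whereas a consequentialist reward needs a model that forecasts the resulting state, which reintroduces the prediction problem above.

```python
# Rough sketch, my own illustration. FORBIDDEN_ACTIONS, predict_outcome and
# utility are hypothetical placeholders, not from any referenced work.

FORBIDDEN_ACTIONS = {"deceive", "steal"}

def deontological_reward(action: str) -> float:
    # Judged by the nature of the action alone; no world model needed.
    return -1.0 if action in FORBIDDEN_ACTIONS else 0.0

def consequentialist_reward(action: str, predict_outcome, utility) -> float:
    # Requires forecasting the resulting state, so it inherits the
    # predictability limits discussed above.
    return utility(predict_outcome(action))
```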
Completely agreed!
On a related note, you may find this interesting: https://arxiv.org/abs/1607.00913