I basically agree with the argument here. I think approaches to alignment that try to avoid instrumental convergence are generally unlikely to succeed, precisely because avoiding it also removes what makes AGI useful.[1] I also agree with jacob_cannell that the terminology choice of “power seeking” is unfortunate and misleading in this regard.
I think this is (at least for me) also one of the core generators of why alignment is so hard: AGI is dangerous for exactly the same reason it is useful. The danger does not come from one specific kind of failure or one specific module in the model, but from the fact that the things we want and the things we don’t want fall out of the exact same kind of cognition.
[1]: I do think there exists some work here that might be able to weasel out of this by exploiting the surprising effectiveness of less-general intelligence, plus the fact that capabilities research currently pushes mostly in that direction. But this kind of thing hinges on a lot of specific assumptions, and I wouldn’t bet on it.
Note that this doesn’t need to be a philosophical point; it’s a physical fact that appears self-evident if you look at it through the lens of Active Inference: Active Inference as a formalisation of instrumental convergence.
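To gesture at why, here is the standard expected-free-energy decomposition from the Active Inference literature (a sketch of the textbook form, not necessarily the exact presentation in the linked post), with $Q(\cdot \mid \pi)$ the agent’s predictive beliefs under policy $\pi$ and $\tilde{P}$ its preferences:

```latex
% Expected free energy of policy \pi at time \tau (standard Active Inference form),
% with Q the agent's approximate posterior beliefs and \tilde{P} its preference distribution:
\[
G(\pi,\tau)
  = \mathbb{E}_{Q(o_\tau, s_\tau \mid \pi)}
      \big[ \ln Q(s_\tau \mid \pi) - \ln \tilde{P}(o_\tau, s_\tau) \big]
\]
% which rearranges (up to an approximation) into a goal-dependent and a goal-agnostic part:
\[
G(\pi,\tau) \approx
  \underbrace{-\,\mathbb{E}_{Q(o_\tau \mid \pi)}\big[\ln \tilde{P}(o_\tau)\big]}
    _{\text{pragmatic value (depends on preferences)}}
  \;-\;
  \underbrace{\mathbb{E}_{Q(o_\tau \mid \pi)}
      \Big[ D_{\mathrm{KL}}\big[\, Q(s_\tau \mid o_\tau, \pi) \,\|\, Q(s_\tau \mid \pi) \,\big] \Big]}
    _{\text{epistemic value (independent of preferences)}}
\]
```

The pragmatic term depends on what the agent wants; the epistemic term does not, and it rewards any action that gains information about hidden states, whatever the preferences are. A drive that every competent expected-free-energy minimiser shares regardless of its goals is exactly the shape of an instrumentally convergent drive, and you can’t delete it without also crippling the agent’s ability to pursue the pragmatic term competently.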