I think instrumental convergence does provide meaningful evidence of doom, and you can make a valid counting argument for it, but as with deceptive alignment you have to run the counting argument over algorithms not over functions.
It’s not clear to me what an “algorithm” is supposed to be here, and I suspect that this might be cruxy. In particular I suspect (40-50% confidence) that:
1. You think there are objective and determinate facts about what “algorithm” a neural net is implementing, where
2. Algorithms are supposed to be something like a Boolean circuit or a Turing machine rather than a neural network, and
3. We can run counting arguments over these objective algorithms, which are distinct both from the neural net itself and the function it expresses.
I reject all three of these premises, but I would consider it progress if I got confirmation that you in fact believe in them.