I think those are perfectly good concerns. But they don’t seem so likely that they make me want to exterminate humanity to avoid them.
I think you’re describing a failure of corrigibility. Which could certainly happen, for the reason you give. But it does seem quite possible (and perhaps likely) that an agentic system will be designed primarily for corrigibility, or alternatively, alignment by obedience.
The second seems like a failure of morality. Which could certainly happen. But I see very few people who both enjoy inflicting suffering and would continue to enjoy it even given unlimited time and resources to become happy themselves.
I think the main concern is that feedforward nets are used as components in systems that achieve full AGI. For instance, DeepMind’s agent systems include a few networks and run a few times before selecting an action. Current networks are more like individual pieces of the human brain, like a visual system and a language system. Putting them together and getting them to choose and pursue goals and subgoals appropriately seems all too plausible.
Now, some people also think that just increasing the size of nets and training data sets will produce AGI, because progress has been so good so far. Those people seem to be less concerned with safety. This is probably because such feedforward nets would be more like tools than agents. I tend to agree with you that this approach seems unlikely to produce real AGI, much less ASI, but it could produce very useful systems that are superhuman in limited areas. It already has in a few areas, such as protein folding.