So if the argument the OT proponents are making is that an AI will not self-improve for fear of jeopardising its commitment to its original goal, then the entire OT is moot, because the AI will never risk self-improving at all.
This seems to me to apply only to self-improvement that modifies the outcome of decision-making, irrespective of time. How does this account for self-improvement that only serves to make decision-making more efficient?
If I have some highly inefficient code that finds the sum of two integers by first breaking them up into 10,000 smaller decimal values, randomly ordering them, and then adding them up serially, and I rewrite the code to do the same thing in far fewer operations, I have self-improved without jeopardising my goal. A rough sketch of the kind of rewrite I mean is below.
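A minimal sketch in Python, with illustrative function names (`slow_add`, `fast_add`) that are my own, not from anywhere in particular: both versions produce the same answer, so the goal-relevant output is untouched while the operation count collapses.

```python
import random

def slow_add(a: int, b: int) -> int:
    # Inefficient: split each integer into 10,000 tiny decimal pieces,
    # shuffle them, then accumulate one piece at a time.
    pieces = [a / 10_000] * 10_000 + [b / 10_000] * 10_000
    random.shuffle(pieces)
    total = 0.0
    for p in pieces:
        total += p
    return round(total)

def fast_add(a: int, b: int) -> int:
    # Same goal, achieved in a single operation.
    return a + b

# The rewrite changes efficiency, not the decision-relevant output.
assert slow_add(3, 4) == fast_add(3, 4) == 7
```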
This kind of self-improvement can still be fatal in the context of deceptively aligned systems.
But that’s not general intelligence; general intelligence requires considering a wider range of problems holistically, and drawing connections among them.