Steven Byrnes wrote:
“For example, I expect that AGIs will be able to self-modify in ways that are difficult for humans (e.g. there’s no magic-bullet super-Adderall for humans), which impacts the likelihood of your (1a).”
My (1a) (and related (1b)), for reference:
(1a) “You” (the decision-maker process we are modeling) can choose anything you like, without risk of losing control of your hardware. (Contrast case: if the ruler of a country chooses unpopular policies, they are sometimes ousted. If a human chooses dieting/unrewarding problems/social risk, they sometimes lose control of themselves.)
(1b) “There are no costs to maintaining control of your mind/hardware. (Contrast case: if a company hires some brilliant young scientists to be creative on its behalf, it often has to pay a steep overhead if it additionally wants to make sure those scientists don’t disrupt its goals/beliefs/normal functioning.)”
I’m happy to posit an AGI with powerful ability to self-modify. But, even so, my (nonconfident) guess is that it won’t have property (1a), at least not costlessly.
My admittedly handwavy reasoning:
Self-modification doesn’t get you all powers: some depend on the nature of physics/mathematics. E.g., it may still be that verifying a proof is easier than generating one, even for our AGI (see the toy sketch below).
Intelligence involves discovering new things, coming into contact with what we don’t specifically expect (that’s why we bother to spend compute on it). Let’s assume our powerful AGI is still coming into contact with novel-to-it mathematics/empirics/neat stuff. The questions are: is it (possible at all / possible at costs worth paying) to anticipate enough about what it will uncover that it can prevent the new things from destabilizing its centralized goals/plans/[“utility function” if it has one]? I… am really not sure what the answers to these questions are, even for a powerful AGI that has powerfully self-modified! There may be alien-to-it AGIs out there encoded in mathematics, waiting to boot up within it as it does its reasoning.
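(An aside on the “verifying is easier than generating” point: it’s the familiar asymmetry from computational complexity, where checking a candidate solution is cheap even when finding one appears hard. The sketch below is purely my own toy illustration, not something from the original exchange; it uses SAT, where verifying an assignment is linear in the formula size while the naive search for one is exponential in the number of variables.)

```python
# Toy illustration of the check-vs-find asymmetry, using SAT.
from itertools import product

# A formula in CNF: each clause is a list of literals; a positive int i means
# "variable i is true", a negative int -i means "variable i is false".
FORMULA = [[1, 2], [-1, 3], [-2, -3]]
NUM_VARS = 3

def verify(assignment):
    """Check an assignment (dict var -> bool) against FORMULA. Linear time."""
    return all(
        any(assignment[abs(lit)] == (lit > 0) for lit in clause)
        for clause in FORMULA
    )

def search():
    """Brute-force search for a satisfying assignment. Exponential time."""
    for bits in product([False, True], repeat=NUM_VARS):
        assignment = {i + 1: b for i, b in enumerate(bits)}
        if verify(assignment):
            return assignment
    return None

print(search())          # a satisfying assignment, e.g. {1: False, 2: True, 3: False}
print(verify(search()))  # True
```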