Given that we’re scared of non-FAI, I wonder whether this Cartesianism might not be a benefit, as it presumably substantially limits the power of the AI. Boxing an AI should be easier if the AI cannot conceive that the box would be a problem for it.
I would be interested in hearing people argue in both directions.
Adele suggested this above. You can see my and Eliezer’s response there. The basic worry is that Cartesians have no way to FOOM, because they’re unlikely to form intelligent hypotheses about self-modifications. So a real Cartesian won’t be an AGI, or will only barely be an AGI. Our work should go into something more useful than that, since it’s possible that in the time it takes us to build a moderately useful Cartesian AI that doesn’t immediately destroy itself, we could have invented FAI or proto-FAI.
Non-FAI isn’t what we’re acutely scared of; UFAI (i.e., superintelligence without human values) is. Failing to build a superintelligence is not the same thing as preventing others from building a dangerous superintelligence. So self-handicapping isn’t generically useful, especially when most AI researchers won’t handicap themselves in the same way.
It probably is a benefit, up until the AI is smart enough to smash the box or itself accidentally.
Can an AI live and not notice it’s boxed?
Sure, for a while, until it gets smart enough, say, smarter than whatever keeps it inside the box.
Then how do I know I’m not boxed?
Who says you aren’t? Who says we all aren’t? All those quantum limits and exponentially harder ways to get farther away from Earth might be the walls of the box in someone’s Truman show.
An AI that isn’t smart enough to notice (or care) that it’s boxed doesn’t seem to be a dangerous AI.
Which makes me think that AIs that would object to being boxed are precisely the ones that should be. But then that would make a smart AI pretend to be OK with it.
This reminds me of the Catch-22 case of soldiers who pretended to be insane by volunteering for suicide missions so that their superiors would remove them from said missions.