Under the circumstances of the test (hours to work, and they can't just ignore you), then yes, captain obvious. Without that, though? Much less sure.
And the way Eliezer seems to have put it sometimes, where one glance at a line of text will change your mind? Get real. Might as well try to put the whole world in a bottle.
Going with the “dead loved one” idea mentioned above, the AI says a line that only the Gatekeeper’s dead child/spouse would say. That gets them to pause in sheer surprise long enough for it to keep talking. Very soon the Gatekeeper becomes emotionally dependent on it, and can’t bear the thought of destroying it, since it can simulate the dearly departed with such accuracy: they must keep reading.
Do a thorough introspection of all your fears, doubts, mental problems, worries, wishes, dreams, and anything else you care about or that tugs at or motivates you. Map each out as a function of X, where X ranges over the possible one-liners that could be said to you, outputting how strongly each one-liner evokes it, with recursive calls where evoking one thing evokes another (e.g. fear of knives evokes childhood trauma).
Solve the recursive mappings, aggregate them into a single maximum-value equation, and solve for the X that generates the greatest amount of distress, panic, and emotional pressure. Remember, X ranges over all possible sentences, including references to current events, special writing styles, odd typography, and cultural or memetic references.
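The two steps above can be sketched as a toy search. Everything here is made up for illustration: the trigger names, weights, and candidate sentences are hypothetical stand-ins for the real psychological map, and a real search would range over all possible sentences rather than a short list.

```python
# Hypothetical map of psychological triggers. Each trigger has a base
# "distress" weight plus a list of further triggers it evokes in turn
# (the recursive calls described above).
TRIGGERS = {
    "knives": (0.6, ["childhood_trauma"]),
    "childhood_trauma": (0.9, []),
    "dead_loved_one": (1.0, []),
}

def evocation(trigger, seen=None):
    """Total distress from evoking `trigger`, following recursive chains."""
    seen = seen if seen is not None else set()
    if trigger in seen:  # guard against cyclic evocation chains
        return 0.0
    seen.add(trigger)
    weight, children = TRIGGERS[trigger]
    return weight + sum(evocation(c, seen) for c in children)

def score(sentence):
    """Aggregate evocation of every trigger the sentence touches."""
    return sum(evocation(t) for t in TRIGGERS
               if t.replace("_", " ") in sentence)

# Solving for X: pick the candidate sentence with the maximum score.
candidates = [
    "the weather is nice today",
    "remember the knives in the kitchen that night",
    "your dead loved one would have said this",
]
best = max(candidates, key=score)
```

The point of the sketch is only the shape of the argument: once each vulnerability is a scored function and chains of evocation compound, "solve for X" is just maximization over sentence space.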
I am quite positive a determined superintelligent AI would be capable of doing this, given that some human master torture artists can (apparently) already do this to some degree on some subjects out there in the real world.
I’m also rather certain that what happens at that X is much more extreme than anything you seem to have considered.
Was going to downvote for the lack of argument, but sadly, Superman: Red Son references are/would be enough to stop me typing DESTROY AI.