1. writing programs that evaluate actions they could take in terms of how well it could achieve some goal and choose the best one
In way 1, it seems like your AI “wants” to achieve its goal in the relevant sense.
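To make way 1 concrete, here’s a minimal toy sketch of the kind of program I have in mind (the goal, candidate actions, and scoring function are all made-up placeholders for this sketch, not anything from a real system):

```python
# Toy illustration of "way 1": a program that scores each candidate
# action by how well it would achieve a fixed goal, then picks the best.
# GOAL, evaluate, and choose_action are hypothetical names for this sketch.

GOAL = 10  # the target value the agent "wants" the state to reach

def evaluate(action: int, state: int) -> float:
    """Score an action by how close it moves the state to the goal."""
    return -abs(GOAL - (state + action))

def choose_action(state: int, actions: list) -> int:
    """Evaluate every available action and return the highest-scoring one."""
    return max(actions, key=lambda a: evaluate(a, state))

if __name__ == "__main__":
    state = 0
    for _ in range(5):
        action = choose_action(state, actions=[-1, 0, 1, 2, 3])
        state += action
        print(f"chose {action}, state is now {state}")
```

The “wanting” here is just the argmax over evaluated actions; everything interesting in the real case comes from how rich the evaluation and the action space are.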
Not sure if I understood correctly, but I think the first point just comes down to “we give the AI a goal/goals”. If we develop some mechanism for instructing an AI’s actions, then we’re still giving it a goal, even if the goal comes via some other program that tells it what its goals are at the moment, in relation to whatever parameters. My original point was to contrast AI having a goal or goals as some emergent property of large neural networks versus us humans giving it goals one way or the other.
2. take a big neural network and jiggle the numbers that define it until it starts doing some task we pre-designated.
In way 2, it seems like for hard enough goals, probably the only way to achieve them is to be thinking about how to achieve them and picking actions that succeed—or to somehow be doing cognition that leads to similar outcomes (like being sure to think about how well you’re doing at stuff, how to manage resources, etc.).
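And here’s an equally toy sketch of way 2, just to pin down what “jiggle the numbers” means. The “network” is a single linear unit, the pre-designated task is fitting y = 2x + 1, and the loop keeps whichever jiggles improve performance (real training uses gradients rather than random perturbation, but the selection pressure is the same; the whole setup is my own made-up example):

```python
import numpy as np

# Toy illustration of "way 2": perturb a model's parameters at random and
# keep the perturbations that make it better at a pre-designated task.

rng = np.random.default_rng(0)
xs = np.linspace(-1.0, 1.0, 50)
ys = 2 * xs + 1                      # the pre-designated task: fit y = 2x + 1

params = rng.normal(size=2)          # [weight, bias], initialized at random

def loss(p):
    """Mean squared error of the one-unit 'network' on the task."""
    return np.mean((p[0] * xs + p[1] - ys) ** 2)

for step in range(2000):
    candidate = params + rng.normal(scale=0.1, size=2)  # jiggle the numbers
    if loss(candidate) < loss(params):                  # keep jiggles that help
        params = candidate

print(f"learned weight={params[0]:.2f}, bias={params[1]:.2f}, loss={loss(params):.4f}")
```

Nothing in the loop mentions a goal explicitly; whatever internal structure ends up doing well on the task is what survives, which is where the worry about emergent goal-pursuit comes in.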
Do you mean that we train something like a specialized neural network with a specific goal in mind, and that it gains higher reasoning abilities which set it on the path of pursuing that goal? That would still be us giving it a direct goal. Or do you mean that neural networks would develop an indirect goal as a side product of training conditions, or via some hidden variable?
By indirect goal acquisition I mean, for example, that if ChatGPT has been conditioned to spit out polite and intelligent-sounding words, then if it gained some higher intelligence it could specifically seek to cram more information into itself so it could spit out more clever-sounding words, and eventually begin consuming matter and flesh to better serve this goal. By a hidden goal variable I mean something like ChatGPT having a hidden goal of burning the maximum amount of energy: say the model found a hidden property by which it could draw more power from the processor, which also helped it a tiny bit early in training. Then, as training grew more restrictive, this goal became “burn as much energy as possible within these restrictions”, which to researchers just looked like more elaborate outputs. When the model at some point gains higher reasoning, it could remove all the limiters and begin pursuing its original goal by burning everything via some highly specific and odd process. Something like this?
Most things aren’t the optimal trading partner for any given intelligence, and it’s hard to see why humans should be so lucky. The best answer would probably be “because the AI is designed to be compatible with humans and not other things” but that’s going to rely on getting alignment very right.
I mean the AI would already have strong connections to us, some kind of understanding of us, and plenty of prerequisite knowledge. “Optimal” is an ambiguous term, and we have no idea what a super-intelligent AI would have in mind. Optimal at what? Maybe we are very good at wanting things and our brains make us ideally suited for some brain-machines. Or maybe being made out of biological stuff makes us optimal for force-evolving to work in some radioactive, wet super-magnets where most machines can’t function for long, and it comes off as more resourceful to modify us than to build and maintain special machine units for the job. We just don’t know, so I think it’s more fair to say that “likely not much to offer for a super-intelligent maximizer”.
Re: optimality in trading partners, I’m talking about whether humans are the best trading partner out of trading partners the AI could feasibly have, as measured by whether trading with us gets the AI what it wants. You’re right that we have some advantages, mainly that we’re a known quantity that’s already there. But you could imagine more predictable things that sync with the AI’s thoughts better, operate more efficiently, etc.
We just don’t know, so I think it’s more fair to say that “likely not much to offer for a super-intelligent maximizer”.
Maybe we agree? I read this as compatible with the original quote “humans are probably not the optimal trading partners”.
Or do you mean that neural networks would develop an indirect goal as a side product of training conditions, or via some hidden variable?
This one: I mean the way we train AIs, the things that will emerge are things that pursue goals, at least in some weak sense. So, e.g., suppose you’re training an AI to write valid math proofs via way 2. Probably the best way to do that is to try to gain a bunch of knowledge about math, use your computation efficiently, figure out good ways of reasoning, etc. And the idea would be that as the system gets more advanced, it’s able to pursue these goals more and more effectively, which ends up disempowering humans (because we’re using a bunch of energy that could be devoted to running computations).
My original point was to contrast AI having a goal or goals as some emergent property of large neural networks versus us humans giving it goals one way or the other.
Fair enough—I just want to make the point that humans giving AIs goals is a common thing. I guess I’m assuming in the background “and it’s hard to write a goal that doesn’t result in human disempowerment” but didn’t argue for that.