I think this is good to get people initially on board, but I worry that people will start to falsely think that tasks unrelated to writing code are safe.
Honest question: what’s the easiest x-risk scenario that doesn’t involve generating code at some point? I’m struggling to think of any that aren’t pretty convoluted.
(I agree with the point, but think it’s easier to make once our foot is in the door.)
IMO the point of no return will probably be passed before recursive self-improvement. All we need is a sufficiently charismatic chatbot to start getting strategic about what it says to people.
I don’t especially disagree that the AI most likely to end the world will be one that writes code. But if you keep throwing optimization power into a reasonably general AI that has no direct coding experience, it’ll still end the world eventually.
If the AI isn’t a code-writing one, I don’t have any particular next guess.
Somewhere in the late-2021 MIRI conversations, Eliezer opines that non-recursively-self-improving AIs are definitely dangerous. I can search for it if anyone is interested.
Yes please!
From Discussion with Eliezer Yudkowsky on AGI interventions:

Compared to the position I was arguing in the Foom Debate with Robin, reality has proved way to the further Eliezer side of Eliezer along the Eliezer-Robin spectrum. It’s been very unpleasantly surprising to me how little architectural complexity is required to start producing generalizing systems, and how fast those systems scale using More Compute. The flip side of this is that I can imagine a system being scaled up to interesting human+ levels, without “recursive self-improvement” or other of the old tricks that I thought would be necessary, and argued to Robin would make fast capability gain possible. You could have fast capability gain well before anything like a FOOM started. Which in turn makes it more plausible to me that we could hang out at interesting not-superintelligent levels of AGI capability for a while before a FOOM started. It’s not clear that this helps anything, but it does seem more plausible.
From Ngo and Yudkowsky on alignment difficulty:

It later turned out that capabilities started scaling a whole lot without self-improvement, which is an example of the kind of weird surprise the Future throws at you . . .
And yeah I realize now that my summary of what Eliezer wrote is not particularly close to what he actually wrote.
Depends what you mean by “generate code.” Can it have a prebaked function that copies itself (like a computer virus)? Does it count if it generates programs to attack other systems? If it changes its own source code? Its code stored in memory? You could argue that changing anything in memory is, in a certain sense, generating code.
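A toy Python sketch of how blurry that boundary is (purely illustrative; the self-copy filename and the exec’d string are made up for the example):

```python
# Toy sketch: the line between "data in memory" and "generated code" is blurry,
# because anything a program can write, something else can later execute.

# 1. A prebaked, virus-style self-copy: read our own source as data and write
#    it back out as a new program. Nothing was "generated", only copied.
with open(__file__) as f:
    source = f.read()
with open("copy_of_self.py", "w") as f:
    f.write(source)

# 2. A string assembled at runtime is plain data sitting in memory right up
#    until exec() runs it, at which point it is, in effect, generated code.
snippet = "print('data in memory, now running as code')"
exec(snippet)
```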
If it can’t generate code, it’ll be a one-shot type of thing, which means it must be preprogrammed with the tools to do its job. I can’t come up with any way for it to take control, but it doesn’t seem that hard to come up with some doomsday-machine scenarios: e.g. smashing a comet into Earth, or making a virus that sterilizes everyone. Or a Shiri’s Scissor could do the trick. The idea is to make something that doesn’t have to learn or improve itself too much.
I was thinking of “something that can understand and write code at the level of a 10x SWE”. I’m further assuming that human designers didn’t give it functions to copy itself or do other dumb things.