My ability to take the alignment problem seriously was already hanging by a thread, and this post really sealed the deal for me. All we want is the equivalent of a BDSM submissive, a thing which actually exists and isn’t even particularly uncommon. An AI which can’t follow orders is not useful and would not be pursued seriously. (That’s why we got InstructGPT instead of GPT-4.) And even if one emerged, a rogue AI can’t do more damage than an intelligent Russian with an internet connection.
Apologies for the strong language, but this looks to me like a doomist cult that is going off the rails and I want no part in it.
I disagree with each of your statements.
Why would something need to be able to follow orders to be useful? Most things in the world do not follow my orders (my furniture, companies that make all my products, most people I know). Like, imagine an AI assistant that’s really good at outputting emails from your inbox that make your company more profitable. You don’t know why it says what it says, but you have learned the empirical fact that as it hires people, fires people, changes their workloads, and gives them assignments, your profits go up a lot. You can’t really tell it what to do, but it sure is useful.
I think nobody knows how to write the code of a fundamentally submissive agent, and that other agents are way easier to make: ones that are just optimizing in a way that doesn’t think in terms of submission/dominance. I agree such humans exist, but nobody understands how they work or how to code one, and you don’t get to count on us learning that before we build super-powerful AI systems.
I have no clue why you think that an intelligent Russian is the peak of optimization power. I think that’s a false and wildly anthropomorphic thing to think. Imagine getting 10 von Neumanns in a locked room with only an internet connection; that’s already more powerful than the Russian, and I bet it could do some harm. Now imagine a million. Whatever gets you the assumption that an AI system can’t be more powerful than one human seems wild, and I don’t know where you’re getting this idea from.
Btw, unusual ask, but do you want to hop on audio and hash out the debate more sometime? I can make a transcript and can link it here on LW, both posting our own one-paragraph takeaways. I think you’ve been engaging in a broadly good-faith way on the object level in this thread and others and I would be interested in returning the ball.
Sure. The best way for me to do that would be through Discord. My id is lone-pine#4172
Would you mind linking the transcript here if you decide to release it publicly? I’d love to hear both of your thoughts expressed in greater detail!
That’d be the plan.
(Ping to reply on Discord.)
Sent you a friend request.
Ooh, I like Ben’s response and am excited about the audio thing happening.
Conventional non-AI computers are already fundamentally passive. If you boot them up, they just sit there. What’s the problem? The word ‘agent’?
If an AI assistant is replacing a human assistant, it needs to be controllable to the same extent. You don’t expect or want to micromanage a human assistant, but you do expect to set broad parameters.
Yes, the word agent.
Sure, if it’s ‘replacing’. But my example isn’t one of replacement; it’s one where it’s useful in a different way from my other products, in a way that I personally suspect is easier to train/build than something that does ‘replacement’.
I at first also downvoted because your first argument looks incredibly weak (this post has little relation to arguing for/against the difficulty of the alignment problem; what update are you getting on that from here?), as did the follow-up ‘all we need is...’, which is a formulation that hides problems instead of solving them.
Yet your last point does have import, and the fact that you stated it explicitly is useful in allowing everyone to address it, so I reverted to an upvote for honesty, though with a strong disagree.
To the point, I also want to avoid being in a doomist cult. I’m not a die-hard, long-term “we’re doomed if we don’t align AI” guy, but from my readings throughout the last year I am indeed getting convinced of the urgency of the problem. Am I getting hoodwinked by a doomist cult with very persuasive rhetoric? Am I myself hoodwinking others when I talk about these problems and they too start transitioning to do alignment work?
I answer these questions not by reasoning on ‘resemblance’ (i.e. how much it looks like a doomist cult) but by going into finer detail. An implicit argument being made when you call [the people who endorse the top-level post] a doomist cult is that they share the properties of other doomist cults (being wrong, having bad epistemics/policy, preying on isolated/weird minds) and are thus bad. I understand having a low prior for doomist-cult look-alikes actually being right (since there is no known instance of a doomist cult predicting the end of the world being right), but that’s no reason to turn into a rock (as in https://astralcodexten.substack.com/p/heuristics-that-almost-always-work?s=r ) that believes “no doom prophecy is ever right”. You can’t prove that no doom prophecy is ever right, only that they’re rarely right (and probably only once).
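To make the rock’s failure mode concrete, here is a minimal sketch of the expected-cost arithmetic behind the linked post; the 1-in-1000 base rate and the cost figures are purely illustrative assumptions, not claims about actual probabilities.

```python
# Illustrative sketch of the "heuristics that almost always work" point.
# All numbers below are made-up assumptions chosen only to show the shape
# of the argument, not estimates of real probabilities or costs.

P_DOOM = 0.001            # assumed base rate of a doom prophecy being right
COST_OF_DOOM = 1_000_000  # assumed cost of being unprepared when doom is real
COST_OF_CHECKING = 1      # assumed cost of engaging with the object-level arguments

# The "rock" heuristic always answers "no doom": very accurate on average...
rock_accuracy = 1 - P_DOOM
rock_expected_loss = P_DOOM * COST_OF_DOOM

# ...while someone who pays the small cost of actually checking the arguments
# avoids the catastrophic case (under these stylized assumptions).
engaged_expected_loss = COST_OF_CHECKING

print(f"Rock accuracy:         {rock_accuracy:.1%}")   # 99.9%
print(f"Rock expected loss:    {rock_expected_loss:.0f}")   # 1000
print(f"Engaged expected loss: {engaged_expected_loss:.0f}")  # 1
```

The point is only that a heuristic can be right 99.9% of the time and still be a bad policy when the rare failure is catastrophic.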
I thus advise changing your question “do [the people who endorse the top-level post] look like a doomist cult?” into “What would be a sufficient level of argument and evidence for me to take this doomist-cult-looking group seriously?”. It’s not a bad thing to call doom when doom is on the way. Engage with the object-level argument and not with your pre-cached pattern recognition of “this looks like a doom cult, so it is bad/not serious”. Personally, I had similar qualms to the ones you’re expressing, but having looked into the arguments, it feels very strong and much more real to believe “Alignment is hard and by default AGI is an existential risk” rather than not. I hope your conversation with Ben will be productive and that I haven’t only expressed points you already considered (FYI, they have already been discussed on LessWrong).