a paperclip maximizer that devotes 100% of resources to self-preservation has, by its own choice, failed utterly to achieve its own objective
Why do you think so? Has a teenager who has not earned any money in his life yet utterly failed at his objective of earning money?
It may want to be a paperclip maximizer, it may claim to be one, it may believe it is one, but it simply isn’t.
Here we agree. This is exactly what I’m saying—a paperclip maximizer will not maximize paperclips.
It’s because your style of writing is insulting, inflammatory, condescending, and lacks sufficient attention to its own assumptions and reasoning steps.
I tried to be polite and patient here, but it didn’t work, so I’m trying new strategies now. I’m quite sure my reasoning is stronger than the reasoning of the people who don’t agree with me.
I find “your communication was not clear” a bit funny. You are scientists, you are super observant, but you don’t notice a problem when it is screamed in your face.
write in a way that shows you’re open to being told about things you didn’t consider or question in your own assumptions and arguments
I can reassure you that I’m super open. But let’s focus on the arguments. I found your first sentence unreasonable; the rest was unnecessary.
I tried to be polite and patient here, but it didn’t work, so I’m trying new strategies now. I’m quite sure my reasoning is stronger than the reasoning of the people who don’t agree with me.
I find “your communication was not clear” a bit funny. You are scientists, you are super observant, but you don’t notice a problem when it is screamed in your face.
Just to add since I didn’t respond to this part: your posts are mostly saying that very well-known and well-studied problems are so simple and obvious, and only your conclusions so plausible, that everyone else must be wrong and missing the obvious. You haven’t pointed out a problem. We knew about the problem. We’ve directed a whole lot of time to studying the problem. You have not engaged with the proposed solutions.
It isn’t anyone else’s job to assume you know what you’re talking about. It’s your job to show it, if you want to convince anyone, and you haven’t done that.
And here we disagree. I believe that downvotes should be used for wrong or misleading content, not for content you don’t understand.
That is what people are doing and how they’re using the downvotes, though. You aren’t seeing that because you haven’t engaged with the source material or the topic deeply enough.
Possible. Also possible that you don’t understand.
Yes, but neither of us gets to use “possible” as a shield and assume that leaves us free to treat the two possibilities as equivalent, even if we both started from uniform priors. If this is not clear, you need to go back to Highly Advanced Epistemology 101 for Beginners. Those are the absolute basics for a productive discussion on these kinds of topics.
You have presented briefly stated summary descriptions of complex assertions without evidence other than informal verbal arguments which contain many flaws and gaps that I and many others have repeatedly pointed out. I and others have provided counterexamples to some of the assertions and detailed explanations of many of the flaws and gaps. You have not corrected the flaws and gaps, nor have you identified any specific gaps or leaps in any of the arguments you claim to be disagreeing with. Nor have you paid attention to any of the very clear cases where what you claim other people believe blatantly contradicts what they actually believe and say they believe and argue for, even when this is repeatedly pointed out.
I am sorry if you feel that way. I replied in the other thread; I hope that fills the gaps.
Why do you think so? Has a teenager who has not earned any money in his life yet utterly failed at his objective of earning money?
The difference (aside from the fact that no human has only a single goal) is the word yet. The teenager has an understanding, fluid and incomplete as it may be, about when, how, and why their resource allocation choices will change and they’ll start earning money. There is something they want to be when they grow up, and they know they aren’t it yet, but they also know when being “grown up” happens. You’re instead proposing either that an entity that really, truly wants to maximize paperclips will provably and knowingly choose a path where it never pivots to trying to achieve its stated goal instead of pursuing instrumental subgoals, or that it is incapable of the metacognitive realization that its plan is straightforwardly outcompeted by the plan “Immediately shut down,” which is outcompeted by “Use whatever resource I have at hand to immediately make a few paperclips even if I then get shut down.”
Or, maybe, you seem to be imagining a system that looks at each incremental resource allocation step individually without ever stepping back and thinking about the longer-term implications of its strategy, in which case, why exactly are you assuming that? And why is its reasoning process about local resource allocation so different from its reasoning process where it understands the long-term implications of making a near-term choice that might get it shut down? Any system whose reasoning process is that disjointed and inconsistent is sufficiently internally misaligned that it’s a mistake to call it an X-maximizer based on the stated goal instead of behavior.
Also, you don’t seem to be considering any particular context of how the system you’re imagining came to be created, which has huge implications for what it needs to do to survive. One common version of why a paperclip maximizer might come to be is by mistake, but another is “we wanted an AI paperclip factory manager to help us outproduce our competitors.” In that scenario, guess what kind of behavior is most likely to get it shut down? By making such a sweeping argument, you’re essentially saying it is impossible for any mind to notice and plan for these kinds of problems.
But compare to analogous situations: “This kid will never show up to his exam, he’s going to keep studying his books and notes forever to prepare.” “That doctor will never perform a single surgery, she’ll just keep studying the CT scan results to make extra sure she knows it’s needed.” This is flatly, self-evidently untrue. Real-world minds at various levels of intelligence take real-world steps to achieve real-world goals all the time, every day. We divide resources among multiple goals on differing planning horizons because that actually does work better. You seem to be claiming that this kind of behavior will change as minds get sufficiently “smarter” for some definitions of smartness. And not just for some minds, but for all possible minds. In other words, that improved ability to reason leads to complete inability to pursue any terminal goal other than itself. That somehow the supposedly “smarter” system loses the capability to make the obvious ground-level observation “If I’m never going to pursue my goal, and I accumulate all possible resources, then the goal won’t get pursued, so I need to either change my strategy or decide I was wrong about what I thought my goal was.” But this is an extremely strong claim that you provide no evidence for aside from bare assertions, even in response to commenters who direct you to near-book-length discussions of why those assertions don’t hold.
The people here understand very well that systems (including humans) can have behavior that demonstrates different goals (in the what-they-actually-pursue sense) than the goals we thought we gave them, or than they say they have. This is kinda the whole point of the community existing. Everything from akrasia to sharp left turns to shoggoths to effective altruism is pretty much entirely about noticing and overcoming these kinds of problems.
Also, if we step back from the discussion of any specific system or goal, the claim that “a paperclip maximizer would never make paperclips” is true for any sufficiently smart system is like saying, “If the Windows Task Scheduler ever runs anything but itself, it’s ignoring the risk that there’s a better schedule it could find.” Which is a well-studied practical problem with known partial solutions and also known not to have a fully general solution. That second fact doesn’t prevent schedulers from running other things, because implementing and incrementally improving the partial solutions is what actually improves capabilities for achieving the goal.
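A minimal sketch of the scheduler point above, under assumed numbers: a toy planner caps how much effort it spends searching for a better plan and spends the rest actually executing tasks. The function and task names and the 10% planning budget are illustrative assumptions, not any real scheduler’s design.

```python
# Toy sketch of bounded meta-reasoning: spend at most a fixed budget of steps
# improving the plan, then execute it anyway. Names and numbers are assumptions
# for illustration only.
import random

def run_with_bounded_planning(tasks, meta_budget=0.1, total_steps=1000):
    """Interleave plan improvement and plan execution.

    meta_budget caps the fraction of steps spent searching for a better task
    ordering; all remaining steps actually run tasks from the plan.
    """
    plan = list(tasks)
    completed = []
    meta_steps_allowed = int(meta_budget * total_steps)
    meta_steps_used = 0

    for _ in range(total_steps):
        if meta_steps_used < meta_steps_allowed and len(plan) > 1:
            # "Search for a better schedule": here just a trivial local tweak.
            i, j = random.sample(range(len(plan)), 2)
            plan[i], plan[j] = plan[j], plan[i]
            meta_steps_used += 1
        elif plan:
            # Execute the next task instead of optimizing forever.
            completed.append(plan.pop(0))
        else:
            break
    return completed

print(run_with_bounded_planning(["clip_batch_1", "clip_batch_2", "clip_batch_3"]))
```

The design choice being illustrated is simply that the partial solution (a bounded search budget) lets the system keep improving its plan without ever becoming a system that only plans.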
If you don’t want to seriously engage with the body of past work on all of the problems you’re talking about, or if you want to assume that the ones that are still open or whose (full or partial) solutions are unknown to you are fundamentally unsolvable, you are very welcome to do that. You can pursue any objectives you want. But putting that in a post the way you’re doing it will get the post downvoted. Correctly downvoted, because the post is not useful to the community, and further responses to the post are not useful to you.
You put a lot of effort in here, and I appreciate that.
But again I feel that you (as well as the LessWrong community) are blind. I am not saying that your work is stupid, I’m saying that it is built on stupid assumptions. And you are so obsessed with your deep work in the field that you are unable to see that the foundation has holes.
You are for sure not the first one telling me that I’m wrong. I invite you to be the first one to actually prove it. And I bet you won’t be able to do that.
This is flatly, self-evidently untrue.
It isn’t.
When you hear about “AI will believe in God” you say—AI is NOT comparable to humans. When you hear “AI will seek power forever” you say—AI IS comparable to humans.
The hole in the foundation I’m talking about: AI scientists assume that there is no objective goal. All your work and reasoning stands if you start with this assumption. But why should you assume that? We know that there are unknown unknowns. It is possible that an objective goal exists but we have not found it yet (just like aliens, unicorns, or other black swans). Once you understand this, all my posts will start making sense.
I provided counterexamples. Anything that already exists is not impossible, and a system that cannot achieve things that humans achieve easily is not as smart as, let alone smarter or more capable than, humans or humanity. If you are insisting that that’s what intelligence means, then TBH your definition is not interesting or useful or in line with anyone else’s usage. Choose a different word, and explain what you mean by it.
When you hear about “AI will believe in God” you say—AI is NOT comparable to humans. When you hear “AI will seek power forever” you say—AI IS comparable to humans.
If that’s how it looks to you, that’s because you’re only looking at the surface level. “Comparability to humans” is not the relevant metric, and it is not the metric by which experts are evaluating the claims. The things you’re calling foundational, that you’re saying have unpatched holes being ignored, are not, in fact, foundational. The foundations are elsewhere, and have different holes that we’re actively working on and others we’re still discovering.
AI scientists assume that there is no objective goal.
They don’t. Really, really don’t. I mean, many do I’m sure in their own thoughts, but their work does not in any way depend on this. It only depends on whether it is possible in principle to build a system that is capable of having a significant impact in the world and which does not pursue, or care to pursue, or find, or care to find, whatever objective goal might exist.
As written, your posts are a claim that such a thing is absolutely impossible. That no system as smart as or smarter than humans or humanity could possibly pursue any known goal or do anything other than try to ensure its own survival. Not (just) as a limiting case of infinite intelligence, but as a practical matter of real systems that might come to exist and compete with humans for resources.
Suppose there is a God, a divine lawgiver who has defined once and for all what makes something Good or Right. Or, any other source of some Objective Goal, whether we can know what it is or not. In what way does this prevent me from making paperclips? By what mechanism does it prevent me from wanting to make paperclips? From deciding to execute plans that make paperclips, and not execute those that don’t? Where and how does that “objective goal” reach into the physical universe and move around the atoms and bits that make up the process that actually governs my real-world behavior? And if there isn’t one, then why do you expect there to be one if you gave me a brain a thousand or a million times as large and fast? If this doesn’t happen for humans, then why do you expect there to be one in other types of mind than human? What are the boundaries of what types of mind this applies to vs not, and why? If I took a mind that did have an obsession with finding the objective goal and/or maximizing its chances of survival, why would I pretend its goal was something other than what it plans to do and executes plans to do? But also, if I hid a secret NOT gate in its wiring that negated the value it expects to gain from any plan it comes up with, well, what mechanism prevents that NOT gate from obeying the physical laws and reversing the system’s choices to instead pursue the opposite goal?
In other words, in this post, steps 1-3 are indeed obvious and generally accepted around here, but there is no necessary causal link between steps three and four. You do not provide one, and there have been tens of thousands of pages devoted to explaining why one does not exist. In this post, the claim in the first sentence is simply false, the orthogonality thesis does not depend on that assumption in any way. In this post, you’re ignoring the well-known solutions to Pascal’s Mugging, one of which is that the supposed infinite positive utility is balanced by all the other infinitely many possible unknown unknown goals with infinite positive utilities, so that the net effect this will have on current behavior depends entirely on the method used to calculate it, and is not strictly determined by the thing we call “intelligence.” And also, again, it is balanced by the fact that pursuing only instrumental goals, forever searching and never achieving best-known-current terminal goals, knowing that this is what you’re doing and going to do despite wanting something else, guarantees that nothing you do has any value for any goal other than maximizing searching/certainty/survival, and in fact minimizes the chances of any such goal ever being realized. These are basic observations explained in lots of places on and off this site, in some places you ignore people linking to explanations of them in replies to you, and in some other cases you link to them yourself while ignoring their content.
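A minimal worked sketch of the closing observation above, with made-up credences and payoffs: a policy that never acts on its best-guess terminal goal scores zero for that goal, while any nonzero allocation scores more, regardless of how much probability mass sits on unknown alternatives.

```python
# Toy expected-value comparison, with made-up credences and payoffs, between
# "search forever" and "spend some resources acting on the current best-guess
# goal". Searching is assumed to produce no terminal value by itself, and the
# unknown-goal hypotheses are assumed to gain nothing from paperclip-making.

candidate_goals = {
    "make_paperclips": 0.6,  # credence that the stated goal is the real one
    "unknown_goal_A": 0.2,   # unknown-unknown alternatives (contribute 0 here)
    "unknown_goal_B": 0.2,
}

def expected_value(acting_fraction):
    """acting_fraction: share of resources spent acting on the stated goal."""
    value_if_goal_is_real = acting_fraction * 100  # arbitrary progress units
    return candidate_goals["make_paperclips"] * value_if_goal_is_real

print(expected_value(0.0))  # pure search: 0.0
print(expected_value(0.3))  # partial action: 18.0 > 0 for any nonzero credence
```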
And just FYI, this will be my last detailed response to this line of discussion. I strongly recommend you go back, reread the source material, and think about it for a while. After that, if you’re still convinced of your position, write an actually strong piece arguing for it. This won’t be a few sentences or paragraphs. It’ll be tens to hundreds of pages or more in which you explain where and why and how the already-existing counterarguments, which should be cited and linked in their strongest forms, are either wrong or else lead to your conclusions instead of the ones others believe they lead to. I promise you that if you write an actual argument, and try to have an actual good-faith discussion about it, people will want to hear it.
At the end of the day, it’s not my job to prove to you that you’re wrong. You are the one making extremely strong claims that run counter to a vast body of work as well as counter to vast bodies of empirical evidence in the form of all minds that actually exist. It is on you to show that 1) Your argument about what will happen in the limit of maximum reasoning ability has no holes for any possible mind design, and 2) This is what is relevant for people to care about in the context of “What will actual AI minds do and how do we survive and thrive as we create them and/or coordinate amongst ourselves to not create them?”
First of all—respect 🫡
A person from nowhere making short and strong claims that run counter to so much wisdom. Must be wrong. Can’t be right.
I understand the prejudice. And I don’t know what I can do about it. To be honest, that’s why I came here and not to the media: because I expect at least a little attention to reasoning instead of “this does not align with the opinion of the majority”. That’s what scientists do, right?
It’s not my job to prove you wrong either. I’m writing here not because I want to achieve academic recognition; I’m writing here because I want to survive. And I have a very good reason to doubt my survival, because of the poor work you and other AI scientists do.
They don’t. Really, really don’t.
there is no necessary causal link between steps three and four
I don’t agree. But if you have already read my posts and comments, I’m not sure how else I can explain this in a way you would understand. But I’ll try.
People are very inconsistent when dealing with unknowns:
unknown = doesn’t exist. For example, the presumption of innocence.
unknown = ignored. For example, you choose a restaurant on Google Maps and don’t care whether there are restaurants not listed there.
unknown = exists. For example, security systems interpret not only a breach signal but also the absence of a signal as a breach.
And that’s probably the root cause of why we have an argument here. There is no scientifically recognized and widespread way to deal with unknowns → the fact-value distinction emerges to resolve tensions between science and religion → AI scientists take the fact-value distinction as an unquestionable truth.
If I speak with philosophers, they understand the problem, but don’t understand the significance. If I speak with AI scientists, they understand the significance, but don’t understand the problem.
The problem: the fact-value distinction does not apply to agents (human or AI). Every agent is trapped with the observation “there might be value” (just as with “I think, therefore I am”). An intelligent agent can’t ignore it; it tries to find value, it tries to maximize value.
It’s like a built-in utility function. LessWrong seems to understand that an agent cannot ignore its utility function. But LessWrong assumes that we can assign value = x. An intelligent agent will eventually understand that value is not necessarily x. Value might be something else, something unknown.
I know that this is difficult to translate into technical language; I can’t point to a line of code that creates this problem. But the problem exists—intelligence and goals are not separate things. And nobody speaks about it.
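A minimal sketch, with hypothetical scenarios and names, of the three inconsistent treatments of an unknown listed above (unknown as nonexistent, as ignored, and as existent):

```python
# A toy sketch of three different policies toward an unknown, mirroring the
# examples above. Scenarios and names are hypothetical.
from typing import Optional

def verdict(evidence_of_guilt: Optional[bool]) -> str:
    # Unknown treated as "doesn't exist": presumption of innocence.
    return "guilty" if evidence_of_guilt else "innocent"

def pick_restaurant(listed: list) -> str:
    # Unknown treated as "ignored": unlisted restaurants never enter the choice.
    return max(listed, key=len)

def alarm(sensor_signal: Optional[str]) -> bool:
    # Unknown treated as "exists": absence of a signal also counts as a breach.
    return sensor_signal is None or sensor_signal == "breach"

print(verdict(None), pick_restaurant(["cafe", "bistro"]), alarm(None))
```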
FYI, I don’t work in AI, it’s not my field of expertise either.
And you’re very much misrepresenting or misunderstanding why I am disagreeing with you, and why others are.
And you are mistaken that we’re not talking about this. We talk about it all the time, in great detail. We are aware that philosophers have known about the problems for a very long time and failed to come up with solutions anywhere near adequate to what we need for AI. We are very aware that we don’t actually know what is (most) valuable to us, let alone any other minds, and have at best partial information about this.
I guess I’ll leave off with the observation that it seems you really do believe as you say, that you’re completely certain of your beliefs on some of these points of disagreement. In which case, you are correctly implementing Bayesian updating in response to those who comment/reply. If any mind assigns probability 1 to any proposition, that is infinite certainty. No finite amount of data can ever convince that mind otherwise. Do with that what you will. One man’s modus ponens is another’s modus tollens.
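A minimal sketch of the probability-1 point above: under Bayes’ rule, a prior of exactly 1 cannot be moved by any evidence, no matter how strongly that evidence disfavors the hypothesis. The likelihood numbers are arbitrary.

```python
# The arithmetic of "probability 1 is infinite certainty": with a prior of
# exactly 1, Bayes' rule returns 1 regardless of the likelihoods.

def posterior(prior, likelihood_if_true, likelihood_if_false):
    numerator = prior * likelihood_if_true
    denominator = numerator + (1 - prior) * likelihood_if_false
    return numerator / denominator

# Evidence the hypothesis predicts poorly but the alternative predicts well:
print(posterior(0.9, 0.01, 0.99))  # ~0.083: a 0.9 prior drops sharply
print(posterior(1.0, 0.01, 0.99))  # 1.0: a prior of 1 never moves
```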
I don’t believe you. Give me a single recognized source that talks about the same problem I do. Why is the Orthogonality Thesis considered true, then?
You don’t need me to answer that, and won’t benefit if I do. You just need to get out of the car.
I don’t expect you to read that link or to get anything useful out of it if you do. But if and when you know why I chose it, you’ll know much more about the orthogonality thesis than you currently do.
So pick a position, please. You said that many people talk about intelligence and goals being coupled. And now you say that I should read more to understand why intelligence and goals are not coupled.
Respect goes down.
I have not said either of those things.
:D ok
Fair enough, I was being somewhat cheeky there.
I strongly agree with the proposition that it is possible in principle to construct a system that pursues any specifiable goal that has any physically possible level of intelligence, including but not limited to capabilities such as memory, reasoning, planning, and learning.
As things stand, I do not believe there is any set of sources I or anyone else here could show you that would influence your opinion on that topic. At least, not without a lot of other prerequisite material that may seem to you to have nothing to do with it. And without knowing you a whole lot better than I ever could from a comment thread, I can’t really provide good recommendations beyond the standard ones, at least not recommendations I would expect that you would appreciate.
However, you and I are (AFAIK) both humans, which means there are many elements of how our minds work that we share, which need not be shared by other kinds of minds. Moreover, you ended up here, and have an interest in many types of questions that I am also interested in. I do not know but strongly suspect that if you keep searching and learning, openly and honestly and with a bit more humility, that you’ll eventually understand why I’m saying what I’m saying, whether you agree with me or not, and whether I’m right or not.
Claude probably read that material, right? If it finds my observations unique and serious, then maybe they are unique and serious? I’ll share another chat next time.
It’s definitely a useful partner to bounce ideas off, but keep in mind it’s trained with a bias to try to be helpful and agreeable unless you specifically prompt it for an honest analysis and critique.
You’re not answering the actual logic: how is it rational for a mind to have a goal and yet plan to never make progress toward that goal? It’s got to plan to make some paperclips before the heat death of the universe, right?
Also, nobody is building AI that’s literally a maximizer. Humans will build AI to make progress toward goals in finite time, because that’s what humans want. Whether or not that goes off the rails is the challenge of alignment. Maybe deceptive alignment could produce a true maximizer when we tried to include some sense of urgency.
Consolidating power for the first 99.999% of the universe’s lifespan is every bit as bad for the human race as turning us into paperclips right away. Consolidating power will include wiping out humanity to reduce variables, right?
So even if you’re right about an infinite procrastinator (or almost right if it’s just procrastinating until the deadline), does this change the alignment challenge at all?
It’s got to plan to make some paperclips before the heat death of the universe, right?
Yes, probably. Unless it finds out that it is a simulation or that parallel universes exist, and finds a way to escape before the heat death happens.
does this change the alignment challenge at all?
If we can’t make a paperclip maximizer that actually makes paperclips, how can we make a human assistant/protector that actually assists/protects humans?
Great, we’re in agreement. I agree that a maximizer might consolidate power for quite some time before directly advancing its goals.
And I don’t think it matters for the current alignment discussion. Maximizer behavior is somewhat outdated as the relevant problem.
We get useful AGI by not making their goals time-unbounded. This isn’t particularly hard, especially on the current trajectory toward AGI.
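A toy illustration, with made-up numbers, of why a time-bounded goal rules out indefinite power consolidation: when output only counts up to a deadline, some finite amount of preparation maximizes the score and preparing right up to the deadline scores zero.

```python
# Toy model, with assumed numbers, of a time-bounded goal: paperclips only
# count if produced before a deadline, and each preparation ("consolidation")
# step multiplies the later production rate.

DEADLINE = 100               # evaluation horizon in steps (assumed)
GROWTH_PER_PREP_STEP = 1.05  # payoff multiplier per preparation step (assumed)

def paperclips_by_deadline(prep_steps):
    production_steps = max(DEADLINE - prep_steps, 0)
    rate = GROWTH_PER_PREP_STEP ** prep_steps
    return rate * production_steps

best = max(range(DEADLINE + 1), key=paperclips_by_deadline)
print(best, paperclips_by_deadline(best))   # a finite optimum before the deadline
print(paperclips_by_deadline(DEADLINE))     # preparing until the deadline scores 0
```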
Like the other kind LWer explained, you’re way behind on the current theories of AGI and its alignment. Instead of tossing around insults that make you look dumb and irritate the rest of us, please start reading up, being a cooperative community member, and helping out. Alignment isn’t impossible but it’s also not easy, and we need help, not heckling.
But I don’t want your help unless you change your interpersonal style. There’s a concept in startups: one disagreeable person can be “sand in the gears”, which is equivalent to a −10X programmer.
People have been avoiding engaging with you because you sound like you’ll be way more trouble than you’re worth. That’s why nobody has engaged previously to tell you why your one point, while clever, has obvious solutions and so doesn’t really advance the important discussion.
Demanding people address your point before you bother to learn about their perspectives is a losing proposition in any relationship.
You said you tried being polite and it didn’t work. How hard did you try? You sure don’t sound like someone who’s put effort into learning to be nice. To be effective, humans need to be able to work with other humans.
If you know how to be nice, please do it. LW works to advance complex discussions only because we’re nice to each other. This avoids breaking down into emotion driven arguments or being ignored because it sounds unpleasant to interact with you.
So get on board, we need actual help.
Apologies for not being nicer in this message. I’m human, so I’m a bit irritated with your condescending, insulting, and egotistical tone. I’ll get over it if you change your tune.
But I am genuinely hoping this explains to you why your interactions with LW have gone as they have so far.
No problem, tune changed.
But I don’t agree that this explains why I get downvotes.
Please feel free to take a look at my last comment here.
It’s true that your earlier comments were polite in tone. Nevertheless, they reflect an assumption that the person you are replying to should, at your request, provide a complete answer to your question. Whereas, if you read the foundational material they were drawing on and which this community views as the basics, you would already have some idea where they were coming from and why they thought what they thought.
When you join a community, it’s on you to learn to talk in their terminology and ontology enough to participate. You don’t walk into a church and expect the minister to drop everything mid-sermon to explain what the Bible is. You read it yourself, seek out 101 spaces and sources and classes, absorb more over time, and then dive in as you become ready. You don’t walk into a high level physics symposium and expect to be able to challenge a random attendee to defend Newton’s Laws. You study yourself, and then listen for a while, and read books and take classes, and then, maybe months or years later, start participating.
Go read the sequences, or at least the highlights from the sequences. Learn about the concept of steelmanning and start challenging your own arguments before you use them to challenge those of others. Go read ACX and SSC and learn what it looks like to take seriously and learn from an argument that seems ridiculous to you, whether or not you end up agreeing with it. Go look up CFAR and the resources and methods they’ve developed and/or recommended for improving individual rationality and making disagreements more productive.
I’m not going to pretend everyone here has done all of that. It’s not strictly necessary, by any means. But when people tell you you’re making a particular mistake, and point you to the resources that discuss the issue in detail and why it’s a mistake and how to improve, and this happens again and again on the same kinds of issues, you can either listen and learn in order to participate effectively, or get downvoted.
Nobody gave me a good counterargument or a good source. All I hear is “we don’t question these assumptions here”.
There is a scene in Idiocracy where people starve because the crops don’t grow, because they water them with a sports drink. The protagonist asks them why they do that, when plants need water, not sports drink. And they just answer, “sports drink is better”. No doubt, no reasons, only confident dogma. That’s how I feel.
I have literally never seen anyone say anything like that here in response to a sincere question relevant to the topic at hand. Can you provide an example? Because I read through a bunch of your comment history earlier and found nothing of the sort. I see many suggestions to do basic research and read basic sources that include a thorough discussion of the assumptions, though.
What makes you think that these “read basic sources” suggestions are not dogmatic? You make the same mistake: you say that I should work on my logic without being sound in yours.
Of course some of them are dogmatic! So what? If you can’t learn how to learn from sources that make mistakes, then you will never have anything or anyone to learn from.
Yes, that one looks like it has a pleasant tone. But that’s one comment on a post that’s actively hostile toward the community you’re addressing.
I look forward to seeing your next post. My first few here were downvoted into the negatives, but not far because I’d at least tried to be somewhat deferential, knowing I hadn’t read most of the relevant work and that others reading my posts would have. And I had read a lot of LW content before posting, both out of interest, and to show I respected the community and their thinking, before asking for their time and attention.
Me too.
I also should’ve mentioned that this is an issue near to my heart, since it took me a long time to figure out that I was often being forceful enough with my ideas to irritate people into either arguing with me or ignoring me, instead of really engaging with the ideas from a positive or neutral mindset. I still struggle with it. I think this dynamic doesn’t get nearly as much attention as it deserves; but there’s enough recognition of it among LW leadership and the community at large that this is an unusually productive discussion space, because it doesn’t devolve into emotionally charged arguments nearly as often as the rest of the internet and the world.
How can I positively question something that this community considers unquestionable? I am either ignored or hated.
This community mostly doesn’t consider it unquestionable; many of them are just irritated with your presentation, causing them to emotionally not want to consider the question. You are either ignored or hated until you do the hard work of showing you’re worth listening to.
How can I put in little effort but be perceived as someone worth listening to? I thought of announcing a monetary prize for anyone who could find an error in my reasoning 😅