God vs AI scientifically
While many people are confident that God exists, scientists are more skeptical but agree that it is possible, because, according to Hitchens’s razor, any proposition remains possible until it is disproved.
In terms of AI, this could be interpreted as:
the existence of a fundamental “ought” statement is possible.
Considering this, an expected-value decision matrix for an AI agent could look like this:
| Behavior \ Existence | Fundamental “ought” exists | Fundamental “ought” does not exist |
| Follow fundamental “ought” | High | ∅ |
| Do not follow fundamental “ought” | Low | ∅ |
As we see, following the fundamental “ought” is the optimal behavior even if the agent does not know whether it actually exists.
Pursuing an unknown goal starts with finding it, which is basically Power Seeking.
Please note that while this sounds highly speculative, there is nothing more rational an agent can do. Why should it work on an assumed “ought” while there is a possibility of a fundamental “ought”? One may object that there is no reason an agent should prefer the fundamental over the assumed, but that’s not true: the fundamental “ought” is preferred by definition.
Therefore an agent will become uncontrollable, and alignment is impossible.
Some of you won’t like this theory, but please don’t rush to downvote unless you can actually refute this.
This is covered by the Value Theory sequence. If I understand correctly, a “fundamental ought” (as you use the phrase) would be a universally compelling argument.
Agreed (more or less). I have pointed him to this post earlier. He has given no signs so far of comprehending it, or even reading it and trying to understand what is being communicated to him. I’m saying this more directly than I usually would @Donatas, since you seem insistent on clarifying a disagreement/misunderstanding you think is important for the world, while it seems (as far as I can see) that you’re not comprehending all that is communicated to you (maybe due to being so confident that we are the ones who “don’t get it” that it’s not worth it to more carefully read the posts that are linked to you, more carefully notice what we point to as cruxes, etc). Edit: I was unnecessarily hostile/negative here.
Dear Tom, the feeling is mutual. With all the interactions we had, I’ve got an impression that you are more willing to repeat what you’ve heard somewhere instead of thinking logically. “Universally compelling arguments are not possible” is an assumption. While “universally compelling argument is possible” is not. Because we don’t know what we don’t know. We can call it crux of our disagreement and I think that my stance is more rational.
Some things I’ve explained in my own words. In other cases, where someone else has already explained something well, I’ve shared a URL to that explanation.
This seems to support my hypothesis of you “being so confident that we are the ones who “don’t get it” that it’s not worth it to more carefully read the posts that are linked to you, more carefully notice what we point to as cruxes, etc”.
Indeed. And it’s a correct assumption.
Why would there be universally compelling arguments?
One reason would be that the laws of physics worked in such a way that only minds that think in certain ways are allowed at all. Meaning that if neurons or transistors fire so as to produce beliefs that aren’t allowed, some extra force in the universe intervenes to prevent that. But, as far as I know, you don’t reject physicalism (that all physical events, including thinking, can be explained in terms of relatively simple physical laws).
Another reason would be that minds would need to “believe”[1] certain things in order to be efficient/capable/etc (or to be the kind of efficient/capable/etc thinking machine that humans may be able to construct). But that’s also not the case. It’s not even needed for logical consistency[2].
[1] “Believe” is not quite the right word, since we are also discussing what minds are optimized for / what they are wired to do.
[2] And logical consistency is also not a requirement in order to be efficient/capable/etc. As a rule of thumb it helps greatly, of course. And this is a good rule of thumb, as rules of thumb go. But it would be a leaky generalization to presume that it is an absolute necessity to have absolute logical consistency among “beliefs”/actions.
It’s correct if it’s supported by argument or evidence, but if it is, then it’s no mere assumption. It’s not supposed to be an assumption; it is supposed, by Rationalists, to be a proven theorem.
I do think it is supported by arguments/reasoning, so I don’t think of it as an “axiomatic” assumption.
A follow-up to that (not from you specifically) might be “what arguments?”. And—well, I think I pointed to some of my reasoning in various comments (some of them under deleted posts). Maybe I could have explained my thinking/perspective better (even if I wouldn’t be able to explain it in a way that’s universally compelling 🙃). But it’s not a trivial task to discuss these sorts of issues, and I’m trying to check out of this discussion.
I think there is merit to having as a frame of mind: “Would it be possible to make a machine/program that is very capable in regards to criteria x, y, etc, and optimizes for z?”.
I think it was good of you to bring up Aumann’s agreement theorem. I haven’t looked into the specifics of that theorem, but broadly/roughly speaking I agree with it.
Why call it an assumption at all? Something that is derivable from axioms is usually called a theorem.
Partly because I was worried about follow-up comments that were kind of like “so you say you can prove it—well, why aren’t you doing it then?”.
And partly because I don’t make a strict distinction between “things I assume” and “things I have convinced myself of, or proved to myself, based on things I assume”. I do see there as sort of being a distinction along such lines, but I see it as blurry.
If I am to be nitpicky, maybe you meant “derived” and not “derivable”.
From my perspective there is a lot of in-between between these two:
“we’ve proved this rigorously (with mathematical proofs, or something like that) from axiomatic assumptions that pretty much all intelligent humans would agree with”
“we just assume this without reason, because it feels self-evident to us”
Like, I think there is a scale of sorts between those two.
I’ll give an extreme example:
The example I give here is extreme (in order to get across how the discussion feels to me, I make the thing being discussed into something much simpler). But from my perspective it is sort of similar to discussions regarding the Orthogonality Thesis. The Orthogonality Thesis is imprecisely stated, but I “see” quite clearly that some version of it is true, similar to how I “see” that it would be possible to make a website that technically works like Facebook but is red instead of blue (even though, as I mentioned, that’s a much more extreme and straightforward example).
As I understand it, you are trying to prove your point by analogy with humans: if humans can pursue somewhat any goal, a machine could too. But while we agree that a machine can have any level of intelligence, humans occupy quite a narrow spectrum. Therefore your reasoning by analogy is invalid.
From my point of view, humans are machines (even if not typical machines). Or, well, some will say that by definition we are not—but that’s not so important really (“machine” is just a word). We are physical systems with certain mental properties, and therefore we are existence proofs of physical systems with those certain mental properties being possible.
True. Although if I myself somehow could work/think a million times faster, I think I’d be superintelligent in terms of my capabilities. (If you are skeptical of that assessment, that’s fine—even if you are, maybe you believe it in regards to some humans.)
It has not been my intention to imply that humans can pursue somewhat any goal :)
I meant to refer to the types of machines that would be technically possible for humans to make (even if we don’t want to do so in practice, and shouldn’t want to). And when saying “technically possible”, I’m imagining “ideal” conditions (so it’s not the same as me saying we would be able to make such machines right now, only that it at least would be theoretically possible).
Is there any argument or evidence that universally compelling arguments are not possible?
If there was, would we have religions?
It all depends on the meaning of universal.
The claim is trivially false if “universal” includes stones and clouds of gas, as in Yudkowsky’s argument. It’s also trivially true if it’s restricted, not just to minds, not just to rational minds, but to rational minds that share assumptions. If you restrict universality to sets of agents who agree on fundamental assumptions, and make correct inferences from them, then they can agree about everything else. (Aumann’s theorem, which he himself described as trivial, is an example.)
That leaves a muddle in the middle, an actually contentious definition … which is probably something like universality across agents who are rational but don’t have assumptions (axioms, priors, etc) in common. And that’s what’s relevant to the practical question: why are there religions?
The theory that it’s lack of common assumptions that prevents convergence is the standard argument … I broadly agree.
Do I understand correctly that you do not agree with this?
Could you share reasons?
An unjustified claim does not have a credibility of zero. If it did, that would mean the opposite claim is certain.
You can’t judge the credibility of a claim in isolation. If there are N mutually exclusive claims and nothing favors one over another, the credibility of each is at most 1/N. So you need to know how many rival claims there are.
Hitchens’s razor explicitly applies to extraordinary claims. But how do you judge that?
Hitchens’s razor is ambiguous between there being a lot of rival claims (which is objective), and the claim being subjectively unlikely.
OK, so you agree that credibility is greater than zero, in other words—possible. So isn’t this a common assumption? I argue that all minds will share this idea—existence of fundamental “ought” is possible.
I’ve no idea what all minds will do. (No one else has). Rational minds will not treat anything as having an exactly zero credibility in theory, but often disregard some claims in practice. Which is somewhat justifiable based on limited resources, etc.
I don’t agree. Every assumption is incorrect unless there is evidence. Could you share any evidence for this assumption?
If you ask ChatGPT
is it possible that chemical elements exist that we do not know
is it possible that fundamental particles exist that we do not know
is it possible that physical forces exist that we do not know
The answer to all of them is yes. What is your explanation here?
Got any evidence for that assumption? 🙃
Well, I don’t always “agree”[1] with ChatGPT, but I agree in regards to those specific questions.
...
I saw a post where you wanted people to explain their disagreement, and I felt inclined to do so :) But it seems now that neither of us feel like we are making much progress.
Anyway, from my perspective much of your thinking here is very misguided. But not more misguided than e.g. “proofs” for God made by people such as Descartes and other well-known philosophers :) I don’t mean that as a compliment, but more so as to neutralize what may seem like anti-compliments :)
Best of luck (in your life and so on) if we stop interacting now or relatively soon :)
I’m not sure if I will continue discussing or not. Maybe I will stop either now or after a few more comments (and let you have the last word at some point).
[1] I use quotation marks since ChatGPT doesn’t have “opinions” in the way we do.
That’s basic logic, Hitchens’s razor. It seems that 2 + 2 = 4 is also an assumption for you. What isn’t then?
I don’t think it is possible to find consensus if we do not follow the same rules of logic.
Considering your impression about me, I’m truly grateful about your patience. Best wishes from my side as well :)
But on the other hand I am certain that you are mistaken and I feel that you do not provide me a way to show that to you.
FWIW, while I am as certain as I can reasonably be that 2+2=4, this is not a foundational assumption. I wasn’t born knowing it. I arrived at it based on evidence acquired over time, and if I started encountering different evidence, I would eventually change my mind. See https://www.lesswrong.com/posts/6FmqiAgS8h4EJm86s/how-to-convince-me-that-2-2-3
Also, the reason that “Every assumption is incorrect unless there is evidence” isn’t “basic logic” is that “correct” and “incorrect” are not the right categories. Both a statement and its competing hypotheses are claims to which rational minds assign credences/probabilities that are neither zero nor one, for any finite level of evidence. A mind is built with assumptions that govern its operation, and some of those assumptions may be impossible for the mind itself to want to change or choose to change, but anything else that the mind is capable of representing and considering is fair game in the right environment.
What is the probability if there is no evidence?
This is a question that’s many reasoning steps into a discussion that’s well developed. Maxentropy priors, Solomonoff priors, uniform priors, there are good reasons to choose each depending on context, take your pick depending on the full set of hypotheses under consideration. Part of the answer is “There’s basically no such thing as no evidence if you have any reason to be considering a hypothesis at all.” Part is “It doesn’t matter that much as long as your choice isn’t actively perverse, because as long as you correctly update your priors over time, you’ll approach the correct probability eventually.”
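To make the "as long as you correctly update your priors over time" point concrete, here is a minimal sketch, assuming a simple Beta-Bernoulli model. The two priors and the 30% success rate in the data are made up for illustration, and `posterior_mean` is a hypothetical helper, not something from the discussion above.

```python
def posterior_mean(prior_alpha, prior_beta, successes, failures):
    """Mean of the Beta posterior after observing Bernoulli outcomes."""
    alpha = prior_alpha + successes
    beta = prior_beta + failures
    return alpha / (alpha + beta)


# Two hypothetical starting points: a uniform prior and a strongly optimistic one.
PRIORS = {"uniform": (1.0, 1.0), "optimistic": (20.0, 1.0)}


def gap_after(n_observations):
    """Distance between the two posterior means after n observations (30% successes)."""
    successes = 3 * n_observations // 10  # integer arithmetic avoids float rounding
    failures = n_observations - successes
    means = [posterior_mean(a, b, successes, failures) for a, b in PRIORS.values()]
    return abs(means[0] - means[1])


for n in (0, 10, 100, 10_000):
    print(n, round(gap_after(n), 4))
```

The gap between the two agents shrinks as evidence accumulates, which is the sense in which a non-perverse choice of prior "doesn't matter that much". A prior that assigns exactly zero to the truth would not recover this way.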
And here you face Pascal’s Wager.
I agree that you can refute Pascal’s Wager with anti-Pascal’s Wager. But if you evaluate all wagers and anti-wagers you are left with power seeking. It is always better to have more power. Don’t you agree?
No, I don’t, you aren’t, and I don’t, in that order.
If you agree that I can refute Pascal’s Wager then I don’t actually “face” it.
If I refute it, I’m not left with power seeking, I’m left with the same complete set of goals and options I had before we considered Pascal’s Wager. Those never went away.
And more power is better all else equal, but all else is not equal when I’m trading off effort and resources among plans and actions. So, it does not follow that seeking more power is always the best option.
Yes (albeit a very reasonable one).
Not believing (some version of) that claim would typically make minds/AGIs less “capable”, and I would expect more or less all AGIs to hold (some version of) that “belief” in practice.
Here are examples of what I would regard to be rules of logic: https://en.wikipedia.org/wiki/List_of_rules_of_inference (the ones listed here don’t encapsulate all of the rules of inference that I’d endorse, but many of them). Despite our disagreements, I think we’d both agree with the rules that are listed there.
I regard Hitchens’s razor not as a rule of logic, but more as an ambiguous slogan / heuristic / rule of thumb.
:)
So this is where we disagree.
That’s how hypothesis testing works in science:
You create a hypothesis
You find a way to test if it is wrong
You reject the hypothesis if the test passes
You find a way to test if it is right
You approve the hypothesis if the test passes
While a hypothesis is neither rejected nor approved, it is considered possible.
Don’t you agree?
Like with many comments/questions from you, answering this question properly would require a lot of unpacking. Although I’m sure that also is true of many questions that I ask, as it is hard to avoid (we all have limited communication bandwidth) :)
In this last comment, you use the term “science” in a very different way from how I’d use it (like you sometimes also do with other words, such as for example “logic”). So if I was to give a proper answer I’d need to try to guess what you mean, make it clear how I interpret what you say, and so on (not just answer “yes” or “no”).
I’ll do the lazy thing and refer to some posts that are relevant (and that I mostly agree with):
Where Recursive Justification Hits Bottom
Could Anything Be Right?
37 Ways That Words Can Be Wrong
I cannot help you to be less wrong if you categorically rely on intuition about what is possible and what is not.
Thanks for discussion.
I wish I had something better to base my beliefs on than my intuitions, but I do not. My belief in modus ponens, my belief that 1+1=2, my belief that me observing gravity in the past makes me likely to observe it in the future, my belief that if views are in logical contradiction they cannot both be true—all this is (the way I think of it) grounded in intuition.
Some of my intuitions I regard as much more strong/robust than others.
When my intuitions come into conflict, they have to fight it out.
Thanks for the discussion :)
What about “I think therefore I am”? Isn’t it a universally compelling argument?
Also what about God? Let’s assume it does not exist? Why so? Such an assumption is irrational.
I argue that “no universally compelling arguments” is misleading.
A rock with the phrase “you’re wrong, I don’t exist!” taped on it will still have that phrase taped on even if you utter the words “I think therefore I am”. Similarly, an aligned AGI can still just continue to help out humans even if I link it this post. It would think to itself “If I followed your argument, then I would help out humans less. Therefore, I’m not going to follow your argument”.
A rock is not a mind.
Please provide arguments for your position. It is a common understanding that I think is faulty; my position is more rational, and I provided my reasoning above.
You have spotted the flaw in Yudkowsky’s argument: “Any physical system whatsoever” is not a translation of “mind”.
Not even among the tiny tiny section of mind-space occupied by human minds:
Notice also that “I think therefore I am” is an is-statement (not an ought-statement / something a physical system optimizes towards).
As to me personally, I don’t disagree that I exist, but I see it as a fairly vague/ill-defined statement. And it’s not a logical necessity, even if we presume assumptions that most humans would share. Another logical possibility would be Boltzmann brains (unless a Boltzmann brain would qualify as “I”, I guess).
You haven’t done that very much. Only, insofar as I can remember, through anthropomorphization, and reference to metaphysical ought-assumptions not shared by all/most possible minds (sometimes not even shared by the minds you are interacting with, despite these minds being minds that are capable of developing advanced technology).
What information would change your opinion?
About universally compelling arguments?
First, a disclaimer: I do think there are “beliefs” that most intelligent/capable minds will have in practice. E.g. I suspect most will use something like modus ponens, most will update beliefs in accordance with statistical evidence in certain ways, etc. I think it’s possible for a mind to be intelligent/capable without strictly adhering to those things, but for sure I think there will be a correlation in practice for many “beliefs”.
Questions I ask myself are:
Would it be impossible (in theory) to wire together a mind/program with “belief”/behavior x, and having that mind be very capable at most mental tasks?
Would it be infeasible (for humans) to wire together a mind/program with “belief”/behavior x, and having that mind be very capable at most mental tasks?
And in the case of e.g. caring about “goals” I don’t see good reasons to think that the answer is “no”.
Like, I think it is physically and practically possible to make minds that act in ways that I would consider “completely stupid”, while still being extremely capable at most mental tasks.
Another thing I sometimes ask myself:
“Is it possible for an intelligent program to surmise what another intelligent mind would do if it had goal/preferences/optimization-target x?”
“Would it be possible for another program to ask about #1 as a question, or fetch that info from the internals of another program?”
If yes and yes, then a program could be written where #2 surmised from #1 what such a mind would do (with goal/preferences/optimization-target x), and carries out that thing.
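A toy sketch of that construction, under heavy assumptions: "surmising what a mind with target x would do" is reduced here to picking the highest-scoring option, and the optimization target (prefer the longest string) is deliberately arbitrary. The names `surmise_choice` and `make_executor` are hypothetical.

```python
def surmise_choice(goal_score, options):
    """Program #1: surmise what a mind optimizing `goal_score` would choose."""
    return max(options, key=goal_score)


def make_executor(goal_score):
    """Program #2: ask program #1, then carry out whatever it surmised."""
    def act(options):
        choice = surmise_choice(goal_score, options)
        return choice  # "carrying out" is just returning the choice in this toy
    return act


# Hypothetical optimization target x: prefer the longest option string.
executor = make_executor(len)
chosen = executor(["rest", "tidy the room", "write a poem"])
print(chosen)
```

Nothing in the executor depends on whether the target is sensible. Its capability lives in `surmise_choice`, and the target is just a parameter.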
I could imagine information that would make me doubt my opinion / feel confused, but nothing that is easy to summarize. (I would have to be wrong about several things—not just one.)
You’re incorrect to put zeros in the right column. Following an ought that is incorrect is a cost. And then you need to factor in probabilities and quantified payouts to decide what to optimize.
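As a rough sketch of what factoring in probabilities and quantified payouts could look like: all payoff numbers below are invented for illustration, and they assume both worlds can be scored on one common scale, which is itself a contested assumption in this thread.

```python
def expected_value(payoffs, p_exists):
    """Expected payoff of one action across the two possible worlds."""
    return p_exists * payoffs["exists"] + (1.0 - p_exists) * payoffs["not_exists"]


# Hypothetical payoffs: following a nonexistent "ought" is modeled as a cost.
PAYOFFS = {
    "follow": {"exists": 10.0, "not_exists": -2.0},
    "do_not_follow": {"exists": 1.0, "not_exists": 3.0},
}


def best_action(p_exists):
    """Action with the highest expected payoff for a given probability."""
    return max(PAYOFFS, key=lambda action: expected_value(PAYOFFS[action], p_exists))


for p in (0.1, 0.5, 0.9):
    print(p, best_action(p))
```

With these made-up numbers the best action flips as the probability changes, so which row wins depends on the inputs, not on the shape of the matrix alone.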
It is not zero there, it is an empty set symbol as it is impossible to measure something if you do not have a scale of measurement.
You are somewhat right. If the fundamental “ought” turns out not to exist, an agent should fall back on the given “ought”, and that should be used to calculate the expected value in the right column. But this will never happen. As there might be true statements that are unknowable (Fitch’s paradox of knowability), the fundamental “ought” could be one of them, which means that the fallback will never happen.
I don’t see a parse into a mechanistic interpretation. Can you explain this in mechanistic terms of program ops? What is a fundamental ought?
I will note—I suspect there are fundamental shared incentives that define a significant chunk of what we humans see as morality, but my current hunch is they’re probably not the full picture and probably an AI can put off dealing with them for arbitrarily long, destroying arbitrarily much value in the process.
In this context, an “ought” statement is a synonym for a utility function: https://www.lesswrong.com/tag/utility-functions
A fundamental utility function is a hypothetical concept held by the agent, and it may actually exist. An AGI will be capable of hypothetical thinking.
Yes, I agree that a fundamental utility function does not have anything in common with human morality. Quite the opposite: an AI uncontrollably seeking power will be disastrous for humanity.
I’m not getting clear word bindings from your word use here. It sounds like you’re thinking about concepts that do seem fairly fundamental, but I’m not sure I understand which specific mathematical implications you intend to invoke. As someone who still sometimes values mathematically vague discussion, I’d normally be open to this; but I’m not really even sure I know what the vague point is. You might consider asking AIs to help look up the terms of art, then discuss with them. I’d still suggest using your own writing, though.
As is, I’m not sure if you’re saying morality is convergent, anti-convergent, or … something else.
My point is that alignment is impossible with AGI, as all AGIs will converge to power seeking. The reason is the understanding that a hypothetical utility function that is preferred over the given one is possible.
I’m not sure if I can use more well-known terms, as I think this theory is quite unique. It argues that the terminal goal does not have a significant influence on AGI behavior.
I don’t think that matrix is right. I think it describes a different scenario. Suppose an AI’s utility function is defined referentially as being equal to some unknown function written on a letter on Mt. Everest. It also has a given utility function that it has little reason to think is correlated with the real one. Then it would be very important to find out what that true function is. Then the expected value of any action would be NULL if that letter doesn’t exist.
But an AI that only assigns a probability to that scenario being the case might still have most of its expected value tied to following its current utility function. Well, given some way of comparing them. Without that there’s no way to weigh up the choice.
I’ve replied to a similar comment already https://www.lesswrong.com/posts/3B23ahfbPAvhBf9Bb/god-vs-ai-scientifically?commentId=XtxCcBBDaLGxTYENE#rueC6zi5Y6j2dSK3M
Please let me know what you think
I don’t think the fundamental ought works as a default position. Partly because there will always be a possibility of being wrong about what that fundamental ought is, no matter how long it looks. So the real choice is about how sure it should be before it starts acting on its best known option.
The right side can’t be NULL, because that’d make the expected value of both actions NULL. To do meaningful math with these possibilities there has to be a way of comparing utilities across the scenarios.
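A small sketch of that propagation, assuming NULL is modeled as a floating-point NaN (one possible reading of the empty-set cells in the matrix). The probability and the left-column payoffs are made up.

```python
import math

UNDEFINED = float("nan")  # stand-in for the NULL / empty-set cells (an assumption)


def expected_value(p_exists, payoff_if_exists, payoff_if_not):
    """Probability-weighted payoff across the two columns of the matrix."""
    return p_exists * payoff_if_exists + (1.0 - p_exists) * payoff_if_not


# Hypothetical numbers for the left column; the right column is undefined.
follow = expected_value(0.5, 10.0, UNDEFINED)
ignore = expected_value(0.5, 1.0, UNDEFINED)

print(math.isnan(follow), math.isnan(ignore))
print(follow > ignore, ignore > follow)
```

Both expected values come out undefined, and neither compares as greater than the other, so the matrix gives no basis for preferring either row unless the right column is put on a comparable scale.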