There is a school of thought that says you need to mathematically prove that your AGI will be aligned, before you even start building any kind of AI system at all. IMO this would be a great approach if our civilization had strong coordination abilities and unlimited time.
I was thinking more utilons/QALYs. The word “society” admittedly suggests something else, since empirically high-utility outcomes can appear asocial (e.g. staying home and reading a book).
Tell me more about why you think the impact on society will be positive.
(Corollary: There are no wrong morals, except from a given perspective or for signalling purposes.)
Do you consider perspective something experiential or is it conceptual? If the former, is there a shared perspective of sentient life in some respects? E.g. “suffering feels bad”.
Good points. People tend to confuse value pluralism or relativism with open-mindedness.
Contrast this post with techniques like Word2vec, which do map concepts into spatial dimensions. Every word is assigned a vector, and associations are learned via backprop by predicting nearby text. This lets you perform conceptual arithmetic like "Brother" - "Man" + "Woman", which yields a vector very close (in literal spatial terms) to the vector for "Sister".
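For concreteness, here is a minimal sketch of that arithmetic using the gensim library; the pretrained model name below is just an illustrative choice available through gensim's downloader, not something taken from the post.

```python
# Minimal sketch: word-vector arithmetic with gensim's pretrained Word2vec vectors.
# The model name is an illustrative assumption; any word-embedding model would do.
import gensim.downloader as api

wv = api.load("word2vec-google-news-300")  # returns a KeyedVectors object

# "brother" - "man" + "woman" should land near "sister" in the embedding space.
print(wv.most_similar(positive=["brother", "woman"], negative=["man"], topn=3))
```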
Not sure what the LW consensus is, but there’s some evidence that Roko’s basilisk is a red herring.
Fortunately, a proper understanding of subjunctive dependence tells us that an optimally-behaving embedded agent doesn't need to pretend that causation can happen backward in time. Such a sovereign would not be in control of its source code, and it can't execute an updateless strategy if there was nothing there to not-update on in the first place before that source code was written. So Roko's Basilisk is only an information hazard if FDT is poorly understood.
From the post "Dissolving Confusion around Functional Decision Theory".
Something like AI Dungeon but less niche, e.g. for emails. People paying money to outsource a part of their creative intellect is what appears significant to me.
Indeed. I feel like most of the work is done in the definition itself, which is necessarily paradigmatic in this case.
You mention that you’re surprised to have not seen “more vigorous commercialization of language models” recently beyond mere “novelty”. Can you say more about what particular applications you had in mind? Also, do you consider AI companionship as useful or merely novel?
I expect the first killer app that goes mainstream will mark the PONR (point of no return), i.e. the final test of whether the market prefers capabilities or safety.
I don’t know anyone who claims that it’ll be a linear or unified experience. Without continuity and communication across instances, I don’t think of it as personal immortality in the simple sense, any more than I think of children or great works as immortality.
Doesn’t this also still apply to normal succession of mental states, without branching? How does QM or MWI come into play here?
There is no way to be “sent” anywhere because there is no soul to be shuttled around in the first place. There is only a sequence of experiences that are correlated with each other. In the dead branch there will be no experience, but in the living branch there will be, so the “illusion” of continuity (which is really just correlation) will be allowed to continue there.
Full disclosure: I haven’t the slightest clue how quantum mechanics works.
I like your writing, but I don’t think this piece belongs on LW. Sadly I don’t have a good argument for why I believe that, except for the article’s proximity to politics (though you did try to distinguish conservatism from the political right). I hope that you keep writing on this subject, but I’d prefer not to see it on LW in particular.
If I think about a universe full of paperclip maximizers with very high agency… I’m just not feeling it. Whereas at least if it’s a universe full of very happy paperclip maximizers, that feels more compelling.
This is really the old utilitarian argument that we value things (like agency) in addition to utility because they are instrumentally useful (which agency is). But if agency had never given us utility, we would never have valued it.
I like your idea that economic incentives will become the safety bottleneck more so than corrigibility. Many would argue that a pure reasoner actually can influence the world through e.g. manipulation, but this doesn’t seem very realistic to me if the model is memoryless and doesn’t have the ability to recursively ask itself new questions.
Adding such capabilities is fairly easy, however, which is exactly what your concern is about.
“We can compose two paths if-and-only-if the second path starts where the first one begins”
Am I misunderstanding, or should this say “where the first one ends”?
I meant that rationality is about systematically getting what you want, and is independent of moods/emotions. One could be a highly emotional rationalist while still making the right decisions.
I prefer Dagon’s comments though. Mine wasn’t particularly helpful.
That’s true, but I intentionally wrote “if rationality is simply what you would most prefer”, i.e. if it really is the case that it is the most preferred means to the end. In your spoon example, it really may be the case that a spoon is the means you would most prefer to eat something with. A quibble, but yes.
I agree there is also a normative epistemic aspect to rationality, which could either complicate or be subsumed by the slogan “rationality is winning”.
Yes, I agreed in my edit that “worship”/”loss-of-worship” are possible necessary and sufficient correlates of “non-apostate”/”apostate” depending on your definition. However, one might say that worship is not sufficient; what is also required is belief.
Refuting your illusionism about your own experiences is very easy; all that you have to do is look at your hands. If that can be denied by some razor, then so can all of science and mathematics as well.