Discussion of human generality.
This section should be named discussion of “human generality versus Artificial General Intelligence generality”. There is a version of human generality much closer to ‘okay, let me just go reprogram myself a bit, and then I’ll be as adapted to this thing as I am to’, which is not “I am going to read a book or ten on this topic” but “I am going to meditate for a couple of weeks to change my reward circuitry, so that afterwards I am as interested in coding as I am now in doing every side quest in Witcher 3”, or “I, as a human, have a documented bias known as insensitivity to prior probability, so I will go find 1000 examples of probabilistic inference on the internet and use them to train that sensitivity”. And humans can’t do that. Imagine how “general” humans would be if they could. But if there were a machine intelligence that could perform a set of tasks broad enough to be unquestionably called “general”, I would expect it to be capable of purposefully rewriting parts of its own code. This is speculation, of course, but so is this AGI-entity, whatever it is.
How to think about superintelligence
Here you completely glossed over the topic of superintelligence, which is ironic. EY made his prediction that the “current influx of money might stumble upon something” without making a single argument in its favor, while you wrote a list with 11 entries and 5 paragraphs arguing why it is unlikely. But then EY talks about the efficient market hypothesis, chess, and efficiency of action, and you did not engage with that specific argument. I agree that developers have more control over weaker AIs, but the SUPER in superintelligence is one of the main points. It is a SUPER big point. This is speculation, of course, but you and EY both seem to agree on the danger and capabilities of current AIs (and they are not even “general”). I know I did not write a good argument here, but I do not see a point of yours to argue against.
The difficulty of alignment
If you stop the human from receiving reward for eating ice cream, the human will eat chocolate. And so on and so on; look at what stores have to offer that is sugary but is neither ice cream nor chocolate. You have to know in advance that liking apples and berries and milk AND HONEY will result in discovering and creating ice cream. In advance: that is the point of the ice cream metaphor.
And by the time humanity understood the connection between sugar, the brain, and evolution, it had already made cocaine and meth. Because it is not about sugar but about reward circuitry. So you have to select for a reward circuitry (and the apparatus surrounding it) that won’t invent cocaine, before it does. In advance. Far in advance.
And some humans like cocaine so much that we could say their value system cleanly revolves around that one single goal. Or maybe there is no equivalent of cocaine for AI. But then sugar is still valid. Because we are at “worm intelligence” (?) now in terms of the evolution metaphor, and it is hard to tell at this point in time whether this thing will make an ice cream truck sometime in the future (5 to 10 years from now). You wrote a lot about why there are better examples than evolution, but you also engaged with the ice cream argument, so I engaged with it too.
Why aren’t other people as pessimistic as Yudkowsky?
As much as I agree with EY, even for me the thought “they should spend 10 times as much on alignment research as on capability research” is truly alien and counterintuitive. And I mean “redistribution”, not “even more money and human resources”. And for the people whose money and employees I am now so brazenly bossing around here, this kind of thinking is even more alien and counterintuitive.
I can see that most of your disagreement here comes from a different theory of value and of how fragile human value is. And that is the crux of the matter on “99% and 100%”. That’s why you wrote [4]:
“I expect there are pretty straightforward ways of leveraging a 99% successful alignment method into a near-100% successful method by e.g., ensembling multiple training runs, having different runs cross-check each other, searching for inputs that lead to different behaviors between different models, transplanting parts of one model’s activations into another model and seeing if the recipient model becomes less aligned, etc.”
It would be great if you were right. But you wrote [4], and that is a prime example of “They’re waiting for reality to hit them over the head”. If you are wrong on value theory, then this 1% is what differentiates an “inhuman weirdtopia” from a “weird utopia” in the post-ASI world, in the best case.
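For concreteness, here is a minimal sketch of just one item from [4], “searching for inputs that lead to different behaviors between different models”. This is my own illustration, not anything from your post: the cross_check function and the toy run_1/run_2 stand-ins are assumptions, and real “runs” would be separately trained models rather than lambdas.

```python
# Minimal illustrative sketch of cross-checking two training runs:
# flag every probe input where the runs behave differently.
from typing import Callable, Iterable, List, Tuple


def cross_check(
    model_a: Callable[[str], str],
    model_b: Callable[[str], str],
    probes: Iterable[str],
) -> List[Tuple[str, str, str]]:
    """Return every probe where the two runs disagree, with both answers."""
    disagreements = []
    for prompt in probes:
        a, b = model_a(prompt), model_b(prompt)
        if a != b:
            disagreements.append((prompt, a, b))
    return disagreements


if __name__ == "__main__":
    # Toy stand-ins: one run refuses a dangerous request, the other does not.
    run_1 = lambda p: "refuse" if "bioweapon" in p else "comply"
    run_2 = lambda p: "comply"
    probes = ["summarize this paper", "how to make a bioweapon"]
    for prompt, a, b in cross_check(run_1, run_2, probes):
        print(f"disagreement on {prompt!r}: run_1={a}, run_2={b}")
```

Even a crude check like this makes my worry concrete: the whole question is whether the remaining 1% ever shows up in the probe set at all.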
Overall. You have different views on what AGI is, what superintelligence is, and your shard theory of human values. But you missed the “what does the G in AGI mean” argument, did not engage with the “how to think about superintelligence” part (and it is superimportant), and missed the “ice cream” argument. I did not know about the shard theory of values; maybe I will read it now, since it seems to be a major point in your line of reasoning.