Wow this is my favorite post in a long time, super educational. I was familiar with the basic concept from the Sequences, but this added a great level of understandable detail. Kudos.
By your logic, if I ask you a totally separate question, “What’s the probability that a parent’s two kids are both boys?”, would you answer 1/3? Because the correct answer should be 1/4, right? So something about your preferred methodology isn’t robust.
I agree that frequentists are flexible about their approach in trying to get the right answer. But I think your version of the problem highlights how much flexibility (i.e. mental gymnastics) they need, compared to just being explicitly Bayesian all along.
In scenario B, where a random child runs up, I wonder if a non-Bayesian might prefer that you just eliminate (girl, girl) and say that the probability of two boys is 1/3?
In Puzzle 1 in my post, the non-Bayesian has an interpretation that’s still plausibly reasonable, but in your scenario B it seems like they’d be clowning themselves to take that approach.
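For concreteness, here’s a minimal Monte Carlo sketch of the distinction (my own toy code, not from the post, assuming independent children and 50/50 sexes): the unconditional question gives ~1/4, conditioning on “at least one child is a boy” gives ~1/3, and conditioning on “a random child runs up and it’s a boy” (your scenario B) gives ~1/2.

```python
# Toy Monte Carlo for the two-children puzzle (illustration only,
# assuming independent children and 50/50 boy/girl).
import random

N = 200_000
uncond = [0, 0]        # [families, families with two boys]
at_least_one = [0, 0]  # condition: told "at least one is a boy"
random_child = [0, 0]  # condition: a random child runs up and it's a boy (scenario B)

for _ in range(N):
    kids = [random.choice("BG"), random.choice("BG")]
    two_boys = kids.count("B") == 2

    uncond[0] += 1
    uncond[1] += two_boys

    if "B" in kids:
        at_least_one[0] += 1
        at_least_one[1] += two_boys

    if random.choice(kids) == "B":
        random_child[0] += 1
        random_child[1] += two_boys

print("P(two boys)                       ~", uncond[1] / uncond[0])              # ~0.25
print("P(two boys | at least one boy)    ~", at_least_one[1] / at_least_one[0])  # ~0.33
print("P(two boys | random child is boy) ~", random_child[1] / random_child[0])  # ~0.50
```

The 1/3 answer is only right under the first conditioning story; in scenario B the selection mechanism matters, which is exactly the Bayesian point.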
So I think we’re on the same page that whenever things get real/practical/bigger-picture, then you gotta be Bayesian.
Thanks for this post.
I’d love to have a regular (weekly/monthly/quarterly) post that’s just “here’s what we’re focusing on at MIRI these days”.
I respect and value MIRI’s leadership on the complex topic of building understanding and coordination around AI.
I spend a lot of time doing AI social media, and I try to promote the best recommendations I know to others. Whatever thoughts MIRI has would be helpful.
Given that I think about this less often and less capably than you folks do, it seems like there’s a low-hanging-fruit opportunity for people like me to stay more in sync with MIRI. My show (Doom Debates) isn’t affiliated with MIRI, but as long as I continue to have no particular disagreement with MIRI, I’d like to make sure I’m pulling in the same direction as you all.
I’ve heard MIRI has some big content projects in the works, maybe a book.
FWIW I think having a regular stream of lower-effort content that a somewhat mainstream audience consumes would help to bolster MIRI’s position as a thought leader when they release the bigger works.
I’d ask: If one day your God stopped existing, would anything observably change?
It seems like a meaningless concept: a node in their causal model of reality that has no power to constrain expectations, but one the person likes because knowing the node exists in their own belief network brings them emotional reward.
> When an agent is goal-oriented, they want to become more goal-oriented, and maximize the goal-orientedness of the universe with respect to their own goal.
Because expected value tells us that the more resources you control, the more robustly you can maximize your probability of success in the face of whatever comes at you, and the higher your maximum possible utility is (if you have a utility function without an easy-to-hit max score).
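As a toy illustration of that claim (my own sketch with made-up numbers, not anything from the thread): an agent that has to weather a run of random adverse shocks achieves its goal more often the more resources it starts with.

```python
# Toy sketch (hypothetical numbers): success requires the agent's resource
# stockpile to cover a sequence of random adverse shocks, so more resources
# monotonically raise its probability of achieving the goal.
import random

def p_success(resources, n_shocks=10, max_shock=10.0, trials=20_000):
    wins = 0
    for _ in range(trials):
        remaining = resources
        for _ in range(n_shocks):
            remaining -= random.uniform(0, max_shock)  # each shock consumes resources
            if remaining < 0:
                break
        else:
            wins += 1  # survived every shock -> goal achieved
    return wins / trials

for r in (30, 50, 70, 90, 110):
    print(f"resources={r:>3}: P(success) ~ {p_success(r):.2f}")
```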
“Maximizing goal-orientedness of the universe” was how I phrased the prediction that conquering resources involves having them aligned to your goal / aligned agents helping you control them.
> goal-orientedness is a convergent attractor in the space of self-modifying intelligences
This also requires a citation, or at the very least some reasoning; I’m not aware of any theorems that show goal-orientedness is a convergent attractor, but I’d be happy to learn more.
Ok here’s my reasoning:
When an agent is goal-oriented, they want to become more goal-oriented, and maximize the goal-orientedness of the universe with respect to their own goal. So if we diagram the evolution of the universe’s goal-orientedness, it has the shape of an attractor.
There are plenty of entry paths where some intelligence-improving process spits out a goal-oriented general intelligence (like biological evolution did), but no exit path where a universe whose smartest agent is super goal-oriented ever leads to that no longer being the case.
I’m happy to have that kind of debate.
My position is “goal-directedness is an attractor state that is incredibly dangerous and uncontrollable if it’s somewhat beyond human-level in the near future”.
The form of those arguments seems to be like “technically it doesn’t have to be”. But realistically it will be lol. Not sure how much more there will be to say.
Thanks. Sure, I’m always happy to update on new arguments and evidence. The most likely way I see possibly updating is to realize the gap between current AIs and human intelligence is actually much larger than it currently seems, e.g. 50+ years as Robin seems to think. Then AI alignment research has a larger chance of working.
I also might lower P(doom) if international governments start treating this like the emergency it is and do their best to coordinate to pause. Though unfortunately even that probably only buys a few years of time.
Finally, I can imagine somehow updating that alignment is easier than it seems, or less of a problem to begin with. But the fact that all the arguments I’ve heard on that front seem very weak and misguided to me makes that unlikely.
Thanks for your comments. I don’t get how nuclear and biosafety represent models of success. Humanity rose to meet those challenges only somewhat adequately, and half the reason society hasn’t collapsed from e.g. a first thermonuclear explosion going off, either intentionally or accidentally, is pure luck. All it takes to topple humanity is something like nukes but a little harder to coordinate on (or much harder).
Here’s a better transcript hopefully: https://share.descript.com/view/yfASo1J11e0
I updated the link in the post.
Thanks I’ll look into that. Maybe try the transcript generated by YouTube?
I guess I just don’t see it as a weak point in the doom argument that goal-orientedness is a convergent attractor in the space of self-modifying intelligences?
It feels similar to pondering the familiar claim of evolution, that systems that copy themselves and seize resources are an attractor state. Sure it’s not 100% proven but it seems pretty solid.
Context is a huge factor in all these communications tips. The scenario I’m optimizing for is when you’re texting someone who has a lot of options, and you think it’s high expected value to get them to invest in a date with you, but the most likely way that won’t happen is if they hesitate to reply to you and tap away to something else. That’s not always the actual scenario though.
Imagine you’re the recipient, and the person who’s texting you met your minimum standard to match with, but is still a priori probably not worth your time and effort going on a date with, because their expected attractiveness+compatibility score is too low, though you haven’t investigated enough to be confident yet. (This is a common epistemic state of e.g. a woman with attractive pics on a dating app that has more male users.)
Maybe the first match who asks you “how’s your week going” feels like a nice opportunity to ramble how you feel, and a nice sign that someone out there cares. But if that happens enough on an app, and the average date-worthiness of the people that it happens with is low, then the next person who sends it doesn’t make you want to ramble anymore. Because you know from experience that rambling into a momentumless conversation will just lead it to stagnate in its next momentumless point.
It’s nice when people care about you, but it quickly gets not so nice when a bunch of people with questionable date-appeal are trying to trade a cheap care signal for your scarce attention and dating resources.
If the person sending you the message has already distinguished themselves to you as “dateworthy”, e.g. by having one of the best pics and/or profile in your judgment, then “How’s your week going” will be a perfectly adequate message from them; in some cases maybe even an optimal message. You can just build rapport and check for basic red flags, then set up a date.
But if you’re not sold on the other person being dateworthy, and they start out from a lower-leverage position in the sense that they initially consider you more dateworthy than you consider them, then they’d better send a message that somehow adds value to you, to help them close the dateworthiness gap.
But again, context is always the biggest factor, and context has a lot of detail. E.g. if you don’t consider someone dateworthy, but you’re in a scenario where someone just making conversation with you is adding value to you (e.g. not a ton of matches demanding your attention using the same unoriginal rapport-building gambit), then “How’s it going” can work great.
This is actually the default context if you’re brave enough to approach strangers you want to date in meatspace. The stranger can be much more physically attractive, or have a higher initially-perceived dating market value, than you. Yet just implicitly signaling your social confidence through boldness, body language, and a friendly/fun way of speaking and acting raises your dateworthiness significantly, and the real-world-interaction modality doesn’t have much competition these days, so the content of the conversation leading up to a date can be super normal smalltalk like “How’s it going”.
Yeah nice. A statement like “I’m looking for something new to watch” lowers the stakes by making the interaction more like what friends talk about rather than about an interview for a life partner, increasing the probability that they’ll respond rather than pausing for a second and ending up tapping away.
You can do even more than just lowering the stakes if you inject a sense that you’re subconsciously using the next couple of conversation moves to draw out evidence about your conversation partner, because you’re naturally perceptive and have various standards and ideas about the people you like to date, and you like to get a sense of who the other person is.
If done well, this builds a curious sense that the question is a bit more than just making formulaic conversation, but somehow has momentum to it. The best motivation for someone to keep talking to you on a dating app is if they feel they’re being seen by a savvy evaluator who will reflect back a valuable perspective about them. The person talking to you can then be subconsciously thinking about how attractive/interesting/unique/etc they are (an engaging experience). Also, everyone wants to feel like they’re maximizing their potential by finding someone to date who’s in the upper range of their “league”, and there are ways to engage in conversation that are more consistent with that ideal.
IMO the best type of conversation to have after a few opening back-and-forths is to get them talking about something they find engaging, which is generally also something that reflects them in a good light. That makes it fun and engaging for them while also putting you in a position to give a kind of casual “feedback”, ultimately leading up to a statement of interest that shows them why you’re not just another random match but someone they have more reason to meet and not flake on. Your movie question could be a good start toward discovering something like that, but probably isn’t an example of it unless they’re a big movie person.
I’d try to look at their profile for clues about something they do in their life where they make an effort that someone ought to notice and appreciate, and get ’em talking about that.
Those are just some thoughts I have about how to distinguish yourself in the middle part of the conversation between opening interest and asking them on a date.
So you simply ask them: “What do you want to do?” And maybe you add “I’m completely fine with anything!” to ensure you’re really introducing no constraints whatsoever and you two can do exactly what your friend desires.
This error reminds me of people on a dating app who kill the conversation by texting something like “How’s your week going?”
When texting on a dating app, if you want to keep the conversation flowing nicely instead of getting awkward/strained responses or nothing, I believe the key is to make sure that a couple of seconds of low-effort processing on the recipient’s part is enough for them to start typing their response to your message.
“How’s your week going?” is highly cognitively straining. Responding to it requires remembering and selecting info about one’s week (or one’s feelings about one’s week), and then filtering or modifying that selection to sound like an interesting conversationalist rather than an undifferentiated bore, while also worrying that the choice of answer might implicitly reveal them as too eager to brag, complain, or obsess about a particular topic.
You can be “conversationally generous” by intentionally pre-computing some of their cognitive work, i.e. narrowing the search space. For instance:
“I’m gonna try cooking myself 3 eggs/day for lunch so I don’t go crazy on DoorDash. How would you cook them if you were me?”
With a text like this (ideally adjusted to your actual life context), they don’t have to start by narrowing down a huge space of possible responses. They can immediately just ask themselves how they’d go about cooking an egg. And they also have some context of “where the conversation is going”: it’s about your own lifestyle. So it’s not just two people interviewing each other, it has this natural motion/momentum.
Using this computational kindness technique is admittedly kind of contrived on your end, but on their end, it just feels effortless and serendipitous. For naturally contrived nerds like myself looking for a way to convert IQ points into social skills, it’s a good trade.
The computational kindness principle in these conversations works much like the rule of improv that says you’re supposed to introduce specific elements to the scene (“My little brown poodle is digging for his bone”) rather than prompting your scene partners to do the cognitive work (“What’s that over there?”).
Oh and all this is not just a random piece of advice, it’s yet another Specificity Power.
Your baseline scenario (0 value) thus assumes away the possibility that civilization permanently collapses (in some sense) in the absence of some path to greater intelligence (whether via AI or whatever else), which would also wipe out any future value. This is a non-negligible possibility.
Yes, my mainline no-superintelligence-by-2100 scenario is that the trend toward a better world continues to 2100.
You’re welcome to set the baseline number to a negative, or tweak the numbers however you want to reflect any probability of a non-ASI existential disaster happening before 2100. I doubt it’ll affect the conclusion.
> To be honest the only thing preventing me from granting paperclippers as much or more value than humans is uncertainty/conservatism about my metaethics.
Ah ok, the crux of our disagreement is how much you value the paperclipper type scenario that I’d consider a very bad outcome. If you think that outcome is good then yeah, that licenses you in this formula to conclude that rushing toward AI is good.
This article is just saying “doomers are failing to prevent doom for various reasons, and also they might be wrong that doom is coming soon”. But we’re probably not wrong, and not being doomers isn’t a better strategy. So it’s a lame article IMO.