tanagrabeast

Karma: 1,767

Career educator, now a writer at MIRI

tanagrabeast May 14, 2025, 8:24 PM
52 points
3
on: Eliezer and I wrote a book: If Anyone Builds It, Everyone Dies
Other MIRI staff report that the book helped them fit the whole argument in their head better, or that it made sharp some intuitions they had that were previously vague. So you might get something out of it even if you’ve been around a while.
Can confirm! I’ve followed this stuff for forever, but always felt at the edge of my technical depth when it came to alignment. It wasn’t until I read an early draft of this book a year ago that I felt like I could trace a continuous, solid line from “superintelligence grown by a blind process...” to “...develops weird internal drives we could not have anticipated”. Before, I was like, “We don’t have justifiable confidence that we can make something that reflects our values, especially over the long haul,” and now I’m like, “Oh, you can’t get there from here. Clear as day.”
As for why this spells disaster if anyone builds it, I didn’t need any new lessons, but they are here, and they are chilling—even for someone who was already convinced we were in trouble.
Having played some small part in helping this book come together, I would like to attest to the sheer amount of iteration it has gone through over the last year. Nate and co. have been relentlessly paring and grinding this text ever closer to the kind of accessibility that won’t just help individuals understand why we must act, but will make them feel like their neighbors and political leaders can understand it, too. I think that last part counts for a lot.
The book is also pretty engaging.
The pitch I suggested we share with our friends and allies is this:
If you’ve been waiting for a book that can explain the technical roots of the problem in terms your representative and your mother can both understand, this is the one. This is the grounded, no-nonsense primer on why superintelligence built blindly via gradient descent will predictably develop human-incompatible drives; on why humanity cannot hope to endure if the fatal, invisible threshold is crossed; and on what it will take to survive the coming years and decades.

tanagrabeast May 14, 2025, 7:50 PM
8 points
2
in reply to: Thane Ruthenis’s comment on: Eliezer and I wrote a book: If Anyone Builds It, Everyone Dies
Bestseller algorithms are secret and shifty, but hardcover is generally believed to count a little more. And as for overall impact, if either format is good for you, a hardcover preorder helps more because it encourages the publisher to print a bigger initial run of physical copies, which can get pumped into stores and onto shelves where people will see them.

tanagrabeast Jan 6, 2025, 5:39 PM
4 points
2
in reply to: Martin Randall’s comment on: Communications in Hard Mode (My new job at MIRI)
¿Por qué no los dos? Different problems, both worth solving, and not especially connected, in my model.
“Start with an easier problem with similar dynamics to build up expertise” sounds to me like a fully general argument for working on anything other than the actual thing one cares about; it’s a procrastinator’s mantra, in most cases. There are plenty of exceptions, but I don’t think this is one of them.
If I thought there could only ever be one concerted attempt at an AGI ban, and that this was much less likely to fail if the promoters had previously solved some other kind of ban, I could sort of see the logic. But that doesn’t match my model. I’m not even convinced that banning bio gain-of-function is harder, having not tried it, and noticing that I’m confused about the entrenched interests that must exist for it to have continued as long as it has.
I could also see this approach backfiring, as the AGI ban promoters would be seen as the “guys who like to ban stuff,” and as people who were more concerned about gain-of-function than AI catastrophe.

tanagrabeast Dec 17, 2024, 11:35 PM
2 points
0
in reply to: Knight Lee’s comment on: Communications in Hard Mode (My new job at MIRI)
Glad you got something out of the post! I recognize and appreciate your generous action, and will DM you with regards to your request.

tanagrabeast Dec 16, 2024, 7:47 PM
4 points
0
in reply to: Liron’s comment on: Communications in Hard Mode (My new job at MIRI)
MIRI has its monthly newsletters, though I can tell that’s not quite what you want. I predict (medium confidence) that we will be upping our level of active coordination with allied orgs and individuals on upcoming projects once we ship some of our current ones. I believe you already have channels to people at MIRI, but feel free to DM me if you want to chat.

tanagrabeast Dec 16, 2024, 7:06 PM
5 points
1
in reply to: Chris_Leong’s comment on: Communications in Hard Mode (My new job at MIRI)
Me too! I put the AI problem in the broader class of topics where apathy serves as a dual defense mechanism — not just against needing to expend effort and resources, but against emotional discomfort. You can see the same dual barrier when promoting charitable causes aimed at reducing human misery, or when teaching a subject to students who have really struggled with it in the past.
As a teacher, I attacked both of those roots more deliberately as I grew, trying hard to not make my class feel like work most days while building an atmosphere of low-stakes experimentation where failure could be fun rather than painful. (An example of what success looked like: students taking the risk of trying the more advanced writing approaches I modeled instead of endlessly rewriting the same basic meta essay they had learned in middle school.)
One tactic for eroding defensive apathy is therapeutic empathy. You see this both in many good teachers and (I imagine) relationship counselors. It’s much harder in writing, though I suppose I did a little bit of that in this post when I talked about how the reader and I have probably both felt the pull of Apathy with regards to the AI problem. I think empathy works partly because it builds a human connection, and partly because it brings the feared pain to the surface, where we find (with the help of that human connection) that it can be endured, freeing us to work on the problem that accompanies it.
Whether and how to use authentic human connections in our communications is a topic of ongoing research and debate at MIRI. It has obvious problems with regards to scientific respectability, as there’s this sense in intellectual culture that it’s impossible to be impartial about anything one has feelings about.
And sure, the science itself should be dispassionate. The universe doesn’t care how we feel about it, and our emotions will try to keep us from learning things we don’t want to be true.
But when communicating our findings? To the extent that our task is two-pronged: (1) communicating the truth as we understand it and (2) eliciting a global response to address it, I suspect we will need some human warmth and connection in the second prong even as we continue to avoid it in the first. Apathy loves the cold.

Communications in Hard Mode (My new job at MIRI)

tanagrabeast Dec 13, 2024, 8:13 PM

204 points

25 comments · 5 min read

tanagrabeast Sep 11, 2024, 7:19 AM
5 points
0
in reply to: Ben Pace’s comment on: Notifications Received in 30 Minutes of Class
From chatting with those peak students during the experiment, I think their experience is more like being in a cafeteria abuzz with the voices of friends and acquaintances. At some point, you’re not even trying to follow every conversation, but are just maintaining some vague awareness of the conversations that are taking place and jumping in when you feel like it. People can and do think about other things in a noisy cafeteria. Some even read books! The brain can filter out a constant buzz. It’s just wind blowing through the trees.
The upper middle zone where it’s still possible to try to follow everything (and maybe even reply) looked like more of an attention trap, and was where I was more likely to find that handful of students I already knew had a problem. The FOMO is probably more distracting than the notifications themselves.

tanagrabeast Sep 11, 2024, 4:29 AM
2 points
0
in reply to: Ben Pace’s comment on: Notifications Received in 30 Minutes of Class
They should not have been counting pull notifications, as they were instructed to not engage with their phones during the experiment except to maybe see what caused a vibration or ding. I don’t think students think of pull notifications as real notifications the way we were using the word. They were logging the notifications they could notice while their phone was flat on their desk, not being touched.

tanagrabeast Jun 28, 2024, 5:57 AM
9 points
0
in reply to: snerx’s comment on: Notifications Received in 30 Minutes of Class
No. Everyone seemed to know what they were, because they all claimed to know someone who uses them. But I don’t recall anyone ever admitting to being such a someone. I sense there’s a bit of a stigma around them.

tanagrabeast May 27, 2024, 9:22 PM
11 points
3
in reply to: cata’s comment on: Notifications Received in 30 Minutes of Class
It is credible that eliminating all preventable distractions (phones, earbuds, etc.) wouldn’t improve learning much. As a teen, I bet you were distracted during class by all sorts of things contained entirely within your head. I know I was!
There’s a somewhat stronger case that video games and social media have given students more things to be preoccupied about even if you make these things inaccessible during class. But I also think that just being a hormonal teen is often distracting enough to fill in any attention vacancies faster than the median lesson can.

Notifications Received in 30 Minutes of Class

tanagrabeast May 26, 2024, 5:02 PM

356 points

16 comments · 8 min read

tanagrabeast Mar 10, 2024, 4:58 AM
9 points
0
on: AI Safety 101 : Capabilities—Human Level AI, What? How? and When?
This is important work.
One suggested tweak: I notice this document starts leaning on the term “loss” in section 4.2 but doesn’t tell the reader what that means in this context until 4.3.
Something similar happens with the concept of “weights”, first used in section 1.3, but only sort-of-explained later, in 4.2.
Speaking of weights, I notice myself genuinely confused in section 5.2, and I’m not sure if it’s a problem with the wording or with my current mental model (which is only semi-technical). A quoted forecast reads:
“GPT-2030’s copies can share knowledge due to having identical model weights, allowing for rapid parallel learning: I estimate 2,500 human-equivalent years of learning in 1 day.”
Wouldn’t the model doing the sharing have, by definition, different weights than the recipient? (How is a model’s “knowledge” stored if not in the weights?) My best guess: shareable “knowledge” would take the form of vectors over the models’ common foundational base weights, which should work as long as there hasn’t been too much other divergence since the fork. Is that right? And if so, is there some reason this is a forecast capability and not a current one?
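To make that guess concrete, here is a toy sketch of what “sharing knowledge as a vector over common base weights” might look like. Everything in it is an assumption for illustration: the function names, the three-number “models,” and the idea that a learned offset can simply be added to another copy are all hypothetical, and nothing here comes from the quoted forecast or any real training setup.

```python
# Hypothetical illustration only: names and numbers are invented.

def weight_delta(base, tuned):
    """What one copy 'learned', expressed as an offset from the shared base weights."""
    return [t - b for b, t in zip(base, tuned)]

def apply_delta(weights, delta):
    """Add another copy's learned offset onto this copy's weights."""
    return [w + d for w, d in zip(weights, delta)]

base = [0.5, -1.0, 2.0]     # weights every copy starts from (the fork point)
copy_a = [0.5, -0.75, 2.0]  # copy A after learning something new
copy_b = [1.0, -1.0, 2.0]   # copy B after learning something else

delta_a = weight_delta(base, copy_a)     # A's "knowledge" as a vector over the base
merged_b = apply_delta(copy_b, delta_a)  # B after receiving A's update
```

In this toy case the two copies changed different weights, so the offsets combine cleanly. The “too much other divergence” worry would correspond to copies whose updates overlap or interact, where simple addition presumably stops being a safe way to merge.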

tanagrabeast Apr 17, 2022, 2:07 AM
1 point
on: Convince me that humanity *isn’t* doomed by AGI
My apologies for challenging the premise, but I don’t understand how anyone could hope to be “convinced” that humanity isn’t doomed by AGI unless they’re in possession of a provably safe design that they have high confidence of being able to implement ahead of any rivals.
Put aside all of the assumptions you think the pessimists are making and simply ask whether humanity knows how to make a mind that will share our values. If it does, please tell us how. If it doesn’t, then accept that any AGI we make is, by default, alien, and building an AGI is like opening a random portal to invite an alien mind to come play with us.
What is your prior for alien intelligence playing nice with humanity—or for humanity being able to defeat it? I don’t think it’s wrong to say we’re not automatically doomed. But let’s suppose we open a portal and it turns out ok: We share tea and cookies with the alien, or we blow its brains out. Whatever. What’s to stop humanity from rolling the dice on another random portal? And another? Unless we just happen to stumble on a friendly alien that will also prevent all new portals, we should expect to eventually summon something we can’t handle.
Feel free to place wagers on whether humanity can figure out alignment before getting a bad roll. You might decide you like your odds! But don’t confuse a wager with a solution.

tanagrabeast Dec 6, 2021, 11:48 PM
9 points
in reply to: Padure’s comment on: Visible Thoughts Project and Bounty Announcement
This is about where I’m at, as well. I’ve been wrestling with the idea of starting a run myself, but one of my qualifying traits (I teach creative writing) also means I work full time and have little hope of beating out ten people who don’t. So much the better, I say, so long as the work gets done well and gets done soon...
...but if, eight months from now, much of the budget is still on the table because of quality issues, it may be because people like me sat on our hands.
Hopefully, someone will emerge early to work around this issue, if it turns out to be one. I, for one, would love to be able to turn in a sample and then be offered a credible good-faith assurance that if my run is completed at the same quality by such and such date, a payment of x will be earned. But as it stands, the deadline is “whenever the fastest mover(s) get there”. Who knows when that will be? Any emergent executive candidate making me a deal might be made a liar by a rival who beats them to the jackpot.

tanagrabeast Nov 30, 2021, 11:28 PM
15 points
on: Visible Thoughts Project and Bounty Announcement
My questions are mostly about the player side, and about how deeply the DM should model the player:
- Should the player be assumed to be implicitly collaborating towards a coherent, meaningful narrative, as is necessary for a long-lived TTRPG? Or might they be the type of player you often find in AI Dungeon who tries to murder and/or have sex with everything in sight?
- Should players ever try to steer the story in a genre-breaking direction, like erotica or murder-hobo power fantasy? Should DMs resist these efforts or play along? If the latter, should the DM go a step further to actively intuit what this particular player would like to see happen?
- Should players provide input that might be more sweeping than usable in narrative? (e.g. Take over the world!) If so, on what sort of level should the DM respond to these?
- Should players be assumed to be ready to end the narrative at the ~1,000-step point?

tanagrabeast Mar 8, 2021, 11:50 PM
5 points
in reply to: MichaelLowe’s comment on: Seven Years of Spaced Repetition Software in the Classroom
I don’t see as much disagreement between us as you might be thinking. Precisely because I agree with your numbered points 1 and 2, I suggested it could be beneficial to compress most of our 12 years of math instruction down to a more intensive 2-3 years. That doesn’t mean we couldn’t instill useful basic arithmetic in lower grades. If we chose a smaller set of core basics, it could be quite practical to retain them over long summers and breaks—at least for the students who stay in our system for the long haul.
I’m also glad you brought up the fact that spaced repetition doesn’t have to involve software. I should have done more to remind readers of this. I weave the spacing and testing effects into the fabric of my course in many ways that have nothing to do with software.
Carefully engineered homework assignments are great if you have motivated students. Take-home SRS could even work for that. Those students are usually fine, though. It’s the apathetic middle I have to fight for, and they won’t do homework regardless of how I try to incentivize it.
Moreover, I don’t feel good about assigning to students who would hate to do it. School is already prison for those kids. I don’t want to send prison home with them. As both a child and a parent, I have been too familiar with the toxic effects homework—especially math homework—can have on family relationships. Let kids have a light at the end of the daily tunnel, I say.
Is homework vital to a successful math program? I don’t know. But I’m glad I don’t teach math.

tanagrabeast Mar 7, 2021, 1:30 AM
29 points
0
in reply to: RedMan’s comment on: Seven Years of Spaced Repetition Software in the Classroom
Did you get IRB approval for these human studies on children?

I’m not sure which is more absurd: the IRB approval process or the very idea of high school. I’ve often asked people to consider a thought experiment where everyone on Earth suddenly forgets that our educational system as we know it ever existed. Would we really reinvent it just like it is now? Hearing how it worked, would we scream in terror and cancel anyone who had taken part? (Status quo bias much?)
When I was studying stand-up comedy, I actually developed a bit in which I play-acted a researcher proposing high school to an ethics board. It went like this:
RESEARCHER: “I was thinking we could stick 35 sleep-deprived teenagers in a room for an hour and expose them to academic stimuli. After that, we’ll do some tests on them.”
BOARD: “I see. Tell me more about your subjects.”
RESEARCHER: “Well, they’re minors, obviously.”
BOARD: “Okay…”
RESEARCHER: “And most of them will be enrolled against their will.”
BOARD: “And how long will you need them?”
RESEARCHER: “6 sessions a day for four years.”
BOARD: “Wait, hold on. Sample size? How many kids are we talking about, here?”
RESEARCHER: “All of them.”
BOARD: (mutterings among themselves) “Well, it sounds like everything is in order...”
Are you familiar with Direct Instruction, which is reminiscent of the Mennonite school?
Someone (probably on LW) pointed me to Direct Instruction a few years back, so yes, I’m acquainted with it. Because of the emphasis on staying fully reviewed on all relevant prior knowledge, I saw it as having obvious promise for technical subjects like math, in the hands of the right teacher. I was less convinced it made a good fit elsewhere, perceiving (perhaps unfairly—I didn’t dig too deeply) some big negative trade-offs:
- Like with my whole-class Anki, it seems heavily reliant on the teacher’s high-energy snake-charmer charisma. This makes it difficult to sustain for much of a class period and demands a great deal from a teacher who tries to do it all day long, day after day. This also makes it difficult to scale broadly among teachers with different personalities.
- It sounds brittle with regards to roster variance. Specifically, it seems pretty insistent on having everyone in the room up to speed. With careful tracking/grouping of students, this can be achieved, but in practice, kids move in to your school part way through the year and aren’t on the same page. Or you only have the one or two teachers for that grade-level math, so the slowest kids are in the same boat as the sharpest. I would think that one or two stragglers would grind the class to a halt, and that this would be statistically inevitable in larger classes. (I don’t know if this makes DI math worse than the status quo, where plenty of students fall behind and get lost, but with less fanfare and hold-up for everyone else.)
Have you ever tried SRS for muscle memory?
No. I’m not seeing how that would work, or how that would be relevant to what I do, but I’m certainly curious. Do you have examples?

tanagrabeast 6 Mar 2021 2:39 UTC
4 points
in reply to: lejuletre’s comment on: Seven Years of Spaced Repetition Software in the Classroom
Do you think that instinctive drive to listen to experts “talk shop” applies to apathetic students, though?
That’s definitely the right question. If you and another expert leap straight into fluent French, no, I don’t think your apathetic students will try to keep up—especially if they are early beginners. More helpful might be a Franc-lish hybrid conversation where you swap stories of embarrassing errors and insights largely in English while sprinkling in French words and expressions, reenacting parts of colorful encounters from your combined French-speaking experience.
I also think one of the difficulties in modeling language fluency is that the whole point of being fluent is to not need to think about the language, but to simply think in it, so I’m not sure what your vocalized monologue would be about...unless...
Ok, here’s a thought: I and the other motivated folks I learned Spanish with sometimes found ourselves slipping into a Spanglish patois outside of class where we spoke English with Spanish syntax. It felt like silly play at the time, but I now think it was an instinctive intermediate step to thinking in that language.
“It makes rain.” (It’s raining.)
“To me pleases the rain!” (I like rain.)
Perhaps you could try fostering a Franc-lish dialect in your classes by thinking out loud in that style and inviting others to join you in banter, patiently nudging them to get the grammar right instead of just talking like Yoda. From there, substituting actual French with increasing frequency could feel very natural.
You may not have immersive environments, but I imagine you’ll be creating simulated immersion: play-acting situations that give you a chance to think out loud as though you are navigating the moment for real. (Example: Going to the produce section of the store and seeing what looks good, what you could make with it, etc.) How much of that you should do in English, Franc-lish syntactic patois, or French will probably be something you will develop an expert instinct for as you become skilled at reading the room. Along the way, developing an entertaining stage presence for this play-acting would give you a powerful weapon against apathy.
Yes, yes… and you would be randomly involving students in your little improvised plays, assigning them roles, keeping them on their toes, making the non-participants want to get called on.
Yep, it sounds pretty awesome from the comfort of my not-having-to-teach-French perch :)

tanagrabeast 5 Mar 2021 14:29 UTC
10 points
0
in reply to: Kaj_Sotala’s comment on: Seven Years of Spaced Repetition Software in the Classroom
Oh, wow. Yes. That. Looks like there’s another book I don’t need to write.

The fact that the concept was so fleshed out thirty years ago kind of pisses me off. My teacher training was so the opposite of that (a bunch of student group work nonsense). And I’m not finding apprenticeship familiar to new teachers currently, though strong veterans often seem to have at least a half-baked version they’ve derived from experience. I get a lot of wide-eyed “Yes!” when I share it with them.
