Rational Animations’ main writer and helmsman
Writer
The Parable of the Dagger—The Animation
The Goddess of Everything Else—The Animation
It will be a while before we run an experiment, and when I want to start one, I'll make another post and consult you again.
When/if we do one, it'll probably look like what @the gears to ascension proposed in their comment here: a pretty technical video that will likely get fewer views than usual and filters for the kind of people we want on LessWrong. How I would advertise it could resemble the description of the market on Manifold linked in the post, but I'm going to run the details by you first.
LessWrong currently has about 2,000 logged-in users per day, and 20-100 new users sign up each day (a wide range that includes recent peaks).
This provides important context. 20-100 new accounts per day is a lot. At the moment, Manifold predicts that, as a result of a strong call to action and 1M views, Rational Animations would bring an expected 679 new users. That would probably look like 300-400 new users in the first couple of weeks after the video is out and an additional 300-400 over the following few months. That's not a lot!
As a simplification, suppose the video gets 500k views in its first two weeks: 200k on the first day, which would correspond to about 679/5 ≈ 136 expected new users. Suppose we get 100k more views on the second day: that's about 70 more users. Then suppose the remaining 200k views are spread evenly over the remaining 12 days: that comes to merely ~11 additional users per day.
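To make the assumptions explicit, here's a minimal sketch of the same arithmetic; the only inputs are the Manifold estimate of ~679 expected users per 1M views (treated as scaling linearly) and the hypothetical view schedule above:

```python
# Back-of-the-envelope estimate, assuming the Manifold figure of ~679 expected
# new users per 1M views scales linearly with view count.

EXPECTED_USERS_PER_MILLION_VIEWS = 679

def expected_new_users(views: int) -> float:
    """Expected new LessWrong signups for a given number of views."""
    return EXPECTED_USERS_PER_MILLION_VIEWS * views / 1_000_000

# Hypothetical first-two-weeks view schedule (500k views total):
print(expected_new_users(200_000))       # day 1: ~136 users
print(expected_new_users(100_000))       # day 2: ~68 users
print(expected_new_users(200_000) / 12)  # days 3-14: ~11 users per day
```

Under these assumptions, the trickle after the first two days is on the order of ten signups per day, which is small compared to the existing 20-100 per day baseline.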
Should LessWrong be providing intro material and answers? [...] So maybe. Maybe we should lean into this as an opportunity, even though it'll take work, both in not letting it affect the site in bad ways (moderation, etc.) and in possibly preparing better material for a broader audience.
I would be happy to link such things if you produce them. For now, linking the AI Safety Fundamentals courses should achieve ~the same results. Some of the readings can also be found on LessWrong, so people may discover LW that way. That said, having something produced by LW would probably improve the funnel.
To provide some insight: on the margin, more new users means more work for us. We process all first-time posters/commenters manually, so there's a linear factor there, and some new users require follow-up and then moderation action. So currently there's a human cost to adding more people.
Duly noted. Another interesting datum would be the fraction of new users who become active posters, and how long it takes them to do so.
Rational Animations is looking for an AI Safety scriptwriter, a lead community manager, and other roles.
Rational Animations has a subreddit: https://www.reddit.com/r/RationalAnimations/
I hadn’t advertised it until now because I had to find someone to help moderate it.
I want people here to be among the first to join since I expect having LessWrong users early on would help foster a good epistemic culture.
The answer must be "yes", since it's mentioned in the post.
Should Rational Animations invite viewers to read content on LessWrong?
I was thinking about publishing the post on the EA Forum too, to hear what users and mods think there, since some videos would link to EA Forum posts and others to LW posts.
I agree that moderation is less strict on the EA Forum and that users would have a more welcoming experience. On the other hand, the more stringent moderation on LessWrong makes me more optimistic about LessWrong being able to withstand a large influx of new users without degrading the culture. Recent changes by moderators, such as the rejected content section, make me more optimistic than I was in the past.
I’m evaluating how much I should invite people from the channel to LessWrong, so I’ve made a market to gauge how many people would create a LessWrong account given some very aggressive publicity, so I can get a per-video upper bound. I’m not taking any unilateral action on things like that, and I’ll make a LessWrong post to hear the opinions of users and mods here after I get more traders on this market.
500 Million, But Not A Single One More—The Animation
“April fool! It was not an April fool!”
Here’s a perhaps dangerous plan to save the world:
1. Have a very powerful LLM, or a more general AI in the simulators class. Make sure we don't go extinct during its training (e.g., by some agentic simulacrum somehow taking over during training; I'm not sure this is possible, but I figured I'd mention it anyway).
2. Find a way to systematically remove the associated waluigis in the superposition caused by prompting a generic LLM (or simulator) to simulate a benevolent, aligned, and agentic character.
3. Elicit this agentic benevolent simulacrum in the super-powerful LLM and apply the technique to remove waluigis. The simulacrum must have strong agentic properties to be able to perform a pivotal act: it will, e.g., generate actions according to an aligned goal, and its prompts might be translations of sensory input streams. Give this simulacrum-agent ways to act easily in the world, just in case.
And here’s a story:
Humanity manages to apply the plan above, but there's a catch: they can't find a way to eliminate waluigis from the superposition definitively, only a way to make them decidedly unlikely, and more and more unlikely with each prompt, such that the probability of the benevolent god turning into a waluigi falls over time, perhaps converging to a relatively small number (e.g., 0.1) over an infinite amount of time.
But there's a complication: there are different kinds of possible waluigis. Some of them cause extinction, but most of them invert the sign of the actions of the benevolent god-simulacrum, causing S-risk.
A shadowy sect of priests called "negU" finds a theoretical way to reliably elicit extinction-causing waluigis, and tries to do so. The heroes uncover their plan to destroy humanity, and ultimately win. But they realize the shadowy priests have a point, and in a flash of ultimate insight they figure out how to collapse all waluigis to an amplitude of 0. The end. [OK, I admit this ending with the flash of insight sucks, but I'm just trying to illustrate some points here.]
--------------------
I’m interested in comments. Does the plan fail in obvious ways? Are some elements in the story plausible enough?
I'm not sure if I'm missing something. This is my first try after reading your comment:
Unsurprisingly, Eliezer is better at it: https://twitter.com/ESYudkowsky/status/1638092609691488258
Still a bit dismissive, but he took the opportunity to reply to a precise object-level comment with another precise object-level comment.
I seriously doubt comments like these are making the situation better (https://twitter.com/Liv_Boeree/status/1637902478472630275, https://twitter.com/primalpoly/status/1637896523676811269)
Edit: on the other hand…
Devastating and utter communication failure?
I think that the magnitude of the AI alignment problem has been ridiculously overblown & our ability to solve it widely underestimated.
I’ve been publicly called stupid before, but never as often as by the “AI is a significant existential risk” crowd.
That’s OK, I’m used to it.
This post by Jeffrey Ladish was a pretty motivating read: https://www.facebook.com/jeffladish/posts/pfbid02wV7ZNLLNEJyw5wokZCGv1eqan6XqCidnMTGj18mQYG1ZrnZ2zbrzH3nHLeNJPxo3l
Would it be possible to use a huge model (e.g. an LLM) to interpret smaller networks, and output human-readable explanations? Is anyone working on something along these lines?
I'm aware Kayla Lewis is working on something similar (but not quite the same thing) on a small scale. From what I understand from her tweets, she's using one network to predict the outputs of another network by reading its activations.
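To make the question concrete, here's a toy sketch of the kind of pipeline I'm imagining, with a placeholder `query_llm` standing in for whatever large model would do the interpreting; the tiny random network and the prompt format are made up for illustration, not anyone's actual method:

```python
# Rough sketch: collect a small network's activations on some inputs, then ask
# a large LLM for a human-readable hypothesis about what one neuron is doing.
# `query_llm` is a placeholder, not a real API.
import numpy as np

rng = np.random.default_rng(0)

# Stand-in "small network": one ReLU hidden layer with random weights.
W1 = rng.normal(size=(8, 4))

def hidden_activations(x: np.ndarray) -> np.ndarray:
    return np.maximum(0, x @ W1)

def query_llm(prompt: str) -> str:
    # Placeholder for a call to whatever large model handles interpretation.
    return "<explanation from the large model would go here>"

# Gather (input, activation) pairs for one neuron and format them as a prompt.
inputs = rng.normal(size=(5, 8))
neuron_index = 2
records = [
    f"input={np.round(x, 2).tolist()} -> activation={hidden_activations(x)[neuron_index]:.3f}"
    for x in inputs
]
prompt = (
    "Here are inputs to a small network and the activation of one hidden neuron.\n"
    + "\n".join(records)
    + "\nIn plain language, what feature of the input might this neuron be detecting?"
)
print(query_llm(prompt))
```

The interesting part would be scaling this up and checking whether the explanations the big model produces actually predict the small network's behavior on held-out inputs.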
This is probably not the most efficient way to keep up with new stuff, but aisafety.info is shaping up to be a good repository of alignment concepts.