I have signed no contracts or agreements whose existence I cannot mention.
Yeah, this seems worth a shot. If we do this, we should do our own pre-primary in like mid 2027 to select who to run in each party, so that we don’t split the vote and also so that we select the best candidate.
Someone I know was involved in a DIY pre-primary in the UK which unseated an extremely safe politician, and we’d get a bunch of extra press while doing this.
Humans without scaffolding can do only a very limited number of sequential reasoning steps without mistakes. That’s why thinking aids like paper, whiteboards, and other people to bounce ideas off and keep the cache fresh are so useful.
With a large enough decisive strategic advantage, a system can afford to run safety checks on any future versions of itself and anything else it’s interacting with sufficient to stabilize values for extremely long periods of time.
Multipolar worlds though? Yeah, they’re going to get eaten by evolution/moloch/power seeking/pythia.
More cynical take based on the Musk/Altman emails: Altman was expecting Musk to be CEO. He set up a governance structure which would effectively be able to dethrone Musk, with him as the obvious successor, and was happy to staff the board with ideological people who might well take issue with something Musk did down the line to give him a shot at the throne.
Musk walked away, and it would’ve been too weird to change his mind on the governance structure. Altman presumably judged the trap unlikely enough to fire that it wasn’t worth disarming at any point before it actually did.
I don’t know whether the dates line up to disconfirm this, but I could see this kind of 5d chess move happening. Though maybe ordinary power-and-incentive psychology is sufficient.
Looks fun!
I could also remove Oil Seeker’s protection from Pollution; they don’t need it for making Black Chips to be worthwhile, but removing it would make that less of an amazing deal than it currently is.
Maybe have the pollution cost halved for Black, if removing it turns out to be too weak?
Seems accurate, though I think Thinking This Through A Bit involved the part of backchaining where you look at approximately where on the map the destination is, and that’s what some pro-backchain people are trying to point at. In the non-metaphor, the destination is not well specified by people in most categories, and might be like 50 ft in the air so you need a way to go up or something.
And maybe if you are assisting someone else who has well grounded models, you might be able to subproblem solve within their plan and do good, but you’re betting your impact on their direction. Much better to have your own compass or at least a gears model of theirs so you can check and orient reliably.
PS: I brought snacks!
Give me a dojo.lesswrong.com, where the people into mental self-improvement can hang out and swap techniques, maybe a meetup.lesswrong.com where I can run better meetups and find out about the best rationalist get-togethers. Let there be an ai.lesswrong.com for the people writing about artificial intelligence.
Yes! Ish! I’d be keen to have something like this for the upcoming aisafety.com/stay-informed page, where we’re looking like we’ll currently resort to linking to https://www.lesswrong.com/tag/ai?sortedBy=magic#:~:text=Posts%20tagged%20AI as there’s no simpler way to get people specifically to the AI section of the site.
I’d weakly lean towards not using a subdomain, but to using a linkable filter, but yeah seems good.
I’d also think that making it really easy and fluid to cross-post (including selectively, maybe the posts pop up in your drafts and you just have to click post if you don’t want everything cross-posted) would be a pretty big boon for LW.
I’m glad you’re trying to figure out a solution. I am however going to shoot this one down a bunch.
If these assumptions were true, this would be nice. Unfortunately, I think all three are false.
LLMs will never be superintelligent when predicting a single token.
In a technical sense, definitively false. Redwood compared humans to AIs at next-token prediction and even early AIs were far superhuman. Also, in a more important sense, you can apply a huge amount of optimization in selecting a token. This video gives a decent intuition, though in a slightly different setting.
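To put a rough number on the “huge amount of optimization” point (my own arithmetic, not from Redwood or the video): a free choice among ~50,000 tokens is about 15.6 bits of selection per token, and it compounds over a sequence. A quick sketch:

```python
import math

# Illustrative arithmetic only: how much selection pressure fits into
# freely choosing tokens. vocab_size is an assumed, typical LLM vocabulary.
vocab_size = 50_000
bits_per_token = math.log2(vocab_size)  # ~15.6 bits per free token choice

for n_tokens in (1, 10, 100):
    total = n_tokens * bits_per_token
    print(f"{n_tokens:>3} tokens: up to {total:.0f} bits of selection "
          f"(singling out ~1 in 2^{total:.0f} possible continuations)")
```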
LLMs will have no state.
False in three different ways. Firstly, people are totally building in explicit state in lots of ways (test time training, context retrieval, reasoning models, etc). Secondly, there’s a feedback cycle of AI influences training data of next AI, which will become a tighter and tighter loop. Thirdly, the AI can use the environment as state in ways which would be nearly impossible to fully trace or mitigate.
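As a minimal sketch of the first and third of these (`call_model` is a hypothetical stand-in for any LLM API, my own illustration): even if every individual forward pass is stateless, the scaffold that feeds back transcripts, retrieved memories, or files the model wrote earlier makes the overall system stateful.

```python
# Sketch: a per-call "stateless" model still yields a stateful system
# once the loop feeds its own outputs back in.

def call_model(prompt: str) -> str:
    """Stateless: output depends only on this prompt."""
    return f"<reply conditioned on {len(prompt)} chars of context>"

transcript: list[str] = []  # explicit state: the conversation history
notes: dict[str, str] = {}  # environment-as-state: things written earlier

def step(user_msg: str) -> str:
    transcript.append(f"User: {user_msg}")
    context = "\n".join(transcript) + f"\nNotes: {notes}"
    reply = call_model(context)        # each call is stateless...
    transcript.append(f"AI: {reply}")  # ...but the loop is not
    notes["last_topic"] = user_msg
    return reply

step("hello")
step("what were we talking about?")  # answerable only via accumulated state
```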
not in a way that any emergent behaviour of the system as a whole isn’t reflected in the outputs of any of the constituent LLMs
alas, systems whose components are individually well understood regularly do, and should be expected to, have poorly understood behaviour when taken together.
a simpler LLM to detect output that looks like it’s leading towards unaligned behaviour.
Robustly detecting “unaligned behaviour” is an unsolved problem, if by aligned you mean “makes the long term future good” rather than “doesn’t embarrass the corporation”. Solving this would be massive progress, and throwing a LLM at it naively has many pitfalls.
Stepping back, I’d encourage you to drop by AI Plans, skill up at detecting failure modes, and get good at both generating and red-teaming your own ideas (the Agent Foundations course and Arbital are good places to start). Build a long list of ideas you’ve shown how to break, and help both break and extract the insights from others’ ideas.
To the extent human civilization is human-aligned, most of the reason for the alignment is that humans are extremely useful to various social systems like the economy and states, or as substrate of cultural evolution. When human cognition ceases to be useful, we should expect these systems to become less aligned, leading to human disempowerment.
oh good, I’ve been thinking this basically word for word for a while and had it in my backlog. Glad this is written up nicely, far better than I would likely have done :)
The one thing I’m not a big fan of: I’d bet “Gradual Disempowerment” sounds like a “this might take many decades or longer” thing to most readers, whereas given capabilities curves this could be a few-months-to-single-digit-years thing.
I think I have a draft somewhere, but never finished it. tl;dr: Quantum computers let you steal private keys from public keys (so any wallet which has ever sent a transaction, since sending reveals the public key). Upgrading can protect wallets where people move their coins, but it’s going to be messy, slow, and won’t work for lost-key wallets, which are a pretty huge fraction of the total BTC reserve. Once we get quantum computers, BTC at least is going to have a very bad time; others will have a moderately bad time depending on how early they upgrade.
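To make the “wallets that have sent a transaction” part concrete: in legacy pay-to-pubkey-hash, the address only commits to a hash of the public key, and the key itself hits the chain the first time you spend; Shor’s algorithm attacks the key, not the hash. A sketch of the hashing step (key bytes made up for illustration; ripemd160 may need OpenSSL’s legacy provider on some systems):

```python
import hashlib

# Why never-spent-from addresses are somewhat safer: a legacy P2PKH
# address encodes HASH160(pubkey) = RIPEMD160(SHA256(pubkey)). The public
# key itself only appears on-chain when the wallet spends, and that key
# is what Shor's algorithm would recover the private key from.
pubkey = bytes.fromhex("02" + "11" * 32)  # made-up compressed pubkey

sha = hashlib.sha256(pubkey).digest()
hash160 = hashlib.new("ripemd160", sha).digest()  # what the address encodes

print("on-chain before any spend:", hash160.hex())  # hash only
print("revealed upon spending:   ", pubkey.hex())   # Shor's target
```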
Nice! I haven’t read a ton of Buddhism, cool that this fits into a known framework.
I’m uncertain of how you use the word consciousness here; do you mean our blob of sensory experience, or something else?
Yeah, ~subjective experience.
Let’s do most of this via the much higher bandwidth medium of voice, but quickly:
Yes, qualia[1] is real, and is a class of mathematical structure.[2]
(placeholder for not a question item)
Matter is a class of math which is ~kinda like our physics.
Our part of the multiverse probably doesn’t have special “exists” tags, probably everything is real (though to get remotely sane answers you need a decreasing reality fluid/caring fluid allocation).
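On “decreasing reality fluid/caring fluid allocation”: this is basically a normalization requirement. Caring uniformly about infinitely many structures makes every expectation diverge, so the weights have to fall off fast enough to sum to something finite. One standard move (UDASSA-flavored; my gloss, not part of the original exchange) is to weight each structure by the length of the shortest program specifying it, which the Kraft inequality keeps normalizable:

```latex
% Kraft inequality: for any prefix-free set of programs P, with \ell(p)
% the length of p in bits, the weights form a (sub)probability measure:
\sum_{p \in P} 2^{-\ell(p)} \le 1
```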
Math, in the sense I’m trying to point to it, is ‘Structure’. By which I mean: Well defined seeds/axioms/starting points and precisely specified rules/laws/inference steps for extending those seeds. The quickest way I’ve seen to get the intuition for what I’m trying to point at with ‘structure’ is to watch these videos in succession (but it doesn’t work for everyone):
[1]: experience/the thing LWers tend to mean, not the most restrictive philosophical sense (#4 on SEP) which is pointlessly high complexity (edit: clarified that this is not the universal philosophical definition, but only one of several meanings, walked back a little on rhetoric)
[2]: possibly maybe even the entire class, though if true most qualia would be very very alien to us and not necessarily morally valuable
give up large chunks of the planet to an ASI to prevent that
I know this isn’t your main point, but... that isn’t a kind of trade that is plausible. Misaligned superintelligence disassembles the entire planet, sun, and everything it can reach. Biological life does not survive, outside of some weird edge cases like “samples to sell to alien superintelligences that like life”. Nothing in the galaxy is safe.
Re: Ayahuasca from the ACX survey having effects like:
“Obliterated my atheism, inverted my world view no longer believe matter is base substrate believe consciousness is, no longer fear death, non duality seems obvious to me now.”
[1] There’s a cluster of subcultures that consistently drift toward philosophical idealist metaphysics (consciousness, not matter or math, as fundamental to reality): McKenna-style psychonauts, Silicon Valley Buddhist circles, neo-occultist movements, certain transhumanist branches, quantum consciousness theorists, and various New Age spirituality scenes. While these communities seem superficially different, they share a striking tendency to reject materialism in favor of mind-first metaphysics.
The common factor connecting them? These are all communities where psychedelic use is notably prevalent. This isn’t coincidental.
There’s a plausible mechanistic explanation: psychedelics disrupt the Default Mode Network and adjust a bunch of other neural parameters. When these break down, the experience of physical reality (your predictive processing simulation) gets fuzzy and malleable while consciousness remains vivid and present. This creates a powerful intuition that consciousness must be more fundamental than matter. Conscious experience is more fundamental/stable than perception of the material world, which many people conflate with the material world itself.
The fun part? This very intuition—that consciousness is primary and matter secondary—is itself being produced by ingesting a chemical which alters physical brain mechanisms. We’re watching neural circuitry create metaphysical intuitions in real-time.
This suggests something profound about metaphysics itself: Our basic intuitions about what’s fundamental to reality (whether materialist OR idealist) might be more about human neural architecture than about ultimate reality. It’s like a TV malfunctioning in a way that produces the message “TV isn’t real, only signals are real!”
This doesn’t definitively prove idealism wrong, but it should make us deeply suspicious of metaphysical intuitions that feel like direct insight—they might just be showing us the structure of our own cognitive machinery.
[1]: Claude assisted writing, ideas from me and edited by me.
We do not take a position on the likelihood of loss of control.
This seems worth taking a position on; the relevant people need to hear an unfiltered stance from the experts: “this is a real and perhaps very likely risk”.
Agree that takeoff speeds are more important, and expect that FrontierMath has much less effect on takeoff speed. Still think timelines matter, but the amount of usefully informing people that you buy from this is likely not worth the cost, especially if the org avoids talking about risks in public and leadership isn’t focused on agentic takeover, so the timelines info isn’t packaged with the context needed for it to have the effects which would help.
Evaluating the final model tells you where you got to. Evaluating many small models and checkpoints helps you get further faster.
Even outside of its arguing against the Control paradigm, this post (esp. The Model & The Problem & The Median Doom-Path: Slop, not Scheming) covers some really important ideas, which I think people working on many empirical alignment agendas would benefit from being aware of.
One neat thing I’ve explored is learning about new therapeutic techniques by dropping a whole book into context and asking for guiding phrases. Most therapy books spend a lot of time covering general principles of minds and how to work with them, with the technique’s unique aspects buried in a way that’s not super efficient for someone who already has the universal ideas. Getting guiding phrases gives a good starting point for what the specific shape of a technique is, and means you can kinda use it pretty quickly. My project system prompt is:
Given the name of, and potentially documentation on, an introspective or therapeutic practice, generate a set of guiding phrases for facilitators. These phrases should help practitioners guide participants through deep exploration, self-reflection, and potential transformation. If you don’t know much about the technique or the documentation is insufficient, feel free to ask for more information. Please explain what you know about the technique, especially the core principles and things relevant to generating guiding phrases, first.
Consider the following:
Understand the practice’s core principles, goals, and methods.
Create open-ended prompts that invite reflection and avoid simple yes/no answers.
Incorporate awareness of physical sensations, emotions, and thought patterns.
Develop phrases to navigate unexpected discoveries or resistances.
Craft language that promotes non-judgmental observation of experiences.
Generate prompts that explore contradictions or conflicting beliefs.
Encourage looking beyond surface-level responses to deeper insights.
Help participants relate insights to their everyday lives and future actions.
Include questions that foster meta-reflection on the process itself.
Use metaphorical language when appropriate to conceptualize abstract experiences.
Ensure phrases align with the specific terminology and concepts of the practice.
Balance providing guidance with allowing space for unexpected insights.
Consider ethical implications and respect appropriate boundaries.

Aim for a diverse set of phrases that can be used flexibly throughout the process. The goal is to provide facilitators with versatile tools that enhance the participant’s journey of self-discovery and growth.
Example (adapt based on the specific practice):
“As you consider [topic], what do you notice in your body?”
“If that feeling had a voice, what might it say?”
“How does holding this belief serve you?”
“What’s alive for you in this moment?”
“How might this insight change your approach to [relevant aspect of life]?”

Remember, the essence is to create inviting, open-ended phrases that align with the practice’s core principles and facilitate deep, transformative exploration.
Please store your produced phrases in an artefact.
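For reference, a sketch of how one could wire this up through the API instead of a claude.ai Project (the `messages.create` call is the standard Anthropic Python SDK; the model name and book file are placeholders, and the artefact instruction above only applies in the claude.ai UI):

```python
# Sketch: run the system prompt above against a book via the Anthropic SDK.
import anthropic

SYSTEM_PROMPT = "..."  # paste the full prompt above here

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

with open("focusing_book.txt") as f:  # placeholder: any technique book as text
    book = f.read()

response = client.messages.create(
    model="claude-3-5-sonnet-latest",  # placeholder; use a current model
    max_tokens=2000,
    system=SYSTEM_PROMPT,
    messages=[{
        "role": "user",
        "content": f"Documentation on the practice:\n\n{book}\n\n"
                   "Please generate the guiding phrases.",
    }],
)
print(response.content[0].text)
```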
oh yup, sorry, I meant mid 2026, like ~6 months before the primary proper starts. But could be earlier.