I have signed no contracts or agreements whose existence I cannot mention.
Also, I suggest that, given the number of tags in each section, “load more” should be “load all”.
This is awesome! Three comments:
Please make an easy-to-find Recent Changes feed (maybe a thing on the home page which only appears if you’ve made wiki edits). If you want an editor community, that will be their home: the thing they keep up with, and how they’ll know to positively reinforce each other.
The concepts portal is now a slightly awkward mix of articles and tags, with potentially very high-use tags being quite buried because no one’s written a good article for them (e.g. Rationality Quotes has 136 pages tagged, but zero karma, so it takes many clicks to reach). I’m especially thinking about the use case of wanting to know what types of articles there are to browse around. I’m not sure exactly what to do about this... maybe have the sorting not be just about karma, but a mix of karma and number of tagged posts? Like (k+10)*(t+10) or something (rough sketch at the end of this comment)? The disadvantage is that this is opaque and moves even further away from alphabetical ordering.
A bunch of the uncategorized ones could be categorized, but I’m not seeing a way to do this with normal permissions.
Adjusting (2) would make it much cleaner to categorize the many uncategorized ones in (3) without clogging up the normal lists.
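For (2), a minimal sketch of the kind of hybrid sort I mean (field names and sample numbers are hypothetical, just to show the shape):

```python
# Hypothetical sketch: rank tags by a mix of karma and tagged-post count,
# so high-use but zero-karma tags (e.g. Rationality Quotes) still surface.
def sort_key(tag):
    # The +10 offsets stop a zero in either signal from zeroing out the score.
    return (tag["karma"] + 10) * (tag["post_count"] + 10)

tags = [
    {"name": "Rationality Quotes", "karma": 0, "post_count": 136},
    {"name": "Some Well-Written Concept", "karma": 40, "post_count": 12},
    {"name": "Obscure Tag", "karma": 2, "post_count": 3},
]

for tag in sorted(tags, key=sort_key, reverse=True):
    print(tag["name"], sort_key(tag))
```

The exact constants would want tuning; the point is just that neither signal alone can bury the other.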
Nice! I’ll watch through these then probably add a lot of them to the aisafety.video playlist.
I’ve heard from people I trust that:
They can be pretty great, if you know what you want and set the prompt up right
They won’t be as skilled as a human therapist, and might throw you in at the deep end or not be tracking things a human would
Using them can be very worth it as they’re always available and cheap, but they require a little intentionality. I suggest asking your human therapist for a few suggestions of kinds of work you might do with a peer or LLM assistant, and monitoring how it affects you while exploring, if you feel safe enough doing that. Maybe do it the day before a human session the first few times so you have a good safety net. Maybe ask some LWers what their system prompts are, or find some well-tested prompts elsewhere.
Looks like Tantrix:
oh yup, sorry, I meant mid 2026, like ~6 months before the primary proper starts. But could be earlier.
Yeah, this seems worth a shot. If we do this, we should do our own pre-primary in like mid 2027 to select who to run in each party, so that we don’t split the vote and also so that we select the best candidate.
Someone I know was involved in a DIY pre-primary in the UK which unseated an extremely safe politician, and we’d get a bunch of extra press while doing this.
Humans without scaffolding can do a very finite number of sequential reasoning steps without mistakes. That’s why thinking aids like paper, whiteboards, and other people to bounce ideas off and keep the cache fresh are so useful.
With a large enough decisive strategic advantage, a system can afford to run safety checks on any future versions of itself and anything else it’s interacting with sufficient to stabilize values for extremely long periods of time.
Multipolar worlds though? Yeah, they’re going to get eaten by evolution/moloch/power seeking/pythia.
More cynical take based on the Musk/Altman emails: Altman was expecting Musk to be CEO. He set up a governance structure which would effectively be able to dethrone Musk, with him as the obvious successor, and was happy to staff the board with ideological people who might well take issue with something Musk did down the line to give him a shot at the throne.
Musk walked away, and it would’ve been too weird to change his mind on the governance structure. Altman never thought the trap was likely enough to fire to be worth disarming, at any point before it actually did.
I don’t know whether the dates line up to disconfirm this, but I could see this kind of 5D chess move happening. Though maybe ordinary power-and-incentive psychology is sufficient.
Looks fun!
I could also remove Oil Seeker’s protection from Pollution; they don’t need it for making Black Chips to be worthwhile, but removing it would make that less of an amazing deal than it currently is.
Maybe have the pollution cost halved for Black, if removing it turns out to be too weak?
Seems accurate, though I think Thinking This Through A Bit involved the part of backchaining where you look at approximately where on the map the destination is, and that’s what some pro-backchain people are trying to point at. In the non-metaphor, the destination is not well specified by people in most categories, and might be like 50 ft in the air so you need a way to go up or something.
And maybe if you are assisting someone else who has well-grounded models, you might be able to solve subproblems within their plan and do good, but you’re betting your impact on their direction. Much better to have your own compass, or at least a gears-level model of theirs, so you can check and orient reliably.
PS: I brought snacks!
Give me a dojo.lesswrong.com, where the people into mental self-improvement can hang out and swap techniques, maybe a meetup.lesswrong.com where I can run better meetups and find out about the best rationalist get-togethers. Let there be an ai.lesswrong.com for the people writing about artificial intelligence.
Yes! Ish! I’d be keen to have something like this for the upcoming aisafety.com/stay-informed page, where it currently looks like we’ll resort to linking to https://www.lesswrong.com/tag/ai?sortedBy=magic#:~:text=Posts%20tagged%20AI as there’s no simpler way to get people specifically to the AI section of the site.
I’d weakly lean towards not using a subdomain but a linkable filter instead, but yeah, seems good.
I’d also think that making it really easy and fluid to cross-post (including selectively, maybe the posts pop up in your drafts and you just have to click post if you don’t want everything cross-posted) would be a pretty big boon for LW.
I’m glad you’re trying to figure out a solution. I am however going to shoot this one down a bunch.
If these assumptions were true, this would be nice. Unfortunately, I think all three are false.
LLMs will never be superintelligent when predicting a single token.
In a technical sense, definitively false. Redwood compared human and AI performance at next-token prediction, and even early models were far superhuman. Also, in a more important sense, you can apply a huge amount of optimization to selecting a token. This video gives a decent intuition, though in a slightly different setting.
LLMs will have no state.
False in three different ways. Firstly, people are totally building in explicit state in lots of ways (test time training, context retrieval, reasoning models, etc). Secondly, there’s a feedback cycle of AI influences training data of next AI, which will become a tighter and tighter loop. Thirdly, the AI can use the environment as state in ways which would be nearly impossible to fully trace or mitigate.
not in a way that any emergent behaviour of the system as a whole isn’t reflected in the outputs of any of the constituent LLMs
Alas, many individually well-understood systems regularly do have, and should be expected to have, poorly understood behaviour when taken together.
a simpler LLM to detect output that looks like it’s leading towards unaligned behaviour.
Robustly detecting “unaligned behaviour” is an unsolved problem, if by aligned you mean “makes the long term future good” rather than “doesn’t embarrass the corporation”. Solving this would be massive progress, and throwing a LLM at it naively has many pitfalls.
Stepping back, I’d encourage you to drop by AI plans, skill up at detecting failure modes, and get good at both generating and red-teaming your own ideas (the Agent Foundations course and Arbital are good places to start). Build a long list of ideas you’ve shown how to break, and help both break and extract the insights from others’ ideas.
To the extent human civilization is human-aligned, most of the reason for the alignment is that humans are extremely useful to various social systems like the economy, and states, or as substrate of cultural evolution. When human cognition ceases to be useful, we should expect these systems to become less aligned, leading to human disempowerment.
oh good, I’ve been thinking this basically word for word for a while and had it in my backlog. Glad this is written up nicely, far better than I would likely have done :)
The one thing I’m not a big fan of: I’d bet “Gradual Disempowerment” sounds like a “this might take many decades or longer” thing to most readers, whereas given capabilities curves this could be a few-months-to-single-digit-years thing.
I think I have a draft somewhere, but never finished it. tl;dr: Quantum computers let you steal private keys from public keys, which are exposed for every wallet that has made a send transaction. Upgrading can protect wallets where people move their coins, but it’s going to be messy, slow, and won’t work for lost-key wallets, which are a pretty huge fraction of the total BTC reserve. Once we get quantum computers, BTC at least is going to have a very bad time; others will have a moderately bad time depending on how early they upgrade.
Nice! I haven’t read a ton of Buddhism, cool that this fits into a known framework.
I’m uncertain of how you use the word consciousness here; do you mean our blob of sensory experience, or something else?
Yeah, ~subjective experience.
Let’s do most of this via the much higher bandwidth medium of voice, but quickly:
Yes, qualia[1] is real, and is a class of mathematical structure.[2]
(placeholder for not a question item)
Matter is a class of math which is ~kinda like our physics.
Our part of the multiverse probably doesn’t have special “exists” tags, probably everything is real (though to get remotely sane answers you need a decreasing reality fluid/caring fluid allocation).
Math, in the sense I’m trying to point to it, is ‘Structure’. By which I mean: Well defined seeds/axioms/starting points and precisely specified rules/laws/inference steps for extending those seeds. The quickest way I’ve seen to get the intuition for what I’m trying to point at with ‘structure’ is to watch these videos in succession (but it doesn’t work for everyone):
[1] experience/the thing LWers tend to mean, not the most restrictive philosophical sense (#4 on SEP), which is pointlessly high complexity (edit: clarified that this is not the universal philosophical definition, but only one of several meanings, and walked back a little on rhetoric)
[2] possibly maybe even the entire class, though if true most qualia would be very, very alien to us and not necessarily morally valuable
give up large chunks of the planet to an ASI to prevent that
I know this isn’t your main point, but: that isn’t a kind of trade that is plausible. A misaligned superintelligence disassembles the entire planet, sun, and everything it can reach. Biological life does not survive, outside of some weird edge cases like “samples to sell to alien superintelligences that like life”. Nothing in the galaxy is safe.
I had a similar experience a couple years back when running bio anchors with numbers which seemed more reasonable/less consistently slanted towards longer timelines to me, getting:
before taking into account AI accelerating AI development, which I expected to bring it a few years earlier.