I have signed no contracts or agreements whose existence I cannot mention.
plex
Responded I think resolving this with: Managed vs Unmanaged Agency
(DMed my top recommendation, someone who used to have pretty bad OCD, helped resolve someone else’s and mostly their own, and is full time doing x-risk reduction work)
I think @Eli Tyre kinda got it from one example in 2023. (second comment)
It’s Pythia, the patters that are predicted to be more effective power-seekers being selected for in the present, causing the entire system to side towards productivity maximizing at the expense of all other values.
At some point AI text gets good enough at persuasion to be actively harmful to read, just as reading text by highly manipulative people can be harmful to read. Sometime before that it pings your memetic immune system a bunch and most people read this as vaguely wanting to avoid taking in LLM text. I think we might be entering the latter phase. Not confident, but we’ll get there at some point and they do seem at about the relevant capability level in some other domains.
Yeah, this one does feel more memetically powerful in some ways, but something like less collaborative. Agree we’d probably want the pair.
I agree that’s a better ontology, this was the post I could write fast as a patch, looking forward to yours!
I might read your half baked ones and be up for writing it coauthoring the real thing is you want.
Edit: Added a note to the main post pointing at this.
It may seem unreasonable within the current paradigm, but I think it’s necessary to reach if we get strong superintelligence. You need to have a system that you can’t make destroy the entire system if you want the whole system to remain undestroyed indefinitely.
You’re that I didn’t explain why each framework fails to plausibly scale to very strong models, maybe that’s also worth it’s own post, because there are a lot and each have limits that you need to go a bit into the weeds to see.
I responded with a post: Product Alignment is not Superintelligence Alignment (and we need the latter to survive)
Simply put, I think alignment as used here and many other places is a conflation of two quite separate tech trees.
Product Alignment is not Superintelligence Alignment (and we need the latter to survive)
Under the Managed vs Unmanaged Agency frame (which I think replaces instrumental vs terminal with a conceptual split that fits reality better), I agree.
Related(ish): Should You Reverse Any Advice You Hear
Startup Lessons for AI Safety
At each timestep there is some % chance of value decay. Either that % chance falls rapidly, and the value is stable, or or does not fall rapidly, and after some timesteps you should expect values to be decayed.
why would that lead to extinction by default and not the kind of collapse when data centers stop operating before hunter gatherers?
Humanoid robots are pretty near-term: https://fxtwitter.com/KyberLabsRobots/status/2036127368088080867
There’s a few arguments here
why cannot an immune system work on that level?
It can, but needs either is robust and stable, or falls probalistically after some time.
a cancer can kill a human, but not the humanity. but by default any internal subsystem of most imagined societies that will exist is going to kill all of societies globally, not just their local hosts?
Humans have boundaries between them that cancer can’t generally cross (and when it does the species has a very bad time).
why no warning signs—do you see humanity on a trajectory of not using still-stupid LLM based systems in autonomous weapons, never deploying any system that’s too weak to kill everyone while powerful to cause major harm while also brewing up a system that’s powerful enough to escape and self-preserve and gray-goo itself on its first try?
I somewhat update on no warning signs with Moltbook, but I mostly expect that humanity won’t react anything like strongly enough to them.
Most people in this sphere have some form of impostor syndrome. I mostly don’t these days, because I’ve seen enough of the world to know that the person most people feel like they need to impersonate doesn’t exist, it’s all a bunch of messy humans with skills and weaknesses, and no one really has this situation under control.
Let’s do the best we can as flawed humans.
It starts at sudden death of yourself and everyone else, the destruction of earth and extinction of all biological life, and a sphere of darkness eating nearby stars, and gets worse from there.