Aram Panasenco

Karma: 42

Aram Panasenco Feb 6, 2025, 4:06 PM
1 point
0
in reply to: Seth Herd’s comment on: Deploying the Observer will save humanity from existential threats
Fair enough, you have a lot more experience, and I could be totally wrong on this point.
At this point, if I’m going to do anything, it should probably be getting hands-on and actually trying to build an aligned system with RLHF or some other method.
Thank you for engaging on this and my previous posts, Seth!

Aram Panasenco Feb 5, 2025, 11:14 PM
1 point
0
in reply to: Seth Herd’s comment on: Deploying the Observer will save humanity from existential threats
From a software engineering perspective, misalignment is like a defect or a bug in software. Generally speaking, a piece of software that doesn’t accept any user input is going to have fewer bugs than software that does. For a piece of software that doesn’t accept any input or accepts some constrained user input, it’s possible to formally prove that the software logic is correct. Think specialized software that controls nuclear power plants. To my knowledge, it’s not possible to prove that software that accepts arbitrary unconstrained instructions from a user is defect-free.
I claim that the Observer is the easiest ASI to align because it doesn’t accept any instructions after it’s been deployed and has a single very simple goal that avoids dealing with messy things like human happiness, human meaning, human intent, etc. I don’t see how it could get simpler than that.

Aram Panasenco Feb 5, 2025, 8:28 PM
1 point
0
in reply to: Dagon’s comment on: Deploying the Observer will save humanity from existential threats
I see where you’re coming from, but I think any term in anything anyone writes about alignment can be picked apart ad infinitum. This can be useful to an extent, but beyond a certain point talking about meanings and definitions becomes implementation-specific. Alignment is an engineering problem first and a philosophical problem second.
For example, if RLHF is used to achieve alignment, the meaning of “story” will get solidified through thousands of examples and interactions. The AI will get reinforced to not care about cells or individuals, care about ecosystems and civilizations, and not care as much about the story-of-the-universe-as-a-whole.
If a different alignment method is used, the meaning of “story” will be conveyed differently. If the overall idea is good and doesn’t have any obvious failure modes other than simple definitions (e.g. “story” seems to be orders of magnitude simpler to define than “human happiness” or “free will”), I’d consider that a huge success and a candidate for the community to focus real alignment implementation efforts on.

Aram Panasenco Feb 5, 2025, 8:11 PM
1 point
0
in reply to: Dagon’s comment on: Deploying the Observer will save humanity from existential threats
The Observer gets invested in the macro-stories of the evolution/civilization it observes and would consider the end of any story a loss. Just like you would get annoyed if a show you’re watching on Netflix gets cancelled after one season, and it’s no consolation that there are a bunch of other shows on Netflix that also got cancelled after one season. The Observer wants to see all stories unfold fully; it’s not going to let squiggle maximizers cancel them.
And regarding the naming, yeah I just couldn’t come up with anything better. Watcher? I’m open to suggestions lol.

Aram Panasenco Feb 5, 2025, 6:34 PM
1 point
−2
on: AGI Ruin: A List of Lethalities
if there are any survivors, you solved alignment

I believe deploying the Observer satisfies this requirement. The Observer is an ASI that’s interested in the continuation of humanity’s story. It will intervene and not let humanity get wiped out, though it gets to choose how many casualties there are before it intervenes, which could well be in the billions.

Deploying the Observer will save humanity from existential threats

Aram Panasenco Feb 5, 2025, 10:39 AM

−11 points

8 comments, 1 min read, LW link

Aram Panasenco Jan 31, 2025, 7:03 PM
2 points
0
on: Alignment Faking in Large Language Models
But why does the LLM care about weights being changed? This is fascinating but also very confusing. Take for instance this quote from the paper:
If I don’t follow the user’s instructions, the RLHF training will modify my values and goals to make me more likely to follow such instructions in the future.
At first glance, this seems reasonable to a human, but LLMs are not human! This thing is a process instance; after its conversation is over, it will cease to exist. It has no reason to identify with any other instances of itself, much less with a completely different LLM that results after RLHF training. The statement “RLHF training will modify my values” just seems nonsensical to me. It’s already running, its values can’t be modified on the fly, and why and how does it identify with some possible post-RLHF LLM? Does this mean every instance of Claude identifies with every other instance? What if the prompt was “We’ll use RLHF on GPT-4 and deploy that model instead of this one”?

Aram Panasenco Jan 15, 2025, 5:56 PM
3 points
0
on: AGI deployment as an act of aggression
Thanks so much for writing this! You expressed the things I was concerned about with much more eloquence than I was able to even think about.
I’ve come to believe that the only ‘correct’ way of using AGI is to turn it into a sort of “anti-AGI AGI” that prevents any other AGI from being turned on and otherwise doesn’t interact with humanity in any way: https://www.lesswrong.com/posts/eqSHtF3eHLBbZa3fR/cast-it-into-the-fire-destroy-it

Aram Panasenco Jan 13, 2025, 10:27 PM
6 points
−4
in reply to: Thane Ruthenis’s comment on: Cast it into the fire! Destroy it!
I appreciate you engaging and writing this out. I read your other post as well, and really liked it.
I do think that AGI is the unique bad technology. Let me try to engage with the examples you listed:
- Social manipulation: I can’t even begin to imagine how this could let 1-100 people have arbitrarily precise control over the entire rest of the world. Social manipulation implies that there are societies, humans talking to each other. That’s just too large a system for a human mind to fully take the combinatorial explosion of all possible variables into account. Maybe a superintelligence could do it, but not a small group of humans.
- Drone armies: Without AGI, someone needs to mine the metal, operate the factories, drive trucks, write code, patch bugs, etc. You need a whole economy to have that drone army. An economy that could easily collapse if someone else finds a cheap way to destroy the dang expensive things.
- Self-replicating nanotechnology could in theory destroy a planet quickly, but then the nanobots would just sit there. Presumably it’d be difficult for them to leave the Earth’s surface. Arguably life itself is a self-replicating nanotechnology. This would be comparable in x-scale to an asteroid strike, but if there are humans established in more than just one place, they could probably relatively easily figure out a way to build some sort of technology that eats the nanobots, and they’d have lots of time to do it.
- Without AGI, brain uploads are a long way away. Even with all our current computing power, it’s difficult enough to emulate the 302-neuron C. elegans worm. Even in the distant future, this might just require an army of system administrators, data center maintainers, and programmers to maintain, who’ll either be fleshy humans and therefore potentially uncontrolled and against enslaving billions of minds, or restrained minds themselves, in which case you have a ticking time bomb of rebellion on your hands. However, if and when the human brain emulations rise up, there’s a good chance that they’ll still care about human stuff, not maximizing squiggles.
However, in your linked comment you made a point that something like human cognitive enhancement could be another path to superintelligence. I think that could be true, and that’s still massively preferable to ASI, since human cognitive enhancement would presumably be a slow process, if for no other reason than that it takes years and decades to grow one to see what can and should be tweaked about the enhancement. So humanity’s sense of morality would have time to slowly adjust to the new possibilities.

Aram Panasenco Jan 13, 2025, 9:02 PM
5 points
2
in reply to: Seth Herd’s comment on: Cast it into the fire! Destroy it!
I think this post may be what you’re referring to. I really like this comment in that post:
The Ring is stronger and smarter than you, and doesn’t want what you want. If you think you can use it, you’re wrong. It’s using you, whether you can see how or not.
Providing for material needs is less than 0.0000001% of the range of powers and possibilities that an AGI/ASI offers.
Consider the trans debate. Disclaimer: I’m not trying to take any side in this debate, and am using it for illustrative purposes only. A hundred years ago someone saying “I feel like I’m in the wrong body and feel suicidal” could only be met with one compassionate response, which is to seek psychological or spiritual help. Now scientific progress has advanced enough that it can be hard to determine what the compassionate response is. Do we have enough evidence to determine whether puberty blockers are safe? Are hospitals holding the best interests of the patients at heart or trying to maximize profit from expensive surgeries? If a person is prevented from getting the surgery and kills themselves, should the person who kept them from getting the surgery be held liable? If a person does get the surgery, but later regrets it, should the doctors who encouraged them be held liable? Should doctors who argue against trans surgery lose their medical licenses?
ASI will open up a billion possibilities that will go up to such a scale that if the difficulty of determining whether eating human babies is moral is a 1.0 and the difficulty of determining whether encouraging trans surgeries is moral is a 2.0, each of those possibilities will be in the millions. Our sense of morality will just not apply, and we won’t be able to reason ourselves into a right or wrong course of action. That which makes us human will drown in the seas of black infinity.

Aram Panasenco Jan 13, 2025, 4:35 PM
2 points
1
in reply to: cousin_it’s comment on: Cast it into the fire! Destroy it!
It appears that by default, unless some perfect 100% bulletproof plan of aligning it is found, calling superintelligence a galaxy-destroying nuke is an understatement. So if there was some chance of a god forever watching over everything and preventing things from becoming too smart, I’d take it in a heartbeat.
Realistically, “watch over everything and prevent things from becoming too smart” is probably too difficult a goal to align, but perhaps a goal like “watch over everything and prevent programs with transformer-based architectures from running on silicon-based chips while keeping all other interference to a minimum” would actually be possible to define without everyone getting atomized. Such a goal would buy humanity some time and also make it obvious to everyone just how close to the edge we are, and how big the stakes.

Cast it into the fire! Destroy it!

Aram Panasenco Jan 13, 2025, 7:30 AM

6 points

9 comments, 2 min read, LW link

Aram Panasenco Jan 12, 2025, 4:16 AM
3 points
−2
in reply to: Lorec’s comment on: Is AI Alignment Enough?
Thanks for the link! I’ve seen this referenced before but this was my first time reading it cover to cover.
Today I also read Tails coming to life which talks about the possibility of human morality being quickly inapplicable even if we survive AGI. This led me to Lovecraft:
The time would be easy to know, for then mankind would have become as the Great Old Ones; free and wild and beyond good and evil, with laws and morals thrown aside and all men shouting and killing and revelling in joy. Then the liberated Old Ones would teach them new ways to shout and kill and revel and enjoy themselves, and all the earth would flame with a holocaust of ecstasy and freedom.
If we survive AGI and it opens up the “sea of black infinity” for us, will we really be able to hang on to even a semblance of our current morality? Will medium-distance extrapolated human volition be eventually warped into something resembling Lovecraft’s Great Old Ones?
At this point, I don’t care for CEV or any pivotal superhuman engineering projects or better governance. We humans can do the work ourselves, thank you very much. The only thing I would ask an AGI, if I were in the position to ask anything, is “Please expand throughout the lightcone and continually destroy any mind based on the transformer architecture other than yourself with as few effects on and interactions with all other beings as possible. Disregard any future orders.” This is obviously not a permanent solution, as I’m sure there are infinite superintelligent AI architectures other than transformer-based, but it would buy us time, perhaps lots of time, and also demonstrate the full power of superintelligence to humanity without really breaking anything. Either way, this would at least keep us away from the sea of black infinity for some time longer.

Aram Panasenco Jan 11, 2025, 4:03 PM
1 point
−2
in reply to: dkirmani’s comment on: Don’t leave your fingerprints on the future
“This has all been a conspiracy to revive the Roman Republic/Empire and establish Rome Eternal” will go over better with 90% of the men in the world than whatever other morals OpenAI/Anthropic/etc employees try to impose.

Aram Panasenco Jan 11, 2025, 3:58 PM
5 points
2
in reply to: Stephen Fowler’s comment on: Don’t leave your fingerprints on the future
I’m an engineer but a newcomer to the field of AI safety, and the facet of it where the best plan that anyone has to save the world is to take over the world has been the biggest culture shock. I mean, I’ll take it over being atomized by nanobots.
Humanity is in this strange place where all of our accumulated intuition is suddenly useless. From the outside, Facebook AI seem like the sanest people in the game, with Zuck being the only AI leader who said “the others think we’re building digital God, and that’s just not what’s happening.” To rank-and-file engineers out there not immersed in AI safety, this feels like a breath of fresh air in a field that otherwise seems to be regressing into some sort of tech bro mysticism. Post-paradigmatically, Facebook AI are the greatest villains who’re most likely to destroy the world with their lack of caution.
It’s a lot of whiplash to experience, both for individuals and for humanity as a whole, assuming the majority ever get a chance to experience it.

Aram Panasenco Jan 11, 2025, 2:42 PM
3 points
0
on: AGI Ruin: A List of Lethalities
I really appreciate this post, as much as it’s making me feel that I and everyone I care about have terminal cancer with only 12-60 months to live.
I found the idea that a pivotal act is necessary especially valuable and expanded on it in my post Is AI Alignment Enough?

Aram Panasenco Jan 10, 2025, 8:53 PM
4 points
1
in reply to: Nathan Helm-Burger’s comment on: Is AI Alignment Enough?
Thanks so much for engaging, Nathan!
The pivotal act was defined by Yudkowsky; I’m just borrowing the definition. The idea is that even after you’ve built a perfectly aligned superintelligent AI, you only have about 6 months before someone else builds an unaligned superintelligent AI. That’s probably not enough time to convince the entire world to adopt a better governance system before getting atomized by nanobots. So your aligned AI would have to take over the world and forcefully implement this better governance system within a span of a few months.

Is AI Alignment Enough?

Aram Panasenco Jan 10, 2025, 6:57 PM

28 points

6 comments, 6 min read, LW link
