Yeah it seems possible that some AGI systems would be willing to risk value drift, or just not care that much. In theory you could have an agent that didn’t care if its goals changed, right? Shoshannah pointed out to me recently that humans have a lot of variance in how much they care if their goals are changed. Some people are super opposed to wireheading, some think it would be great. So it’s not obvious to me how much ML-based AGI systems of around human-level intelligence would care about this. Like maybe this kind of system converges pretty quickly to coherent goals, or maybe it’s the kind of system that can get quite a bit more powerful than humans before converging; I don’t know how to guess at that.
Jeffrey Ladish
AGI systems & humans will both need to solve the alignment problem
I think that would be a really good thing to have! I don’t know if anything like that exists, but I would love to see one
I think the AI situation is pretty dire right now. And at the same time, I feel pretty motivated to pull together and go out there and fight for a good world / galaxy / universe.
Nate Soares has a great post called “detach the grim-o-meter”, where he recommends not feeling obligated to feel more grim when you realize the world is in deep trouble.
It turns out feeling grim isn’t a very useful response, because your grim-o-meter is a tool that evolved for responding to things being harder *in your local environment*, not to the global state of things.
So what do you do when you find yourself learning the world is in a dire state? I find that a thing that helps me is finding stories that match the mood of what I’m trying to do, like Andy Weir’s The Martian.
You’re trapped in a dire situation and you’re probably going to die, but perhaps if you think carefully about your situation and apply your best reasoning and engineering skills, you might grow some potatoes, duct-tape a few things together, and use your limited tools to escape an extremely tricky situation.
In real life the lone astronaut trapped on Mars doesn’t usually make it. I’m not saying to make up fanciful stories that aren’t justified by the evidence. I’m saying, be that stubborn bastard that *refuses to die* until you’ve tried every last line of effort.
I see this as one of the great virtues of humanity. We have a fighting spirit. We are capable of charging a line of enemy swords and spears, running through machine gun fire and artillery even though it terrifies us.
No one gets to tell you how to feel about this situation. You can feel however you want. I’m telling you how I want to feel about this situation, and inviting you to join me if you like.
Because I’m not going to give up. Neither am I going to rush to foolhardy action that will make things worse. I’m going to try to carefully figure this out, like I was trapped on Mars with a very slim chance of survival and escape.
Perhaps you, like me, are relatively young and energetic. You haven’t burnt out, and you’re interested in figuring out creative solutions to the most difficult problems of our time. Well I say hell yes, let’s do this thing. Let’s actually try to figure it out. 🔥
Maybe there is a way to grow potatoes using our own shit. Maybe someone on Earth will send a rescue mission our way. Lashing out in panic won’t improve our chances, and giving up won’t help us survive. The best shot we have is careful thinking, pressing forward via the best paths we can find, stubbornly carrying on in the face of everything.
And unlike Mark Watney, we’re not alone. When I find my grim-o-meter slipping back to tracking the dire situation, I look around me and see a bunch of brilliant people working to find solutions the best they can.
So welcome to the hackathon for the future of the lightcone, grab some snacks and get thinking. When you zoom in, you might find the problems are actually pretty cool.
Deep learning actually works, and it’s insane. But how does it work? What the hell is going on in those transformers, and how does something as smart as ChatGPT emerge from that?? Do LLMs have inner optimizers? How do we find out?
And on that note, I’ve got some blog posts to write, so I’m going to get back to it. You’re all invited to this future-lightcone-hackathon, can’t wait to see what you come up with! 💡
I sort of agree with this in the abstract and disagree in practice. I think we’re just very limited in what kinds of circumstances we can reasonably estimate / guess at. Even the above claim, “a big proportion of worlds where we survived, AGI probably gets delayed”, is hard to reason about.
But I do kind of need to know the timescale I’m operating in when thinking about health and money and skill investments, etc., so I think you need to reason about it somehow.
Why did you do that?
When you plan according to your AI timelines, should you put more weight on the median future, or the median future | eventual AI alignment success? ⚖️
The Reisner et al. paper (and the back and forth between Robock’s group and Reisner’s group) casts doubt on this:
And then on top of that there are significant other risks from the transition to AI. Maybe a total of more like 40% total existential risk from AI this century? With extinction risk more like half of that, and more uncertain since I’ve thought less about it.
40% total existential risk, and extinction risk half of that? Does that mean the other half is some kind of existential catastrophe / bad values lock-in but where humans do survive?
This is a temporary short form, so I can link people to Scott Alexander’s book review post. I’m putting it here because Substack is down, and I’ll take it down / replace it with a Substack link once it’s back up. (Also, it hasn’t been archived by the Wayback Machine yet; I checked.)
The spice must flow.
Edit: It’s back up, link: https://astralcodexten.substack.com/p/book-review-what-we-owe-the-future
Thanks for the reply!
I hope to write a longer response later, but wanted to address what might be my main criticism, the lack of clarity about how big of a deal it is to break your pledge, or how “ironclad” the pledge is intended to be.
I think the biggest easy improvement would be amending the FAQ (or preferably something called “pledge details” or similar) to present the default norms for pledge withdrawal. People could still opt for different norms if they preferred, but it would make it clearer what people were agreeing to, and how strong the commitment was intended to be, without adding more text to the main pledge.
I’m a little surprised that I don’t see more discussion of ways that higher-bandwidth brain-computer interfaces might help, e.g. Neuralink or an equivalent. It sounds difficult, but do people feel really confident it won’t work? Seems like if it could work, it might be achievable on much faster timescales than superbabies.
Oh cool. I was thinking about writing some things about private non-ironclad commitments but this covers most of what I wanted to write. :)
Admittedly the title is not super clear
I cannot recommend this approach on the grounds of either integrity or safety 😅
Marriage, the Giving What We Can Pledge, and the damage caused by vague public commitments
Yeah, I think it’s somewhat boring without more. Solving the current problems seems very desirable to me, very good, and also really not complete / compelling / interesting. That’s what I’m intending to try to get at in part II. I think it’s the harder part.
Rot13: Ab vg’f Jbegu gur Pnaqyr
Oh, it occurs to me that some of the original thought train that led me here may have come from @Ezra Newman
https://twitter.com/EzraJNewman/status/1628848563211112448