Running Lightcone Infrastructure, which runs LessWrong and Lighthaven.space. You can reach me at habryka@lesswrong.com.
(I have signed no contracts or agreements whose existence I cannot mention, which I am mentioning here as a canary)
Inviting someone to an event seems somewhat closer, though.
Yeah, in this case we are talking about “attending an event where someone you think is evil is invited to attend”, which is narrower, but also strikes me as an untenable position (e.g. in the lab case, this would prevent me from attending almost any conference I can think of wanting to attend in the Bay Area, almost all of which routinely invite frontier lab employees as speakers or featured guests).
To be clear, I think it’s reasonable to be frustrated with Lightcone if you think we legitimize people who you think will misuse that legitimacy, but refusing to attend any event where an organizer makes that kind of choice seems very intense to me (though of course, if someone was already considering an event to be of marginal value, such a thing could push them over the edge, though I think that would produce a different top-level comment).
I’m also not really sure what you’re hinting at with “I hope you also advocate for it when it’s harder to defend.” I assume something about what I think about working at AI labs? I feel like my position on that was fairly clear in my previous comment.
It’s mostly an expression of hope. For example, I hope it’s a genuine commitment that would result in you saying so even if you end up in the unfortunate position of updating negatively on Anthropic, or of being friends and allies with lots of people at other organizations you’ve updated negatively on.
As a reason for this being hope instead of confidence: I do not remember you (or almost anyone else in your position) calling for people to leave their positions at OpenAI when it became clearer that the organization was likely harming the world, though maybe I just missed it. I am not intending this as some confident “gotcha”, just hinting that people often like to do moral grandstanding in this domain without deep commitments actually backing it.
To be clear, this wasn’t an attempt to drag the whole topic into this conversation, but rather a low-key and indirect expression of my viewing some of the things you say here with some skepticism. I don’t super want to put you on the spot to justify your whole position here, but it also would have felt amiss not to give any hint of how I relate to them. So feel free to not respond, as I am sure we will find better contexts in which to discuss these things.
I think John’s comment, in the context of this thread, was describing a level of “working with” that was in the reference class of “attending an event with”, rather than “working for an organization” with the usual commitments and relationships that entails, so extending it to that case feels a bit like a non-sequitur. He explicitly mentioned attending an event as the example of the kind of “working with” he was talking about, so responding to only a non-central case of it feels weird.
It is also the case that, in our social circle, the position of “work for organizations that you think are very bad for the world in order to make it better” is a relatively common take (though on that one the two of us appear to be in rough agreement that it’s rarely worth it), and I hope you also advocate for it when it’s harder to defend.
Given common beliefs about AI companies in our extended social circle, I think it illustrates pretty nicely why an attitude of association-policing that extends all the way to “mutual event attendance” would void a huge number of potential trades, opportunities for compromise, and surface area for changing one’s mind, and is a bad idea.
Attending an event with someone else is not “collaborating with evil”!
I think people working at frontier AI companies are causing vastly more harm and are much stronger candidates for being moral monsters than Cremieux is (even given his recent, IMO quite dickish, behavior). I think it would be quite dumb of me to ban all frontier lab employees from Lightcone events, and my guess is you would agree with this even if you shared my beliefs about frontier AI labs.
Many events exist to negotiate and translate between different worldviews and perspectives, LessOnline more so than most. Yes, think about when you are supporting evil or giving it legitimacy, and yes, it’s messy, but especially given your position at a leading frontier lab, I don’t think you would consider tenable a blanket position of “don’t collaborate with evil” that extends as far as “attending an event with someone else”.
I was using double.finance and their backtesting tool.
Promoted to curated: Concrete examples are great. This post is a list of specific examples. Therefore this post is great.
Just kidding, but I do quite like this post. I feel like it does a good job introducing a specific useful concept handle, and explains it with a good mixture of specific examples and general definitions. It doesn’t end up too opinionated about how the concept should be used, or tied to some political agenda, and that makes it a good reference post that I expect to link to in a relatively wide range of scenarios.
Thank you!
I have reservations about ControlAI in particular, but also endorse this as a general policy. I think there are organizations that would be more robustly trustworthy and more fine to link to, though I think that’s actually very hard and rare, and I would still avoid it in general (the same way LW has a general policy of not frontpaging advertisements or job postings for specific organizations, independent of the organization)[1].
Also, I want to make sure I understand what you mean by “betraying people’s trust.” Is it something like, “If in the future ControlAI does something bad, then, from the POV of our viewers, that means that they can’t trust what they watch on the channel anymore?”
Yeah, something like that. I don’t think “does something bad” is really the category; it’s more something like “viewers will end up engaging with other media by ControlAI that riles them up about deepfakes in a bad-faith manner (i.e. not actually thinking deepfakes are worth banning, but thinking a deepfake ban would be helpful for slowing down AI progress, without being transparent about that), and then they will have been taken advantage of, and that will make a lot of coordination around AI x-risk stuff harder”.
We made an exception with our big fundraising post because Lightcone disappearing does seem of general interest to everyone on the site, but it made me sad and I wish we could have avoided it.
I think it’s a great video! I do also wish it had tied itself less to one specific organization; I feel like it would stand the test of time better (and be less likely to end up betraying people’s trust) if it had given a general overview of what we can do about AI risk, instead of ending with a call to action to support/join ControlAI in particular.
I would not characterize Dustin as straightforwardly “pushing back” in the relevant comment thread, more as “expressing frustration with specific misinterpretations but confirming the broad strokes”. I do think he would likely take offense at some of this framing, but a lot of it is really quite close to what Dustin said himself (and my model is more that Dustin is uncomfortable owning all the implications of the things he said, though this kind of thing is hard).
I don’t currently believe this, and don’t think I said so. I do think the GV constraints are big, but my overall assessment is that the net effect of Open Phil’s actions is bad even if you control for GV, though the calculus gets a lot messier and I am much less confident. Some of that is because of the evidential update from how they handled the GV situation, but also IMO Open Phil has made many other quite grievous mistakes.
My guess is that an Open Phil that had continued to be run by Holden would probably be good for the world. I have many disagreements with Holden, and it’s definitely still a high-variance situation, but I’ve historically been impressed with his judgement on many issues that I’ve seen OP mess up in recent years.
Yeah, we gotta fix something about handling the Substack formatted content. It really looks ugly sometimes, though I haven’t yet chased down when.
No, I ended up getting sick that week and other deadlines then pushed the work later. It will still happen, but maybe only closer to LessOnline (i.e. in about a month).
I agree that there is some ontological mismatch here, but I think your position is still in pretty clear conflict with what Neel said, which is what I was objecting to:
My understanding is that eg Jaime is sincerely motivated by reducing x risk (though not 100% motivated by it), just disagrees with me (and presumably you) about various empirical questions about how to go about it, what risks are most likely, what timelines are, etc.
“Not 100% motivated by it” sounds to me like it implies something like “being motivated by reducing x-risk makes up 30%-70% of the motivation”. I don’t think that’s true, and I think various things that Jaime has said make that relatively clear.
This is a great thread and I appreciate you both having it, and posting it here!
I am not saying Jaime could not in principle be motivated by existential risk from AI, but I do think the evidence strongly suggests to me that concerns about existential risk from AI are not among the primary motivations for his work on Epoch (which is what I understood Neel to be saying).
Maybe that is because he sees the risk as irreducible, maybe it is because the only ways of improving things would cause collateral damage to other things he cares about. I also think our dominant prior should be that someone is not motivated by reducing x-risk unless they directly claim to be.
(This aligns with what I intended. I feel like my comment is making a fine point, despite having missed the specific section.)
My understanding is that eg Jaime is sincerely motivated by reducing x risk (though not 100% motivated by it), just disagrees with me (and presumably you) about various empirical questions about how to go about it, what risks are most likely
I don’t think this is true. My sense is he views his current work as largely being good on non-x-risk grounds, and that even if it might slightly increase x-risk, he wouldn’t think it worth it for him to stop working on it, since he thinks it’s unfair to force the current generation to accept a slightly higher risk of not achieving longevity escape velocity and more material wealth in exchange for a small decrease in existential risk.
He says it so plainly that it’s about as straightforward a rejection of AI x-risk concerns as I’ve heard:
I selfishly care about me, my friends and family benefitting from AI. For some of my older relatives, it might make a big difference to their health and wellbeing whether AI-fueled explosive growth happens in 10 vs 20 years.
[...]
I wont endanger the life of my family, myself and the current generation for a small decrease of the chances of AI going extremely badly in the long term. And I don’t think it’s fair of anyone to ask me to do that. Not that it should be my place to unilaterally make such a decision anyway.
It seems very clear that Jaime thinks AI x-risk is unimportant relative to almost any other issue, given his lack of interest in trading off x-risk against those other issues.
It is true that Jaime might think AI x-risk could hypothetically be motivating to him, but my best interpretation of what is going on suggests that he de facto does not consider it an important input into his current strategic choices, or the choices of Epoch.
A lot of new-user submissions to LW these days are clearly from some poor person who was sycophantically encouraged by an AI, after it told them their ideas are great, to post their crazy theory of cognition or consciousness or recursion or social coordination on LessWrong. When we send them moderation messages we frequently get LLM-co-written responses, and sometimes they send us quotes from an AI that has evaluated their research as promising and high-quality, as proof that they are not a crackpot.
Ah, indeed! I think the “consistent” threw me off a bit there and so I misread it on first reading, but that’s good.
Sorry for missing it on first read. I do think that is approximately the kind of clause I was imagining (of course I would phrase things differently and would put an explicit emphasis on coordinating with other actors in ways beyond “articulation”, but your phrasing here is within the bounds where my objections feel more like nitpicking).
Each time we go through the core loop of catching a warning sign for misalignment, adjusting our training strategy to try to avoid it, and training again, we are applying a bit of selection pressure against our bumpers. If we go through many such loops and only then, finally, see a model that can make it through without hitting our bumpers, we should worry that it’s still dangerously misaligned and that we have inadvertently selected for a model that can evade the bumpers.
How severe of a problem this is depends on the quality and diversity of the bumpers. (It also depends, unfortunately, on your prior beliefs about how likely misalignment is, which renders quantitative estimates here pretty uncertain.) If you’ve built excellent implementations of all of the bumpers listed above, it’s plausible that you can run this loop thousands of times without meaningfully undermining their effectiveness.[8] If you’ve only implemented two or three, and you’re unlucky, even a handful of iterations could lead to failure.
This seems like the central problem of this whole approach, and indeed it seems very unlikely to me that we would end up with a system we feel comfortable scaling to superintelligence after only 2-3 iterations on our training protocols. This plan really desperately needs a step that is something like “if the problem appears persistent, or we are seeing signs that the AI systems are modeling our training process in a way that suggests that upon further scaling they would end up looking aligned independently of their underlying alignment, we halt and advocate for much larger shifts in our training process, which likely requires some kind of coordinated pause or stop with other actors”.
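As a toy sketch of the selection effect being described (my own illustration with made-up probabilities, not anything from the post): if each candidate model is either aligned or misaligned, and misaligned models vary in how well they evade any single bumper, then repeating the train/check/retrain loop until something passes amounts to conditioning on “passed every bumper”, which disproportionately selects for the most evasive misaligned models when the bumpers are few or weak.

```python
import random

# Toy Monte Carlo model (made-up numbers): candidate systems are aligned or
# misaligned, and misaligned ones have a random per-bumper "evasion skill".
# Repeating the train / check-bumpers / retrain loop until something passes is
# equivalent to conditioning on "passed every bumper".

def sample_model(p_misaligned=0.5):
    return {
        "misaligned": random.random() < p_misaligned,
        "evasion_skill": random.random(),  # chance of slipping past one bumper
    }

def passes_bumpers(model, n_bumpers):
    if not model["misaligned"]:
        return True  # aligned models trigger no warning signs in this toy setup
    return all(random.random() < model["evasion_skill"] for _ in range(n_bumpers))

def deployed_stats(n_bumpers, trials=50_000):
    """Among models that eventually pass all bumpers, how many are misaligned,
    and how evasive are the misaligned ones?"""
    bad, evasion_of_bad = 0, []
    for _ in range(trials):
        model = sample_model()
        while not passes_bumpers(model, n_bumpers):
            model = sample_model()  # "adjust the training strategy and try again"
        if model["misaligned"]:
            bad += 1
            evasion_of_bad.append(model["evasion_skill"])
    mean_evasion = sum(evasion_of_bad) / len(evasion_of_bad) if evasion_of_bad else 0.0
    return bad / trials, mean_evasion

for n in (1, 3, 10):
    rate, evasion = deployed_stats(n)
    print(f"{n:>2} independent bumpers: {rate:.3f} of deployed models misaligned, "
          f"mean evasion skill of those {evasion:.2f} (population mean 0.5)")
```

Under these made-up numbers, adding independent bumpers both lowers the fraction of misaligned models that make it through and raises the evasion skill of the ones that do, which is the worry in a nutshell; real bumpers are of course correlated and the priors unknown, so this is only directional.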
Nope, I primarily use o3 these days, and have made tweaks to my system prompt, but because it’s a thinking model my use of it is a lot less conversational and so the system prompt here isn’t that helpful, and I haven’t tried to make it work with o3.