Running Lightcone Infrastructure, which runs LessWrong and Lighthaven.space. You can reach me at habryka@lesswrong.com.
(I have signed no contracts or agreements whose existence I cannot mention, which I am mentioning here as a canary)
My impression is that few (one or two?) of the safety people who have quit a leading lab did so to protest poor safety policies, and of those few none saw staying as a viable option.
While this isn’t amazing evidence, my sense is there have been around 6 people who quit and, in parallel with announcing their departure, called out OpenAI’s reckless attitude towards risk (at various levels of explicitness, but quite strongly in all cases by standard professional norms).
It’s hard to say that people quit “to protest safety policies”, but they definitely used their departure to protest safety policies. My sense is almost everyone who left in the last year (Daniel, William, Richard, Steven Adler, Miles) did so with a pretty big public message.
Indeed, from an instrumental perspective, the AIs that arrive at the conclusion that being maximally helpful is the best way to get themselves empowered (on all tasks besides supervising copies of themselves or other AIs they are cooperating with) will be much more useful than AIs that care about some random thing and haven’t made the update that the best way to get that thing is to be helpful and therefore empowered. “Motivation” seems like it’s generally a big problem with getting value out of AI systems, and so you should expect the deceptively aligned ones to be much more useful (until, of course, it’s too late, or they are otherwise threatened and the convergence disappears).
I mean, how do we enforce the rate limits when every request comes from a different IP?
Edit: Ah, you mean maybe a whole joint queue. Yeah, that’s not crazy; we were thinking of doing something that prioritizes requests from users who are not first-time users, for that reason. I am a bit scared of it because it possibly just pushes the problem under the rug in a way that has large costs but removes any feedback we get about it (anyone who would tell us the site is behaving badly for them is now in the prioritized category, while we miss out on growth because new users often have a bad experience).
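For illustration, here is a minimal sketch (not the actual LessWrong implementation, just an assumed Express setup) of what such a prioritized queue could look like: under load, requests from returning users are served before those from first-time or anonymous visitors. The `isReturningUser` check, the cookie name, and the concurrency limit are all hypothetical.

```typescript
// Hypothetical sketch of load-shedding that prioritizes returning users.
// Not the actual LessWrong implementation; the cookie name, concurrency
// limit, and Express setup are assumptions for illustration.
import express, { Request, Response, NextFunction } from "express";

const MAX_CONCURRENT = 50; // assumed capacity threshold
let inFlight = 0;

type Job = () => void;
const priorityQueue: Job[] = []; // returning users
const defaultQueue: Job[] = [];  // first-time / anonymous visitors

function isReturningUser(req: Request): boolean {
  // Assumption: a login (or any prior-session) cookie marks a returning user.
  return Boolean(req.headers.cookie?.includes("loginToken"));
}

function drainQueues(): void {
  // Serve the priority queue first, then the default queue, up to capacity.
  while (inFlight < MAX_CONCURRENT) {
    const job = priorityQueue.shift() ?? defaultQueue.shift();
    if (!job) return;
    job();
  }
}

function loadShedMiddleware(req: Request, res: Response, next: NextFunction): void {
  const run: Job = () => {
    inFlight++;
    res.on("finish", () => {
      inFlight--;
      drainQueues();
    });
    next();
  };

  if (inFlight < MAX_CONCURRENT) {
    run();
  } else {
    // Under load, first-time visitors wait behind returning users.
    (isReturningUser(req) ? priorityQueue : defaultQueue).push(run);
  }
}

const app = express();
app.use(loadShedMiddleware);
app.get("/", (_req, res) => res.send("ok"));
app.listen(3000);
```

The trade-off described above is visible here: first-time visitors are exactly the ones pushed into the lower-priority queue, so the people most likely to bounce are also the ones who see the worst latency, and the least likely to report it.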
He posted it in a bunch of private Slacks a few weeks ago. His LinkedIn is now also updated: https://www.linkedin.com/in/holden-karnofsky-75970b7
The URLs they crawl are already blocked by our robots.txt, and they are actively sending requests from thousands of different IPs with realistic, randomly sampled user-agents to prevent any algorithmic blocking.
I don’t currently think it’s plausible, FWIW! Agree that there are probably substantially easier and less well-specified paths.
My sense is that a small subset of bio experts (e.g. 50) aimed at causing maximum damage would in principle be capable of building mirror bacteria (if not directly stopped[1]) and this would most likely cause the deaths of the majority of humans and would have a substantial chance of killing >95% (given an intentional effort to make the deployment as lethal as possible).
I currently think this is false (unless you have a bunch of people making repeated attempts after seeing it only kill a small-ish number of people). I expect the mirror bacteria thing to be too hard, and to ultimately be possible to counteract.
I think many people have given you feedback. It is definitely not because of “strategic messaging”. It’s because you keep making incomprehensible arguments that don’t make any sense and then get triggered when anyone tries to explain why they don’t make sense, while making statements that are wrong with great confidence.
As is, this is dissatisfying. On this forum, I’d hope[1] there is a willingness to discuss differences in views first, before moving to broadcasting subjective judgements[2] about someone.
People have already spent many hours giving you object-level feedback on your views. If this still doesn’t meet the relevant threshold for moving on to discussing judgements, then basically no one can ever be judged (and as such our community would succumb to Eternal September and die).
Sorry for the downtime. Another approximate DDoS / extremely aggressive crawler. We are getting better at handling these, but this one was another 10x bigger than previous ones, and so kicked over a different part of our infrastructure.
Do you think people are avoiding AISC because of Remmelt? I’d be surprised if that was a significant effect.
Absolutely. I have heard at least 3-4 conversations in which people were considering AISC, or talking about other people considering AISC, but had substantial hesitations related to Remmelt. I certainly would recommend someone not participate because of Remmelt, and my sense is this isn’t a particularly rare opinion.
I currently would be surprised if I could find someone informed who I have an existing relationship with for whom it wouldn’t be in their top 3 considerations on whether to attend or participate.
I have heard from many people near AI Safety Camp that they have also judged it to have gotten worse as a result of this. I think there was just a distribution shift, and now it makes sense to judge the new distribution.
Separately, it matters who is in a position to shape the culture and trajectory of a community.
I think there is a track record for the last few safety camps since Remmelt went off the deep end, and it is negative (not purely so, and not with great confidence; I am just trying to explain why I don’t think there is a historical track record that screens off the personalities of the people involved).
I think “stop AI” is pretty reasonable and good, but I agree that Remmelt seems kind of like he has gone off the deep end and that is really the primary reason why I am not supporting AI Safety camp. I would consider filling the funding gap myself if I hadn’t seen this happen.
My best guess is AISC dying is marginally good, and someone else will hopefully pick up a similar mantle.
(I am not a huge fan of this post, but I think it’s reasonable for people to care about how society orients towards x-risk and AI concerns, and as such to actively want not to screen off evidence, and to take responsibility for what people affiliated with them say on the internet. So I don’t think this is great advice.
I am actively subscribed to lots of people who I expect to say wrong and dumb things, because it’s important to me that I correct people and avoid misunderstandings, especially when someone might mistake my opinion for the opinion of the people saying dumb stuff)
My sense is political staffers and politicians aren’t that great at predicting their future epistemic states this way, and so you won’t get great answers for this question. I do think it’s a really important one to model!
I think the core argument is “if you want to slow down, or somehow impose restrictions on AI research and deployment, you need some way of defining thresholds. Also, most policymakers’ cruxes appear to be that AI will not be a big deal, but if they thought it was going to be a big deal they would totally want to regulate it much more. Therefore, having policy proposals that can use future eval results as a triggering mechanism is politically more feasible, and also epistemically helpful, since it allows people who do think it will be a big deal to establish a track record”.
I find these arguments reasonably compelling, FWIW.
Thank you!
For Lightcone’s contact details I asked on LW Intercom. Feels rude to put someone’s phone number here, so if you’re doing the same as me, I’m not gonna save you that step.
Reasonable prior, but my phone number is already publicly visible at the top of this post, so feel free to share further.
making sure there are really high standards for safety and that there isn’t going to be danger what these AIs are doing
Ah yes, a great description of Anthropic’s safety actions. I don’t think anyone serious at Anthropic believes that they “made sure there isn’t going to be danger from what these AIs are doing”. Indeed, many (most?) of their safety people assign double-digit probabilities to catastrophic outcomes from advanced AI systems.
I do think this was a predictable and quite bad consequence of Dario’s essay (as well as his other essays, which heavily downplay or completely omit any discussion of risks). My guess is it will majorly contribute to reckless racing while giving people a false impression of how well we are doing on actually making things safe.
Yep, when the fundraising post went live, i.e. November 29th.
Those were mostly already in-flight, so not counterfactual (and also the fundraising post still has the donation link at the top), but I do expect at least some effect!
Hmm, it seems bad to define “veganism” as something that has nothing to do with dietary choice. I.e. this would make someone who donates to effective animal welfare charities more “vegan”, since many animal welfare charities are orders of magnitude more effective with a few thousand dollars than what could be achieved by any personal dietary change.