Dan H Feb 13, 2025, 12:18 AM
16 points
in reply to: Zach Stein-Perlman’s comment on: Zach Stein-Perlman’s Shortform

capability thresholds be vague or extremely high

xAI’s thresholds are entirely concrete and not extremely high.

evaluation be unspecified or low-quality

They are specified and as high-quality as you can get. (If there are better datasets, let me know.)

I’m not saying it’s perfect, but I wouldn’t put them all in the same bucket. Meta’s is very different from DeepMind’s or xAI’s.

Dan H Feb 10, 2025, 9:28 PM
14 points
in reply to: Drake Thomas’s comment on: Drake Thomas’s Shortform

though I don’t think xAI took an official position one way or the other

I assumed most everybody took it that xAI supported it, since Elon did. I didn’t bother pushing for an additional xAI endorsement given that Elon had endorsed it.

AISN #47: Reasoning Models

Corin Katzke and Dan H

Feb 6, 2025, 6:52 PM

3 points

0 comments, 4 min read

(newsletter.safe.ai)

AISN #46: The Transition

Corin Katzke and Dan H

Jan 23, 2025, 6:09 PM

8 points

0 comments, 5 min read

(newsletter.safe.ai)

Dan H Jan 19, 2025, 1:37 AM
32 points
in reply to: meemi’s comment on: meemi’s Shortform
It’s probably worth their mentioning, for completeness, that Nat Friedman funded an earlier version of the dataset too. (I was advising at that time and provided the main recommendation that it needed to be research-level, because they were focusing on Olympiad-level problems.)

Also, I can confirm they aren’t giving AI companies other than OpenAI, such as xAI, access to the mathematicians’ questions.

AISN #45: Center for AI Safety 2024 Year in Review

Corin Katzke and Dan H

Dec 19, 2024, 6:15 PM

13 points

0 comments, 4 min read

(newsletter.safe.ai)

Dan H Dec 3, 2024, 4:45 AM
10 points
on: (The) Lightcone is nothing without its people: LW + Lighthaven’s big fundraiser
and have clearly been read a non-trivial amount by Elon Musk
Nit: AFAICT, he heard this idea in conversation with an employee.

AISN #44: The Trump Circle on AI Safety Plus, Chinese researchers used Llama to create a military tool for the PLA, a Google AI system discovered a zero-day cybersecurity vulnerability, and Complex Systems

Corin Katzke, Julius, andrewz and Dan H

Nov 19, 2024, 4:36 PM

9 points

0 comments, 5 min read

(newsletter.safe.ai)

AI Safety Newsletter #43: White House Issues First National Security Memo on AI Plus, AI and Job Displacement, and AI Takes Over the Nobels

Corin Katzke, Alexa Pan and Dan H

Oct 28, 2024, 4:03 PM

6 points

0 comments, 6 min read

(newsletter.safe.ai)

AI Safety Newsletter #42: Newsom Vetoes SB 1047 Plus, OpenAI’s o1, and AI Governance Summary

Corin Katzke, Julius, Alexa Pan, andrewz and Dan H

Oct 1, 2024, 8:35 PM

8 points

0 comments, 6 min read

(newsletter.safe.ai)

AI Safety Newsletter #41: The Next Generation of Compute Scale Plus, Ranking Models by Susceptibility to Jailbreaking, and Machine Ethics

Corin Katzke, Julius, andrewz and Dan H

Sep 11, 2024, 7:14 PM

5 points

1 comment, 5 min read

(newsletter.safe.ai)

AI forecasting bots incoming

Dan H and Mantas Mazeika

Sep 9, 2024, 7:14 PM

29 points

44 comments, 4 min read

(www.safe.ai)

Dan H Aug 26, 2024, 1:03 PM
4 points
on: Darwinian Traps and Existential Risks
Relevant: Natural Selection Favors AIs over Humans

universal optimization algorithm

Evolution is not an optimization algorithm (this is a common misconception, discussed in Okasha’s Agents and Goals in Evolution).

AI Safety Newsletter #40: California AI Legislation Plus, NVIDIA Delays Chip Production, and Do AI Safety Benchmarks Actually Measure Safety?

Corin Katzke, Julius, Alexa Pan and Dan H

Aug 21, 2024, 6:09 PM

11 points

0 comments, 6 min read

(newsletter.safe.ai)

The Bitter Lesson for AI Safety Research

adamk, Richard Ren, Dan H and Gabe M

Aug 2, 2024, 6:39 PM

57 points

5 comments, 3 min read

Dan H Aug 2, 2024, 3:27 PM
3 points
on: Unlearning via RMU is mostly shallow
We have been working for months on this issue and have made substantial progress on it: Tamper-Resistant Safeguards for Open-Weight LLMs

General article about it: https://www.wired.com/story/center-for-ai-safety-open-source-llm-safeguards/