How do Anthropic’s and xAI’s compute compare over this period?
Could you say more about how you think S-risks could arise from the first attractor state?
An LLM trained with a sufficient amount of RL could perhaps learn to compress its thoughts into more efficient representations than English text, which seems consistent with the statement. I’m not sure whether this happens in practice; I’ve asked here if anyone knows of public examples.
Makes sense. Perhaps we’ll know more when o3 is released. If the model doesn’t offer a summary of its CoT, that makes neuralese more likely.
I’ve often heard it said that doing RL on chain of thought will lead to ‘neuralese’ (e.g. most recently in Ryan Greenblatt’s excellent post on scheming). This seems important for alignment. Does anyone know of public examples of models developing, or being trained to use, neuralese?
> (Based on public knowledge, it seems plausible (perhaps 25% likely) that o3 uses neuralese which could put it in this category.)
What public knowledge has led you to this estimate?
I was able to replicate this result. Given o1’s other impressive results, I wonder if the model is intentionally sandbagging. If it’s trained to maximize human feedback, this might be an optimal strategy when playing zero-sum games.
> I was grateful for the experiences and the details of how he prepares for conversations and framing AI that he imparted on me.
I’m curious, what was his strategy for preparing for these discussions? What did he discuss?
> This updated how I perceive the “show down” focused crowd
possible typo?
> Also, I think under-elicitation is a current problem causing erroneously low results (false negatives) on dangerous capabilities evals. Seeing more robust elicitation (including fine-tuning!!) would make me more confident about the results of evals.
I’m confused about how to think about this. Are there any evals where fine-tuning on a sufficient amount of data wouldn’t saturate the eval? E.g. if there’s an eval measuring knowledge of virology, then I would predict that fine-tuning on 1B tokens of the relevant virology papers would lead to a large increase in performance. This might be true even if the 1B tokens were already in the pretraining dataset, because in some sense it’s the most recent data that the model has seen.
[Question] 2025 Alignment Predictions
I am also increasingly wondering if talking too much to LLMs is an infohazard, akin to taking up psychedelics, TikTok, or meditation as a habit.
Why is meditation an infohazard?
> For a human mind, most data it learns is probably to a large extent self-generated, synthetic, so only having access to much less external data is not a big issue.
Could you say more about this? What do you think is the ratio of external to internal data?
That’s a good point; it could be consensus.
Thoughts on o3 and search:
[Epistemic status: not new, but I thought I should share]
An intuition I’ve had for some time is that search is what enables an agent to control the future. I’m a chess player rated around 2000. The difference between me and Magnus Carlsen is that in complex positions he can search much further for a win, such that I would have virtually no chance against him; the difference between me and an amateur chess player is similarly vast. It’s not just about winning either: in Shogi, the top professionals, once they know they have won, keep searching over future variations to find the most aesthetically appealing mate.
This is one of the reasons I’m concerned about AI. It’s not bound by the same constraints of time, energy, and memory as humans, and as such it can search through possible futures very deeply to find the narrow path in which it achieves its goal. o3 looks to be on this path. It has both very long chains of thought (depth of search) and the ability to parallelize across multiple instances (best-of-n sampling, which is what solved ARC-AGI). To be clear, I don’t think this search is very efficient, and there are many obvious ways it could be improved, e.g. recurrent architectures that don’t waste as much compute computing logprobs over several tokens only to sample one, or multi-token prediction objectives for the base model as in DeepSeek-V3. But the basis for search is there. Until now, it seemed like AI was improving its intuition; now it can finally begin to think.
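To make the depth/breadth framing concrete, here’s a toy sketch of best-of-n sampling as search. Everything below is illustrative: the generator and scorer are placeholder functions, and I’m not claiming this is how o3 actually works.

```python
import random

# Toy best-of-n search: sample n candidate chains of thought, score each with
# a verifier, and keep the best one. `sample_chain` and `score` are stand-ins
# for a real model and a real verifier/reward model.

def sample_chain(problem, rng, length=10):
    # Placeholder for sampling one long chain of thought from a model.
    return [rng.random() for _ in range(length)]

def score(problem, chain):
    # Placeholder for a learned or programmatic scorer of the final answer.
    return sum(chain)

def best_of_n(problem, n, seed=0):
    rng = random.Random(seed)
    candidates = [sample_chain(problem, rng) for _ in range(n)]
    return max(candidates, key=lambda c: score(problem, c))

# Depth of search ~ chain length; breadth of search ~ n.
best = best_of_n(problem="toy problem", n=1024)
```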
Concretely, I expect that by 2030, AI systems will use as much compute at inference time on hard problems as is currently used to pretrain the largest models. Possibly more, if humanity is not around by then.
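As a rough sanity check on what that prediction implies (the model size and FLOPs-per-token figures below are my own placeholder assumptions, not numbers from anywhere authoritative):

```python
# Back-of-the-envelope: how many generated tokens would it take for inference
# on one problem to match a pretraining-scale compute budget?
# All numbers are assumptions for illustration.

pretrain_flops = 4e26          # rough scale of a frontier 2024-era pretraining run
params = 1e12                  # assumed 1T-parameter model
flops_per_token = 2 * params   # ~2N FLOPs per generated token (forward pass only)

tokens_needed = pretrain_flops / flops_per_token
print(f"{tokens_needed:.1e} tokens")  # ~2e14 tokens, e.g. ~2e8 rollouts of 1e6 tokens each
```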
anaguma’s Shortform
This makes sense; I think you could be right. Llama 4 should give us more evidence on numerical precision and the scaling of experts.
DeepSeek-V3 is one example, and SemiAnalysis has claimed that most labs use FP8.
> FP8 Training is important as it speeds up training compared to BF16 & most frontier labs use FP8 Training.
> In 2024, there were multiple sightings of training systems at the scale of 100K H100. Microsoft’s 3 buildings in Goodyear, Arizona, xAI’s Memphis cluster, Meta’s training system for Llama 4. Such systems cost $5bn, need 150 MW, and can pretrain a 4e26 FLOPs model in 4 months.
> Then there are Google’s 100K TPUv6e clusters and Amazon’s 400K Trn2 cluster. Performance of a TPUv6e in dense BF16 is close to that of an H100, while 400K Trn2 produce about as much compute as 250K H100.
> Anthropic might need more time than the other players to get its new hardware running, but there is also an advantage to Trn2 and TPUv6e over H100, larger scale-up domains that enable more tensor parallelism and smaller minibatch sizes. This might be an issue when training on H100 at this scale[1] and explain some scaling difficulties for labs that are not Google, or Anthropic later in 2025 once the Trn2 cluster becomes useful.
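(A quick back-of-the-envelope check of the 4e26 figure in the quote above; the per-chip throughput and utilization numbers are my own assumptions, not from the quoted comment.)

```python
# Quick check of the "100K H100 -> 4e26 FLOPs in 4 months" claim.

chips = 100_000
peak_bf16_flops = 989e12       # assumed H100 dense BF16 peak, FLOP/s
mfu = 0.4                      # assumed model FLOPs utilization
seconds = 4 * 30 * 24 * 3600   # ~4 months

total_flops = chips * peak_bf16_flops * mfu * seconds
print(f"{total_flops:.1e} FLOPs")  # ~4e26, consistent with the quoted estimate
```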
Do we know much about TPU and Trn2 performance at lower precision? I expect most training runs are using 4-8 bit precision by this point.
> Note that “The AI Safety Community” is not part of this list. I think external people without much capital just won’t have that much leverage over what happens.
What would you advise for external people with some amount of capital, say $5M? How would this change for each of the years 2025-2027?
This is interesting. Can you say more about these experiments?