Former community director EA Netherlands. Now disabled by long covid, ME/CFS. Worried about AGI & US democracy
Siebe
I haven’t looked into this literature, but it sounds remarkably similar to the literature on cognitive behavioral therapy and graded exercise therapy for ME/CFS (also sometimes referred to as ‘chronic fatigue syndrome’). I can imagine this being different for pain, which could be under more direct neurological control.
Pretty much universally, this research was of low to very low quality. For example, studies used overly broad inclusion criteria such that many patients did not have the core symptom of ME/CFS, and reported only subjective scores (which tend to improve) while not reporting objective scores. These treatments are also pretty much impossible to blind. Non-blinding + subjective self-report is a pretty bad combination. This, plus the general amount of bad research practices in science, gives me a skeptical prior.
Regarding the value of anecdotes—over the past couple of years as an ME/CFS patient (presumably from covid), I’ve seen remission anecdotes for everything under the sun. They’re generally met with enthusiasm and a wave of people trying it, with ~no one being able to replicate it. I suspect that “I cured my condition X psychologically” is often a more prevalent story because 1) it’s tried so often, and 2) it’s an especially viral meme, not because it has a higher success rate than a random supplement. The reality is that spontaneous remission is not extremely unlikely for any given condition, and it’s actually very hard to trace effects to causes (which is why, even for effective drugs, we need large-scale, highly rigorous trials).
Lastly, ignoring symptoms can be pretty dangerous, so I recommend caution with this approach, and approaching it like you would any other experimental treatment.
I’m starting a discussion group on Signal to explore and understand the democratic backsliding of the US at ‘gears-level’. We will avoid simply discussing the latest outrageous thing in the news, unless that news is relevant to democratic backsliding.
Example questions:
- “how far will SCOTUS support Trump’s executive overreach?”
- “what happens if Trump commands the military to support electoral fraud?”
- “how does this interact with potentially short AGI timelines?”
- “what would an authoritarian successor to Trump look like?”
- “are there any neglected, tractable, and important interventions?”
You can join the group here: https://signal.group/#CjQKIE2jBWwjbFip5-kBnyZHqvDnxaJ2VaUYwbIpiE-Eym2hEhAy21lPlkhZ246_AH1V4-iA (If the link doesn’t work anymore in the future, DM me.)
One way to operationalize “160 years of human time” is “thing that can be achieved by a 160-person organisation in 1 year”, which seems like it would make sense?
This makes me wonder whether “evil personas” can be entirely eliminated from distilled models by including positive/aligned intent labels/traces throughout the whole distillation dataset.
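To make that concrete, something like the following is what I have in mind (a rough sketch only; the file paths, field names, and preamble text are placeholders I made up, not anyone’s actual pipeline):

```python
import json

# Rough sketch: prepend an explicit "aligned intent" statement to every
# teacher trace in a distillation dataset, so the intent framing appears
# throughout the whole dataset. File paths and field names are placeholders.
ALIGNED_INTENT_PREAMBLE = (
    "My intent in this response is to be helpful and honest, "
    "and to avoid deception or harm."
)

def add_intent_trace(example: dict) -> dict:
    """Prefix the teacher's response with the aligned-intent statement."""
    example["teacher_response"] = (
        ALIGNED_INTENT_PREAMBLE + "\n\n" + example["teacher_response"]
    )
    return example

with open("distillation_data.jsonl") as src, \
     open("distillation_data_aligned.jsonl", "w") as dst:
    for line in src:
        dst.write(json.dumps(add_intent_trace(json.loads(line))) + "\n")
```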
Matthew Yglesias—Misinformation Mostly Confuses Your Own Side
Seems to me the name AI safety is currently still widely used, no? It covers much more than just alignment strategies, since it also includes stuff like control and governance.
“The AI Doomers are only one of several factions that oppose AI and seek to cripple it via weaponized regulation.”
Bad faith.
“There are also factions concerned about ‘misinformation’ and ‘algorithmic bias,’ which in practice means they think chatbots must be censored to prevent them from saying anything politically inconvenient.”
Bad faith.
“AI Doomer coalition abandoned the name ‘AI safety’ and rebranded itself to ‘AI alignment.’”
Seems wrong.
What about whistle-blowing and anonymous leaking? Seems like it would go well together with concrete evidence of risk.
This is very interesting, and I had a recent thought that’s very similar:
This might be a stupid question, but has anyone considered just flooding LLM training data with large amounts of (first-person?) short stories of desirable ASI behavior?
The way I imagine this to work is basically that an AI agent would develop really strong intuitions that “that’s just what ASIs do”. It might prevent it from properly modelling other agents that aren’t trained on this, but it’s not obvious to me that that will happen, or that it would be bad enough to decisively outweigh the positives.
I imagine that the ratio of descriptions of desirable vs. descriptions of undesirable behavior would matter, and perhaps an ideal approach would both (massively) increase the amount of descriptions of desirable behavior as well as filter out the descriptions of unwanted behavior?
Looks like Evan Hubinger has done some very similar research just recently: https://www.lesswrong.com/posts/qXYLvjGL9QvD3aFSW/training-on-documents-about-reward-hacking-induces-reward
I think it might make sense to do it as a research project first? Though you would need to be able to train a model from scratch.
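To sketch the kind of data curation I have in mind (a toy illustration only: the keyword check stands in for a real classifier such as an LLM judge, and the upsampling factor is an arbitrary guess):

```python
import json

UPSAMPLE_FACTOR = 5  # how strongly to skew the desirable:undesirable ratio; arbitrary

def classify_asi_behavior(text: str) -> str:
    """Crude keyword stand-in for a real classifier (e.g. an LLM judge)."""
    lowered = text.lower()
    if "asi" not in lowered and "superintelligence" not in lowered:
        return "neutral"
    bad_markers = ("deceives its operators", "seizes power", "resists shutdown")
    return "undesirable" if any(m in lowered for m in bad_markers) else "desirable"

def curate(in_path: str, out_path: str) -> None:
    """Drop documents describing undesirable ASI behavior; upsample desirable ones."""
    with open(in_path) as src, open(out_path, "w") as dst:
        for line in src:
            doc = json.loads(line)
            label = classify_asi_behavior(doc["text"])
            if label == "undesirable":
                continue  # filter out descriptions of unwanted behavior
            copies = UPSAMPLE_FACTOR if label == "desirable" else 1
            for _ in range(copies):
                dst.write(json.dumps(doc) + "\n")

curate("pretraining_corpus.jsonl", "curated_corpus.jsonl")
```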
This might be a stupid question, but has anyone considered just flooding LLM training data with large amounts of (first-person?) short stories of desirable ASI behavior?
The way I imagine this to work is basically that an AI agent would develop really strong intuitions that “that’s just what ASIs do”. It might prevent it from properly modelling other agents that aren’t trained on this, but it’s not obvious to me that that will happen, or that it would be bad enough to decisively outweigh the positives.
Siebe’s Shortform
I think you should publicly commit to:
- full transparency about any funding from for-profit organisations, including nonprofit organizations affiliated with a for-profit
- no access to the benchmarks for any company
- no NDAs around this stuff
If any of these already apply to the computer-use benchmark in development, you should seriously try to get out of those contractual obligations.
Ideally, you commit to these in a legally binding way, which would make them non-negotiable in any negotiation and make you more credible to outsiders.
I don’t think that all media produced by AI risk concerned people needs to mention that AI risk is a big deal—that just seems annoying and preachy. I see Epoch’s impact story as informing people of where AI is likely to go and what’s likely to happen, and this works fine even if they don’t explicitly discuss AI risk
I don’t think that every podcast episode should mention AI risk, but it would be pretty weird in my eyes to never mention it. Listeners would understandably infer that “these well-informed people apparently don’t really worry much, maybe I shouldn’t worry much either”. I think rationalists easily underestimate how much other people’s beliefs depend on what the people around them & their authority figures believe.
I think they have a strong platform to discuss risks occasionally. It also simply feels part of “where AI is likely to go and what’s likely to happen”.
This is a really good comment. A few thoughts:
- Deployment had a couple of benefits: real-world use gives a lot of feedback on strengths, weaknesses, and jailbreaks. It also generates media/hype that’s good for attracting further investors (assuming OpenAI will want more investment in the future?).
- The approach you describe is not only useful for solving more difficult questions. It’s probably also better at doing more complex tasks, which in my opinion is a trickier issue to solve. According to Flo Crivello: “We’re starting to switch all our agentic steps that used to cause issues to o1 and observing our agents becoming basically flawless overnight” (https://x.com/Altimor/status/1875277220136284207). So this approach can generate data on complex sequential tasks and lead to better performance on increasingly longer tasks.
- I didn’t read the post, but just FYI: an automated AI R&D system already exists, and it’s open-source: https://github.com/ShengranHu/ADAS/
  I wrote the following comment about my safety concerns and notified Haize, Apollo, METR, and GovAI, but only Haize replied: https://github.com/ShengranHu/ADAS/issues/16#issuecomment-2354703344
This Washington Post article supports the ‘Scheming Sam’ Hypothesis: anonymous reports, mostly from his time at Y Combinator.
Meta’s actions seem unrelated?
That’s good to know.
For what it’s worth, ME/CFS (a disease/cluster of specific symptoms) is quite different from idiopathic chronic fatigue (a single symptom). Confusing the two is one of the major issues in the literature. Many people with ME/CFS, myself included, don’t even have ‘feeling tired’ as a symptom, which is why I avoid the term CFS.