Teun van der Weij

Karma: 245

How to mitigate sandbagging

Teun van der Weij · Mar 23, 2025, 5:19 PM
23 points
0 comments · 8 min read · LW link

Teun van der Weij’s Shortform

Teun van der Weij · Mar 14, 2025, 3:54 AM
3 points
1 comment · 1 min read · LW link

The Elicitation Game: Evaluating capability elicitation techniques

Feb 27, 2025, 8:33 PM
10 points
0 comments · 2 min read · LW link

[Paper] AI Sandbagging: Language Models can Strategically Underperform on Evaluations

Jun 13, 2024, 10:04 AM
84 points
10 comments · 2 min read · LW link
(arxiv.org)

An Introduction to AI Sandbagging

Apr 26, 2024, 1:40 PM
45 points
13 comments · 8 min read · LW link

Simple distribution approximation: When sampled 100 times, can language models yield 80% A and 20% B?

Jan 29, 2024, 12:24 AM
39 points
5 comments · 4 min read · LW link

List of projects that seem impactful for AI Governance

Jan 14, 2024, 4:53 PM
14 points
0 comments · 13 min read · LW link

Evaluating Language Model Behaviours for Shutdown Avoidance in Textual Scenarios

May 16, 2023, 10:53 AM
26 points
0 comments · 13 min read · LW link