Govind Pimpale

Karma: 302

Forecasting Frontier Language Model Agent Capabilities

Feb 24, 2025, 4:51 PM
35 points
0 comments
5 min read
(www.apolloresearch.ai)

Do models know when they are being evaluated?

Feb 17, 2025, 11:13 PM
54 points
3 comments
12 min read

Current safety training techniques do not fully transfer to the agent setting

Nov 3, 2024, 7:24 PM
158 points
9 comments
5 min read

~80 Interesting Questions about Foundation Model Agent Safety

Oct 28, 2024, 4:37 PM
46 points
4 comments
15 min read

Analyzing DeepMind’s Probabilistic Methods for Evaluating Agent Capabilities

Jul 22, 2024, 4:17 PM
69 points
0 comments
16 min read