
porby

Karma: 1,870

Soft Prompts for Evaluation: Measuring Conditional Distance of Capabilities

porby · Feb 2, 2024, 5:49 AM
47 points
1 comment · 4 min read · LW link
(1drv.ms)

FAQ: What the heck is goal agnosticism?

porby · Oct 8, 2023, 7:11 PM
66 points
36 comments · 28 min read · LW link

A plea for more funding shortfall transparency

porby · Aug 7, 2023, 9:33 PM
73 points
4 comments · 2 min read · LW link

Using predictors in corrigible systems

porby · Jul 19, 2023, 10:29 PM
19 points
6 comments · 27 min read · LW link

One path to coherence: conditionalization

porby · Jun 29, 2023, 1:08 AM
28 points
4 comments · 4 min read · LW link

One implementation of regulatory GPU restrictions

porby · Jun 4, 2023, 8:34 PM
42 points
6 comments · 5 min read · LW link

porby’s Shortform

porby · May 24, 2023, 9:34 PM
6 points
20 comments · 1 min read · LW link

Implied “utilities” of simulators are broad, dense, and shallow

porby · Mar 1, 2023, 3:23 AM
45 points
7 comments · 3 min read · LW link

Instrumentality makes agents agenty

porby · Feb 21, 2023, 4:28 AM
20 points
7 comments · 6 min read · LW link

[Question] How would you use video gamey tech to help with AI safety?

porby · Feb 9, 2023, 12:20 AM
9 points
5 comments · 1 min read · LW link

Against Boltzmann mesaoptimizers

porby · Jan 30, 2023, 2:55 AM
76 points
6 comments · 4 min read · LW link

FFMI Gains: A List of Vitalities

porby · Jan 12, 2023, 4:48 AM
26 points
3 comments · 7 min read · LW link

Simulators, constraints, and goal agnosticism: porbynotes vol. 1

porby · Nov 23, 2022, 4:22 AM
37 points
2 comments · 35 min read · LW link

Am I secretly excited for AI getting weird?

porby · Oct 29, 2022, 10:16 PM
116 points
4 comments · 4 min read · LW link

Why I think strong general AI is coming soon

porby · Sep 28, 2022, 5:40 AM
336 points
141 comments · 34 min read · LW link · 1 review

Private alignment research sharing and coordination

porby · Sep 4, 2022, 12:01 AM
62 points
13 comments · 5 min read · LW link