
Jozdien

Karma: 1,729

Lighthaven Sequences Reading Group #24 (Tuesday 03/04)

Mar 3, 2025, 7:13 PM
6 points
0 comments · 1 min read

Lighthaven Sequences Reading Group #23 (Tuesday 02/25)

Feb 23, 2025, 5:01 AM
8 points
0 comments · 1 min read

Lighthaven Sequences Reading Group #22 (Tuesday 02/18)

Feb 16, 2025, 3:51 AM
7 points
1 comment · 1 min read

Lighthaven Sequences Reading Group #21 (Tuesday 02/11)

Feb 6, 2025, 8:49 PM
8 points
0 comments · 1 min read

BIG-Bench Canary Contamination in GPT-4

Jozdien · Oct 22, 2024, 3:40 PM
123 points
14 comments · 4 min read

Gradient Descent on the Human Brain

Apr 1, 2024, 10:39 PM
58 points
5 comments · 2 min read

Difficulty classes for alignment properties

Jozdien · Feb 20, 2024, 9:08 AM
34 points
5 comments · 2 min read

The Pointer Resolution Problem

Jozdien · Feb 16, 2024, 9:25 PM
41 points
20 comments · 3 min read

Critiques of the AI control agenda

Jozdien · Feb 14, 2024, 7:25 PM
48 points
14 comments · 9 min read

The case for more ambitious language model evals

Jozdien · Jan 30, 2024, 12:01 AM
112 points
30 comments · 5 min read

Thoughts On (Solving) Deep Deception

Jozdien · Oct 21, 2023, 10:40 PM
71 points
6 comments · 6 min read

High-level interpretability: detecting an AI’s objectives

Sep 28, 2023, 7:30 PM
71 points
4 comments · 21 min read

The Compleat Cybornaut

May 19, 2023, 8:44 AM
65 points
2 comments · 16 min read

AI Safety via Luck

Jozdien · Apr 1, 2023, 8:13 PM
81 points
7 comments · 11 min read

Gradient Filtering

Jan 18, 2023, 8:09 PM
55 points
16 comments · 13 min read

[ASoT] Simulators show us behavioural properties by default

Jozdien · Jan 13, 2023, 6:42 PM
35 points
3 comments · 3 min read

Trying to isolate objectives: approaches toward high-level interpretability

Jozdien · Jan 9, 2023, 6:33 PM
48 points
14 comments · 8 min read

[ASoT] Finetuning, RL, and GPT’s world prior

Jozdien · Dec 2, 2022, 4:33 PM
44 points
8 comments · 5 min read

Conditioning Generative Models for Alignment

Jozdien · Jul 18, 2022, 7:11 AM
59 points
8 comments · 20 min read

Gaming Incentives

Jozdien · Jul 29, 2021, 1:51 PM
10 points
4 comments · 6 min read