Paul Colognese

Karma: 391

Explaining the AI Alignment Problem to Tibetan Buddhist Monks

Paul Colognese · Mar 7, 2024, 9:00 AM
20 points
3 comments · 6 min read · LW link

Anomalous Concept Detection for Detecting Hidden Cognition

Paul Colognese · Mar 4, 2024, 4:52 PM
24 points
3 comments · 10 min read · LW link

Hidden Cognition Detection Methods and Benchmarks

Paul Colognese · Feb 26, 2024, 5:31 AM
22 points
11 comments · 4 min read · LW link

Notes on Internal Objectives in Toy Models of Agents

Paul Colognese · Feb 22, 2024, 8:02 AM
16 points
0 comments · 8 min read · LW link

Internal Target Information for AI Oversight

Paul Colognese · Oct 20, 2023, 2:53 PM
15 points
0 comments · 5 min read · LW link

[Question] Potential alignment targets for a sovereign superintelligent AI

Paul Colognese · Oct 3, 2023, 3:09 PM
29 points
4 comments · 1 min read · LW link

High-level interpretability: detecting an AI’s objectives

Sep 28, 2023, 7:30 PM
71 points
4 comments · 21 min read · LW link

[Linkpost] Frontier AI Taskforce: first progress report

Paul Colognese · Sep 7, 2023, 7:06 PM
21 points
0 comments · 4 min read · LW link
(www.gov.uk)

Aligned AI via monitoring objectives in AutoGPT-like systems

Paul Colognese · May 24, 2023, 3:59 PM
27 points
4 comments · 4 min read · LW link

Towards a solution to the alignment problem via objective detection and evaluation

Paul Colognese · Apr 12, 2023, 3:39 PM
9 points
7 comments · 12 min read · LW link