Paul Colognese
Karma: 389
Explaining the AI Alignment Problem to Tibetan Buddhist Monks
Paul Colognese, 7 Mar 2024 9:00 UTC, 20 points, 3 comments, 6 min read, LW link
Anomalous Concept Detection for Detecting Hidden Cognition
Paul Colognese, 4 Mar 2024 16:52 UTC, 24 points, 3 comments, 10 min read, LW link
Hidden Cognition Detection Methods and Benchmarks
Paul Colognese, 26 Feb 2024 5:31 UTC, 22 points, 11 comments, 4 min read, LW link
Notes on Internal Objectives in Toy Models of Agents
Paul Colognese, 22 Feb 2024 8:02 UTC, 16 points, 0 comments, 8 min read, LW link
Internal Target Information for AI Oversight
Paul Colognese, 20 Oct 2023 14:53 UTC, 15 points, 0 comments, 5 min read, LW link
[Question] Potential alignment targets for a sovereign superintelligent AI
Paul Colognese, 3 Oct 2023 15:09 UTC, 29 points, 4 comments, 1 min read, LW link
High-level interpretability: detecting an AI’s objectives
Paul Colognese and Jozdien, 28 Sep 2023 19:30 UTC, 69 points, 4 comments, 21 min read, LW link
[Linkpost] Frontier AI Taskforce: first progress report
Paul Colognese, 7 Sep 2023 19:06 UTC, 21 points, 0 comments, 4 min read, LW link (www.gov.uk)
Aligned AI via monitoring objectives in AutoGPT-like systems
Paul Colognese, 24 May 2023 15:59 UTC, 27 points, 4 comments, 4 min read, LW link
Towards a solution to the alignment problem via objective detection and evaluation
Paul Colognese, 12 Apr 2023 15:39 UTC, 9 points, 7 comments, 12 min read, LW link
Decision Transformer Interpretability
Joseph Bloom and Paul Colognese, 6 Feb 2023 7:29 UTC, 84 points, 13 comments, 24 min read, LW link
Paul Colognese’s Shortform
Paul Colognese, 2 Feb 2023 19:15 UTC, 2 points, 1 comment, 1 min read, LW link
Auditing games for high-level interpretability
Paul Colognese, 1 Nov 2022 10:44 UTC, 33 points, 1 comment, 7 min read, LW link
Deception?! I ain’t got time for that!
Paul Colognese, 18 Jul 2022 0:06 UTC, 55 points, 5 comments, 13 min read, LW link