Paul Colognese
Karma: 391
Explaining the AI Alignment Problem to Tibetan Buddhist Monks
Paul Colognese | Mar 7, 2024, 9:00 AM | 20 points | 3 comments | 6 min read
Anomalous Concept Detection for Detecting Hidden Cognition
Paul Colognese | Mar 4, 2024, 4:52 PM | 24 points | 3 comments | 10 min read
Hidden Cognition Detection Methods and Benchmarks
Paul Colognese | Feb 26, 2024, 5:31 AM | 22 points | 11 comments | 4 min read
Notes on Internal Objectives in Toy Models of Agents
Paul Colognese | Feb 22, 2024, 8:02 AM | 16 points | 0 comments | 8 min read
Internal Target Information for AI Oversight
Paul Colognese | Oct 20, 2023, 2:53 PM | 15 points | 0 comments | 5 min read
[Question] Potential alignment targets for a sovereign superintelligent AI
Paul Colognese | Oct 3, 2023, 3:09 PM | 29 points | 4 comments | 1 min read
High-level interpretability: detecting an AI’s objectives
Paul Colognese and Jozdien | Sep 28, 2023, 7:30 PM | 72 points | 4 comments | 21 min read
[Linkpost] Frontier AI Taskforce: first progress report
Paul Colognese | Sep 7, 2023, 7:06 PM | 21 points | 0 comments | 4 min read | (www.gov.uk)
Aligned AI via monitoring objectives in AutoGPT-like systems
Paul Colognese | May 24, 2023, 3:59 PM | 27 points | 4 comments | 4 min read
Towards a solution to the alignment problem via objective detection and evaluation
Paul Colognese | Apr 12, 2023, 3:39 PM | 9 points | 7 comments | 12 min read
Decision Transformer Interpretability
Joseph Bloom and Paul Colognese | Feb 6, 2023, 7:29 AM | 84 points | 13 comments | 24 min read
Paul Colognese’s Shortform
Paul Colognese | Feb 2, 2023, 7:15 PM | 2 points | 1 comment
Auditing games for high-level interpretability
Paul Colognese | Nov 1, 2022, 10:44 AM | 33 points | 1 comment | 7 min read
Deception?! I ain’t got time for that!
Paul Colognese | Jul 18, 2022, 12:06 AM | 55 points | 5 comments | 13 min read