ThomasCederborg

Karma: 165

My research focus is Alignment Target Analysis (ATA). I noticed that the most recently published version of CEV (Parliamentarian CEV, or PCEV) gives a large amount of extra influence to people who intrinsically value hurting other individuals. For Yudkowsky’s description of the issue, search the CEV Arbital page for “ADDED 2023”.

The fact that no one noticed this issue for over a decade shows that ATA is difficult. If PCEV had been successfully implemented, the outcome would have been massively worse than extinction. I think this illustrates that scenarios where someone successfully hits a bad alignment target pose a serious risk. I also think it illustrates that ATA can reduce these risks: noticing the issue reduced the probability of PCEV being successfully implemented. More ATA is needed because PCEV is not the only bad alignment target that might end up getting implemented. ATA is, however, very neglected: there does not exist a single research project dedicated to it. In other words, I am doing ATA because it is a tractable and neglected way of reducing risks.

I am currently looking for collaborators. I am also looking for a grant or a position that would allow me to focus entirely on ATA for an extended period of time. Please don’t hesitate to get in touch if you are curious and would like to have a chat, or if you have any feedback, comments, or questions. You can, for example, PM me here, PM me on the EA Forum, or email me at thomascederborgsemail@gmail.com (that really is my email address; it’s a Gavagai / Word and Object joke from my grad student days).

My background is an undergraduate degree in physics, followed by AI research. Links to some papers: P1 P2 P3 P4 P5 P6 P7 P8 (no connection to any form of deep learning).

Shutting down all competing AI projects might not buy a lot of time due to Internal Time Pressure

ThomasCederborg · 3 Oct 2024 0:01 UTC
12 points
7 comments · 12 min read · LW link

The case for more Alignment Target Analysis (ATA)

20 Sep 2024 1:14 UTC
25 points
13 comments · 17 min read · LW link

A necessary Membrane formalism feature

ThomasCederborg · 10 Sep 2024 21:33 UTC
20 points
6 comments · 11 min read · LW link

Corrigibility could make things worse

ThomasCederborg · 11 Jun 2024 0:55 UTC
9 points
6 comments · 6 min read · LW link

The proposal to add a “Last Judge” to an AI does not remove the urgency of making progress on the “what alignment target should be aimed at?” question

ThomasCederborg · 22 Nov 2023 18:59 UTC
1 point
0 comments · 18 min read · LW link

Making progress on the “what alignment target should be aimed at?” question is urgent

ThomasCederborg · 5 Oct 2023 12:55 UTC
2 points
0 comments · 18 min read · LW link

A problem with the most recently published version of CEV

ThomasCederborg · 23 Aug 2023 18:05 UTC
10 points
7 comments · 8 min read · LW link