ThomasCederborg

Karma: 165

My research focus is Alignment Target Analysis (ATA). I noticed that the most recently published version of CEV (Parliamentarian CEV, or PCEV) gives a large amount of extra influence to people who intrinsically value hurting other individuals. For Yudkowsky’s description of the issue, search the CEV Arbital page for “ADDED 2023”.

The fact that no one noticed this issue for over a decade shows that ATA is difficult. If PCEV had been successfully implemented, the outcome would have been massively worse than extinction. I think this illustrates that scenarios where someone successfully hits a bad alignment target pose a serious risk. I also think it illustrates that ATA can reduce these risks: noticing the issue reduced the probability of PCEV being successfully implemented. More ATA is needed because PCEV is not the only bad alignment target that might end up getting implemented. ATA is, however, very neglected: there does not exist a single research project dedicated to it. In other words, I am doing ATA because it is a tractable and neglected way of reducing risks.

I am currently looking for collaborators. I am also looking for a grant or a position that would allow me to focus entirely on ATA for an extended period of time. Please don’t hesitate to get in touch if you are curious and would like to have a chat, or if you have any feedback, comments, or questions. You can, for example, PM me here, PM me on the EA Forum, or email me at thomascederborgsemail@gmail.com (that really is my email address; it’s a Gavagai / Word and Object joke from my grad student days).

My background is an undergraduate degree in physics, followed by AI research. Links to some papers: P1 P2 P3 P4 P5 P6 P7 P8 (no connection to any form of deep learning).

Shutting down all competing AI projects might not buy a lot of time due to Internal Time Pressure

ThomasCederborg · 3 Oct 2024 0:01 UTC
12 points
7 comments · 12 min read · LW link

The case for more Alignment Target Analysis (ATA)

20 Sep 2024 1:14 UTC
25 points
13 comments · 17 min read · LW link

A necessary Membrane formalism feature

ThomasCederborg · 10 Sep 2024 21:33 UTC
20 points
6 comments · 11 min read · LW link

Corrigibility could make things worse

ThomasCederborg · 11 Jun 2024 0:55 UTC
9 points
6 comments · 6 min read · LW link

The proposal to add a “Last Judge” to an AI does not remove the urgency of making progress on the “what alignment target should be aimed at?” question

ThomasCederborg · 22 Nov 2023 18:59 UTC
1 point
0 comments · 18 min read · LW link

Making progress on the “what alignment target should be aimed at?” question is urgent

ThomasCederborg · 5 Oct 2023 12:55 UTC
2 points
0 comments · 18 min read · LW link

A problem with the most recently published version of CEV

ThomasCederborg · 23 Aug 2023 18:05 UTC
10 points
7 comments · 8 min read · LW link