Hello. I am Ella Markianos. I am an undergraduate student trying to understand alignment. If you like or dislike any of my posts please email me at apollonianblues@gmail.com; I’ve always wanted a pen pal.
apollonianblues
Karma: 33
I have, LOL, thanks tho
My assumption is that it would do this to prevent other people from building unaligned superintelligences. At least Eliezer thinks you need to do this (see bullet point 6 in this post), and it generally comes up in conversations about pivotal acts. Some people argue that if you come up with an alignment solution that's good and easy to implement, everyone building AGI will use it, so you won't need to prevent others from building unaligned AGI; but that seems unrealistic and risky to me.
TBH, my naive thought is that if John's project succeeds, it'll solve most of what I think of as the hard part of alignment, so it seems like one of the more promising approaches to me. That said, on my model of the world it seems quite unlikely that natural abstractions exist in the way John seems to think they do.