It’s not just non-hand-codable; it is unteachable on the first try, because the thing you are trying to teach is too weird and complicated.
I have a terrifying hack, called the Universal Alignment Test, which seems like it might make it possible to extract an AI that would act in a CEV-like way, using only True Names which might plausibly be within human reach. I’m currently working on it with a small team of independent alignment researchers; feel free to book a call with me if you’d like to have your questions answered in real time. I have had “this seems interesting”-style reviews from the highest-level people I’ve spoken to about it.
I failed to raise the idea with EY at a conference in 2015, because I was afraid of being judged a crackpot, or of seeming to make an inappropriate status claim by trying to solve too large a part of the problem. In retrospect, this is a deeply ironic mistake to have made.