Good point that some race dynamics will make the outcome in practice worse than my ideal.
I agree that the problem is that race dynamics will cause some labs to skip various precautions. But even in a world where we have zero dignity, I think we have a non-trivial (though unacceptably low) chance of succeeding at alignment, more like 20-50%. So I agree that less racing would be good: a 20-50% chance of success is still very dangerous. Where I differ is that I believe we need zero new insights into alignment. There is a reasonably tractable way to get an AI to share our values, and while it requires a lot of engineering to automate the data pipeline safely, it does not require new insights.
This isn’t a post on “how we could be safe even under arbitrarily high pressure to race”; it’s a post on how early LessWrong and MIRI got a lot of things wrong, such that we should assign much higher tractability to alignment, and thus to good outcomes.
You are correct that labs need more of a safety culture; my point is just that we could have AI progress at some nonzero rate without getting ourselves into a catastrophe.
So I agree with you for the most part that we need to slow down the race; I just don’t think we need to go further and impose outright stoppages on the grounds that we don’t know, in technical terms, how to align AIs.
Though see this post on the case for negative alignment taxes, which would really help our situation if we were in a race:
https://www.lesswrong.com/posts/xhLopzaJHtdkz9siQ/the-case-for-a-negative-alignment-tax
(We might disagree on how much time and money needs to be invested, but that’s a secondary crux.)