Edit: see https://www.lesswrong.com/posts/q8uNoJBgcpAe3bSBp/my-ai-model-delta-compared-to-yudkowsky?commentId=CixonSXNfLgAPh48Z and ignore the below.
This is not a doom story I expect Yudkowsky would tell or agree with.
Re: 1, I mostly expect Yudkowsky to think humans don’t have any bargaining power anyway, because humans can’t achieve logical mutual cooperation this way / can’t make their decisions logically depend on a future AI’s decisions, and so the AI won’t keep its bargains no matter how important human cooperation was.
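(For readers without the decision-theory background: the kind of “logical” cooperation at issue is the program-equilibrium setup, where agents condition on each other’s source code rather than on each other’s behavior. A minimal illustrative sketch, entirely my own and not from the comment above, with a hypothetical `clique_bot`:)

```python
# A minimal sketch of "logical" mutual cooperation in a one-shot
# Prisoner's Dilemma, in the style of program equilibrium: each
# player is a program that reads the other's source code and
# cooperates iff the opponent's source is identical to its own.
# This is an assumption-laden illustration, not anyone's proposal.

import inspect

def clique_bot(opponent_source: str) -> str:
    """Cooperate ("C") iff the opponent runs exactly this program."""
    my_source = inspect.getsource(clique_bot)
    return "C" if opponent_source == my_source else "D"

if __name__ == "__main__":
    src = inspect.getsource(clique_bot)
    # Two copies of clique_bot, each shown the other's source,
    # reach (C, C); against any other program each defects.
    print(clique_bot(src), clique_bot(src))  # -> C C
```

(Humans can’t publish or verify their “source code” this way, which is one reading of why, on this view, an AI’s bargains with humans wouldn’t bind.)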
Re: 2, I don’t expect Yudkowsky to think a smart AI wouldn’t be able to understand human values. The problem is making the AI care.
On the rest of the doom story, assuming natural abstractions don’t fail the way you assume they do here, and instead things go the way Yudkowsky expects rather than the way you expect:
I’m not sure what exactly you mean by 3b, but I don’t expect Yudkowsky to say these words.
I don’t expect Yudkowsky to use the words you used for 3c. A more likely problem with corrigibility isn’t that it might be an unnatural concept but that it’s hard to arrive at stable corrigible agents with our current methods. I think he places a higher probability than you suggest on corrigibility being a concept with a short description length, one that aliens would invent.
Sure, 3d just means that we haven’t solved alignment and haven’t correctly pointed the AI at humans, and any inaccuracies obviously blow up.
I don’t understand what you mean by 3e or what its relevance is here, and I wouldn’t expect Yudkowsky to say that.
I’d bet Yudkowsky won’t endorse 6.
Relatedly, a correctly CEV-aligned ASI won’t have the ontology that we have, and sometimes this will mean we’ll need to figure out what we value. (https://arbital.greaterwrong.com/p/rescue_utility?l=3y6)
(I haven’t spoken to Yudkowsky about any of this; the above are quick thoughts off the top of my head, based on the impression I formed from what Yudkowsky has written publicly.)