If there’s a specific argument (or e.g. a specific three arguments) you think we should be emphasizing alongside “alignment is unsolved and looks hard”, I’d be interested to hear your suggestion and your reasoning.
The items on my list are of roughly equal salience to me. I don’t have specific suggestions for people who might be interested in spreading awareness of these risks/arguments, aside from picking a few that resonate with you and are also likely to be well received by the intended audience. And maybe link back to the list (or some future version of such a list) so that people don’t think the ones you choose to talk about are the only risks.
For me personally, I tend to talk about “philosophy is hard” (which feeds into “alignment is hard” and beyond) and “humans aren’t safe” (humans suffer from all kinds of safety problems just like AIs do, including being easily persuaded of strange beliefs and bad philosophy, calling “alignment” into question even as a goal). These might not work well on a broader audience though, the kind that MIRI is presumably trying to reach. Some adjacent messages might work better, for example, “even if alignment succeeds, humans can’t be trusted with God-like powers yet; we need to become much wiser first” and “AI persuasion will be a big problem” (but honestly I have little idea due to lack of experience talking outside my circle).