topic: AI timelines
probably nothing here is new, but these are some insights I had
summary: alignment will likely become the bottleneck; we’ll have human-capable AIs but they won’t do every task because we won’t know how to specify them
epistemic status: stated more confidently than I am, but seems like a good consideration to add to my portfolio of plausible models of AI development
when I asked my inner sim “when will we have an AI better than a human at any task”, it returned 21% before 2100 (52% that we won’t) (see: https://www.lesswrong.com/posts/hQysqfSEzciRazx8k/forecasting-thread-ai-timelines?commentId=AhA3JsvwaZ7h6JbJj), which is low compared to estimates from AI researchers and longtermist forecasters.
but then I asked my inner sim “when will we have an AI better than a human at any game”. the timeline for this seemed much shorter.
but a game is just a task that has been operationalized.
so what my inner sim was saying was not that human-level capable AI was far away, but that human-level capable AND aligned AI was far away. I was imagining AIs wouldn’t clean up my place anytime soon, not because it’s hard to do (well, not for an AI XD), but because it’s hard to specify what we mean by “don’t cause any harm in the process”.
in other words, I think alignment is likely to be the bottleneck
the main problem won’t be to create an AI that can solve a problem; it will be to operationalize the problem in a way that properly captures what we care about. it won’t be about winning games, but about creating them.
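to make that distinction concrete, here’s a minimal sketch (hypothetical names, my own illustration, not anyone’s actual formalism): a “game” ships with an explicit scoring rule an optimizer can be pointed at, while a real-world task leaves the objective we actually care about, including “don’t cause any harm”, unwritten.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Game:
    """A task that has been operationalized: there is an explicit scoring rule."""
    name: str
    score: Callable[[str], float]  # maps an outcome to a number an optimizer can maximize


@dataclass
class RealWorldTask:
    """A task as humans state it: the objective is left in natural language."""
    name: str
    description: str
    score: Optional[Callable[[str], float]] = None  # we don't know how to write this part


chess = Game(
    name="chess",
    score=lambda outcome: {"win": 1.0, "draw": 0.5, "loss": 0.0}[outcome],
)

cleaning = RealWorldTask(
    name="clean my apartment",
    description="tidy up, and don't cause any harm in the process",
)

print(chess.score("win"))      # 1.0 -- an optimizer can target this directly
print(cleaning.score is None)  # True -- the hard part is writing `score` at all
```

on this framing, capabilities progress is about getting better at maximizing `score`, and the bottleneck I’m pointing at is that for most tasks we care about, nobody has written `score` yet.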
I should have known; I had been familiar with the orthogonality thesis for a long time
also see David’s comment about alignment vs capabilities: https://www.lesswrong.com/posts/DmLg3Q4ZywCj6jHBL/capybaralet-s-shortform?commentId=rdGAv6S6W3SbK6eta
I discussed the above with Matthew Barnett and David Krueger
the Turing Test might be hard to pass because even if you’re as smart as a human, if you don’t already know what humans want, it seems like it could be hard for a human-level AI to learn it (as well as a human does) (?) (side note: learning what humans want =/= wanting what humans want; that’s a classic confusion). so maybe a better test for human-level intelligence would be: when an AI can beat a human at any game (where a game is a well-operationalized task, and doesn’t include figuring out what humans want)
(2021-10-10 update: I’m not at all confident about the above paragraph, and it’s not central to this thesis. The Turing Test can be a well-defined game, and we could have AIs that pass it while not having AIs doing other tasks humans can do, simply because we haven’t been able to operationalize those other tasks)
I want to update my AI timelines. I’m now (re)reading some stuff on https://aiimpacts.org/ (I think they have a lot of great writing!) I just read this, which was kind of related ^^
> Hanson thinks we shouldn’t believe it when AI researchers give 50-year timescales:
> Rephrasing the question in different ways, e.g. “When will most people lose their jobs?” causes people to give different timescales.
> People consistently give overconfident estimates when they’re estimating things that are abstract and far away.
(https://aiimpacts.org/conversation-with-robin-hanson/)
I feel like for me it was the other way around. Initially I was just thinking more abstractly about “AIs better than humans at everything”, but thinking in terms of games feels somewhat more concrete.
x-post: https://www.facebook.com/mati.roy.09/posts/10158870283394579