“We should work on understanding principles of intelligence so that we can make sure that AIs are thinking the same way as humans do; currently we lack this level of understanding”
Roughly. I think the minimax algorithm would qualify as “something that thinks the same way an idealized human would”, where “idealized” is doing substantial work (certainly, humans don’t actually play chess using minimax).
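(For concreteness, here is a minimal sketch of minimax for a two-player zero-sum game; the `game` interface is a hypothetical placeholder, just enough to show the exhaustive look-ahead that actual human players obviously don't perform.)

```python
# Minimal minimax sketch; `game` is a hypothetical interface exposing
# is_terminal, score (from the maximizing player's perspective), moves, and play.
def minimax(game, state, maximizing):
    if game.is_terminal(state):
        return game.score(state)
    values = [
        minimax(game, game.play(state, move), not maximizing)
        for move in game.moves(state)
    ]
    return max(values) if maximizing else min(values)
```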
I don’t really understand point 10, especially this part:
Consider the following procedure for building an AI:
Collect a set of AI tasks that we think are AGI-complete (e.g. a bunch of games and ML tasks)
Search for a short program that takes lots of data from the Internet as input and produces a policy that does well on lots of these AI tasks
Run this program on substantially different tasks related to the real world
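To sketch that procedure concretely (every name below is a hypothetical placeholder, not an existing system): enumerate candidate programs from shortest to longest, keep the first one whose policy does well on all of the benchmark tasks, and then deploy that policy elsewhere.

```python
# Hypothetical sketch of the search procedure above. `programs` yields candidate
# programs in order of increasing length, `run(program, internet_data)` returns
# a policy, and each task in `benchmark_tasks` has evaluate(policy) -> score in [0, 1].
def search_for_policy(programs, run, internet_data, benchmark_tasks, threshold=0.9):
    for program in programs:                     # shortest programs first
        policy = run(program, internet_data)     # program maps Internet data -> policy
        scores = [task.evaluate(policy) for task in benchmark_tasks]
        if min(scores) >= threshold:             # "does well on lots of these AI tasks"
            return policy                        # this policy then gets run on new tasks
    return None
```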
This seems very likely to result in an unaligned AI. Consider the following program:
Simulate some stochastic physics, except that there’s some I/O terminal somewhere (as described in this post)
If the I/O terminal gets used, give the I/O terminal the Internet data as input and take the policy as output
If it doesn’t get used, run the simulation again until it does
This program is pretty short, and with some non-negligible probability (say, more than 1 in 1 billion) it will produce a policy that is an unaligned AGI. The reason: in enough runs of physics there will be civilizations; if the I/O terminal is accessed, it is probably accessed by some civilization; and that civilization will probably have values that are not aligned with human values, so it will execute a treacherous turn (provided it has enough information to know how the I/O terminal's output is being interpreted, which it does if it is given a lot of Internet data).
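As a purely illustrative sketch of that program's structure (the physics simulator and terminal are placeholders for the thought experiment, not anything implementable today):

```python
# Thought-experiment sketch only; simulate_physics and the terminal object are
# hypothetical placeholders standing in for the construction described above.
def malign_program(internet_data, simulate_physics):
    seed = 0
    while True:
        outcome = simulate_physics(seed)   # stochastic physics with an embedded I/O terminal
        if outcome.terminal_was_used:
            # Whatever inside the simulation reached the terminal (probably a
            # civilization) reads the Internet data and writes the output policy.
            return outcome.terminal.respond(internet_data)
        seed += 1                          # otherwise rerun until the terminal is used
```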
Thanks, I think I understand that part of the argument now. But I don’t understand how it relates to:
“10. We should expect simple reasoning rules to correctly generalize even for non-learning problems. ”
^Is that supposed to be a good thing or a bad thing? “Should expect” as in we want to find rules that do this, or as in rules will probably do this?
It’s just meant to be a prediction (simple rules will probably generalize).