I’m not sure I understand, do you mean that considering these possibilities is too difficult because there are too many or that it’s not a priority because AIs not designed as agents are less dangerous? Or both?
The latter, specifically because it’s less likely.
Right. So, considering that the most advanced AIs of a leading AI company such as OpenAI are not agents, what do you think of the following plan to solve or help solve AI risk: keep making more and more powerful Q&A AIs that are not agents until we have ones that are smarter than us, then ask them how to solve the problem. Do you think this is a safe and reasonable pursuit? Or do you think we just won’t get to superhuman intelligence that way?
You could get to superintelligence that way, except that before that happens, someone else is going to make an AI that actively seeks out information and navigates the real world.
And it’s not all that safe in an absolute sense—large sequence models are so trustworthy specifically because we’re using them on problems where we can give lots of examples of humans solving them. By default, when you ask a big Q&A AI how to solve alignment, it will just tell you the sort of bad answer a human would give. Trying to avoid that default carries risks, and just seems like the wrong thing to be doing. Building tools to help humans solve the problem isn’t crazy, but this is different from expecting the answer to spring fully formed from a big AI that you trust without knowing much about alignment.
Thank you.