I wasn’t imagining this being a good thing that helps save the world; I was imagining it being a world-ending thing that someone does anyway because they don’t realize how dangerous it is.
I totally agree that the two examples you gave probably wouldn’t work. How about this though:
--Our task will be: Be a chatbot. Talk to users over the course of several months to get them to give you high marks in a user satisfaction survey.
--Pre-train the model on logs of human-to-human chat conversations so you have a reasonable starting point for making predictions about how conversations go.
--Then run the EfficientZero algorithm, but with a massively larger parameter count, and talking to hundreds of thousands (millions?) of humans for several years. It would be a very expensive, laggy chatbot (but the user wouldn't care since they aren't paying for it, and even with lag the text comes in about as fast as a human would reply).
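To make that a bit more concrete, here is a minimal, purely illustrative sketch of the kind of training loop I have in mind. Every class and function is a placeholder I made up for illustration; it is not the real EfficientZero code, and the "users" are stubs standing in for months of real conversation.

```python
import random

class ChatPolicy:
    """Stand-in for a large model pre-trained on human-to-human chat logs."""

    def suggest_reply(self, conversation):
        # Placeholder: a real system would sample from the pre-trained model,
        # guided by an EfficientZero-style learned model and tree search.
        return "placeholder reply"

    def update(self, conversation, reward):
        # Placeholder for the reinforcement-learning update on the survey score.
        pass


class StubUser:
    """Stub for a human user; the real 'environment' is months of live chat."""

    def respond(self, message):
        return "placeholder user message"

    def survey_score(self):
        return random.uniform(0.0, 1.0)


def run_episode(policy, user, turns=1000):
    """One episode = one user, many turns spread over several months."""
    conversation = []
    for _ in range(turns):
        reply = policy.suggest_reply(conversation)
        conversation.append(("bot", reply))
        conversation.append(("user", user.respond(reply)))
    # Sparse reward: only the end-of-period user satisfaction survey counts.
    return conversation, user.survey_score()


policy = ChatPolicy()
for _ in range(1000):  # imagine hundreds of thousands to millions of users here
    conversation, reward = run_episode(policy, StubUser())
    policy.update(conversation, reward)
```

The point of the sketch is just the shape of the setup: a pre-trained conversational prior, an extremely long horizon, and a single sparse reward signal coming from human approval at the end.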
Seems to me this would “work” in the sense that we’d all die within a few years of this happening, on the default trajectory.
In a similar conversation about non-main-actor paths to dangerous AI, I came up with this as an example of a path I can imagine being plausible and dangerous. A plausible-to-me worst-case scenario would be something like:

A phone-scam organization employs someone to build them an online-learning reinforcement learning agent (using an open-source language model as a language-understanding component) that functions as a scam helper. It takes in the live transcription of the ongoing conversation between a scammer and a victim, and gives the scammer suggestions for what to say next to persuade the victim to send money. So long as it was even a bit helpful sometimes, according to the team of scammers using it, more resources would be given to it and it would continue to collect useful data.
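Again purely as an illustration of the loop I'm describing (every name below is a hypothetical placeholder, not a real system or API), the structure would be something like:

```python
class SuggestionAgent:
    """Online-learning agent that proposes the next line for the caller to say."""

    def suggest(self, transcript):
        # Placeholder: a real agent would condition an open-source language
        # model on the live transcript and pick a persuasive next line.
        return "placeholder suggestion"

    def learn(self, transcript, outcome):
        # Placeholder online update from whatever feedback the operators log
        # (e.g. whether the victim eventually sent money).
        pass


def assist_call(agent, victim_utterances):
    """Run one call: feed the live transcription in, surface suggestions."""
    transcript = []
    for utterance in victim_utterances:  # live speech-to-text feed
        transcript.append(("victim", utterance))
        suggestion = agent.suggest(transcript)
        transcript.append(("scammer", suggestion))  # read aloud by the operator
    return transcript


agent = SuggestionAgent()
transcript = assist_call(agent, ["Hello?", "Why do you need my bank details?"])
agent.learn(transcript, outcome=0.0)  # e.g. 1.0 if money was sent, 0.0 otherwise
```

Nothing here requires new algorithmic ideas; it is an ordinary online-learning wrapper around an existing model, pointed at a manipulative objective.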
I think this scenario contains a number of dangerous aspects:
--being illegal and secret, not subject to ethical or safety guidance or regulation
--deliberately being designed to open-endedly self-improve
--bringing in incremental resources as it trains to continue to prove its worth (thus not needing a huge initial investment of training cost)
--being agentive and directed at the specific goal of manipulating and deceiving humans
I don’t think we need 10 more years of progress in algorithms and compute for this story to be technologically feasible. A crude version of this is possibly already in use, and we wouldn’t know.