I would feel much more concerned about advances in reinforcement learning than about training on large datasets. As surprising as some of the things GPT-3 and the like can do are, there is a direct logical link between each capability and the task of predicting tokens: detecting and repeating patterns, translation, storytelling, programming. I don’t see a link between predicting tokens and overthrowing a government, or even manipulating a single person into doing something. There is no reward for that, and I don’t see any variation of this training setup in which there would be.
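To make that concrete, here is a minimal sketch of a GPT-style training step, assuming a PyTorch-like setup and a hypothetical `model` that maps token ids to next-token logits. The entire training signal is cross-entropy on the next token; nothing in the objective rewards any downstream effect of the text.

```python
import torch.nn.functional as F

def next_token_training_step(model, token_ids):
    # token_ids: (batch, seq_len) tensor of integer token ids.
    # `model` is a hypothetical stand-in that returns (batch, seq_len, vocab) logits.
    inputs, targets = token_ids[:, :-1], token_ids[:, 1:]
    logits = model(inputs)                        # (batch, seq_len - 1, vocab)
    loss = F.cross_entropy(
        logits.reshape(-1, logits.size(-1)),      # flatten to (N, vocab)
        targets.reshape(-1),                      # flatten to (N,)
    )
    loss.backward()                               # gradient of prediction error, nothing else
    return loss.item()
```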
I think it’s also safe to assume that rapid advances in one subfield of NN research don’t necessarily translate into advances in other subfields. In fact, different subfields seem to enjoy levels of success that differ by orders of magnitude. So tracking progress in transformer architectures may not be a good proxy for progress in other, riskier fields like reinforcement learning.
I’d agree that equivalently rapid progress in something like deep reinforcement learning would be dramatically more concerning. If we were already getting results of this quality while constructing a gradient out of noisy samples of a sparse reward function, I’d have to shorten my timelines even more. RL does more directly imply agency, and it would also hurt my estimates on the alignment side of things in the absence of some very hard work (e.g. implemented with an IB-derived proof of ‘regret bound is alignment’ or some such).
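For contrast, here is a bare-bones sketch of the kind of gradient estimate deep RL has to work with: a REINFORCE-style score-function estimator, with a hypothetical `policy` object and no baseline or discounting. The gradient is assembled entirely from sampled returns, which a sparse reward leaves at zero for most trajectories, so the signal is noisy and rare.

```python
import torch

def reinforce_loss(policy, trajectory):
    # trajectory: list of (state, action, reward) tuples from one sampled rollout.
    # policy.log_prob(state, action) is a hypothetical differentiable log-probability.
    total_return = sum(reward for _, _, reward in trajectory)   # often 0 under sparse reward
    log_probs = torch.stack(
        [policy.log_prob(state, action) for state, action, _ in trajectory]
    )
    # Minimizing -R(tau) * sum(log pi(a|s)) ascends the expected return.
    return -total_return * log_probs.sum()
```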
I also agree that token predictors are less prone to developing these kinds of directly worrisome properties, particularly current architectures with all their limitations.
I’m concerned that advances on one side will leak into the other. The result might not look exactly like most current deep RL architectures, but it could still end up serving similar purposes and carrying similar risks. Decision transformers come to mind. In the limit, it wouldn’t be too hard to build a dangerous agent out of an oracle.
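As a rough illustration of that leakage, here is a decision-transformer-style loop (all names are hypothetical stand-ins, not any particular library) that turns a pure sequence predictor into an agent simply by conditioning it on a desired return and executing whatever it emits:

```python
def run_oracle_as_agent(seq_model, env, target_return, max_steps=1000):
    # `seq_model` only ever predicts the next token of a (return, state, action)
    # history; the surrounding loop is what acts on the world.
    state = env.reset()
    history = [("return-to-go", target_return), ("state", state)]
    for _ in range(max_steps):
        action = seq_model.predict_next(history)   # pure sequence prediction
        state, reward, done = env.step(action)     # the loop, not the model, takes the action
        target_return -= reward
        history += [("action", action), ("return-to-go", target_return), ("state", state)]
        if done:
            break
```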
Maybe there is some consolation in the thought that if humanity were to arrive at something approaching AGI, it would be better to do so with an architecture that is limited in its ultimate capability, demonstrates as little natural agency as possible, and is ideally a bit of a dead end for further AI development. It could serve as a sort of vaccine, if you will.
Running with the singularity scenario for a moment, I have very serious doubts that purely theoretical research performed largely in a vacuum will yield any progress on AI safety. The history of science certainly doesn’t suggest that we will solve this problem before it becomes a serious threat. So the best case we can hope for is that the first crisis caused by AGI will not be fatal, thanks to the underlying technology’s limitations and a manageable speed of improvement.
To the people who downvote: it would be much more helpful if you actually wrote a reply. I’m happy to be proven wrong.