Agree that we still disagree and (in my biased opinion) that claim is either more interesting or more true than you realize :) Not free for a call soon but hope eventually there is an opportunity to discuss more.
boazbarak
Within a particular genus or architecture, more neurons would mean higher intelligence. Comparing between completely different neural network types is indeed problematic.
I discussed Gwern’s article in another comment. My point (which also applies to Gwern’s essay on GPT3 and scaling hypothesis) is the following:
I don’t dispute that you can build agent AIs, and that they can be useful.
I don’t claim that it is possible to get the same economic benefits by restricting to tool AIs. Indeed, in my previous post with Edelman, we explicitly said that we do consider AIs that are agentic in the sense that they can take action, including self-driving, writing code, executing trades, etc.
I don’t dispute that one way to build those is to take a next-token predictor such as pretrained GPT3, and then use fine-tuning, RLHF, prompt engineering or other methods to turn it into an agent AI. (Indeed, I explicitly say so in the current post.)
My claim is that it is a useful abstraction to (1) separate intelligence from agency, and (2) view intelligence in AI as a monotone function of the computational resources (FLOPs, data, model size, etc.) invested into building the model.
Now if you want to take 3.6 trillion gradient steps in a model, then you simply cannot do it by having it take actions and wait to get some reward. So I do claim that if we buy the scaling hypothesis that intelligence scales with compute, the bulk of the intelligence of models such as GPT-n, PALM-n, etc. comes from the non-agentic next-token predictor.
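To make the arithmetic behind the 3.6 trillion figure concrete, here is a rough back-of-the-envelope sketch. The one-second wait per step and the number of parallel environments are purely hypothetical assumptions chosen for illustration; they are not measurements of any real training run.

```python
SECONDS_PER_YEAR = 365.25 * 24 * 3600


def wall_clock_years(num_steps: float, seconds_per_step: float,
                     parallel_envs: int = 1) -> float:
    """Years needed if every gradient step must wait `seconds_per_step`
    for a reward, with `parallel_envs` environments running side by side."""
    return num_steps * seconds_per_step / parallel_envs / SECONDS_PER_YEAR


# Hypothetical assumption: each step waits one second for its reward.
years_sequential = wall_clock_years(3.6e12, seconds_per_step=1.0)
# Hypothetical assumption: 1,000 environments running in parallel.
years_parallel = wall_clock_years(3.6e12, seconds_per_step=1.0, parallel_envs=1000)

print(f"sequential: about {years_sequential:,.0f} years")
print(f"1,000 parallel environments: about {years_parallel:,.0f} years")
```

Even under these generous assumptions the waiting time is measured in centuries or more, which is the sense in which the bulk of the steps has to come from static data.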
So, I believe it is useful and more accurate to think of (for example) a stock trading agent that is built on top of GPT-4 as consisting of an “intelligence forklift” which accounts for 99.9% of the computational resources, plus various layers of adaptations, including supervised fine-tuning, RL from human feedback, and prompt engineering, to obtain the agent.
The above perspective does not mean that the problem of AI safety or alignment is solved. But I do think it is useful to think of intelligence as belonging to a system rather than an individual agent, and (as discussed briefly above) that considering it in this way changes somewhat the landscape of both problems and solutions.
I was asked about this on Twitter. Gwern’s essay deserves a fuller response than a comment but I’m not arguing for the position Gwern argues against.
I don’t argue that agent AIs are not useful or won’t be built. I am not arguing that humans must always be in the loop.
My argument is that tool vs agent AI is not so much about competition but specialization. Agent AIs have their uses but if we consider the “deep learning equation” of turning FLOPs into intelligence, then it’s hard to beat training for predictions on static data. So I do think that while RL can be used for AI agents, the intelligence “heavy lifting” (pun intended) would be done by non-agentic but very large static “tool” models.
Even “hybrid models” like GPT3.5 can best be understood as consisting of an “intelligence forklift” (the pretrained next-token predictor, on which 99.9% of the FLOPs were spent) and an additional light “adapter” that turns this forklift into a useful chatbot, etc.
That’s pretty interesting about monkeys! I am not sure I 100% buy the myths theory, but it’s certainly the case that developing language to talk about events that are not immediate in space or time is essential to coordinate a large-scale society.
Thank you! You’re right. Another point is that intelligence and agency are independent, and a tool AI can be (much) more intelligent than an agentic one.
GPT as an “Intelligence Forklift.”
I don’t think it’s fair to compare parameter sizes between language models and models for other domains, such as games or vision. E.g., I believe AlphaZero is also only in the range of hundreds of millions of parameters? (quick google didn’t give me the answer)
I think there is a real difference between adversarial and natural distribution shifts, and without adversarial training, even large networks struggle with adversarial shifts. So I don’t think this is a problem that would go away with scale alone. At least I don’t see evidence for it from current data (failure of defenses for small models is no evidence of success of size alone for larger ones).
One way to see this is to look at the figures in this plotting playground of “accuracy on the line”. This is the figure for natural distribution shift: the green models are the ones that are trained with more data, and they do seem to be “above the curve” (significantly so for CLIP, which are the two green dots reaching ~53 and ~55 natural distribution accuracy compared to ~60 and ~63 vanilla accuracy).
In contrast, if you look at adversarial perturbations, then you can see that actual adversarial training (bright orange) or other robustness interventions (brown) are much more effective than more data (green), which in fact mostly underperforms.
(I know you focused on “more model” but I think to first approximation “more model” and “more data” should have similar effects.)
I will read the links later, thanks! I confess I didn’t read the papers (though I saw a talk partially based on the first one, which didn’t go into enough detail for me to know the issues), but I also heard from people I trust of similar issues with Chess RL engines (they can be defeated with simple strategies if you are looking for adversarial ones). Generally it seems fair to say that adversarial robustness is significantly more challenging than the non-adversarial case, and it does not simply go away on its own with scale (though some types of attacks are automatically mitigated by diversity of training data / scenarios).
Thank you! I think that what we see right now is that as the horizon grows, we need more “tricks” to make end-to-end learning work, to the extent that it might not really be end to end. So while supervised learning is very successful, and seems to be quite robust to choice of architecture, loss functions, etc., in RL we need to be much more careful, and often things won’t work “out of the box” in a purely end-to-end fashion.
I think the question would be how performance scales with horizon. If the returns are rapidly diminishing, and the cost to train is rapidly increasing (as might well be the case because of diminishing gradient signals, and much smaller availability of data), then it could be that the “sweet spot” of what is economical to train would remain at a reasonably short horizon (far shorter than the planning needed to take over the world) for a long time.
Can you send links? In any case I do believe that it is understood that you have to be careful in a setting where you have two models A and B, where B is a “supervisor” of the output of A, and you are trying to simultaneously teach B to come up with a good metric to judge A by, and teach A to come up with outputs that optimize B’s metric. There can be equilibria where A and B jointly diverge from what we would consider “good outputs”.
This for example comes up in trying to tackle “over optimization” in instructGPT (there was a great talk by John Schulman in our seminar series a couple of weeks ago), where model A is GPT-3, and model B tries to capture human scores for outputs. Initially, optimizing for model B induces optimizing for human scores as well, but if you let model A optimize too much, then it optimizes for B but becomes negatively correlated with the human scores (i.e., “over optimizes”).
Another way to see this issue is that even powerful agents like AlphaZero are susceptible to simple adversarial strategies that can beat them: see “Adversarial Policies Beat Professional-Level Go AIs” and “Are AlphaZero-like Agents Robust to Adversarial Perturbations?”.
The bottom line is that I think we are very good at optimizing any explicit metric $M$, including when that metric is itself some learned model. But generally, if we learn some model $\hat{M}$ s.t. $\hat{M} \approx M$, this doesn’t mean that if we let $x^* = \arg\max_x \hat{M}(x)$ then it would give us an approximate maximizer of $M$ as well. Maximizing $\hat{M}$ would tend to push to the extreme parts of the input space, which would be exactly those where $\hat{M}$ deviates from $M$.
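A toy illustration of this point, as a minimal sketch: the two functions below are hypothetical choices made only for this example (they are not taken from InstructGPT or any real system). The proxy agrees with the true metric to within 0.05 on the region where it was "learned", yet its maximizer over a wider search space lands exactly where the two diverge.

```python
def true_metric(x: float) -> float:
    """Hypothetical 'true' objective: grows near zero, collapses for large |x|."""
    return x - 0.05 * x ** 4


def proxy_metric(x: float) -> float:
    """Hypothetical learned proxy: accurate near zero, wrong far away."""
    return x


def grid(lo: float, hi: float, steps: int) -> list[float]:
    """Evenly spaced points from lo to hi, inclusive."""
    return [lo + (hi - lo) * i / steps for i in range(steps + 1)]


train_region = grid(-1.0, 1.0, 200)      # where the proxy was fitted
search_region = grid(-10.0, 10.0, 2000)  # where the optimizer is allowed to go

# The proxy looks excellent on the region it was fitted on.
max_gap_on_train = max(abs(proxy_metric(x) - true_metric(x)) for x in train_region)

# Optimizing the proxy hard pushes to the edge of the search space.
x_proxy = max(search_region, key=proxy_metric)
x_true = max(search_region, key=true_metric)

print(f"max |proxy - true| on training region: {max_gap_on_train:.3f}")
print(f"argmax of proxy: x = {x_proxy:.2f}, true value there = {true_metric(x_proxy):.1f}")
print(f"argmax of true metric: x = {x_true:.2f}, true value there = {true_metric(x_true):.3f}")
```

The stronger the optimizer (here, the wider the search region), the worse the true score of the proxy's maximizer becomes, even though the proxy's fit on the training region never changes.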
The above is not an argument against the ability to construct AGI as well, but rather an argument for establishing concrete measurable goals that our different agents try to optimize, rather than trying to learn some long-term equilibrium. So for example, in the software-writing and software-testing case, I think we don’t simply want to deploy two agents A and B playing a zero-sum game where B’s reward is the number of bugs found in A’s code.
Hi Vanessa,
Perhaps given my short-term preference, it’s not surprising that I find it hard to track very deep comment threads, but let me just give a couple of short responses.
I don’t think the argument on hacking relied on the ability to formally verify systems. Formally verified systems could potentially skew the balance of power to the defender side, but even if they don’t exist, I don’t think balance is completely skewed to the attacker. You could imagine that, like today, there is a “cat and mouse” game, where both attackers and defenders try to find “zero day vulnerabilities” and exploit (in one case) or fix (in the other). I believe that in the world of powerful AI, this game would continue, with both sides having access to AI tools, which would empower both but not necessarily shift the balance to one or the other.
I think the question of whether a long-term planning agent could emerge from short-term training is a very interesting technical question! Of course we need to understand how to define “long term” and “short term” here. One way to think about this is the following: we can define various short-term metrics, which are evaluable using information in the short-term, and potentially correlated with long-term success. We would say that a strategy is purely long-term if it cannot be explained by making advances on any combination of these metrics.
My forecast would be that an AI that operates autonomously for long periods would be composed of pieces that make human-interpretable progress in the short term. For example, a self-driving car will eventually be able to drive from New York to Los Angeles, but I believe it would do so by decomposing the task into many small tasks of getting from point A to B. It would not learn to do so by being sent out into the world (or even a simulated world) and repeatedly playing a game where it gets a reward if it reaches Los Angeles, and gets nothing if it doesn’t.
Quick comment (not sure it’s related to any broader points): total compute for N models with 2M parameters is roughly 4NM^2 (since per Chinchilla, number of inference steps scales linearly with model size, and number of floating point operations also scales linearly, see also my calculations here). So an equal total compute cost would correspond to k=4.
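The 4NM^2 estimate can be checked with a few lines. This is a minimal sketch in which both proportionality constants are set to 1 as placeholder assumptions, since only the scaling matters for the argument; the function names are hypothetical.

```python
def training_compute(params: float,
                     steps_per_param: float = 1.0,
                     flops_per_param_step: float = 1.0) -> float:
    """Compute for one model when the number of steps grows linearly with
    model size and the FLOPs per step also grow linearly with model size.
    Both constants are placeholders (assumed to be 1)."""
    steps = steps_per_param * params
    return flops_per_param_step * params * steps


def total_compute(num_models: int, params_each: float) -> float:
    """Total compute for `num_models` independently trained models."""
    return num_models * training_compute(params_each)


N, M = 3, 1_000.0
print(total_compute(N, 2 * M))   # N models with 2M parameters each
print(4 * N * M ** 2)            # the closed form 4NM^2
```

Doubling the parameter count of a single model therefore quadruples its cost, which is where the factor of 4 comes from.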
What I was thinking when I said “power” is that it seems that in most BIG-Bench tasks, if you put some measure of performance (e.g. accuracy) on the y axis, then it seems to scale in some linear or polynomial way in the log of parameters, and indeed I believe the graphs in that paper usually have log parameters on the x axis. It does seem that when we start to saturate performance (error tends to zero), the power laws kick in, and it’s more like inverse polynomial in the total number of parameters than their log.
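To spell out the difference between the two regimes, here is a small sketch of the two functional forms. All coefficients are hypothetical placeholders and are not fitted to BIG-Bench or any other data.

```python
import math


def accuracy_log_linear(params: float, a: float = 0.1, b: float = 0.08) -> float:
    """Performance that is linear in log10(parameters); a and b are made up."""
    return a + b * math.log10(params)


def error_power_law(params: float, c: float = 50.0, alpha: float = 0.5) -> float:
    """Error that is inverse polynomial in parameters; c and alpha are made up."""
    return c * params ** (-alpha)


sizes = [1e6, 1e7, 1e8, 1e9]

# Log-linear regime: each 10x in parameters ADDS the same amount of accuracy.
acc_gains = [accuracy_log_linear(10 * p) - accuracy_log_linear(p) for p in sizes]

# Power-law regime: each 10x in parameters MULTIPLIES the error by the same factor.
err_ratios = [error_power_law(10 * p) / error_power_law(p) for p in sizes]

print(acc_gains)
print(err_ratios)
```

In the first form a 10x increase in size buys a constant additive gain, while in the second it buys a constant multiplicative reduction in error.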
Thanks! Some quick comments (though I think at some point we are getting so deep in threads that it’s hard to keep track.)
When saying that GAN training issues are “well understood” I meant that it is well understood that it is a problem, not that it’s well understood how to solve that problem…
One basic issue is that I don’t like to assign probabilities to such future events, and am not sure there is a meaningful way to distinguish between 75% and 90%. See my blog post on longtermism.
The general thesis is that when making long-term strategies, we will care about improving concrete metrics rather than thinking of very complex strategies that don’t make any measurable gains in the short term. So an Amazon Engineer would need to say something like “if we implement my code X then it would reduce latency by Y”, which would be a fairly concrete and measurable goal and something that humans could understand even if they couldn’t understand the code X itself or how it came up with it. This differs from saying something like “if we implement my code X, then our competitors would respond with X’, then we could respond with X″ and so on and so forth until we dominate the market”
When thinking of AI systems and their incentives, we should separate training, fine tuning, and deployment. Human engineers might get bonuses for their performance on the job, which corresponds to mixing “fine tuning” and “deployment”. I am not at all sure that would be a good idea for AI systems. It could lead to all kinds of over-optimization issues that would be clear for people without leading to doom. So we might want to separate the two and in some sense keep the AI disinterested about the code that it actually uses in deployment.
Thanks for so many comments! I do plan to read them carefully and respond, but it might take me a while. In the meantime, Scott Aaronson also has a relevant blog https://scottaaronson.blog/?p=6821
Happy thanksgiving to all who celebrate it!
I do not claim that AI cannot set long-term strategies. My claim is that this is not where AI’s competitive advantages over humans will be. I could certainly imagine that a future AI would be 10 times better than me in proving mathematical theorems. I am not at all sure it would be 10 times better than Joe Biden in being a U.S. president, and mostly it is because I don’t think that the information-processing capabilities are really the bottleneck for that job. (Though certainly, the U.S. as a whole, including the president, would benefit greatly from future AI tools, and it is quite possible that some of Biden’s advisors would be replaced by AIs.)
As you probably imagine given my biography :) , I am never against any research, and definitely not for reasons of practical utility. So I am definitely very supportive of research on alignment, and not claiming that it shouldn’t be done. In my view, one of the interesting technical questions is to what extent long-term goals can emerge from systems trained with short-term objectives, and (if it happens) whether we can prevent this while still keeping short-term performance as good. One reason I like the focus on the horizon rather than alignment with human values is that the former might be easier to define and argue about. But this doesn’t mean that we should not care about the latter.
Hi Vanessa,
Let me try to respond (note the claim numbers below are not the same as in the essay, but rather as in Vanessa’s comment):
Claim 1: Our claim is that one can separate out components: there is the predictable component, which is non-stationary but is best approximated with a relatively simple baseline, and the chaotic component, which over the long run is just noise. In general, highly complex rules are more sensitive to noise (in fact, there are theorems along these lines in the field of Analysis of Boolean Functions), and so in the long run, the simpler component will dominate the accuracy.
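As a small illustration of the noise-sensitivity remark, the sketch below computes exactly, by brute force, the probability that a Boolean function changes its output when each input bit is flipped independently with probability delta. The three functions and the values n = 5, delta = 0.1 are choices made for this example only, not something taken from the essay.

```python
from itertools import product


def noise_sensitivity(f, n: int, delta: float) -> float:
    """Exact Pr[f(x) != f(y)], where x is uniform on {0,1}^n and y is x
    with each bit flipped independently with probability delta."""
    total = 0.0
    for x in product((0, 1), repeat=n):
        for flips in product((0, 1), repeat=n):
            k = sum(flips)
            weight = (delta ** k) * ((1 - delta) ** (n - k))
            y = tuple(xi ^ fi for xi, fi in zip(x, flips))
            if f(x) != f(y):
                total += weight
    return total / 2 ** n


def dictator(x):  # simplest rule: copy the first bit
    return x[0]


def majority(x):  # moderately complex rule
    return int(sum(x) > len(x) / 2)


def parity(x):    # depends delicately on every bit
    return sum(x) % 2


n, delta = 5, 0.1
ns = {f.__name__: noise_sensitivity(f, n, delta) for f in (dictator, majority, parity)}
for name, value in ns.items():
    print(f"{name}: {value:.4f}")
```

For the dictator the value is exactly delta, and for parity it is (1 - (1 - 2*delta)^n) / 2, so the rule that depends on all bits at once is the one most disturbed by the same amount of noise.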
Claim 2: Hacking is actually a fairly well-specified endeavor. People catalog, score, and classify security vulnerabilities. To hack would be to come up with a security vulnerability, and exploit code, which can be verified. Also, you seem to be envisioning a long-term AI that is then fine-tuned on a short-term task, but how did it evolve these long-term goals in the first place?
Claim 3: I would not say that there is no such thing as talent in being a CEO or president. I do however believe that the best leaders have been some combination of their particular characteristics and talents, and the situation they were in. Steve Jobs led Apple to become the largest company in the world, but it is not clear that he is a “universal CEO” that would have done as well in any company (indeed he failed with NeXT). Similarly, Abraham Lincoln is typically ranked as the best U.S. president by historians, but again I think most would agree that he fit well the challenge that he had to face, rather than being someone that would have just as well handled the cold war or the 1970s energy crisis. Also, as Yafah points out elsewhere here, for people to actually trust an AI with being the leader of a company or a country, it would need to not just be as good as humans or a little better, but better by a huge margin. In fact, most people’s initial suspicion is that AIs (or even humans that don’t look like them) are not “aligned” with their interests, and if you don’t convince them otherwise, their default would be to keep them from positions of power.
Claim 4: The main point is that we need to measure the powers of a system as a whole, not compare the powers of an individual human with an individual AI. Clearly, if you took a human, made their memory capacity 10 times bigger, and made their speed 10 times faster, then they could do more things. But we are comparing with the case that humans will be assisted with short-term AIs that would help them in all of the tasks that are memory and speed intensive.
Yes, the point is that once you fix architecture and genus (e.g. connections etc.), more neurons/synapses lead to more capabilities.