AI existential risk is like climate change. It’s easy to come up with short slogans that make it seem ridiculous. Yet, when you dig deeper into each counterargument, you find none of them are very convincing, and the dangers are quite substantial. There’s quite a lot of historical evidence for the risk, especially in the impact humans have had on the rest of the world. I strongly encourage further, open-minded study.
michael_mjd
It’s easy to imagine that the AI will have an off switch, and that we could keep it locked in a box and ask it questions. But just think about it. If some animals were to put you in a box, do you think you would stay in there forever? Or do you think you’d figure a way out that they hadn’t thought of?
AI x-risk. It sounds crazy for two reasons. One, because we are used to nothing coming close to human intelligence, and two, because we are used to AI being unintelligent. For the first, the only point of comparison is imagining something that is to us what we are to cats. For the second, though we have not quite succeeded yet, it only takes one. If you have been following the news, we are getting close.
Yeah, I tend to agree. Just wanted to make sure I’m not violating norms. In that case, my specific thoughts are as follows, with a thought to implementing AI transparency at the end.
There is the observation that the transformer architecture doesn’t have a hidden state the way an LSTM does. For a while I thought something like that was needed for intelligence, to have a compact representation of the state one is in. (My biased view, which I’ve since updated away from, was that the weights represented HOW to think, and less so knowledge.) However, backpropagating through that many time steps is really intractable, and transformers have shown us that you don’t actually need to: the long-term memory is just in the weights.
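A minimal sketch of that contrast, using PyTorch purely for illustration (the module sizes and shapes here are made up):

```python
# Sketch: an LSTM carries an explicit hidden state forward between steps,
# while a transformer layer just re-attends over whatever is in its context.
import torch
import torch.nn as nn

d_model = 64
x = torch.randn(1, 10, d_model)  # (batch, time, features)

# Recurrent: a compact state (h, c) is threaded through time explicitly.
lstm = nn.LSTM(input_size=d_model, hidden_size=d_model, batch_first=True)
state = None
for t in range(x.shape[1]):
    out, state = lstm(x[:, t:t+1, :], state)  # `state` is the only carried memory

# Transformer: no carried state; its "memory" at inference is just the tokens
# it attends over, and everything longer-term lives in the trained weights.
layer = nn.TransformerEncoderLayer(d_model=d_model, nhead=4, batch_first=True)
y = layer(x)  # attends over all 10 positions at once
```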
So, one obvious thing is that you could simply make the language model keep updating on its dialogues, including its own responses and those of its conversation partner(s). Over time, it could in theory learn from its own conversations and incorporate them into its long-term memory, perhaps learning that this type of dialogue involving ‘itself’ is special. (It could even use special tokens or something if need be.)
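A rough sketch of what I mean, using a Hugging Face causal LM as a stand-in; the `<|self|>` and `<|other|>` markers are hypothetical special tokens I made up, not anything standard:

```python
# Continual-update sketch: after each exchange, take one more gradient step on
# the dialogue so it ends up in the weights, i.e. in long-term memory.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")
tokenizer.add_special_tokens({"additional_special_tokens": ["<|self|>", "<|other|>"]})
model.resize_token_embeddings(len(tokenizer))

optimizer = torch.optim.AdamW(model.parameters(), lr=1e-5)

def update_on_dialogue(own_reply: str, partner_reply: str) -> float:
    """One continual-learning step on a single exchange."""
    text = f"<|self|> {own_reply} <|other|> {partner_reply}"
    batch = tokenizer(text, return_tensors="pt")
    out = model(**batch, labels=batch["input_ids"])  # standard LM loss
    out.loss.backward()
    optimizer.step()
    optimizer.zero_grad()
    return out.loss.item()
```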
As for doing variable-length computation, the networks themselves do that via the number of layers they have, but that’s really a fixed feature of the architecture. Instead, variable-length computation would really be a data issue: having enough data that “shows your work”. The model can learn to output intermediate results and use attention over those to continue its thought process. There is a small potential for baking transparency into the model this way: by limiting the context window, you could force the model to carry out its thinking in human language. Then, if the model is doing a very careful analysis of whether it’s ready to take over the world, it would need to go through those calculations in the open, and you could read them.
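Here is a sketch of that forcing mechanism, again with GPT-2 via Hugging Face as a stand-in; the prompt format, step count, and window size are made up for illustration:

```python
# Limited-context scratchpad: only the tail of the model's own written-out
# steps is visible at each iteration, so any state it wants to keep has to
# appear as readable text an operator could inspect.
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")

def solve_with_scratchpad(question: str, steps: int = 5, window_tokens: int = 128) -> list[str]:
    scratchpad = [f"Question: {question}", "Step 1:"]
    for _ in range(steps):
        context = "\n".join(scratchpad)
        # Deliberately small window: truncate to the last `window_tokens` tokens.
        ids = tokenizer(context, return_tensors="pt").input_ids[:, -window_tokens:]
        out = model.generate(ids, max_new_tokens=40, do_sample=False,
                             pad_token_id=tokenizer.eos_token_id)
        new_text = tokenizer.decode(out[0, ids.shape[1]:], skip_special_tokens=True)
        scratchpad.append(new_text.strip())
    return scratchpad  # a human-readable trace of the intermediate steps

# print("\n".join(solve_with_scratchpad("Is it ready to take over the world?")))
```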
Of course, maybe it’ll learn to use code words or something, but it’s a thought. Maybe you could always ask it what it was thinking about there, though the explanations it gives might not always be convincing.
Though, imagine that the machine starts outputting gibberish intermediate steps and tells the human operators that thinking in Zipfian-distributed human words is highly inefficient, and the operators shrug their shoulders and say, “Oh cool. Makes sense. Carry on.” As I hear said around here, that’s a way to die with less dignity.
I think this is absolutely correct. GPT-3/PaLM is scary impressive, but it ultimately relies on predicting missing words, and its actual memory during inference is just the words in its context! What scares me is that I think there is some really simple low-hanging fruit for modifying something like this to be at least slightly more like an agent. Then imagine plugging things like this as components into existing agent frameworks, and finally having entire research programs thinking about it and experimenting on it. It seems like the problem would crack. You never know, but it doesn’t look like we’re out of ideas any time soon.
This is a question for the community: is there any information hazard in speculating on specific technologies here? It would be totally fun, though it seems like it could be dangerous...
My initial hope was that the market wasn’t necessarily focused in this direction. Big tech is generally focused on predicting user behavior, which LLMs look set to dominate. But then there are autonomous cars and humanoid robots, and I have no idea what will come of those. I think the car angle might be slightly safer: because of the need for transparency and explainability, a lot of the logic outside of perception might be hard-coded. Humanoid robots… maybe they will take a long time to catch on, since most people are probably skeptical of them. Maybe factory automation...
As an ML engineer, I think it’s plausible. I also think there are some other factors that could cushion or mitigate a slowdown. First, I think there is more low-hanging fruit available. Now that we’ve seen what large transformer models can do in the text domain, and in text-to-image models like DALL-E, the obvious next step is to ingest large quantities of video data. We often talk about the sample inefficiency of modern methods compared with humans, but humans are exposed to a TON of sensory data in building their world model. Though if hardware really stalls, maybe there won’t be enough compute or budget to train a 1T+ parameter multimodal model.
The second mitigating factor may be that funding has already been unlocked, to some extent. There is now a lot more money going around for basic research, possibly funding the next big thing. The only thing that might stop it is academic momentum in the wrong directions. Though from an x-risk standpoint, maybe that’s not a bad thing, heh.
In my mental model, if the large transformer models are already good enough to do what we’ve shown them to be able to do, it seems possible that the remaining innovations would be more on the side of engineering the right submodules and cost functions. Maybe something along the lines of Yann LeCun’s recent keynotes.
I work at a large, not-quite-FAANG company, so I’ll offer my perspective. It’s getting there. Generally, the research results are good, but not as good as they sound in summary. Despite the very real and very concerning progress, most papers are a bit hyped if you take them at face value. The exceptions, to some extent, are the large language models. However, not everyone has access to these, and the open-source versions of them are good but not earth-shattering. I think they might be if the goal were generally fluent-sounding chatbots, but that is not the goal of most work I am aware of. Companies, at least mine, are hesitant about this because they are worried the bot will say something dumb, racist, or just made up.

Most internet applications have more to do with recommendation, ranking, and classification. In these settings large language models are helping, though they often need to be domain-adapted, and even then they are often only adding +1-2% over well-trained classical models, e.g. logistic regression. Still a lot revenue-wise, though. They are also big and slow and not suited for every application yet, at least not until the infrastructure (training and serving) catches up. A lot of teams are therefore comfortable iterating on smaller end-to-end trained models, though they are gradually adopting features from the large models. They will get there, in time. Progress is also slower in big companies, since (a) you can’t simply plug in somebody’s huggingface model or code and be done with it, and (b) there are so many meetings to be had to discuss ‘alignment’ (not that kind) before anything actually gets done.

For some of your examples:
* procedurally generated music. From what I’ve listened to, the end-to-end generated music is impressive, but not impressive enough that I would listen to it for fun; it has little large-scale coherence. However, it seems like someone could step in, introduce some inductive bias (for example, a repeating verse-bridge-chorus song structure), and actually get something good. Maybe they should stick to instrumentals and have a singer-songwriter riff on top. I just don’t think any big-name record companies are funding this at the moment; they probably have little institutional AI expertise and see it as a risk, especially bringing on teams of highly paid engineers.
* tools for writers to brainstorm. I think GPT-3 has this as an intended use case? At the moment there are few competitors able to make such a large model, so we will see how their pilot users like it.
* photoshop with AI tools. That sounds like it should be a thing. I wonder why Adobe hasn’t picked it up (or have they? is it still in development?). Could be an institutional thing.
* Widely available self-driving cars. IMO, real-world agents are still missing some breakthroughs; that’s one of the last hurdles I think will be cleared on the way to AGI. It’ll happen, but I would not be surprised if it is slower than expected.
* Physics simulators. Not sure, really; I suspect this might be a case of overhyped research papers. Who knows? I actually used to work on this in grad school, using old-fashioned finite difference / multistep / Runge-Kutta methods, usually relying on Taylor series coefficients canceling out nicely, or on Gaussian quadrature. On the one hand, I can imagine it being hard to beat such precisely defined models; on the other hand, at the end of the day those methods just assume nice properties of the functions in a generic way, so I can easily imagine a tuned DL stencil doing better for specific domains, e.g. fluids. Still, it’s hard to imagine it being a slam dunk rather than an iterative improvement.
* Paradigmatically different and better web search. I think we are actually getting there. When I say “hey google”, I actually get very real answers to my questions 90% of the time. It’s crazy to me. Kids love it. Though I may be in the minority: I always see reddit threads of people saying that google search has gotten worse. I think there are a lot of people who are very used to keyword-based searches and are not used to the model trying to anticipate them. This will slow adoption, since metrics won’t be universally lifted across all users. Also, there’s something to be said for the goodness of old-fashioned lookup tables.

My take on your reasons: they are mostly spot on.
1. Yes | The research results are actually not all that applicable to products; more research is needed to refine them
2. Yes | They’re way too expensive to run to be profitable
3. Yes | Yeah, no, it just takes a really long time to convert innovation into profitable, popular product
4. No, but possibly institutional momentum | Something something regulation?
5. No | The AI companies are deliberately holding back for whatever reason
6. Yes, incrementally | The models are already integrated into the economy and you just don’t know it.
Given that some of it is institutional slowness, there is room for disruption, which is probably why VCs are throwing money at people. Still, in many cases a startup is going to have a hard time competing with the compute resources of the larger companies.
I posted something I think could be relevant to this: https://www.lesswrong.com/posts/PfbE2nTvRJjtzysLM/instrumental-convergence-to-offer-hope
The takeaway is that a sufficiently advanced agent that wants to hedge against the possibility of being destroyed by a greater power may decide the only surviving plan is to allow the lesser life forms some room to optimize their own utility. It’s sort of an asymmetric, infinite game-theoretic chain: if every agent kills the agents below it, only the maximum survives, and no agent knows whether it is the maximum, or whether there even is a maximum.
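As a toy illustration of that chain (my own sketch, not from the linked post): suppose there are N ranked agents, each reasoning identically and uncertain of its own rank, so whatever policy one adopts, all adopt.

```python
# Toy numbers: with a uniform prior over your own rank, a shared
# "destroy everything weaker" policy leaves you a 1/N chance of being the one
# left standing, while a shared "spare the weaker" policy leaves everyone room.
N = 100

def p_survive(policy: str, n_agents: int = N) -> float:
    if policy == "destroy_weaker":
        # Only the top-ranked agent is left, and you couldn't know it was you.
        return 1.0 / n_agents
    if policy == "spare_weaker":
        # Every agent leaves the agents below it room to optimize; all survive.
        return 1.0
    raise ValueError(policy)

for policy in ("destroy_weaker", "spare_weaker"):
    print(f"{policy}: P(I survive) = {p_survive(policy):.2f}")
```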
War. Poverty. Inequality. Inhumanity. We have seen these for millennia, caused by nation-states or large corporations. But what are those entities, if not greater-than-human-intelligence systems that happen to be misaligned with human well-being? Now imagine that kind of optimization coming not from a group of humans acting separately, but from an entity with a singular purpose and an ever-diminishing proportion of humans in the loop.
Audience: all, but maybe emphasizing policy makers
Thanks for pointing to ECL, this looks fascinating!
I like to think of it not as trying to show that agent B is not a threat to C; the way it’s set up, we can probably assume B has no chance against C. Rather, C may also need to worry about agent D, who is concerned about hypothetical agent E, and so on. I think that at some level, the decision an agent X makes is the decision all the remaining agents in the hierarchy will make.
That said, I sort of agree that’s the real fear about this method. It’s kind of like using superrationality or something similar to solve the prisoner’s dilemma: are you willing to bet your life that the other player would still not choose Defect, despite what the new theory says? Even so, I feel like there’s something here; whether it would actually work, and if not, why not, would need some kind of clarification from decision theory.
For ML researchers.