I think this is absolutely correct. GPT-3/PaLM is scary impressive, but ultimately relies on predicting missing words, and its actual memory during inference is just the words in its context! What scares me about this is that I think there is some really simple low-hanging fruit for modifying something like this to be, at least, slightly more like an agent. Then comes plugging things like this as components into existing agent frameworks and, finally, having entire research programs think about it and experiment on it. It seems like the problem would crack. You never know, but it doesn’t look like we’ll run out of ideas any time soon.
This is a question for the community: is there any information hazard in speculating on specific technologies here? It would be totally fun, though it seems like it could be dangerous...
My hope was initially that the market wasn’t necessarily focused on this direction. Big tech is generally focused on predicting user behavior, which LLMs look to dominate. But then there are autonomous cars and humanoid robots. No idea what will come of those. I’m thinking the car angle might be slightly safer: because of the need for transparency and explainability, a lot of the logic outside of perception might be hard-coded. Humanoid robots… maybe they will take a long time to catch on, since most people are probably skeptical of them. Maybe factory automation...
My opinion is that you’re not going to be able to crack the alignment problem if you have a phobia of infohazards. Essentially you need a ‘Scout Mindset’. There are already smart people working hard on the problem, including in public such as on podcasts, so realistically the best (or worst) we could do on this forum is attempt to parse out what is known publicly about the scary stuff (e.g. agency) from DeepMind’s papers and then figure out if there is a path forward towards alignment.
Yeah, I tend to agree. Just wanted to make sure I’m not violating norms. In that case, my specific thoughts are as follows, with a thought to implementing AI transparency at the end.
There is the observation that the transformer architecture doesn’t have a recurrent hidden state like an LSTM. I thought for a while that something like this was needed for intelligence, to have a compact representation of the state one is in. (My biased view, which I’ve updated away from, was that the weights represented HOW to think, and less about knowledge.) However, it’s really intractable to backpropagate over so many time steps, and transformers have shown us that you really don’t need to. The long-term memory is just in the weights.
So, one obvious thing is you could simply make the language model continue to update on its dialogues, including its own responses and those of its conversation partner(s). Over time, it could in theory learn from its own conversations and incorporate them into its long-term memory, perhaps learning that this type of dialogue involving ‘itself’ is special. (It can even use special tokens or something if need be.)
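To make the "keep updating on its own dialogues" idea concrete, here is a deliberately tiny sketch. It is not how a real LLM would be fine-tuned: the bigram counter stands in for the weights, and the `<self>` / `<other>` speaker tokens, `BigramModel`, and `record_turn` are all names made up for illustration.

```python
from collections import Counter, defaultdict

SELF, OTHER = "<self>", "<other>"  # hypothetical speaker-marker tokens


class BigramModel:
    """Toy stand-in for a language model: its 'weights' are bigram counts."""

    def __init__(self):
        self.counts = defaultdict(Counter)

    def update(self, tokens):
        # One "training step": fold a token sequence into the weights.
        for prev, nxt in zip(tokens, tokens[1:]):
            self.counts[prev][nxt] += 1

    def predict(self, prev):
        # Most frequent continuation seen so far, or None if unseen.
        following = self.counts.get(prev)
        return following.most_common(1)[0][0] if following else None


def record_turn(model, speaker, text):
    # Tag the turn with who said it, then train on it immediately,
    # so the dialogue ends up in the weights rather than only in context.
    model.update([speaker] + text.split())


model = BigramModel()
record_turn(model, OTHER, "what is your name")
record_turn(model, SELF, "my name is toy")
```

After these two turns the model has, in a very crude sense, learned something about turns marked as its own: `model.predict(SELF)` gives back the word that followed `<self>`. Whether anything like a notion of ‘itself’ would emerge at scale is exactly the open question.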
As for doing variable-length computation, the networks themselves do that with the number of layers they have, but that’s really a fixed feature of the architecture. Instead, variable-length computation would really be a data issue: basically, having enough data that “shows its work”. The model can learn to output intermediate results and use attention over those to continue its thought process. There is a small potential for baking transparency into the model this way, where you could force the model, through limited context windows, to carry out its thoughts using human language, basically. Then if the model is doing a very careful analysis on whether it’s ready to take over the world, it would need to go through those calculations, and you could read them.
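A toy sketch of that loop, under heavy assumptions: `step` is a hypothetical stand-in for one forward pass, the `total=...; rest=...` line format is invented for this example, and the "context window" is cut down to a single line. Because `step` can only see the previous line, every intermediate result has to be written out as text, and the transcript is what an operator would read.

```python
def step(visible):
    """Hypothetical stand-in for one forward pass: reads only `visible`
    (a single line of text) and writes the next line of work."""
    total_part, rest_part = visible.split(";")
    total = int(total_part.split("=")[1])
    rest = rest_part.split("=")[1].split()
    if not rest:
        return f"answer={total}"
    total += int(rest[0])
    return f"total={total}; rest={' '.join(rest[1:])}"


def run(numbers, max_steps=100):
    # The transcript is the only memory; each step sees just its last line.
    transcript = [f"total=0; rest={' '.join(map(str, numbers))}"]
    for _ in range(max_steps):
        line = step(transcript[-1])
        transcript.append(line)
        if line.startswith("answer="):
            break
    return transcript


transcript = run([3, 4, 5])
```

The number of steps grows with the length of the input, which is the variable-length part, while each individual step stays fixed-size. The transparency only holds as long as the lines stay human-readable, of course.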
Of course, maybe it’ll learn to use code words or something, but it’s a thought. Maybe you would always be able to ask it what it was thinking about there, though the explanations it gives might not always be convincing.
Though, imagine that the machine starts outputting gibberish intermediate steps, and tells the human operators that thinking using human, Zipfian-distributed words is highly inefficient, and the operators shrug their shoulders and say “Oh cool. Makes sense. Carry on.” As I hear said around here, that’s a way to die with less dignity.