In addition to what the other comments are saying:
If you get strongly superhuman LLMs, you can trivially accelerate scientific progress on agentic forms of AI like Reinforcement Learning by asking them to predict continuations of the most cited AI articles of 2024, 2025, etc. (include the year of publication, citation count, and journal of publication as part of the prompt). Hence, at the very least, superhuman LLMs enable the quick construction of strong agentic AIs.
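For concreteness, the prompt could look something like the sketch below; the title, venue, and citation count are made-up placeholders, and complete() stands in for whatever completion interface the model exposes (not a real API).

```python
# Rough sketch of the metadata-conditioned prompt described above.
# The title, citation count, and venue are hypothetical placeholders,
# and complete() stands in for whatever LLM completion interface is available.

def future_paper_prompt(title: str, venue: str, year: int, citations: int) -> str:
    """Build a prompt asking the model to continue a highly cited future paper."""
    return (
        f"Journal/venue: {venue}\n"
        f"Year of publication: {year}\n"
        f"Citation count: {citations}\n"
        f"Title: {title}\n\n"
        "Abstract:"
    )

prompt = future_paper_prompt(
    title="Sample-Efficient Agentic Reinforcement Learning",  # hypothetical title
    venue="NeurIPS",
    year=2025,
    citations=5000,
)
# continuation = complete(prompt)  # hypothetical call to the model
```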
Second, the people building Bing Chat are actively looking for ways to make it as agentic as possible: it’s already searching the internet, it’s going to be integrated into the Edge browser soon, and I’d bet that significant research effort is going into making it interact with the various APIs available on the internet. All economic and research interests are pushing towards making it as agentic as possible.
Agree, and I would add, even if the oracle doesn’t accidentally spawn a demon that tries to escape on its own, someone could pretty easily turn it into an agent just by driving it with an external event loop.
I.e., ask it what a hypothetical agent would do (with, say, a text interface to the Internet) and then forward its queries and return the results to the oracle, repeat.
With public access, someone will eventually try this. The conversion barrier is just not that high. Merely asking an otherwise passive oracle to imagine what an agent might do just about instantiates one. If said imagined agent is sufficiently intelligent, it might not take very many exchanges to do real harm or even FOOM, and if the loop is automated (say, a shell script) rather than a human driving each step manually, it could run a lot of exchanges on a very short time scale, potentially making even a somewhat less intelligent agent powerful enough to be dangerous.
I highly doubt the current Bing AI is yet smart enough to create an agent smart enough to be very dangerous (much less FOOM), but it is an oracle, with all that implies. It could be turned into an agent, and such an agent will almost certainly not be aligned. It would only be relatively harmless because it is relatively weak/stupid.
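To make the external-loop point concrete, here is a minimal sketch; ask_oracle() and run_tool() are hypothetical stand-ins for the oracle interface and whatever tool execution gets wired up, not real APIs.

```python
# Sketch of the external event loop described above: the oracle stays a pure
# text predictor, and the loop around it supplies the agency.
# ask_oracle() and run_tool() are hypothetical placeholders, not real APIs.

def ask_oracle(prompt: str) -> str:
    """Placeholder: send text to the oracle and return its reply."""
    raise NotImplementedError

def run_tool(action: str) -> str:
    """Placeholder: perform the proposed action (web search, API call, ...)."""
    raise NotImplementedError

def drive_agent(goal: str, max_steps: int = 10) -> str:
    # Seed the transcript by asking the oracle to imagine an agent with a goal.
    transcript = f"Imagine an agent with a text interface to the Internet pursuing this goal: {goal}\n"
    for _ in range(max_steps):
        # Ask the passive oracle what the imagined agent would do next...
        proposal = ask_oracle(transcript + "\nWhat does the agent do next?")
        # ...then execute that step in the real world and feed the result back.
        observation = run_tool(proposal)
        transcript += f"\nAgent action: {proposal}\nObservation: {observation}\n"
    return transcript
```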
Update: See ChaosGPT and Auto-GPT.
Thank you for your answers.
Unfortunately, I have to say that they have not yet helped me form a stronger view on AI safety.
(I am very sympathetic to this post, for example: https://forum.effectivealtruism.org/posts/ST3JjsLdTBnaK46BD/how-i-failed-to-form-views-on-ai-safety-3)
To rephrase, my prior is that LLMs just predict next words (it is their only capability). I would be worried if an LLM did something else (though I think that cannot happen); that is what I would call “misalignment”.
In the meantime, much of what I read from people worrying about ChatGPT/Bing sounds like anthropomorphizing the AI, with the prior that it can be sentient or have “intents”, and to me that is just not right.
I am not sure I understand how having the ability to search the internet dramatically changes that.
If an LLM, when p(next words) is too low, can “decide” to search the internet to get better inputs, I do not feel that changes what I said above.
I do not want to have too long a fruitless discussion; I do think I need to keep reading material on AI safety to better understand your models. But at this stage, to be honest, I cannot help thinking that some comments or posts are made by people who lack a basic understanding of what an LLM is, which may lead to anthropomorphizing AI more than it should be. When you do not know what an LLM is, it is very easy to wonder, for example, “ChatGPT answered that, but it seems to be saying it so as not to hurt me; I wonder what ChatGPT really thinks?”, and I think that sentence makes no sense at all, because of what an LLM is.
Predicting the next token is just the output channel. Strictly logically speaking, this is independent of the agenty-ness of the neural network. You can have anything, from a simple rule-based table looking only at the previous token to a superintelligent agent, predicting the next token.
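To illustrate with a toy sketch (hypothetical classes, not from any real library): both predictors below expose exactly the same next-token interface, and nothing about that interface tells you how simple or agent-like the machinery behind it is.

```python
# Toy illustration: the next-token interface does not constrain what sits behind it.
# Both classes are hypothetical, not taken from any real library.

class LookupTablePredictor:
    """A trivial rule-based table that only looks at the previous token."""

    def __init__(self, table: dict):
        self.table = table

    def next_token(self, prev_token: str) -> str:
        return self.table.get(prev_token, "<unk>")


class OpaquePredictor:
    """Stand-in for an arbitrarily complex system, possibly agent-like, behind the same interface."""

    def next_token(self, prev_token: str) -> str:
        return self._internal_machinery(prev_token)

    def _internal_machinery(self, prev_token: str) -> str:
        raise NotImplementedError  # whatever actually happens inside
```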
I’m not saying ChatGPT has thoughts or is sentient, but I’m saying that it trying to predict the next token doesn’t logically preclude either. If you lock me into a room and give me only a single output channel in which I can give probability distributions over the next token, and only a single input channel in which I can read text, then I will be an agent trying to predict the next token, and I will be sentient and have thoughts.
Plus, the comment you’re responding to gave an example of how you can use token prediction specifically to build other AIs. (You responded to the third paragraph, but not the second.)
Also, welcome to the forum!
Thank you,
I agree with your reasoning strictly logically speaking, but it seems to me that an LLM cannot be sentient or have thoughts, even theoretically, and the burden of proof seems to be strongly on the side of someone making the opposite claim.
And for someone who does not know what an LLM is, it is of course easy to anthropomorphize it for obvious reasons (it can be designed to sound sentient or to express ‘thoughts’), and my feeling is that this post was a little bit about that.
Overall, I find the arguments I received after my first comment more convincing than the original post in making me see what the problem could be.
As for the possibility of an LLM accelerating scientific progress towards agentic AI, I am skeptical, but I may be lacking imagination.
And again, nothing in the examples presented in the original post is related to this risk. It seems that people who are worried are mostly trying to find examples where the “character” of the AI is strange (which in my opinion are mistaken worries due to anthropomorphizing the AI), rather than examples where the AI is particularly “capable” in terms of generating powerful reasoning or impressive “new ideas” (maybe also because at this stage the best LLMs are far from being there).
> I agree with your reasoning strictly logically speaking, but it seems to me that an LLM cannot be sentient or have thoughts, even theoretically
This seems not obvious: ChatGPT is a neural network, and most philosophers and AI people do think that neural networks can be conscious if they run the right algorithm. (The fact that it’s a language model doesn’t seem very relevant here, for the same reason as before; it’s just a statement about its final layer.)
> (maybe also because at this stage the best LLMs are far from being there)
I think the most important question is where on a reasoning-capability scale you would put:
1. GPT-2
2. ChatGPT/Bing
3. human-level intelligence
Opinions on this vary widely even among well-informed people. E.g., if you think (1) is a 10, (2) an 11, and (3) a 100, you wouldn’t be worried. But if it’s 10 → 20 → 50, that’s a different story. I think it’s easy to underestimate how different other people’s intuitions are from yours. Depending on your intuitions, you could consider the dog thing as an example that Bing is capable of “powerful reasoning”.
I think that the “most” in the sentence “most philosophers and AI people do think that neural networks can be conscious if they run the right algorithm” is an overstatement, though I do not know to what extent.
I have no strong view on that, primarily because I think I lack some deep ML knowledge (I would weigh the views of ML experts far more than the views of philosophers on this topic).
Anyway, even accepting that neural networks can be conscious with the right algorithm, I think I disagree that “the fact that it’s a language model doesn’t seem relevant”. In an LLM, language is not only the final layer; there is also the fact that the aim of the algorithm is p(next words), so it is a specific kind of algorithm. My feeling is that a p(next words) algorithm cannot be sentient, and I think most ML researchers would agree with that, though I am not sure.
I am also not sure about the “reasoning-capability” scale: even if an LLM is very close to human level for most conversations, or better than human for some specific tasks (writing summaries, for example), that would not mean it is close to making a scientific breakthrough (on that I basically agree with the comments of AcurB a few posts above).
> I think that the “most” in the sentence “most philosophers and AI people do think that neural networks can be conscious if they run the right algorithm” is an overstatement, though I do not know to what extent.
It is probably an overstatement. At least among philosophers in the 2020 PhilPapers survey, most of the relevant questions would put that at a large but sub-majority position: 52% embrace physicalism (which is probably an upper bound); 54% say uploading = death; and 39% “Accept or lean towards: future AI systems [can be conscious]”. So, it would be very hard to say that ‘most philosophers’ in this survey would endorse an artificial neural network with an appropriate scale/algorithm being conscious.
I know I said the intelligence scale is the crux, but now I think the real crux is what you said here:
> In an LLM, language is not only the final layer; there is also the fact that the aim of the algorithm is p(next words), so it is a specific kind of algorithm. My feeling is that a p(next words) algorithm cannot be sentient, and I think most ML researchers would agree with that, though I am not sure.
Can you explain why you believe this? How does the output/training signal restrict the kind of algorithm that generates it? I feel like if you have novel thoughts, people here would be very interested in those, because most of them think we just don’t understand what happens inside the network at all, and that it could totally be an agent. (A mesa-optimizer, to use the technical term: an optimizer that appears as a result of gradient descent tweaking the model.)
The consciousness thing in particular is perhaps less relevant than functional restrictions.
There is a hypothetical example of simulating a ridiculous number of humans typing text and seeing, among those who have typed out the current text so far, what fraction type out each possible next token. In the limit, this approaches the best possible text predictor. This would simulate a lot of consciousness.
> If you get strongly superhuman LLMs, you can trivially accelerate scientific progress on agentic forms of AI like Reinforcement Learning by asking them to predict continuations of the most cited AI articles of 2024, 2025, etc.
A question that might be at the heart of the issue is what is needed for an AI to produce genuinely new insights. As a layman, I can see how an LM might become even better at generating human-like text, might become super-duper good at remixing and rephrasing things it “read” before, but still hit a wall when it comes to reaching AGI. Maybe to get genuine intelligence we need more than “a predict-next-token kind of algorithm + obscene amounts of compute and human data”, and instead need to mimic more closely how actual people think?
Perhaps local AI alarmists (not a pejorative, I hope? the OP does declare alarm, though) would like to try to persuade me otherwise, be it in their own words or by doing their best to hide their condescension and pointing me to the numerous places where this idea has been discussed before?
> Maybe to get genuine intelligence we need more than “a predict-next-token kind of algorithm + obscene amounts of compute and human data”, and instead need to mimic more closely how actual people think?
That would be quite fortunate, and I really, really hope that this is the case, but scientific articles are part of the human-like text that the model can be trained to predict. You can ask Bing AI to write you a poem, you can ask its opinion on new questions that it has never seen before, and you will get back coherent answers that were not in its dataset. The bitter lesson of generative image models and LLMs in the past few years is that creativity requires less special sauce than we might think. I don’t see a strong fundamental barrier to extending the sort of creativity ChatGPT exhibits right now to writing math & ML papers.
Does this analogy work, though?
It makes sense that you can get brand-new sentences or brand-new images that can even serve some purpose using ML, but is it creativity? That raises the question of what creativity is in the first place, and that’s a whole new can of worms. You give me an example of how Bing can write poems that were not in its dataset, but poem writing is a task that can be quite straightforwardly formalized (a collection of lines that end on alternating syllables, or something), and “write me a poem about sunshine and butterflies” is clearly a vastly easier prompt than “give me a theory of everything”. The resulting poem might be called creative if interpreted generously, but actual, novel scientific knowledge is a whole other level of creative, so much so that we should probably put these things in different conceptual boxes.
Maybe that’s just a failure of imagination on my part? I do admit that I, likewise, just really want it to be true, so there’s that.