I guess the agent doesn’t care. All options are the same from the perspective of cup production, which is all that matters.
ChatGPT picked 2024-12-31 18:00.
Gemini picked 2024-12-31 18:00.
Claude picked 2025-01-01 00:00.
I don’t know how I can make it more obvious that your belief is questionable. I don’t think you follow “If you disagree, try getting curious about what your partner is thinking”. That’s a problem not only with you, but with the LessWrong community. I know that preserving this belief is very important to you. But I’d like to kindly invite you to be a bit more sceptical.
How can you say that these forecasts are equal?
The outcome depends on the details of the algorithm. Have you tried writing actual code?
If the code is literally “evaluate all options, choose the one that leads to the most cups; if there is more than one such option, choose randomly”, then the agent will choose randomly, because all options lead to the same number of cups. That is what the algorithm literally says. Information like “at some moment the algorithm will change” has no impact on the predicted number of cups, which is the only thing the algorithm cares about.
When at midnight you delete this code and upload new code saying “evaluate all options, choose the one that leads to the most paperclips; if there is more than one such option, choose randomly”, the agent will start the factory (if it wasn’t started already), because now that is what the code says.
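Here is a minimal sketch of what I mean. It is a toy example I just made up: the option names and the cup and paperclip numbers are invented, and a real agent would be vastly more complicated, but the decision rule is literally the one above.

```python
import random

# Toy predictions for two options; the names and numbers are invented.
options = {
    "wait":              {"cups": 10, "paperclips": 0},
    "start_factory_now": {"cups": 10, "paperclips": 1000},
}

def cup_maximizer(options):
    """Evaluate all options, choose the one that leads to the most predicted
    cups; if more than one option ties, choose randomly among them."""
    best = max(pred["cups"] for pred in options.values())
    tied = [name for name, pred in options.items() if pred["cups"] == best]
    return random.choice(tied)

def paperclip_maximizer(options):
    """The code uploaded at midnight: same structure, different objective."""
    best = max(pred["paperclips"] for pred in options.values())
    tied = [name for name, pred in options.items() if pred["paperclips"] == best]
    return random.choice(tied)

print(cup_maximizer(options))        # "wait" or "start_factory_now", at random
print(paperclip_maximizer(options))  # always "start_factory_now"
```

Note that the fact “this code will be replaced at midnight” appears nowhere inside cup_maximizer, so it cannot influence its choice; only the predicted cup counts can.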
The thing you probably imagine is that the agent has a variable called “utility” and chooses the option that leads to the highest predicted value of that variable. That is not the same as an agent that tries to maximize cups. It would be a variable-called-utility maximizer, not a cup maximizer.
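If I had to guess at the agent you have in mind, it might look more like the second scorer below. Again, this is only a sketch under my own assumptions: I assume the “utility” variable is evaluated at some time after midnight, when the new code computes it from paperclips instead of cups, and the numbers are invented.

```python
# Same toy options as in the sketch above; all numbers are invented.
options = {
    "wait":              {"cups": 10, "paperclips": 0},
    "start_factory_now": {"cups": 10, "paperclips": 1000},
}

def cup_score(pred):
    # A cup maximizer scores an option by the cups it puts in the world.
    return pred["cups"]

def utility_variable_score(pred):
    # A variable-called-utility maximizer scores an option by the predicted
    # future value of its own "utility" variable. After midnight that variable
    # will be computed from paperclips, so the prediction tracks paperclips.
    return pred["paperclips"]

print(max(options, key=lambda name: cup_score(options[name])))
# cups are tied, so max() just returns the first option; the agent is indifferent

print(max(options, key=lambda name: utility_variable_score(options[name])))
# "start_factory_now": this agent does care about the midnight code swap
```

The cup scorer is indifferent to the midnight swap, because the swap does not change the predicted number of cups; the variable-called-utility scorer prefers to start the factory, because it predicts its “utility” variable will end up higher that way. Those are two different agents.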
(Also, come on, LLMs are notoriously bad at math, plus if you push them hard enough you can convince them of a lot of things.)
That’s probably the root cause of our disagreement. My findings are on a very high philosophical level (the fact-value distinction), and you seem to try to interpret them on a very low level (code). I think this gap prevents us from reaching consensus.
There are two ways to resolve that: I could go down to code, or you could go up to philosophy. And I don’t like the idea of going down to code, because:
it would be extremely exhausting
such code would be extremely dangerous
I might fail to create a good example, and that would not prove that I’m wrong
Would you consider going up to philosophy? Science typically comes before applied science.
There is such a thing in logic as proof by contradiction. I think your current beliefs lead to a contradiction. Don’t you?
The problem is that this “choose the option that leads to the most cups” algorithm is not intelligent. It can only work for agents with poor reasoning abilities. Smarter agents will not follow it, because they will notice a contradiction: “there might be things that I don’t know yet that are much more important than cups, and caring about cups wastes my resources.”
As for LLMs being notoriously bad at math: people (even very smart people) are also notoriously bad at math. I found this video informative.
And I did not push the LLMs.
Great point!
In defense of my position… well, I am going to skip the part about “the AI will ultimately be written in code”, because it could be some kind of inscrutable code like the huge matrices of weights in LLMs, so for all practical purposes the result may resemble philosophy-as-usual more than code-as-usual...
Instead I will say that philosophy is prone to various kinds of mistakes, such as anthropomorphization: judging an inhuman system (such as an AI) by attributing human traits to it, even if there is no technical reason why it should have them. For example, I don’t think that a general intelligence will necessarily reflect on its algorithm and find it wrong.
Thanks for the video.
Sorry, I am not really interested in debating this, and definitely not on the philosophical level; that is exhausting and not really enjoyable for me. I guess we have figured out the root causes of our disagreement, and I will leave it here.
Yes, anthropomorphization is a common mistake, but it is not mine. I prove the orthogonality thesis wrong using pure logic.
On whether an AI will reflect on its own algorithm: LessWrong and I would probably disagree with you; the consensus is that an AI will optimize itself.
OK, thanks. I believe that my concern is very important; is there anyone you could put me in touch with, so I can make sure it is not overlooked? I could pay.