The outcome depends on the details of the algorithm. Have you tried writing actual code?
If the code is literally "evaluate all options, choose the one that leads to more cups; if there is more than one such option, choose randomly", then the agent will choose randomly, because all options lead to the same number of cups. That's what the algorithm literally says. Information like "at some moment the algorithm will change" has no impact on the predicted number of cups, which is literally the only thing the algorithm cares about.
When at midnight you delete this code and upload new code saying "evaluate all options, choose the one that leads to more paperclips; if there is more than one such option, choose randomly", the agent will start the factory (if it wasn't started already), because now that is what the code says.
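To make "that's what the code says" concrete, here is a minimal sketch of the decision rule, assuming a toy world model where each option just maps to predicted counts (all names here are invented for illustration):

```python
import random

def choose(options, evaluate):
    # Literal reading of the rule: score every option with the evaluator,
    # keep the options tied for the best score, and pick among them at random.
    scores = {name: evaluate(state) for name, state in options.items()}
    best = max(scores.values())
    tied = [name for name, score in scores.items() if score == best]
    return random.choice(tied)

# Toy world model: each option maps to a predicted world state.
# "The code will be rewritten at midnight" is not part of the cup count,
# so it cannot influence the choice in any way.
options = {
    "start_paperclip_factory": {"cups": 10, "paperclips": 1000},
    "do_nothing":              {"cups": 10, "paperclips": 0},
}

predicted_cups = lambda state: state["cups"]
print(choose(options, predicted_cups))        # random; both options predict 10 cups

# At midnight the evaluator is swapped out, and only then does the choice change.
predicted_paperclips = lambda state: state["paperclips"]
print(choose(options, predicted_paperclips))  # "start_paperclip_factory"
```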
The thing you probably imagine is that the agent has a variable called "utility" and chooses the option that leads to the highest predicted value of that variable. That is not the same as an agent that tries to maximize cups. This agent would be a variable-called-utility maximizer.
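Again only as a toy sketch (the state fields below are made up): both agents run the same "pick the best-scoring option" rule, but they score options differently, and only one of those scores is affected by the midnight rewrite.

```python
def predicted_cups(state):
    # Cup-maximizer's score: actual cups in the predicted world.
    return state["cups"]

def predicted_utility_register(state):
    # Variable-called-utility maximizer's score: whatever number the
    # (possibly rewritten) future code will store in its "utility" variable.
    # After the midnight rewrite, that variable is computed from paperclips.
    return state["paperclips"] if state["after_midnight"] else state["cups"]

options = {
    "start_paperclip_factory": {"cups": 10, "paperclips": 1000, "after_midnight": True},
    "do_nothing":              {"cups": 10, "paperclips": 0,    "after_midnight": True},
}

for name, state in options.items():
    print(name, predicted_cups(state), predicted_utility_register(state))

# Cup-maximizer: both options score 10 cups, so it stays indifferent.
# Register-maximizer: 1000 vs 10, so it starts the factory before midnight.
```

The cup-maximizer never looks at the register, so the upcoming rewrite gives it no reason to act; only the register-maximizer treats "what will my utility variable say later" as something to optimize.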
(Also, come on, LLMs are notoriously bad at math, plus if you push them hard enough you can convince them of a lot of things.)
That's probably the root cause of our disagreement. My findings are on a very high philosophical level (the fact-value distinction), and you seem to interpret them on a very low level (code). I think this gap prevents us from finding consensus.
There are two ways to solve that: I could go down to code, or you could go up to philosophy. And I don't like the idea of going down to code, because:
this will be extremely exhausting
this code would be extremely dangerous
I might not be able to create a good example, and failing to do so would not prove that I'm wrong
Would you consider going up to philosophy? Science typically comes before applied science.
There is such a thing in logic as proof by contradiction. I think your current beliefs lead to a contradiction. Don't you think?
evaluate all options, choose the one that leads to more cups; if there is more than one such option, choose randomly
The problem is that this algorithm is not intelligent. It may only work for agents with poor reasoning abilities. Smarter agents will not follow this algorithm, because they will notice a contradiction: "there might be things that I don't know yet that are much more important than cups, and caring about cups wastes my resources."
(Also, come on, LLMs are notoriously bad at math, plus if you push them hard enough you can convince them of a lot of things.)
People (even very smart people) are also notoriously bad at math. I found this video informative. And I did not push LLMs.
That's probably the root cause of our disagreement. My findings are on a very high philosophical level (the fact-value distinction), and you seem to interpret them on a very low level (code). I think this gap prevents us from finding consensus.
Great point!
In defense of my position… well, I am going to skip the part about “the AI will ultimately be written in code”, because it could be some kind of inscrutable code like the huge matrices of weights in LLMs, so for all practical purposes the result may resemble philosophy-as-usual more than code-as-usual...
Instead I will say that philosophy is prone to various kinds of mistakes, such as anthropomorphization: judging an inhuman system (such as AI) by attributing human traits to it (even if there is no technical reason why it should have them). For example, I don't think that an intelligent general intelligence will necessarily reflect on its algorithm and find it wrong.
Thanks for the video.
Sorry, I am not really interested in debating this, and definitely not on the philosophical level; that is exhausting and not really enjoyable to me. I guess we have figured out the root cause of our disagreement, and I will leave it here.
philosophy is prone to various kinds of mistakes, such as anthropomorphization
Yes, a common mistake, but not mine. I prove the orthogonality thesis wrong using pure logic.
For example, I don’t think that an intelligent general intelligence will necessarily reflect on its algorithm and find it wrong.
LessWrong and I would probably disagree with you; the consensus is that AI will optimize itself.
I am not really interested in debating this
OK, thanks. I believe that my concern is very important; is there anyone you could put me in touch with, so I can make sure it is not overlooked? I could pay.