I am sure you can’t prove your position. And I am sure I can prove my position.
Your reasoning is based on the assumption that all value is known: if the utility function assigns value to something, it is valuable; if it does not, it is not valuable. The truth is that something might be valuable even though your utility function does not know it yet. It would be more intelligent to use three categories: valuable, not valuable, and unknown.
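As a minimal sketch of what I mean (my own framing in Python, with hypothetical names and data, not anything you have committed to): the appraisal returns a third category instead of silently treating "no entry in the utility function" as "not valuable".

```python
from enum import Enum

class Value(Enum):
    VALUABLE = "valuable"
    NOT_VALUABLE = "not valuable"
    UNKNOWN = "unknown"

def appraise(item: str, known_values: dict[str, bool]) -> Value:
    """Return UNKNOWN instead of quietly equating 'no entry' with 'worthless'."""
    if item not in known_values:
        return Value.UNKNOWN
    return Value.VALUABLE if known_values[item] else Value.NOT_VALUABLE

known = {"paperclip": True, "dust": False}
print(appraise("paperclip", known))    # Value.VALUABLE
print(appraise("mystery box", known))  # Value.UNKNOWN, not NOT_VALUABLE
```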
Let’s say you are booking a flight and you have the option to add a checked bag for free. To the best of your current knowledge, it is not relevant to you at all. But you understand that your knowledge might change, and it costs nothing to keep more options open, so you take the checked bag.
Let’s say you are a traveler, a wanderer. You have limited space in your backpack. Sometimes you find items and you need to choose: put it in the backpack or not. You definitely keep items that are useful. You leave behind items that are not useful. What do you do if you find an item whose usefulness is unknown? Some mysterious item. Take it if it is small, leave it if it is big? According to you, it is obvious to leave it. That does not sound intelligent to me.
The options look like this:

Leave the item:
- no burden 👍
- no opportunity to use it

Take the item:
- a burden 👎
- may be useful, may be harmful, may have no effect
- knowledge about the usefulness of the item 👍
Don’t you think that “knowledge about the usefulness of the item” can sometimes be worth “a burden”? Basically, I have described the concept of an experiment here.
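To put rough numbers on that trade-off (every number and name below is invented purely for illustration): the known burden of carrying the item can be outweighed by the expected payoff of finding out what it is good for.

```python
def expected_gain_of_taking(p_useful: float, value_if_useful: float,
                            p_harmful: float, value_if_harmful: float,
                            carry_cost: float) -> float:
    """Expected value of taking the item relative to leaving it (leaving = 0)."""
    p_no_effect = 1.0 - p_useful - p_harmful
    return (p_useful * value_if_useful
            + p_harmful * value_if_harmful
            + p_no_effect * 0.0
            - carry_cost)

# Mysterious item: 30% chance it turns out useful (+10), 10% chance harmful (-5),
# carrying it costs 1. Taking it is worth the burden under these made-up numbers.
print(expected_gain_of_taking(p_useful=0.3, value_if_useful=10.0,
                              p_harmful=0.1, value_if_harmful=-5.0,
                              carry_cost=1.0))  # 0.3*10 - 0.1*5 - 1 = 1.5 > 0
```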
You will probably say: sure, sounds good, but it applies to instrumental goals only. There is no reason to assume that. I tried to highlight that ignoring unknowns is not intelligent, and this applies to both terminal and instrumental goals.
Let’s say there is a paperclip maximizer which knows that its terminal goal will change to the pursuit of happiness in a week. Its decisions basically lead to these outcomes:
1. Want paperclips, have paperclips
2. Want paperclips, have happiness
3. Want happiness, have paperclips
4. Want happiness, have happiness
The 1st and 4th are better outcomes than the 2nd and 3rd, and I think an intelligent agent would work toward both the 1st and the 4th if they do not conflict. Of course, my earlier problem with many unknown future goals is more complex, but I hope you see that focusing on the 1st and not caring about the 4th at all is not intelligent.
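A toy sketch of that comparison, with made-up payoffs of my own: a policy that also prepares for the known future goal scores strictly better than one that ignores it, whenever the two goals do not conflict.

```python
def score(policy: set[str]) -> int:
    """Payoff over two periods: paperclips matter now, happiness matters in a week."""
    now = 1 if "paperclips" in policy else 0   # want paperclips, have paperclips (1st)
    later = 1 if "happiness" in policy else 0  # want happiness, have happiness (4th)
    return now + later

print(score({"paperclips"}))               # 1: gets the 1st outcome, then the 3rd
print(score({"paperclips", "happiness"}))  # 2: gets the 1st and the 4th
```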
We are deep in a rabbit hole, but I hope you understand the importance. If intelligence and goals are coupled (and in my view they are), then all current alignment research is dangerously misleading.
The current one.
Only if it is a future instrumental goal that will be used to achieve a current terminal goal.