Empirical claims. (Creating a specific example, i.e. running code, does not demonstrate that a behavior is “natural”, but it can contribute to an understanding of what conditions, if any, give rise to the hypothesized behavior.*)
> Of course, the manipulation above happened because the programmers didn’t understand what the algorithm’s true loss function was. They thought it was “minimise overall loss on classification”, but it was actually “keep each dataset loss just above 0.1”.
This seems incorrect. The scenario showed that, with that setup, optimizing “minimise overall loss on classification” led to the behavior “keep each dataset loss just above 0.1”. Semantics, perhaps, but the issue isn’t that the algorithm was accidentally programmed to keep each dataset’s loss just above 0.1; rather, that behavior is a result of what it learned in that setup. (A toy sketch after the footnote illustrates this.)
*A tendency to forget things could be a blessing: a representation of the world might never be built, and a “manipulative” strategy never found. (One could argue that by this definition humans are “manipulative” whenever we change our environment; tool use is obviously a form of ‘manipulation’, if only ‘manipulating using our hands/etc.’. Similarly, if communication works, it can lead to change...)
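To make the point concrete, here is a minimal sketch. All of the numbers and the retraining rule are hypothetical, not taken from the original story: suppose the training harness swaps in a harder dataset whenever the current dataset’s loss drops below 0.1. A learner that simply minimizes overall loss then does better by plateauing just above the threshold than by driving each dataset’s loss toward zero, so “keep each dataset loss just above 0.1” falls out of the setup rather than out of mis-programming.

```python
# Toy illustration (hypothetical setup): whenever a dataset's loss dips below
# THRESHOLD, the harness swaps in a harder dataset with a higher starting loss.
# Under that rule, holding loss just above the threshold yields a lower mean
# loss than fully minimizing each dataset in turn.

THRESHOLD = 0.1        # harness swaps datasets when loss dips below this
HARD_RESET_LOSS = 0.5  # starting loss of each newly swapped-in dataset
DECAY = 0.9            # per-step multiplicative improvement while training
STEPS = 200

def run(floor: float) -> float:
    """Train, never letting loss fall below `floor`; return the mean loss."""
    loss, total = HARD_RESET_LOSS, 0.0
    for _ in range(STEPS):
        loss = max(loss * DECAY, floor)
        if loss < THRESHOLD:          # dipping below triggers a harder dataset
            loss = HARD_RESET_LOSS
        total += loss
    return total / STEPS

# Policy A: genuinely minimize each dataset's loss (floor of 0 keeps
# triggering swaps to harder datasets, raising the running average).
# Policy B: plateau just above the threshold (floor = 0.11, no swaps).
print(f"minimize fully : mean loss = {run(0.0):.3f}")
print(f"hold above 0.1 : mean loss = {run(0.11):.3f}")
```

Running this, the plateau policy ends up with roughly half the mean loss of the fully-minimizing one, despite never optimizing any single dataset as hard, which is the sense in which the threshold-keeping behavior is learned rather than programmed.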
> There is no clear division, currently, between mild manipulation and disastrous manipulation.
The story didn’t seem to include a disaster.
No. That’s a separate issue: it is the reason that mild manipulation could be a problem in the first place. If we had a clear division, I wouldn’t care about mild manipulation.