Reasoning about utility functions, i.e. restricting attention from the whole of mindspace to its consequentialist subspace, seems a misstep: slightly changing a utility function tends to change alignment a lot, while slightly changing a deontological injunction might not, which would make the broader mindspace easier for us to hill-climb.
Perhaps we should have some mathematical discussion of utility-function space, mindspace, its consequentialist subspace, the injection Turing machines → mindspace, the function mindspace → alignment, how well that function can be optimized, properties of the above that make for good lemmas (such as continuity), mindspace modulo equal utility functions, etc.
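To seed that discussion, here is a minimal formal sketch of the objects listed above. Every symbol, and the choice of [0,1] as the alignment codomain, is my own assumption rather than established notation:

```latex
% Hypothetical formalization -- all names below are my own notation.
\begin{align*}
  &\mathcal{M} \;\text{(mindspace)}, \qquad
   \mathcal{U} \;\text{(utility-function space)},\\
  &c : \mathcal{U} \to \mathcal{M}, \qquad
   C := \operatorname{im}(c) \subseteq \mathcal{M}
   \;\text{(the consequentialist subspace)},\\
  &\iota : \mathrm{TM} \hookrightarrow \mathcal{M}
   \;\text{(the injection from Turing machines)},\\
  &a : \mathcal{M} \to [0,1]
   \;\text{(alignment; the codomain is an assumption)},\\
  &m \sim m' \iff U_m = U_{m'}, \qquad
   \mathcal{M}/{\sim}
   \;\text{(mindspace modulo equal utility functions,}\\
  &\qquad\qquad\text{where } U_m \text{ is the utility function mind } m
   \text{ implements, if any)}.
\end{align*}
% The fragility worry is then a continuity claim: the restriction a|_C
% (equivalently a composed with c) has a large or unbounded modulus of
% continuity with respect to a metric on U, whereas along "deontological"
% directions in M the alignment function may be closer to Lipschitz,
% which is what would make hill-climbing viable there.
```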
Aaand I’ve started it. What shape has mindspace?
My gut response is that hill-climbing is itself consequentialist, so this doesn't really help with fragility of value; if you get the hill-climbing direction slightly wrong, you'll still end up somewhere very wrong. On the other hand, Paul's approach rests on something we could call a deontological approach to the hill-climbing part (i.e., amplification steps do not rely on throwing more optimization power at a pre-specified function).
We are the ones doing the hill-climbing, and implementing other object-level strategies does not help. Paul proposes something, we estimate the design's alignment, he tweaks the design to improve it. That's the hill-climbing I mean.
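For concreteness, a toy sketch of that loop. Everything here is hypothetical (the names `propose_tweak` and `estimate_alignment`, the quadratic stand-in landscape, the noise model); the only point it makes is that the thing being optimized is our estimate of alignment, not alignment itself:

```python
# Toy sketch of the propose / estimate / tweak loop described above.
# All names and the stand-in landscape are hypothetical illustrations,
# not anyone's actual procedure.
import random

def estimate_alignment(design: list[float]) -> float:
    """Stand-in for our fallible judgment of a design's alignment."""
    true_score = -sum(x * x for x in design)    # unknown true landscape
    return true_score + random.gauss(0.0, 0.1)  # our estimate is noisy

def propose_tweak(design: list[float]) -> list[float]:
    """Stand-in for Paul proposing a small modification to the design."""
    return [x + random.gauss(0.0, 0.05) for x in design]

def hill_climb(design: list[float], steps: int = 1000) -> list[float]:
    best_estimate = estimate_alignment(design)
    for _ in range(steps):
        candidate = propose_tweak(design)
        candidate_estimate = estimate_alignment(candidate)
        # We keep whichever tweak *looks* better to us; to whatever extent
        # our estimates are off, we are climbing the wrong hill.
        if candidate_estimate > best_estimate:
            design, best_estimate = candidate, candidate_estimate
    return design

print(hill_climb([1.0, -0.5, 0.25]))
```

Nothing depends on the specifics; the sketch just makes explicit where the gap between estimated and actual alignment enters the loop.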