niplav comments on Slim overview of work one could do to make AI go better (and a grab-bag of other career considerations)

niplav 21 Mar 2024 8:20 UTC
2 points

Prevent sign flip and other near misses

The problem with one proposed solution (adding a dummy utility function that strongly disvalues a specific non-suffering thing) is that the resulting utility function is not reflectively stable.

So a theory of value formation, and especially of how agents achieve vNM coherence (or coherence under whatever framework for rational preferences turns out to be the “correct” one), would be useful here. Then, during the process of value formation, humans could supervise decision points (i.e., decide in which direction to resolve a given preference).
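
To make "supervising decision points" a bit more concrete, here is a minimal Python sketch. It is only a toy illustration under strong assumptions, not an established method: it covers just the completeness and transitivity parts of vNM coherence over a finite set of outcomes (no lotteries, so independence and continuity are ignored), and the names `transitive_closure`, `resolve`, and `supervisor` are hypothetical. Each pair of outcomes that the current preferences leave unordered counts as a decision point, and a supervisor callback picks the direction.

```python
from itertools import combinations


def transitive_closure(prefs):
    """Smallest transitive relation containing prefs.

    prefs is a set of pairs (a, b), read as "a is strictly preferred to b".
    """
    closure = set(prefs)
    changed = True
    while changed:
        changed = False
        for a, b in list(closure):
            for c, d in list(closure):
                if b == c and (a, d) not in closure:
                    closure.add((a, d))
                    changed = True
    return closure


def resolve(outcomes, prefs, supervisor):
    """Complete a partial strict preference relation, one decision point at a time.

    supervisor(a, b) is a hypothetical callback standing in for human oversight.
    It must return (a, b) or (b, a), i.e. (preferred, dispreferred).
    """
    prefs = transitive_closure(prefs)
    # A pair (x, x) in the closure means the input preferences contain a cycle.
    if any(a == b for a, b in prefs):
        raise ValueError("preferences contain a cycle")
    for a, b in combinations(outcomes, 2):
        if (a, b) not in prefs and (b, a) not in prefs:
            # Decision point: the preference between a and b is unresolved.
            choice = tuple(supervisor(a, b))
            if choice not in ((a, b), (b, a)):
                raise ValueError("supervisor must return (a, b) or (b, a)")
            # Ordering a previously incomparable pair cannot create a cycle.
            prefs = transitive_closure(prefs | {choice})
    return prefs


# Usage: the only initial preference is A over B; the supervisor here
# (an arbitrary stand-in) always prefers the alphabetically earlier outcome.
completed = resolve(
    ["A", "B", "C"],
    {("A", "B")},
    lambda a, b: (a, b) if a < b else (b, a),
)
```

In this toy version the supervisor is consulted only for pairs that the already-resolved preferences do not settle, so earlier decisions constrain later ones via the transitive closure.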