Hi all, thanks for taking the time to comment. I’m sure it must be a bit frustrating to read a post as light on technical terminology as this one, so I really appreciate your input. I’ll just write a couple of lines to summarize my idea, which is to design an AI that:
1- uses an initial utility function U, defined in absolute terms rather than subjective terms (for instance “survival of the AI” rather than “my survival”);
2- doesn’t try to learn a utility function for humans or for other agents, but applies to everyone the same utility function U it uses for itself;
3- updates this utility function when things don’t go to plan, so that its predictions improve over time (see the toy sketch after this list).
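To make this a bit more concrete, here is a minimal Python sketch of what I have in mind. Everything here is hypothetical and just for illustration (the class name, the feature-dictionary representation of world states, the `predict` callable, the learning rate): the point is only that a single U, stated over objective features rather than agent-relative ones, scores outcomes for every agent alike, and that U’s parameters get nudged whenever outcomes diverge from predictions.

```python
class SymmetricUtilityAI:
    """Toy sketch of the three-point design above. All names are
    made up; U is a weighted sum over objective world-state features,
    e.g. {"survival_of_agent": 1.0} -- never "my survival"."""

    def __init__(self, weights):
        self.weights = dict(weights)

    def utility(self, state):
        # point 2: the same U is applied to any agent's situation;
        # `state` is an objective feature -> value description
        return sum(self.weights.get(f, 0.0) * v for f, v in state.items())

    def choose_action(self, actions, predict):
        # act so as to maximize U over *predicted* outcomes
        return max(actions, key=lambda a: self.utility(predict(a)))

    def update(self, predicted, observed, lr=0.1):
        # one reading of point 3: when things don't go to plan,
        # nudge the weights toward what actually happened, so
        # future evaluations track reality more closely
        for f, v in observed.items():
            err = v - predicted.get(f, 0.0)
            self.weights[f] = self.weights.get(f, 0.0) + lr * err


# The same U scores the AI's survival and a human's survival identically:
ai = SymmetricUtilityAI({"survival_of_agent": 1.0})
assert ai.utility({"survival_of_agent": 1.0}) == \
       ai.utility({"survival_of_agent": 1.0})  # one U for everyone
```

Again, this is just a sketch under my own assumptions; I may well be misreading what point 3 would require in a real system.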
Is such a design technically feasible? Am I right in thinking that it would make the AI “transparent”, in the sense that it would have no motivation to mislead us? And wouldn’t this design also make the AI indifferent to our actions, which seems desirable too?
It’s true that different people would have different values, so I’m not sure how to deal with that. Any thoughts?