The idea of superintelligence at stake isn’t “good at inferring what people want and then decides to do what people want,” it’s “competent at changing the environment”.
It’s both. Superintelligence is definitionally equal to or greater than human ability at a variety of tasks, so it implies an equal or greater ability to understand words and concepts. Also, competence at changing the environment requires accurate beliefs. So the default expectation is accuracy. If you think an AI would be selectively inaccurate about its values, you need to explain why.
And if you program an explicit definition of ‘happiness’ into a machine…
What has that to do with NNs? You seem to be just regurgitating standard dogma. There is no reason to expect…