I found this to be interesting/valuable commentary after just reading through Stuart’s agenda.
I think speculating about the human-scale logical relationships between hypothesized parts inside the AI is easier but less useful than speculating about the algorithm that will actually connect your inputs to your outputs given a big model and lots of computing power, which may or may not have your logical steps as emergent features.
With this more-compute/less-abstraction approach you're suggesting, do you mean that it may produce a model of the preferences that's inscrutable to humans? If so, that could be an issue for getting the human's buy-in. He touches on this in section 4.5, noting "the human tendency to reject values imposed upon them, just because they are imposed upon them," and that the AI may need to involve the human in the construction of the utility function.
- How much of the hidden detail is in doing meta-reasoning? If I don't trust an AI, more steps of meta-reasoning make me trust it even less: humans often say things about meta-reasoning that would be disastrous if implemented. What kind of amazing faculties would an AI need to extract partial preferences about meta-reasoning that actually made things better rather than worse? If I were better at understanding what the details actually are, maybe I'd pick on meta-reasoning more.
Which part of his post are you referring to by “meta-reasoning”? Is it the “meta-preferences”?