I’m interested in whether explaining brain data causes more weight to be placed on hypotheses that seem similar to our naive introspection, or less. Arguments for “less” might be if our introspection is self-deceptive in ways we don’t want to be disabused of, or if more data would promote a more microphysical, less psychological way of looking at humans. But I think our success rides on overcoming these arguments and designing AI where more is better.
I guess what I’m saying is that I see work on meta-preferences and regularization as being crucial inputs to work that uses brain data, and conversely, using brain data might be an important testbed for our ability to extract psychological models from physical data. Does that make sense?
Thanks for your thoughts! I think I’m having a bit of trouble unpacking this. Can you help me unpack this sentence:
“But I our success rides on overcoming these arguments and designing AI where more is better.”
What is “more”? And what are “these arguments”? And how does this sentence relate to the question of whether explaining brain data makes us place more or less weight on similar-to-introspection hypotheses?
Whoops, I accidentally a word there.
I’ve edited that sentence to “But I think our success rides on overcoming these arguments and designing AI where more is better.”
Where “more” means more data about humans, or more ability for the AI to process the information it already has. And “these arguments” means the arguments for why too much data might lead the AI to do things we don’t want (maybe the most mathematically clear example is how CIRL stops being corrigible if it can accurately predict you).
So to rephrase: there are some reasons why adding brain activity data might cause current AI designs to do things we don’t want. That’s bad; we want value learning schemes that come with principled arguments that more data will lead to better outcomes.
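To make the CIRL point a bit more concrete, here is a minimal sketch (my own illustration, not from the discussion above) of the off-switch-game intuition behind it: the robot’s strict incentive to defer to the human comes from its uncertainty about the human’s utility for its proposed action, and that incentive shrinks to zero as its prediction of the human becomes exact. The Gaussian belief, the payoff numbers, and the rational-human assumption are all assumptions made for illustration.

```python
# Hedged sketch of the off-switch-game intuition: the robot's incentive to
# defer (stay corrigible) comes from uncertainty about the human's utility U
# for its proposed action. Assumptions: Gaussian belief over U, and a rational
# human who allows the action iff U > 0.
import numpy as np

def expected_values(mean, std, n_samples=200_000, seed=0):
    """Expected utility of: acting unilaterally, switching off, deferring."""
    rng = np.random.default_rng(seed)
    u = rng.normal(mean, std, n_samples)   # samples from the robot's belief over U
    act = u.mean()                          # act now: E[U]
    off = 0.0                               # switch itself off: utility 0
    defer = np.maximum(u, 0.0).mean()       # defer: human allows only if U > 0, so E[max(U, 0)]
    return act, off, defer

# As the robot's predictive uncertainty about the human shrinks, the advantage
# of deferring over just acting (or switching off) goes to zero.
for std in [2.0, 1.0, 0.5, 0.1, 0.01]:
    act, off, defer = expected_values(mean=0.3, std=std)
    print(f"std={std:5.2f}  defer-advantage={defer - max(act, off):.4f}")
```

Running this, the defer-advantage is clearly positive when the belief is broad and essentially vanishes once the robot can predict the human’s answer, which is the sense in which more predictive power can erode corrigibility in this toy model.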