I feel like handling human irrationality is a major motivation for agent-foundations-style research. What is the type of human values? How do we model human (ir)rationality to disentangle (unstable) goals from (imperfect) decision-making procedures when figuring out what humans want? What's the thing the human brain is an approximation of, and how does it approximate that ideal? I know getting a model of human irrationality is important to Vanessa, and she thinks working on meta-cognition will make some progress on that. Scott Garrabrant's work, of which geometric rationality is the latest example, makes some headway on these problems. I'd go even further and say that a lot of non-agent-foundations agendas are motivated by this problem. For instance, Steve Byrnes's agenda tackles the "modelling human rationality" part more directly than any other agenda I can think of.
But that's the thing: there's been minor progress at best, and researchers who work on these problems are pessimistic about further progress. I certainly don't know of anything that really looks like it would work. Maybe our perspective is wrong. I would guess that shard theorists would agree with that claim, and would say that shard theory looks like a more promising route to answering these questions (p~0.4).
So whilst the problem isn’t explicitly written about that much, I think a lot of researchers would contest that the problem is being ignored.