Thank you for this post! As I mentioned to both of you, I like your approach here. In particular, I appreciate the attempt to provide some description of how we might optimize for something we actually want, something like wisdom.
I have a few assorted thoughts for you to consider:
I would be interested in additional discussion around the inherent boundedness of agents that act in the world. I think self-consistency and inter-factor consistency have some fundamental limits that could be worth exploring within this framework. For example, might different types of boundedness systematically undermine wisdom in ways that we can predict or try to account for? You point out that these forms of consistency are continuous, which I think is a useful step in this direction.
I’m wondering about feedback mechanisms for a wise agent in this context. For example, it would be interesting to know a little more about how a wise agent incorporates feedback from the consequences of its own actions into its model. I would be interested to see more in this direction in any future posts.
It strikes me that this post titled “Agency from a Causal Perspective” (https://www.alignmentforum.org/posts/Qi77Tu3ehdacAbBBe/agency-from-a-causal-perspective) might be of some particular interest to your approach here.
Excellent post here! I hope the comments are helpful!