This is an excellent post, thank you very much for it. Below is an assortment of remarks and questions.
The table is interesting. Here’s my attempt at estimating the values for helpfulness, educational value, and 2022 neglect (epistemic effort: I made these up on the spot from intuition):
OOD robustness: 2⁄10, 4⁄10, 2⁄10
Agent foundations: 5⁄10, 7⁄10, 9⁄10
Multi-agent RL: 1⁄10, 4⁄10, 2⁄10
Preference learning: 8⁄10, 7⁄10, 2⁄10
Side-effect minimization: 7⁄10, 5⁄10, 4⁄10
Human-robot interaction: 1⁄10, 1⁄10, 2⁄10
Interpretability: 9⁄10, 8⁄10, 3⁄10
Fairness in ML: 2⁄10, 4⁄10, 1⁄10
Computational social choice: 3⁄10, 5⁄10, 2⁄10
Accountability in ML: 6⁄10, 2⁄10, 3⁄10
Furthermore:
> Contributions to preference learning are not particularly helpful to existential safety in my opinion, because their most likely use case is for modeling human consumers just well enough to create products they want to use and/or advertisements they want to click on. Such advancements will be helpful to rolling out usable tech products and platforms more quickly, but not particularly helpful to existential safety.*
>
> *) I hope no one will be too offended by this view. I did have some trepidation about expressing it on the “alignment forum”, but I think I should voice these concerns anyway, for the following reason.
While I don’t put a lot of probability on this view, I’m glad it was expressed here.
One might want to distinguish between the different layers of the Do-What-I-Mean hierarchy; I believe this text is mostly talking about the first and second layers (excluding “zero DWIMness”). Perhaps there could be additional risks from companies having a richer understanding of human preferences and of how those preferences differ from biases?
> However, the need to address multilateral externalities will arise very quickly after unilateral externalities are addressed well enough to roll out legally admissible products, because most of our legal systems have an easier time defining and punishing negative outcomes that have a responsible party. I don’t believe this is a quirk of human legal systems: when two imperfectly aligned agents interact, they complexify each other’s environment in a way that consumes more cognitive resources than interacting with a non-agentic environment.
I have a hard time coming up with an example that distinguishes a unilateral externality from a multilateral one; what would such an example look like? The ARCHES paper doesn’t contain the string “multilateral externalit”.
Thinking out loud: if we have three countries A, B and C, all of which have a coast on the same ocean, and A and B release two different chemicals N and M that are each harmless on their own, but in combination form a toxic compound N₂M₃ that harms the citizens of all three countries equally, is that a multilateral externality?
Trying to generalise: is any situation in which two or more actors harm other actors, or one or more actors harm two or more other actors, by polluting a shared resource, a multilateral externality?
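To pin down my own intuition, here is a minimal sketch of the chemical scenario above (a toy model I made up on the spot, not something taken from the post or from ARCHES; the harm function and constants are arbitrary): in the multilateral case the harm depends jointly on both countries’ releases and vanishes if either stops, whereas in the unilateral case one actor’s releases alone produce the harm, so a responsible party is easy to identify.

```python
# Toy sketch of the chemical example above (my own made-up model, not from
# the post or ARCHES): harm arises only from the *combination* of A's and
# B's emissions, so no single country is counterfactually responsible.

def harm_multilateral(n_released_by_A: float, m_released_by_B: float) -> float:
    """Harm to coastal citizens from the compound N2M3.

    The compound needs both N and M, so the harm is bottlenecked by
    whichever ingredient is scarcer (min), scaled by an arbitrary constant.
    """
    TOXICITY = 1.0  # arbitrary units of harm per unit of compound
    return TOXICITY * min(n_released_by_A, m_released_by_B)

def harm_unilateral(n_released_by_A: float) -> float:
    """Contrast case: A alone dumping N causes harm, regardless of B."""
    return 0.5 * n_released_by_A

if __name__ == "__main__":
    # Multilateral case: either country halting its releases drives harm to 0,
    # so blame (and legal liability) can't be pinned on a single party.
    print(harm_multilateral(10.0, 10.0))  # 10.0 -> harm occurs
    print(harm_multilateral(0.0, 10.0))   # 0.0  -> A's releases alone are harmless
    print(harm_multilateral(10.0, 0.0))   # 0.0  -> B's releases alone are harmless
    # Unilateral case: harm is attributable to A no matter what B does.
    print(harm_unilateral(10.0))          # 5.0
```

If that captures the intended distinction, then the legal difficulty mentioned in the quoted passage seems to come precisely from this min-like interaction: there is no single party whose removal would undo the harm.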
Two more (small) questions:
Is “translucent game theory” the same as “open-source game theory”?
You say you prefer talking about papers, but do you by chance have a recommendation for CSC textbooks? The handbook you link doesn’t have exercises, if I remember correctly.