Thanks for opening minds to the possibility that agents & their utility functions may not be the most fruitful way to think about these questions. Could you provide a few pointers to the "notably not all" from point 5?
Brain-like AGI safety
Shard Theory
Iterated Amplification
Much of interpretability work
Possibly Pragmatic AI Safety; I don't know much about it.
The selection theorems branch of research
The particular selection theorem case of modularity
Thanks! By interpretability work, do you mean work in the vein of Colah and the like?
Yes