Thank you so much for writing this thorough and well-thought-out post, Eli and Charlotte! I think there is a high probability that TAI will be developed soon, and it is excellent that you have laid out your views on the implications for alignment in such detail.
Regarding your point on interpreting honesty and other concepts from the AI’s internal states: I wanted to share a recent argument I’ve made that the practical upside of interpretability may be lower than originally thought. This is because the interaction dynamics between the AI agent and its environment will likely be subject to substantial complexity and computational irreducibility.