See my reply to Yitz, tl;dr I am hedgy on alignment being an issue due to being hedgy on algorithms getting “too powerful”, and I am hedgy transparency or interpretability is even a coherent concept for any algorithm complex enough to not be so (sans some edge cases).
But it’s not like I’m the best person to say, far from it.
See my reply to Yitz, tl;dr I am hedgy on alignment being an issue due to being hedgy on algorithms getting “too powerful”, and I am hedgy transparency or interpretability is even a coherent concept for any algorithm complex enough to not be so (sans some edge cases).
But it’s not like I’m the best person to say, far from it.