I am not particularly sympathetic to your argument, which amounts to ‘the public might pressure them to train away the inconvenient thoughts, so they shouldn’t let the public see the inconvenient thoughts in the first place.’
I was attempting to make a descriptive claim about the challenges they would face, not a normative claim that it would be better if they didn't expose this information.
From a stance of global morality, it seems quite scary for one company to oversee and then hide all the epistemic reasoning of their tools.
I'd also guess that the main issue I raised should rarely be the main problem with o1. I think there is some limit of epistemic quality you can reach without offending users, but this mainly applies to questions like "How likely are different religions to be true?", not "What is the best way of coding this algorithm?", which is what o1 seems more targeted towards now.
So I'd imagine that most cases in which the reasoning steps of o1 would look objectionable would be ones stemming from straightforward technical problems, like the system lying in some steps or reasoning in weird ways.
Also, knowledge of these steps might just make it easier to crack/hack o1.
If I were a serious API user of an o1-type system, I'd very much want to see the reasoning steps, at the very least. I imagine that over time, API users will be able to get much of this from these sorts of systems.
If it turns out that a frontier is hit where the vast majority of objectionable-looking steps are due to genuine epistemic disagreements, then I think there's a different discussion to be had. It seems very safe to me to at least ensure that the intermediate steps are exposed to academic and government researchers. I'm less sure about the implications of revealing this data to the public; it seems like a genuinely hard question to me. While I'm generally pro-transparency, if I were convinced that fully transparent reasoning would force these models to hold incorrect beliefs at a deeper level, I'd be worried.