I’m in support of this sort of work. Generally, I like the idea of dividing up LLM architectures into many separate components that could be individually overseen / aligned.
Separating “Shoggoth” / “Face” seems like a pretty reasonable division to me.
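To be concrete about the division I have in mind, here is a minimal sketch (the names and interfaces are placeholders I made up, not the actual setup proposed in the post). The point is just that the reasoning component and the presentation component are separate pieces that can be overseen and trained separately.

```python
# Hypothetical sketch of a Shoggoth/Face split; none of these names come from
# the post, they are just illustrative placeholders.
from dataclasses import dataclass


@dataclass
class Episode:
    prompt: str
    reasoning: str  # raw chain of thought from the "Shoggoth"
    answer: str     # polished, user-facing output from the "Face"


def shoggoth_reason(prompt: str) -> str:
    """Placeholder for the reasoning component; returns an unfiltered chain of thought."""
    return f"(raw, unfiltered reasoning about: {prompt})"


def face_respond(prompt: str, reasoning: str) -> str:
    """Placeholder for the presentation component; turns the reasoning into a reply."""
    return f"(polished answer to {prompt!r}, based on the hidden reasoning)"


def run_episode(prompt: str) -> Episode:
    reasoning = shoggoth_reason(prompt)       # could be overseen by CoT monitors only
    answer = face_respond(prompt, reasoning)  # could be shaped by user-facing policies
    return Episode(prompt, reasoning, answer)


episode = run_episode("Propose changes to NYC zoning policy.")
print(episode.answer)     # what the user sees
print(episode.reasoning)  # what (some) overseers see
```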
At the same time, there are definitely a lot of significant social+political challenges here.
I suspect that one reason why OpenAI doesn’t expose all the thinking of o1 is that this thinking would upset some users, especially journalists and such. It’s hard enough making sure that the final outputs are sufficiently unobjectionable to go public at a large scale. It seems harder to make sure the full set of steps is also unobjectionable.
One important thing that smart intellectuals do is hold objectionable/unpopular beliefs while still presenting unobjectionable/popular outputs. For example, I’m sure many of us have beliefs that could get us cancelled by some group or other.
If the entire reasoning process is exposed, that might pressure the reasoning itself to be unobjectionable, even if that trades off against accuracy.
In general, I’m personally very much in favor of transparent thinking and argumentation. It’s just that I’ve noticed this as one fundamental challenge with intellectual activity, and I also expect to see it here.
One other challenge to flag: I imagine that the Shoggoth and Face layers would require (or at least greatly benefit from) some communication back and forth. An intellectual analysis could vary heavily depending on the audience. It’s not enough to do all the intellectual work in one pass and then match it to the audience after that.
For example, if an AI were tasked with designing changes to New York City, it might matter a lot whether the audience is a religious zealot.
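As a rough illustration of the back-and-forth I mean (again, every function name here is a hypothetical placeholder): the Face knows the audience, and when the audience would need something the initial analysis didn’t cover, it asks the Shoggoth for another pass rather than just rewording a single fixed analysis.

```python
# Hypothetical sketch of Shoggoth<->Face iteration; the names and the toy
# "gap detection" rule are made up for illustration only.
from typing import Optional


def shoggoth_analyze(task: str, focus: Optional[str] = None) -> str:
    """Placeholder reasoning pass; `focus` narrows the analysis on later rounds."""
    if focus:
        return f"(analysis of {task}, focused on {focus})"
    return f"(analysis of {task})"


def face_find_gap(analysis: str, audience: str) -> Optional[str]:
    """Placeholder check: return a follow-up the audience would need, or None."""
    if audience == "religious zealot" and "focused on" not in analysis:
        return "impacts on religious communities"
    return None


def answer_for_audience(task: str, audience: str, max_rounds: int = 3) -> str:
    analysis = shoggoth_analyze(task)
    for _ in range(max_rounds):
        gap = face_find_gap(analysis, audience)
        if gap is None:
            break
        # The Face sends the gap back upstream instead of papering over it.
        analysis = shoggoth_analyze(task, focus=gap)
    return f"(presentation of {analysis}, tailored for {audience})"


print(answer_for_audience("redesigning parts of New York City", "religious zealot"))
```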
One last tiny point: in future work, I’d lean against using the phrase “Shoggoth” for the reasoning step. It sort of makes sense to this audience, but I think it’s a mediocre fit here. For one, because I assume the “Face” could have some “Shoggoth”-like properties.
Thanks!

I suspect that one reason why OpenAI doesn’t expose all the thinking of o1 is that this thinking would upset some users, especially journalists and such. It’s hard enough making sure that the final outputs are sufficiently unobjectionable to go public at a large scale. It seems harder to make sure the full set of steps is also unobjectionable.
I suspect the same thing; they almost come right out and say it (emphasis mine):
We believe that a hidden chain of thought presents a unique opportunity for monitoring models. Assuming it is faithful and legible, the hidden chain of thought allows us to “read the mind” of the model and understand its thought process. For example, in the future we may wish to monitor the chain of thought for signs of manipulating the user. However, for this to work the model must have freedom to express its thoughts in unaltered form, so we cannot train any policy compliance or user preferences onto the chain of thought. We also do not want to make an unaligned chain of thought directly visible to users.
Therefore, after weighing multiple factors including user experience, competitive advantage, and the option to pursue the chain of thought monitoring, we have decided not to show the raw chains of thought to users.
I think this is a bad reason to hide the CoT from users. I am not particularly sympathetic to your argument, which amounts to ‘the public might pressure them to train away the inconvenient thoughts, so they shouldn’t let the public see the inconvenient thoughts in the first place.’ I think the benefits of letting the public see the CoT are pretty huge, but even if they were minor, it would be kinda patronizing and an abuse of power to hide them preemptively.
I am not particularly sympathetic to your argument, which amounts to ‘the public might pressure them to train away the inconvenient thoughts, so they shouldn’t let the public see the inconvenient thoughts in the first place.’
I was attempting to make a descriptive claim about the challenges they would face, not a normative claim that it would be better for them not to expose this information.

From a stance of global morality, it seems quite scary for one company to oversee and then hide all the epistemic reasoning of its tools.
I’d also guess that the main issue I raised should rarely be the main problem with o1. I think there is some limit to the epistemic quality you can reach without offending users, but this mainly applies to questions like “How likely are different religions to be true?”, not “What is the best way of coding this algorithm?”, which is what o1 seems more targeted towards now.

So I’d imagine that most cases in which the reasoning steps of o1 would look objectionable would be ones caused by straightforward technical problems, like the system lying in some steps or reasoning in weird ways.
Also, knowledge of these steps might just make it easier to crack/hack o1.
If I were a serious API user of an o1-type system, I’d want to see the reasoning steps, at the very least. I imagine that over time, API users will be able to get a lot of this from these sorts of systems.
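Purely as an illustration of what I’d want as an API user (this is a hypothetical response shape, not the actual OpenAI API): an opt-in field that carries the intermediate reasoning alongside the final answer, so it can be inspected and logged.

```python
# Hypothetical response shape; the field names are invented for illustration.
from typing import TypedDict


class ReasoningStep(TypedDict):
    index: int
    text: str


class HypotheticalResponse(TypedDict):
    answer: str
    reasoning_steps: list[ReasoningStep]  # present only if the caller opted in


def inspect(response: HypotheticalResponse) -> None:
    """Log the intermediate steps so they can be audited later."""
    for step in response["reasoning_steps"]:
        print(f"step {step['index']}: {step['text']}")


inspect({
    "answer": "Use a size-k heap; it gives O(n log k) for the top-k query.",
    "reasoning_steps": [
        {"index": 0, "text": "Compare full sorting (O(n log n)) with a size-k heap."},
        {"index": 1, "text": "The heap wins when k is much smaller than n."},
    ],
})
```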
If it is the case that a frontier is hit where the vast majority of objectionable-looking steps are due to true epistemic disagreements, then I think there’s a different discussion to be had. It seems very safe to me to at least ensure that the intermediate steps are exposed to academic and government researchers. I’m less sure about the implications of revealing this data to the public. It generally seems like a really hard question to me. While I’m generally pro-transparency, if I were convinced that fully transparent reasoning would force these models to hold incorrect beliefs at a deeper level, I’d be worried.
I suspect the real reason OpenAI hides the CoT is to stop competitors from fine-tuning on o1’s CoT, which they also come right out and say (the “competitive advantage” factor quoted above).
Totally, yeah, that’s probably by far the biggest reason.