There’s also a failure mode of focusing on “which arguments are the best” instead of “what is actually true”. I don’t understand this failure mode very well, except that I’ve seen myself and others fall into it. Falling into it looks like fixating on specific arguments and spending a lot of time working out exactly what was meant by the words, rather than feeling comfortable adjusting the arguments to fit better into your own ontology and with your own beliefs.
My sense is that this is because different people have different intuitive priors, and process arguments (mostly) as a kind of Bayesian evidence that updates those priors, rather than modifying the priors (i.e. intuitions) directly.
Eliezer in particular strikes me as having an intuitive prior for AI alignment outcomes that looks very similar to priors for tasks like writing bug-free software on the first try, assessing the likelihood that a given plan will play out as envisioned, correctly compensating for optimism bias, etc., which is what gives rise to posts concerning concepts like security mindset.
Other people don’t share this intuitive prior, and so have to be argued into it. To such people, the reliability of the arguments in question is actually critical, because if those arguments turn out to have holes, that reverts the downstream updates and restores the original intuitive prior, whatever it looked like. It’s kind of like a souped-up version of the burden of proof concept, where the initial placement of that burden is determined entirely via the intuitive judgement of the individual.
This also seems related to why different people seem to naturally gravitate towards either conjunctive or disjunctive models of catastrophic outcomes from AI misalignment. The conjunctive impulse stems from an intuition that AI catastrophe is a priori unlikely, so a bunch of different claims have to hold simultaneously in order to force a large enough update; the disjunctive impulse stems from the notion that any given low-level claim need not be on particularly firm ground, because the high-level thesis of AI catastrophe robustly manifests via different but converging lines of reasoning.
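To make the contrast concrete (my own toy formulas, not anything from the comments above, and assuming the individual claims or paths are roughly independent): a conjunctive model multiplies the probabilities of all its claims, whereas a disjunctive model only needs one of several paths to go through:

P(\text{catastrophe}) \approx \prod_{i=1}^{n} P(\text{claim}_i) \qquad \text{vs.} \qquad P(\text{catastrophe}) \approx 1 - \prod_{j=1}^{m} \bigl(1 - P(\text{path}_j)\bigr)

In the conjunctive form, poking a hole in any one claim drags the whole product down; in the disjunctive form, refuting any one path barely moves the total.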
See also: the focus on coherence, where some people place great importance on the question of whether VNM or other coherence theorems show what Eliezer et al. purport they show about superintelligent agents, versus the competing model wherein what matters is not any individual theorem in its particulars so much as the direction they collectively point, hinting at what idealized behavior with respect to non-gerrymandered physical resources ought to look like.
I think the real question, then, is where these differences in intuition come from, and unfortunately the answer might have a lot to do with people’s backgrounds, and the habits and heuristics they picked up from said backgrounds, something quite difficult to get at via specific, concrete argumentation.
different people have different intuitive priors, and process arguments (mostly) as a kind of Bayesian evidence that updates those priors, rather than modifying the priors (i.e. intuitions) directly.
I’m not sure I understand this distinction as-written. How is a Bayesian agent supposed to modify priors except by updating on the basis of evidence?
How is a Bayesian agent supposed to modify priors except by updating on the basis of evidence?
They’re not! But humans aren’t ideal Bayesians, and it’s entirely possible for them to update in a way that does change their priors (encoded by intuitions) moving forward. In particular, the difference between having updated one’s intuitive prior, and keeping the intuitive prior around while also tracking a separate, consciously held posterior, is that the former is vastly less likely to “de-update”, because the evidence that went into the update isn’t kept around in a form that leaves it open to (potential) refutation.
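A toy numerical illustration of that asymmetry (the numbers are mine, purely for illustration): suppose the intuitive prior is P(H) = 0.2 and an argument A carries a 4:1 likelihood ratio in favor of H. Then

\frac{P(H \mid A)}{P(\lnot H \mid A)} = \frac{0.2}{0.8} \cdot 4 = 1 \quad\Rightarrow\quad P(H \mid A) = 0.5

An agent who keeps the argument around as explicit evidence can, upon finding a hole in it, divide the 4:1 factor back out and revert to 0.2; an agent who has already folded the 0.5 into their intuitive prior has nothing to divide back out, and stays near 0.5.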
(IIRC, E.T. Jaynes talks about this distinction in Chapter 18 of Probability Theory: The Logic of Science, and he models it by introducing something he calls an A_p distribution. His exposition of this idea is uncharacteristically unclear, and his A_p distribution looks basically like a beta distribution with specific values for α and β, but it does seem to capture the distinction I see between “intuitive” and “conscious” updating.)
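For readers without the book to hand, the gist of Jaynes’s construction as I remember it (so treat the details with caution): he introduces propositions A_p defined by the property

P(A \mid A_p E) \equiv p, \qquad P(A \mid E) = \int_0^1 p \, (A_p \mid E) \, dp

so that evidence updates the entire density (A_p \mid E) over p rather than just the single number P(A \mid E). A sharply peaked density, e.g. a beta distribution with large α and β, then corresponds to a belief that further arguments will barely move, i.e. one that won’t “de-update”.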