Jacy Reese Anthis
PhD candidate in sociology and statistics at the University of Chicago. Co-founder of the Sentience Institute. Word cloud: agency, benchmarks, causality, digital minds, generalization, HCI, moral circle expansion, NLP, RLHF, robustness
I don’t think the “actual claim” is necessarily true. You need more assumptions than a fixed difficulty of AGI, assumptions that I don’t think everyone would agree with. I walk through two examples in my comment: one that implies “Gradual take-off implies shorter timelines” and one that implies “Gradual take-off implies longer timelines.”
I agree with this post that the accelerative forces of gradual take-off (e.g., “economic value… more funding… freeing up people to work on AI...”) are important and not everyone considers them when thinking through timelines.
However, I think the specific argument that “Gradual take-off implies shorter timelines” requires a prior belief that not everyone shares, such as a prior that an AGI of difficulty D will occur in the same year in both timelines. I don’t think such a prior is implied by “conditioned on a given level of ‘AGI difficulty.’” Here are two example priors, one that leads to “Gradual take-off implies shorter timelines” and one that leads to the opposite. The first sentence of each is most important.
Gradual take-off implies shorter timelines
Step 1: (Prior) Set AGI of difficulty D to occur at the same year Y in the gradual and sudden take-off timelines.
Step 2: Notice that the gradual take-off timeline has AIs of difficulties like 0.5D sooner, which would make AGI occur sooner than Y because of the accelerative forces of “economic value… more funding… freeing up people to work on AI...” etc. Therefore, move AGI occurrence in gradual take-off from Y to some year before Y, such as 0.5Y.
=> AGI occurs at 0.5Y in the gradual timeline and Y in the sudden timeline.
Gradual take-off implies longer timelines
Step 1: (Prior) Set AI of difficulty 0.5D to occur at the same year Y in the gradual and sudden take-off timelines. To fill in AGI of difficulty D in each timeline, suppose both capability trajectories are superlinear, but the sudden take-off is steep enough that AGI arrives at exactly Y while gradual AGI arrives at 1.5Y.
Step 2: Notice that the gradual take-off timeline has AIs of difficulties like 0.25D sooner, which would make the 0.5D AI occur sooner than Y (and AGI sooner than 1.5Y) because of the accelerative forces of “economic value… more funding… freeing up people to work on AI...” etc. Therefore, move the 0.5D AI in the gradual take-off timeline from Y to some year before Y, such as 0.5Y, and move AGI in the gradual take-off timeline correspondingly from 1.5Y to 1.25Y.
=> AGI occurs at 1.25Y in the gradual timeline and Y in the sudden timeline.
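To make the arithmetic concrete, here is a toy sketch of the two priors (the numbers are purely illustrative and just restate the two examples above):

```python
# Toy restatement of the two priors above; all numbers are illustrative.
Y = 40  # year at which the prior anchors the relevant AI system (arbitrary units)

# Prior 1: AGI of difficulty D anchored at Y in both timelines,
# then accelerative forces pull gradual AGI earlier (e.g., to 0.5Y).
sudden_agi_1, gradual_agi_1 = Y, 0.5 * Y

# Prior 2: AI of difficulty 0.5D anchored at Y in both timelines; AGI filled in
# at Y (sudden) and 1.5Y (gradual), then acceleration moves the gradual numbers
# to 0.5Y (for 0.5D) and 1.25Y (for AGI).
sudden_agi_2, gradual_agi_2 = Y, 1.25 * Y

print(gradual_agi_1 < sudden_agi_1)  # True: gradual take-off implies shorter timelines
print(gradual_agi_2 < sudden_agi_2)  # False: gradual take-off implies longer timelines
```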
By the way, this is separate from Stefan_Schubert’s critique that very short timelines are possible with sudden take-off but not with gradual take-off. I personally think that critique can be considered a counterexample if we treat the impossibility of a very short gradual take-off as “long,” but not really a counterexample if we instead consider the shortness comparison indeterminate because there are no very short gradual timelines.
I think another important part of Pearl’s journey was that during his transition from Bayesian networks to causal inference, he was very frustrated with the correlational turn in early 1900s statistics. Because causality is so philosophically fraught and often intractable, statisticians shifted to regressions and other acausal models. Pearl sees that as throwing out the baby (important causal questions and answers) with the bathwater (messy empirics and a lack of mathematical language for causality, which is why he coined the do operator).
Pearl discusses this at length in The Book of Why, particularly the Chapter 2 sections on “Galton and the Abandoned Quest” and “Pearson: The Wrath of the Zealot.” My guess is that Pearl’s frustration with statisticians’ focus on correlation was immediate upon getting to know the field, but I don’t think he’s publicly said how his frustration began.
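As a toy illustration of the gap Pearl cared about (my own example, not his): with a single confounder, a regression-style conditional reports an “effect” that vanishes under intervention, which is exactly the distinction the do operator was coined to express.

```python
# Toy confounding example: X has no causal effect on Y, but an acausal
# regression still finds a strong association.
import numpy as np

rng = np.random.default_rng(0)
n = 100_000
z = rng.normal(size=n)            # confounder
x = z + rng.normal(size=n)        # observationally, X is driven by Z
y = 2 * z + rng.normal(size=n)    # Y is driven by Z only; X never affects Y

# What a regression of Y on X reports (observational association)
obs_slope = np.cov(x, y)[0, 1] / np.var(x)

# What an intervention do(X) reports: set X by fiat, cutting the Z -> X arrow
x_do = rng.normal(size=n)
y_do = 2 * z + rng.normal(size=n)  # unchanged, because X has no effect on Y
do_slope = np.cov(x_do, y_do)[0, 1] / np.var(x_do)

print(round(obs_slope, 2), round(do_slope, 2))  # ~1.0 observationally vs. ~0.0 under do()
```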
This model was produced by fine-tuning DeBERTa XL on a dataset produced by contractors labeling a bunch of LM-generated completions to snippets of fanfiction that were selected by various heuristics to have a high probability of being completed violently.
I think you might get better performance if you train your own DeBERTa-XL-like model with classification of different snippets as a secondary objective alongside masked token prediction, rather than just fine-tuning with that classification after the initial model training. (You might use different snippets in each step to avoid double-dipping the information in that sample, analogous to splitting text data for causal inference, e.g., Egami et al. 2018.) The off-the-shelf Hugging Face DeBERTa XL might not contain the features that would be most useful for the follow-up task of nonviolence fine-tuning. However, that might be a less interesting exercise if you want to build tools for working with more naturalistic models.
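Here's a rough sketch of the joint-objective idea (the checkpoint name, fresh linear heads, and loss weighting are all placeholder assumptions; in practice you'd likely reuse the pretrained MLM head and a proper pooler):

```python
# Sketch: train a DeBERTa-style encoder with masked-token prediction plus
# snippet classification as a secondary objective, rather than doing the
# classification fine-tuning only after pretraining.
import torch.nn as nn
import torch.nn.functional as F
from transformers import AutoModel

class JointMLMClassifier(nn.Module):
    def __init__(self, base="microsoft/deberta-v2-xlarge", num_labels=2, lambda_cls=0.5):
        super().__init__()
        self.encoder = AutoModel.from_pretrained(base)
        hidden = self.encoder.config.hidden_size
        vocab = self.encoder.config.vocab_size
        self.mlm_head = nn.Linear(hidden, vocab)       # masked-token prediction head
        self.cls_head = nn.Linear(hidden, num_labels)  # snippet-label head (e.g., violent vs. not)
        self.lambda_cls = lambda_cls                   # weight on the secondary objective

    def forward(self, input_ids, attention_mask, mlm_labels=None, cls_labels=None):
        hidden = self.encoder(input_ids=input_ids, attention_mask=attention_mask).last_hidden_state
        loss = 0.0
        if mlm_labels is not None:  # standard MLM loss over masked positions
            mlm_logits = self.mlm_head(hidden)
            loss = loss + F.cross_entropy(
                mlm_logits.view(-1, mlm_logits.size(-1)), mlm_labels.view(-1), ignore_index=-100
            )
        if cls_labels is not None:  # secondary objective on a simple first-token pooling
            cls_logits = self.cls_head(hidden[:, 0])
            loss = loss + self.lambda_cls * F.cross_entropy(cls_logits, cls_labels)
        return loss
```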
I appreciate making these notions more precise. Model splintering seems closely related to other popular notions in ML, particularly underspecification (“many predictors f that a pipeline could return with similar predictive risk”), the Rashomon effect (“many different explanations exist for the same phenomenon”), and predictive multiplicity (“the ability of a prediction problem to admit competing models with conflicting predictions”), as well as more general notions of generalizability and out-of-sample or out-of-domain performance. I’d be curious what exactly makes model splintering different. Some example questions: Is the difference just the alignment context? Is it that “splintering” refers specifically to features and concepts within the model failing to generalize, rather than the model as a whole failing to generalize? If so, what would it even mean for the model as a whole to fail to generalize without its features failing to generalize? Is it that the aggregation of features is not itself a feature? And how are features and concepts different from each other, if they are?
I think most thinkers on this topic wouldn’t think of those weights as arbitrary (I know you and I do, as hardcore moral anti-realists), and they wouldn’t find it prohibitively difficult to introduce those weights into the calculations. Not sure if you agree with me there.
I do agree with you that you can’t do moral weight calculations without those weights, assuming you are weighing moral theories and not just empirical likelihoods of mental capacities.
I should also note that I do think intertheoretic comparisons become an issue in other cases of moral uncertainty, such as with infinite values (e.g. a moral framework that absolutely prohibits lying). But those cases seem much harder than moral weights between sentient beings under utilitarianism.
I don’t think the two-envelopes problem is as fatal to moral weight calculations as you suggest (e.g., “this doesn’t actually work”). The two-envelopes problem isn’t a mathematical impossibility; it’s just an interesting example of mathematical sleight of hand.
Brian’s discussion of two-envelopes is just to point out that moral weight calculations require a common scale across different utility functions (e.g., the decision to fix the moral weight of a human at 1 whether you’re using brain size, all-animals-are-equal, unity-weighting, or any other weighting approach). It’s not to say that there’s a philosophical or mathematical impossibility in doing these calculations, as far as I understand.
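Here's the issue in toy form, with made-up numbers (not Brian's): say brain-size weighting puts one elephant at 3 humans and equal weighting puts it at 1 human, each with 50% credence.

```python
# Toy two-envelopes illustration with made-up numbers.
p_brain, p_equal = 0.5, 0.5  # credence in each weighting approach
elephant_per_human = {"brain_size": 3.0, "equal": 1.0}  # elephant value in human units

# Averaging in human units: how many humans is one elephant worth?
e_in_humans = p_brain * elephant_per_human["brain_size"] + p_equal * elephant_per_human["equal"]  # 2.0

# Averaging in elephant units: how many elephants is one human worth?
h_in_elephants = p_brain * (1 / elephant_per_human["brain_size"]) + p_equal * (1 / elephant_per_human["equal"])
implied_e_in_humans = 1 / h_in_elephants  # 1.5

print(e_in_humans, implied_e_in_humans)  # 2.0 vs. 1.5: the answer depends on which unit you averaged in
# Fixing the moral weight of a human at 1 on every theory (a common scale) and
# always averaging on that scale removes the ambiguity.
```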
FYI I discussed this a little with Brian before commenting, and he subsequently edited his post a little, though I’m not yet sure if we’re in agreement on the topic.
I strongly agree. There are two claims here. The weak one is that, if you hold complexity constant, directed acyclic graphs (DAGs; Bayes nets or otherwise) are not necessarily any more interpretable than conventional NNs because NNs are DAGs at that level. I don’t think anyone who understands this claim would disagree with it.
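To make the weak claim concrete, here is a toy sketch: draw each unit of a feedforward network as a node and each weight as a directed edge, and the result is literally a DAG.

```python
# Toy check that a feedforward NN's computation graph is a DAG.
import networkx as nx

layers = [3, 4, 2]  # a tiny MLP: 3 inputs -> 4 hidden units -> 2 outputs
g = nx.DiGraph()
for l in range(len(layers) - 1):
    for i in range(layers[l]):
        for j in range(layers[l + 1]):
            g.add_edge(f"layer{l}_unit{i}", f"layer{l+1}_unit{j}")  # one edge per weight

print(nx.is_directed_acyclic_graph(g))  # True: being a DAG, by itself, buys no interpretability
```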
But that is not the argument being put forth by Pearl/Marcus/etc. and arguably contested by LeCun/etc.; they claim that in practice (i.e., not holding anything constant), DAG-inspired or symbolic/hybrid AI approaches like Neural Causal Models have interpretability gains without much, if any, drop in performance, and arguably better performance on the tasks that matter most. For example, they point to the 2021 NetHack Challenge, a competition on the notoriously difficult roguelike game NetHack, where non-NN agents still outperformed NN agents.
Of course there’s not really a general answer here, only specific answers to specific questions like, “Will an NN or non-NN model win the 2024 NetHack Challenge?”