Thank you for this detailed review, David. Replies to selected points:
I think that “trying to come up with your own answers first” is reasonable advice for studying many topics. The details of what that looks like, and how much time to spend on it, might vary from person to person. I think that every aspiring alignment researcher should decide for themselves where the best balance lies for their own personal skill set and learning style. However, I do think that you should at least study the foundations (statistical & computational learning theory and algorithmic information theory) before thinking about new mathematical approaches to alignment. Moreover, after coming up with your own answer, you definitely should study the answers of other people, to understand the overlap, the differences and the advantages of each. I feel that in alignment especially, people too often reinvent the wheel and write long posts about “what if we just make the AI do X”, unaware that X was already discussed 20 times in the past.
Regarding “idealized models” vs “inscrutable kludgery”: the property of “inscrutability” is in the map, not in the territory. The fact that we don’t understand how ANNs work is a fact about our current state of ignorance, not about the inherent inscrutability of ANNs. There is also no law of nature which says that any future AI paradigm cannot have a better theoretical foundation. Moreover, the way you talk about “naive Bayes” seems confused. In learning theory, we have a relatively small number of metrics and desiderata for learning algorithms (e.g. sample complexity, regret) and a large variety of actual algorithms (kernel methods, convex optimization, UCB, Thompson sampling, Q-learning etc.). Many different algorithms can satisfy similar desiderata. And, e.g., algorithms that have low regret are automatically asymptotically Bayes-optimal. So, if e.g. transformers have low regret w.r.t. some natural prior (which I think is very likely), then they are approximately “naive” Bayes. Of course, finding this prior and characterizing the rate of convergence is a difficult task, but even so, considerable progress has already been made in simple cases (e.g. feedforward ANNs with 2 or 3 layers), and there are some promising avenues for further progress (e.g. singular learning theory).
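To spell out the “low regret implies asymptotic Bayes-optimality” step (a standard argument; the notation below is mine, not anything from your review): fix a prior $\zeta$ over environments $\mu$, write $L_T^\mu(\pi)$ for the expected cumulative loss of policy $\pi$ in environment $\mu$ over $T$ rounds, and let $\pi^*_\zeta$ be the Bayes-optimal policy for $\zeta$. If $\pi$ has sublinear Bayesian regret, i.e.

$$\mathbb{E}_{\mu\sim\zeta}\big[L_T^\mu(\pi) - \inf_{\pi'} L_T^\mu(\pi')\big] = o(T),$$

then, since $\mathbb{E}_{\mu\sim\zeta}\big[L_T^\mu(\pi^*_\zeta)\big] \ge \mathbb{E}_{\mu\sim\zeta}\big[\inf_{\pi'} L_T^\mu(\pi')\big]$, it follows that

$$\frac{1}{T}\Big(\mathbb{E}_{\mu\sim\zeta}\big[L_T^\mu(\pi)\big] - \mathbb{E}_{\mu\sim\zeta}\big[L_T^\mu(\pi^*_\zeta)\big]\Big) \longrightarrow 0,$$

i.e. $\pi$’s time-averaged loss converges to that of the Bayes-optimal policy under $\zeta$.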
From my perspective, the main motivation to think about “ontological crises” is not that we’re worried our AI will have a crisis and start doing something wrong, but that the fact that we’re confused about this question means we don’t even know the correct type signature of values, which seems pretty important to figure out before we can talk about aligning the AI’s values with our own. Similarly, issues like Occam’s razor and anthropics are important primarily not because of specific failure modes (although I do think simulation hypotheses are a serious concern), but because our confusion about them is a symptom of our confusion about the concepts of “intelligence” and “agency” in general (see also the rocket alignment parable, and cannonballs that circumnavigate the Earth vs. sending a spaceship to the Moon).