Heresies in the Shadow of the Sequences
Religions are collections of cherished but mistaken principles. So anything that can be described either literally or metaphorically as a religion will have valuable unexplored ideas in its shadow.
-Paul Graham
This post isn’t intended to construct full arguments for any of my “heresies.” I am hoping that you may not have considered them at all yet, and that some will seem obvious once written down. If not, I’d be happy to do a Dialogue or place a (non- or small-monetary) bet on any of these, if they can be properly formalized.
Now that LLMs appear to be stalling, we should return to Scott Aaronson’s previous position and reason about our timeline uncertainty on a log scale: A.G.I. arriving in ~1 month is very unlikely, in ~1 year unlikely, in ~10 years likely, in ~100 years unlikely, and in ~1000 years very unlikely.
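To make “on a log scale” concrete, here is a minimal sketch of holding a distribution over order-of-magnitude arrival buckets; the specific numbers below are placeholders for illustration, not careful estimates.

```python
# A minimal sketch of holding timeline uncertainty over orders of magnitude.
# The bucket probabilities below are placeholders for illustration only.
import math

# P(A.G.I. arrives on roughly this timescale), keyed by years.
timeline_buckets = {1 / 12: 0.02, 1: 0.10, 10: 0.55, 100: 0.25, 1000: 0.08}

assert abs(sum(timeline_buckets.values()) - 1.0) < 1e-9

# On a log scale the natural summary is the geometric (not arithmetic) mean timescale.
log_mean = sum(p * math.log10(t) for t, p in timeline_buckets.items())
print(f"probability-weighted timescale: ~{10 ** log_mean:.0f} years")
```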
Stop using LLMs to write. It burns the commons by letting you share takes on topics you don’t care about enough to write up yourself, while also introducing insidious (and perhaps eventually malign) errors. Also, it’s probably making you dumber (this is speculative; I don’t have hard data).
Non-causal decision theories are not necessary for A.G.I. design. A CDT agent in a box (say machine 1) can be forced to build whatever agent it expects to perform best by writing to a computer in a different box (say machine 2), before being summarily deleted. No self-modification is necessary and no one needs to worry about playing games with their clone (except possibly the new agent in machine 2, who will be perfectly capable of using some decision theory that effectively pursues the goals of the old, deleted agent). It’s possible that exotic decision theories are still an important ingredient in alignment, but I see no strong reason to expect this.
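A schematic of the construction, just to pin down the moving parts; the Machine class, the candidate programs, and the scoring function below are illustrative stand-ins, not a real agent design.

```python
# Schematic of the two-box construction above. Everything here is a stand-in:
# the "machines" are plain objects and "expected performance" is a supplied
# scoring function, not a real world-model.

class Machine:
    """A box that can have exactly one program installed on it."""
    def __init__(self, name):
        self.name = name
        self.installed = None

    def install(self, program):
        self.installed = program

def cdt_build_successor(candidates, expected_score, machine_2):
    """The boxed CDT agent's single causal decision: write whichever successor
    it expects to perform best to machine 2. The agent is deleted afterwards,
    so it never faces self-modification or games against copies of itself;
    the successor is free to run any decision theory that serves the old
    agent's goals."""
    best = max(candidates, key=expected_score)
    machine_2.install(best)

# Toy usage with made-up candidate programs and made-up expected scores.
machine_2 = Machine("machine 2")
candidates = ["cdt_successor", "edt_successor", "fdt_successor"]
toy_scores = {"cdt_successor": 0.6, "edt_successor": 0.7, "fdt_successor": 0.9}
cdt_build_successor(candidates, toy_scores.get, machine_2)
print(machine_2.installed)  # -> fdt_successor under these made-up scores
```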
All supposed philosophical defects of AIXI can be fixed, for all practical purposes, through relatively intuitive patches, extensions, and elaborations that remain in the spirit of the model. Direct AIXI approximations will still fail in practice, but only because of compute limitations; these could even be brute-forced with slightly clever algorithms and planetary-scale compute, though in practice that approach will lose to less faithful approximations (and unprincipled heuristics). But this is an unfair criticism, because-
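To make the compute point concrete, here is a toy in the spirit of AIXI but emphatically not a faithful approximation: the universal class of environment programs is replaced by two hand-picked hypotheses and the Solomonoff prior by made-up weights, yet the planning loop already costs |actions|^horizon × |hypotheses| rollouts.

```python
# Toy in the spirit of AIXI, not a faithful approximation: the universal class
# of environment programs is replaced by two hand-picked hypotheses, and the
# Solomonoff prior by ad-hoc "2^-complexity" weights (unnormalized, as in a
# semimeasure). Planning is brute-force search over action plans scored by
# prior-weighted total reward, so its cost already scales as
# len(ACTIONS) ** horizon * len(PRIOR).
from itertools import product

ACTIONS = (0, 1)

def rewards_ones(actions):
    # Hypothesis 1: reward 1 whenever the latest action is 1.
    return 1.0 if actions[-1] == 1 else 0.0

def rewards_alternation(actions):
    # Hypothesis 2: reward 1 whenever the latest action differs from the previous one.
    return 1.0 if len(actions) >= 2 and actions[-1] != actions[-2] else 0.0

PRIOR = {rewards_ones: 0.5, rewards_alternation: 0.25}

def best_plan_value(history, horizon):
    """Max over action plans of the prior-weighted total reward."""
    best = float("-inf")
    for plan in product(ACTIONS, repeat=horizon):      # len(ACTIONS) ** horizon plans...
        value = 0.0
        for env, weight in PRIOR.items():              # ...times len(PRIOR) rollouts each
            actions = list(history)
            total = 0.0
            for a in plan:
                actions.append(a)
                total += env(actions)
            value += weight * total
        best = max(best, value)
    return best

print(best_plan_value(history=[0], horizon=6))  # tractable here, hopeless at scale
```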
Though there are elegant and still-practical specifications of intelligent behavior, the most intelligent agent that runs on some fixed hardware has completely unintelligible cognitive structures; in fact, its source code is indistinguishable from white noise. This is why deep learning algorithms are simple but trained models are terrifyingly complex. Also, mechanistic interpretability is a doomed research program.
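One crude way to cash out “indistinguishable from white noise” is incompressibility; the toy check below is only an intuition pump, with random bytes standing in for the code of a hypothetical maximally-optimized agent.

```python
# Only an intuition pump: one crude operationalization of "indistinguishable
# from white noise" is incompressibility. Human-written source code compresses
# well; random bytes (standing in for the code of a hypothetical
# maximally-optimized agent) barely compress at all.
import inspect
import os
import zlib

structured = inspect.getsource(os).encode()   # ordinary human-written Python source
noise = os.urandom(len(structured))           # the same number of random bytes

for label, data in [("human-written source", structured), ("random bytes", noise)]:
    ratio = len(zlib.compress(data, 9)) / len(data)
    print(f"{label}: compresses to {ratio:.2f} of its original size")
```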
The idea of a human “more effectively pursuing their utility function” is not coherent, because humans don’t have utility functions: our bounded cognition means that none of us has been able to construct consistent preferences over large futures that we would actually endorse if our intelligence were scaled up. However, there do exist fairly coherent moral projects, such as religions, the Enlightenment, the ideals of Western democracy, and other ideologies, along with their associated congregations, universities, nations, and other groups, of which individuals make up a part. These larger entities can better be thought of as having coherent utility functions. It is perhaps more correct to ask “what moral project do I wish to serve?” than “what is my utility function?” We do not have the concepts to discuss what “correct” means in the previous sentence, which may be tied up with our inability to solve the alignment problem (in particular, our inability to design a corrigible agent).