So, this could be an abstract at the beginning of the sequence, and the individual articles could each provide evidence for, roughly, one sentence of this abstract.
Or you could do it Eliezer's way: start by posting the articles that provide evidence for the individual sentences (each article containing its own summary), and only afterwards post an article that ties it all together. That way, readers could evaluate each article on its own merits, without being distracted by whether they agree or disagree with the conclusion.
It is possible that you have actually tried to do exactly this, but speaking for myself, I never would have guessed so from reading the original articles.
(Also, if your first article gets downvoted, please pause and reflect on that fact. Either your idea is wrong and readers are expressing disagreement, or it is just really badly written and readers are expressing confusion. In either case, pushing forward is not helpful.)
Is this clear enough:
I posit that the reason humans are able to solve any coordination problems at all is that evolution has shaped us into game players who apply something vaguely like a tit-for-tat strategy, meant to enforce convergence to a nearby Schelling Point / Nash Equilibrium and to punish defectors from it. I invoke a novel mathematical formalization of Kant's Categorical Imperative as a potential basis for coordination towards a globally computable Schelling Point. I believe that this constitutes a promising approach to the alignment problem, as the formalization is both simple to implement and reasonably simple to measure deviations from. Using it would therefore allow us both to prevent and to detect misalignment in powerful AI systems. As a theory of change, I believe that applying RLHF to LLMs using a strong and consistent formalization of the Categorical Imperative is a plausible and reasonably direct route to good outcomes in the prosaic LLM case, and that LLMs with more neuromorphic components added are a strong contender for a pathway to AGI.
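For concreteness, here is a minimal sketch of the tit-for-tat dynamic referred to above, assuming a standard iterated prisoner's dilemma; the payoff values and function names are illustrative only, not the proposed formalization of the Categorical Imperative itself.

```python
# Minimal iterated prisoner's dilemma: tit-for-tat cooperates first,
# then mirrors the opponent's previous move, punishing defection.
# Payoffs are the standard illustrative values, not drawn from any
# specific formalization.

PAYOFFS = {  # (my_move, their_move) -> my payoff
    ("C", "C"): 3, ("C", "D"): 0,
    ("D", "C"): 5, ("D", "D"): 1,
}

def tit_for_tat(my_history, their_history):
    # Cooperate on the first round, then copy the opponent's last move.
    return "C" if not their_history else their_history[-1]

def always_defect(my_history, their_history):
    return "D"

def play(strategy_a, strategy_b, rounds=10):
    history_a, history_b = [], []
    score_a = score_b = 0
    for _ in range(rounds):
        move_a = strategy_a(history_a, history_b)
        move_b = strategy_b(history_b, history_a)
        score_a += PAYOFFS[(move_a, move_b)]
        score_b += PAYOFFS[(move_b, move_a)]
        history_a.append(move_a)
        history_b.append(move_b)
    return score_a, score_b

print(play(tit_for_tat, tit_for_tat))    # (30, 30): stable mutual cooperation
print(play(tit_for_tat, always_defect))  # (9, 14): exploited once, then punishes
```

Two tit-for-tat players lock into the cooperative equilibrium, while a defector gains only on the first round before being matched move for move; that asymmetry is the sense in which the strategy enforces convergence and punishes defection.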
Much better.
I recorded a sort of video lecture here: https://open.substack.com/pub/bittertruths/p/semi-adequate-equilibria