“We are computer scientists. We do not lack in faith.” (Ketan Mulmuley)
MadHatter
Oh, come on. If the rationality community disapproved of Einstein predicting the perihelion precession of Mercury, that’s an L for the rationality community, not for Einstein.
I have offered to say why I believe it to be true, as soon as I can get clearance from my company to publish capabilities-relevant theoretical neuroscience work.
That’s fair, and I need to do a better job of building on-ramps for different readers. My most recent shortform is an attempt to build such an on-ramp for the LessWrong memeplex.
That’s fair (strong up/agree vote).
If you consult my recent shortform, I lay out a more measured, skeptical description of the project. Basically, ethicophysics constitutes a globally computable Schelling Point, such that it can be used as a protocol between different RL agents that believe in “oughts” to achieve Pareto-optimal outcomes. As long as the largest coalition agrees to prefer Jesus to Hitler, I think (and I need to do far more to back this up) defectors can be effectively reined in, the same way that Bitcoin works because the majority of the computers hooked up to it don’t want to destroy faith in the Bitcoin protocol.
Ethicophysics for Skeptics
Or, what the fuck am I talking about?
In this post, I will try to lay out my theories of computational ethics in as simple, skeptic-friendly, and non-pompous language as I am able. Hopefully this will be sufficient to help skeptical readers engage with my work.
The ethicophysics is a set of computable algorithms that suggest (but do not require) specific choices at ethical decision points in a multi-player reinforcement learning problem.
The design goal that the various equations need to satisfy is that they should select a uniquely identifiable Schelling Point and Nash Equilibrium, such that all participants in the reinforcement learning problem who follow the ethicophysical algorithms cooperate to achieve a Pareto-optimal outcome with high total reward, and such that any non-dominant coalition of defectors can be contained and, if necessary, neutralized by the larger, dominant coalition of players following the strategies selected by the ethicophysics.
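To make the design goal concrete, here is a deliberately tiny toy example (my illustration only, not the ethicophysics itself): a symmetric two-player coordination game with two Nash equilibria, where a commonly known selection rule like “pick the equilibrium with the highest joint payoff” suffices to single out the Schelling Point.

```python
# Toy illustration only: a 2-player coordination game where a simple,
# commonly known selection rule ("maximize joint payoff among the Nash
# equilibria") picks out a unique Schelling Point. The ethicophysics is
# meant to do this for much richer multi-agent RL settings.

from itertools import product

# payoffs[(a, b)] = (reward to player 1, reward to player 2)
payoffs = {
    ("cooperate", "cooperate"): (3, 3),
    ("cooperate", "defect"):    (0, 2),
    ("defect",    "cooperate"): (2, 0),
    ("defect",    "defect"):    (1, 1),
}
actions = ["cooperate", "defect"]

def is_nash(a, b):
    """Neither player can gain by unilaterally deviating."""
    p1, p2 = payoffs[(a, b)]
    best_p1 = max(payoffs[(a2, b)][0] for a2 in actions)
    best_p2 = max(payoffs[(a, b2)][1] for b2 in actions)
    return p1 >= best_p1 and p2 >= best_p2

nash_profiles = [(a, b) for a, b in product(actions, actions) if is_nash(a, b)]

# The selection rule in this toy: among the Nash equilibria, pick the one
# with the highest total reward (here, mutual cooperation).
schelling_point = max(nash_profiles, key=lambda ab: sum(payoffs[ab]))

print("Nash equilibria:", nash_profiles)             # [('cooperate', 'cooperate'), ('defect', 'defect')]
print("Selected Schelling Point:", schelling_point)  # ('cooperate', 'cooperate')
```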
The figure of merit that determines whether the ethicophysical algorithms are having the intended effect is therefore the divergence of the observed outcome from the notional outcome that pure coordination and pure cooperation between all players would achieve. The existence of this gap between what is and what could be in the absence of coordination problems is roughly what I take to be the content of Scott Alexander’s post on Moloch. I denote this phenomenon (“Moloch”) by the philosophical term “collective akrasia”, since it is a failure of communities to exercise self-mastery that is roughly isomorphic to the classic philosophical problem of akrasia. Rather than the classical “why do I not do as I ought?”, the question becomes “why do we not do as we ought?”, where “we” refers to the community under consideration.
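One way to make this figure of merit concrete (a sketch only; I have not committed to an exact functional form) is as the normalized gap between the total reward the players actually achieve and the total reward a fully coordinated community would have achieved:

$$
\text{collective akrasia} \;\approx\; \frac{\sum_i R_i^{\text{coordinated}} - \sum_i R_i^{\text{observed}}}{\sum_i R_i^{\text{coordinated}}}
$$

where $R_i$ is the cumulative reward of player $i$. A value of 0 means no money is being left on the table; values near 1 mean the community is capturing almost none of the cooperative surplus.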
So then the question is: what algorithms should we use to minimize collective akrasia? Phrased in this simple language, it becomes clear that many important civilizational problems fall under this rubric; climate change, in particular, is maybe 10% a technical issue and 90% a coordination problem. China is unwilling to cooperate with the US because it feels that the US is “pulling the ladder up after itself” in seeking to limit the emissions of China’s rapidly industrializing society more than those of the US’s reasonably mature and arguably stagnating industrial base.
So we (as a society) find ourselves in need of a set of algorithms to decide what is “fair”, in a way that is visibly “the best we can do”, in the sense of Pareto optimality, but also in some larger, more important sense of minimizing collective akrasia, or “the money we are leaving on the table”, or “Moloch”.
The conservation laws defined in Ethicophysics I and Ethicophysics II operate as a sort of “social fact” technology for establishing common knowledge of what is fair and what is not. Once a large and powerful coalition of agents has common knowledge of a computable Schelling Point and Nash Equilibrium, we can simply steer towards that Schelling Point and punish those who defect against it in a balanced, moderate way chosen simply to incentivize cooperation.
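As a cartoon of what “balanced, moderate” punishment could mean (a sketch under my own assumptions, not a finished mechanism): coalition members cooperate by default, respond to an observed defection with a penalty just large enough to make defection unprofitable in expectation, and then return to cooperation. The snippet below sizes such a penalty; all the numbers are placeholders.

```python
# Cartoon of a "balanced, moderate" punishment rule (illustration only).
# Assumption: the coalition can observe defections with some probability
# and levy a penalty; the penalty is sized to barely remove the incentive
# to defect, rather than to maximally harm the defector.

def minimal_penalty(defection_gain: float, detection_prob: float, margin: float = 0.1) -> float:
    """Smallest penalty that makes defection unprofitable in expectation.

    Expected value of defecting: defection_gain - detection_prob * penalty.
    We want that to be at most -margin, so:
        penalty >= (defection_gain + margin) / detection_prob
    """
    assert 0 < detection_prob <= 1
    return (defection_gain + margin) / detection_prob

# Example: defecting yields +5 units of reward, defections are caught 80% of the time.
print(minimal_penalty(defection_gain=5.0, detection_prob=0.8))  # 6.375
```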
So, my goal in publishing and promulgating the ethicophysical results I have proved so far is to allow people who are interested in solving the problem of aligning powerful intelligences to also join a large coalition of people who are steering towards a mutually compatible Schelling Point that is more Jesus-flavored and less Hitler-flavored, which seems like something that all reasonable people can get behind.
I thus argue that, far from being a foreign irritant that needs to be expelled from the LessWrong memetic ecosystem, the ethicophysics is a rigorous piece of mathematics and technology that is both necessary and sufficient to deliver us from our collective nightmare of collective akrasia, or “Moloch”; such deliverance is one of the higher and more noble ambitions espoused by the effective altruist community.
In future posts, I will review the extant results in the ethicophysics, particularly the conservation laws, and show them to be intuitively plausible descriptions of inarguably real phenomena. For instance, the Law of Conservation of Bullshit would translate into something much like the Simulacra Levels described in a popular LessWrong post, in which self-interested actors who stop tracking the true meanings of things over time develop collective akrasia so thorough and so entrenched that they lose the ability to say true things even when they are trying to.
Stay tuned for further updates. Probably the next post will simply treat some very simple cause-and-effect ethical word problems, such as the classic “Can you lie to the Nazis about the location of someone they are looking for?”
MadHatter’s Shortform
They really suck. The old paradigm of Alzheimer’s research is very weak and, as I understand it, no drug has an effect size sufficient to offset even a minimal side-effect profile, to the point where I think only one real drug has been approved by the FDA under the old paradigm, and that approval was super controversial. That’s my understanding, anyway; I welcome correction from anyone who knows better.
So maybe we should define the effect size in terms of cognitive QALYs? Say, an effective treatment should at least halve the rate of decline in the experimental arm relative to the control arm, with a stretch goal of bringing the decline to within the nuisance levels of normal, non-Alzheimer’s aging, and an even stretchier stretch goal of reversing the condition to the point where the former Alzheimer’s patient starts learning new skills and acquiring new hobbies.
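In code, the threshold I have in mind would look something like this (a sketch with made-up numbers; decline rates are points lost per year on some standard cognitive scale, e.g. an ADAS-Cog-like instrument):

```python
# Sketch of the proposed "halve the rate of decline" criterion.
# The decline rates below are illustrative, not real data.

def meets_primary_endpoint(control_decline_per_year: float,
                           treatment_decline_per_year: float,
                           max_ratio: float = 0.5) -> bool:
    """True if the treatment arm declines at most `max_ratio` as fast as control."""
    return treatment_decline_per_year <= max_ratio * control_decline_per_year

print(meets_primary_endpoint(6.0, 2.5))  # True: decline cut by more than half
print(meets_primary_endpoint(6.0, 4.0))  # False: only a ~33% reduction
```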
Here is the best I could muster on short notice: https://bittertruths.substack.com/p/ethicophysics-for-skeptics
Since I’m currently rate-limited, I cannot post it officially.
How will we handle the file drawer effect, where insignificant results are quietly shelved? I guess if the trial is preregistered this won’t happen...
https://chat.openai.com/share/068f5311-f11a-43fe-a2da-cbfc2227de8e
Here are ChatGPT’s speculations on how much it would cost to run this study. I invite any interested reader to work on designing this study. I can also write up my theories as to why this etiology is plausible in arbitrary detail if that is decision-relevant to someone with either grant money or interest in helping to code up the smartphone app we would need to collect the relevant measurements cheaply. (Intuitively, it would be something like a Dual N-Back app, but more user-friendly for Alzheimer’s patients.)
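To make the app idea slightly more concrete (a purely hypothetical sketch; the real task design would need clinical input), the core measurement of an n-back-style app is just: present a stream of stimuli and score whether the user correctly reports matches n steps back.

```python
# Hypothetical core scoring logic for an n-back-style cognitive task.
# A real Alzheimer's-friendly app would need larger UI elements, slower
# pacing, and clinical validation; this only illustrates the measurement.

from typing import Sequence

def nback_score(stimuli: Sequence[str], responses: Sequence[bool], n: int = 1) -> float:
    """Fraction of scorable trials answered correctly.

    stimuli:   the sequence of items shown to the user
    responses: for each position, did the user claim "same as n back"?
    """
    assert len(stimuli) == len(responses)
    correct = 0
    trials = 0
    for i in range(n, len(stimuli)):
        is_match = stimuli[i] == stimuli[i - n]
        correct += int(responses[i] == is_match)
        trials += 1
    return correct / trials if trials else 0.0

# Example session with a 1-back task:
print(nback_score(["A", "A", "B", "B", "C"], [False, True, False, True, False], n=1))  # 1.0
```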
I can put together some sort of proposal tonight, I suppose.
OK, let’s do it. Your nickel against my $100.
What resolution criteria should we use? Perhaps the first RCT that studies a treatment I deem sufficiently similar has to find a statistically significant effect with a publishable effect size? Or should we require that the first RCT that studies a similar treatment is halted halfway through because it would be unethical for the control group not to receive the treatment? (We could have a side bet on the latter, perhaps.)
What would the study look like? Presumably the primary outcome would be scores on a standard cognitive test designed to measure decline in Alzheimer’s patients, with four arms: control, Zoloft, Trazodone, and Zoloft + Trazodone (with all of the other components of the treatment, in particular the EPA/DHA, held constant for all non-control subjects). Let me know if you have any thoughts on the study design or if I should put together a grant proposal to study this.
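If it helps anyone thinking about feasibility, here is a rough back-of-the-envelope sample-size calculation (standard two-sample normal approximation; the standardized effect size of 0.5 is my guess at a “publishable” effect, not a derived number):

```python
# Rough per-arm sample size for a two-arm comparison (normal approximation).
# Assumptions: two-sided alpha = 0.05, power = 0.80, standardized effect
# size d = 0.5 (a guess). With four arms, multiply accordingly and
# consider a multiple-comparisons correction.

from statistics import NormalDist

def n_per_arm(effect_size: float, alpha: float = 0.05, power: float = 0.80) -> int:
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)   # ~1.96
    z_beta = NormalDist().inv_cdf(power)            # ~0.84
    n = 2 * (z_alpha + z_beta) ** 2 / effect_size ** 2
    return int(n) + 1  # round up

print(n_per_arm(0.5))  # ~63 subjects per arm
```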
OK, sounds good! Consider it a bet.
I wouldn’t say I really do satire? My normal metier is more “the truth, with jokes”. If I’m acting too crazy to be considered a proper rationalist, it’s usually because I am angry or at least deeply annoyed.
OK. I can only personally afford to be wrong to the tune of about $10K, which would be what, $5 on your part? Did I do that math correctly?
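(Spelling out the arithmetic at the nickel-versus-$100 odds proposed above:)

$$
\frac{\$0.05}{\$100} = \frac{1}{2000}, \qquad \frac{\$10{,}000}{2000} = \$5
$$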
OK, anybody who publicly bets on my predicted outcome to the RCT wins the right to engage me in a LessWrong dialogue on a topic of their choosing, in which I will politely set aside my habitual certainty and trollish demeanor.
Well, that should be straightforward, and it is predicted by my model of serotonin’s function in the brain. It would require an understanding of the function of orexin, which I do not currently possess beyond the standard intuition that it modulates hunger.
The evolutionary story would be this (a toy sketch of the causal chain follows the list):
- serotonin functions (in my model) to make an agent satisficing, which has many desirable safety properties, e.g. not getting eaten by predators when you forage unnecessarily
- the most obvious and important desire to satisfy (and neurally mark as satisfied) is the hunger for food, modulated by the hormone/neurotransmitter orexin
- the most obvious mechanism (and thus the one I predict) is that serotonergic bacteria in the gut activate some neural population in the gut’s “second brain”, sending to the primary brain a particular neural signal bundle consistent with malnutrition (there are many details here that I have not worked out and which could usefully be worked on by a qualified theoretical neuroscientist)
- this neural signal bundle would necessarily up(???)modulate the orexin signal(???)
- sustained high levels of orexin lead to autocannibalism of the brain through sustained neural pruning
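As a toy rendering of that causal chain (purely schematic; none of the quantities, thresholds, or time constants below are empirically grounded, and the up-modulation step marked “???” is exactly the part I am least sure of):

```python
# Purely schematic rendering of the proposed causal chain:
# gut serotonergic signal -> "malnutrition-like" message to the brain
# -> sustained elevation of orexin -> cumulative neural pruning over time.
# All numbers are placeholders; this is a thinking aid, not a model fit.

def simulate(days: int, gut_serotonin_signal: float, pruning_threshold: float = 0.7):
    orexin = 0.0
    synapse_fraction = 1.0  # fraction of baseline synapses remaining
    for _ in range(days):
        malnutrition_signal = gut_serotonin_signal         # the step I am least sure of
        orexin = 0.9 * orexin + 0.1 * malnutrition_signal  # slow build-up toward the signal level
        if orexin > pruning_threshold:                     # sustained high orexin...
            synapse_fraction *= 0.999                      # ...drives slow cumulative pruning
    return orexin, synapse_fraction

print(simulate(days=3650, gut_serotonin_signal=1.0))  # ten years of a strong signal
print(simulate(days=3650, gut_serotonin_signal=0.3))  # weaker signal never crosses the threshold
```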
Well, what’s the appropriate way to act in the face of the fact that I AM sure I am right? I’ve been offering public bets of the nickel of some high-karma person versus my $100, which seems like a fair and attractive bet for anyone who doubts my credibility and my ability to reason about the things I am talking about.
I will happily bet anyone with significant karma that Yudkowsky will find my work on the ethicophysics valuable a year from now, at the odds given above.
Historically, I have been extremely, extremely good at delaying publication of what I felt were capabilities-relevant advances, for essentially Yudkowskyan doomer reasons. The only reward I have earned for this diligence is to be treated like a crank when I publish alignment-related research, because I don’t have an extensive history of public contribution to the AI field.
Here is my speculation of what Q* is, along with a github repository that implements a shitty version of it, postdated several months.
And now I am officially rate-limited to one post per week. Be sure to go to my substack if you are curious about what I am up to.
And I’m happy to code up the smartphone app and run the clinical trial from my own funds. My uncle is starting to have memory trouble, I believe.