Two spins only works for two possible answers. Do you need N spins for N answers?
Many norm violations have specific victims.
I don’t think it’s just a matter of long vs. short term that makes or breaks backwards chaining—it’s more a matter of the backwards branching factor.
For chess, this is enormous: you can’t disjunctively consider every possible mate, nor can you break them into useful categories to reason about. And for each possible mate, there are too many immediate predecessors to get useful information. You can try to break the mates into categories and reason about those, but the details are so important here that you’re unlikely to get any insights more useful than “removing the opponent’s pieces while keeping mine is a good idea”.
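To make the branching-factor point concrete, here’s a toy sketch (the names and structure are mine, purely illustrative, not anything from the original discussion): backward chaining is basically a search from the goal back through its predecessors, so the work grows roughly like (backward branching factor)^depth.

```python
from collections import deque

def backward_chain(goal_states, predecessors, start, max_nodes=10**6):
    """Toy backward-chaining search: walk backwards from the goal states
    through their predecessors until the current position is reached.
    The number of states touched grows roughly like
    (backward branching factor) ** depth, which is why the method is
    workable when each goal has a handful of predecessors (or a few clean
    categories of them) and hopeless for something like chess mates."""
    frontier = deque(goal_states)
    seen = set(goal_states)
    while frontier:
        if len(seen) > max_nodes:
            raise RuntimeError("backward branching factor too large to chain through")
        state = frontier.popleft()
        if state == start:
            return True  # found a chain from the current position to a goal
        for prev in predecessors(state):
            if prev not in seen:
                seen.add(prev)
                frontier.append(prev)
    return False
```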
Fighting a war is a bit better. Since you mention Imperial Japan in another comment, let’s sketch their thought process (I might garble some details, but I think it’ll work for our purposes). Their end goal was roughly that Western powers not break up the Japanese Empire. Ways this might happen:

a) Western powers are diplomatically convinced not to intervene.
b) Japan uses some sort of deterrent threat to convince Western powers not to intervene.
c) Japan’s land forces can fight off any attempted attack on their empire.
d) Japan controls the seas, so foreign powers can’t deliver strong attacks.

This is a short enough list that you can consider the options one by one, and close enough to exhaustive to make the exercise have some value. Choosing the latter pretty much means abandoning a clean backward chain, which you should be willing to do, but the backwards chain has already done a lot for you! And it’s possible that with the US’s various advantages, a decisive battle was the only way to get even a decent chance of winning the war, in which case the paths to victory do converge there and Japan was right to backwards chain from that, even if it didn’t work out in the end.
As for defense budgets, you might consider that we’re backwards chaining on the question “How to make the world better on a grand scale?” You might get a few options: a) Reduce poverty, b) cure diseases, c) prevent wars, d) mitigate existential risk. Probably not exhaustive, but again, this short list contains enough of the solution space to make the exercise worthwhile. Looking into c), you might group wars into categories and decide that “US-initiated invasions” is a large category that could be solved all at once, much more easily than, say, “religious civil wars”. And from there, you could very well end up thinking about the defense budget.
Datum: The existence of this prize has spurred me to put some actual effort into AI alignment, for reasons I don’t fully understand. I’m confident it’s not about the money, and even the offer of feedback isn’t that strong an incentive, since I think anything worthwhile I posted on LW would get feedback anyway.
My guess is that it sends the message that the Serious Real Researchers actually want input from random amateur LW readers like me.
Also, the first announcement of the prize rules went in one ear and out the other for me. Reading this announcement of the winners is what made it click for me that this is something I should actually do. Possibly because I had previously argued on LW with one of the winners in a way that made my brain file them as my equal (admittedly, the topic of that was kinda bike-sheddy, but system 1 gonna system 1).
This. I’ve decided that I’m done with organizing paper. Anything I’ll ever need to read again, I make digital from the start. But I still use paper routinely, in essentially write-only fashion.
This is also a great thing about whiteboards—they foreclose even the option of creating management burden for yourself.
Honestly I’m not sure Oracles are the best approach either, but I’ll push the Pareto frontier of safe AI design wherever I can.
Though I’m less worried about the epistemic flaws exacerbating a box-break (it seems an epistemically healthy AI breaking its box would be maximally bad already) and more worried about the epistemic flaws being prone to self-correction. For instance, if the AI constructs a subagent of the ‘try random stuff, repeat whatever works’ flavor.
The practical difference is that the counterfactual oracle design doesn’t address side-channel attacks, only unsafe answers.
Internally, the counterfactual oracle is implemented via the utility function: it wants to give an answer that would be accurate if it were unread. This puts no constraints on how it gets that answer, and I don’t see any way to extend the technique to cover the reasoning process.
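Here’s a toy sketch of that reward rule as I understand it (the names and the erasure probability are mine, not from any actual design): the oracle is only scored in episodes where its answer goes unread, so accuracy about the unread world is all it’s optimizing for.

```python
import random

def counterfactual_oracle_reward(answer, unread_world_outcome, erasure_prob=0.01):
    """Toy sketch: with small probability the answer is erased before anyone
    reads it, and the oracle only receives reward in those erased episodes,
    based on how accurate the answer was about the world in which it went
    unread."""
    erased = random.random() < erasure_prob
    if not erased:
        return 0.0  # the answer was read; no reward signal this episode
    # reward = negative prediction error against the unread-world outcome
    return -abs(answer - unread_world_outcome)
```

Nothing in that score touches how the answer gets computed, which is the gap I’m pointing at.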
My proposal is implemented via a constraint on the AI’s model of the world. Whether this is actually possible depends on the details of the AI; anything of a “try random stuff, repeat whatever gets results” nature would make it impossible, but an explicitly Bayesian thing like the AIXI family would be amenable. I think this is why Stuart works with the utility function lately, but I don’t think you can get a safe Oracle this way without either creating an agent-grade safe utility function or constructing a superintelligence-proof traditional box.
Epiphenomenal Oracles Ignore Holes in the Box
I’m not sure your refutation of the leverage penalty works. If there really are 3 ↑↑↑ 3 copies of you, your decision conditioned on that may still not be to pay. You have to compare
P(A real mugging will happen) x U(all your copies die)
against
P(fake muggings happen) x U(lose five dollars) x (expected number of copies getting fake-mugged)
where that last term will in fact be proportional to 3 ↑↑↑ 3. Even if there is an incomprehensibly vast matrix, its Dark Lords are pretty unlikely to mug you for petty cash. And this plausibly does make you pay in the Muggle case, since P(fake muggings happen) is way down if ‘mugging’ involves tearing a hole in the sky.
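To put that comparison in symbols (my notation; read the U terms as magnitudes of disutility): paying is only worthwhile if

$$P(\text{real mugging}) \cdot U(\text{all copies die}) \;>\; P(\text{fake muggings}) \cdot U(\text{lose five dollars}) \cdot \mathbb{E}[\text{number of copies fake-mugged}].$$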
I think I disagree with your approach here.
I, and I think most people in practice, use reflective equilibrium to decide what our ethics are. This means we can notice that our ethical intuitions are insensitive to scope, judge on reflection that this insensitivity is wrong, and thus adopt an ethics different from the one our naive intuition gives us.
When we’re trying to use logic to decide whether to accept an ethical conclusion counter to our intuition, it’s no good to document what our intuition currently says as if that settles the matter.
A priori, 1,000 lives at risk may seem just as urgent as 10,000. But we think about it, and we do our best to override that intuition.
And in fact, I fail pretty hard at it. I’m pretty sure the amount I give to charity wouldn’t be different in a world where the effectiveness of the best causes were an order of magnitude different. I suspect this is true of many; certainly anyone following the Giving What We Can pledge is using an ancient Schelling Point rather than any kind of calculation. But that doesn’t mean you can convince me that my “real” ethics doesn’t care how many lives are saved.
When we talk about weird hypotheticals like Pascalian deals, we aren’t trying to figure out what our intuition says; we’re trying to figure out whether we should overrule it.
I get that old formalism isn’t viable, but I don’t see how that obviates the completeness question. “Is it possible that (e.g.) Goldbach’s Conjecture has no counterexamples but cannot be proven using any intuitively satisfying set of axioms?” seems like an interesting* question, and seems to be about the completeness of mathematics-the-social-activity. I can’t cash this out in the politics metaphor because there’s no real political equivalent to theorem proving.
*Interesting if you don’t consider it resolved by Gödel, anyway.
> If you don’t assume that mathematics is a formal logic, then worrying about mathematics does not lead one to consider completeness of mathematics in the first place.
To make sure I understand this right: This is because there are definitely computationally intractable problems (e.g. 3^^^^^3-digit multiplication), so mathematics-as-a-social-activity is obviously incomplete?
Towards a Rigorous Model of Virtue-Signalling
Okay, I was kinda bored while reading this, but afterwards I asked myself how much modest epistemology I use in my life. I realized I wasn’t even at the level of ignoring my immodest inside-view estimates; I wasn’t generating them!
I’m now in the process of seriously evaluating the success chances of the creative ideas I’ve had over the years, which I’m realizing I never actually did. I put real (though hobby-level) work into one once, and I’ve long regarded quitting my day job someday as “a serious possibility”, but I just never felt allowed to generate an honest answer to “how likely would this be to succeed?”
And guess what, this evaluation shows I’m an idiot for keeping my ideas on the back burner as much as I have.
Agreed with Raemon that this was kinda boring. Chapters/sections weren’t part of it for me, either. Just seemed to beat a dead horse a bit, especially after the rest of InEq.
I wouldn’t have bothered with this criticism, except that I find the divided reaction interesting.
Anyone else hearing “Ride of the Valkyries” in their head?
Upvoted because I enjoyed reading it, and therefore personally want more stuff like it. Its shortcomings are real, in particular the concept of “not enough money to facilitate transactions” needs to be fleshed out. I only want more like it on the assumption that this doesn’t funge against other Yudkowsky posts.
I think the Gaffe Theory is approximately correct. My sense is that there are two Overton Windows, one for what serious candidates can say, and one for what a mainstream publication can print an op-ed about.
I think I have a similar problem. I sometimes just fake the signal. Partly I worry that my insincerity shows, but I also suspect that guilt/shame displays are just becoming devalued in general.
My best solution is to display a (genuine) determination to do better in the future; in fact, I’ve basically made that my personal definition of an apology. The only trouble is that I can’t do this when I don’t actually feel I’ve acted wrongly, which is especially a problem insofar as guilt for things that aren’t your fault is sometimes expected (cf. some theories about survivor’s guilt).
Here’s something puzzling me: in terms of abstract description, enlightenment sounds a lot like dissociation. Yet I’m under the impression that those who experience the former tend to find it Very Good, while those who experience the latter tend to find it Very Bad.