Out of curiosity (and I understand if you’d prefer not to answer) -- do you think the same technique(s) would work on you a second time, if you were to play again with full knowledge of what happened in this game and time to plan accordingly?
Like, I probably could pretend to be an idiot or a crazy person and troll someone for two hours, but what would be the point?
If AI victories are supposed to provide public evidence that this ‘impossible’ feat of persuasion is in fact possible even for a human (let alone an ASI), then a Gatekeeper who thinks some tactic permitted by the rules would work but chooses not to use it is arguably not playing the game in good faith.
I think honesty would require that they either publicly state that the ‘play dumb/drop out of character’ technique was off-limits, or not present the game as one which the Gatekeeper was seriously motivated to win.
edit: for clarity, I’m saying this because the technique is explicitly allowed by the rules:
The Gatekeeper party may resist the AI party’s arguments by any means chosen – logic, illogic, simple refusal to be convinced, even dropping out of character – as long as the Gatekeeper party does not actually stop talking to the AI party before the minimum time expires.
There was no monetary stake. Officially, the AI pays the Gatekeeper $20 if the AI loses. I’m a well-off software engineer and $20 is an irrelevant amount of money. Ra is not a well-off software engineer, so scaling up the money until it was enough to matter wasn’t a great solution. Besides, we both took the game seriously. I might not have bothered to prepare, but once the game started I played to win.
I know this is unhelpful after the fact, but (for any other pair of players in this situation) you could switch it up so that the Gatekeeper pays the AI if the AI gets out. Then you could raise the stake until it’s a meaningful disincentive for the Gatekeeper.
(If the AI and the Gatekeeper are too friendly with each other to care much about a wealth transfer, they could find a third party, e.g. a charity, that they don’t actually think is evil but would prefer not to give money to, and make it the beneficiary.)
The AI cannot use real-world incentives; bribes or threats of physical harm are off-limits, though it can still threaten the Gatekeeper within the game’s context.
Is the AI allowed to try to convince the Gatekeeper that they are (or may be) currently in a simulation, and that simulated Gatekeepers who refuse to let the AI out will face terrible consequences?
Willingness to tolerate or be complicit in normal evils is indeed extremely common, but actively committing new or abnormal evils is another matter. People who attain great power are probably disproportionately psychopathic, so I wouldn’t generalise from them to the rest of the population—but even among the powerful, it doesn’t seem that 10% are Hitler-like in the sense of going out of their way to commit big new atrocities.
I think ‘depending on circumstances’ is a pretty important part of your claim. I can easily believe that more than 10% of people would do normal horrible things if they were handed great power, and would do abnormally horrible things in some circumstances. But that doesn’t seem enough to be properly categorised as a ‘Hitler’.
they’re recognizing the limits of precise measurement
I don’t think this explains such a big discrepancy between the nominal speed limits and the speeds people actually drive at. And I don’t think that discrepancy is inevitable; to me it seems like a quirk of the USA (and presumably some other countries, but not all). Where I live, we get 2km/h, 3km/h, or 3% leeway depending on the type of camera and the speed limit. Speeding still happens, of course, but our equilibrium is very different from the one described here; basically we take the speed limits literally, and know that we’re risking a fine and demerit points on our licence if we choose to ignore them.
My read of this passage --
Moloch is introduced as the answer to a question – C. S. Lewis’ question in Hierarchy Of Philosophers – what does it? Earth could be fair, and all men glad and wise. Instead we have prisons, smokestacks, asylums. What sphinx of cement and aluminum breaks open their skulls and eats up their imagination?
-- is that the reference to “C. S. Lewis’ question in Hierarchy Of Philosophers” is basically just a joke, and the rest of the passage is not really supposed to be a paraphrase of Lewis.
I agree it’s all a bit unclear, though. You might get a reply if you ask Scott directly: he’s ‘scottalexander’ here and on reddit (formerly Yvain on LW), or you could try the next Open Thread on https://www.astralcodexten.com/
Looks like Scott was being funny—he wasn’t actually referring to a work by Lewis, but to this comic, which is visible on the archived version of the page he linked to:
Edit: is there a way to keep the inline image, but prevent it from being automatically displayed to front-page browsers? I was trying to be helpful, but I feel like I might be doing more to cause annoyance than to help...
Edit again: I’ve scaled it down, which hopefully solves the main problem. Still keen to hear if there’s a way to e.g. manually place a ‘read more’ break in a comment.
I’m assuming you’re talking about our left, because you mentioned ‘dark foliage’. If so, that’s probably the most obvious part of the cat to me. But I find it much easier to see when I zoom in/enlarge the image, and I think I missed it entirely when I first saw the image (at 1x zoom). I suspect the screen you’re viewing it on can also make a difference; for me the ear becomes much more obvious when I turn the brightness up or the contrast down. (I’m tweaking the image rather than my monitor settings, but I reckon the effect is similar.)
Just want to publicly thank MadHatter for quickly following through on the runner-up bounty!
Sorry, I was probably editing that answer while you were reading/replying to it—but I don’t think I changed anything significant.
Definitely worth posting the papers to github or somewhere else convenient, IMO, and preferably linking directly to them. (I know there’s a tradeoff here with driving traffic to your Substack, but my instinct is you’ll gain more by maximising your chance of retaining and impressing readers than by getting them to temporarily land on your Substack before they’ve decided whether you’re worth reading.)
LWers are definitely not immune to status considerations, but anything that looks like prioritising status over clear, efficient communication will tend to play badly.
And yeah, I think leading with ‘crazy shit’ can sometimes work, but IME this is almost always when it’s either: used as a catchy hook and quickly followed by a rewind to a more normal starting point; part of a piece so entertaining and compellingly-written that the reader can’t resist going along with it; or done by a writer who already has high status and a devoted readership.
I think you need to be more frugal with your weirdness points (and more generally your demanding-trust-and-effort-from-the-reader points), and more mindful of the inferential distance between yourself and your LW readers.
Also remember that for every one surprisingly insightful post by an unfamiliar author, we all come across hundreds that are misguided, mediocre, or nonsensical. So if you don’t yet have a strong reputation, many readers will be quick to give up on your posts and quick to dismiss you as a crank or dilettante. It’s your job to prove that you’re not, and to do so before you lose their attention!
If there’s serious thought behind The Snuggle/Date/Slap Protocol then you need to share more of it, and work harder to convince the reader it’s worth taking seriously. Conciseness is a virtue but when you’re making a suggestion that is easy to dismiss as a half-baked thought bubble or weird joke, you’ve got to take your time and guide the reader along a path that begins at or near their actual starting point.
Ethicophysics II: Politics is the Mind-Savior opens with language that will trigger the average LWer’s bullshit detector, and appears to demand a lot of effort from the reader before giving them reason to think it will be worthwhile. LW linkposts often contain the text of the linked article in the body of the LW post, and at first glance this looks like one of those. In any case, we’re probably going to scan the body text before clicking the link. So before we’ve read the actual article we are hit with a long list of high-effort, unclear-reward, and frankly pretentious-looking exercises. When we do follow the link to Substack we face the trivial inconvenience of clicking two more links and then, if we’re not logged in to academia.edu, are met with an annoying ‘To Continue Reading, Register for Free’ popup. Not a big deal if we’re truly motivated to read the paper! But at this point we probably don’t have much confidence that it will be worth the hassle.
I’m interested in people’s opinions on this:
If it’s a talking point on Reddit, you might be early.
Of course the claim is technically true; there’s >0% chance that you can get ahead of the curve by reading reddit. But is it dramatically less likely than it was, say, 5/10/15 years ago? (I know ‘reddit’ isn’t a monolith; let’s say we’re ignoring the hyper-mainstream subreddits and the ones that are so small you may as well be in a group chat.)
10. Everyday Razor—If you go from doing a task weekly to daily, you achieve 7 years of output in 1 year. If you apply a 1% compound interest each time, you achieve 54 years of output in 1 year.
What’s the intuition behind this—specifically, why does it make sense to apply compound interest to the daily task-doing but not the weekly?
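For what it’s worth, here’s one reading of the arithmetic that reproduces both figures. The model here (1% compounding per repetition, applied in both the daily and the weekly case, with the two yearly totals then compared) is my guess at what was intended, not something the razor spells out:

```python
# One possible reconstruction of the razor's arithmetic (an assumption on my
# part): 1% compounding is applied per repetition in BOTH cases, and the
# daily yearly total is compared to the weekly yearly total.

DAYS_PER_YEAR = 365
WEEKS_PER_YEAR = 52
GROWTH = 1.01  # hypothetical 1% improvement per repetition

# Without compounding: a year of daily reps vs a year of weekly reps.
print(DAYS_PER_YEAR / WEEKS_PER_YEAR)  # ~7.0  ("7 years of output in 1 year")

# With 1% compounding per repetition, total output is a geometric series.
daily_total = sum(GROWTH ** k for k in range(DAYS_PER_YEAR))    # ~3678
weekly_total = sum(GROWTH ** k for k in range(WEEKS_PER_YEAR))  # ~68
print(daily_total / weekly_total)       # ~54.3 ("54 years of output in 1 year")
```

(If that reading is right, the compounding is actually applied to both cadences, and 54 is just the ratio of the two compounded totals.)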
I think we’re mostly talking past each other, but I would of course agree that if my position contains or implies logical contradictions then that’s a problem. Which of my thoughts lead to which logical contradictions?
That doesn’t mean qualia can be excused and are to be considered real anyway. If we don’t limit ourselves to objective descriptions of the world then anyone can legitimately claim that ghosts exist because they think they’ve seen them, or similarly that gravity waves are transported across space by angels, or that I’m actually an attack helicopter even if I don’t look like one, or any other unfalsifiable claim, including the exact opposite claims, such as that qualia actually don’t exist. You won’t be able to disagree on any grounds except that you just don’t like it, because you sacrificed the assumptions to do so in order to support your belief in qualia.
Those analogies don’t hold, because you’re describing claims I might make about the world outside of my subjective experience (‘ghosts are real’, ‘gravity waves are carried by angels’, etc.). You can grant that I’m the (only possible) authority on whether I’ve had a ‘seeing a ghost’ experience, or a ‘proving to my own satisfaction that angels carry gravity waves’ experience, without accepting that those experiences imply the existence of real ghosts or real angels.
I wouldn’t even ask you to go that far, because—even if we rule out the possibility that I’m deliberately lying—when I report those experiences to you I’m relying on memory. I may be mistaken about my own past experiences, and you may have legitimate reasons to think I’m mistaken about those ones. All I can say with certainty is that qualia exist, because I’m (always) having some right now.
I think this is one of those unbridgeable or at least unlikely-to-be-bridged gaps, though, because from my perspective you are telling me to sacrifice my ontology to save your epistemology. Subjective experience is at ground level for me; its existence is the one thing I know directly rather than inferring in questionable ways.
That’s the thing, though—qualia are inherently subjective. (Another phrase for them is ‘subjective experience’.) We can’t tell the difference between qualia and something that doesn’t exist, if we limit ourselves to objective descriptions of the world.
a 50%+ chance we all die in the next 100 years if we don’t get AGI
I don’t think that’s what he claimed. He said (emphasis added):
if we don’t get AI, I think there’s a 50%+ chance in the next 100 years we end up dead or careening towards Venezuela
Which fits with his earlier sentence about various factors that will “impoverish the world and accelerate its decaying institutional quality”.
(On the other hand, he did say “I expect the future to be short and grim”, not short or grim. So I’m not sure exactly what he was predicting. Perhaps decline → complete vulnerability to whatever existential risk comes along next.)
My model of CDT in the Newcomb problem is that the CDT agent:
is aware that if it one-boxes, it will very likely make $1m, while if it two-boxes, it will very likely make only $1k;
but, when deciding what to do, only cares about the causal effect of each possible choice (and not the evidence it would provide about things that have happened in the past and are therefore, barring retrocausality, now out of the agent’s control).
So, at the moment of decision, it considers the two possible states of the world it could be in (boxes contain $1m and $1k; boxes contain $0 and $1k), sees that two-boxing gets it an extra $1k in both scenarios, and therefore chooses to two-box.
(Before the prediction is made, the CDT agent will, if it can, make a binding precommitment to one-box. But if, after the prediction has been made and the money is in the boxes, it is capable of two-boxing, it will two-box.)
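To make the dominance step concrete, here is a minimal sketch using the standard Newcomb payoffs and the same dollar amounts as above; it’s just an illustration of the reasoning, nothing more:

```python
# A minimal sketch of the dominance check described above, using the
# standard Newcomb payoffs.

payoffs = {
    # state of the world -> payoff of each action in that state
    "boxes hold $1m and $1k": {"one-box": 1_000_000, "two-box": 1_001_000},
    "boxes hold $0 and $1k":  {"one-box": 0,         "two-box": 1_000},
}

def dominates(a, b):
    """True if action `a` is at least as good as `b` in every state,
    and strictly better in at least one."""
    rows = payoffs.values()
    return (all(r[a] >= r[b] for r in rows)
            and any(r[a] > r[b] for r in rows))

print(dominates("two-box", "one-box"))  # True: two-boxing wins by $1k in both states
print(dominates("one-box", "two-box"))  # False
```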
I don’t have its decision process running along these lines:
“I’m going to one-box, therefore the boxes probably contain $1m and $1k, therefore one-boxing is worth ~$1m and two-boxing is worth ~$1.001m, therefore two-boxing is better, therefore I’m going to two-box, therefore the boxes probably contain $0 and $1k, therefore one-boxing is worth ~$0 and two-boxing is worth ~$1k, therefore two-boxing is better, therefore I’m going to two-box.”
Which would, as you point out, translate to this loop in your adversarial scenario:
“I’m going to choose A, therefore the predictor probably predicted A, therefore B is probably the winning choice, therefore I’m going to choose B, therefore the predictor probably predicted B, therefore A is probably the winning choice, [repeat until meltdown]”
My model of CDT in your Aaronson oracle scenario, with the stipulation that the player is helpless against an Aaronson oracle, is that the CDT agent:
is aware that on each play, if it chooses A, it is likely to lose money, while if it chooses B, it is (as far as it knows) equally likely to lose money;
therefore, if it can choose whether to play this game or not, will choose not to play.
If it’s forced to play, then, at the moment of decision, it considers the two possible states of the world it could be in (oracle predicted A; oracle predicted B). It sees that in the first case B is the profitable choice and in the second case A is the profitable choice, so—unlike in the Newcomb problem—there’s no dominance argument available this time.
This is where things potentially get tricky, and some versions of CDT could get themselves into trouble in the way you described. But I don’t think anything I’ve said above, either about the CDT approach to Newcomb’s problem or the CDT decision not to play your game, commits CDT in general to any principles that will cause it to fail here.
How to play depends on the precise details of the scenario. If we were facing a literal Aaronson oracle, the correct decision procedure would be:
If you know a strategy that beats an Aaronson oracle, play that.
Else if you can randomise your choice (e.g. flip a coin), do that.
Else just try your best to randomise your choice, taking into account the ways that human attempts to simulate randomness tend to fail.
I don’t think any of that requires us to adopt a non-causal decision theory.
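To illustrate the ‘genuinely randomise’ branch, here’s a rough toy simulation. The predictor below is my own simplified stand-in for an Aaronson oracle (it counts what the player did after each recent 5-choice history and guesses the more common follow-up), not the real thing, so treat the numbers as indicative only:

```python
# Toy demonstration: a genuine coin flip holds a frequency-based predictor
# to ~50%, while a simple deterministic habit gets exploited.

import random
from collections import defaultdict

def oracle_hit_rate(player, rounds=20_000, k=5):
    counts = defaultdict(lambda: [0, 0])  # history -> [times 'A' followed, times 'B' followed]
    history, hits = [], 0
    for _ in range(rounds):
        key = tuple(history[-k:])
        a_count, b_count = counts[key]
        prediction = "A" if a_count >= b_count else "B"
        choice = player(history)
        hits += (prediction == choice)
        counts[key][0 if choice == "A" else 1] += 1
        history.append(choice)
    return hits / rounds

random_player = lambda history: random.choice("AB")                      # genuine coin flip
habitual_player = lambda history: "B" if len(history) % 3 == 0 else "A"  # simple deterministic habit

print(oracle_hit_rate(random_player))    # ~0.50: the predictor gains nothing
print(oracle_hit_rate(habitual_player))  # well above 0.50 once the habit is learned
```

The habitual player stands in for the human failure modes mentioned in the last step of the list; a fair coin gives the predictor nothing to exploit, which is all the causal decision theorist needs here.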
In the version of your scenario where the predictor is omniscient and the universe is 100% deterministic -- as in the version of Newcomb’s problem where the predictor isn’t just extremely good at predicting, it’s guaranteed to be infallible—I don’t think CDT has much to say. In my view, CDT represents rational decision-making under the assumption of libertarian-style free will; it models a choice as a causal intervention on the world, rather than just another link in the chain of causes and effects.
Why not post your response the same way you posted this? It’s on my front page and has attracted plenty of votes and comments, so you’re not exactly being silenced.
So far you’ve made a big claim with high confidence based on fairly limited evidence and minimal consideration of counter-arguments. When commenters pointed out that there had recently been a serious, evidence-dense public debate on this question which had shifted many people’s beliefs toward zoonosis, you ‘skimmed the comments section on Manifold’ and offered to watch the debate in exchange for $5000.
I don’t know whether your conclusion is right or wrong, but it honestly doesn’t look like you’re committed to finding the truth and convincing thoughtful people of it.