I explicitly asked Anthropic whether they had a policy of not releasing models significantly beyond the state of the art. They said no, and that they believed Claude 3 was noticeably beyond the state of the art at the time of its release.
The situation at Zaporizhzhia (currently) does not seem to be an impending disaster. The fire is/was in an administrative building. Fires at nuclear power plants can be serious, but the reactor buildings are concrete and would not easily catch fire due to nearby shelling or other external factors.
Some click-seekers on Twitter have made comparisons to Chernobyl. That kind of explosion cannot happen accidentally at Zaporizhzhia (it’s a safer power plant design with sturdy containment structures surrounding the reactors). If the Russians wanted to cause a massive radioactive cloud like Chernobyl, they would have to use their own explosives, and I think it would take a very big bomb to do it. They would have to blow the roof off the containment building first, and then somehow break open the massive steel reactor vessel and spread the contents into the air.
A Fukushima-style meltdown also does not look very plausible unless someone takes over the plant and intentionally disables safety systems.
More info here: https://mobile.twitter.com/isabelleboemeke/status/1499594126521679872
https://mobile.twitter.com/BeCurieus/status/1499604899990052866
Sounds like something GPT-3 would say...
Alternatively, aging (like most non-discrete phenotypes) may be omnigenic.
Thanks for posting this, it’s an interesting idea.
I’m curious about your second-to-last paragraph: if our current evidence already favored SSA or SIA (for instance, if we knew that an event occurred in the past that had a small chance of creating a huge number of copies of each human, but also knew that we are not copies), wouldn’t that already have been enough to update our credence in SSA or SIA? Or did you mean that there’s some other category of possible observations, which is not obviously evidence one way or the other, but which under this UDT framework we could still use to make an update?
I’m curious who is the target audience for this scale…
People who have an interest in global risks will find it simplistic—normally I would think of the use of a color scale as aimed at the general public, but in this case it may be too simple even for the curious layman. The second picture you linked, on the other hand, seems like a much more useful way to categorize risks (two dimensions, severity vs urgency).
I think this scale may have some use in trying to communicate with policy makers who are unfamiliar with the landscape of GCRs, and in particular in trying to get them to focus on the red and orange risks that currently get little attention. But where is the platform for that communication to happen? It seems like the key conversations currently happen at a more technical level, in DoD, DHS, or FEMA, and a focus on interventions would be helpful there. I couldn’t get the whole paper, but from what you wrote above it sounds like you have some interesting ideas about ranking risks based on a combination of probability and possible interventions. If that could be formalized, I think it would make the whole idea a lot stronger. Like you say, people are reasonably skeptical about probabilities (even if they’re just order-of-magnitude estimates), but if you can show that the assessed severity of a risk isn’t very sensitive to the probability estimate, maybe that would help overcome the obstacle.
Note also that non-alphanumeric symbols are hard to google. I kind of guessed it from context but couldn’t confirm until I saw Kaj’s comment.
Separately, and more importantly, the way links are currently displayed makes it hard to tell whether a link has already been visited. Also, if you select text, you can’t see links anymore.
Firefox 57 on Windows 10.
I am encountering some kind of error when opening the links here to rationalsphere and single conversational locus. When I open them, a box pops up that says “Complete your profile” and asks me to enter my email address (even though I used my email to log in in the first place). When I type it in and press submit, I get the error: {"id":"app.mutation_not_allowed","value":"\"usersEdit\" on _id \"BSRa9LffXLw4FKvTY\""}
I think this is an excellent approach to jargon and I appreciate the examples you’ve given. There is too much tendency, I think, for experts in a field to develop whatever terminology makes their lives easiest (or even in some cases makes them “sound smart”) without worrying about accessibility to newcomers.
… but maybe ideally hints at a broader ecosystem of ideas
This sounds useful, but very hard to do in practice… do you know of a case where it’s successful?
Thanks for posting!
I haven’t read your book yet but I find your work pretty interesting. I hope you won’t mind a naive question… you’ve mentioned non-sunlight-dependent foods like mushrooms and leaf tea. Is it actually possible for a human to survive on foods like this? Has anybody self-experimented with it?
By my calculation, a person who needs 1800 kcals/day would have to eat about 5 kg of mushrooms. Tea (the normal kind, anyway) doesn’t look any better.
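To make that arithmetic explicit, here is a rough back-of-envelope sketch. The 1800 kcal/day figure is from above; the energy density of roughly 35 kcal per 100 g is my assumption, on the high end for mushrooms (common white mushrooms are closer to 22 kcal per 100 g, which would push the total toward 8 kg):

```python
# Rough back-of-envelope check (assumed numbers, not from any cited source)
DAILY_KCAL = 1800            # daily calorie requirement used above
MUSHROOM_KCAL_PER_KG = 350   # ~35 kcal per 100 g; white button mushrooms are ~220

kg_per_day = DAILY_KCAL / MUSHROOM_KCAL_PER_KG
print(f"{kg_per_day:.1f} kg of mushrooms per day")  # ~5.1 kg
```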
Bacteria fed by natural gas seems like a very promising food source—and one that might even be viable outside of catastrophe scenarios. Apparently it’s being used for fish feed already.
You are assuming that all rational strategies are identical and deterministic. In fact, you seem to be using “rational” as a stand-in for “identical”, which reduces this scenario to the twin PD. But imagine a world where everyone makes use of the type of superrationality you are positing here—basically, everyone assumes other people are just like them. Then any one person who switches to a defection strategy would have a huge advantage, and defecting becomes the rational thing to do. Since everybody is rational, everybody switches to defecting—because this is just a standard one-shot PD. You can’t get the benefits of knowing the opponent’s source code unless you actually know the opponent’s source code.
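To spell out that incentive with a toy example (the payoff numbers are my own choice of the standard PD values, not anything from your post): against a population that cooperates on the blind assumption that everyone is alike, a lone defector does strictly better.

```python
# One-shot PD payoffs for the row player, using the usual T > R > P > S ordering (assumed values)
PAYOFF = {("C", "C"): 3, ("C", "D"): 0, ("D", "C"): 5, ("D", "D"): 1}

blind_cooperator = lambda: "C"   # "everyone is just like me, so cooperate"
defector = lambda: "D"

print(PAYOFF[(blind_cooperator(), blind_cooperator())])  # 3: all-cooperate world
print(PAYOFF[(defector(), blind_cooperator())])          # 5: a lone defector gains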
The first section is more or less the standard solution to the open source prisoner’s dilemma, and the same as what you would derive from a logical decision theory approach, though with different and less clear terminology than what is in the literature.
The second section, on application to human players, seems flawed to me (as does the claim that it applies to superintelligences who cannot see each other’s source code). You claim the following conditions are necessary:
1. A and B are rational
2. A and B know each other’s preferences
3. They are each aware of 1 and 2
But in fact, your concept of predisposing oneself relies explicitly on having access to the other agent’s source code (and them having access to yours). If you know the other agent does not have access to your source code, then it is perfectly rational to predispose yourself to defect, whether or not you predict that the other agent has done the same. Cooperating only makes sense if there’s a logical correlation between your decision to cooperate and your opponent’s decision to cooperate; both of you just being “rational” does not make your decision processes identical.
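Here is a minimal sketch of that distinction, in the style of the open-source PD / program-equilibrium literature (the player names and the "source text" strings are toy stand-ins of my own, not anything from the post): cooperation is conditioned on actually inspecting the opponent’s code, not on merely assuming the opponent is rational.

```python
# Each player is (source_text, decision_function). A decision function sees the
# opponent's source text and returns "C" or "D".

def make_cliquebot():
    src = "cliquebot-v1"          # stand-in for the bot's actual source text
    def decide(opponent_src):
        # Cooperate only when the opponent is verifiably running the same code.
        return "C" if opponent_src == src else "D"
    return src, decide

def make_defectbot():
    src = "defectbot-v1"
    def decide(opponent_src):
        return "D"
    return src, decide

def play(a, b):
    (src_a, dec_a), (src_b, dec_b) = a, b
    return dec_a(src_b), dec_b(src_a)

print(play(make_cliquebot(), make_cliquebot()))  # ('C', 'C')
print(play(make_cliquebot(), make_defectbot()))  # ('D', 'D')
```

The cooperate branch is only reachable when the opponent’s source is genuinely available and verifiably identical; without that access, predisposing yourself to defect loses nothing.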
“Recurrent Decision Theory” is not a meaningful idea to develop based on this post; just read and understand the existing work on UDT/FDT and you will save yourself some trouble.
I think many of us “rationalists” here would agree that rationality is a tool for assessing and manipulating reality. I would say much the same about morality. There’s not really a dichotomy between morality being “grounded on evolved behavioral patterns” and having “a computational basis implemented somewhere in the brain and accessed through the conscious mind as an intuition”. Rather, the moral intuitions we have are computed in our brains, and the form of that computation is determined both by the selection pressures of evolution and the ways that our evolved brain structures interact with our various environments.
So what is our highest priority here? It’s neither Rationality nor Truth, but Morality in the broad sense—the somewhat arbitrary and largely incoherent set of states of reality that our moral intuition prefers. I say arbitrary because our moral intuition does not aim entirely at the optimization target of the evolutionary process that generated it—propagating our genes. Call that moral relativism if you want to.
I think this is an interesting and useful view, if applied judiciously. In particular, it will always tend to be most relevant for crony beliefs—beliefs that affect the belief-holder’s life mainly through other people’s opinions of them, like much of politics and some of religion. When it comes to close-up stuff that can cause benefit or harm directly, you will find that most people really do have a model of the world. When you ask someone whether so-and-so would make a good president, the answer is often a signal about their cultural affiliations. Ask them which is the fastest way to get to where they work, and the answer reflects what they’ve learned about rush-hour traffic patterns. Ask people if they believe in God, and the answer is a signal. Ask them if they believe pre-marital sex is ever acceptable, and the answer you get is a lot more practical.
It’s also worth unpacking the us-vs-them terminology you employ here. Many of us may tend to be more literal than the average person (especially those who fall on the spectrum) but in my experience we are still prone to this same behavior. In most cases, there’s nothing wrong with that. Understanding the difference can help us avoid trying to cooperatively world-model with people who are just expressing social beliefs, and can also help us recognize world-modeling when we see it, so that we can reduce our tendency to make snap judgements about people on the basis of the beliefs they express.
This doesn’t actually seem to match the description. They only talk about having used one laser, with two stakes, whereas your diagram requires using two lasers. Your setup would be quite difficult to achieve, since you would somehow have to get both lasers perfectly horizontal; I’m not sure a standard laser level would give you this kind of precision. In the version they describe, they level the laser by checking the height of the beam on a second stake. This seems relatively easy.
My guess is they just never did the experiment, or they lied about the result. But it would be kind of interesting to repeat it sometime.
Thanks, that’s an interesting perspective. I think even high-level self-modification can be relatively safe with sufficient asymmetry in resources—simulated environments give a large advantage to the original, especially if the successor can be started with no memories of anything outside the simulation. Only an extreme difference in intelligence between the two would overcome that.
Of course, the problem of transmitting values to a successor without giving it any information about the world is a tricky one, since most of the values we care about are linked to reality. But maybe some values are basic enough to be grounded purely in math that applies to any circumstances.
If visible precommitment by B requires it to share the source code for its successor AI, then it would also be giving up any hidden information it has. Essentially both sides have to be willing to share all information with each other, creating some sort of neutral arbitration about which side would have won and at what cost to the other. That basically means creating a merged superintelligence is necessary just to start the bargaining process, since they each have to prove to the other that the neutral arbiter will control all relevant resources to prevent cheating.
Realistically, there will be many cases where one side thinks its hidden information is sufficient to make the cost of conflict smaller than the costs associated with bargaining, especially given the potential for cheating.
I’ve read a couple of Lou Keep’s essays in this series and I find his writing style very off-putting. It seems like there’s a deep idea about society and social-economic structures buried in there, but it’s obscured by a hodgepodge of thesis-antithesis and vague self-reference.
As best I can tell, his point is that irrational beliefs like belief in magic (specifically, protection from bullets) can be useful for a community (by encouraging everyone to resist attackers together) even though it is not beneficial to the individual (since it doesn’t prevent death when shot). He relates this to Seeing Like A State, in that any attempt by the state to increase legibility by clarifying the benefits makes them disappear.
He further points out that political and economic policies tend to focus on measurable effects, whereas the ultimate point of governments and economies is to improve the subjective wellbeing of people (happiness, although he says that’s just a stand-in for something else he doesn’t feel like explaining).
Extending that, he thinks we have probably lost some key cultural traditions that were very important to the quality of people’s lives, but weren’t able to thrive in a modern economic setting. He doesn’t give any examples of that, although he mentions marriages and funerals as examples of traditions that have survived. Still, it seems plausible.
Overall, it reminds me of Scott Alexander’s essay How the West was Won, about the advance of universalist (capitalist) culture and its ability to out-compete traditional systems whether or not it actually improves people’s lives. Moloch is also relevant.
It’s very likely I’ve missed a key aspect here. If anyone knows what it is, please let me know.
And to elaborate a little bit (based on my own understanding, not what they told me) their RSP sort of says the opposite. To avoid a “race to the bottom” they base the decision to deploy a model on what harm it can cause, regardless of what models other companies have released. So if someone else releases a model with potentially dangerous capabilities, Anthropic can’t/won’t use that as cover to release something similar that they wouldn’t have released otherwise. I’m not certain whether this is the best approach, but I do think it’s coherent.