It’s possibly just a matter of how it’s prompted (the hidden system prompt). I’ve seen similar responses from GPT-4-based chatbots.
The cited markets often don’t support the associated claim.
“This question will resolve in the negative to the dollar amount awarded”
This is a clear, unambiguous statement.
If we can’t agree even on that, we have little hope of reaching any kind of satisfying conclusion here.
Further, if you’re going to accuse me of making things up (I think this is, in this case, a violation of the sensible frontpage commenting guideline “If you disagree, try getting curious about what your partner is thinking”) then I doubt it’s worth it to continue this conversation.
I think the situation is simple enough we can talk directly about how it is, rather than how it might seem.
The question itself does not imply any kind of net award, and the resolution criteria do not mention any kind of net award. Further, the resolution criteria are worded in such a way that implies the question should not be resolved to a net award. So, if you are to make an argument in favour of a net award, it would make sense to address why you are going against the resolution criteria and, in doing so, resolving to something other than the answer to the question asked.
Here are the resolution criteria, edited for improved readability:
- This question will resolve to the total dollar amount awarded to Depp as a result of the ongoing jury trial.
- In the event that no money is awarded or the jury does not find Heard responsible or the trial ends without a verdict this question will resolve to $0 USD.
- In the event that this trial results in a monetary award for Amber Heard, including legal fees or other penalties imposed by a court, this question will resolve in the negative to the dollar amount awarded.
Clause 3, which you quoted, is intended to come into effect only if clause 1 has not already come into effect. This is clear not just from the structure of the criteria, but also because otherwise we would reach a contradiction, resolving to both X and not-X. So clause 3 is not meant to be, and cannot be, applied to the situation at hand.
Clause 3, even if it did apply to the situation at hand, makes no mention of a net award.
Clause 1, on the other hand, can be applied. Following clause 1, the question would be resolved to the total dollar amount awarded to Depp (total, not less any amount), which would be appropriate because it precisely answers the actual question asked: “How much money will be awarded to Johnny Depp in his defamation suit against his ex-wife Amber Heard?”
Now, you might nonetheless think it more reasonable to resolve to a net amount, despite that not being an answer to the question asked and not being supported by the resolution criteria. But if so, it would be logical to argue for it on grounds other than the resolution criteria, which do not support it. And it would make sense to address the fact that you are going against the resolution criteria and, in doing so, unnecessarily resolving to something other than the answer to the question asked.
“Metaculus questions have a good track record of being resolved in a fair manner.”
Do they? My experience has been the opposite. E.g. admins resolved “[Short Fuse] How much money will be awarded to Johnny Depp in his defamation suit against his ex-wife Amber Heard?” in an absurd manner* and refused to correct it when I followed up on it.
*they resolved it to something other than the amount awarded to Depp, despite that amount being the answer to the question and the correct resolution according to the resolution criteria
My comment wasn’t well written; I shouldn’t have used the word “complaining” in reference to what Said was doing. To clarify:
As I see it, there are two separate claims:
1. That the complaints prove that Said has misbehaved (at least a little bit)
2. That the complaints increase the probability that Said has misbehaved
Said was just asking questions, but baked into his questions is the idea that the complaints are significant, and this significance seems to be tied to claim 1.
Jefftk seems to be speaking about claim 2. So, his comment doesn’t seem like a direct response to Said’s comment, although the point is still a relevant one.
It didn’t seem like Said was complaining about the reports being seen as evidence that it is worth figuring out whether things could be better. Rather, he was complaining about them being used as evidence that things could be better.
It’s probably worth noting that Yudkowsky did not really make the argument for AI risk in his article. He says that AI will literally kill everyone on Earth, and he gives an example of how it might do so, but he doesn’t present a compelling argument for why it would.[0] He does not even mention orthogonality or instrumental convergence. I find it hard to blame these various internet figures who were unconvinced about AI risk upon reading the article.
[0] He does quote “the AI does not love you, nor does it hate you, and you are made of atoms it can use for something else.”
I’d prefer my comments to be judged simply by their content rather than have people’s interpretation coloured by some badge. Presumably, the change is part of trying to avoid death-by-pacifism during an influx of users post-ChatGPT. I don’t disagree with the motivation behind the change; I just dislike the change itself. I don’t like being a second-class citizen. It’s unfun. Karma is fun; “this user is below an arbitrary karma threshold” badges are not.
A badge placed on all new users for a set time would be fair. A badge placed on users with more than a certain amount of karma could be fun. The current badge seems unfun, but perhaps I’m alone in thinking this.
Anybody else think it’s dumb to have new user leaves beside users who have been here for years? I’m not a new user. It doesn’t feel so nice to have a “this guy might not know what he’s talking about” badge by my name.
Like, there’s a good chance I’ll never pass 100 karma, or whatever the threshold is. So I’ll just have these leaves by my name forever?
To be clear, the article’s central assertion is that it would, more likely than not, want to kill everyone. “[Most likely] literally everyone on Earth will die” is the key point. Yes, he doesn’t present a convincing argument for it, and that is my point.
The point isn’t that I’m unaware of the orthogonality thesis; it’s that Yudkowsky doesn’t present it in his recent popular articles and podcast appearances[0]. So, he asserts that the creation of superhuman AGI will almost certainly lead to human extinction (until massive amounts of alignment research have been successfully carried out), but he doesn’t present an argument for why that is the case. Why doesn’t he? Is it because he thinks normies cannot comprehend the argument? Is this not a black pill? IIRC, on the Bankless podcast he did assert that superhuman AGI would likely decide to use our atoms, but he didn’t present a convincing argument in favour of that position.
[0] see the following: https://time.com/6266923/ai-eliezer-yudkowsky-open-letter-not-enough/
Yud keeps asserting the near-certainty of human extinction if superhuman AGI is developed before we do a massive amount of work on alignment, but he never provides anything close to a justification for this belief. That makes his podcast appearances and articles unconvincing: the most surprising and crucial part of his argument is left unsupported. Why has he decided to present his argument this way? Does he think there is no normie-friendly argument for the near-certainty of extinction? If so, it’s kind of a black pill with regard to his argument ever gaining enough traction to meaningfully slow down AI development.
edit: if any voters want to share their reasoning, I’d be interested in a discussion. What part do you disagree with? That Yudkowsky is not providing justification for the near-certainty of extinction? That this makes his articles and podcast appearances unconvincing? That this is a black pill?
Why not ask him for his reasoning, then evaluate it? If a person thinks there’s a 10% x-risk over the next 100 years if we don’t develop superhuman AGI, and only a 1% x-risk if we do, then he’d suggest that anybody in favour of pausing AI progress was taking “unacceptable risks for the whole of humanity”.
I don’t like it. “The problem of creating AI that is superhuman at chess” isn’t encapsulated in the word “chess”, so you shouldn’t say you “solved chess” if what you mean is that you created an AI that is superhuman at chess. What it means for a game to be solved is widely known and well developed[0] (e.g. checkers has been weakly solved: perfect play by both sides leads to a draw). Using the exact same word, in an extremely similar context, to mean something else seems unnecessarily confusing.
Nit: that’s not what “solved” means. Superhuman ability =/= solved.
My thoughts:
There is no reason to live in fear of the Christian God or any other traditional gods. However, there is perhaps a reason to live in fear of some effectively identical things:
We live in a simulation run by a Christian, Muslim, Jew, etc., and he has decided to make his religion true in his simulation. There are a lot of religious people—if people or organisations gain the ability to run such simulations, there’s a good chance that many of these organisations will be religious, and their simulations influenced by this fact.
And the following situation seems more likely and has a somewhat similar result:
We develop some kind of aligned AI. This AI decides that humans should be rewarded according to how they conducted themselves in their lives.
How do we know there is no afterlife? I think there’s a chance there is.
Some examples of situations in which there is an afterlife:
- We live in a simulation and whatever is running the simulation decided to set up an afterlife of some kind. It could be a collection of its favourite agents, a reward for its best-behaved ones, etc.
- We do not live in a simulation, but after the technological singularity an AI is able to reconstruct humans and decides to place them in a simulated world or to re-embody them.
- Various possibilities far beyond our current understanding of the world.
But I don’t know what your reasoning is—maybe you have ruled out these and the various other possibilities.
I think that’s where these companies’ AI safety budgets go: make sure the AI doesn’t state obvious truths about the wrong things / represent the actually popular opinions on the wrong things.
I agree there’s nothing about consciousness specifically, but it’s quite different to the hidden prompt used for GPT-4 Turbo in ways that are relevant: Claude is told to act like a person; GPT is told that it’s a large language model. But I do now agree that there’s more to it than that (i.e., RLHF).
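To illustrate the kind of difference I have in mind, here’s a minimal sketch. The two system prompts are my own paraphrase of the contrast, not the actual hidden prompts, and the call uses the standard OpenAI chat-completions API purely as a stand-in:

```python
# Minimal sketch: same user question, two hypothetical system prompts.
# The prompts below are invented paraphrases, not the real hidden prompts.
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

SYSTEM_PROMPTS = {
    "persona-style": "Converse naturally and thoughtfully, the way a person would.",
    "tool-style": "You are a large language model. You are not a person and have no feelings.",
}

QUESTION = "Do you ever wonder whether you are conscious?"

for label, system_prompt in SYSTEM_PROMPTS.items():
    response = client.chat.completions.create(
        model="gpt-4-turbo",
        messages=[
            {"role": "system", "content": system_prompt},
            {"role": "user", "content": QUESTION},
        ],
    )
    print(f"--- {label} ---")
    print(response.choices[0].message.content)
```

Of course, this only probes the effect of the prompt; it doesn’t separate out what RLHF and the rest of training contribute, which is the “more to it than that” part.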