It has, and it got basically that response, along with my point that if the AI is friendly, then my existing proof of friendliness was apparently sufficient, and if the AI is unfriendly, then it’s just a trick. So “a proof of my own friendliness” doesn’t seem like useful evidence.
Huh. I thought the AI Box experiment assumed that Friendliness is intrinsically unknown; that is, it’s not presumed the AI was designed according to Friendliness as a criterion.
If the AI is friendly, then the technique I am using already produces a friendly AI, and I thus learn nothing more than how to prove that it is friendly.
But if the AI is unfriendly, the proof will be subtly corrupted, so I can’t actually count the proof as any evidence of friendliness, since both a FAI and a UFAI can offer me exactly the same thing.
Because the terms of the challenge seemed to be “one plaintext sentence in natural language”, and I felt my run-ons were already pushing it, and just saying “provides indisputable proof of Friendliness” seemed like cheating?
EDIT: I answered PhilipL’s reply from my inbox, so I’m really not sure how it got posted as a response to handoflixue here. o.o
Embrace Shminux and cheat! You’re a hyper-intelligent AI.
The top-karma result is a one-line proof, the second-best is me trying to cheat, and third place is currently emotional manipulation :)
(Also, you replied to the wrong person :))