(Began reading. Didn’t click through to the paper. In the post, got to where Bostrom is quoted thus:)
“For example, even after an expected-utility-maximizing agent had built 32 paperclips, it could use some extra resources to verify that it had indeed successfully built 32 paperclips meeting all the specifications (and, if necessary, to take corrective action).”
Yes, it could. But would it? If I write a script to, say, generate a multiset of the prime factors of a number by trial and error, and run it on a computer, the computer simply performs the actions encoded into the script.
Now suppose I appended code to my script to check that the elements of the multiset did indeed constitute a prime factorisation, by checking the primality of each element and checking that their product gives back the original number. Then one might call what the updated script does (or we might say, what the script tells a computer to do) ‘checking’. All this means is that we’ve performed a test to increase our confidence in a proposed solution.
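For concreteness, here is a minimal sketch of the sort of script and appended check I have in mind (the function names and details are mine and purely illustrative; this particular script just uses simple trial division):

```python
def factorise(n):
    """Trial-division factorisation: collect prime factors with multiplicity."""
    factors = []
    d = 2
    while n > 1:
        while n % d == 0:
            factors.append(d)
            n //= d
        d += 1
    return factors


def is_prime(p):
    """Naive primality test by trial division."""
    if p < 2:
        return False
    return all(p % d != 0 for d in range(2, int(p ** 0.5) + 1))


def check_factorisation(n, factors):
    """The appended 'checking' code: every element is prime, and the
    product of the elements gives back the original number."""
    product = 1
    for p in factors:
        product *= p
    return all(is_prime(p) for p in factors) and product == n


n = 360
fs = factorise(n)
print(fs, check_factorisation(n, fs))  # [2, 2, 2, 3, 3, 5] True
```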
But another sense of ‘checking’ emerges: suppose I have someone check that some books are in some sort of alphanumeric order, without telling them that the point of this is to get the books onto a library shelf correctly. In this situation, the statement ‘The helper checked that the books were in order’ seems clearly true, but the statement ‘The helper checked that the books were ready to be shelved’ seems less intuitive.
It seems, then, that saying the script/computer itself checks the correctness of the prime factorisation was sloppy if we use this second sense of ‘checking’. I, who wrote the script, was checking by using it; but the script itself, lacking knowledge of what it was terminally checking, could not be said to be checking the factorisation, just as the sorting helper could be said to be checking order but not shelf-readiness.
Checking is pretty much just applying tests to a proposed solution in order to reach a more reliable understanding of the plausibility of the solution. So unless the agent is ‘programmed’/wired to do such tests, it won’t necessarily do so. Also, if the programming/wiring corresponds poorly to the intended task (e.g. if my prime factorisation script fails to account for the multiplicity of prime factors, or if the 32-clipper is programmed with tokens that do not refer), the actions will still be taken, but they will miss the intended target and go unchecked.
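To make that failure mode concrete, here is a hedged sketch (reusing the hypothetical check_factorisation from the sketch above) of a script whose wiring fails to correspond to the intended task: it drops the multiplicity of the factors, so the action is taken, the target is missed, and nothing flags the error unless the check happens to be wired in and run:

```python
def factorise_buggy(n):
    """Returns only the distinct prime divisors, silently dropping
    multiplicity; a wiring that fails to match the intended task."""
    distinct = []
    d = 2
    while n > 1:
        if n % d == 0:
            distinct.append(d)
            while n % d == 0:
                n //= d
        d += 1
    return distinct


fs = factorise_buggy(360)            # [2, 3, 5]: action taken, target missed
print(check_factorisation(360, fs))  # False, but only because the check was actually run
```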
This is why algorithms have to be proved to work; throwing some steps together that seem sorta like the right idea won’t yield a knowably accurate method.
(Finished reading the post except solution.)
Go meta or go home. Even if the task is to achieve X with probability p, once this is translated into an algorithm that is performed, nothing special happens. For example, say I get ice cream if a robot flips a coin and it comes up heads. If I program the robot to ensure ice cream with p = 0.5 by flipping a coin (the ‘by’ matters: without actually specifying the algorithm, I could be referring to any of a whole class of ways to program a robot to do this, only some of which work, and only some of which check), the goal will be achieved immediately and no checking will take place.
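A toy sketch of one member of that class (purely hypothetical, standing in for an actual robot controller): the probabilistic goal compiles down to a single coin flip, after which the program simply ends, with nothing in it that re-examines whether the goal was met.

```python
import random


def ensure_ice_cream_with_p_half():
    """'Achieve ice cream with p = 0.5', compiled into one concrete algorithm:
    flip a fair coin once and dispense on heads. No step here checks anything."""
    if random.random() < 0.5:          # the coin flip
        return "ice cream dispensed"   # goal met on this run
    return "no ice cream"              # goal not met; still no checking


print(ensure_ice_cream_with_p_half())
```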
TL;DR: Taboo ‘check’ or ascertain its meaning by reduction to testing using suitable thought experiments, or by looking at brains to see what physical phenomena correspond to the act of checking.
(After reading solution, comments.)
Manfred: Elegantly put.
The difference between being ‘p certain’ and knowing that one is p certain might be hard to grasp because we are so often aware of our impressions.
However, until you read this sentence, you didn’t know that you were certain five minutes ago that the Sun would not disappear three minutes ago; now that your mind is blown, your behaviour will be observably different from before this realisation, i.e. coming to know about the certainty has made a measurable difference.
A belief does not necessitate a belief about that belief.
Also, not all checkers would check indefinitely. For example, a checker with a grasp of verification paradoxes might reach a point of maximal confidence and terminate. Meanwhile, a checker that thought to (or, more accurately, was programmed/wired to) flip coins as a test for 32-paper-clips-hood, with its doubt halving each time no matter the outcome, might never terminate.
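A sketch of that second, non-terminating kind of checker (illustrative only; exact rationals are used so the halving doesn’t eventually underflow to zero the way floats would):

```python
import random
from fractions import Fraction


def coin_flip_checker():
    """A checker wired to treat coin flips as its test for 32-paper-clips-hood,
    halving its doubt after each flip regardless of the outcome. Because its
    stopping condition is zero doubt, which halving never reaches, it never
    terminates."""
    doubt = Fraction(1)
    while doubt > 0:                       # never becomes false
        random.choice(("heads", "tails"))  # flip the coin; outcome ignored
        doubt /= 2


# coin_flip_checker()  # would loop forever
```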
Checking is pretty much just applying tests to a proposed solution in order to reach a more reliable understanding of the plausibility of the solution. So unless the agent is ‘programmed’/wired to do such tests, it won’t necessarily do so.
Uhm… sounds like you’ve never heard of Basic AI Drives which, according to Omohundro, are behaviours of “sufficiently advanced AI systems of any design” “which will be present unless explicitly counteracted”; I invite you to look into that.
Was that “Uhm...” really necessary?
You imply that there’s a good reason for not using that “uhm...”. What reason would that be?
Politeness? profunda’s entire comment strikes me as quite rude and removing the beginning would at least be a step in the right direction.
The “uhm” didn’t come across to me as rudeness; it came across as hesitance, which I interpreted as fear of coming across as rude. I admit that “sounds like you’ve never heard of X” is less polite than “have you heard of X?”, and the latter seems like a good substitute. “I invite you to look into that” doesn’t seem rude at all to me.
I interpreted it as sarcastic. YMMV. It is of course very hard to make sure stuff like this isn’t misinterpreted on the internet.
Warringal correctly interpreted what I was trying to convey and even summed up my afterthoughts on how I could’ve been more tactful; however, ‘uhm’ does carry negative connotations in online discussions (according to Urban Dictionary, though unbeknown to me at the time).