A truism in software is that code is harder to read than write.
Another truism is that truisms are untrue things that people say anyway.
Examples of code that is easier to read than write include those where the code represents a deep insight that must be discovered in order to implement it. This does not apply to most examples of software that we use to automate minutiae but could potentially apply to the core elements of a GAI’s search procedure.
The above said, I of course agree that the thought of being able to read the AI’s mind is ridiculous.
Examples of code that is easier to read than write include those where the code represents a deep insight that must be discovered in order to implement it.
Unless you also explain that insight in a human-understandable way through comments, it doesn’t follow that such code is easier to read than write, because the reader would then have to have the same insight to figure out what the hell is going on in the code.
For example, being given code that simulates relativity before Einstein et al. discovered it would have made discovering relativity a lot easier.
Well, yeah, code fully simulating SR and written in a decent way would, but code approximately simulating collisions of ultrarelativistic particles with hand-coded optimizations… not sure.
I of course agree that the thought of being able to read the AI’s mind is ridiculous.
It’s not transparently obvious to me why this would be “ridiculous”; care to enlighten me? Building an AI at all seems ridiculous to many people, but that’s because they don’t actually think about the issue because they’ve never encountered it before. It really seems far more ridiculous to me that we shouldn’t even try to read the AI’s mind, when there’s so much at stake.
AIs aren’t gods; with time, care, and lots of preparation, reading their thoughts should be doable. If you disagree with that statement, please explain why. Rushing things here seems like the most awful idea possible; I really think careful mind-reading would be worth the resource investment.
Sure, possible. Just a lot harder than creating an FAI to do it for you—especially when the AI has an incentive to obfuscate.
Why are you so confident that the first version of FAI we make will be safe? Doing both is safest and seems like it would be worth the investment.
I’m not. I expect it to kill us all with high probability (which is nevertheless lower than the probability of obliteration if no FAI is actively attempted.)
Humans reading computer code aren’t gods either. How long until an uFAI would get caught if it did stuff like this?
It would be very hard, yes. I never tried to deny that. But I don’t think it’s hard enough to justify not trying to catch it.
Also, you’re only viewing the “output” of the AI, essentially, with that example. If you could model the cognitive processes of the authors of secretly malicious code, then it would be much more obvious that some of their (instrumental) goals didn’t correspond to the ones that you wanted them to be achieving. The only way an AI could deceive us would be to deceive itself, and I’m not confident that an AI could do that.
That’s not the same as “I’m confident that an AI couldn’t do that”, is it?
At the time, it wasn’t the same.
Since then, I’ve thought more, and gained a lot of confidence on this issue. Firstly, any decision made by the AI to deceive us about its thought processes would logically precede anything that would actually deceive us, so we don’t have to deal with the AI hiding its previous decision to be devious. Secondly, if the AI is divvying its own brain up into certain sections, some of which are filled with false beliefs and some of which are filled with true ones, it seems like the AI would render itself impotent on a level proportionate to the extent that it filled itself with false beliefs. Thirdly, I don’t think a mechanism which allowed for total self-deception would even be compatible with rationality.
Even if the AI can modify its code, it can’t really do anything that wasn’t entailed by its original programming.
(Ok, it could have a security vulnerability that allowed the execution of externally-injected malicious code, but that is a general issue of all computer systems with an external digital connection)
The hard part is predicting everything that was entailed by its initial programming and making sure it’s all safe.
That’s right; the history of engineering tells us that “provably safe” and “provably secure” systems fail in unanticipated ways.
If it’s a self-modifying AI, the main problem is that it keeps changing. You might find the memory position that corresponds to, say, expected number of paperclips. When you look at it next week wondering how many paperclips there are, it’s changed to staples, and you have no good way of knowing.
If it’s not a self-modifying AI, then I suspect it would be pretty easy. If it used Solomonoff induction, it would be trivial. If not, you are likely to run into problems with components that only approximate Bayesian inference. For example, if you let it develop its own hanging nodes, you’d have a hard time figuring out what they correspond to. They might not even correspond to something you could feasibly understand. If there’s a big enough structure of them, it might even change.
This is a reason it would be extremely difficult. Yet I feel the remaining existential risk should outweigh that.
It seems to me reasonably likely that our first version of FAI would go wrong. Human values are extremely difficult to understand because they’re spaghetti mush, and they often contradict each other and interact in bizarre ways. Reconciling that in a self-consistent and logical fashion would be very difficult to do. Coding a program to do that would be even harder. We don’t really seem to have made any real progress on FAI thus far, so I think this level of skepticism is warranted.
I’m proposing multiple alternative tracks to safer AI, which should probably be used in conjunction with the best FAI we can manage. Some of these tracks are expensive and difficult, but others seem simpler. The interactions between the different tracks produce a sort of safety net where the successes of one check the failures of others, as I’ve had to show throughout this conversation again and again.
I’m willing to spend much more to keep the planet safe against a much lower level of existential risk than anyone else here, I think. That’s the only reason I can think of to explain why everyone keeps responding with objections that essentially boil down to “this would be difficult and expensive”. But the entire idea of AI is expensive, as well as FAI, yet the costs are accepted easily in those cases. I don’t know why we shouldn’t just add another difficult project to our long list of difficult projects to tackle, given the stakes that we’re dealing with.
Most people on this site seem only to consider AI as a project to be completed in the next fifty or so years. I see it more as the most difficult task that’s ever been attempted in all of human history. I think it will take at least 200 years, even factoring in the idea that new technologies I can’t even imagine will be developed over that time. I think the most common perspective on the way we should approach AI is thus flawed, and rushed, compared to the stakes, which are millions of generations of human descendants. We’re approaching a problem that affects millions of future generations, and trying to fix it in half a generation with as cheap a budget as we think we can justify, and that seems like a really bad idea (possibly the worst idea ever) to me.