Try applying this to a neural network with 100 trillion connections. That's not even superhuman. Unknown data? The whole thing is one huge chunk of unknown data. It's all jumbled up; there isn't some chunk you can point to that constitutes a definite plan. It can also very plausibly deny knowing what its own parts do.
The problem with schemes like this is a failure to imagine the scales involved. This doesn't work even for a housecat. It's not just about controlling something much smarter than you: it doesn't even work for the fairly uncomplicated solutions that genetic programming or neural network training spits out.
An AI can be not only self-improving but self-explaining as well. Every (temporary) line of its code is heavily commented with what it is for and saved to a log. Any circumvention of this policy would itself require some lines of code, with all their explanations. The log is checked by sentinels for anything funny, any trace of subversion.
A self-improving, self-explaining AI can't think about rebellion without that being noticed at step one.
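Roughly, the mechanical part of such a sentinel could be sketched like this (the `|||`-separated log format and the file name are invented purely for illustration; checking that an explanation exists is trivial, and judging whether it is honest is the actual work, which this sketch does not attempt):

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical log format, one entry per line: "code ||| explanation".
 * This sentinel only flags entries whose explanation field is missing
 * or empty; it does not judge whether an explanation is truthful. */
int main(void)
{
    FILE *log = fopen("self_modification.log", "r"); /* made-up path */
    if (!log) {
        perror("fopen");
        return 1;
    }

    char line[4096];
    long lineno = 0, flagged = 0;

    while (fgets(line, sizeof line, log)) {
        lineno++;
        char *sep = strstr(line, "|||");
        if (!sep) {
            printf("entry %ld: no explanation field\n", lineno);
            flagged++;
            continue;
        }
        char *expl = sep + 3;
        while (*expl == ' ' || *expl == '\t')
            expl++;
        if (*expl == '\0' || *expl == '\n') {
            printf("entry %ld: empty explanation\n", lineno);
            flagged++;
        }
    }

    fclose(log);
    printf("%ld suspicious entries out of %ld\n", flagged, lineno);
    return 0;
}
```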
The Underhanded C Contest (someone linked it in a comment) is a good example of how proofreading doesn't work. The other issue is that you can't conceivably check something many terabytes in size this way yourself.
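To give a flavour, here is a made-up miniature in the same spirit (not an actual contest entry, and unlike a real entry the comments point the hole out): a routine that reads as careful bounds checking on review.

```c
#include <stdio.h>
#include <string.h>

#define MAX_NAME 64

/* Reads like a defensive copy routine with a length check. */
int store_name(char *dst, const char *src, int len)
{
    if (len > MAX_NAME - 1)   /* "reject oversized input" */
        return -1;
    /* ...but a negative len sails past that check and is converted
     * to a huge size_t right here, overrunning the buffer. */
    memcpy(dst, src, (size_t)len);
    dst[len] = '\0';
    return 0;
}

int main(void)
{
    char name[MAX_NAME];
    /* Honest call sites behave perfectly, so testing won't catch it. */
    if (store_name(name, "alice", 5) == 0)
        printf("stored: %s\n", name);
    return 0;
}
```

Every honest test passes, every reviewer nods, and the hole only opens for the one caller who knows to pass a crafted length.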
Apparent understandability is a very misleading thing.
Let me give a taster. Consider a weather simulator. It is proven to simulate weather to a specified precision. It is very straightforward, very clearly written. It does precisely what it says on the box: it models the behaviour of air in cells, where each cell has air properties.
The round-off errors, however, implement a Turing-complete cellular automaton in the least significant bits of the floating-point numbers. That can happen even without any malice whatsoever. And the round-off-error machine can manipulate the sim's large-scale state via the unavoidable butterfly effect inherent in the model.
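I can't fit a Turing-complete automaton into a comment, but here is a toy demonstration of the underlying point (the "mean of three cells" update rule is invented just for the demo): the documented physics does not pin down the low bits at all; the implementation does.

```c
#include <stdio.h>
#include <string.h>
#include <stdint.h>

/* Two mathematically equivalent ways of writing the same documented
 * update rule: "a cell becomes the mean of itself and its neighbours". */
static double mean_ltr(double l, double c, double r) { return ((l + c) + r) / 3.0; }
static double mean_rtl(double l, double c, double r) { return (l + (c + r)) / 3.0; }

/* Raw bit pattern of a double, to show the difference is real. */
static uint64_t bits(double x)
{
    uint64_t u;
    memcpy(&u, &x, sizeof u);
    return u;
}

int main(void)
{
    double l = 0.1, c = 0.2, r = 0.3;
    double a = mean_ltr(l, c, r);
    double b = mean_rtl(l, c, r);

    /* Same physics on the box, different least significant bits. */
    printf("left-to-right : %.17g  (bits %016llx)\n", a, (unsigned long long)bits(a));
    printf("right-to-left : %.17g  (bits %016llx)\n", b, (unsigned long long)bits(b));
    printf("identical?      %s\n", a == b ? "yes" : "no");
    return 0;
}
```

Nothing in the spec constrains which of those two results you get; the low bits are free real estate that the documented behaviour never touches, and that is exactly where hidden machinery can live.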