Maybe a form of unit testing could be useful? Create a simple and a not-so-simple test for each of a range of domains and get all AIs to run them periodically.
By default the narrow AIs would fail even the simple tests outside their own domain, but we would be able to monitor whether, and how quickly, they learn other domains.
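To make the idea concrete, here is a rough sketch of what such a test battery might look like. Everything in it is made up for illustration: the `Agent` type, the two toy domains, the questions, and the `arithmetic_only` stand-in for a narrow AI. A real battery would need far richer tests; this only shows the bookkeeping.

```python
from typing import Callable, Dict, List, Tuple

# Assumption: an "AI" is anything that maps a question string to an answer string.
Agent = Callable[[str], str]

# Hypothetical battery: domain -> list of (difficulty, question, expected answer).
TESTS: Dict[str, List[Tuple[str, str, str]]] = {
    "arithmetic": [("simple", "2+2", "4"), ("hard", "17*23", "391")],
    "spelling": [("simple", "reverse: ab", "ba"),
                 ("hard", "reverse: stressed", "desserts")],
}

Results = Dict[str, Dict[str, bool]]


def run_suite(agent: Agent) -> Results:
    """Run every test against the agent; an exception counts as a failure."""
    results: Results = {}
    for domain, cases in TESTS.items():
        results[domain] = {}
        for difficulty, question, expected in cases:
            try:
                passed = agent(question).strip() == expected
            except Exception:
                passed = False
            results[domain][difficulty] = passed
    return results


def newly_passed(previous: Results, current: Results) -> List[Tuple[str, str]]:
    """List (domain, difficulty) pairs that pass now but did not pass before."""
    return [
        (domain, difficulty)
        for domain, levels in current.items()
        for difficulty, ok in levels.items()
        if ok and not previous.get(domain, {}).get(difficulty, False)
    ]


def arithmetic_only(question: str) -> str:
    """Toy narrow agent: handles 'a+b' and 'a*b', and nothing else."""
    for op in "+*":
        if op in question:
            left, right = question.split(op)
            a, b = int(left), int(right)
            return str(a + b if op == "+" else a * b)
    raise ValueError("out of domain")
```

Running `run_suite(arithmetic_only)` periodically and feeding consecutive results to `newly_passed` would flag the moment a narrow agent starts passing tests outside its own domain.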
Another test could be to check whether an AI's performance in its own field suddenly jumps. To give a real-world example, when Google (which I think is the closest thing we have to an AI right now) gained the ability to suggest search terms based on what one has already typed, it became much easier to search for things. A similar jump would show up when it eventually gains the ability to parse human language, and so on.
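One crude way to watch for such a jump, assuming we already have some numeric score for the AI's performance over time (how to get that score is the hard part and is not addressed here), is to compare each new score against the recent average. The function name, the window size, and the threshold below are all arbitrary choices for illustration.

```python
from statistics import mean, stdev
from typing import List


def sudden_jumps(scores: List[float], window: int = 5,
                 threshold: float = 3.0) -> List[int]:
    """Return indices where a score exceeds the mean of the previous
    `window` scores by more than `threshold` standard deviations."""
    jumps = []
    for i in range(window, len(scores)):
        recent = scores[i - window:i]
        baseline = mean(recent)
        # Guard against division by zero when the recent scores are all equal.
        spread = stdev(recent) or 1e-9
        if (scores[i] - baseline) / spread > threshold:
            jumps.append(i)
    return jumps
```

With a history like `[0.50, 0.51, 0.49, 0.50, 0.51, 0.50, 0.80, 0.81]` this flags only index 6, the first score after the jump, because by the next step the jump is already part of the recent window.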