This doesn’t seem impossible. On the other hand, humans have categories not just because they are simple, but also because they are useful.
Good point, but it seems like some categories (like person) are useful even for paperclip maximizers. I really don’t see how you could completely understand media and documents from human society yet be confused by a categorization between people and non-people.
And of course, even if you manage to make a bunch of categories, many of which correspond to human categories, you still have to pick out specific categories in order to communicate or set up a goal system.
Right, you can “index” a category by providing some positive and negative examples. If I gave you some pictures of oranges and some pictures of non-oranges, you could figure out the true categorization because you consider the categorization of oranges/non-oranges to be simple. There’s probably a more robust way of doing this.
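To make the "indexing" idea concrete, here is a minimal sketch, not a proposal for how a real system would do it. It assumes the learner already has a pool of candidate categories, each with a made-up complexity score, and simply picks the simplest candidate that agrees with every positive and negative example. The names `Item`, `Category`, `index_category`, and the toy candidates are all hypothetical, invented for illustration.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Sequence


@dataclass(frozen=True)
class Item:
    """A toy stand-in for a picture, reduced to two features."""
    color: str
    shape: str


@dataclass(frozen=True)
class Category:
    """A candidate category the learner already has available."""
    name: str
    complexity: int  # hypothetical description length; lower means simpler
    contains: Callable[[Item], bool]


def index_category(
    candidates: Sequence[Category],
    positives: Sequence[Item],
    negatives: Sequence[Item],
) -> Optional[Category]:
    """Return the simplest candidate consistent with all examples, or None."""
    consistent = [
        c for c in candidates
        if all(c.contains(x) for x in positives)
        and not any(c.contains(x) for x in negatives)
    ]
    return min(consistent, key=lambda c: c.complexity, default=None)


# Toy data: two oranges as positives, a traffic cone and a lime as negatives.
POSITIVES = [Item("orange", "round"), Item("orange", "round")]
NEGATIVES = [Item("orange", "cone"), Item("green", "round")]

CANDIDATES = [
    Category("anything", 0, lambda x: True),
    Category("orange-colored", 1, lambda x: x.color == "orange"),
    Category("round", 1, lambda x: x.shape == "round"),
    Category("orange fruit", 2,
             lambda x: x.color == "orange" and x.shape == "round"),
    # A "memorize the examples" category fits too, but is scored as complex.
    Category("exactly the examples shown", 10, lambda x: x in POSITIVES),
]

chosen = index_category(CANDIDATES, POSITIVES, NEGATIVES)
print(chosen.name if chosen else "no consistent category")
```

With these examples, the simpler candidates are each ruled out by a negative example, and the memorizing candidate loses on complexity. Without any negative examples, the trivial "anything" category would win, which is one reason the negatives matter and why a more robust scheme is probably needed.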