This is my favorite so far, since you really propose a task which is not impressive, while I would be quite impressed with most of the other suggestions. A sign for this is that my probability for GPT-4 not doing this is relatively close to 50%.
In fact there is a tradeoff here between certainty and surprise: The more confident you are GPT-4 won’t solve a task, the more surprised you will be if it does. This follows from conservation of expected evidence.
This is my favorite so far, since you really propose a task which is not impressive, while I would be quite impressed with most of the other suggestions. A sign for this is that my probability for GPT-4 not doing this is relatively close to 50%.
In fact there is a tradeoff here between certainty and surprise: The more confident you are GPT-4 won’t solve a task, the more surprised you will be if it does. This follows from conservation of expected evidence.