Using nat.dev, I find that 002, 003, and Turbo all get the same result, wrong on the first and right on the second. This is an example of Turbo being Inferior to Chat. I also tried Cohere, which got both. I also tried Claude. Full v1.2 got both wrong. Instant 1.0, which should be inferior, got the second correct. It also produced a wordy answer to the first which I give half credit because it said that it was difficult but possible for the slow policeman to catch the fast thief. I only tried each twice, with and without “Let us think,” which made no difference to the first. I almost didn’t bother adding it to the second since they did so well without it. Adding it made 002 and Claude-instant fail, but Claude1.2 succeed. (I also tried llama and alpaca, but they timed out.)
how does −003 compare?
Using nat.dev, I find that 002, 003, and Turbo all get the same result, wrong on the first and right on the second. This is an example of Turbo being Inferior to Chat. I also tried Cohere, which got both. I also tried Claude. Full v1.2 got both wrong. Instant 1.0, which should be inferior, got the second correct. It also produced a wordy answer to the first which I give half credit because it said that it was difficult but possible for the slow policeman to catch the fast thief. I only tried each twice, with and without “Let us think,” which made no difference to the first. I almost didn’t bother adding it to the second since they did so well without it. Adding it made 002 and Claude-instant fail, but Claude1.2 succeed. (I also tried llama and alpaca, but they timed out.)