Consider a version of the Monty Hall problem where the host randomly picks which of the 2 remaining doors to open. The door he opens reveals a goat. What should you do?
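For anyone who wants to check this variant by brute force, here is a minimal sketch that enumerates every case exactly. The setup details are my assumptions, not part of the question as posed: three doors, one car, the player always starts on door 0, and the host opens one of the other two doors uniformly at random without knowing where the car is. The function name `monty_fall_odds` is made up for illustration.

```python
from fractions import Fraction
from itertools import product


def monty_fall_odds():
    """Exact win odds for staying vs switching, given a goat was revealed.

    Assumptions (not stated in the post): three doors, one car, player
    picks door 0, host opens door 1 or 2 uniformly at random.
    """
    stay = switch = goat_revealed = Fraction(0)
    for car, opened in product(range(3), (1, 2)):
        weight = Fraction(1, 3) * Fraction(1, 2)  # car position x host choice
        if opened == car:
            continue  # host revealed the car; not the case being asked about
        goat_revealed += weight
        remaining = 3 - opened  # the other unopened, unpicked door (1 or 2)
        if car == 0:
            stay += weight
        elif car == remaining:
            switch += weight
    # Condition on the event that the opened door showed a goat.
    return stay / goat_revealed, switch / goat_revealed


stay_odds, switch_odds = monty_fall_odds()
print(f"P(win | stay) = {stay_odds}, P(win | switch) = {switch_odds}")
```

Under these assumptions both come out to 1/2, so switching neither helps nor hurts. The difference from the classic puzzle is that the host could have revealed the car, and conditioning on the fact that he didn't removes the usual advantage of switching.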
Most murder mysteries on TV tend to have a small number of suspects, and the trick is to find which one did it. I get the feeling that in real-life murders the police either have absolutely no idea who did it, or know exactly who did it and just need to prove it to the satisfaction of a court of law.
That explains why forensic tests (e.g. fingerprints) are used despite being pretty suspect. They convince the jury that the guilty guy did it, which is all that matters.
See https://issues.org/mnookin-fingerprints-evidence/ for more on fingerprints.
It seems LLMs are less likely to hallucinate answers if you end each question with ‘If you don’t know, say “I don’t know”’.
They still hallucinate a bit, but less. Given how easy this is, I’m surprised OpenAI and Microsoft don’t already do it.
Has its own failure modes. What does it even mean not to know something? It is just yet another category of possible answers.
Still a nice prompt. Also works on humans.
Running trains more frequently can reduce reliability:
Consider a train line that takes 200 minutes to travel end to end. Assume a train breaks down once every hundred journeys, and a breakdown takes 4 hours to clear. While a train is broken down, no other trains can pass it.
Now consider the two extremes:
If there’s one train every 2 minutes, there will be a breakdown about every 200 minutes, which is less than the 4 hours it takes to clear one. The line will essentially always have one broken train on it, and travelling the line will likely take 7+ hours.
Meanwhile if there’s just 1 train going back and forth you’ll have roughly 1 delayed train every month, which will delay people for 4 hours. You’re still better off in this scenario than the previous one.
The sweet spot in terms of average transit time is closer to a train every 2 minutes than a train every 400 minutes, but the sweet spot for predictability of the service will have fewer, more reliable trains.
As anecdotal evidence, I noticed that the Northern Line frequently had breakdowns and ran a train every 2-6 minutes, while Israel Railways very rarely has breakdowns and runs two trains an hour on my line.
This all points to both investing a lot of effort into train reliability and running fewer, longer trains.
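The arithmetic behind the two extremes can be sketched in a few lines. This is a back-of-envelope model, not a proper simulation, and it bakes in crude assumptions of mine: every departure counts as one journey, breakdowns are independent of each other, turnaround time is ignored, and a breakdown blocks the whole line until it is cleared. The function name `line_stats` and its parameters are made up for illustration.

```python
def line_stats(headway_min, journey_min=200, journeys_per_breakdown=100, clear_min=240):
    """Rough numbers for one direction of the line described above.

    Assumptions (mine): one journey per departure, independent breakdowns,
    and a broken train blocks the whole line until it is cleared.
    """
    mean_gap_min = headway_min * journeys_per_breakdown  # average time between breakdowns
    blocked_share = min(1.0, clear_min / mean_gap_min)   # fraction of time blocked, capped at 100%
    blocked_trip_h = (journey_min + clear_min) / 60      # trip that waits out a full clearance
    return mean_gap_min, blocked_share, blocked_trip_h


for label, headway in [("train every 2 minutes", 2), ("single shuttle train", 200)]:
    gap, share, trip = line_stats(headway)
    print(f"{label}: breakdown every {gap / 60:.1f} h, "
          f"line blocked {share:.1%} of the time, blocked trip {trip:.1f} h")
```

With a 2-minute headway the average gap between breakdowns is 200 minutes, shorter than the 240 minutes needed to clear one, which is why the line is effectively always blocked and a trip takes 200 + 240 = 440 minutes. With a single train, one journey starts every 200 minutes, so breakdowns are about 20,000 minutes apart: around two weeks of round-the-clock running, or about a month if the train only runs 12 hours a day (the 12 hours is my assumption).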
I don’t think it’s accurate to model breakdowns as a linear function of journeys or train-miles unless irregular effects like extreme weather are a negligible fraction of breakdowns.
So far Claude 3.7 is the only non-reasoning model I’ve tried that answers the random-host Monty Hall question correctly. All the reasoning models got it right as well.
Fwiw this is the kind of question that has definitely been answered in the training data, so I would not count this as an example of reasoning.
I expected so, which is why I was surprised they didn’t get it.
Anthropic is calling it a “hybrid reasoning model”. I don’t know what they mean by that.
Fun fact I just discovered: Asian elephants are actually more closely related to woolly mammoths than they are to African elephants!
Reserve soldiers in Israel are paid their full salaries by national insurance. If they are also able to work (which is common, as the IDF isn’t great at using its manpower efficiently) they can legally do so, and will get paid by their company on top of whatever they receive from national insurance.
Given how often sensible policies aren’t implemented because of their optics, it’s worth appreciating those cases where that doesn’t happen. The biggest impact of a war on Israel is to the economy, and anything which encourages people to work rather than waste time during a war is a good policy. But it could so easily have been rejected because it implies soldiers are slacking off from their reserve duties.