Since Raemon’s Thinking Physics exercise I’ve been toying with writing physics puzzles along those lines. (For fun, not because I’m aiming to write better exercise candidates.) If you assume an undergrad-level background and expand to modern physics and engineering there are interesting places you can go. I think a lot about noise and measurement, so that’s where my mind has been. Maybe some baseline questions could look like the below? Curious to hear anyone’s thoughts.
Pushing a thermal oscillator
You’re standing at one end of a grocery aisle. In your cart, you have a damped oscillator in a thermal bath, initially in equilibrium.
You push the cart, making sure it moves smoothly according to a prescribed velocity profile, and you bring it to a stop at the other end of the aisle. You then wait for the oscillator to reach equilibrium with its bath again.
The final temperature is
1. Cooler than before
2. Exactly the same
3. Hotter than before
4. Not enough information. More than one of the above may be true because of one or more of the following:
    - You can only answer in expectation.
    - It depends on the properties of the oscillator.
    - It depends on the cart’s trajectory.
Thermal velocity and camera speed
You’re observing a particle undergo thermal motion in a fluid. It’s continuously bombarded by fluid molecules that effectively subject the particle to a white noise force and velocity damping. You estimate that it tends to lose its momentum and change direction on a timescale of 1 millisecond.
You want to get some statistics on the particle’s velocity. You know the average velocity is zero, but there will be some variance that depends on temperature. You recall that in equilibrium the particle should have velocity $v$ with probability proportional to the Boltzmann factor $e^{-m|v|^2/2k_B T}$, giving a root-mean-square thermal velocity $v_{\mathrm{th}}$.
You calculate velocity by taking pairs of pictures at different times, then dividing the change in position by the time step. Your camera has an effectively instantaneous shutter speed.
In experiment 1, you use a time step of 0.1 milliseconds to measure velocity. In experiment 2, you use a time step of 10 milliseconds.
You collect distributions of measured velocities for each experiment, giving root-mean-square velocities $v_1$ and $v_2$, respectively. What do you find?
1. $v_1 = v_2 = v_{\mathrm{th}}$
2. $v_1 < v_2 < v_{\mathrm{th}}$
3. $v_2 < v_1 < v_{\mathrm{th}}$
4. $v_2 < v_{\mathrm{th}} < v_1$
5. $v_{\mathrm{th}} < v_2 < v_1$
Measuring noise and measurement noise
You’re using an oscilloscope to measure the thermal noise voltage across a resistance $R$. Internally, the oscilloscope has a parallel input resistance $R_{\mathrm{in}}$ and capacitance $C$; the voltage on the capacitor deflects the electron beam in a cathode ray tube, continuously tracing out the voltage over time on the screen.
The resistor and oscilloscope are at the same temperature. Is it possible to determine R from the amplitude of the fluctuating voltage shown on the oscilloscope?
1. Yes, if $R_{\mathrm{in}} \ll R$
2. Yes, if $R_{\mathrm{in}} \sim R$
3. Yes, if $R_{\mathrm{in}} \gg R$
4. No
Molecular electromechanical switch
You’ve attached one end of a conductive molecule to an electrode. If the molecule bends by a certain distance $d$ at the other end, it touches another electrode, closing an electrical circuit. (You also have a third electrode where you can apply a voltage to actuate the switch.)
You’re worried about the thermal bending motion of the molecule accidentally closing the circuit, causing an error. Using the Boltzmann distribution over the elastic potential energy in the molecule, you calculate that the probability of a thermal deformation of at least $d$ is $10^{-9}$ (a single-tailed six-sigma deformation in a normal distribution where the expected potential energy is $k_B T/2$), but you don’t know how to use this information. You know that the bending motion has a natural frequency of 100 GHz with an energy decay timescale of 0.1 nanoseconds, and that it behaves as an ideal harmonic oscillator in a thermal bath.
You’re considering integrating this switch into a 1 GHz processor. What is the probability $p$ of an error in a 1 nanosecond clock cycle?
1. $p < 10^{-9}$ — the Boltzmann distribution is a long-time limit, so you have sub-Boltzmann probability in finite time.
2. $p = 10^{-9}$ — the probability is determined by the Boltzmann distribution.
3. $p \approx 10^{-8}$ — the 0.1 nanosecond damping timescale means, roughly, it gets 10 draws from the Boltzmann distribution.
4. $p \approx 10^{-7}$ — the 100 GHz natural frequency means it gets 100 tries to cause an error.
5. $p > 10^{-7}$ — the Boltzmann distribution is over long-time averages, so you expect larger deviations on short timescales that otherwise get averaged away.
I’m going to guess 3. Reasoning: I’m sure right away that 1 and 2 are wrong, because if you leave the thing sitting for long enough then obviously it’s going to eventually fail. So 2 is wrong and 1 is even wronger. I’m also pretty sure that 5 is wrong. Something like 5 is true for the velocity (or rather, the estimated velocity based on measuring displacement after a given time $\Delta t$) of a particle undergoing Brownian motion, but I don’t think that’s a good model for this situation. For one thing, on a small timescale, Brownian velocities don’t actually become infinite; instead we see that they’re actually caused by individual molecules bumping into the object, and all energies remain finite.
3 and 4 are both promising because they actually make use of the timescales given in the problem. 4 seems wrong because if we imagined that the relaxation timescale were instead 1 second, then after looking at the position and velocity once, the system oscillates at that same amplitude for a very long time and doesn’t get any more tries to beat its previous score. The answer is 3 by elimination, and it also seems intuitive that the relaxation timescale is the one that counts how many tries you get (up to some constant factors).
This reasoning is basically right, but the answer ends up being 5 for a relatively mundane reason.
If the time-averaged potential energy is $k_B T/2$, so is the kinetic energy. Because damping is low, at some point in a cycle you’ll deterministically have the sum of the two in potential energy and nothing in kinetic energy. So you do have some variation getting averaged away.
More generally, while the relaxation timescale is the relevant timescale here, I also wanted to introduce an idea about very fast measurement events like the closing of the electrical circuit. If you have observables correlated on short timescales, then measurements faster than that won’t necessarily follow expectations from naive equilibrium thinking.
Good point; I had briefly thought of this when answering, and it was the reason I mentioned constant factors in my comment. However, on closer inspection:
- The “constant” factor is actually only nearly constant.
- It turns out to be bigger than 10.
Explanation:
$10^{-9}$ is about 6 sigma. To generalize, let’s say we have $a$ sigma, where $a$ is some decently large number so that the position-only Boltzmann distribution gives an extremely tiny probability of error.
So we have the following probability of error for the position-only Boltzmann distribution:
$$p_1 = \frac{1}{\sqrt{2\pi}} \int_a^\infty e^{-x^2/2}\,dx$$
Our toy model for this scenario is that rather than just sampling position, we jointly sample position and momentum, and then compute the amplitude. Equivalently, we sample position twice, and add it in quadrature to get amplitude. This gives a probability of:
$$p_2 = \frac{1}{(\sqrt{2\pi})^2} \int_a^\infty e^{-r^2/2} \int_0^{2\pi} r\,d\phi\,dr = \int_a^\infty r\,e^{-r^2/2}\,dr = e^{-a^2/2}$$
Since we took $a$ to be decently large, we can approximate the integrand in our expression for $p_1$ with an exponential distribution (basically, we Taylor expand the exponent):
$$p_1 \approx \frac{1}{\sqrt{2\pi}}\, e^{-a^2/2} \int_a^\infty e^{-a(x-a)}\,dx = \frac{1}{\sqrt{2\pi}}\, e^{-a^2/2}\, \frac{1}{a}$$
Result: $p_2$ is larger than $p_1$ by a factor of $a\sqrt{2\pi}$. While the $\sqrt{2\pi}$ is constant, $a$ grows (albeit very slowly) as the probability of error shrinks. Hence “nearly constant”. For this problem, where $a = 6$, we get a factor of about 15, so probability $1.5 \times 10^{-8}$ per try.
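(A quick numerical sanity check of these figures; this is my own sketch, not from the thread. It compares the exact Gaussian tail, the exponential-tail approximation above, and the amplitude tail at $a = 6$.)

```python
import numpy as np
from scipy import stats

a = 6.0  # threshold in units of sigma

p1_exact = stats.norm.sf(a)                                # position-only Gaussian tail
p1_approx = np.exp(-a**2 / 2) / (a * np.sqrt(2 * np.pi))   # exponential-tail approximation
p2 = np.exp(-a**2 / 2)                                     # amplitude (Rayleigh) tail

print(f"p1 exact      = {p1_exact:.2e}")   # ~9.9e-10, i.e. "about 10^-9"
print(f"p1 approx     = {p1_approx:.2e}")  # ~1.0e-09
print(f"p2            = {p2:.2e}")         # ~1.5e-08
print(f"ratio p2/p1   = {p2 / p1_exact:.1f}  (a*sqrt(2*pi) = {a * np.sqrt(2 * np.pi):.1f})")
```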
Why is this worth thinking about? If we just sample at a single point in time, and consider only the position at that time, then we get the original $10^{-9}$ per try. This is wrong because momentum gets to oscillate and turn into displacement, as you’ve already pointed out. On the other hand, if we remember the equipartition theorem, then we might reason that since the variance of amplitude is twice the variance of position, the probability of error is massively amplified. We don’t have to naturally get a 6 sigma displacement. We only need to get roughly a $6/\sqrt{2}$ sigma displacement and wait for it to rotate into place. This is wrong because we’re dealing with rare events here, and for the above scenario to work out, we actually need to simultaneously get $6/\sqrt{2}$ displacement and $6/\sqrt{2}$ momentum, both of which are rare and independent.
So it’s quite interesting that the actual answer is in between, and comes, roughly speaking, from rotating the tail of the distribution around by a full circle of circumference $2\pi a$.
Anyway, very cool and interesting question! Thanks for sharing it.
$v_{\mathrm{th}}$ is the RMS instantaneous velocity. Taking pictures at intervals gives an averaged velocity, which is slower because the particle wastes some time going in different directions that cancel out. $v_1$ is going to be near the instantaneous velocity, but still a little slower, since the velocity is still going to decay slightly, even over 1/10th of the decay time. $v_2$ is going to be significantly slower. If we make the time step even longer than 10 ms, we expect the RMS velocity to go roughly as the inverse square root of the time step. Anyway, the answer should be 3:
$v_2 < v_1 < v_{\mathrm{th}}$
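(As a rough check on that scaling, here is a small Langevin simulation; it is my own sketch, not part of the original answer, and the relaxation time, thermal velocity, and integration step are arbitrary stand-ins. It estimates the pairs-of-pictures RMS velocity at time steps of 0.1 and 10 relaxation times.)

```python
import numpy as np

rng = np.random.default_rng(0)

tau = 1.0        # momentum relaxation time (the puzzle's 1 ms)
v_th = 1.0       # RMS thermal velocity
dt = tau / 100   # integration step, much smaller than tau
n_steps = 1_000_000

# Euler-Maruyama integration of the Langevin (Ornstein-Uhlenbeck) velocity,
# then accumulate it into a position trace.
kicks = rng.normal(size=n_steps) * np.sqrt(2 * v_th**2 * dt / tau)
v = np.zeros(n_steps)
for i in range(1, n_steps):
    v[i] = v[i - 1] * (1 - dt / tau) + kicks[i]
x = np.cumsum(v) * dt

def camera_rms_velocity(x, lag_steps, dt):
    """RMS of (x(t + lag) - x(t)) / lag: the pairs-of-pictures estimate."""
    dx = x[lag_steps:] - x[:-lag_steps]
    return np.sqrt(np.mean((dx / (lag_steps * dt)) ** 2))

v1 = camera_rms_velocity(x, lag_steps=10, dt=dt)    # time step 0.1 * tau
v2 = camera_rms_velocity(x, lag_steps=1000, dt=dt)  # time step 10 * tau
print(f"v1 / v_th ~ {v1 / v_th:.2f}  (continuum prediction ~0.98)")
print(f"v2 / v_th ~ {v2 / v_th:.2f}  (continuum prediction ~0.42)")
```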
I sometimes wonder how much we could learn from toy models of superhuman performance, in terms of what to expect from AI progress. I suspect the answer is “not much”, but I figured I’d toss some thoughts out here, as much to discharge any need I feel to think about them further as to see if anyone has good pointers.
Like—when is performance about making super-smart moves, and when is it about consistently not blundering for as long as possible? My impression is that in Chess, something like “average centipawn loss” (according to some analysis engine) doesn’t explain outcomes as well as “worst move per game”. (I don’t know the keywords to search for, but I relatedly found this neat paper which finds a power law for the difference between best and second-best moves in a position.) What does Go look like, in comparison?
How deep are games? What’s the longest chain of players such that each consistently beats the next? How much comes from the game itself being “deep” versus the game being made up of many repeated small contests? (E.g., the longest chain for best-of-9 Chess is going to be about 3 times longer than that for Chess, if the assumptions behind the rating system hold. Or, another example, is Chess better thought of as Best-Of-30 ChessMove with Elo-like performance and rating per move, or perhaps as Best-Of-30 Don’tBlunder with binary performance per move?)
Where do ceilings come from? Are there diminishing returns on driving down blunder probabilities given fixed deep uncertainties or external randomness? Is there such a thing as “perfect play”, and when can we tell if we’re approaching it? (Like—maybe there’s some theoretically-motivated power law that matches a rating distribution until some cutoff at the extreme tail?)
What do real-world “games” and “rating distributions” look like in this light?
Related would be some refactoring of Deception Chess.
When I think about what I’d expect to see in experiments like that, I get curious about a sort of “baseline” set of experiments without deception or even verbal explanations. When can I distinguish the better of two chess engines more efficiently than playing them against each other and looking at the win/loss record? How much does it help to see the engines’ analyses over just observing moves?
How is this related? Well, how deep is Chess? Ratings range between, say, 800 and 3500, with 300 points being enough to distinguish players (human or computer) reasonably well. So we might say there are about 10 “levels” in practice, or that it has a rating depth of 10.
If Chess were Best-Of-30 ChessMove as described above, then ChessMove would have a rating depth a bit below 2 (just dividing by √30). In other words, we’d expect it to be very hard to ever distinguish any pair of engines off a single recommended move—and difficult with any number of isolated observations, given our own error-prone human evaluation. If it’s closer to Best-Of-30 Don’tBlunder, it’s a little more complicated—usually you can’t tell the difference because there basically is none, but on rare pivotal moves it will be nearly as easy to tell as when looking at a whole game.
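(To put rough numbers on the scaling claims above: a sketch of my own, where the Elo logistic model and the 75% “reliably distinguishable” threshold are assumptions rather than anything from the post.)

```python
import math

# The post's depth arithmetic: ~10 distinguishable "levels" at the game level,
# divided by sqrt(30) if a game is a sum of 30 per-move performances.
game_depth = 10
print(f"per-move rating depth ~ {game_depth / math.sqrt(30):.1f}")  # ~1.8, "a bit below 2"

def game_win_prob(elo_gap):
    """Standard Elo logistic win probability for a single game (draws ignored)."""
    return 1 / (1 + 10 ** (-elo_gap / 400))

def best_of_9_win_prob(elo_gap):
    """Probability of winning at least 5 of 9 independent games."""
    p = game_win_prob(elo_gap)
    return sum(math.comb(9, k) * p**k * (1 - p) ** (9 - k) for k in range(5, 10))

def gap_needed(win_prob_fn, target=0.75):
    """Smallest whole-Elo gap that reaches the target win probability."""
    gap = 0
    while win_prob_fn(gap) < target:
        gap += 1
    return gap

single = gap_needed(game_win_prob)       # ~191 Elo for 75% in one game
match9 = gap_needed(best_of_9_win_prob)  # ~77 Elo for 75% in a best-of-9
print(f"gap for one game: {single} Elo, gap for a best-of-9: {match9} Elo")
print(f"ratio ~ {single / match9:.1f}  (the Gaussian-performance heuristic says sqrt(9) = 3)")
```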
The solo version of the experiment looks like this:
1. I find a chess engine with a rating around mine, and use it to analyze positions in games against other engines. I play a bunch of games to get a baseline “hybrid” rating for myself with that advisor.
2. I do the same thing with a series of stronger chess engines, ideally each within a “level” of the last.
3. I do the same thing with access to the output of two engines, and I’m blinded to which is which. (The blinding might require some care around, for example, timing, as well as openings.) In sub-experiment A, I only get top moves and their scores. In sub-experiment B, I can look at lines from the current position up to some depth. In sub-experiment C, I can use these engines however I want. For example, I can basically play them against each other if I want to run down my own clock doing it. (Because pairs might be within a level of one another, I can’t be sure which is stronger from a single win/loss outcome. I’d hope to find more efficient ways of distinguishing them.)
4. I repeat #3 with different random pairs of advisors.
What I’d expect is that my ratings with pairs of advisors should be somewhere between my rating with the bad advisor and my rating with the good advisor. If I can successfully distinguish them, it’s close to the latter. If I’m just guessing, it’s close to the former (in the Don’tBlunder world) or to the midpoint (in the ChessMove world). I should have an easier time in sub-experiments B and C. Having a worse engine in the mix weighs me down relatively more (a) the closer the engines are to each other, and (b) the stronger both engines are compared to me.
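(One way to cash out that “close to the former vs. close to the midpoint” intuition numerically; this is purely my own toy model, with made-up ratings and a made-up exponential blunder-rate curve, not anything from the post.)

```python
import math

# ChessMove world: per-move performance is Gaussian with mean equal to the rating,
# so a 50/50 mix of advisors averages the ratings.
# Don'tBlunder world: the per-move blunder probability falls off exponentially with
# rating, p(r) = exp(-r), and rating is read back off as -log(blunder rate).
r_bad, r_good = 4.0, 8.0  # advisor "ratings" in arbitrary units (assumed)

chessmove_hybrid = 0.5 * (r_bad + r_good)

p_bad, p_good = math.exp(-r_bad), math.exp(-r_good)
dontblunder_hybrid = -math.log(0.5 * p_bad + 0.5 * p_good)

print(f"ChessMove hybrid:    {chessmove_hybrid:.2f}  (the midpoint)")
print(f"Don'tBlunder hybrid: {dontblunder_hybrid:.2f}  (barely above the bad advisor's {r_bad})")
```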
The main question I’d hope might be answerable this way would be something like, “How do (a) and (b) trade off?” Which is easier to distinguish—1800 and 2100, or, say, 2700 and 3300? Will there be a ceiling beyond which I’m always just guessing? Might I tend to side with worse advisors because, being closer to my level, they agree with me?
It seems like we’d want some handle on these questions before asking how much worse outright deception can be.
(There’s some trouble here because higher-rated players are more likely to draw given a fixed rating difference. This itself is relatively Don’tBlunder-like, and it makes me wonder if it’s possible to project how far our best engines are likely to be from perfect play. But it makes it harder to disentangle an inability to draw distinctions in play above my level from “natural” indistinguishability. There are also more general issues in doing these experiments with computers—for example, weak engines tend to be weak in ways humans wouldn’t be, and it’s hard to calibrate ratings for superhuman play.)
(It might also be interesting to automate myself out of this experiment by choosing between recommendations using some simple scripted logic and evaluation by a relatively weak engine.)
Along the lines of what I wrote in the parent: even though I think there’s potentially a related and fairly deep “worldview”-type crux (crux generator?) nearby when it comes to AI risk—are we in a ChessMove world or a Don’tBlunder world? (sorry, these are terrible names, because actual Chess moves are more like Don’tBlunder, which is itself horribly ugly)—I’m not particularly motivated to do this experiment, because I don’t think any possible answer at this level of metaphor would be informative enough to shift anyone on more important questions.