The anti-EHT portion of this seems about as bad, I’m afraid, as the anti-LIGO portion of your other anti-LIGO post. You point and laugh at various things said by Bouman, and you’re wrong about all of them.
(Also, it turns out that your quotations are not actually quotations. You’ve paraphrased Bouman, often uncharitably and often apparently without understanding what she was actually saying. I don’t think you should do that.)
----
First, a couple of statements from Bouman’s TEDx talk before the EHT results came out. These you just say are “absurd”, suggesting that she had “terrible teachers and no advisor”, without offering any actual explanation of what’s wrong with them, so I’ll have to guess at what your complaint is.
Bouman says (TEDx talk, 11:00) “I can take random image fragments and assemble them like a puzzle to construct an image of a black hole”. Perhaps your objection is to the idea that this amounts to making an image at random, or something. If you read the actual description of the CHIRP algorithm Bouman is discussing, you will find that nothing in it is anything like that crazy; Bouman is just trying to give a sketchy idea (for a lay audience with zero scientific or mathematical expertise) of what she’s doing. I don’t much like her description of it, but understood as an attempt to convey the idea to a lay audience there’s nothing absurd about it.
Bouman says (TEDx talk, 6:40): “Some images are less likely than others and it is my job to design an algorithm that gives more weight to the images that are more likely.” Perhaps your objection is that she’s admitting to biasing the algorithm so it only ever gives images that are consistent with (say) the predictions of General Relativity. Again, if you read the actual paper, it’s really not doing that.
So what is it doing? Like many image-reconstruction algorithms, the idea is to search for a reconstruction that minimizes a measure of “error plus weirdness”. The error term describes how badly wrong the measurements would have to be if reality matched the candidate reconstruction; the weirdness term (the fancy term is “regularizer”) describes how surprising the candidate reconstruction is in itself. Obviously this framework is consistent with an algorithm with a ton of bias built in (e.g., the “weirdness” term simply measures how different the candidate reconstruction is from a single image you want to bias it towards) or one with no bias built in at all (e.g., the “weirdness” term is always zero and you just pick an image that minimizes the error). What you want, as Bouman explicitly says at some length, is to pick your “weirdness” term so that the things it penalizes are ones that are unlikely to be real even if we are quite badly wrong about (e.g.) exactly what happens in the middle of a galaxy.
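To make the shape of that concrete, here is a minimal sketch (my own, not the EHT team’s code) of an “error plus weirdness” objective, assuming a hypothetical linear measurement model and using a toy total-variation regularizer as a stand-in for the real one:

```python
import numpy as np

def data_misfit(x, A, y, sigma):
    """Error term: how badly wrong the measurements y would have to be
    if the candidate image x were real (chi-squared style, under an
    assumed linear measurement model y ~= A @ x)."""
    r = (A @ x - y) / sigma
    return np.sum(np.abs(r) ** 2)

def weirdness(x):
    """Toy regularizer: total variation, which penalizes implausibly
    jagged images.  CHIRP uses a learned patch prior here instead."""
    return np.sum(np.abs(np.diff(x)))

def objective(x, A, y, sigma, lam):
    # The reconstruction is whatever image x minimizes this.
    return data_misfit(x, A, y, sigma) + lam * weirdness(x)
```

Setting lam to zero gives the unbiased extreme mentioned above; making weirdness measure distance from one particular target image gives the maximally biased extreme. Everything interesting is in how you choose the weirdness term.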
The “weirdness” term in the CHIRP algorithm is a so-called “patch prior”, which means that you get it by computing individual weirdness measures for little patches of the image, and you do that over lots of patches that cover the image, and add up the results. (This is what she’s trying to get at with the business about random image fragments.) The patches used by CHIRP are only 8x8 pixels, which means they can’t encode very much in the way of prejudices about the structure of a black hole.
If you picked a patch prior that said that the weirdness of a patch is just the standard deviation of the pixels in the patch, then (at least for some ways of filling in the details) I think this is equivalent to running a moving-average filter over your image. I point this out just as a way of emphasizing that using a patch prior for your “weirdness” term doesn’t imply any controversial sort of bias.
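For what it’s worth, here is roughly what that toy standard-deviation patch prior would look like (a sketch under my own simplifying assumptions, using non-overlapping patches; it is not CHIRP’s learned prior):

```python
import numpy as np

def patch_prior_weirdness(img, patch=8):
    """Toy patch prior: the weirdness of each 8x8 patch is the standard
    deviation of its pixels, and the image's weirdness is the sum over
    patches.  (Non-overlapping patches, purely for simplicity.)"""
    h, w = img.shape
    total = 0.0
    for i in range(0, h - patch + 1, patch):
        for j in range(0, w - patch + 1, patch):
            total += img[i:i + patch, j:j + patch].std()
    return total
```

A prior like this prefers locally smooth images and knows nothing whatsoever about black holes, which is the point: a patch prior is not automatically a vehicle for smuggling in assumptions about the source.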
For CHIRP, they have a way of building a patch prior from a large database of images, which amounts to learning what tiny bits of those images tend to look like, so that the algorithm will tend to produce output whose tiny pieces look like tiny pieces of those images. You might worry that this would also tend to produce output that looks like those images on a larger scale, somehow. That’s a reasonable concern! Which is why they explicitly checked for that. (That’s what is shown by the slide from the TEDx talk that I thought might be misleading you, above.) The idea is: take several very different large databases of images, use each of them to build a different patch prior, and then run the algorithm using a variety of inputs and see how different the outputs are with differently-learned patch priors. And the answer is that the outputs look almost identical whatever set of images they use to build the prior. So whatever features of those 8x8 patches the algorithm is learning, they seem to be generic enough that they can be learned equally well from synthetic black hole images, from real astronomical images, or from photos of objects here on earth.
So, “an algorithm that gives more weight to the images that are more likely” doesn’t mean “an algorithm that looks for images matching the predictions of general relativity” or anything like that; it means “an algorithm that prefers images whose little 8x8-pixel patches resemble 8x8-pixel patches of other images, and by the way it turns out that it hardly matters what other images we use to train the algorithm”.
Oh, a bonus: you remember I said that one extreme is where the “weirdness” term is zero, so it definitely doesn’t import any problematic assumptions about the nature of the data? Well, if you look at the CalTech talk at around 38:00 you’ll see that Bouman actually shows you what you get when you do almost exactly that. (It’s not quite a weirdness term of zero; they impose two constraints: first, that the amount of emission in each place is non-negative, and second, a “field-of-view constraint”, which I assume means that they’re only interested in radio waves coming from the region of space they were actually trying to measure.) And it still looks pretty decent and produces output with much the same form as the published image.
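Here’s a sketch of what such a nearly-unregularized fit could look like (my own toy version, assuming a real-valued linear measurement model rather than the actual complex visibilities, and definitely not the EHT pipeline):

```python
import numpy as np
from scipy.optimize import nnls

def reconstruct_minimal(A, y, fov_mask):
    """Fit an image to measurements y ~= A @ x with no "weirdness" term:
    only non-negative emission and a restricted field of view.
    fov_mask is a boolean array, True for pixels inside the field of view;
    pixels outside it are fixed to zero.  (In reality the visibilities are
    complex, so you'd stack real and imaginary parts first.)"""
    x = np.zeros(fov_mask.size)
    sol, _ = nnls(A[:, fov_mask], y)   # non-negativity enforced by the solver
    x[fov_mask] = sol
    return x
```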
----
Then you turn to a talk Bouman gave at CalTech. You say that each of the statements you quote out of context “would disqualify an experiment”, so let’s take a look. With these you’ve said a bit more about what you object to, so I’m more confident that my responses will actually be responsive to your complaints than with the TEDx talk ones. These are in the same order as your list, which is almost but not quite the same as their order within the talk.
Bouman says (CalTech, 5:08) “this is equivalent to taking a picture of an orange on the moon.” I already discussed this in comments on your other post: you seem to think that “this seems impossibly hard” is the same thing as “this is actually impossibly hard”, and that’s demonstrably wrong because other things that have seemed as obviously difficult as getting an image of an orange on the moon have turned out to be possible, and the whole point of Bouman’s talk is to say that this one turned out to be possible too. Of course it could turn out that she’s wrong, but what you’re saying here is that we should just assume she’s wrong. That would have made us dismiss (for instance) radio, electronic computers, and the rocketry that would enable us to put an orange on the moon if we chose to do so.
Bouman says (CalTech, 14:40) “the challenge of dealing with data with 100% uncertainty.” Except that she doesn’t, at least not anywhere near that point in the video. She does say that some things are off by “almost 100%”, but she doesn’t, for instance, use the word “uncertainty” here. Which makes it rather odd that the first thing you do is to talk about their choice of using “uncertainty” to quantify things. Anyway, you begin by suggesting that maybe what they’re trying to do is to average data with itself several times to reduce its errors. That would indeed be stupid, but you have no reason to think that the EHT team was doing that (or anything like it), so this is just a dishonest bit of rhetoric. Then you say “They convinced themselves that they could use measurements with 100% uncertainty because the uncorrelated errors would cancel out and they called this procedure: closure of systematic gain error” and complain that it’s only correlated errors that you can get rid of by adding things up, not uncorrelated ones. Except that, so far as I can tell, you just made up all the stuff about correlated versus uncorrelated errors.
So let’s talk about this for a moment, because it seems pretty clear that you haven’t understood what they’re doing and have assumed the most uncharitable possible interpretation. You say: “But from what I could tell, this procedure was nothing more than multiplying together the amplitudes and adding the phase errors together and from what I learned about basic data analysis, you can only add together correlated errors that you want to remove from your data.”
Nope, the procedure is not just multiplying the amplitudes and adding the phases, although it does involve doing something a bit like that, and the errors are correlated with one another in particular ways which is why the method works.
So, they have a whole lot of measurements, each of which comes from a particular pair of telescopes at a particular time (and at a particular signal frequency, but I think they pick one frequency and just work with that). Each telescope at each time has a certain unknown gain error and a certain unknown phase error, and the measurements they take (called “visibilities”) are complex numbers with the property that, for a pair of telescopes i,j, the effect of those errors is to multiply V(i,j) by a complex factor whose magnitude is gain(i)·gain(j) and whose phase is phase(i)−phase(j). And now the point is that there are particular ways of combining the measurements that make the gains or the phases completely cancel out. So, e.g., if you take the product V(i,j) V(j,k) V(k,i) around a closed triangle of telescopes, then the gain errors are still there but the phases cancel, so despite the phase errors in the individual measurements you know the true phase of that product. And if you take the quotient (V(a,b) V(c,d)) / (V(a,d) V(b,c)), then the phase errors are still there but the gains cancel, so despite the gain errors in the individual measurements you know the true magnitude of that quotient.
So: yes, the errors are correlated, in a very precise way, because the errors are per-telescope rather than per-visibility-measurement. That doesn’t let you compute exactly what the measurements should have been—you can’t actually get rid of the noise—but it lets you compute some other slightly less informative derived measurements from which that noise has been completely removed.
(There will be other sources of noise, of course. But those aren’t so large and their effects can be effectively reduced by taking more/longer measurements.)
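Here’s a tiny numerical check of that cancellation (a toy of my own, not EHT code), using the per-telescope error model described above:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 4
V_true = rng.normal(size=(n, n)) + 1j * rng.normal(size=(n, n))
g = rng.uniform(0.5, 2.0, n)        # unknown per-telescope gain errors
p = rng.uniform(-np.pi, np.pi, n)   # unknown per-telescope phase errors

def V_obs(i, j):
    # Observed visibility: true value corrupted by per-telescope errors.
    return g[i] * g[j] * np.exp(1j * (p[i] - p[j])) * V_true[i, j]

# Closure phase around a closed triangle: the per-telescope phases cancel.
closure_obs  = np.angle(V_obs(0, 1) * V_obs(1, 2) * V_obs(2, 0))
closure_true = np.angle(V_true[0, 1] * V_true[1, 2] * V_true[2, 0])
assert np.isclose(closure_obs, closure_true)

# Closure amplitude over four telescopes: the per-telescope gains cancel.
amp_obs  = np.abs((V_obs(0, 1) * V_obs(2, 3)) / (V_obs(0, 3) * V_obs(1, 2)))
amp_true = np.abs((V_true[0, 1] * V_true[2, 3]) / (V_true[0, 3] * V_true[1, 2]))
assert np.isclose(amp_obs, amp_true)
```

The per-telescope errors drop out exactly because each phase error appears with cancelling signs around the closed loop, and each gain error appears the same number of times in numerator and denominator.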
Also, by the way, what she’s saying is not that overall the uncertainty is 100% (again, I feel that I have to reiterate that so far as I can tell that particular formulation of the statement is one you just made up and not anything Bouman actually said) but that for some measurements the gain error is large. (Mostly because one particular telescope was badly calibrated.)
Bouman says (CalTech, 16:00) “the CLEAN algorithm is guided a lot by the user.” Yes, and she is pointing out that this is an unfortunate feature of the (“self-calibrating”) CLEAN algorithm, and a way in which her algorithm is better. (Also, if you listen at about 35:00, you’ll find that they actually developed a way to make CLEAN not need human guidance.)
Bouman says (CalTech, 19:30) “Most people use this method to do calibration before imaging, but we set it up to do calibration during imaging by multiplying the amplitudes and adding the phases to cancel out uncorrelated noise.” This is the business with products and quotients of visibilities that I described above.
Bouman says (CalTech, 31:40) “A data set will equally predict an image with or without a hole if you lack phase information.” Except she doesn’t say that or anything like it: you just made that up. If this is meant to be a paraphrase of what she said, it’s an incredibly bad one. But, to address what might be your point here (this is an instance where you just misquote, point and laugh, without explaining what problem you claim to see, so I have to guess), note that it is not the case that they lack phase information. Their phases have errors, as discussed above, and some of those errors may be large, but they are systematically related in a way that means that a lot of information about the phases is recoverable without those errors.
Bouman says (CalTech, 39:30) “The phase data is unusable and the amplitude data is barely usable.” Once again, that isn’t actually what she says. I don’t find her explanation in this part of the talk very clear, but I think what she’s actually talking about is using the best-fit parameters they got as a way of inferring what those gain and/or phase errors are. They don’t need to do that to get their image; the point is that they can do it multiple times, using different radio sources and different image-processing pipelines, and check that they get similar results each time. It’s another way to see how robust their process is, by giving it an opportunity to produce silly results, and she’s saying that the phases are too bad to be able to do that. Again, there absolutely is usable phase information even though each individual measurement’s phase is scrambled.
If you look at about 40:50 there’s a slide that makes this clearer. Their reconstruction uses “closure phases” (i.e., the quantities computed in the way I described above, where the systematic phase errors cancel) and “closure amplitudes” (i.e., the other quantities computed in the way I described above, where the systematic amplitude errors cancel) and “absolute amplitudes” (which are bad but not so bad you can’t get useful information from them) -- but not the absolute phases (which are so bad that you can’t get useful information from them without doing the cancellation thing). That slide also shows an image produced by not using the absolute amplitudes at all, just for comparison; the difference between the two gives some idea of how bad the amplitude errors are. (Mostly not that bad, as she says.)
Bouman says (CalTech, 36:20) “The machine learning algorithm tuned hundreds of thousands of ‘parameters’ to produce the image.” I’m pretty sure you misunderstood here, though her explanation is not very clear. (The possibility of misunderstanding is one reason why it is very bad that you keep presenting your paraphrases as if they are actual quotations.) What they had hundreds of thousands of was parameter values: that is, they looked at hundreds of thousands of possible combinations of values for their parameters, and for the particular bit of their process she’s describing at this point they chose a (still large but) much smaller number of parameter combinations and looked at all the resulting reconstructions. (The goal here is to make sure that their process is robust and doesn’t give results that vary wildly as you vary the parameters that go into the reconstruction procedure.)
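To illustrate how quickly such a parameter survey grows (a hypothetical toy with made-up parameter names, not the EHT pipeline’s actual settings):

```python
from itertools import product

import numpy as np

# A hypothetical grid of reconstruction hyperparameters.  Even a modest
# number of values per parameter multiplies out to a large survey.
grid = {
    "regularizer_weight": list(np.logspace(-3, 2, 30)),
    "prior_type": ["patch", "total_variation", "max_entropy"],
    "initial_image": ["gaussian", "disk", "flat"],
    "field_of_view": [100, 150, 200],
    "data_weighting": ["uniform", "natural"],
}
combos = list(product(*grid.values()))
print(len(combos))   # 30 * 3 * 3 * 3 * 2 = 1620 combinations in this toy grid

# The robustness check is then: reconstruct with each chosen combination and
# verify that the outputs don't vary wildly, e.g. via pairwise differences.
# (reconstruct() below is a placeholder name, not a real function.)
# images = [reconstruct(data, dict(zip(grid, combo))) for combo in combos]
```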