I don’t have a fully-fleshed-out answer, but here’s a thought experiment: change the units of all the inputs. Multiply one of them by 12, another by 0.01, another by 10^8, etc.
In a system without any sort of symmetry or natural Euclidean distance metric over the inputs, it’s like that transformation has already happened before we started. All the inputs have completely different and arbitrary units. If we take a “one unit” step, in actual practice that will be massive along some of the inputs and inconsequential along others—whereas our sqrt(1/n) strategy relies on the step being effectively similar along each dimension.
The right way to talk about this is probably condition number of local curvature, but I haven’t tried to properly formalize it.
In that thought experiment, Euclidean distance doesn’t work because different dimensions have different units. To fix that, you could move to the log scale. Or is the transformation actually more complicated than multiplication?
You can’t take logarithms of non-positive numbers. Even if you know some parameter is non-positive, you still have a free choice of scale, so the problem of comparing scales of different parameters does not go away.
You can, actually. ln(5cm)=ln(5)+ln(cm), and since we measure distances, the ln(cm) cancels out. The same way, ln(-5)=ln(5)+ln(-1). ln(-1) happens to be pi*i, since e^(pi*i) is −1.
Since we measure distances, the ln(-1) cancels out. Also, we are comparing how well each setting of the parameters performs; for that, whether the parameters are complex-valued doesn’t matter.
Erm, I think you’re getting mixed up between comparing parameters and comparing the results of applying some function to parameters. These are not the same, and it’s the latter that become incomparable.
(Also, would your algorithm derive that ln(4+3i)=ln(5) since |4+3i|=|5|? I really don’t expect the “since we measure distances” trick to work, but if it does work, it should also work on this example.)
As far as I saw, you were getting mixed up on that. We never compare the parameter-vectors for being greater than/less than each other; they aren’t ordered.
(No, if some parameter started out with such values as 4+3i or 5, the ln transformation would not equate them. But multiplying both by e^0.01 would add 0.01 to both logarithms, regardless of previous units.)
I think at this point we should just ask @johnswentworth which one of us understood him correctly. As far as I see, we measure a distance between vectors, not between individual parameters, and that’s why this thing fails.
I don’t have a fully-fleshed-out answer, but here’s a thought experiment: change the units of all the inputs. Multiply one of them by 12, another by 0.01, another by 10^8, etc.
In a system without any sort of symmetry or natural Euclidean distance metric over the inputs, it’s like that transformation has already happened before we started. All the inputs have completely different and arbitrary units. If we take a “one unit” step, in actual practice that will be massive along some of the inputs and inconsequential along others—whereas our sqrt(1/n) strategy relies on the step being effectively similar along each dimension.
The right way to talk about this is probably condition number of local curvature, but I haven’t tried to properly formalize it.
In that thought experiment, Euclidean distance doesn’t work because different dimensions have different units. To fix that, you could move to the log scale. Or is the transformation actually more complicated than multiplication?
Yeah, then we’d just use a more complicated single-variable transformation in the thought experiment.
You can’t take logarithms of non-positive numbers. Even if you know some parameter is non-positive, you still have a free choice of scale, so the problem of comparing scales of different parameters does not go away.
You can, actually. ln(5cm)=ln(5)+ln(cm), and since we measure distances, the ln(cm) cancels out. The same way, ln(-5)=ln(5)+ln(-1). ln(-1) happens to be pi*i, since e^(pi*i) is −1.
If you allow complex numbers, comparison “greater than/less than” breaks down.
Since we measure distances, the ln(-1) cancels out. Also, we are comparing how well each setting of the parameters performs; for that, whether the parameters are complex-valued doesn’t matter.
Erm, I think you’re getting mixed up between comparing parameters and comparing the results of applying some function to parameters. These are not the same, and it’s the latter that become incomparable.
(Also, would your algorithm derive that ln(4+3i)=ln(5) since |4+3i|=|5|? I really don’t expect the “since we measure distances” trick to work, but if it does work, it should also work on this example.)
As far as I saw, you were getting mixed up on that. We never compare the parameter-vectors for being greater than/less than each other; they aren’t ordered.
(No, if some parameter started out with such values as 4+3i or 5, the ln transformation would not equate them. But multiplying both by e^0.01 would add 0.01 to both logarithms, regardless of previous units.)
I think at this point we should just ask @johnswentworth which one of us understood him correctly. As far as I see, we measure a distance between vectors, not between individual parameters, and that’s why this thing fails.