To me it seems not unreasonable to think that some ideas from tropical geometry might be relevant to ML, for the simple reason that the functions coming up in ML with ReLU activation are PL, and people in tropical geometry have thought seriously about PL functions. Of course this does not guarantee that there is anything useful to be said!
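To make that connection concrete with a toy sketch of my own (assuming only numpy, and only the standard dictionary as I understand it): a ReLU is a single max-plus operation, relu(x) = max(x, 0), so composing ReLU layers stays inside max-plus algebra, and the network computes differences of maxima of affine functions ("tropical rational functions"). For instance:

```python
import numpy as np

relu = lambda t: np.maximum(t, 0.0)

# A two-layer ReLU composition:
def net(x):
    h = relu(x)            # layer 1: max(x, 0)
    return relu(h - 1.0)   # layer 2: max(h - 1, 0)

# Expanding the composition in max-plus arithmetic:
#   relu(relu(x) - 1) = max(max(x, 0) - 1, 0) = max(x - 1, -1, 0) = max(x - 1, 0)
def tropical_poly(x):
    return np.maximum(x - 1.0, 0.0)

# The composition agrees with the single max-plus polynomial everywhere on a grid.
xs = np.linspace(-3.0, 3.0, 201)
assert np.allclose(net(xs), tropical_poly(xs))
```

Here the two-layer composition collapses to the single tropical polynomial max(x − 1, 0); tracking how such expansions grow with depth is exactly the sort of bookkeeping tropical geometers are used to.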
One possible example that comes to mind in the context of your post here is the concept of polyhedral currents. As I understand it, here the notion of “density of polygons” is used as a kind of proxy for the derivative of a PL function? But I think the theory of polyhedral currents gives a much more general theory of differentiation of PL functions. Very naively, rather than just recording the locus where the function fails to be linear, one also records how much the derivative changes when crossing the walls. I learnt about this from a paper of Mihatsch:
https://arxiv.org/pdf/2107.12067
but I’m certain there are older references.
I’m really a log person, I don’t know the tropical world very well; sorry if what I write does not make sense!
Thanks for the reference, and thanks for providing an informed point of view here. I would love to have more of a debate here, and would quite like to be wrong, as I like tropical geometry.
First, about your concrete question:
As I understand it, here the notion of “density of polygons” is used as a kind of proxy for the derivative of a PL function?
Density is a proxy for the second derivative: indeed, the closer a function is to linear, the easier it is to approximate it by a linear function. I think a similar idea occurs in 3D graphics, in mesh optimization, where you can improve performance by reducing the number of cells in flatter domains (I don’t understand this field, but this is done in this paper according to some curvature-related energy functional). The question of “derivative change when crossing walls” seems similar. In general, glancing at the paper you sent, it looks like polyhedral currents are a locally polynomial PL generalization of currents of ordinary functions (and it seems that there is some interesting connection made to intersection theory/analogues of Chow theory, though I don’t have nearly enough background to read this part carefully). Since the purpose of PL functions in ML is to approximate some (approximately smooth, but fractally messy and stochastic) “true classification”, I don’t see why one wouldn’t just use ordinary currents here (currents on a PL manifold can be made sense of after smoothing, or in a distribution-valued sense, etc.).
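To spell out the “derivative change when crossing walls” point with a toy example of my own (not from the paper): for a 1D combination of ReLUs, the derivative is piecewise constant, and the distributional second derivative is a sum of point masses at the walls, each weighted by how much the slope jumps when crossing it — which is just the ReLU coefficient there.

```python
import numpy as np

# A toy 1D piecewise-linear function built from ReLUs:
#   f(x) = 2*relu(x - 1) - 3*relu(x + 0.5) + x
# Its derivative is piecewise constant; the distributional second derivative
# is a sum of point masses at the breakpoints ("walls"), with weight equal
# to the jump in slope when crossing each wall.

relu = lambda t: np.maximum(t, 0.0)
f = lambda x: 2 * relu(x - 1.0) - 3 * relu(x + 0.5) + x

def slope(x, eps=1e-6):
    """Numerical derivative of f at x (taken away from the breakpoints)."""
    return (f(x + eps) - f(x - eps)) / (2 * eps)

# The jump in slope across each wall recovers the ReLU coefficient there.
for wall, coeff in [(-0.5, -3.0), (1.0, 2.0)]:
    jump = slope(wall + 0.01) - slope(wall - 0.01)
    print(f"wall at {wall:+.1f}: slope jump = {jump:+.2f} (coefficient {coeff:+.1f})")
```

So “just the wall locus” forgets the weights, while the weighted version carries the full (second-order) differential information.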
In general, I think the central crux between us is whether or not this is true:
tropical geometry might be relevant to ML, for the simple reason that the functions coming up in ML with ReLU activation are PL
I’m not sure I agree with this argument. The use of PL functions is by no means central to ML theory, and is an incidental aspect of early algorithms. The most efficient activation functions for most problems tend not to be ReLUs, though the question of activation functions is often somewhat moot due to the universal approximation theorem (and the fact that, in practice, at least for shallow NNs, anything implementable by one reasonable activation tends to be easily implementable, with similar macroscopic properties, by any other). So the reason that PL functions come up is that they’re “good enough to approximate any function” (and also “asymptotic linearity” seems genuinely useful for avoiding some explosion behaviors). But by the same token, you might expect people who think deeply about polynomial functions to be good at doing analysis because of the Stone-Weierstrass theorem.
More concretely, I think there are two core “type mismatches” between tropical geometry and the kinds of questions that appear in ML:
Algebraic geometry in general (including tropical geometry) isn’t good at dealing with deep compositions of functions, and especially approximate compositions.
(More specific to TG): the polytopes that appear in neural nets are, as I explained, inherently random (the typical interpretation we have of even combinatorial algorithms like modular addition is that the PL functions produce some random sharding of some polynomial function). This is a very strange thing to consider from the point of view of a tropical geometer: as an algebraic geometer, it’s hard for me to imagine a case where “this polynomial has degree approximately 5… it might be 4 or 6, but the difference between them is small”. I simply can’t think of any behavior that is at all meaningful from an AG-like perspective where the questions of fan combinatorics and degrees of polynomials are replaced by questions of approximate equality.
I can see myself changing my view if I see some nontrivial concrete prediction or idea that tropical geometry can provide in this context. I think a “relaxed” form of this question (where I genuinely haven’t looked at the literature) is whether tropical geometry has ever been useful (either in proving something or at least in reconceptualizing something in an interesting way) in linear programming. I think if I see a convincing affirmative answer to this relaxed question, I would be a little more sympathetic here. However, the type signature here really does seem off to me.
Like David Holmes, I am not an expert in tropical geometry, so I can’t give the best case for why tropical geometry may be useful. Only a real expert putting in serious effort can make that case.
Let me nevertheless respond to some of your claims.
PL functions are quite natural for many reasons. They are simple. They naturally appear as minimizers of various optimization procedures, see e.g. the discussion in section 5 here.
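As one standard toy instance of PL functions arising as minimizers (my own example; I don't know whether it is the one discussed in the linked section 5): the minimizer of a quadratic data term plus an ℓ1 penalty is the soft-thresholding map, which is piecewise linear in the data.

```python
import numpy as np

# The minimizer over x of F(x) = (x - y)^2 / 2 + lam * |x| is the
# soft-thresholding map S(y) = sign(y) * max(|y| - lam, 0), which is a
# piecewise-linear function of the data y, with walls at y = +/- lam.

def soft_threshold(y, lam):
    return np.sign(y) * np.maximum(np.abs(y) - lam, 0.0)

def argmin_brute(y, lam):
    """Brute-force minimizer of F on a fine grid, for comparison."""
    xs = np.linspace(-5.0, 5.0, 200001)
    return xs[np.argmin(0.5 * (xs - y) ** 2 + lam * np.abs(xs))]

# The closed-form PL minimizer matches the brute-force one.
for y in (-2.0, -0.3, 0.0, 0.7, 3.0):
    assert abs(argmin_brute(y, 1.0) - soft_threshold(y, 1.0)) < 1e-3
```

The ℓ1 term is what creates the kinks; a smooth penalty would give a smooth minimizer instead.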
Polynomials don’t satisfy the padding argument, and architectures based on them will therefore typically fail to have the correct simplicity bias.
As for
1. “Algebraic geometry isn’t good at dealing with deep compositions of functions, and especially approximate compositions.” I agree a typical course in algebraic geometry will not much consider composition of functions, but that doesn’t seem to me a strong argument for the contention that the tools of algebraic geometry are not relevant here. Certainly, more sophisticated methods beyond classical scheme theory may be important [likely involving something like PROPs], but ultimately I’m not aware of any fundamental obstruction here.
2. I don’t agree with the contention that algebraic geometry is somehow not suited to questions of approximation: e.g. the Weil conjectures are really an approximate/average statement about points of curves over finite fields. The same objection you make could have been made about singularity theory before we knew about SLT.
I agree with you that a probabilistic perspective on ReLUs/piecewise linear functions is probably important. It doesn’t seem unreasonable to me in the slightest to consider some sort of tempered posterior on the space of piecewise linear functions. I don’t think this invalidates the potential of polytope-flavored thinking.
Hmm, so I’m very wary of defending tropical geometry when I know so little about it; if anyone more informed is reading please jump in! But until then, I’ll have a go.
tropical geometry might be relevant to ML, for the simple reason that the functions coming up in ML with ReLU activation are PL
I’m not sure I agree with this argument.
Hmm, even for a very small value of ‘might’? I’m not saying that someone who wants to contribute to ML needs to seriously consider learning some tropical geometry, just that if one already knows tropical geometry it’s not a crazy idea to poke around a bit and see if there are applications.
The use of PL functions is by no means central to ML theory, and is an incidental aspect of early algorithms.
I agree this is an important point. I don’t actually have a good idea what activation functions people use in practice these days. Thinking about asymptotic linearity makes me think about the various papers appearing that use polynomial activation functions. Do you have an opinion on this? For people in algebraic geometry it’s appealing as it generates lots of AG problems (maybe very hard ones), but I don’t have a good feeling as to whether it’s got anything much to do with ‘real life’ ML. I can link to some of the papers I’m thinking of if that’s helpful, or maybe you are already a bit familiar.
I don’t see why one wouldn’t just use ordinary currents here (currents on a PL manifold can be made sense of after smoothing, or in a distribution-valued sense, etc.).
I think you’re right; this paper just came to mind because I was reading it recently.
whether tropical geometry has ever been useful (either in proving something or at least in reconceptualizing something in an interesting way) in linear programming.
1. Algebraic geometry in general (including tropical geometry) isn’t good at dealing with deep compositions of functions, and especially approximate compositions.
Fair, though one might also see that as an interesting challenge. I don’t have a feeling as to whether this is for really fundamental reasons, or people haven’t tried so hard yet.
2. […] I simply can’t think of any behavior that is at all meaningful from an AG-like perspective where the questions of fan combinatorics and degrees of polynomials are replaced by questions of approximate equality.
There are plenty of cases where “high degree” is enough (Faltings’ Theorem is the first thing that comes to mind, but there are lots). But I agree that “degree approximately 5” feels quite unnatural.
Hi Dmitry,
A little googling suggests there are some applications. This paper seems to give an application of tropical geometry to complexity of linear programming: https://inria.hal.science/hal-03505719/document and this list of conference abstracts seems to give other applications: https://him-application.uni-bonn.de/fileadmin/him/Workshops/TP3_21_WS1_Abstracts.pdf Whether they are ‘convincing’ I leave up to you.