Yes, I think this makes sense.
Here is one aspect which might be useful to keep in mind.
If we think about all this as some kind of “generalized Taylor expansion”, there are some indications that the deviations from linearity might be small.
E.g. there is this rather famous post: https://www.lesswrong.com/posts/JK9nxcBhQfzEgjjqe/deep-learning-models-might-be-secretly-almost-linear
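To make the “generalized Taylor expansion” intuition a bit more concrete (my own notation, just a sketch of the idea, not anything from the linked post): treating the network output as a function of its weights, “almost linearity” would mean the first-order term dominates:

```latex
% First-order expansion of a network f in its weights \theta around \theta_0.
% "Almost linear" = the remainder R stays small even for the fairly large
% weight deltas \Delta\theta involved in fine-tuning or merging.
f(x;\, \theta_0 + \Delta\theta)
  \;=\; f(x;\, \theta_0)
  \;+\; \nabla_{\theta} f(x;\, \theta_0)^{\top} \Delta\theta
  \;+\; R(\Delta\theta),
\qquad \|R(\Delta\theta)\| \ \text{small.}
```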
Another indication pointing to “almost linearity” is that “model merge” works pretty well. Interestingly enough, though, people often prefer more subtle approaches to model merging than plain linear interpolation (see e.g. https://huggingface.co/blog/mlabonne/merge-models), so, presumably, non-linearity does matter quite a bit as well.
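For concreteness, the naive linear-interpolation baseline that the subtler merge methods improve on looks roughly like this (a minimal sketch, assuming two checkpoints with identical architectures and floating-point parameters; `lerp_merge` is my own hypothetical helper, not from the linked post):

```python
# Minimal sketch: "model merge" as plain linear interpolation of two
# checkpoints with identical architectures and parameter names.
import torch

def lerp_merge(state_a: dict, state_b: dict, t: float = 0.5) -> dict:
    """Parameter-wise interpolation: (1 - t) * A + t * B."""
    assert state_a.keys() == state_b.keys(), "checkpoints must match"
    return {k: (1.0 - t) * state_a[k] + t * state_b[k] for k in state_a}

# Usage with two hypothetical fine-tunes of the same base model:
# merged = lerp_merge(model_a.state_dict(), model_b.state_dict(), t=0.3)
# model_a.load_state_dict(merged)
```

The fact that even this naive parameter-wise average often yields a usable model is itself some evidence for the “almost linear” picture; the fancier methods in the linked post (SLERP and friends) can be read as attempts to correct for the residual non-linearity.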