Regarding Turing machines: their definition is motivated by a philosophical argument rather than a mathematical one (hence the Church-Turing “thesis” rather than “theorem”). Essentially, Turing argues that any finite region of space can only have a finite set of distinguishable states; that any finite set can be enumerated and indexed; and therefore any physical system can be described by a (vast) transition table between these indexes. For input/output, a similar argument is made: only finitely many possibilities can be distinguished, we can equivalently operate on their indexes instead, and they can be written along one unbounded “tape”. Hence we can describe the behaviour of any physical system (including any procedure that a mathematician may carry out) using a transition table. The argument is so general that it avoids quibbling about minor details of physics, neurobiology, mathematics, etc. (which explains why proposed counterexamples require extreme physics like time travel, orbiting black holes, etc.). I highly recommend reading his “On Computable Numbers” paper.
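To make the “transition table” idea concrete, here is a minimal Python sketch (mine, not from Turing's paper). The `INCREMENT` table, the state names and the `run` helper are all made up for illustration; the point is only that the machine's entire behaviour is a finite lookup from (state, symbol) to (symbol, move, state).

```python
# Transition table: (state, symbol read) -> (symbol to write, head move, next state).
# Hypothetical example machine: add one to a binary number, starting on its last digit.
INCREMENT = {
    ("carry", "1"): ("0", -1, "carry"),  # 1 + 1 = 0, and the carry moves left
    ("carry", "0"): ("1", 0, "halt"),    # absorb the carry and stop
    ("carry", "_"): ("1", 0, "halt"),    # ran off the left end: write a new digit
}

def run(table, tape, head, state, blank="_", max_steps=10_000):
    """Apply the table until the state is 'halt'; return the non-blank tape contents."""
    cells = dict(enumerate(tape))  # sparse tape, unbounded in both directions
    steps = 0
    while state != "halt":
        if steps >= max_steps:
            raise RuntimeError("step budget exhausted")
        write, move, state = table[(state, cells.get(head, blank))]
        cells[head] = write
        head += move
        steps += 1
    lo, hi = min(cells), max(cells)
    return "".join(cells.get(i, blank) for i in range(lo, hi + 1)).strip(blank)

print(run(INCREMENT, "1011", head=3, state="carry"))
print(run(INCREMENT, "111", head=2, state="carry"))
```

The step budget is only there so the sketch can't loop forever; a real Turing machine has no such limit.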
This philosophically-justified generality was important to show that undecidability is fundamental, rather than just a deficiency of the chosen system (Rice’s Theorem also extends undecidability from the halting problem to all “non-trivial behaviours” of all universal systems). In fact Gödel was unconvinced by Lambda Calculus and proposed his own alternative (General Recursive Functions): he was only persuaded once Turing’s analysis appeared, together with a proof that Lambda Calculus was equivalent to Turing machines! It wasn’t so much that “Alonzo’s lambda calculus was also cool, but I like this better”; it was that Turing machines show why there cannot be anything “better”, so we might as well use them, or Lambda Calculus, or something else equivalent. Notice that there aren’t many programs actually written as Turing machines (outside of e.g. Busy Beaver research), whilst Lambda Calculus is how basically every programmer writes abstractions (Python even uses the keyword ‘lambda’, and it’s the name of AWS’s function-as-a-service offering).
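As a small illustration of how far plain function abstraction goes, here is a sketch of Church numerals using nothing but Python's `lambda`. The encoding itself is the standard one from Lambda Calculus; the names `zero`, `succ`, `add`, `mul` and the `to_int` decoder are my own choices, and `to_int` steps outside the calculus purely so we can print results.

```python
# Church numerals: the number n is "apply f to x, n times".
zero = lambda f: lambda x: x
succ = lambda n: lambda f: lambda x: f(n(f)(x))
add = lambda m: lambda n: lambda f: lambda x: m(f)(n(f)(x))
mul = lambda m: lambda n: lambda f: m(n(f))

def to_int(n):
    """Decode a Church numeral by counting applications (a convenience for printing)."""
    return n(lambda k: k + 1)(0)

two = succ(succ(zero))
three = succ(two)

print(to_int(add(two)(three)))
print(to_int(mul(two)(three)))
```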
Another crucial insight of Turing’s, which wasn’t mentioned explicitly, is the existence of “universal” Turing machines, which can read an encoding of another Turing machine as input and simulate that machine’s behaviour (AKA an “interpreter”, “emulator” or “virtual machine”). That’s crucial to the notion of “general purpose” that you mention informally many times; what many would call “Turing complete”. All universality results, whether for Python, RNNs, Rule 110, Magic: The Gathering, etc., ultimately rest on this: either by using the system to emulate a universal Turing machine directly, or to implement another system that has been proven universal that way.
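Here is a rough Python sketch of the “machine as input” idea. It is not Turing's construction (his universal machine is itself a Turing machine, whereas this interpreter leans on Python), and the text encoding, the `universal` function and the two example machines are all invented for illustration. What it does show is one fixed procedure behaving like different machines depending only on the data it is given.

```python
def universal(description, tape, blank="_", max_steps=10_000):
    """Simulate the machine encoded in `description` on `tape`.

    Hypothetical encoding: the first line is the start state; every other line is a
    rule 'state read write move next', with move one of L, R, N. The machine starts
    on the leftmost cell and stops when it reaches the state 'halt'.
    """
    first, *rules = description.strip().splitlines()
    state = first.strip()
    table = {}
    for line in rules:
        s, read, write, move, nxt = line.split()
        table[(s, read)] = (write, {"L": -1, "R": 1, "N": 0}[move], nxt)

    cells, head, steps = dict(enumerate(tape)), 0, 0
    while state != "halt":
        if steps >= max_steps:
            raise RuntimeError("step budget exhausted")
        write, step, state = table[(state, cells.get(head, blank))]
        cells[head] = write
        head += step
        steps += 1
    return "".join(cells[i] for i in sorted(cells)).strip(blank)

# Two different "machines", both just strings handed to the same interpreter.
FLIP_BITS = """
scan
scan 0 1 R scan
scan 1 0 R scan
scan _ _ N halt
"""

APPEND_ONE = """
scan
scan 1 1 R scan
scan _ 1 N halt
"""

print(universal(FLIP_BITS, "1011"))
print(universal(APPEND_ONE, "111"))
```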
Babbage and Lovelace didn’t have such a notion of universality either; the Analytical Engine was only shown to be Turing complete later. Universal machines also show the existence of “software”, and its fundamental equivalence to hardware. For example, the input to a Jacquard loom describes a fabric pattern, not a loom-which-would-make-that-pattern; hence a Jacquard loom can’t meaningfully “emulate” another Jacquard loom. Yet it makes perfect sense to “stack” universal machines, e.g. modern CPUs are universal machines, which emulate x86 processors, which run Java VMs, which run Minecraft servers, which run redstone processors, and so on.