Just to add another data point: as a software engineer, I also find it hard to extract utility from LLMs. (And this has not been for lack of trying, e.g. at work we are being pushed to use LLM-enabled IDEs.) I am constantly surprised when people on the internet say that LLMs are a significant productivity boost for them.
My current model is that LLMs work better if you are working on a mainstream problem domain with a mainstream tech stack (language, library, etc.). In my mind this is approximately JavaScript React frontend development, and the further you move away from that, the less useful LLMs get. (The things I usually work on use a non-mainstream tech stack and/or a non-mainstream problem domain (though in my mind all interesting problems are non-mainstream in that sense), which would explain my lack of success.)
I wonder whether correctness proofs (checked by a proof assistant) could help with this.[1]
I think the main bottleneck for correctness proofs in the past has been that writing the proofs takes far more effort than writing the programs themselves, and current automated theorem provers are nowhere near good enough to close that gap.
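To make the effort gap concrete, here is a minimal sketch in Lean 4 (my own illustration; `rev`, `rev_append`, and `rev_rev` are just names I picked): the program is three lines, while even its most obvious correctness property already needs an auxiliary lemma and two inductions before the kernel accepts it.

```lean
-- The "program": naive list reversal, three lines.
def rev {α : Type} : List α → List α
  | []      => []
  | x :: xs => rev xs ++ [x]

-- Auxiliary lemma the main proof needs.
theorem rev_append {α : Type} (xs ys : List α) :
    rev (xs ++ ys) = rev ys ++ rev xs := by
  induction xs with
  | nil => simp [rev]
  | cons x xs ih => simp [rev, ih]

-- The "correctness property": reversing twice is the identity.
theorem rev_rev {α : Type} (xs : List α) : rev (rev xs) = xs := by
  induction xs with
  | nil => simp [rev]
  | cons x xs ih => simp [rev, rev_append, ih]
```

And this is a toy; for real programs the ratio of proof effort to program effort tends to be far worse.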
Writing machine-checked proofs is a prime RL target, since a proof-assistant kernel should be adversarially robust: the reward signal is whether the kernel accepts the proof, and that is not something the model can game. We have already seen great results from systems like AlphaProof.
One counterargument I can see is that writing the correctness properties themselves could turn out to be a major bottleneck. It might be that for most real-world systems you can't write succinct correctness properties.
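To illustrate what a succinct correctness property looks like when one exists, here is a sketch in Lean 4 using Mathlib's `List.Sorted` and `List.Perm` (`SortSpec` is a placeholder name): the full functional spec of a sorting routine fits in two lines.

```lean
import Mathlib

-- The entire functional spec of a sorting routine: the output is
-- sorted and is a permutation of the input.
def SortSpec (sort : List Nat → List Nat) : Prop :=
  ∀ xs : List Nat, (sort xs).Sorted (· ≤ ·) ∧ (sort xs).Perm xs
```

The worry is that for something like a web service or a GUI the intended behaviour has no comparably short formalisation, so the spec itself becomes the hard part.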