The fixed point problem is worse than you think. Take the Hungarian astrology example, with an initial easy set that has both a length limitation (e.g. < 100k characters) and a simplicity limitation.
Now I propose a very simple improvement scheme: if the article ends in a whitespace character, try to classify the shortened article with the last character removed.
This gives you an infinite sequence of better and better decision boundaries (each time, a couple of new cases are solved: the ones that have length 100k + $N$, end in at least $N$ whitespace characters, and are in the easy set once the whitespace has been stripped). This sequence nicely converges to the classifier that trims all trailing whitespace and then asks the initial classifier.
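To make that concrete, here is a minimal Python sketch. All names, the 100k threshold as a Python constant, and the stand-in base classifier are invented for illustration; nothing here is from the original post.

```python
# Minimal sketch of the improvement sequence (all names invented).

MAX_LEN = 100_000  # hypothetical length limitation of the initial easy set

def base_classifier(article: str) -> bool:
    """Stand-in for the initial classifier; defined only on the easy set."""
    if len(article) >= MAX_LEN:
        raise ValueError("outside the easy set")
    return "astrology" in article  # dummy stand-in for the real decision

def improve(classifier):
    """One step of the proposed scheme: if the article ends in whitespace,
    retry on the article with the last character removed."""
    def improved(article: str) -> bool:
        if article and article[-1].isspace():
            return classifier(article[:-1])
        return classifier(article)
    return improved

# v0 = base_classifier, v1 = improve(v0), v2 = improve(v1), ...
# Step N newly solves the cases of length MAX_LEN + N that end in at
# least N whitespace characters.

def limit_classifier(article: str) -> bool:
    """The countable limit: strip all trailing whitespace, then ask v0."""
    return base_classifier(article.rstrip())

# limit_classifier absorbs every finite iteration of `improve`, yet it is
# still not a fixed point of a *general* improvement oracle, which could,
# e.g., propose stripping leading whitespace next.
```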
What I’m trying to say here is: the space of cases to consider can be large in many dimensions, and the countable limit of a sequence of extensions need not be a fixed point of the magical improvement oracle.
Generally, I’d go in a different direction: instead of arguing about iterated improvement, argue that of course you cannot correctly extrapolate all decision problems from a limited amount of labeled easy cases and limited context. The style of counter-example is to construct two settings (“models” in the lingo of logic) A and B with the same labeled easy set (and the same context made available to the classifier), where the correct answer for some datapoint x differs between the two settings. Hence, safe extrapolation must always conservatively answer NO to x, and cannot be expected to answer all queries correctly from limited training data (the typical YES / NO / MAYBE split).
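A toy version of that counterexample, as a sketch: the models A and B, the labeled set, and the disputed point x are all invented for illustration, and the conservative answer is rendered as MAYBE in the three-valued split.

```python
# Toy version of the two-settings counterexample (all names invented).

easy_set = {"article-1": True, "article-2": False}  # shared labeled data

def model_A(article: str) -> bool:
    # Consistent with the easy set; answers True on unseen points.
    return easy_set.get(article, True)

def model_B(article: str) -> bool:
    # Also consistent with the easy set; answers False on unseen points.
    return easy_set.get(article, False)

def safe_extrapolation(article: str) -> str:
    """Answer YES/NO only if every model consistent with the training
    data agrees; otherwise conservatively refuse."""
    answers = {model_A(article), model_B(article)}
    if answers == {True}:
        return "YES"
    if answers == {False}:
        return "NO"
    return "MAYBE"  # the settings disagree, so x is undecidable from data

x = "article-3"  # not in the easy set
assert model_A(x) != model_B(x)          # correct answer differs in A and B
assert safe_extrapolation(x) == "MAYBE"  # so safe extrapolation must punt
```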
I think the discussion of the fixed point, or the limit of iterated improvement, does not lead to the actually relevant argument: that extrapolation cannot conjure information out of nowhere.
You could cut it out completely without weakening the argument that certain types of automated ontology identification are impossible.
I doubt your optimism about the level of security that is realistically achievable. Don’t get me wrong: the software industry has made huge progress (at large cost!) in terms of security. Where before most stuff popped a shell if you looked at it funny, exploitation is now a large effort for many targets.
Further progress will be made.
If we extrapolate this progress, we will optimistically reach a point where impactful, reliable 0day is out of reach for most hobbyists and criminals, and remains the domain of the natsec apparatus of great powers.
But I don’t see how raising this waterline will help with AI risk in particular?
As in: godlike superintelligence is game over anyway. An AI that is as good at exploitation as the rest of humanity taken together is beyond what can realistically be defended against, in terms of widely deployed security levels. An AI that doesn’t reach that level without human assistance is probably not lethal anyway.
On the other hand, one could imagine pivotal acts by humans with limited-but-substantial AI assistance that rely on the lack of widespread security.
Pricing human + weakish-AI collaborations out of the world-domination-via-hacking game might actually make matters worse, insofar as weakish, non-independent AI might be easier to keep aligned.
A somewhat dystopian wholesale surveillance of almost every word written and said by humans, combined with AI that is good enough at text comprehension, and energy-efficient enough, to pervasively and correctly identify scary-looking research and flag it to human operators for intervention, is plausibly pivotal and alignable. It also makes for much better cyberpunk novels than burning GPUs (mentally paging cstross: I want my Gibson homage in the form of a “Turing Police”/Laundry-verse crossover).
Also, good that you mentioned Rowhammer. Rowhammer, and the DRAM industry’s half-baked, pitiful response to it, are humankind’s capitulation in terms of “making at least some systems actually watertight”.