I have been contemplating Connor Leahy’s Cyborgism and what it would mean for us to improve human workflows enough that aligning AGI looks less like:
Sisyphus attempting to roll a 20 tonne version of The One Ring To Rule Them All into the caldera of Mordor while blindfolded and occasionally having to bypass vertical slopes made out of impossibility proofs that have been discussed by only 3 total mathematicians ever in the history of our species—all before Sauron destroys the world after waking up from a restless nap of an unknown length.
I think this is what you meant by “make the ultimate <programming language/text editor/window manager/file system/virtual collaborative environment/interface to GPT/...>”
Intuitively, the level I’m picturing is:
A suite of tools, booted from a single icon on a computer’s home screen, that lets anyone with decent taste in software create essentially any program they can imagine, at a level of polish that nobody can poke holes in even if you give a million reviewers ten years of free time.
Can something at this level be accomplished?
Well, what does coding look like currently? It seems to look like a bunch of people with dark circles under their eyes, reading long strings of characters in something basically equivalent to an advanced text editor, surrounded by a bunch of additional little windows of libraries, graphics, and tools.
This is not a domain where human intelligence performs with the same ease it shows in domains like spearfishing or bushcraft.
If you want to build Cyborgs, I am pretty sure you start by building software that isn’t a god-machine: throw out the old book of tacit knowledge and start over with something that makes each step as intuitive as possible. You probably also focus far more on quality than on quantity or speed.
So: plaintext instructions describing the software you want to build, or a code repository plus a plaintext list of modifications? Start with an elevator pitch, see the raw AI-generated material, then critique it in a step-by-step, organized fashion, where debugging and feature-analysis checklists are generated and scored on whether they include everything you would have thought of, plus valid items you didn’t.
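A minimal sketch of that checklist-scoring loop. Everything here is invented for illustration (a real version would call an actual code-generation model instead of the stub `generate_checklist`); the point is just what the score would measure.

```python
def generate_checklist(pitch: str) -> list[str]:
    # Stand-in for an AI-generated debugging/feature-analysis checklist.
    return [
        f"handles empty input for: {pitch}",
        f"documents public API of: {pitch}",
    ]

def score_checklist(ai_items: list[str], human_items: list[str]) -> dict:
    """Score the AI checklist on coverage of what the human would have
    checked, plus valid items the human missed."""
    ai, human = set(ai_items), set(human_items)
    return {
        "covered": len(ai & human) / max(len(human), 1),  # did it include everything you'd think of?
        "novel": sorted(ai - human),   # valid stuff you didn't think of
        "missed": sorted(human - ai),  # gaps to feed back into the next round
    }

pitch = "CSV deduplication tool"
ai_list = generate_checklist(pitch)
my_list = [
    "handles empty input for: CSV deduplication tool",
    "runs in under a second on 1M rows",
]
report = score_checklist(ai_list, my_list)
```

The scoring direction matters: “covered” punishes the tool for missing things you cared about, while “novel” is where the tool earns its keep by surfacing valid checks you hadn’t considered.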
I think the point in this post is valid, though a bit more in the realm of “angsty shower-thought” than a ready-to-import heuristic for analysing the gap between competence-in-craft and unleashed-power-levels.
There is a bit of a bootstrapping problem with Cyborgism. I don’t think you get the One Programming Suite To Rule Them All by plugging in a bunch of different LLMs, each fine-tuned to do one part of the process really well, then packaging the whole thing up and running it on six gaming GPUs. That is the level of super-software that seems in reach, and it seems doomed to be full of really hard-to-perceive holes, like a very high-dimensional block of Swiss cheese.
Does that even get better if we squeeze the weights of LLMs to get LLeMon juice? That is:
- Squeeze: Python scripts that do the useful parts of the cool smart stuff LLMs do, without all the black-box fuzzy embedding spaces and vectors.
- Filter the juice for pulp and seeds: flaws in the specific decoded algorithm that could cause errors via accidental or deliberately adversarial examples.
- Sweeten it: make the results make sense to, and be understandable by, humans.
… then plug a bunch of that LLeMonade into a programming suite so that the whole thing works with decently competent human programmers to reliably make stuff that actually just works, and to warn people ahead of time about the exhaustive set of real issues, edge cases, and capability gaps in a program?
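A toy illustration of the squeeze-filter-sweeten idea above, shrunk to something trivially small: replace a fuzzy model judgment with an explicit, auditable rule, then probe the decoded rule for pulp and seeds (edge cases) before trusting it. The task here (validating Python identifiers) is invented for illustration; nothing below involves an actual LLM.

```python
import keyword

def is_valid_identifier(s: str) -> bool:
    """The 'decoded rule': a judgment a model might make fuzzily,
    written as inspectable code instead of weights."""
    return s.isidentifier() and not keyword.iskeyword(s)

# "Filter for pulp/seeds": adversarial probes against the decoded rule,
# each paired with the answer we expect.
probes = {
    "total": True,   # ordinary case
    "2fast": False,  # leading digit
    "class": False,  # keyword collision
    "": False,       # empty string
    "naïve": True,   # Python 3 identifiers allow non-ASCII letters
}
failures = [s for s, expected in probes.items()
            if is_valid_identifier(s) != expected]
```

The "sweetening" step is the fact that the rule fits in two readable lines: a human can check it against the language spec directly, which you cannot do with an embedding space.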
This problem does seem difficult—and probably the whole endeavor just actually won’t work well enough IRL, but it seems worth trying?
Like, what does it look like to throw everything and the kitchen sink at Alignment? It probably looks at least a little like the Apollo program, and if you’re doing NASA stuff properly, then you end up making some revolutionary products for everyone else.
I think those products—the random externalities of a healthy Alignment field—look more like tools that work simply and reliably, rather than the giant messy LLM hairballs AI labs keep coughing up and dumping on people.
Maybe all of this helps flesh out and make more useful the flinchy heuristic of “consumer software works terribly → … → AI destroys the future.”
Alignment as a field goes out ahead of the giant rolling gooey hairball of spaghetti-code Doom—untangles it and weaves it into a beautiful textile—or we are all dead.