Aprillion
I would never trust people not to look at my scratchpad.
I suspect the corresponding analogy for humans might be about hostile telepaths, not just literal scratchpads, right?
thanks for the concrete examples, can you help me understand how these translate from individual productivity to externally-observable productivity?
3 days to make a medium sized project
I agree Docker setup can be fiddly; however, what happened with the 50+% savings—did you lower the price for the customer to stay competitive, do you do 2x as many paid projects now, did you postpone hiring another developer who is not needed now, or do you just have more free time? And no change in support & maintenance costs compared to similar projects before LLMs?
processing isn’t more than ~500 lines of code
oh well, my only paid experience is with multi-year project development & maintenance, and those are definitely not in the category under 1kloc 🙈 which might help to explain my abysmal experience trying to use any AI tools for work (beyond autocomplete, but IntelliSense also existed before LLMs)
TBH, I am now moving towards the opinion that evals are very unrepresentative of the “real world” (if we exclude LLM wrappers as requested in the OP … though LLM wrappers, including evals, are becoming part of the “real world” too, so I don’t know—it’s like banking bootstrapped wealthy bankers, and LLM wrappers might be bootstrapping wealthy LLM startups)
toxic slime, which releases a cloud of poison gas if anything touches it
this reminds me of Oxygen Not Included (though I just learned the original reference is D&D), where Slime (which also releases toxic stuff) can be harvested to produce useful stuff in the Algae Distiller
the metaphor runs differently, one of the useful things from Slime is Polluted Water, which is also produced by ~~humans~~ replicants in the Lavatory … and there is a Water Sieve that will process Polluted Water into Water (and some plants want to be watered by the Polluted variant)

makes me wonder if there is any back-applicable insight—if AI slop is indistinguishable from corporate slop, can we use it to generate data to train spam filters to improve the quality of search results and start valuing quality journalism again soon? (and maybe some cyborgs want to use AI assistants for useful work beyond buggy clones of open source tools)
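something like this toy sketch, assuming we could label slop vs. quality writing in the first place (the texts, labels, and model choice below are all made up for illustration):

```python
# toy sketch: a slop-vs-journalism text classifier on made-up data (hypothetical labels)
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

texts = [
    "synergize and leverage our best-in-class solutions to unlock holistic value",
    "the senator's vote on Tuesday contradicts her 2019 campaign promise",
]
labels = ["slop", "journalism"]

# bag-of-words features + naive Bayes, the classic spam-filter recipe
classifier = make_pipeline(TfidfVectorizer(), MultinomialNB())
classifier.fit(texts, labels)
print(classifier.predict(["leverage a paradigm shift in stakeholder engagement"]))  # hopefully "slop"
```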
Talking out loud is even better. There is something about forcing your thoughts into language...
Those are 2 very different things for some people ;)
I, for one, can think MUCH faster without speaking out loud, even if I subvocalize real words (for the purpose of revealing gaps) and don’t go all the way to manipulating concepts-that-don’t-have-words-yet-but-have-been-pointed-to-already or concepts-that-have-a-word-but-the-word-stands-for-5-concepts-and-we-already-narrowed-it-down-without-explicit-label …
the set of problems the solutions to which are present in their training data
a.k.a. the set of problems already solved by open source libraries without the need to re-invent similar code?
that’s not how productivity ought to be measured—it should measure some output per (say) a workday
1 vs 5 FTE is a difference in input, not output, so you can say “adding 5 people to this project will decrease productivity by 70% next month and we hope it will increase productivity by 2x in the long term” … not a synonym of “5x productivity” at all
it’s the measure by which you can quantify diminishing returns, not obfuscate them!
...but the usage of “5-10x productivity” seems to point to a different concept than a ratio of useful output per input 🤷 AFAICT it’s a synonym for “I feel 5-10x better when I write code which I wouldn’t enjoy writing otherwise”
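a toy example of the ratio I have in mind (all numbers made up, and “features” standing in for whatever output measure you prefer):

```python
# toy numbers to show why more output is not the same as more productivity
def productivity(output_features: float, person_days: float) -> float:
    """Useful output per unit of input (features per person-day)."""
    return output_features / person_days

solo = productivity(10, 1 * 20)  # 1 dev for 20 workdays -> 0.5 features/person-day
team = productivity(15, 6 * 20)  # 6 devs for 20 workdays -> 0.125 features/person-day
# output grew 1.5x, yet productivity fell to a quarter: input grew faster than output
print(solo, team)
```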
A thing I see around me, my mind.
Many a peak, a vast mountain range,
standing at a foothill,
most of it unseen.

Two paths in front of me,
a lighthouse high above.

Which one will it be,
a shortcut through the forest,
or a scenic route?

Climbing up for better views,
retreating from overlooks,
away from the wolves.

To think with all my lighthouses.
all the scaffold tools, system prompt, and whatnot add context for the LLM … but what if I want to know what the context is too?
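e.g. something like this hypothetical logging wrapper (assuming a scaffold built on the OpenAI Python client; a real scaffold would need its own hook to expose this):

```python
# hypothetical wrapper: show the human the same context the LLM receives
from openai import OpenAI

client = OpenAI()

def create_with_logging(**kwargs):
    # print every message before it goes out, so I can see what the scaffold added
    for message in kwargs.get("messages", []):
        print(f"[context] {message['role']}: {str(message['content'])[:200]}")
    return client.chat.completions.create(**kwargs)
```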
we can put higher utility on the shutdown
sounds instrumental to expand your moral circle to include other instances of yourself to keep creating copies of yourself that will shut down … then expand your moral circle to include humans and shut them down too 🤔
exercise for readers: what patterns need to hold in the environment in order for “do what I mean” to make sense at all?
Notes to self (let me know if anyone wants to hear more, but hopefully no unexplored avenues can be found in my list of “obvious” if somewhat overlapping points):
- sparsity—partial observations narrow down unobserved dimensions
- ambiguity / edge of chaos—the environment is “interesting” to both agents (neither fully predictable nor fully random)
- repetition / approximation / learnability—induction works
- computational boundedness / embeddedness / diversity
- power balance / care / empathy / trading opportunity / similarity / scale
Parity in computing is whether the count of 1s in a binary string is even or odd, e.g. ‘101’ has two 1s ⇒ even parity (to output 0 for even parity, XOR all bits like 1^0^1 … to output 1 for this, XOR that result with 1).

The parity problem (if I understand it correctly) sounds like trying to find out the minimum number of data samples per input length a learning algorithm ought to need to figure out that a mapping between a binary input and a single-bit output is equal to computing XOR parity and not something else (e.g. whether an integer is even/odd, or whether there is a pattern in a wannabe-random mapping, …), and the conclusion seems to be that you need exponentially more samples for linearly longer inputs … unless you can figure out from other clues that you need to calculate parity, in which case you just implement parity for any input size and you don’t need any additional sample data.
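for concreteness, the XOR-all-bits computation is tiny (a minimal Python sketch):

```python
from functools import reduce

def parity(bits: str) -> int:
    """XOR all bits together: 0 if the count of 1s is even, 1 if odd."""
    return reduce(lambda acc, bit: acc ^ int(bit), bits, 0)

assert parity("101") == 0   # two 1s   => even parity
assert parity("1101") == 1  # three 1s => odd parity
```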
(FTR: I don’t understand the math here, I am just pattern matching to the usual way this kind of problem goes)
The failure mode of the current policy sounds to me like “pay for your own lesson to feel less motivated to do it again” while the failure mode of this proposal would be “one of the casinos might maybe help you cheat the system which will feel even more exciting”—almost as if the people who made the current policy knew what they were doing to set aligned incentives 🤔
Focus On Image Generators
How about audio? Is the speech-to-text domain as “close to the metal” as possible to deserve focus too or did people hit roadblocks that made image generators more attractive? If the latter, where can I read about the lessons learned, please?
What if you tried to figure out a way to understand the “canonical cliffness” and design a new line of equipment that could be tailored to fit any “slope”… Which cliff would you test first? 🤔
IMO
in my opinion, the acronym for the International Mathematical Olympiad deserves to be spelled out here
Evolution isn’t just a biological process; it’s a universal optimization algorithm that applies to any type of entity
Since you don’t talk about the other 3 forces of biological evolution, or about the “time evolution” concept in physics...
And since the examples seem to focus on directional selection (and not on other types of selection), and also only on short-term effect illustrations, while in fact natural selection explains most aspects of biological evolution and is the strongest long-term force, not the weakest one (anti-cancer mechanisms and why viruses don’t usually kill their host are also well explained by natural selection even if not listed as examples here; evolution by natural selection is the thing that explains ALL of those billions of years of biology in the real world, including cooperation, not just competition)...
Would it be fair to say that you use “evolution” only by analogy, not trying to build a rigorous causal relationship between what we know of biology and what we observe in sociology? There is no theory of the business cycle because of allele frequency, right?!?
If anyone here might enjoy a dystopian fiction about a world where the formal proofs will work pretty well, I wrote Unnatural abstractions
Thank you for the engagement, but “to and fro” is a real expression, not a typo (and I’m keeping it) … it’s used slightly unorthodoxly here, but it sounded right to my ear, so it survived editing ¯\_(ツ)_/¯
I tried to use the technobabble in a way that’s usefully wrong, so please also let me know if someone gets inspired by this short story.
I am not making predictions about the future, only commenting on the present—if you notice any factual error from that point of view, feel free to speak up, but as far as the doominess spectrum goes, it’s supposed to be both too dystopian and too optimistic at the same time.
And if someone wants to fix a typo or a grammo, I’d welcome a pull request (but no commas shall be harmed in the process). 🙏
My own experience is that if-statements are the Achilles heel of even 3.5, and 3.7 is somehow worse (when it’s “almost” right, that’s worse than useless; it’s like reviewing pull requests when you don’t know if it’s an adversarial attack or if they mean well but are utterly incompetent in interesting, hypnotizing ways)… and that METR’s baselines more resemble a Skinner box than programming (though many people have that kind of job, I just don’t find the conditions of the gig economy “humane” or representative of how “value” is actually created), and then there’s the sheer disconnect between what I would call “productive”, “useful projects”, “bottlenecks”, and “what I love about my job and what parts I’d be happy to automate” vs the completely different answers on How Much Are LLMs Actually Boosting Real-World Programmer Productivity?, even from people I know personally...
I find this graph indicative of how “value” is defined by the SF investment culture and disruptive economy… and I hope the AI investment bubble will collapse sooner rather than later...
But even if the bubble collapses, automating intelligence will not be undone, it won’t suddenly become “safe”, and the incentives to create real AGI instead of overhyped LLMs will still exist—the danger is not in the presented economic curve going up, it’s in what economic actors see as potential, in how incentivized corporations/governments are to search for the thing that is both powerful and dangerous, no?