Bypasses the need for cutlery and plates.
(Yes, you might also eat such foods in cases where you do have cutlery and plates, but that’s downstream of their existence, not the vital reason for their existence.)
Ok, so then since one can’t make artificial general agents, it’s not so confusing that an AI-assisted human can’t solve the task. I guess it’s true though that my description needs to be amended to rule out things constrained by possibility, budget, or alignment.
This statement is pretty ambiguous. “Artificial employee” makes me think of some program that is meant to perform tasks in a semi-independent manner. It would be trivial to generate a million different prompts and then have some interface that routes incoming requests to those prompts. You could also register it as a corporation. It would presumably be slightly less useful than your generic AI chatbot, because the cost and latency would be slightly higher than if you hadn’t set the chatbot up this way. But only slightly.
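As a purely illustrative sketch of the “million prompts plus a router” setup (everything here, including `call_chat_model`, is a hypothetical stand-in rather than any real product or API):

```python
# Purely illustrative sketch of an "artificial employee" as a prompt router.
# The prompts, the keyword router, and call_chat_model are hypothetical stand-ins.

PROMPTS = {
    "billing": "You are a billing specialist. Answer the invoice question below.",
    "support": "You are a support agent. Resolve the issue described below.",
    "default": "You are a general-purpose assistant. Handle the request below.",
}

def call_chat_model(system_prompt: str, user_message: str) -> str:
    """Placeholder for a call to whatever chat-completion API you actually use."""
    return f"[model response given prompt {system_prompt!r} and message {user_message!r}]"

def route(user_message: str) -> str:
    """Pick a prompt by crude keyword matching, then delegate to the model."""
    text = user_message.lower()
    if "invoice" in text or "refund" in text:
        key = "billing"
    elif "error" in text or "broken" in text:
        key = "support"
    else:
        key = "default"
    return call_chat_model(PROMPTS[key], user_message)

print(route("My invoice from last month looks wrong."))
```

The point being that this kind of wrapper adds a bit of routing overhead and prompt bookkeeping on top of the underlying chatbot, but not much else.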
Though one could argue that since AI chatbots lack agency, they don’t count as artificial employees. But then is there anything that counts? Like at some point it just seems like a confused goal to me.
You can make highly general AIs; they will just lack agency. You then plop a human on top of the AI, and the human will provide plenty of agency for basically all legible purposes.
I think it’s not cheating in a practical sense, since applications of AI typically have a team of devs noticing when it’s tripping up and adding special handling to fix that, so it’s reflective of real-world use of AI.
But I think it’s illustrative of how artificial intelligence most likely won’t lead to artificial general agency and alignment x-risk, because the agency will be created by clearing a bunch of narrow obstacles, which will be goal-specific and thus won’t generalize to misalignment.
Were these key things made by the AI, or by the people making the run?
In retrospect: fewer and fewer people have been working on this over time. Getting back on track is probably not feasible: https://www.lesswrong.com/posts/puv8fRDCH9jx5yhbX/johnswentworth-s-shortform?commentId=jZ2KRPoxEWexBoYSc
It’s often not just that they endorse a single belief, but rather that they have a whole psychodrama and opinions on the appropriate areas of investigation. [Joseph Bronski](https://x.com/BronskiJoseph/status/1917573847810990210) exemplifies this when taken to an extreme.
And I wouldn’t say that the real answer is to have their belief not be considered racist. I’d say the real reason is something like: they want to fight back against anti-racists. [Arthur Jensen probably gave the best description of what they’re trying to fight against](https://emilkirkegaard.dk/en/2019/04/a-kind-of-social-paranoia-a-belief-that-mysterious-hostile-forces-are-operating-to-cause-inequalities-in-educational-and-occupational-performance-despite-all-apparent-efforts-to-eliminate-prejudi/).
Sometimes when discussing a controversial issue, people are kind of avoiding the most important points within it, and then it feels relevant to ask them what their interest in discussing it is. Their true interest will often be to control the political discourse away from some dynamic they perceive as pathological, but if they explained that, they would have to argue that the dynamic is pathological, which they often don’t want to do because they risk triggering the dynamic that way. As a diversion, they will sometimes say that their motivation is to seek truth.
I think you see this a lot with racism/sexism-type stuff, where racists/sexists dissociate from the fact that they are racist/sexist in order to square their sense that their perspective ought to be considered with the common norm against sexism/racism. And then they consider enforcement of said norm to be pathological, but they’re too dissociated/afraid to explain how, so they say they just care about the truth, even though their dissociation often prevents them from properly investigating said truth.
(Not sure if by “runtime” you mean “time spent running” or “memory/program state during the running time” (or something else)? I was imagining memory/program state in mine, though that is arguably a simplification since the ultimate goal is probably something to do with the business.)
Potentially challenging example: let’s say there’s a server that’s bottlenecked on some poorly optimized algorithm, and you optimize it to be less redundant, freeing resources that immediately get used for a wide range of unknown but presumably good tasks.
Superficially, this seems like an optimization that increased the description length. I believe the way this is solved in the OP is that the distributions are constructed in order to assign an extra long description length to undesirable states, even if these undesirable states are naturally simpler and more homogenous than the desirable ones.
I am quite suspicious that this risks ending up with improper probability distributions. Maybe that’s OK.
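To spell out where my worry about improper distributions comes from, here is a minimal coding-theory sketch (my framing, not necessarily how the OP constructs things): description lengths correspond to probabilities, and lengthening some codes without shortening others breaks normalization.

```latex
% Description lengths and probabilities are linked by L(x) = -\log_2 p(x).
% A complete code satisfies the Kraft inequality with equality:
\sum_x 2^{-L(x)} = 1 .
% If we start from a proper p and merely lengthen the descriptions of the
% undesirable states, i.e. L'(x) > L(x) there and L'(x) = L(x) elsewhere, then
\sum_x 2^{-L'(x)} < 1 ,
% so the implied p'(x) = 2^{-L'(x)} is a sub-normalized (improper)
% distribution unless it is explicitly renormalized.
```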
Writing the part that I didn’t get around to yesterday:
You could theoretically imagine e.g. scanning all the atoms of a human body and then using this scan to assemble a new human body in their image. It’d be a massive technical challenge of course, because atoms don’t really sit still and let you look at and position them. But with sufficient work, it seems like someone could figure it out.
This doesn’t really give you artificial general agency of the sort that standard Yudkowsky-style AI worries are about, because you can’t assign them a goal. You might get an Age of Em-adjacent situation from it, though not even quite that.
To reverse-engineer people in order to make AI, you’d instead want to identify separate faculties with interpretable effects and reconfigurable interfaces. This can be done for some of the human faculties, because they are frequently applied to their full extent and because they are scaled up so much that the body had to anatomically separate them from everything else.
However, there’s just no reason to suppose that it should apply to all the important human faculties, and if one considers all the random extreme events one ends up having to deal with when performing tasks in an unhomogenized part of the world, there’s lots of reason to think humans are primarily adapted to those.
One way to think about the practical impact of AI is that it cannot really expand on its own, but that people will try to find or create sufficiently homogenous places where AI can operate. The practical consequence of this is that there will be a direct correspondence between each part of the human work done to prepare the AI and each part of the activities the AI engages in, which will (with caveats) eliminate alignment problems, because the AI only does the sorts of things you explicitly make it able to do.
The above is similar to how we don’t worry so much about ‘website misalignment’ because generally there’s a direct correspondence between the behavior of the website and the underlying code, templates and database tables. This didn’t have to be true, in the sense that there are many short programs with behavior that’s not straightforwardly attributable to their source code and yet still in principle could be very influential, but we don’t know how to select good versions of such programs, so instead we go for the ones with a more direct correspondence, even though they are larger and possibly less useful. Similarly with AI, since consequentialism is so limited, people will manually build out some apps where AI can earn them a profit operating on homogenized stuff, and because this building-out directly corresponds to the effect of the apps, they will be alignable but not very independently agentic.
(The major caveat is people may use AI as a sort of weapon against others, and this might force others to use AI to defend themselves. This won’t lead to the traditional doom scenarios because they are too dependent on overestimating the power of consequentialism, but it may lead to other doom scenarios.)
After all, if ‘those things are helpful’ wasn’t a learnable pattern, then evolution would not have discovered and exploited that pattern!
I’ve grown undecided about whether to consider evolution a form of intelligence-powered consequentialism because in certain ways it’s much more powerful than individual intelligence (whether natural or artificial).
Individual intelligence mostly focuses on information that can be made use of over a very short time/space-scale. For instance an autoregressive model relates the immediate future to the immediate past. Meanwhile, evolution doesn’t meaningfully register anything shorter than the reproductive cycle, and is clearly capable of registering things across the entire lifespan and arguably longer than that (like, if you set your children up in an advantageous situation, then that continues paying fitness dividends even after you die).
Of course this is somewhat counterbalanced by the fact that evolution has much lower information bandwidth. Though from what I understand, people also massively underestimate evolution’s information bandwidth due to using an easy approximation (independent Bernoulli genotypes, linear short-tailed genotype-to-phenotype relationships and thus Gaussian phenotypes, quadratic fitness with independence between organisms). Whereas if you have a large number of different niches, then within each niche you can have the ordinary speed of evolution, and if you then have some sort of mixture niche, that niche can draw in organisms from each of the other niches and thus massively increase its genetic variance, and then since the speed of evolution is proportional to genetic variance, that makes this shared niche evolve way faster than normally. And if organisms then pass from the mixture niche out into the specialized niches, they can benefit from the fast evolution too.
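For reference, the “speed of evolution is proportional to genetic variance” step is essentially Fisher’s fundamental theorem, and the mixture-niche point can be phrased via the law of total variance; the sketch below assumes equal-sized niches and additive genetics for simplicity.

```latex
% Fisher's fundamental theorem: the increase in mean fitness equals the
% additive genetic variance in fitness (scaled by mean fitness):
\Delta \bar{w} \;=\; \frac{\sigma^2_A(w)}{\bar{w}} .
% For a mixture niche drawing equally from k specialized niches with trait
% means \mu_i and within-niche variances \sigma_i^2, the law of total
% variance gives
\sigma^2_{\text{mix}} \;=\; \frac{1}{k}\sum_{i=1}^{k}\sigma_i^2 \;+\; \operatorname{Var}_i(\mu_i) ,
% so diverged niche means add a between-niche term that can make the
% mixture's variance, and hence its response to selection, much larger than
% any single niche's.
```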
(Mental picture to have in mind: we might distinguish niches like hunter, fisher, forager, farmer, herbalist, spinner, potter, bard, bandit, carpenter, trader, king, warlord (distinct from king in that kings gain power through expanding their family while warlords gain power by sniping the king off a kingdom), concubine, bureaucrat, … . Each of them used to be evolving individually, but also genes flowed between them in various ways. Though I suspect this is undercounting the number of niches because there’s also often subniches.)
And then obviously beyond these points, individual intelligence and evolution focus on different things—what’s happening recently vs what’s happened deep in the past. Neither are perfect; society has changed a lot, which renders what’s happened deep in the past less relevant than it could have been, but at the same time what’s happening recently (I argue) intrinsically struggles with rare, powerful factors.
If some other evolved aspects of the brain and body are helpful, then {intelligence, understanding, consequentialism} can likewise discover that they are helpful, and build them.
If the number of such aspects is dozens or hundreds or thousands, then whatever, {intelligence, understanding, consequentialism} can still get to work systematically discovering them all. The recipe for a human is not infinitely complex.
Part of the trouble is, if you just study the organism in isolation, you only get some genetic or phenotypic properties. You don’t have any good way of knowing which of these are the important ones.
You can try developing a model of all the different relevant exogenous factors. But as I insist, a lot of them will be too rare to be practical to memorize. (Consider all the crazy things you hear people who make self-driving cars need to do to handle the long tail, and then consider that self-driving cars are much easier than many other tasks, with the main difficult part being the high energies involved in driving cars near people.)
The main theoretical hope is that one could use some clever algorithm to automatically aggregate “small-scale” understanding (like an autoregressive convolutional model predicting the next timestep from previous timesteps) into “large-scale” understanding (being able to understand how a system could behave in extreme cases by learning how it behaves normally). But I’ve studied a bunch of different approaches for that, and ultimately it doesn’t really seem feasible. (Typically the small-scale understanding learned is only going to be valid near the regime it was originally observed within, and the methods for aggregating small-scale behavior into large-scale behavior either rely on excessively nice properties or basically require you to already know what the extreme behaviors would be.)
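As a toy illustration of the “only valid near the regime it was originally observed within” problem (my own made-up example, not one of the specific approaches I studied): fit a linear autoregressive model on a trajectory that never leaves a system’s normal regime, and the fit will confidently rule out the extreme behavior the true dynamics are capable of.

```python
# Toy example: a system that is gently mean-reverting below a threshold but
# explosive above it. Data gathered in the normal regime never reveals the
# explosive regime, and a linear AR(1) fit to that data cannot represent it.
import numpy as np

rng = np.random.default_rng(0)
THRESHOLD = 3.0

def true_step(x: float) -> float:
    if x < THRESHOLD:
        return 0.95 * x + 0.1 * rng.normal()  # normal regime
    return 2.0 * x                            # explosive regime (never observed)

# Simulate a trajectory that, in practice, stays in the normal regime.
xs = [0.0]
for _ in range(500):
    xs.append(true_step(xs[-1]))
xs = np.array(xs)

# Fit AR(1): x_{t+1} ~= a * x_t + b by least squares on the observed data.
A = np.column_stack([xs[:-1], np.ones(len(xs) - 1)])
a, b = np.linalg.lstsq(A, xs[1:], rcond=None)[0]

# The fitted model is accurate locally, but rolled out from any state it
# predicts mean reversion forever and can never anticipate the explosion
# the true dynamics produce once x crosses the threshold.
print(f"fitted a={a:.3f}, b={b:.3f}, max observed x={xs.max():.2f} (threshold {THRESHOLD})")
```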
If durability and strength are helpful, then {intelligence, understanding, consequentialism} can discover that durability and strength are helpful, and then build durability and strength.
Even if “the exact ways in which durability and strength will be helpful” does not constitute a learnable pattern, “durability and strength will be helpful” is nevertheless a (higher-level) learnable pattern.
First, I want to emphasize that durability and strength are about as far towards the easy end as it gets, because e.g. durability is a common property seen in a lot of objects, and the benefits of durability can be seen relatively immediately and reasoned about locally. I brought them up to dispute the notion that we are guaranteed a sufficiently homogenous environment because otherwise intelligence couldn’t develop.
Another complication is, you gotta consider that e.g. being cheap is also frequently useful, especially in the sort of helpful/assistant-based role that current AIs typically take. This trades off against agency, because profit-maximizing companies don’t want money tied up in durability or strength that you’re not typically using. (People, meanwhile, might want durability or strength because they find it cool, sexy or excellent, and as a consequence those people would then gain more agency.)
Also, I do get the impression you are overestimating the feasibility of ““durability and strength will be helpful” is nevertheless a (higher-level) learnable pattern”. I can see some methods where maybe this would be robustly learnable, and I can see some regimes where even current methods would learn it, but considering its simplicity, it’s relatively far from falling naturally out of the methods.
One complication here is, currently AI is ~never designing mechanical things, which makes it somewhat harder to talk about.
(I should maybe write more but it’s past midnight and also I guess I wonder how you’d respond to this.)
If there’s some big obstacle, then it’s quite possible for it to decompose into a large number of similar smaller obstacles, and I’d agree this is where most obstacles come from, to the point where it seems reasonable to say that intelligence can handle almost all obstacles.
However, my assertion wasn’t that intelligence cannot handle almost all obstacles, it was that consequentialism can’t convert intelligence into powerful agency. It’s enough for there to be rare powerful obstacles in order for this to fail.
“Stupidly obstinate” is a root-cause analysis of obstinate behavior. An alternative root cause might be conflict, for instance.
At first glance, your linked document seems to match this. The herald who calls the printer “pig-headed” does so in direct connection with calling him “dull”, which at least in modern terms would be considered a way of calling him stupid? Or maybe I’m missing some of the nuances due to not knowing the older terms/not reading your entire document?
“Pigheaded” is not a description of behavior, it’s a proposed root cause analysis. The idea is that pigs are dumb so if someone has a head (brain) like a pig, they might do dumb things.
It seems like you are trying to convince me that intelligence exists, which is obviously true and many of my comments rely on it. My position is simply that consequentialism cannot convert intelligence into powerful agency, it can only use intelligence to bypass common obstacles.
I have repeatedly challenged and criticized Cremieux and he has never reacted with insults, over-the-top defensiveness or vitriol towards me.
(I have certainly heard concerning rumors about him, and I hope those responsible for the community do due diligence in investigating them. But this post feels kind of libelous, like an attempt to assassinate someone’s character to suppress discourse about race. People who think LessOnline shouldn’t invite racists could address this concern by explaining in more detail what racism is/why it’s so terrible and why racist fallacies should be so uncomfortable that one cannot go there, instead of just something that receives a quick rebuttal.)