FWIW there is a theory that there is a cycle of language change, though it seems there may not be a lot of evidence for the isolating → agglutinating step. IIRC the idea is something like this: if you have a “simple” (isolating) language that uses helper words instead of morphology, eventually those words can lose their independent meaning and get smushed together with the words they are modifying.
Also, when doing a study, please write down afterwards whether you used intention to treat or not.
Example: I encountered a study that says post meal glucose levels depend on order in which different parts of the meal were consumed. But the study doesn’t say whether every participant consumed the entire meal, and if not, how that was handled when processing the data. Without knowing if everyone consumed everything, I don’t know if the differences in blood glucose were caused by the change in order, or by some participants not consuming some of the more glucose-spiking meal components.
In that case, intention to treat (if used) makes the result of the study less interesting since it provides another effect that might “explain away” the headline effect.
Issues with the Dutch book beyond the marginal value of money:
It’s not as clear as it should be that the LLM IQ loss question is talking about a permanent loss (I may have read it as temporary when answering).
Although the LLM IQ drop question does say “your IQ” there’s an assumption that that sort of thing is a statistical average—and I think the way I use LLMs, for example, is much less likely to drop my IQ than the average person’s usage.
I think the issue is that the LessWrong subscription question is implicitly asking about the marginal value of LessWrong given the existence of other resources, while the relative LessWrong/LLM value question is implicitly leaning more towards non-marginal value obtained, which might be very many times more.
Impact: these issues increase LLM/IQ and (LessWrong/LLM relative to LessWrong/$), which cause errors in the same direction in the LLM/IQ/$/LessWrong/LLM cycle, potentially by a very large multiplier.
Marginal value due to the high IQ gain of 5 points lowers $/IQ, which increases IQ/$. This also acts in the same direction.
(That’s my excuse anyway. I suspected the cycle when answering and was fairly confident, without actually checking, that I was going to be way off from a “consistent” value. I gave my excuse as a comment in the survey itself that I was being hasty, but on reflection I still endorse an “inconsistent” result here, modulo the fact that I likely misread at least one question).
Control theory, I think, often tends to assume that you are dealing with continuous variables, which I think the relevant properties of AIs are likely (in practice) not to be—even if the underlying implementation uses continuous math, RSI (recursive self-improvement) will make finite changes, and even small changes could cause large differences in results.
Also, the dynamics here are likely to depend on capability thresholds which could cause trend extrapolation to be highly misleading.
Also, note that RSI could create a feedback loop which could enhance agency including towards nonaligned goals (agentic AI convergently wants to enhance its own agency).
Also beware that agency increases may cause increases in apparent capability because of Agency Overhang.
The AI system accepts all previous feedback, but it may or may not trust anticipated future feedback. In particular, it should be trained not to trust feedback it would get by manipulating humans (so that it doesn’t see itself as having an incentive to manipulate humans to give specific sorts of feedback).
I will call this property of feedback “legitimacy”. The AI has a notion of when feedback is legitimate, and it needs to work to keep feedback legitimate (by not manipulating the human).
Legitimacy is good—but if an AI that’s supposed to be intent-aligned to the user would find that it has an “incentive” to purposefully manipulate the user in order to get particular feedback (unless it pretends that it would ignore that feedback), then it’s already misaligned, and that misalignment should be dealt with directly IMO. This feels to me like a band-aid over a much more serious problem.
> Luke ignited the lightsaber Obi-Wan stole from Vader.
This temporarily confused me until I realized it was not talking about the lightsaber Vader was using here, but about the one that Obi-Wan took from him in Revenge of the Sith and gave to Luke near the start of A New Hope.
> We may thus rule out negative effects larger than 0.14 standard deviations in cognitive ability if fluoride is increased by 1 milligram/liter (the level often considered when artificially fluoridating the water).

That’s a high level of hypothetical harm that they are ruling out (~2 IQ points, taking one standard deviation as ~15 points). I would take the dental harms many times over to avoid that much cognitive ability loss.
actually, there are ~100 rows in the dataset where Room2=4, Room6=8, and Room3=5=7.
I actually did look at that (at least some subset with that property) at some point, though I didn’t (think of/ get around to) re-looking at it with my later understanding.
In general, I think this is a realistic thing to occur: ‘other intelligent people optimizing around this data’ is one of the things that causes the most complicated things to happen in real-world data as well.
Indeed, I am not complaining! It was a good, fair difficulty to deal with.
That being said, there was one aspect I did feel was probably more complicated than ideal, and that was the combination of the tier-dependent alerting with the tiers not having any other relevance than this one aspect. That is, if the alerting had in each case been simply dependent on whether the adventurers were coming from an empty room or not, it would have been a lot simpler to work out. And if there were tier-dependent alerting, but the tiers were more obvious in other ways*, it would still have been tricky, but at least there would be a path to recognize the tiers and then try to figure out other ways that they might have relevance. The way it was, it seemed to me you pretty much had to look at what were (ex ante) almost arbitrary combinations of (current encounter, next encounter) to figure that aspect out, unless you actually guessed the rationale of the alerting effect.
That might be me rationalizing my failure to figure it out though!
* e.g. perhaps the traps/golems could have had the same score as the same-tier nontrap encounter when alerted (or alternatively when not alerted)
The biggest problem with AIXI in my view is the reward system - it cares about the future directly, whereas to have any reasonable hope of alignment an AI needs to care about the future only via what humans would want about the future (so that any reference to the future is encapsulated in the “what do humans want?” aspect).
I.e. the question it needs to be answering is something like “all things considered (including the consequences of my current action on the future, as well as taking into account my possible future actions) what would humans, as they exist now, want me to do at the present moment?”
Now maybe you can take that question and try to slice it up into rewards at particular timesteps, which change over time as what is known about what humans want changes, without introducing corrigibility issues, but the AIXI reward framework isn’t really buying you anything imo even if that works, relative to directly trying to get an AI to solve the question.
On the other hand, approximating Solomonoff induction might, AFAIK, be a fruitful approach, though the approximations are going to have to be very aggressive for practical performance. I do agree embedding/self-reference can probably be patched in.
I think that it’s likely to take longer than 10000 years, simply because of the logistics (not the technology development, which the AI could do fast).
The gravitational binding energy of the sun is something on the order of 20 million years worth of its energy output. OK, half of the needed energy is already present as thermal energy, and you don’t need to move every atom to infinity, but you still need a substantial fraction of that. And while you could perhaps generate many times more energy than the solar output by various means, I’d guess you’d have to deal with inefficiencies and lots of waste heat if you try to do it really fast. Maybe if you’re smart enough you can make going fast work well enough to be worth it though?
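As a rough sanity check of that figure, here is a back-of-the-envelope sketch using the uniform-density approximation for the binding energy (which somewhat understates the true value, since the Sun is centrally condensed):

```python
# Back-of-the-envelope check of the "~20 million years" figure, using the
# uniform-density approximation E = (3/5) G M^2 / R.
G = 6.674e-11      # gravitational constant, m^3 kg^-1 s^-2
M_sun = 1.989e30   # solar mass, kg
R_sun = 6.957e8    # solar radius, m
L_sun = 3.828e26   # solar luminosity, W

E_bind = (3 / 5) * G * M_sun**2 / R_sun   # ~2.3e41 J
years = E_bind / L_sun / 3.156e7          # 3.156e7 seconds per year
print(f"E_bind ~ {E_bind:.2e} J, ~ {years / 1e6:.0f} million years of solar output")
# prints roughly 19 million years with these numbers
```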
I feel like a big part of what tripped me up here was an inevitable part of the difficulty of the scenario that in retrospect should have been obvious. Specifically, if there is any variation in the difficulty of an encounter that is known to the adventurers in advance, the score contribution of an encounter type on the actual paths taken is less than the difficulty of the encounter as estimated by what best predicts the path taken (because the adventurers take the path when the encounter is weak, but avoid it when it’s strong).
So, I wound up with an epicycle saying hags and orcs were avoided more than their actual scores warranted, because that effect was most significant for them (goblins are chosen over most other encounters even if alerted, and Dragons mostly aren’t alerted).
This effect was made much worse by the fact that I was getting scores mainly from lower difficulty dungeons, with lots of “Nothing” rooms and low level encounters. But even once I estimated scores from the overall data with my best guesses for preference order, the issue still applied, just not quite so badly.
In the “what if” department, I had said:
> I’m also getting remarkably higher numbers for Hag compared with my earlier method. But I don’t immediately see a way to profitably exploit this.

The most obvious way to exploit this would have been the optimal solution. Why didn’t I do it? The answer is that, as indicated above, I was still underestimating the hag (whereas at this point I had mostly-accurate scores for the traps and orcs). With my underestimate for the hag’s score contribution, I didn’t think it was worth giving up an orc-boulder trap difference to get a hag-orc difference. I also didn’t realize I needed the hag to alert the dragon.
In general, I feel like I was pretty far along with discovering the mechanics despite some missteps. I correctly had the adventurers taking a 5-encounter path with right/down steps, the choice of next step being based on the encounters in the choices for the next room, with an alerting mechanism, and that the alerting mechanism didn’t apply to traps and golems.
On the other hand, I applied the alerting mechanism only to score and not to preference order, except for goblins and orcs (why didn’t I try to apply it to the preference order for other encounters once I realized it applied to the preference order for goblins and orcs, and that some degree of alerting-mechanism score effect applied to other encounters?????). (I also got confused into thinking that the effect on orc preference order only applied if the current encounter was also orcs.) I also didn’t realize that the alerting mechanism had different sensitivity for different encounters, and I had my mistaken belief about the preference order being different from expected score for some encounter types (hey, the text played up how unnerving the hag was, there was some plausibility there!).
I think if I had gotten to where I was in my last edit early on in the time frame for this scenario instead of near the end, and had posted it, and other people had read it and tried it out, collectively we would have had a good chance of solving the whole thing. I also would have been much more likely to get the optimal solution if I had paid more attention to what abstractapplic said, instead of only very briefly glancing over his comments after posting my very belated comment and going back to doing my own thing.
In my view, this was a fun, challenging, and theoretically solvable scenario (even if not actually that close to being solved in practice), so I think it was quite good.
It’s looking like I will not have figured this out before the time limit despite the extra time, so here’s what I have so far:
I’m modeling this as follows, but I haven’t fully worked it out, and I’m getting complications/hard-to-explain dungeons that suggest it might not be exactly correct:
- the adventurers go through the dungeons using rightwards and downwards moves only, thus going through 5 rooms in total.
- at each room they choose the next room based on a preference order (which I am assuming is deterministic, but possibly dependent on, e.g., what the current room is)
- the score is dependent only on the rooms they pass through (but again, I am getting complications)
- I’m assuming a simple addition of scores to start with, but then adding epicycles (which so far have been based on the previous room, generally)
- there is some randomness in the individual score contributions from each encounter.
For the dungeon generation: it seems to treat rooms 1-8 equally (room 9 is different and tends to have harder encounters). Encounters of the same types (and some related “themes”) tend to be correlated. Scores in each tournament seem to be whole numbers from each judge, averaged over 3 or 4 judges; I am not sure if any tournaments are judged by 2 or 1, but if so they’re relatively less common.
In theory, I’d like to plug a preference model and a score model into a simulator and iterate to refine, but I’m not there yet; I’m still working out plausible scores and preferences.
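As a concrete illustration of what I mean by a simulator, here is a minimal sketch; the 3x3 grid with the entrance at the top-left, the deterministic "prefer the easier encounter" rule, and the scores (taken from my baseline estimates below) are all working assumptions, not established mechanics:

```python
# Minimal sketch of the simulator idea: 3x3 grid, entrance at the top-left,
# rightwards/downwards moves only, pick the most-preferred available next room.
# Scores and preference order are placeholders from my current guesses.
SCORE = {"Nothing": 0, "Goblins": 1.5, "Whirling Blade Trap": 3, "Orcs": 3,
         "Hag": 4, "Boulder Trap": 4.5, "Clay Golem": 6, "Dragon": 6,
         "Steel Golem": 7.5}
PREFERENCE = sorted(SCORE, key=SCORE.get)  # placeholder: easier = more preferred

def simulate(grid):
    """grid[r][c] is the encounter name in row r, column c (0-indexed)."""
    r = c = 0
    path = [grid[0][0]]
    while (r, c) != (2, 2):
        options = []
        if c < 2:
            options.append((r, c + 1))   # move right
        if r < 2:
            options.append((r + 1, c))   # move down
        # Deterministic choice: the most-preferred of the available next rooms.
        r, c = min(options, key=lambda rc: PREFERENCE.index(grid[rc[0]][rc[1]]))
        path.append(grid[r][c])
    return path, sum(SCORE[e] for e in path)
```

The plan would then be to compare simulated paths and scores against the dataset and adjust the score/preference models until the residuals look sane.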
One possibility for the scores and preference order:
baseline average scores:
Nothing: 0; Goblins: 1.5 (1d2?); Whirling Blade Trap: 3; Orcs: 3; Hag: 4; Boulder Trap: 4.5; Clay Golem: 6; Dragon: 6?; Steel Golem: 7.5 (edit: <--- numbers estimated from small, atypical samples (they included many Nothing rooms, which is problematic for reasons that become obvious with the edit below))
With Goblins and Orcs being increased (doubled?) if following Goblins/Orcs/any trap? (edit—or golems?) (edit—looking now like it’s probably anything but an empty room?)
Plus with the adventurers seemingly avoiding Orcs and Hags more than their difficulty warrants? (I found them to be relatively late in the preference order, then found that they were in practice lower in score, so am having to ad hoc adjust if I keep the assumption that the score contribution and preference order are related. 1.5x multiplier? 2x multiplier? fixed addition?) (I’m assuming a 1.5x multiplier atm since I initially had Hag avoided over anything but Orcs, but found one dungeon that looks suspiciously like, but does not prove, Hag being chosen over Dragon (edit: see below for update)) (I suppose +2 would also work) (edit—it looks like the Orc difficulty increase for following a non-empty room only applies to adventurer preference if the current room is also Orcs—violating the assumption that preference is tied to expected difficulty. But for Goblins it seems the preference may indeed depend only on following a non-empty room, though in practice it doesn’t matter much since it only affects the order wrt WBT).
(edit—see update to preference order below)
Assuming the above is correct (and I’m pretty sure it isn’t, but hopefully it has some relationship with reality), one strategy might be:
CHN/WON/BOD <--- obsolete answer
where the idea is to use the encounters the adventurers avoid too much relative to their actual score contributions (Hag, Orcs) to herd the adventurers away from the Nothing rooms. One of the Orcs is left in after a Boulder Trap in the belief that this will make it score higher than the hag. WBT is left in the preferred path to lead the adventurers along; I don’t immediately see a way to avoid this.
EV if above model is correct: 6+3+4.5+6+6=25.5
How I’ve gotten here (mainly used Claude and Claude-written code, including Claude’s analysis tool, which is good for prototyping if you don’t mind JavaScript):
- found initial basic encounter score contribution estimates from linear regression on the whole dungeon (see the sketch after this list)
- after determining that rooms 1-8 were interchangeable as far as dungeon generation is concerned, looked at room importance to score, guessed the basic model based on that iirc (might have been more complicated than this) (I do remember considering and rejecting a model where each room is selected one at a time from the full set of available rooms, and rejecting any “symmetrical” model based on working out the full path in advance)
- initially assumed that the adventurers preferred easier encounters, based on the initial score estimates
- refined the preference order based on minimizing variance between dungeons with the same predicted sequence of encounters
- tried to work out how scores actually work by filtering for specific predicted sequences of encounters and finding their scores
- found epicycles from that and started refining the model, including preference order adjustments
- haven’t really finished the above step; the epicycles might be because the model is wrong/incomplete?
- hypothetical todo: apply the model to the entire dataset, also develop a model for variations in score from each encounter, compare to known 3-judge and 4-judge tournaments for a full Bayes assessment, refine further with this as feedback
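For reference, the regression in the first step was along these lines. This is only a sketch: the filename, the "Room1".."Room9"/"Score" column names, and the encounter labels are assumptions about the dataset layout rather than its actual format.

```python
# Sketch of the initial regression: regress tournament score on the number of
# times each encounter type appears anywhere in the dungeon ("Nothing" is the
# omitted baseline).
import numpy as np
import pandas as pd

ENCOUNTERS = ["Goblins", "Orcs", "Hag", "Whirling Blade Trap", "Boulder Trap",
              "Clay Golem", "Steel Golem", "Dragon"]

df = pd.read_csv("dungeons.csv")                  # hypothetical filename
room_cols = [f"Room{i}" for i in range(1, 10)]    # assumed column names
counts = np.column_stack([(df[room_cols] == e).sum(axis=1) for e in ENCOUNTERS])
X = np.column_stack([counts, np.ones(len(df))])   # add an intercept term
coef, *_ = np.linalg.lstsq(X, df["Score"].to_numpy(), rcond=None)
print(dict(zip(ENCOUNTERS + ["intercept"], np.round(coef, 2))))
```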
edit: I’ve now read other people’s comments; I did not notice any 1-point jump in scores (I didn’t check for it), and I’m not sure if I would have noticed whether it was a judging difference as opposed to a strategy change (I wouldn’t have noticed if it were just a strategy change). Also, I did not notice anything special about Steel Golems at the entrance vs. other spots, did not check for any change in the distribution of 3- vs 4-judge tournaments, etc.
further analysis after the above:
I’ve looked at the root mean square deviation of predictions from the data for the full dataset (full Bayes seems a bit intimidating to code atm even with AI help). From this it seems the preference order is (there remains a likely possibility of more complications I haven’t checked):

Nothing > Goblins (current encounter null or Nothing) > Goblins (otherwise) = Whirling Blade Trap > Boulder Trap = Clay Golem = Orcs (current encounter not Orcs) > Dragon > Steel Golem >= Orcs (current encounter Orcs) > Hag

Nothing > Goblins (current encounter null or Nothing) > Goblins (otherwise) = Whirling Blade Trap > Boulder Trap > Clay Golem = Orcs (current encounter not Orcs) > Dragon > Orcs (current encounter Orcs) > Hag = Steel Golem

where I can’t distinguish between Steel Golem being preferred or equal to Orcs with the current encounter being Orcs.

Soo, if Orcs are avoided equally to a Boulder Trap when the current encounter is not Orcs, I need to improve the herding. But it also seems Orcs get doubled by many other encounter types? This could work:

CHN/OBN/WOD <---- current solution

Predicted value is now 6+6+3+6+6 = 27.
further edit: also refining the scores, and getting probably-nonsense results (due to missing some dependency of some stuff on something else, probably), but it’s looking like maybe every encounter’s score depends on whether the previous encounter was Nothing/null. Except traps/golems? Which would explain why Steel Golems are being reported as better in the first slot.
I’m also getting remarkably higher numbers for Hag compared with my earlier method. But I don’t immediately see a way to profitably exploit this.
I feel like this discussion could do with some disambiguation of what “VNM rationality” means.
VNM assumes consequentialism. If you define consequentialism narrowly, this has specific results in terms of instrumental convergence.
You can redefine what constitutes a consequence arbitrarily. But, along the lines of what Steven Byrnes points out in his comment, redefining this can get rid of instrumental convergence. In the extreme case you can define a utility function for literally any pattern of behaviour.

When you say you feel like you can’t be Dutch booked, you are at least implicitly assuming some definition of consequences you can’t be Dutch booked in terms of. To claim that one is rationally required to adopt any particular definition of consequences in your utility function is basically circular, since you only care about being Dutch booked according to it if you actually care about that definition of consequences. It’s in this sense that the VNM theorem is trivial.
BTW I am concerned that self-modifying AIs may self-modify towards VNM-0 agents.
But the reason is not because such self modification is “rational”.
It’s just that (narrowly defined) consequentialist agents care about preserving and improving their abilities and proclivities to pursue their consequentialist goals, so tendencies towards VNM-0 will be reinforced in a feedback loop. Likewise for inter-agent competition.
You can also disambiguate between
a) computation that actually interacts in a comprehensible way with the real world and
b) computation that has the same internal structure at least momentarily but doesn’t interact meaningfully with the real world.
I expect that (a) can usually be uniquely pinned down to a specific computation (probably in both senses (1) and (2)), while (b) can’t.
But I also think it’s possible that the interactions, while important for establishing the disambiguated computation that we interact with, are not actually crucial to internal experience, so that the multiple possible computations of type (b) may also be associated with internal experiences—similar to Boltzmann brains.
(I think I got this idea from “Good and Real” by Gary L. Drescher. See sections “2.3 The Problematic Arbitrariness of Representation” and “7.2.3 Consciousness and Subjunctive Reciprocity”)
The interpreter, if it existed, would have complexity. The useless unconnected calculation in the waterfall/rock, which could be but isn’t usually interpreted, also has complexity.
Your/Aaronson’s claim is that only the fully connected, sensibly interacting calculation matters. I agree that this calculation is important—it’s the only type we should probably consider from a moral standpoint, for example. And the complexity of that calculation certainly seems to be located in the interpreter, not in the rock/waterfall.
But in order to claim that only the externally connected calculation has conscious experience, we would need to have it be the case that these connections are essential to the internal conscious experience even in the “normal” case—and that to me is a strange claim! I find it more natural to assume that there are many internal experiences, but only some interact with the world in a sensible way.
But this just depends on how broad this set is. If it contains two brains, one thinking about the Roman Empire and one eating a sandwich, we’re stuck.
I suspect that if you do actually follow Aaronson (as linked by Davidmanheim) to extract a unique efficient calculation that interacts with the external world in a sensible way, that unique efficient externally-interacting calculation will end up corresponding to a consistent set of experiences, even if it could still correspond to simulations of different real-world phenomena.
But I also don’t think that consistent set of experiences necessarily has to be a single experience! It could be multiple experiences unaware of each other, for example.
The argument presented by Aaronson is that, since it would take as much computation to convert the rock/waterfall computation into a usable computation as it would be to just do the usable computation directly, the rock/waterfall isn’t really doing the computation.
I find this argument unconvincing, as we are talking about a possible internal property here, and not about the external relation with the rest of the world (which we already agree is useless).
(edit: whoops missed an ‘un’ in “unconvincing”)
> Considering all the layers of convention and interpretation between the physics of a processor and the process it represents, it seems unlikely to me that the alien would be able to describe the simulacra. The alien is therefore unable to specify the experience being created by the cluster.
I don’t think this follows. Perhaps the same calculation could simulate different real world phenomena, but it doesn’t follow that the subjective experiences are different in each case.
> If computation is this arbitrary, we have the flexibility to interpret any physical system, be it a wall, a rock, or a bag of popcorn, as implementing any program. And any program means any experience. All objects are experiencing everything everywhere all at once.
Afaik this might be true. We have no way of finding out whether the rock does or does not have conscious experience. The relevant experiences to us are those that are connected to the ability to communicate or interact with the environment, such as the experiences associated with the global workspace in human brains (which seems to control memory/communication); experiences that may be associated with other neural impulses, or with fluid dynamics in the blood vessels or whatever, don’t affect anything.
> Could both of them be right? No—from your point of view, at least one of them must be wrong. There is one correct answer, the experience you are having.
This also does not follow. Both experiences could happen in the same brain. You—being experience A—may not be aware of experience B—but that does not mean that experience B does not exist.
(edited to merge in other comments which I then deleted)
It is a fact about the balls that one ball is physically continuous with the ball previously labeled as mine, while the other is not. It is a fact about our views on the balls that we therefore label that ball, which is physically continuous, as mine and the other not.
And then suppose that one of these two balls is randomly selected and placed in a bag, with another identical ball. Now, to the best of your knowledge there is 50% probability that your ball is in the bag. And if a random ball is selected from the bag, there is 25% chance that it’s yours.
So as a result of such manipulations there are three identical balls and one has 50% chance to be yours, while the other two have 25% chance to be yours. Is it a paradox? Of course not. So why does it suddenly become a paradox when we are talking about copies of humans?
It is objectively the case here that 25% of the time this procedure would select the ball that is physically continuous with the ball originally labeled as “mine”, and that we therefore label as “mine”.
Ownership as discussed above has a relevant correlate in reality—physical continuity in this case. But a statement like “I will experience being copy B (as opposed to copy A or C)” does not. That statement corresponds to the exact same reality as the corresponding statements about experiencing being copy A or C. Unlike in the balls case, here the only difference between those statements is where we put the label of what is “me”.
In the identity thought experiment, it is still objectively the case that copies B and C are formed by splitting an intermediate copy, which was formed along with copy A by splitting the original.
You can choose to disvalue copies B and C based on that fact or not. This choice is a matter of values, and is inherently arbitrary.
By choosing not to disvalue copies B and C, I am not making an additional assumption—at least not one that you are already making by valuing B and C the same as each other. I am simply not counting the technical details of the splitting order as relevant to my values.
Thanks aphyer. Solution:
Number of unique optimal assignments (up to reordering) (according to AI-written optimizer implementing my manually found tax calculation): 1
Minimum total tax: 212 (err, that’s 21 gp 2 sp)
Solution 1:
Member 1: C=1, D=1, L=0, U=0, Z=4, Tax=0
Member 2: C=1, D=1, L=0, U=1, Z=1, Tax=0
Member 3: C=1, D=1, L=0, U=1, Z=1, Tax=0
Member 4: C=1, D=1, L=5, U=5, Z=2, Tax=212
Tax calculation:
1. Add up the base values: 6 for C, 14 for D, 10 for L, 7 for U, 2 for Z
2. If only L and Z, just take the total base and exit.
3. Otherwise, set a tier as follows: tier 0 if 0 < base < 30, tier 1 if 30 <= base < 60, tier 2 if 60 <= base < 100, tier 3 if base >= 100.
4. If U >= 5, then use the max tier regardless (but a 2x discount is also triggered later).
5. If D >= 2, then increase the base as if the dragons beyond the first are doubled. This doesn’t change the tier.
6. Multiply the base value by (tier + 2).
7. If U >= 5, divide by 2.
8. Discount by 60*ceiling(C/2) (the result can’t go below 0).
Rounding is needed if U is an odd number >= 5. To determine whether you round up or down, add up the numbers for C, L, and Z, add (U-1)/2, plus 1 if there is at least one D. Then round down if that number is odd and up if it is even. (Presumably this arises in some more natural way in the actual calculation used, but this calculation gives 0 residuals, so...) Todo: find a more natural way for this to happen.
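Putting the steps together, here is a sketch of my reconstruction of the calculation in Python (step numbers in the comments match the list above; the rounding rule is the empirical one described, not necessarily its "natural" form):

```python
import math

def tax(C, D, L, U, Z):
    # Step 1: base values.
    base = 6 * C + 14 * D + 10 * L + 7 * U + 2 * Z
    # Step 2: if only L and Z, just the base.
    if C == 0 and D == 0 and U == 0:
        return base
    # Step 3: tier from the (un-doubled) base.
    if base < 30:
        tier = 0
    elif base < 60:
        tier = 1
    elif base < 100:
        tier = 2
    else:
        tier = 3
    # Step 4: U >= 5 forces the max tier (its 2x discount comes in step 7).
    if U >= 5:
        tier = 3
    # Step 5: dragons beyond the first count double in the base (tier unchanged).
    if D >= 2:
        base += 14 * (D - 1)
    # Step 6: multiply by (tier + 2).
    total = base * (tier + 2)
    # Step 7: the U >= 5 halving.
    if U >= 5:
        total /= 2
    # Step 8: credit of 60 per ceiling(C/2), not going below zero.
    total = max(0, total - 60 * math.ceil(C / 2))
    # Empirical rounding rule (a .5 only arises when U is odd and >= 5).
    if total != int(total):
        parity = C + L + Z + (U - 1) // 2 + (1 if D >= 1 else 0)
        total = math.floor(total) if parity % 2 == 1 else math.ceil(total)
    return int(total)
```

Running this on the four members listed above gives taxes of 0, 0, 0, and 212, matching the solution.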
Post-hoc rationalization of the optimal solution found: giving everyone a C helps get everyone a tax credit of up to 60. Spreading the D’s out also prevents D doubling. The C and D add up to a base value of 20. We can fit up to 9 more base value without going to the next tax bracket; this is done using 4Z (8 base value) or 1U 1Z (9 base value). The last member has to pay tax at the highest bracket, but U=5 also gives the factor-of-2 discount, so it’s not so bad. They get everything they have to take in order to not push the others above 29 base value, but no more.
This seemed relatively straightforward conceptually to solve (more a matter of tweaking implementation details like D doubling and so on - I expect many things are conceptualized differently in the actual calculation though). I went from the easiest parts (L and Z), then added U, then looked at D since it initially looked less intimidating than C, then switched to C, solved that, and finally did D. It would have been much harder if it were not deterministic, or if there wasn’t data within simpler subsets (like L and Z only) for a starting point.
The ultimate solution found is 12 sp better than the best solutions available using only rows that actually occur in the data (also found using an AI-written script).
Tools used: AI for writing scripts to manipulate CSV files, find optimal solutions, etc.; LibreOffice Calc for actually looking at the data and figuring out the tax calculation.
Additional todo: look at the distributions of part numbers to try to find out how the dataset was generated.
edited to add: my optimizer used brute(ish) force, unlike DrJones’. It uses double meet-in-the-middle (split two ways, then split again) with memoization and symmetry reduction using lexicographical ordering (the symmetry reduction and memoization were optimizations added by AI after I complained to it about the time an initial version, also written by AI, was taking).
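For comparison, a much simpler (though slower) approach than meet-in-the-middle would be a memoized search over how much of the remaining loot each member takes in turn. This sketch assumes the tax() function from the sketch above, and reads the party’s item totals off the solution listing:

```python
from functools import lru_cache
from itertools import product

# Totals read off the solution above: C=4, D=4, L=5, U=7, Z=8 over 4 members.
PARTY_TOTALS = (4, 4, 5, 7, 8)   # (C, D, L, U, Z)

@lru_cache(maxsize=None)
def min_total_tax(remaining, members_left):
    if members_left == 1:
        return tax(*remaining)    # the last member takes whatever is left
    best = float("inf")
    # Give this member any sub-bundle of the remaining items; recurse on the rest.
    for bundle in product(*(range(r + 1) for r in remaining)):
        rest = tuple(r - b for r, b in zip(remaining, bundle))
        best = min(best, tax(*bundle) + min_total_tax(rest, members_left - 1))
    return best

# min_total_tax(PARTY_TOTALS, 4) should reproduce the 212 minimum above.
```

This only finds the minimum total (not the assignment itself), and it takes a while in pure Python, hence the fancier approach.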
P.S. I used GPT-4.1 in Windsurf for the AI aspects. They’re running a promotion where it costs 0 credits until, IIRC, April 21.