What are real numbers then? On the standard account, real numbers are equivalence classes of Cauchy sequences of rationals, the finite diagonals being one such sequence. I mean, “Real numbers don’t exist” is one way to avoid the diagonal argument, but I don’t think that’s what cubefox is going for.
>The society’s stance towards crime - preventing it via the threat of punishment - is not what would work on smarter people
This is one of two claims here that I’m not convinced by. Informal disproof: If you are a smart individual in today’s society, you shouldn’t ignore threats of punishment, because it is in the state’s interest to follow through anyway, pour encourager les autres (to encourage the others). If crime prevention is in people’s interest, intelligence monotonicity implies that a smart population should be able to make punishment work at least this well. Now I don’t trust intelligence monotonicity, but I don’t trust its negation either.
The second one is:
>You can already foresee the part where you’re going to be asked to play this game for longer, until fewer offers get rejected, as people learn to converge on a shared idea of what is fair.
Should you update your idea of fairness if you get rejected often? It’s not clear to me that that doesn’t make you exploitable again. And I think this is very important to your claim about not burning utility: in the case of the ultimatum game, Eliezer’s strategy burns very little over a reasonable-seeming range of fairness ideals, but in the complex, high-dimensional action spaces of the real world, it could easily be almost as bad as never giving in, if there’s no updating.
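For concreteness, here is a quick sketch of what I mean by “burns very little” in the ultimatum game. This is my reading of the strategy described in the post (accept a lowball offer with probability just high enough that the proposer can’t expect more than what you consider their fair share); the numbers are hypothetical:

```python
# Sketch of the ultimatum-game strategy as I read it from the post (hypothetical numbers):
# if offered less than your fair share, accept with probability chosen so that the
# proposer's expected take is no more than what you consider fair for them.

PIE = 10.0

def accept_probability(offer, my_fair_share):
    if offer >= my_fair_share:
        return 1.0
    proposer_keep = PIE - offer
    proposer_fair = PIE - my_fair_share
    return proposer_fair / proposer_keep  # "slightly less than" this, in the original

# Expected utility burned when the proposer offers what *they* think is fair (50/50)
# but the responder's fairness ideal is somewhat higher:
for responder_ideal in [5.0, 5.5, 6.0, 7.0]:
    p = accept_probability(offer=5.0, my_fair_share=responder_ideal)
    print(responder_ideal, round((1 - p) * PIE, 2))  # pie destroyed in expectation
```

For small disagreements about fairness the expected burn is small (1 out of 10 for a 5.5-vs-5 disagreement), but it grows quickly as ideals diverge, which is my worry about the high-dimensional real-world case.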
Maybe I’m missing something, but it seems to me that all of this is straightforwardly justified through simple selfish Pareto improvements.
Take a look at Critch’s cake-splitting example in section 3.5. Now imagine varying the utility of splitting. How high does it need to get before [red->Alice;green->Bob] is no longer a Pareto improvement over [(split)] from both players’ selfish perspectives before the observation? It’s 27, and that’s also exactly where the decision flips when weighing Alice 0.9 and Bob 0.1 in red, and Alice 0.1 and Bob 0.9 in green.
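To spell out the arithmetic (the concrete numbers below are my reconstruction, chosen to be consistent with the 27 above, not a quote of Critch’s section 3.5): say the whole cake is worth 30 to whoever gets it, Alice is 90% confident in red, Bob is 90% confident in green, and splitting is worth S to each. Then the selfish ex-ante check and the 0.9/0.1-weighted post-observation check flip at the same S:

```python
# Reconstruction with assumed numbers (whole cake worth 30, each player 90% confident
# in "their" color); the only point is that the two criteria flip at the same threshold.

WHOLE = 30.0        # value of the whole cake to whoever receives it
P_ALICE_RED = 0.9   # Alice's credence in red
P_BOB_GREEN = 0.9   # Bob's credence in green

def selfish_values_of_bet():
    # [red -> Alice; green -> Bob], valued by each player's own credences, ex ante
    return P_ALICE_RED * WHOLE, P_BOB_GREEN * WHOLE   # (27, 27)

def weighted_choice_given_red(split_value):
    # After observing red, weigh Alice 0.9 and Bob 0.1 (green is symmetric)
    give_to_alice = 0.9 * WHOLE + 0.1 * 0.0
    split = (0.9 + 0.1) * split_value
    return "alice" if give_to_alice >= split else "split"

for S in [26.0, 27.0, 28.0]:
    alice, bob = selfish_values_of_bet()
    still_pareto = alice >= S and bob >= S   # is the bet still a selfish Pareto improvement?
    print(S, still_pareto, weighted_choice_given_red(S))
```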
Intuitively, I would say that the reason you don’t bet influence all-or-nothing, or with some other strategy, is precisely because influence is not money. Influence can already be all-or-nothing all by itself, if one player never cares that much more than the other. The influence the “losing” bettor retains in the world where he lost is not some kind of direct benefit to him, the way money would be: it functions instead as a reminder of how bad a treatment he was willing to risk in the unlikely world, and that is of course proportional to how unlikely he thought it was.
So I think all this complicated strategizing you envision in influence betting actually just comes out exactly to Critch’s results. It’s true that there are many situations where this leads to influence bets that don’t matter to the outcome, but they also don’t hurt. The theorem only says that actions must be describable as following a certain policy; it doesn’t exclude that they can be described by other policies as well.
>The timescale for improvement is dreadfully long and the day-to-day changes are imperceptible.
This sounded wrong, but I guess it is technically true? I had great in-session improvements as I warmed up the area and got into it, and the difference between a session where I missed the previous day and one where I didn’t is absolutely perceptible. Now, after that initial boost, it’s true that I couldn’t tell if the “high point” was improving day to day, but that was never a concern—the above was enough to give me confidence. Plus, with your external rotations, was there not perceptible strength improvement week to week?
So I’ve reread your section on this, and I think I follow it, but it’s arguing a different claim. In the post, you argue that a trader that correctly identifies a fixed point, but doesn’t have enough weight to get it played, might not profit from this knowledge. That I agree with.
But now you’re saying that even if you do play the new fixed point, that trader still won’t gain?
I’m not really calling this a proof because it’s so basic that something else must have gone wrong, but:
Suppose f has a fixed point at p, and g doesn’t. Then f(p) = p ≠ g(p). So if you decide to play p, then g predicts g(p) ≠ p, which is wrong, and gets punished. By continuity, this is also true in some neighborhood around p. So if you’ve explored your way close enough, you win.
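A minimal numerical illustration of the argument (f and g below are hypothetical stand-ins, not anything from the post): f has a fixed point at 0.7, g has no fixed point nearby, and under a Brier-style loss g keeps losing for any played value close to 0.7.

```python
# Hypothetical illustration: f has a fixed point at p = 0.7, g does not (its fixed
# point is at 1.0). Playing values near 0.7 punishes g but barely punishes f.

def f(x):
    return 0.7 + 0.5 * (x - 0.7)   # fixed point at 0.7

def g(x):
    return 0.5 * x + 0.5           # fixed point at 1.0, so g(x) != x around 0.7

for played in [0.65, 0.70, 0.75]:
    loss_f = (f(played) - played) ** 2   # squared error of predicting f(played)
    loss_g = (g(played) - played) ** 2
    print(played, round(loss_f, 5), round(loss_g, 5))
```

The loss for g stays bounded away from zero throughout the neighborhood, which is the continuity point above.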
On reflection, I didn’t quite understand this exploration business, but I think I can save a lot of it.
>You can do exploration, but the problem is that (unless you explore into non-fixed-point regions, violating epistemic constraints) your exploration can never confirm the existence of a fixed point which you didn’t previously believe in.
I think the key here is in the word “confirm”. It’s true that unless you believe p is a fixed point, you can’t just try out p and see the result. However, you can change your beliefs about p based on your results from exploring things other than p. (This is why I call the thing I’m objecting to Humean trolling.) And there is good reason to think that the available fixed points are usually pretty dense in the space. For example, outside of the rule that binarizes our actions, there should usually be at least one fixed point for every possible action. Plus, as you explore, your beliefs change, creating new believed-fixed-points for you to explore.
>I think your idea for how to find repulsive fixed-points could work if there’s a trader who can guess the location of the repulsive point exactly rather than approximately
I don’t think that’s needed. If my net beliefs have a closed surface in probability space on which they push outward, then necessarily those beliefs have a repulsive fixed point somewhere inside that surface. I can then explore that believed fixed point. Then if it’s not a true fixed point, and I still believe in the closed surface, there’s a new believed fixed point inside that surface that I can again explore (generally more in the direction I just got pushed away from). This should converge on a true fixed point. The only thing that can go wrong is that I stop believing in the closed surface, and it seems like I should leave open that possibility—and even then, I might believe in it again after I do some checking along the outside.
>However, the wealth of that trader will act like a martingale; there’s no reliable profit to be made (even on average) by enforcing this fixed point.
This I don’t understand at all. If you’re in a certain fixed point, shouldn’t the traders that believe in it profit from the ones that don’t?
I don’t think the learnability issues are really a problem. I mean, if doing a handstand with a burning 100 riyal bill between your toes under the full moon is an exception to all physical laws and actually creates utopia immediately, I’ll never find out either. Assuming you agree that that’s not a problem, why is the scenario you illustrate? In both cases, it’s not like you can’t find out, you just don’t, because you stick to what you believe is the optimal action.
I don’t think this would be a significant problem in practice, any more than other kinds of Humean trolling are. It always seems much more scary in these extremely barebones toy problems, where the connection between the causes and effects we create really is kind of arbitrary. I especially don’t think it will be possible to learn the counterfactuals of FDT-ish cooperation and such in these small settings, no matter the method.
Plus, you can still do value-of-information exploration. The repulsive fixed points are not that hard to find if you’re looking for them. If you’ve encircled one and found repulsion all around the edge, you know there must be one in there, and you can get there with a procedure that just reverses your usual steps. Combining this with simplicity priors over a larger setting into which the problem is integrated, I don’t think it’s any more worrying than the handstand thing.
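As a toy version of the reverse-your-steps idea (a 1-D map standing in for the belief dynamics; everything here is hypothetical): if the believed dynamics push outward at both ends of an interval, a sign change of f(x) - x is trapped inside, and a simple bisection walks to the repulsive fixed point even though forward iteration runs away from it.

```python
# Toy example: f has a repulsive fixed point at x = 0.5 (slope 2 there), so forward
# iteration x <- f(x) diverges from it. But because the map pushes outward at both
# ends of [0.2, 0.8], a root of f(x) - x is trapped inside and bisection finds it.

def f(x):
    return 0.5 + 2.0 * (x - 0.5)

def find_repulsive_fixed_point(lo, hi, tol=1e-9):
    g = lambda x: f(x) - x
    assert g(lo) < 0 < g(hi), "boundary does not push outward"
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if g(mid) < 0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

print(find_repulsive_fixed_point(0.2, 0.8))   # ~0.5
```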
That prediction may be true. My argument is that “I know this by introspection” (or introspection-and-generalization-to-others) is insufficient. For a concrete example, consider your 5-year-old self. I remember some pretty definite beliefs I had about my future self that turned out wrong, and if I ask myself how aligned I am with him, I don’t even know how to answer; he just seems way too confused and incoherent.
I think it’s also not absurd that you do have perfect caring in the sense relevant to the argument. This does not require that you don’t make mistakes currently. If you can, with increasing intelligence/information, correct yourself, then the pointer is perfect in the relevant sense. “Caring about the values of person X” is relatively simple and may come out of evolution whereas “those values directly” may not.
>This prediction seems flatly wrong: I wouldn’t bring about an outcome like that. Why do I believe that? Because I have reasonably high-fidelity access to my own policy, via imagining myself in the relevant situations.
It seems like you’re conflating two things here, because the thing you would want is not knowable by introspection. What I think you’re introspecting is that if you noticed that the-thing-you-pursued-so-far was different from what your brother actually wants, you’d do what he actually wants. But the-thing-you-pursued-so-far doesn’t play the role of “your utility function” in the Goodhart argument. All of you plays into that. If the Goodharting were to play out, your detector for differences between the-thing-you-pursued-so-far and what-your-brother-actually-wants would simply fail to warn you that it was happening, because it too can only use a proxy measure for the real thing.
>The idea is that we can break any decision problem down by cases (like “insofar as the predictor is accurate, …” and “insofar as the predictor is inaccurate, …”) and that all the competing decision theories (CDT, EDT, LDT) agree about how to aggregate cases.
Doesn’t this also require that all the decision theories agree that the conditioning fact is independent of your decision?
Otherwise you could break down the normal prisoner’s dilemma into “insofar as the opponent makes the same move as me” and “insofar as the opponent makes the opposite move” and conclude that defect isn’t the dominant strategy even there, not even under CDT.
And I imagine the within-CDT perspective would reject an independent probability for the predictor’s accuracy. After all, there’s an independent probability that it guessed 1-box, and if I 1-box it’s right with that probability, and if I 2-box it’s right with 1 minus that probability.
Would a decision theory like this count as “giving up on probabilities” in the sense in which you mean it here?
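To make the within-CDT picture concrete, here is a minimal sketch using the standard Newcomb payoffs ($1M in the opaque box iff the predictor guessed one-boxing, $1k in the transparent box), with q the decision-independent probability that the predictor guessed one-boxing:

```python
# q = probability the predictor guessed "one-box", treated as independent of my
# choice (the CDT-style framing from the comment above). Standard Newcomb payoffs.
M, K = 1_000_000, 1_000

def ev_one_box(q):
    return q * M            # opaque box only

def ev_two_box(q):
    return q * M + K        # both boxes

for q in [0.01, 0.5, 0.99]:
    accuracy_if_one_box = q        # predictor is right with probability q if I one-box
    accuracy_if_two_box = 1 - q    # ...and with probability 1 - q if I two-box
    print(q, ev_one_box(q), ev_two_box(q), accuracy_if_one_box, accuracy_if_two_box)
```

Two-boxing dominates for every fixed q, and the predictor’s “accuracy” comes out as q or 1 - q depending on my choice, so there is no decision-independent accuracy for CDT to condition on.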
I think your assessments of what’s psychologically realistic are off.
>I do not know what it feels like from the inside to feel like a pronoun is attached to something in your head much more firmly than “doesn’t look like an Oliver” is attached to something in your head.
I think before writing that, Yud imagined calling [unambiguously gendered friend] either pronoun, and asked himself if it felt wrong, and found that it didn’t. This seems realistic to me: I’ve experienced my emotional introspection becoming blank on topics I’ve put a lot of thinking into. This doesn’t prevent doing the same automatic actions you always did, or knowing what those would be in a given situation. If something like this happened to him for gender long enough ago, he may well not be able to imagine otherwise.
But the “everyone present knew what I was doing was being a jerk” characterization seems to agree that the motivation was joking/trolling. How did everyone present know? Because it’s absurd to infer a particular name from someone’s appearance.
It’s unreasonable, but it seems totally plausible that on one occasion you would feel like you know someone has a certain name, and continue feeling that way even after being rationally convinced you’re wrong. That there are many names only means that the odds of any particular name featuring in such a situation are low, not that the class as a whole has low odds, and I don’t see why the prior for that would be lower than for e.g. mistaken déjà vu experiences.
I don’t think the analogy to biological brains is quite as strong. For example, biological brains need to be “robust” not only to variations in the input, but also, in a quite literal sense, to forceful impact or to parasites trying to take control. They intentionally have very bad suppressibility, and this means there needs to be a lot of redundancy, which makes “just stick an electrode in that area” work. More generally, they are under many constraints that an ML system isn’t, probably too many for us to think of, and they generally prioritize safety over performance. Both lead away from the sort of maximally efficient compression that makes ML systems hard to interpret.
Analogously: imagine a programmer who writes the shortest program that does a given task. That would be terrible. It would be impossible to change anything without essentially redesigning everything, understanding what it does just from reading the code would be very hard, and giving a compressed explanation of how it does it would be impossible. In practice, we don’t write code like that, because we face constraints like those mentioned above, but it’s very easy to imagine that some optimization-based “automatic coder” would program like that. Indeed, on the occasions when we really need to optimize runtime, we move in that direction ourselves.
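As a toy illustration of that point (my own example, nothing from the post): the two snippets below compute the same function, but the compressed one is the kind of artifact you’d expect from pure optimization pressure, while the readable one is full of “redundant” structure that makes it easy to inspect and change.

```python
# Compressed version: relies on a non-obvious identity; hard to see why it is
# correct, and any change to the spec means rederiving it from scratch.
digital_root = lambda n: 0 if n == 0 else 1 + (n - 1) % 9

# Readable version: longer and redundant, but transparently mirrors the definition
# (repeatedly sum the decimal digits until a single digit remains).
def digital_root_readable(n: int) -> int:
    while n >= 10:
        n = sum(int(digit) for digit in str(n))
    return n
```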
So I don’t think brains tell us much about the interpretability of the standard, highly optimized neural nets.
This thread is probably way too old by now, but I had multiple experiences relevant to it.
Once I had a dream and then, in the dream, I remembered I had dreamt this exact thing before, and wondered if I was dreaming now, and everything looked so real and vivid that I concluded I was not.
I can create a kind of half-dream, where I see random images and moving sequences at most 3 seconds or so long, in succession. I am really dimmed but not sleeping, and I am aware in the back of my head that they are only schematic and vague.
I would say the backstories in dreams are different in that they can be clearly nonsensical. E.g. I’m holding and looking at a glass relief, there is no movement at all, and I know it to be a movie. I know nothing of its content, and I don’t believe the image of the relief to be in the movie.
I think it’s still possible to have a scenario like this. Let’s say each trader would buy or sell a certain amount when the price is below/above what they believe it to be, but with the transition being very steep instead of instant. Then you could still have long price intervals where the amounts bought and sold remain constant, and then every point in there could be the market price.
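Here is the kind of setup I mean, as a quick sketch (the logistic shape and the specific numbers are just hypothetical choices): two traders with very steep but continuous buy/sell transitions around different believed prices have net demand of roughly zero on the whole interval between their beliefs, so any price in that interval clears.

```python
import math

# Hypothetical traders: each buys up to 1 unit below its believed price and sells
# up to 1 unit above it, with a very steep (but continuous) transition.
def demand(price, believed_price, steepness=1000.0):
    # ~ +1 when price << believed_price, ~ -1 when price >> believed_price
    return 2.0 / (1.0 + math.exp(steepness * (price - believed_price))) - 1.0

def net_demand(price):
    return demand(price, 0.40) + demand(price, 0.60)

# Between the two believed prices the first trader sells ~1 and the second buys ~1,
# so net demand is ~0 on the whole interval (0.40, 0.60): any such price "clears".
for p in [0.45, 0.50, 0.55]:
    print(p, round(net_demand(p), 6))
```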
I’m not sure if this scenario is significant. I see no reason to set the traders up this way other than the result in the particular scenario that kicked this off, and adding traders who don’t follow this pattern breaks it. Still, it’s a bit worrying that trading strategies seem to matter in addition to beliefs, because what do they represent? A trader’s initial wealth is supposed to be our confidence in its heuristics—but if a trader is a mathematical heuristic and a trading strategy packaged together, then what does confidence in the trading strategy mean epistemically? Two things to think about:
Is it possible to consistently define the set of traders with the same beliefs as trader X?
It seems that logical induction is using a trick, where it avoids inconsistent discrete traders, but includes an infinite sequence of continuous traders with ever steeper transitions to get some of the same effects. This could lead to unexpected differences between behaviour “at all finite steps” vs “at the limit”. What can we say about logical induction if trading strategies need to be Lipschitz-continuous with a shared upper bound on the Lipschitz constant?
>So I’m not sure what’s going on with my mental sim. Maybe I just have a super-broad ‘crypto-moral detector’ that goes off way more often than yours (w/o explicitly labeling things as crypto-moral for me).
Maybe. How were your intuitions before you encountered LW? If you already had a hypocrisy intuition, then trying to internalize the rationalist perspective might have led it to ignore the morality-distinction.
My father was playing golf with me today, telling me to lean down more to stop my shots going out left so much.
I don’t strongly relate to any of these descriptions. I can say that I don’t feel like I have to pretend advice from equals is more helpful than it is, which I suppose means it’s not face. The most common way to reject advice is a comment like “eh, whatever” and ignoring it. Some nerds get really mad at this and seem to demand intellectual debate; this is not well received. Most people give advice with the expectation of intellectual debate only on crypto-moral topics (which is also generally not well received, but the speaker seems to accept that as an “identity cost”), or not at all.
You mean advice to diet, or “technical” advice once it’s established that the person wants to diet? I don’t have experience with either, but the first is definitely crypto-moral.
I think the solution to this is to add something to your wealth to account for inalienable human capital, and to count costs only by how much you will actually be forced to pay. This is a good idea in general; otherwise most people with student loans or a mortgage are “in the red” and couldn’t use this at all.
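If the surrounding context is sizing bets as a fraction of wealth (my assumption here; the Kelly-style sizing below is only an illustration), the adjustment could look something like this sketch, where inalienable future earnings count toward the bankroll and only enforceable debt counts against it:

```python
# Hedged sketch, not from the post: "effective wealth" = liquid wealth plus
# inalienable human capital, minus only the debt you can actually be forced to pay.

def effective_wealth(liquid, human_capital, debt, enforceable_fraction):
    # human_capital: present value of future earnings that can't be taken from you
    # enforceable_fraction: share of the nominal debt you would really be made to pay
    return liquid + human_capital - enforceable_fraction * debt

def kelly_fraction(p_win, odds):
    # Standard Kelly fraction for a bet paying `odds`-to-1 with win probability p_win
    return max(0.0, (p_win * (odds + 1) - 1) / odds)

# Example: nominally "in the red" (5k liquid, 50k student loans), but with 200k of
# inalienable human capital the bankroll, and hence the bet size, stays positive.
bankroll = effective_wealth(liquid=5_000, human_capital=200_000,
                            debt=50_000, enforceable_fraction=1.0)
print(bankroll, kelly_fraction(p_win=0.55, odds=1.0) * bankroll)
```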