Yeah I mean, I’m not claiming it has a big sense of obligation, only that it illustrates a condition where discourse seems to benefit from a sense of obligation.
Here’s an example of a cheap question I just asked on twitter. Maybe Richard Hanania will find it cheap to answer too, but part of the reason I asked it was because I expect him to find it difficult to answer.
If he can’t answer it, he will lose some status. That’s probably good—if his position in the OP is genuine and well-informed, he should be able to answer it. The question is sort of “calling his bluff”, checking that his implicitly promised reason actually exists.
Actually, we’ll reschedule to make it for the meetup.
I didn’t make it last time because my wife was coming home from a conference, and I probably can’t make it next time because of a vacation in Iceland, but I will most likely come the time after that.
I don’t have much experience with freedom of information requests, but I feel that when questions in online debates are hard to answer, it’s often because they implicitly highlight problems with the positions that have been forwarded. For all I know, it could work similarly with freedom of information requests.
Ok, so this sounds like it talks about cardinality in the sense of 1 or 3, rather than in the sense of 2. I guess I default to 2 because it’s more intuitive due to the transfer property, but maybe 1 or 3 are more desirable due to being mathematically richer.
Also the remark that hyperfinite can mean smaller than a nonstandard natural just seems false, where did you get that idea from?
When I look up the definition of hyperfinite, it’s usually defined as being in bijection with the hypernaturals up to a (sometimes either standard or nonstandard, but given the context of your OP I assumed you mean only nonstandard) natural $N$. If the set is in bijection with the numbers up to $N$, then it would seem to have cardinality less than $N$[1].
- ^ Obviously this doesn’t hold for transfinite sizes, but we’re merely considering hyperfinite sizes, so it should hold there.
> Hypernaturals are uncountable because they are bigger than all the nats and so can’t be counted.
This isn’t the condition for countability. For instance, consider the ordering of $\mathbb{N} \cup \{\omega\}$ where when $n \in \mathbb{N}$ then $n < \omega$. This ordering has $\omega$ bigger than all the nats, but it’s still countable because you have a bijection given by $\omega \mapsto 0$, $n \mapsto n + 1$.
Also countability of the hypernaturals is a subtle concept because of the transfer principle. If you start with some model of set theory $M$ with natural numbers $\mathbb{N}$ and use an ultrafilter to extend it to a model $M^*$ with natural numbers $\mathbb{N}^*$, then you have three notions of countability of a set $S$:
1. $M$ contains a bijection between $S$ and $\mathbb{N}$,
2. $M^*$ contains a bijection between $S$ and its set of natural numbers (which is equal to $\mathbb{N}^*$),
3. The ambient set theory contains a bijection between $S$ and $\mathbb{N}$.
Tautologically, the hypernaturals will be countable in the second sense, because it is simply seeking a bijection between the hypernaturals and themselves. I’m not sure whether they can be countable in the third sense, but if [1] then intuitively it seems to me that they won’t be countable in the third sense, but the naturals won’t be countable in the third sense either, so that doesn’t necessarily seem like a problem or a natural thing to ask about.
> Whether cardinality of continuum is equivalent to continuum hypothesis
Not sure what you mean here.
- ^ Is it even possible for ? I’d think not because but I’m not 100% sure.
“Hyperfinite” is a term used in nonstandard analysis to refer to things that are larger than all standard natural numbers but smaller than a nonstandard natural number. It’s not the same as uncountably infinite.
But yes, some uncountably infinite sets can be assigned a reasonable uniform probability distribution.
There’s also another sense in which some uncountably infinite spaces can be smaller than some countable spaces, namely compactness.
I dunno, I feel like there’s often a reason that there are considered to be obligations to generate answers. Like if someone pushes a claim on a topic with the justification that they’ve comprehensively studied the topic, you’d expect them to have a lot of knowledge, and thus be able to expand and clarify. And if someone pushes for a policy, you’d want that policy to be robust against foreseeable problems.
I can definitely see how there can be cases where there’s an unreasonable symmetry in how questions vs answers can be valued compared to how expensive they are, but it seems wrong to entirely throw out the obligation to generate answers in all cases.
This feels like a post that was likely motivated by one or more concrete instances where someone was asked a question and was expected to answer despite answering being expensive. Is that true? If so, are any of the original motivating instances public?
True, though I think the Hessian is problematic enough that I’d either want to wait until I have something better, or want to use a simpler method.
It might be worth going into more detail about that. The Hessian for the probability of a neural network output is mostly determined by the Jacobian of the network. But in some cases the Jacobian gives us exactly the opposite of what we want.
If we consider the toy model of a neural network with no input neurons and only 1 output neuron $f(w) = \prod_i w_i$ (which I imagine to represent a path through the network, i.e. a bunch of weights get multiplied along the layers to the end), then the Jacobian is the gradient $\nabla f(w) = \left( \prod_{j \neq i} w_j \right)_i$. If we ignore the overall magnitude of this vector and just consider how the contribution that it assigns to each weight varies over the weights, then we get $\frac{\partial f}{\partial w_i} \propto \frac{1}{w_i}$. Yet for this toy model, “obviously” the contribution of weight $w_i$ “should” be proportional to $w_i$.
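To make that concrete, here's a quick numerical check of the toy model (my own illustration of the point above, with made-up weights):

```python
import numpy as np

# Toy model: no inputs, one output, and the output is just the product of the
# weights (one multiplicative "path" through the network).
def f(w):
    return np.prod(w)

w = np.array([0.5, 1.0, 2.0, 4.0])

# Gradient of the product: df/dw_i = prod_{j != i} w_j = f(w) / w_i,
# so up to the shared factor f(w) it varies as 1/w_i.
grad = f(w) / w

print(grad)  # [8. 4. 2. 1.] -- largest attribution goes to the *smallest* weight
print(w)     # [0.5 1. 2. 4.] -- the "obvious" contribution ranking is the reverse
```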
So derivative-based methods seem to give the absolutely worst-possible answer in this case, which makes me pessimistic about their ability to meaningfully separate the actual mechanisms of the network (again they may very well work for other things, such as finding ways of changing the network “on the margin” to be nicer).
I’ve been thinking about how the way to talk about how a neural network works (instead of how it could hypothetically come to work by adding new features) would be to project away components of its activations/weights, but I got stuck because of the issue where you can add new components by subtracting off large irrelevant components.
I’ve also been thinking about deception and its relationship to “natural abstractions”, and in that case it seems to me that our primary hope would be that the concepts we care about are represented at a larger “magnitude” than the deceptive concepts. This is basically using L2-regularized regression to predict the outcome.
It seems potentially fruitful to use something akin to L2 regularization when projecting away components. The most straightforward translation of the regularization would be to analogize the regression coefficient to $w - \delta w$ (the weights that remain after projecting away a component $\delta w$), in which case the L2 term would be $\lambda \|w - \delta w\|^2$, which reduces to $\lambda (\|w\|^2 - \|\delta w\|^2)$ when $\delta w$ is an orthogonal projection of $w$.
If $P_w(y \mid x)$ is the probability[1] that a neural network with weights $w$ gives to an output $y$ given a prompt $x$, then when you’ve actually explained $P_w(y \mid x)$, it seems like you’d basically have $P_{w - \delta w}(y \mid x) \approx 0$, or in other words $P_w(y \mid x) - P_{w - \delta w}(y \mid x) \approx P_w(y \mid x)$. Therefore I’d want to keep the regularization coefficient weak enough that I’m in that regime.
In that case, the L2 term would then basically reduce to minimizing $\|w - \delta w\|^2$, or in other words maximizing $\|\delta w\|^2$. Realistically, both this and $P_w(y \mid x) - P_{w - \delta w}(y \mid x) \approx P_w(y \mid x)$ are probably achieved when $\delta w = w$, which on the one hand is sensible (“the reason for the network’s output is because of its weights”) but on the other hand is too trivial to be interesting.
In regression, eigendecomposition gives us more gears, because L2 regularized regression is basically changing the regression coefficients for the principal components by a factor of $\frac{\sigma_i^2}{\sigma_i^2 + \lambda}$, where $\sigma_i^2$ is the variance of the $i$th principal component and $\lambda$ is the regularization coefficient. So one can consider all the principal components ranked by $\frac{\sigma_i^2}{\sigma_i^2 + \lambda} \operatorname{Cov}(x_i, y)$ to get a feel for the gears driving the regression. When $\lambda$ is small, as it is in our regime, this ranking is of course the same order as that which you get from $\operatorname{Cov}(x_i, y)$, the covariance between the PCs and the dependent variable.
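For concreteness, here's a small self-contained check of that shrinkage claim (my own sketch, not anything from the original discussion): it fits ridge regression in the PC basis and compares against the OLS coefficients scaled by $\frac{\sigma_i^2}{\sigma_i^2 + \lambda}$.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d = 10_000, 3
X = rng.normal(size=(n, d)) * np.array([3.0, 1.0, 0.3])  # features with very different variances
y = X @ np.array([1.0, 1.0, 1.0]) + rng.normal(size=n)

# Center, then rotate into the principal-component basis.
Xc = X - X.mean(axis=0)
yc = y - y.mean()
cov = Xc.T @ Xc / n
eigvals, eigvecs = np.linalg.eigh(cov)   # eigvals = PC variances sigma_i^2
P = Xc @ eigvecs                         # PC scores

lam = 5.0
# Ridge in the PC basis (penalty n*lam*||beta||^2, so lam is on the same per-sample
# scale as the PC variances; the PCs are uncorrelated, so it separates per coordinate).
beta_ridge = (P.T @ yc) / (np.sum(P**2, axis=0) + n * lam)
# OLS in the PC basis, then shrunk by sigma^2 / (sigma^2 + lambda).
beta_ols = (P.T @ yc) / np.sum(P**2, axis=0)
beta_shrunk = beta_ols * eigvals / (eigvals + lam)

print(np.allclose(beta_ridge, beta_shrunk))  # True: ridge = OLS scaled by sigma^2/(sigma^2+lam)
```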
This suggests that if we had a similar change of basis for $w$, one could obtain a nice ranking of it. Though this is complicated by the fact that $P_w(y \mid x)$ is not a linear function and therefore we have no equivalent of $\operatorname{Cov}(x_i, y)$. To me, this makes it extremely tempting to use the Hessian eigenvectors as a basis, as this is the thing that at least makes each of the inputs to $P_w(y \mid x)$ “as independent as possible”. Though rather than ranking by the eigenvalues of the Hessian (which ideally we’d actually prefer to be small rather than large, to stay in the ~linear regime), it seems more sensible to rank by the components of the projection of $w$ onto the eigenvectors (which represent “the extent to which $w$ includes this Hessian component”).
In summary, if $v_1, \dots, v_n$ are the eigenvectors of the Hessian of $P_w(y \mid x)$ with respect to $w$, then we can rank the importance of each component by $|\langle w, v_i \rangle|$.
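A minimal sketch of that ranking on a toy function (my own illustration: the function `p` below is just a stand-in for $P_w(y \mid x)$, and the Hessian is taken by crude finite differences rather than anything clever):

```python
import numpy as np

# Stand-in for P_w(y|x): some smooth nonlinear scalar function of the weights.
def p(w):
    return 1.0 / (1.0 + np.exp(-(w[0] * w[1] + 0.1 * w[2] ** 2)))

def hessian(f, w, eps=1e-4):
    # Central finite differences for the full Hessian of a scalar function.
    d = len(w)
    H = np.zeros((d, d))
    for i in range(d):
        for j in range(d):
            w_pp = w.copy(); w_pp[i] += eps; w_pp[j] += eps
            w_pm = w.copy(); w_pm[i] += eps; w_pm[j] -= eps
            w_mp = w.copy(); w_mp[i] -= eps; w_mp[j] += eps
            w_mm = w.copy(); w_mm[i] -= eps; w_mm[j] -= eps
            H[i, j] = (f(w_pp) - f(w_pm) - f(w_mp) + f(w_mm)) / (4 * eps ** 2)
    return H

w = np.array([1.5, -0.5, 2.0])
H = hessian(p, w)
eigvals, eigvecs = np.linalg.eigh(H)

# Rank components not by |eigenvalue| but by how much of w lies along each eigenvector.
scores = np.abs(eigvecs.T @ w)       # |<w, v_i>| for each eigenvector v_i
ranking = np.argsort(-scores)
for i in ranking:
    print(f"eigenvalue {eigvals[i]: .4f}   |<w, v_i>| = {scores[i]:.4f}")
```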
Maybe I should touch grass and start experimenting with this now, but there’s still two things that I don’t like:
There’s a sense in which I still don’t like using the Hessian because it seems like it would be incentivized to mix nonexistent mechanisms in the neural network together with existent ones. I’ve considered alternatives like collecting gradient vectors along the training of the neural network and doing something with them, but that seems bulky and very restricted in use.
If we’re doing the whole Hessian thing, then we’re modelling $P_w(y \mid x)$ as quadratic in the weights, yet ranking by $|\langle w, v_i \rangle|$ seems like an attribution method that’s more appropriate when modelling it as ~linear. I don’t think I can just switch all the way to quadratic models, because realistically $P_w(y \mid x)$ is more gonna be sigmoidal-quadratic, and for large steps $\delta x$, the changes to a sigmoidal-quadratic function are better modelled by $f(x+\delta x) - f(x)$ than by some quadratic thing. But ideally I’d have something smarter...
- ^ Normally one would use log probs, but for reasons I don’t want to go into right now, I’m currently looking at probabilities instead.
As long as you only care about the latent variables that make the observed variables independent of each other, right? Asking because this feels isomorphic to classic issues relating to deception and wireheading unless one treads carefully. Though I’m not quite sure whether you intend for it to be applied in this way.
One thing I’d note is that AIs can learn from variables that humans can’t learn much from, so I think part of what will make this useful for alignment per se is a model of what happens if one mind has learned from a superset of the variables that another mind has learned from.
I think it’s easier to see the significance if you imagine the neural networks as a human-designed system. In e.g. a computer program, there’s a clear distinction between the code that actually runs and the code that hypothetically could run if you intervened on the state, and in order to explain the output of the program, you only need to concern yourself with the former, rather than also needing to consider the latter.
For neural networks, I sort of assume there’s a similar thing going on, except it’s quite hard to define it precisely. In technical terms, neural networks lack a privileged basis which distinguishes different components of the network, so one cannot pick a discrete component and ask whether it runs and if so how it runs.
This is a somewhat different definition of “on-manifold” than is usually used, as it doesn’t concern itself with the real-world data distribution. Maybe it’s wrong of me to use the term like that, but I feel like the two meanings are likely to be related, since the real-world distribution of data shaped the inner workings of the neural network. (I think this makes most sense in the context of the neural tangent kernel, though ofc YMMV as the NTK doesn’t capture nonlinearities.)
In principle I don’t think it’s always important to stay on-manifold, it’s just what one of my lines of thought has been focused on. E.g. if you want to identify backdoors, going off-manifold in this sense doesn’t work.
I agree with you that it is sketchy to estimate the manifold from wild empiricism. Ideally I’m thinking one could use the structure of the network to identify the relevant components for a single input, but I haven’t found an option I’m happy with.
> Also one convoluted (perhaps inefficient) idea but which felt kind of fun to stay on manifold is to do the following: (1) train your batch of steering vectors, (2) optimize in token space to elicit those steering vectors (i.e. by regularizing for the vectors to be close to one of the token vectors or by using an algorithm that operates on text), (3) check those tokens to make sure that they continue to elicit the behavior and are not totally wacky. If you cannot generate that steer from something that is close to a prompt, surely it’s not on manifold right? You might be able to automate by looking at perplexity or training a small model to estimate that an input prompt is a “realistic” sentence or whatever.
Maybe. But isn’t optimization in token-space pretty flexible, such that this is a relatively weak test?
Realistically steering vectors can be useful even if they go off-manifold, so I’d wait with trying to measure how on-manifold stuff is until there’s a method that’s been developed to specifically stay on-manifold. Then one can maybe adapt the measurement specifically to the needs of that method.
I think this is a really interesting idea, but I’m not comfortable enough with drugs to test it myself. If anyone is doing this and wants psychometric advice, though, I am offering to join your project.
I think the proposed method could still work though. A substantial fraction of the pseudorandomness may be consistent on the individual person level.
The type of pseudorandomness you describe here ought to be independent at the level of individual items, so it ought to be part of the least-reliable variance component (not part of the general trait measured and not stable over time). It’s possible to use statistics to estimate how big an effect it has on the scores, and it’s possible to drive it arbitrarily far down in effect simply by making the test longer.
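To illustrate the “make the test longer” point (a generic simulation I’m adding, not anything from the original exchange): if each item is the underlying trait plus independent item-level noise, the test score’s correlation with the trait climbs toward 1 as items are added, in line with the Spearman-Brown relationship.

```python
import numpy as np

rng = np.random.default_rng(0)
n_people = 5_000
trait = rng.normal(size=n_people)

def test_score(n_items, noise_sd=2.0):
    # Each item = trait + independent item-specific "pseudorandom" noise.
    items = trait[:, None] + rng.normal(scale=noise_sd, size=(n_people, n_items))
    return items.mean(axis=1)

for n_items in [5, 20, 80, 320]:
    score = test_score(n_items)
    r = np.corrcoef(score, trait)[0, 1]
    print(f"{n_items:4d} items: correlation with the underlying trait = {r:.3f}")
# The item-level noise component shrinks like 1/n_items, so its effect on the
# scores can be driven arbitrarily far down by lengthening the test.
```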
The way I’d phrase the theoretical problem is this: when you fit a model to a distribution (e.g. minimizing KL-divergence on a set of samples), you can often prove theorems of the form “the fitted distribution has such-and-such relationship to the true distribution”, e.g. you can compute confidence intervals for parameters and predictions in linear regression.
Often, all that is sufficient for those theorems to hold is:
- The model is at an optimum
- The model is flexible enough
- The sample size is big enough
… because then if you have some point X you want to make predictions for, the sample size being big enough means you have a whole bunch of points in the empirical distribution that are similar to X. These points affect the loss landscape, and because you’ve got a flexible optimal model, that forces the model to approximate them well enough.
But this “you’ve got a bunch of empirical points dragging the loss around in relevant ways” part only works on-distribution, because you don’t have a bunch of empirical points from off-distribution data. Even if technically they form an exponentially small slice of the true distribution, this means they only have an exponentially small effect on the loss function, and therefore being at an optimal loss is exponentially weakly informative about these points.
(Obviously this is somewhat complicated by overfitting, double descent, etc., but I think the gist of the argument goes through.)
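Here’s a toy illustration of that asymmetry (my own example, with an arbitrary true function and training distribution): a flexible model fit by least squares, i.e. satisfying the three conditions in the list above on-distribution, does fine where the training samples are dense and falls apart where the distribution places essentially no mass.

```python
import numpy as np

rng = np.random.default_rng(0)

# True function, and a training distribution concentrated around 0.
def true_f(x):
    return np.sin(x)

x_train = rng.normal(0.0, 1.0, size=2_000)
y_train = true_f(x_train) + rng.normal(0.0, 0.1, size=x_train.shape)

# A flexible model (high-degree polynomial) fit by least squares.
coefs = np.polyfit(x_train, y_train, deg=9)
model = np.poly1d(coefs)

for x in [0.0, 1.0, 2.0, 6.0, 10.0]:
    print(f"x = {x:5.1f}   true = {true_f(x): .3f}   model = {model(x): .3f}")
# On-distribution (|x| up to ~2-3) the fit is close; at x = 6 or 10, where the
# training density is ~exponentially small, the extrapolation is far off.
```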
I guess it depends on whether one makes the cut between theory and practice with or without assuming that one has learned the distribution? I.e. I’m saying if you have a distribution D, take some samples E, and approximate E with Q, then you might be able to prove that samples from Q are similar to samples from D, but you can’t prove that conditioning on something exponentially unlikely in D gives you something reasonable in Q. Meanwhile you’re saying that conditioning on something exponentially unlikely in D is tantamount to optimization.
Couldn’t it also be that the women in question plan their careers based on the expectation of having children, and that this is what leads to the plateau? In that case it seems like it would be incorrect to interpret these results as evidence against a child penalty, as it’s merely that the child penalty affects women regardless of whether they actually have the children. To check, I think you should then ask the study participants why their career plateaued.