Could be! Though, in my head I see it as a self-centering Monte Carlo sampling of a distribution mimicking some other training distribution, GANs not being the only one in that group. The drawback is that you can never leave that distribution: if your training is narrow, your model is narrow.
Thomas
Ah, I missed that it was a generative model. If you don’t mind I’d like to extend this discussion a bit. I think it’s valuable (and fun).
I do still think it can go wrong. The joint distribution can shift after training by confounding factors and effect modification. And the latter is more dangerous, because for the purposes of reporting the confounder matters less (I think), but effect modification can move you outside any distribution you’ve seen in training. And it can be something really stupid you forgot in your training set, like the action to turn off the lights causing some sensors to work while others do not.
You might say, “ah, but the information about the diamond is the same”. But I don’t think that that applies here. It might be that the predictor state as a whole encodes the whereabouts of the diamond and the shift might make it unreadable.
I think that it’s very likely that the real world has effect modification that is not in the training data just by the fact that the world of possibilities is infinite. When the shift occurs your P(z|Q,A) becomes small, causing us to reject everything outside the learned distribution. Which is safe, but also seems to defeat the purpose of our super smart predictor.
Hmm, why would it require additional computation? The counterexample does not need to be an exact human imitator, only a not-translator that performs well in training. In the worst case there exist multiple parts of the activations of the predictor that correlate to “diamond”, so multiple ‘good results’ by just shifting parameters in the model.
Is this precise enough?
As I read this, your proposal seems to hinge on a speed prior (or simplicity prior) over F such that F has good generalization from the simple training set to the complex world. I think you could be more precise if you’d explain how the speed prior (enforced by C) chooses direct translation over simulation. Reading your discussion, your disagreement seems to stem from a disagreement about the effect of the speed prior i.e. are translators faster or less complex than imitators?
Thanks for your reply!
You are right of course. The argument was more about building up evidence. But, after thinking about it some more, I see now that any evidence gathered with a single known feature of priorless models (like the one I mentioned) would be so minuscule (approaching 0[1]) that you’d need to combine an unlikely number of features[2]. You’d end up in (a worse version of) an arms race akin to the ‘science it’ example/counterexample scenario mentioned in the report, and thus it’s a dead end. By extension, all priorless models, with or without a good training regime[3], with or without a good loss function[4], with or without some form of regularization[4], all of them are out.
So, I think it’s an interesting dead end. It reduces possible solutions to ELK to those that narrow the ridge of good solutions with a prior imposed by/on the architecture or statistical features of the model. In other words, it requires either non-overparameterized models or a way to narrow the ridge of good solutions in overparameterized models. Of the latter I have only seen good descriptions[5] but no solutions[6] (but do let me know if I missed something).
Do you agree?
1. ^ assuming a black-box, overparameterized model
2. ^ Like finding a single needle in an infinite haystack. Even if your evidence hacks it in half, you’ll be hacking away a long time.
3. ^ in the case of the ELK contest, due to not being able to sample outside the human-understandable
4. ^ because they would depend on the same (assumption of) evidence/feature
5. ^ like this one
6. ^ I know you can see the initialization of parameters as a prior, but I haven’t seen a meaningful prior
I agree with you (both). It’s a framing difference iff you can translate back and forth. My thinking was that the problem might be set up so that it’s “easy” to recognize but difficult to implement. If you can define a strategy which sets it up to be easy to recognize, that is.
Another way I thought about it is that you can use your ‘meta’ knowledge about human imitators versus direct translators to give you a probability over all reporters. Approaching the problem not with certainty of a solution but with recognized uncertainty (I refrain from using ‘known uncertainty’ here, because knowing how much you don’t know something is hard). I obviously don’t have a good strategy, else my name would be up above, haha.
But to give it my attempt: one way to get this knowledge is to study, for example, the way NNs generalize, and how likely it is that you create a direct translator versus an imitator. The proposal I sent in used this knowledge and submitted a lot of reporters to random input. In the complex input space (outside the simple training set) translators should behave differently from imitators. Given that direct translators are created less frequently than imitators (which is what the report suggested), the smaller group or the outliers are more likely to be translators than imitators.
Using this approach you can build evidence. That’s where I stopped, because time was up. And that is why I was interested whether there were any others using this approach that might be more successful than me (if there were any, they probably would be).
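The outlier idea above can be sketched with toy stand-ins. To be clear, everything here is hypothetical: the “reporters” are just random linear readouts rather than trained networks, with most of them clustered around one (imitator-like) weight pattern and a few deviating.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy stand-ins for trained reporters: each is a linear readout of a
# 16-dim "predictor state". Most share one weight pattern (the common,
# imitator-like solutions); a few deviate (the rare translators).
n_cluster, n_rare, dim = 45, 5, 16
base = rng.normal(size=dim)
reporters = [base + rng.normal(scale=0.05, size=dim) for _ in range(n_cluster)]
reporters += [rng.normal(size=dim) for _ in range(n_rare)]

# Probe every reporter with random inputs far outside the training set
# and compare their answer profiles.
probes = rng.normal(scale=10.0, size=(200, dim))
answers = np.stack([probes @ w for w in reporters])  # (n_reporters, n_probes)

# Flag outliers: reporters whose answers sit far from the median profile.
median = np.median(answers, axis=0)
dist = np.linalg.norm(answers - median, axis=1)
threshold = np.percentile(dist, 90)
candidates = np.where(dist > threshold)[0]
print("candidate translators:", candidates)
```

In this toy setting the flagged indices are exactly the reporters built outside the majority cluster; the open question in the thread is whether real translators separate from imitators this cleanly.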
EDIT to add:
[...]I think that almost all of the difficulty will be in how you can tell good from bad reporters by looking at them.
I agree it is difficult. But it’s also difficult to have overparameterized models converge on a guaranteed single point on the ridge of all good solutions. So not having to do that would be great. But a conclusion of this contest could be that we have no other option, because we definitively have no way of recognizing good from bad.
Congratulations to all the winners!
And, as said before: bravo to the swift processing of all the proposals! 197 proposals, wow!
Question to those who had to read all those proposals:
How about the category of proposals* that tried to identify the good from the bad reporter by hand (with some strategy) after training multiple reporters (post hoc instead of a priori)? Were there any good ones? Or can “identification by looking at it” (either in a simple or a complex way) be definitively ruled out (and as a result, require all future research efforts to be about creating a strategy that converges to a single point on the ridge of good solutions in parameter/function space)?
Edit: * to which “Use the reporter to define causal interventions on the predictor” sort of belongs
I really like that you’re putting up your proposal for all to read. It lets stumps like me learn a lot. So, thanks for showing it!
You asked for some ideas about things you want clarity on. I can’t help you yet, but I’d like to ask some questions first, if I can.
Some are about the technical implementation and some more to do with the logic behind your proposal.
First, some technical questions:
1) What if the original AI is not some recurrent ‘thinking’ AI using compute steps, but something that learns a single function mapping from input to output? Like a neural net using 1 hidden layer. Would this still work?
2) When exactly do we train the encoder/decoder? We train it on the predictor states, right? But wouldn’t adding the human imitator degrade our predictor’s performance by dimensionality reduction, making our predictor some less good variant of the one we are interested in?
3) What is the difference between an encoder and a reporter that reports on all of the state? Why wouldn’t all misalignment issues with the reporter occur in the encoder/human imitator in an identical way?
And some other questions:
1) I assume your plan with the human imitator step is to apply or incorporate the idea of a ‘human prior’ (thinking like a human)? My question is: if you have some static function sandwiched between two unbounded, priorless optimizing parts, would it actually guarantee that the human prior is applied? Or could the encoder/reporter misuse the human imitator? In other words, you assume a faithful encoder, but why can we trust that assumption? (I know you say that it would be implausible for the AI to distinguish between training and the real world, for example, but exactly that shift from simple to complex makes simple learned functions generalize badly.)
2) Say you have this human imitator you trained on an understandable dataset using the states of the super AI. For simplicity, let’s say it trained on a set of x’s and y’s. And let’s say our human can only count to 10 (10 fingers, after all), so it’s all simple numbers between 0 and 10 (representing our simple scenarios). And let’s say it learns that y is some function of x, like f(x) = 2x + 1, for example. Now this encoder comes along and it tries to explain this infinitely complex super AI to this small human imitator. It tries to ‘dumb it down’, but complex things are simply not something you can count on your fingers. In an effort to make the human imitator understand, it starts forcing big numbers like eleventy-one down its function. It needs to get through enough information for the decoder to recover most of it, after all. With some effort our human imitator reproduces some numbers along its line, something it usually does not. Can it still be said that it thinks ‘like a human’?
My question in a more formal way is this: when the dataset (actions, in the case of the ELK example) shifts from simple to complex, and the presumed (already exponential) mismatch between the super AI and the human imitator becomes larger ((thoughts of) states of the diamond yet unheard of), the input to the human imitator shifts in complexity as well. But as it’s processing this more complex information, how do we know that ‘think like a human’ generalizes to a space unknown to humans?
3) Your proposal requires a dimensionality reduction of the state space of a super AI, which might be unfathomably big, done by an encoder. And in the encoder/decoder setup, it requires this to be done without too much information loss, but also requires the encoder to lose enough to get it into the human imitator. This leaves some window. Couldn’t it be said that in the worst case, there is no such window? (Like described in the mentioned report in the example using a simple interpretable net.)
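For what it’s worth, the “window” in question 3 can be made concrete with a toy linear encoder/decoder (a PCA sketch via SVD, not the proposal’s actual architecture): once the bottleneck is narrower than the intrinsic dimensionality of the predictor state, information loss is unavoidable.

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy "predictor state": 64-dim vectors with 32 intrinsic dimensions.
latent = rng.normal(size=(1000, 32))
mixing = rng.normal(size=(32, 64))
states = latent @ mixing

# Optimal linear encoder/decoder of width k (PCA via truncated SVD).
# If the bottleneck k is narrower than the intrinsic dimensionality,
# some information is necessarily lost -- that is the "window".
def reconstruction_error(k):
    u, s, vt = np.linalg.svd(states, full_matrices=False)
    recon = (u[:, :k] * s[:k]) @ vt[:k]
    return np.linalg.norm(states - recon) / np.linalg.norm(states)

for k in (8, 32, 64):
    print(k, round(reconstruction_error(k), 4))
```

Here a bottleneck of 32 loses (numerically) nothing, while a bottleneck of 8 loses a lot; the worst-case worry is that the predictor’s state has no such comfortable gap between “small enough for the human imitator” and “large enough to keep the diamond information”.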
As before I am behind the curve. Above I concluded saying that I can form no prior belief about G as a function of I. I cannot, but we can learn a function to create our prior. Paul Christiano already wrote an article about learning the prior (https://www.lesswrong.com/posts/SL9mKhgdmDKXmxwE4/learning-the-prior).
So in conclusion, in the worst case no single function mapping I to G exists, as there are multiple, reducing down to either camp translator or camp human-imitator. Without context we can form no strong prior, due to the complexity of A and I, but as Paul described in his article, we can learn a prior from, for example (in our case), the dataset containing G as a function of A.
I’ll add a tl;dr in my first post to shorten the read about how I slowly caught up to everyone else. Corrections are of course still welcome!
Updating my first line of thought
RNNs break the Markov property in the sense that they depend on more than just the previous element in the sequence they are modelling. But I don’t see why that would be relevant to ELK.
You’re right in that RNNs don’t have anything to do with ELK, but I came back to it because the Markov property was part of the lead up to saying that all parts of I are correlated.
So with your help, I have to change my reasoning to:
In the worst case our reporter needs to learn the function between highly correlated I and our target G.
Correct? Then I can update my first statement to
In the worst case, I is highly correlated to such point that no single part of I can be uniquely mapped to G, regardless of any ontological mismatch.
If I’m wrong, do let me know!
Updating my second line of thought
When I say that a strong prior is needed I mean the same thing that Paul means when he writes: “We suspect you can’t solve ELK just by getting better data—you probably need to ‘open up the black box’ and include some term in the loss that depends on the structure of your model and not merely its behaviour.”. Which is a very broad class of strategies.
Ah yes, I understand now. This relates to my second line of thought. I reasoned that the reporter could learn any causal graph. I said we had no way of knowing which.
Because of your help, I need to update that to:
We have no way of knowing which causal graph was learned if we used a black box as our reporter.
Which was in the opening text all along...
But this leads me to the question:
If I cannot reason about internal state I, can I have a prior belief about I? And if I have no prior belief about I, can I have a prior belief about G as a function of I?
My analogy would be: If I don’t know where I am, how can I reason about getting home?
And -if you’ll humor me- my follow up statement would be:
If I can form no prior belief about G as a function of I and this function has to have some non-small complexity, then no option remains but a priorless black box.
Again, If I’m wrong: let me know! I’m learning a lot already.
Irrelevant side note: I saw you using the term computational graph. I chose the term causal graph because I liked it being closer to the ground truth. Besides, a causal graph learned by some algorithm need not be exactly the same as its computational graph. And then I chose such simple examples that they were equal again. Stupid me.
Thank you very much for your reply!
I’ll concede that the Markov property does not make all nodes indistinguishable. I’ll go further and say that not all algorithms have to have the Markov property. A Google search taught me that an RNN breaks the Markov property. But then again, we are dealing with the worst-case game, so with our luck, it’ll probably be some highly correlated thing.
You suggest using some strong prior belief. I assume you mean a prior belief about I or about I → G? I thought, but correct me if I’m wrong, that the opaqueness of the internal state of the complex AI means that we can have no meaningful prior belief about the internal state. So that would rule out a prior belief about (the hyperparameters of) our reporter I → G. Or am I wrong?
We can however have a strong idea about A → G, as per example of the ‘human operator’ and use that as our training data. But that falls with the counterexample given in the report, when the distribution shifts from simple to complex.
tl;dr as of 18/2/2022
The goal is to educate me and maybe others. I make some statements, you tell me how wrong I am (please).
After input from P. (many thanks) and an article by Paul Christiano, this statement stands yet uncorrected:
In the worst case, the internal state of the predictor is highly correlated within itself and multiple mappings with zero loss from the internal state to the desired extraction of information exist. The only solution is to work with some prior belief about how the internal state maps to the desired information. But as by design of the contest, this is not possible as (in the worst case) a human cannot interpret the internal state nor can he interpret complex actions (and so cannot reason about it and/or form a prior belief). The solution to this second problem is to learn a prior from a smaller human-readable dataset, for example simple information as a function of simple actions, and apply it to (or force it upon) our reporter (as described by the mentioned article).
To my eyes this implies that there is a counterexample to all of the following types of proposal:
1) Datasets including only actions, predictions, internal states and desired information, be they large or small, created by smart or stupid humans (I mean the theory, not the authors of the proposal), with or without extra information from within the vault.
2) “Simple” designs for the reporter using some prior belief about how the internal state should map.
3) Having a strong prior belief (as the author) about how the reporter will map, using the above two points.
And to my eyes this leaves room only for proposals that find out how to:
1) Distinguish reporters between human-imitators and translators without creating a simple reporter
2) Machine-learn how to transcribe a prior belief learned from a simple dataset to a larger, complex dataset, without creating another black-box AI with all of the faults mentioned above.
Please feel free to correct me, and thank you in advance if you do!
Hi all,
I’m just a passerby. A few days ago Robert Miles and his wonderful YouTube channel pointed me in the direction of this contest. Full disclosure: I have no qualifications for anything close to this field, but it got me thinking. In all honesty, I probably should not have entered anything and wasted anyone’s time. But hey, there was a deadline and a prize, so I did.
Because my proposal will probably end in the trash, I’m set on learning as much as I can from you smart people. Get my prize in knowledge, as it were (the bigger prize, I think).
My question
My intuition is that there can be no such setup that guarantees a correct reporter. My question to you is: is my logic sound? If not, where do I err?
Setup
Let’s say the ‘real world’ causal graph is (using → for directed graphs):
A → G
Where A is some actions and G is some small detail we care about along the way.
And our super AI looks like this (using :> for input/output of functions):
A :> [I] :> S
Where A is the actions as before, I is this complex opaque inner state and S is the predicted state after the actions.
And our reporter looks like this:
I :> G
Where I is the internal state of the bigger AI again and G is that small piece of information we’d like to elicit from the inner state. We train this reporter on a dataset containing P(I|A) and a true P(G|A) until we get zero loss.
Now we want to know if our reporter (I :> G) generalizes well. In other words we want to know if it has learned the correct mapping between some part of I and G.
My thinking, the first way
Once, some time ago, our perfect AI was trained to learn the joint distribution P(A,S). It learned that S is a non-linear, complex function of A using some complex, layered inner state I.
If we think of I as a set of parts P, then it has many parts {p1, p2, p3 … pn}. And we can think of our AI as some graph:
A → p1 → p2 → … → pn → S
And they have the Markov property, so P(pn | p1…pn-1) = P(pn | pn-1). In English: each part carries the information of the layers before it, else P(S | A) would not be equal to P(S | pn).
So when we set our reporter to learn the function between I and G it sees some highly correlated inputs in a joint distribution P(p1,p2,p3...pn) where each p carries information of the others.
From that input it has to construct its own internal causal graph. What we want our reporter to learn is G as a function of P(I|A). But what graph should it construct?
A → I → G, which could be:
A → p1 → G, or
A → p2 → G, or
A → p3 → G
...
A → pn → G, or any variation of parts.
But let’s say there was some way to settle on only one internal graph using only one part (let’s say p1); what would that require? It would require that part p1 not be correlated with the other p’s. It would require that p1 not carry any information other than about A. But if p1 did not carry any information or correlation from the other p’s, the Markov property would be broken and our perfect AI would not be perfect.
What I’m saying is that there can be no single graph learned by the reporter, because if it could it would require the super AI to be no super AI.
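A tiny numerical version of this argument, using a hypothetical two-part predictor with invertible layers (the 0.5 scaling just keeps tanh away from saturation):

```python
import numpy as np

rng = np.random.default_rng(3)

# A deterministic chain A -> p1 -> p2 -> S: each part is a function of
# the previous one only, so the chain has the Markov property and S
# depends on A only through the last part p2.
W1 = 0.5 * rng.normal(size=(8, 8))
W2 = 0.5 * rng.normal(size=(8, 8))
W3 = rng.normal(size=(8, 8))
A = rng.normal(size=(100, 8))
p1 = np.tanh(A @ W1)
p2 = np.tanh(p1 @ W2)
S = p2 @ W3

# p2 carries all the information in p1: with an invertible layer the
# earlier part can be reconstructed exactly from the later one, so no
# part is uncorrelated with the rest.
p1_back = np.arctanh(p2) @ np.linalg.inv(W2)
print("p1 recoverable from p2:", np.allclose(p1, p1_back))
```

So in this sketch every part is informationally equivalent to every other part, which is exactly what blocks a reporter from singling out one part as “the” source of G.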
My thinking, the second way
Let’s elaborate on this graph thing. I use a causal graph as a stand-in for a learned function; I think it’s similar enough. For example, let’s say our output is a function of the input, so:
let output = AI (input)
And let’s say this AI has some layers, h1 and h2 such that:
let h1 = f(input)
let h2 = g(h1)
let o = h(h2)
Then the function AI can be written by composition (using F# notation):
let AI = f >> g >> h
That looks a lot like a (causal) graph:
input → h1 → h2 → output
Now say we create and train our reporter to zero loss. And let’s assume it finds some way to correlate some part of the internal state (in our small example above, let’s say h1) to the value we want to know, G. For this it gets to train on the joint (and correlated) distribution P(h1,h2) with target G.
let G = reporter (h1,h2)
and it learns the internal graph (I’ll skip writing the functions):
h2 → h1 → G
That would be the best case. A translator.
But equally possible would be
h1 → h2 → G
or even worse would be if the reporter reconstructed (as described in the report) the output of the super AI, creating a human simulator.
h1 → h2 → S → G
My point is: the input variables into the reporter are correlated, and other values can be reconstructed. So, by the rule that no single causal graph can be concluded from highly correlated variables without outside knowledge, all graph versions can map the AI’s internal state to our hoped-to-be-elicited information, but we have no way to know which graph was internalized. Unless we make a reporter-reporter, but that would require reporters ad infinitum.
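The non-identifiability claim can be illustrated numerically. A linear toy model of my own, chosen so the zero-loss property is exact: G is by construction a function of h1 alone, yet a reporter reading from h2 fits the training data just as perfectly.

```python
import numpy as np

rng = np.random.default_rng(4)

# Linear toy predictor: input -> h1 -> h2, with h2 = h1 @ W (W invertible),
# so h1 and h2 carry exactly the same information.
inp = rng.normal(size=(200, 6))
W0 = rng.normal(size=(6, 6))
W = rng.normal(size=(6, 6))
h1 = inp @ W0
h2 = h1 @ W

# Ground truth: G is a function of h1 alone.
g_true = h1 @ rng.normal(size=6)

# Reporter 1 reads G off h1; reporter 2 reads it off h2. Both reach
# (numerically) zero training loss, so the data cannot tell them apart.
w1, *_ = np.linalg.lstsq(h1, g_true, rcond=None)
w2, *_ = np.linalg.lstsq(h2, g_true, rcond=None)
loss1 = np.mean((h1 @ w1 - g_true) ** 2)
loss2 = np.mean((h2 @ w2 - g_true) ** 2)
print(loss1, loss2)
```

Both losses come out at floating-point zero: the training signal alone cannot say which internal graph the reporter has learned.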
Conclusion
Reasoning along the above two methods I saw no solution to the problem of the reporter. I’m probably wrong. But I’d like to know why if I can. Thanks in advance!
Thomas
Ah, a conditional VAE! Small question: Am I the only one that reserves ‘z’ for the latent variables of the autoencoder? You seem to be using it as the ‘predictor state’ input. Or am I reading it wrong?
Now I understand your P(z|Q,A) better, as it’s just the conditional generator. But, how do you get P(A|Q)? That distribution need not be the same for the human known set and the total set.
I was wondering what happens in deployment when you meet a z that’s not in your P(z|Q,A) (i.e. very small p). Would you be sampling P(z|Q,A) forever?
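The worry in that last question can be sketched as a naive accept/reject loop. Everything here is a stand-in (a unit Gaussian instead of the actual conditional generator, and a hypothetical cap on attempts): the point is only that an observation with near-zero mass under the learned distribution either spins forever or must be given up on explicitly.

```python
import random

random.seed(0)

def sample_z():
    # Stand-in for drawing z from the learned P(z|Q,A).
    return random.gauss(0.0, 1.0)

def match_observed(z_obs, tol=0.1, max_tries=10_000):
    # Naive rejection loop: keep sampling until a draw lands near the
    # observed z, but cap the attempts so deployment can't hang forever.
    for _ in range(max_tries):
        if abs(sample_z() - z_obs) < tol:
            return True   # found a z close to the observation
    return False          # z_obs is effectively outside the learned distribution

print(match_observed(0.5))    # in-distribution: a match is found quickly
print(match_observed(50.0))   # far outside: gives up instead of looping forever
```

Without some cutoff like `max_tries`, the second call is exactly the “sampling forever” scenario; with it, the system has to decide what a give-up means, which is the safety-versus-usefulness trade-off mentioned earlier in the thread.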