Big Motivation: Biological systems are thought to stochastically sample from probability distributions. For instance, prey evading a predator might optimally want to act randomly, at least relative to the predator’s model of the prey. Is it possible for such a system to actually generate a random output without explicitly stochastic mechanisms?
Actual Project Question: How can deterministic recurrent neural networks with fixed weights be trained to create random outputs?
Project Plan: Train a recurrent neural network to output a binary digit at random, with specified entropy. For instance, say I want an RNN that can output a 0 or a 1 at every timestep, and I’d like the bit to be (effectively) chosen at random from a uniform distribution.
Some initial work/thoughts on solutions: Training an RNN on any set of random outputs will not work; it will just memorize the random strings. Can I train directly on the entropy of the output? One way to get a working system is to have the RNN implement chaotic dynamics, and make sure the timescales work out such that the dynamics have evolved enough to randomly sample the ergodic distribution associated with the chaotic attractor. How exactly I can use this to generate a string with e.g. 0.7 bits of entropy instead of 1 bit of entropy, I’m not totally sure. I’ve implemented a Lorenz attractor and chosen different planes to separate the state space into two partitions. I assign one partition the symbol 0 and the other the symbol 1. Then I can run the system for N timesteps and see whether it outputs a 0 or a 1. Thus I get a symbol string. I can then plot the block-length entropy diagram to quantify the generation of structure/entropy in that system. The trick would be to get training working with this system somehow.
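To make that concrete, here is a minimal sketch of the kind of pipeline I mean (Python/NumPy; the Euler integrator, the partitioning plane, and all parameter values below are illustrative placeholders rather than my exact choices):

```python
# Sketch: integrate the Lorenz system, partition state space with a plane,
# read off one symbol every `steps_per_symbol` integration steps, and
# estimate the block-length entropy of the resulting binary string.
import numpy as np
from collections import Counter

def lorenz_step(state, dt=0.01, sigma=10.0, rho=28.0, beta=8.0 / 3.0):
    """One (crude) Euler step of the Lorenz system."""
    x, y, z = state
    deriv = np.array([sigma * (y - x), x * (rho - z) - y, x * y - beta * z])
    return state + dt * deriv

def symbol_string(n_symbols=5000, steps_per_symbol=50,
                  plane_normal=(1.0, 0.0, 0.0), offset=0.0):
    """Emit 0 or 1 depending on which side of the plane the trajectory lands on."""
    state = np.array([1.0, 1.0, 1.0])
    normal = np.array(plane_normal)
    symbols = []
    for _ in range(n_symbols):
        for _ in range(steps_per_symbol):
            state = lorenz_step(state)
        symbols.append(int(np.dot(state, normal) > offset))
    return symbols

def block_entropy(symbols, block_len):
    """Shannon entropy (in bits) of length-`block_len` blocks of the string."""
    blocks = [tuple(symbols[i:i + block_len])
              for i in range(len(symbols) - block_len + 1)]
    p = np.array(list(Counter(blocks).values()), dtype=float)
    p /= p.sum()
    return float(-np.sum(p * np.log2(p)))

if __name__ == "__main__":
    s = symbol_string()
    for L in range(1, 6):
        print(L, block_entropy(s, L))  # plot these for the block-length entropy diagram
```

For a fair coin the block entropy grows as L bits per block length L; slower growth is the signature of structure in the string.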
Further Goals: How about outputting a string that has different amounts of structure and entropy? For instance, a string that goes …01R01R01R01R…, where R is a bit with 1 bit of entropy?
Is it possible for such a system to actually generate a random output without explicitly stochastic mechanisms?
If your idea of an explicitly stochastic mechanism is something like a pseudo-random number generator, then it’s possible, but complex, and so unlikely compared to the much easier alternative of using some kind of existing noise, i.e. failing to perfectly filter out noise.
What’s your definition of ‘explicitly stochastic mechanisms’? Every physical-world sensor is stochastic to some degree.
“All” you really need for stochastic output is some level of stochastic input, plus a chaotic feedback loop of some sort.
Some initial work/thoughts on solutions: Training an RNN on any set of random outputs will not work; it will just memorize the random strings. Can I train directly on the entropy of the output?
Train directly on maximizing the Lyapunov exponent, perhaps? That is, repeatedly:
Pick a random input x
Calculate the output F(x)
Make a random perturbation to said input: x′
Calculate the output F(x′)
Do a training update towards having said outputs diverge. That is, maximize |F(x′)−F(x)|.[1]
(...this seems suspiciously like ‘calculate and maximize the gradient’...)
[1] This isn’t quite right, I don’t think. Close though.
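As a rough sketch, that loop might look something like the following (PyTorch is my assumption here, and the toy recurrent map F is just a stand-in for the trained RNN; any differentiable model and autodiff framework would do):

```python
# Sketch: gradient *ascent* on the divergence of nearby inputs under a toy
# recurrent map F -- a crude proxy for maximizing a Lyapunov exponent.
import torch

torch.manual_seed(0)
dim = 16
W = torch.randn(dim, dim, requires_grad=True)  # placeholder recurrent weights
b = torch.zeros(dim, requires_grad=True)

def F(x, T=10):
    """Iterate h <- tanh(W h + b) for T steps starting from x."""
    h = x
    for _ in range(T):
        h = torch.tanh(h @ W.T + b)
    return h

opt = torch.optim.SGD([W, b], lr=1e-2)
for step in range(1000):
    x = torch.randn(dim)                        # pick a random input
    x_pert = x + 1e-3 * torch.randn(dim)        # small random perturbation x'
    divergence = torch.norm(F(x_pert) - F(x))   # |F(x') - F(x)|
    loss = -divergence                          # minimize the negative => maximize divergence
    opt.zero_grad()
    loss.backward()
    opt.step()
```

Measured properly (log of the expansion factor per unit time, with the perturbation renormalized along the way), this becomes a finite-time Lyapunov-exponent estimate rather than just a raw distance.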
How about outputting a string that has different amounts of structure and entropy?
Instead of maximizing the Lyapunov exponent, try optimizing towards a specific value of Lyapunov exponent instead?
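One way to sketch that (reusing the toy F, dim, and optimizer from the snippet above; the target value, perturbation size, and squared-error loss are my own arbitrary assumptions):

```python
eps = 1e-3
target_rate = 0.3                 # desired finite-time divergence rate (arbitrary)
x = torch.randn(dim)
x_pert = x + eps * torch.randn(dim)
# crude finite-time Lyapunov estimate: log(expansion factor) per iterate (T = 10 in F)
rate = torch.log(torch.norm(F(x_pert) - F(x)) / eps) / 10
loss = (rate - target_rate) ** 2  # push the estimate toward the target, then backward()/step() as before
```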
Great idea! My intuition says this won’t work, as you’ll just capture half of the mechanism of the type of chaotic attractor we want. This will give you the “stretching” of points close in phase space to some elongated section, but not by itself the folding over of that stretched section, which at least in my current thinking is necessary. But it’s definitely worth trying, I could very well be wrong! Thanks for the idea :)
Similarly, it’s not obvious to me that constraining the Lyapunov exponent to a certain value gives you the correct “structure”. For instance, what if instead of …01R… I wanted to train on …10R…, or …11R…, etc.? But maybe the training of the Lyapunov exponent would just be one part of the optimization, and then other factors could play into it.
This will give you the “stretching” of points close in phase space to some elongated section, but not by itself the folding over of that stretched section, which at least in my current thinking is necessary.
I mean, if the dynamics stay bounded, the only way to have “stretching” of ‘most’ points in phase space is to have some sort of ‘folding over’.
Of course, it’s an entirely different matter whether a standard optimizer can actually make any headway in figuring that out.