Oh, absolutely! It’s misleading for me to talk about it like this, because there are a couple of different workflows:
train for a while to understand existing data, then optimize for a long time to try to impress the activation layer that knows the most about what the data means. (AlphaGo’s evaluation network, Deep Dream.) Under this process you spend a long time optimizing for one thing (the network’s ability to recognize) and then a long time optimizing for another thing (how much the network likes your current input).
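Roughly, that second phase looks like this in code. This is only a toy sketch in PyTorch under my own assumptions; `model` and `layer` are hypothetical stand-ins for whatever trained network and activation layer you care about, not anything from AlphaGo or Deep Dream specifically:

```python
# Toy sketch: optimize the *input* so a chosen layer of a frozen, already-trained
# network activates strongly. `model` and `layer` are hypothetical placeholders.
import torch

def maximize_activation(model, layer, steps=200, lr=0.05):
    model.eval()
    for p in model.parameters():
        p.requires_grad_(False)          # we only optimize the input, not the weights
    x = torch.randn(1, 3, 224, 224, requires_grad=True)  # start from noise
    activations = {}
    # Record the chosen layer's output on every forward pass.
    hook = layer.register_forward_hook(
        lambda mod, inp, out: activations.update(value=out)
    )
    optimizer = torch.optim.Adam([x], lr=lr)
    for _ in range(steps):
        optimizer.zero_grad()
        model(x)
        loss = -activations["value"].norm()  # "impress" the layer: push its activations up
        loss.backward()
        optimizer.step()
    hook.remove()
    return x.detach()
```

The point is that every single sample costs you a whole optimization loop like that one.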
train a neural network to minimize a loss function based on another neural network’s evaluation, then sample its output. (DCGAN.) Under this process you spend a long time optimizing for one thing (the neural network’s loss function) but only a short time sampling another thing (outputs from the neural net).
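Something like this, sketched very loosely; `G`, `D`, and `data_loader` are hypothetical placeholders, and I’m assuming the discriminator outputs a probability per example, which may not match any particular DCGAN implementation:

```python
# Rough sketch of the GAN-style workflow: training is the slow optimization,
# sampling afterwards is a single forward pass. G, D, data_loader are hypothetical.
import torch
import torch.nn.functional as F

def train_gan(G, D, data_loader, epochs=5, z_dim=100):
    opt_g = torch.optim.Adam(G.parameters(), lr=2e-4)
    opt_d = torch.optim.Adam(D.parameters(), lr=2e-4)
    for _ in range(epochs):
        for real, _ in data_loader:
            n = real.size(0)
            z = torch.randn(n, z_dim)
            fake = G(z)
            # Discriminator: learn to tell real apart from fake.
            d_loss = (F.binary_cross_entropy(D(real), torch.ones(n, 1))
                      + F.binary_cross_entropy(D(fake.detach()), torch.zeros(n, 1)))
            opt_d.zero_grad(); d_loss.backward(); opt_d.step()
            # Generator: its loss *is* the other network's evaluation of its output.
            g_loss = F.binary_cross_entropy(D(fake), torch.ones(n, 1))
            opt_g.zero_grad(); g_loss.backward(); opt_g.step()

# The cheap part: after training, sampling is just one forward pass, no optimization.
# samples = G(torch.randn(16, 100))
```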
train a neural network to approximate existing data and then just sample its output. (seq2seq, char-rnn, PixelRNN, WaveNet, AlphaGo’s policy network.) Under this process you spend a long time optimizing for one thing (the loss function again) but only a short time sampling another thing (outputs from the neural net).
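This is the cheapest one to sample from. Here’s a toy sketch of the sampling side in the spirit of char-rnn, assuming a hypothetical `rnn_model` that takes a token and a hidden state and returns next-token logits; the interface is made up for illustration:

```python
# Toy sketch of linear-time sampling from an already-trained autoregressive model.
# `rnn_model` is a hypothetical module: (token ids, hidden) -> (logits, hidden).
import torch

def sample(rnn_model, start_token, length=100):
    rnn_model.eval()
    token = torch.tensor([start_token])
    hidden = None
    out = [start_token]
    with torch.no_grad():
        for _ in range(length):
            logits, hidden = rnn_model(token, hidden)        # one cheap forward pass
            probs = torch.softmax(logits.squeeze(0), dim=-1)
            token = torch.multinomial(probs, num_samples=1)  # draw the next token
            out.append(token.item())
    return out
```

One forward pass per output token, nothing to optimize at sampling time.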
It’s kind of an important distinction because, like with humans, neural networks that can improvise in linear time are really cheap to sample (a deterministic amount of time per output!), while neural networks that make you run an optimization at sampling time are expensive to sample even after you’ve trained them.